refactor: introduce controller-based architecture for Qt UI async operations #979
| self.default_storage_profile_box.clear_list() | ||
| self.refresh() | ||
| # Trigger cascading refresh - queues will load, then storage profiles | ||
| self._awaiting_queues_for_cascade = True |
There was a problem hiding this comment.
Could there be a race condition? Can the flags properly track which cascade corresponds to which user action? The same applies to line 852 and lines 233-235.
There was a problem hiding this comment.
What's the race condition that you are seeing? The updating is handled by the controller class and they should cancel requests that would overlap.
There was a problem hiding this comment.
Thanks for the explanation. I understand the controller cancels overlapping requests, but my concern is specifically about the boolean flags like _awaiting_queues_for_cascade. If a user rapidly switches profiles twice, could the flag set by action 1 still be true when action 2's callback fires, causing it to incorrectly trigger a cascade? Would a request ID or counter be more robust than a boolean?
There was a problem hiding this comment.
The AsyncTaskRunner will cancel the previous call when a new call comes in, so data corruption shouldn't happen. There is a potential race condition here that could be solved with a call ID or generation counter, which would prevent the data from flipping on the user if they change the selection very rapidly.
I'm not sure the additional complexity is warranted here because this is very much an edge case, but if you feel the change is needed for the optimal experience then I'd be fine with adding it.
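For reference, the generation-counter idea discussed in this thread could look roughly like the sketch below. `CascadeTracker`, `begin`, `is_current`, and `on_queues_loaded` are hypothetical names chosen for illustration and are not part of this PR. The sketch assumes callbacks are delivered on a single (UI) thread, so no locking is shown.

```python
import itertools


class CascadeTracker:
    """Hands out a fresh token per user action (hypothetical helper, not in this PR)."""

    def __init__(self) -> None:
        self._counter = itertools.count(1)
        self._current = 0  # 0 means no cascade is pending

    def begin(self) -> int:
        """Start a new cascade and invalidate any earlier one."""
        self._current = next(self._counter)
        return self._current

    def is_current(self, token: int) -> bool:
        """True only for the token issued by the most recent begin()."""
        return token == self._current


tracker = CascadeTracker()
refreshed = []


def on_queues_loaded(token: int, profile: str) -> None:
    # A callback that belongs to a superseded action is ignored
    # instead of triggering the storage-profile cascade.
    if tracker.is_current(token):
        refreshed.append(profile)


first = tracker.begin()   # user picks profile A
second = tracker.begin()  # user quickly picks profile B
on_queues_loaded(first, "profile-a")   # stale token, dropped
on_queues_loaded(second, "profile-b")  # current token, cascade proceeds
```

Unlike a single boolean, the token ties each callback to the action that started it, so a late callback from the first selection cannot consume state set by the second.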
| self._active_tasks: Dict[str, AsyncTask] = {} | ||
| self._operation_counter = 0 | ||
|
|
||
| def run( |
There was a problem hiding this comment.
What if tasks hang? Will they be cleaned up?
There was a problem hiding this comment.
If tasks hang, they will just lock up the thread forever. That is no different than the previous implementation and how threads work in general. You could force terminate the thread but that could have other consequences and isn't generally advisable.
There was a problem hiding this comment.
Fair point that this matches previous behavior. Would it be worth adding a debug log when a task exceeds some threshold (e.g., 30s)? That would help with diagnosing issues in the field without changing the threading model.
There was a problem hiding this comment.
Additional logging could be helpful, but I'm not sure it's needed in this case. Threaded logging can get messy if we aren't logging to individual logs, and we weren't doing it before. This isn't complex threading logic for a long-running application; it's just getting API responses from the service, so that feels like additional complexity that probably won't add much. We will already get the logging from boto3 if the call to the API fails.
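For context, the threshold log suggested above could be sketched without touching the threading model, for example with a `threading.Timer` from the standard library. `warn_if_slow` and the logger name are hypothetical and not part of this PR; the 30-second default simply mirrors the number mentioned in the thread.

```python
import logging
import threading
import time
from contextlib import contextmanager

logger = logging.getLogger("slow_task_sketch")  # hypothetical logger name


@contextmanager
def warn_if_slow(name: str, threshold_seconds: float = 30.0):
    """Emit one debug log if the wrapped block is still running after the threshold."""
    timer = threading.Timer(
        threshold_seconds,
        logger.debug,
        args=("Task %r still running after %.1fs", name, threshold_seconds),
    )
    timer.daemon = True  # never keep the process alive just for this log
    timer.start()
    try:
        yield
    finally:
        timer.cancel()  # no effect if the timer has already fired


# Usage: wrap the blocking call made by the worker.
with warn_if_slow("list_queues", threshold_seconds=30.0):
    time.sleep(0.01)  # stand-in for the real API call
```

The wrapped call is never interrupted; the timer only reports that it is taking longer than expected, which keeps the behavior described above (a hung call still occupies its thread) unchanged.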
| self.confirmation_requested.emit(message, default_response) | ||
|
|
||
| # Block until main thread responds or cancellation | ||
| self._confirmation_event.wait() |
There was a problem hiding this comment.
So if the main thread is busy or (if possible) the dialog is closed, the worker could hang forever?
There was a problem hiding this comment.
This is only done when we are waiting on user input, so this is the desired behavior. If the main thread is blocked for some reason and not showing the dialog, then the application will show as hung and get killed. No potentially blocking operation should be running on the main thread in the first place, so this case shouldn't come up.
There was a problem hiding this comment.
I understand this is intentional blocking for user input. My concern is if the dialog gets closed unexpectedly (crash, force quit, etc.), the worker thread would hang indefinitely. Would adding a timeout with a check for dialog validity be a reasonable safeguard, or is that overkill for this use case?
There was a problem hiding this comment.
The way this is parented in Qt, if the dialog gets closed this will get terminated and cleaned up. It is owned by the dialog. It's using the Qt constructs appropriately so it should unblock on deletion.
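As an illustration of the "block until main thread responds or cancellation" comment in the diff, the sketch below shows one way a single `threading.Event` can be released by either path. `ConfirmationGate` and its methods are hypothetical names; this is not the PR's implementation, which relies on Qt ownership for cleanup as described above.

```python
import threading
from typing import Optional


class ConfirmationGate:
    """Worker-side wait that a response or a cancel can release (hypothetical)."""

    def __init__(self) -> None:
        self._event = threading.Event()
        self._response: Optional[bool] = None
        self._cancelled = False

    def respond(self, answer: bool) -> None:
        # Called from the UI side once the user has answered the dialog.
        self._response = answer
        self._event.set()

    def cancel(self) -> None:
        # Called when the dialog closes or the operation is cancelled.
        self._cancelled = True
        self._event.set()

    def wait_for_answer(self, default: bool) -> bool:
        # Blocks the worker thread; falls back to the default if cancelled.
        self._event.wait()
        if self._cancelled or self._response is None:
            return default
        return self._response


gate = ConfirmationGate()
results = []
worker = threading.Thread(
    target=lambda: results.append(gate.wait_for_answer(default=False))
)
worker.start()
gate.cancel()  # e.g. triggered when the owning dialog goes away
worker.join(timeout=5)
```

Because the cancel path sets the same event the worker is waiting on, the worker is released without a timeout as long as something calls `cancel()` when the dialog is torn down.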
68c23c6 to 45d654c
7d654e9 to 33f2900
33f2900 to 48ad4e0
| # OpenJD model library for job template parsing in mock backend | ||
| openjd-model >= 0.8.0; python_version >= '3.9' | ||
|
|
||
| # GUI testing dependencies |
There was a problem hiding this comment.
Why only Windows and Darwin, do we not perform GUI tests in Linux?
There was a problem hiding this comment.
The Linux test runners don't have any windowing system or UI libraries installed, so Qt will crash the second it tries to create the event queue. Adding and setting up those deps on the Linux runners would be non-trivial, and the difference between our three platforms in the UI is essentially none.
I think our time would be better spent implementing validation testing of the UI elements instead of worrying about unit tests here.
c7753d7 to 24f5baa
…rations Replace direct Python threading with DeadlineUIController singleton pattern for async AWS API operations. Adds AsyncTaskRunner for automatic cancellation of superseded requests, JobSubmissionWorker for job submission, and dedicated DeadlineThreadPool. Refactors deadline_config_dialog, submit_job_progress_dialog, and shared_job_settings_tab to use the new controller pattern, improving thread safety and simplifying cancellation handling. Signed-off-by: Justin Sawatzky <132946620+justinsaws@users.noreply.github.com>
24f5baa to 6bcbdff
| on_create_job_bundle_callback: OnCreateJobBundleCallback, | ||
| parent: Optional[QWidget] = None, | ||
| f: Qt.WindowFlags = Qt.WindowFlags(), | ||
| f: Any = Qt.WindowFlags(), |
There was a problem hiding this comment.
nit - this likely could've remained Qt.WindowFlags
There was a problem hiding this comment.
Yeah, I don't know why it would have changed this. I'll swap that back.
| logger.info("Canceling submission...") | ||
| self.status_label.setText(tr("Canceling submission...")) | ||
| # Wait for worker to finish with event processing | ||
| while self._worker.isRunning(): |
There was a problem hiding this comment.
Nit: When this busy wait is running while submitting, will this freeze up the UI? Would it make more sense to add a QThread.wait(timeout) with a fallback?
There was a problem hiding this comment.
Instead of adding a timeout here, we could get around this issue altogether with a signal-based approach like we are doing elsewhere with the AsyncTaskRunner, since JobSubmissionWorker is a Qt thread, right?
There was a problem hiding this comment.
The line below is what prevents this from blocking the UI. It releases control to the event queue to process background events and then comes back to re-evaluate this condition. Timeouts don't make sense here because we don't want to force terminate threads. That can leave the application in an undefined state and isn't advised on desktop applications.
A signal-based approach wouldn't work here because this is on a destruction/close event. This object is going away, so it needs to clean up now. The threads are just making API calls, so I think the risk here is low. Also, this isn't a long-running application.
There was a problem hiding this comment.
Makes sense, thanks for clarifying.
What was the problem/requirement? (What/Why)
There were a number of threading bugs in the UI code that would sometimes cause cascading error dialogs to appear or cause the app to freeze up. This was due to using Python's threading module intermixed with the Qt UI code and not taking the appropriate steps to update UI code in a thread safe manner. There was also a large amount of business code in our shared UI constructs that could be in a central location for better error handling and refreshing.
What was the solution? (How)
Replace direct Python threading with DeadlineUIController singleton pattern for async AWS API operations. Adds AsyncTaskRunner for automatic cancellation of superseded requests, JobSubmissionWorker for job submission, and dedicated DeadlineThreadPool.
Refactors deadline_config_dialog, submit_job_progress_dialog, and shared_job_settings_tab to use the new controller pattern, improving thread safety and simplifying cancellation handling.
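To make "automatic cancellation of superseded requests" concrete, here is a minimal sketch of the general pattern in plain Python threading. `SupersedingRunner` and its `run` signature are hypothetical and are not the PR's AsyncTaskRunner API; in the Qt version the result would be delivered to the main thread through a signal rather than a direct callback.

```python
import threading
from typing import Any, Callable, Dict


class SupersedingRunner:
    """Runs one task per key; a newer request cancels the older one (hypothetical)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._cancel_flags: Dict[str, threading.Event] = {}

    def run(
        self,
        key: str,
        work: Callable[[], Any],
        on_done: Callable[[Any], None],
    ) -> threading.Thread:
        cancelled = threading.Event()
        with self._lock:
            previous = self._cancel_flags.get(key)
            if previous is not None:
                previous.set()  # mark the superseded request as cancelled
            self._cancel_flags[key] = cancelled

        def target() -> None:
            result = work()
            # Drop the result if a newer request replaced this one meanwhile.
            if not cancelled.is_set():
                on_done(result)

        thread = threading.Thread(target=target, daemon=True)
        thread.start()
        return thread


runner = SupersedingRunner()
delivered = []
release = threading.Event()


def slow_request() -> str:
    release.wait(5)  # stand-in for a slow API call
    return "old"


t1 = runner.run("queues", slow_request, delivered.append)
t2 = runner.run("queues", lambda: "new", delivered.append)  # supersedes t1
t2.join(5)
release.set()
t1.join(5)
```

The older thread is not terminated; it simply finishes and its result is discarded, which matches the approach of not force-killing threads described in the review discussion.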
What is the impact of this change?
Old behavior:
Screen.Recording.2026-01-22.at.4.14.11.PM.mov
New behavior:
Screen.Recording.2026-01-22.at.4.18.27.PM.mov
How was this change tested?
download or asset_sync modules? If so, then it is highly recommended that you ensure that the docker-based unit tests pass.
Was this change documented?
N/A this is a refactor.
Does this PR introduce new dependencies?
This library is designed to be integrated into third-party applications that have bespoke and customized deployment environments. Adding dependencies will increase the chance of library version conflicts and incompatibilities. Please evaluate the addition of new dependencies. See the Dependencies section of DEVELOPMENT.md for more details.
Is this a breaking change?
No, this just refactors current functionality.
Does this change impact security?
No.
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.