
refactor: introduce controller-based architecture for Qt UI async operations #979

Merged

justinsaws merged 2 commits into aws-deadline:mainline from justinsaws:refctor/qt_mvc_refactor on Mar 6, 2026

Conversation

@justinsaws (Contributor) commented Jan 22, 2026

What was the problem/requirement? (What/Why)

There were a number of threading bugs in the UI code that would sometimes cause cascading error dialogs to appear or cause the app to freeze. This was due to mixing Python's threading module with the Qt UI code and not taking the appropriate steps to update UI state in a thread-safe manner. There was also a large amount of business logic in our shared UI constructs that should live in a central location for better error handling and refreshing.

What was the solution? (How)

Replaces direct Python threading with a DeadlineUIController singleton for async AWS API operations. Adds an AsyncTaskRunner for automatic cancellation of superseded requests, a JobSubmissionWorker for job submission, and a dedicated DeadlineThreadPool.

Refactors deadline_config_dialog, submit_job_progress_dialog, and shared_job_settings_tab to use the new controller pattern, improving thread safety and simplifying cancellation handling.
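
For illustration, here is a minimal sketch of the superseded-request cancellation pattern described above. The class shapes, signal names, and the PySide6 imports are assumptions made for the sketch, not the PR's actual code:

from typing import Any, Callable, Dict, Optional

from PySide6.QtCore import QObject, QRunnable, QThreadPool, Signal, Slot


class _TaskSignals(QObject):
    # (operation key, generation, payload) delivered back on the UI thread
    finished = Signal(str, int, object)


class _Task(QRunnable):
    def __init__(self, key: str, generation: int, fn: Callable[[], Any], signals: _TaskSignals):
        super().__init__()
        self._key = key
        self._generation = generation
        self._fn = fn
        self._signals = signals

    def run(self) -> None:
        result = self._fn()  # e.g. an AWS API call made off the UI thread
        self._signals.finished.emit(self._key, self._generation, result)


class AsyncTaskRunner(QObject):
    """Runs one logical operation per key; results of superseded calls are dropped."""

    def __init__(self, pool: Optional[QThreadPool] = None):
        super().__init__()
        self._pool = pool or QThreadPool.globalInstance()
        self._generations: Dict[str, int] = {}
        self._callbacks: Dict[str, Callable[[Any], None]] = {}
        self._signals = _TaskSignals()
        # Cross-thread signal: the slot runs on the thread owning this QObject (the UI thread).
        self._signals.finished.connect(self._on_finished)

    def run(self, key: str, fn: Callable[[], Any], on_result: Callable[[Any], None]) -> None:
        # Bump the generation so any in-flight request for this key becomes stale.
        generation = self._generations.get(key, 0) + 1
        self._generations[key] = generation
        self._callbacks[key] = on_result
        self._pool.start(_Task(key, generation, fn, self._signals))

    @Slot(str, int, object)
    def _on_finished(self, key: str, generation: int, result: Any) -> None:
        # Only the most recent request for a key is allowed to update the UI.
        if generation == self._generations.get(key):
            self._callbacks[key](result)

A DeadlineUIController-style singleton could then wrap a runner like this and expose per-resource refresh methods that the dialogs connect to.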

What is the impact of this change?

  • The dialog updates correctly and no longer shows errors when switching higher-order resources (AWS profiles, farms, queues).
  • Makes async UI elements easier to create and extend going forward.

Old behavior:

Screen.Recording.2026-01-22.at.4.14.11.PM.mov

New behavior:

Screen.Recording.2026-01-22.at.4.18.27.PM.mov

How was this change tested?

  • Have you run the unit tests?
    • Yes, they pass.
  • Have you run the integration tests?
    • Yes, they pass.
  • Have you made changes to the download or asset_sync modules? If so, then it is highly recommended
    that you ensure that the docker-based unit tests pass.
    • N/A

Was this change documented?

N/A; this is a refactor.

Does this PR introduce new dependencies?

This library is designed to be integrated into third-party applications that have bespoke and customized deployment environments. Adding dependencies will increase the chance of library version conflicts and incompatibilities. Please evaluate the addition of new dependencies. See the Dependencies section of DEVELOPMENT.md for more details.

  • This PR adds one or more new dependency Python packages. I acknowledge I have reviewed the considerations for adding dependencies in DEVELOPMENT.md.
  • This PR does not add any new dependencies.

Is this a breaking change?

No, this just refactors current functionality.

Does this change impact security?

No.


By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@justinsaws requested a review from a team as a code owner on January 22, 2026 21:58
@github-actions (Bot) added the waiting-on-maintainers label (Waiting on the maintainers to review) on Jan 22, 2026
self.default_storage_profile_box.clear_list()
self.refresh()
# Trigger cascading refresh - queues will load, then storage profiles
self._awaiting_queues_for_cascade = True
Contributor

Could there be a race condition? Can the flags properly track which cascade corresponds to which user action? Similar for line 852 and lines 233-235.

Contributor Author

What's the race condition that you are seeing? The updating is handled by the controller class, which should cancel requests that would overlap.

@larrygao001 (Contributor) commented Mar 4, 2026

Thanks for the explanation. I understand the controller cancels overlapping requests, but my concern is specifically about the boolean flags like _awaiting_queues_for_cascade. If a user rapidly switches profiles twice, could the flag set by action 1 still be true when action 2's callback fires, causing it to incorrectly trigger a cascade? Would a request ID or counter be more robust than a boolean?

Contributor Author

The AsyncTaskRunner will cancel the previous call when a new call comes in, so data corruption shouldn't happen. There is a potential race condition here that could be solved with a call ID or generation counter, which would prevent the data from flipping on the user if they are changing the selection very rapidly.

I'm not sure the additional complexity is warranted here because this is a very edge-case scenario, but if you feel the change is needed for the optimal experience then I'd be fine with adding it.
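
For reference, a minimal sketch of the request-ID/generation-counter idea being discussed; the method and attribute names below are hypothetical, not the dialog's actual code:

def _on_profile_changed(self) -> None:
    # Replace the boolean _awaiting_queues_for_cascade with a counter captured
    # per cascade; a callback left over from an earlier profile switch is ignored.
    self._cascade_generation = getattr(self, "_cascade_generation", 0) + 1
    generation = self._cascade_generation

    self.default_storage_profile_box.clear_list()
    self.refresh()

    def _on_queues_loaded(queues) -> None:
        if generation != self._cascade_generation:
            return  # superseded by a newer profile switch
        self._load_storage_profiles_for(queues)

    self._on_queues_refreshed_callback = _on_queues_loaded

Because the comparison happens on the UI thread, no locking is needed; a stale cascade simply does nothing.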

self._active_tasks: Dict[str, AsyncTask] = {}
self._operation_counter = 0

def run(
Contributor

What if tasks hang? Will they be cleaned up?

Contributor Author

If tasks hang, they will just lock up the thread forever. That is no different from the previous implementation and how threads work in general. You could force-terminate the thread, but that could have other consequences and isn't generally advisable.

@larrygao001 (Contributor) commented Mar 4, 2026

Fair point that this matches previous behavior. Would it be worth adding a debug log when a task exceeds some threshold (e.g., 30s)? That would help with diagnosing issues in the field without changing the threading model.

Contributor Author

Additional logging could be helpful, but I'm not sure it's needed in this case. Threaded logging can get messy if we aren't writing to individual logs, and we weren't doing it before. This isn't complex threading logic for a long-running application; it's just fetching API responses from the service, so that feels like additional complexity that probably won't add much. We will already get logging from boto3 if the call to the API fails.
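
For reference only (this was not adopted in the PR), a sketch of the kind of watchdog logging suggested above, assuming a runner with the run(key, fn, on_result) shape sketched earlier; all names here are illustrative:

import logging

from PySide6.QtCore import QTimer

logger = logging.getLogger(__name__)
SLOW_TASK_THRESHOLD_MS = 30_000


def start_with_watchdog(runner, key, fn, on_result):
    # Log at debug level if the task has not reported back after the threshold.
    state = {"done": False}

    def _wrapped(result):
        state["done"] = True
        on_result(result)

    def _check():
        if not state["done"]:
            logger.debug("Async task %r still running after %d ms", key, SLOW_TASK_THRESHOLD_MS)

    runner.run(key, fn, _wrapped)
    QTimer.singleShot(SLOW_TASK_THRESHOLD_MS, _check)  # fires on the UI thread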

self.confirmation_requested.emit(message, default_response)

# Block until main thread responds or cancellation
self._confirmation_event.wait()
Contributor

So if the main thread is busy or (if possible) the dialog is closed, the worker could hang forever?

Contributor Author

This is only done when we are waiting on user input, so this is desired behavior. If the main thread is blocked for some reason and not showing the dialog, the application will appear hung and get killed. Any potentially blocking operation shouldn't be done on the main thread in the first place, so this case shouldn't come up.

Contributor

I understand this is intentional blocking for user input. My concern is that if the dialog gets closed unexpectedly (crash, force quit, etc.), the worker thread would hang indefinitely. Would adding a timeout with a check for dialog validity be a reasonable safeguard, or is that overkill for this use case?

Contributor Author

The way this is parented in Qt, if the dialog gets closed the worker gets terminated and cleaned up; it is owned by the dialog. It's using the Qt constructs appropriately, so it should unblock on deletion.
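
A minimal sketch of the confirmation round-trip under discussion, assuming PySide6; the class and method names are illustrative rather than the PR's actual implementation:

import threading
from typing import Optional

from PySide6.QtCore import QObject, QThread, Signal, Slot
from PySide6.QtWidgets import QDialog, QMessageBox, QWidget


class SubmissionWorker(QThread):
    confirmation_requested = Signal(str, bool)

    def __init__(self, parent: Optional[QObject] = None):
        # Parenting the worker to the dialog ties its lifetime to the dialog.
        super().__init__(parent)
        self._confirmation_event = threading.Event()
        self._confirmation_result = False

    def ask_user(self, message: str, default_response: bool) -> bool:
        # Runs on the worker thread: hand the question to the UI thread and block.
        self._confirmation_event.clear()
        self.confirmation_requested.emit(message, default_response)
        self._confirmation_event.wait()
        return self._confirmation_result

    def provide_confirmation(self, accepted: bool) -> None:
        # Called on the UI thread by the dialog's slot.
        self._confirmation_result = accepted
        self._confirmation_event.set()

    def cancel(self) -> None:
        # Also releases a worker that is currently blocked waiting for input.
        self._confirmation_result = False
        self._confirmation_event.set()


class SubmitJobProgressDialogSketch(QDialog):
    def __init__(self, parent: Optional[QWidget] = None):
        super().__init__(parent)
        self._worker = SubmissionWorker(parent=self)
        # Cross-thread signal: the slot below runs on the UI thread via a queued connection.
        self._worker.confirmation_requested.connect(self._on_confirmation_requested)

    @Slot(str, bool)
    def _on_confirmation_requested(self, message: str, _default_response: bool) -> None:
        accepted = (
            QMessageBox.question(self, "Confirm", message) == QMessageBox.StandardButton.Yes
        )
        self._worker.provide_confirmation(accepted)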

Comment thread src/deadline/client/ui/controllers/_async_task.py
@justinsaws force-pushed the refctor/qt_mvc_refactor branch from 68c23c6 to 45d654c on January 30, 2026 17:26
@justinsaws force-pushed the refctor/qt_mvc_refactor branch 2 times, most recently from 7d654e9 to 33f2900 on February 13, 2026 00:43
@justinsaws force-pushed the refctor/qt_mvc_refactor branch from 33f2900 to 48ad4e0 on February 20, 2026 16:02
Comment thread requirements-testing.txt
# OpenJD model library for job template parsing in mock backend
openjd-model >= 0.8.0; python_version >= '3.9'

# GUI testing dependencies
Contributor

Why only Windows and Darwin? Do we not perform GUI tests on Linux?

Contributor Author

The Linux test runners don't have a windowing system or UI libraries installed, so Qt will crash the second it tries to create the event queue. Adding and setting up those dependencies on the Linux runners would be non-trivial, and the difference between our three platforms is essentially none in the UI.

I think our time would be better spent implementing validation testing of the UI elements instead of worrying about unit tests here.
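
As an illustrative aside, GUI test modules could be skipped on headless runners with a pytest marker like the following; the exact mechanism used by this repository may differ:

import sys

import pytest

# Qt aborts on the Linux CI runners because no display server is available,
# so skip the GUI test module there entirely.
pytestmark = pytest.mark.skipif(
    sys.platform == "linux",
    reason="Linux CI runners have no windowing system for Qt",
)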

@justinsaws force-pushed the refctor/qt_mvc_refactor branch 3 times, most recently from c7753d7 to 24f5baa on March 3, 2026 17:28
refactor: introduce controller-based architecture for Qt UI async operations

Replace direct Python threading with DeadlineUIController singleton
pattern for async AWS API operations. Adds AsyncTaskRunner for automatic
cancellation of superseded requests, JobSubmissionWorker for job
submission, and dedicated DeadlineThreadPool.

Refactors deadline_config_dialog, submit_job_progress_dialog, and
shared_job_settings_tab to use the new controller pattern, improving
thread safety and simplifying cancellation handling.

Signed-off-by: Justin Sawatzky <132946620+justinsaws@users.noreply.github.com>
@justinsaws force-pushed the refctor/qt_mvc_refactor branch from 24f5baa to 6bcbdff on March 3, 2026 19:42
on_create_job_bundle_callback: OnCreateJobBundleCallback,
parent: Optional[QWidget] = None,
f: Qt.WindowFlags = Qt.WindowFlags(),
f: Any = Qt.WindowFlags(),
Contributor

nit - this likely could've remained Qt.WindowFlags

Contributor Author

Yeah, I don't know why it would have changed this. I'll swap that back.

logger.info("Canceling submission...")
self.status_label.setText(tr("Canceling submission..."))
# Wait for worker to finish with event processing
while self._worker.isRunning():
Contributor

Nit: When this busy wait is running while submitting, will this freeze up the UI? Would it make more sense to add a QThread.wait(timeout) with a fallback?

Contributor

Instead of adding a timeout here, we could get around this issue altogether with a signal-based approach like we are doing elsewhere with the AsyncTaskRunner, since JobSubmissionWorker is a Qt thread, right?

Contributor Author

The line below is what prevents this from blocking the UI. It releases control to the event queue to process background events and then comes back to re-evaluate this condition. Timeouts don't make sense here because we don't want to force-terminate threads; that can leave the application in an undefined state and isn't advised in desktop applications.

A signal-based approach wouldn't work here because this is on a destruction/close event. This object is going away, so it needs to clean up now. The threads are just making API calls, so I think the risk here is low. Also, this isn't a long-running application.

Contributor

Makes sense, thanks for clarifying.
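
A minimal sketch of the close-time wait described above, assuming a QThread-based worker with a cancel() method; the function name is illustrative:

from PySide6.QtWidgets import QApplication


def wait_for_worker_to_finish(worker) -> None:
    # Ask the worker to stop, then keep the Qt event loop serviced so the
    # queued signals it emits while shutting down are still delivered.
    worker.cancel()
    while worker.isRunning():
        QApplication.processEvents()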

@sonarqubecloud (Bot) commented Mar 6, 2026

@justinsaws merged commit f82cc31 into aws-deadline:mainline on Mar 6, 2026
26 of 27 checks passed