
refactor: introduce controller-based architecture for Qt UI async operations #979

Merged

justinsaws merged 2 commits into aws-deadline:mainline from justinsaws:refctor/qt_mvc_refactor on Mar 6, 2026

Conversation

@justinsaws (Contributor) commented Jan 22, 2026

What was the problem/requirement? (What/Why)

There were a number of threading bugs in the UI code that would sometimes cause cascading error dialogs to appear or cause the app to freeze. This was due to mixing Python's threading module with the Qt UI code and not taking the appropriate steps to update UI state in a thread-safe manner. There was also a large amount of business logic in our shared UI constructs that should live in a central location for better error handling and refreshing.

What was the solution? (How)

Replaces direct Python threading with a DeadlineUIController singleton for async AWS API operations. Adds an AsyncTaskRunner for automatic cancellation of superseded requests, a JobSubmissionWorker for job submission, and a dedicated DeadlineThreadPool.

Refactors deadline_config_dialog, submit_job_progress_dialog, and shared_job_settings_tab to use the new controller pattern, improving thread safety and simplifying cancellation handling.
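
For illustration, here is a minimal sketch of the superseded-request cancellation pattern described above. The class shapes, signal names, and the PySide6 imports are assumptions made for the sketch, not the PR's actual code:

from typing import Any, Callable, Dict, Optional

from PySide6.QtCore import QObject, QRunnable, QThreadPool, Signal, Slot


class _TaskSignals(QObject):
    # (operation key, generation, payload) delivered back on the UI thread
    finished = Signal(str, int, object)


class _Task(QRunnable):
    def __init__(self, key: str, generation: int, fn: Callable[[], Any], signals: _TaskSignals):
        super().__init__()
        self._key = key
        self._generation = generation
        self._fn = fn
        self._signals = signals

    def run(self) -> None:
        result = self._fn()  # e.g. an AWS API call made off the UI thread
        self._signals.finished.emit(self._key, self._generation, result)


class AsyncTaskRunner(QObject):
    """Runs one logical operation per key; results of superseded calls are dropped."""

    def __init__(self, pool: Optional[QThreadPool] = None):
        super().__init__()
        self._pool = pool or QThreadPool.globalInstance()
        self._generations: Dict[str, int] = {}
        self._callbacks: Dict[str, Callable[[Any], None]] = {}
        self._signals = _TaskSignals()
        # Cross-thread signal: the slot runs on the thread owning this QObject (the UI thread).
        self._signals.finished.connect(self._on_finished)

    def run(self, key: str, fn: Callable[[], Any], on_result: Callable[[Any], None]) -> None:
        # Bump the generation so any in-flight request for this key becomes stale.
        generation = self._generations.get(key, 0) + 1
        self._generations[key] = generation
        self._callbacks[key] = on_result
        self._pool.start(_Task(key, generation, fn, self._signals))

    @Slot(str, int, object)
    def _on_finished(self, key: str, generation: int, result: Any) -> None:
        # Only the most recent request for a key is allowed to update the UI.
        if generation == self._generations.get(key):
            self._callbacks[key](result)

A DeadlineUIController-style singleton could then wrap a runner like this and expose per-resource refresh methods that the dialogs connect to.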

What is the impact of this change?

  • The dialog updates correctly and no longer shows errors when switching higher-order resources (AWS profiles, farms, queues).
  • Makes async UI elements easier to create and extend going forward.

Old behavior:

Screen.Recording.2026-01-22.at.4.14.11.PM.mov

New behavior:

Screen.Recording.2026-01-22.at.4.18.27.PM.mov

How was this change tested?

  • Have you run the unit tests?
    • Yes, they pass.
  • Have you run the integration tests?
    • Yes, they pass.
  • Have you made changes to the download or asset_sync modules? If so, then it is highly recommended
    that you ensure that the docker-based unit tests pass.
    • N/A

Was this change documented?

N/A; this is a refactor.

Does this PR introduce new dependencies?

This library is designed to be integrated into third-party applications that have bespoke and customized deployment environments. Adding dependencies will increase the chance of library version conflicts and incompatibilities. Please evaluate the addition of new dependencies. See the Dependencies section of DEVELOPMENT.md for more details.

  • This PR adds one or more new dependency Python packages. I acknowledge I have reviewed the considerations for adding dependencies in DEVELOPMENT.md.
  • This PR does not add any new dependencies.

Is this a breaking change?

No, this just refactors current functionality.

Does this change impact security?

No.


By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@justinsaws requested a review from a team as a code owner on January 22, 2026 21:58
@github-actions (Bot) added the waiting-on-maintainers label (Waiting on the maintainers to review) on Jan 22, 2026
self.default_storage_profile_box.clear_list()
self.refresh()
# Trigger cascading refresh - queues will load, then storage profiles
self._awaiting_queues_for_cascade = True
Contributor

Could there be a race condition? Can the flags properly track which cascade corresponds to which user action? Similar for line 852 and lines 233-235.

Contributor Author

What's the race condition that you are seeing? The updating is handled by the controller class, which should cancel requests that would overlap.

@larrygao001 (Contributor) commented Mar 4, 2026

Thanks for the explanation. I understand the controller cancels overlapping requests, but my concern is specifically about the boolean flags like _awaiting_queues_for_cascade. If a user rapidly switches profiles twice, could the flag set by action 1 still be true when action 2's callback fires, causing it to incorrectly trigger a cascade? Would a request ID or counter be more robust than a boolean?

Contributor Author

The AsyncTaskRunner will cancel the previous call when a new call comes in, so data corruption shouldn't happen. There is a potential race condition here that could be solved with a call ID or generation counter, which would prevent the data from flipping on the user if they are changing the selection very rapidly.

I'm not sure the additional complexity is warranted here because this is a very edge-case scenario, but if you feel the change is needed for the optimal experience then I'd be fine with adding it.
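
For reference, a minimal sketch of the request-ID/generation-counter idea being discussed; the method and attribute names below are hypothetical, not the dialog's actual code:

def _on_profile_changed(self) -> None:
    # Replace the boolean _awaiting_queues_for_cascade with a counter captured
    # per cascade; a callback left over from an earlier profile switch is ignored.
    self._cascade_generation = getattr(self, "_cascade_generation", 0) + 1
    generation = self._cascade_generation

    self.default_storage_profile_box.clear_list()
    self.refresh()

    def _on_queues_loaded(queues) -> None:
        if generation != self._cascade_generation:
            return  # superseded by a newer profile switch
        self._load_storage_profiles_for(queues)

    self._on_queues_refreshed_callback = _on_queues_loaded

Because the comparison happens on the UI thread, no locking is needed; a stale cascade simply does nothing.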

self._active_tasks: Dict[str, AsyncTask] = {}
self._operation_counter = 0

def run(
Contributor

What if tasks hang? Will they be cleaned up?

Contributor Author

If tasks hang, they will just lock up the thread forever. That is no different from the previous implementation and how threads work in general. You could force-terminate the thread, but that could have other consequences and isn't generally advisable.

@larrygao001 (Contributor) commented Mar 4, 2026

Fair point that this matches previous behavior. Would it be worth adding a debug log when a task exceeds some threshold (e.g., 30s)? That would help with diagnosing issues in the field without changing the threading model.

Contributor Author

Additional logging could be helpful, but I'm not sure it's needed in this case. Threaded logging can get messy if we aren't writing to individual logs, and we weren't doing it before. This isn't complex threading logic for a long-running application; it's just fetching API responses from the service, so that feels like additional complexity that probably won't add much. We will already get logging from boto3 if the call to the API fails.
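
For reference only (this was not adopted in the PR), a sketch of the kind of watchdog logging suggested above, assuming a runner with the run(key, fn, on_result) shape sketched earlier; all names here are illustrative:

import logging

from PySide6.QtCore import QTimer

logger = logging.getLogger(__name__)
SLOW_TASK_THRESHOLD_MS = 30_000


def start_with_watchdog(runner, key, fn, on_result):
    # Log at debug level if the task has not reported back after the threshold.
    state = {"done": False}

    def _wrapped(result):
        state["done"] = True
        on_result(result)

    def _check():
        if not state["done"]:
            logger.debug("Async task %r still running after %d ms", key, SLOW_TASK_THRESHOLD_MS)

    runner.run(key, fn, _wrapped)
    QTimer.singleShot(SLOW_TASK_THRESHOLD_MS, _check)  # fires on the UI thread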

self.confirmation_requested.emit(message, default_response)

# Block until main thread responds or cancellation
self._confirmation_event.wait()
Contributor

So if the main thread is busy or (if possible) the dialog is closed, the worker could hang forever?

Contributor Author

This is only done when we are waiting on user input, so this is desired behavior. If the main thread is blocked for some reason and not showing the dialog, the application will appear hung and get killed. Any potentially blocking operation shouldn't be done on the main thread in the first place, so this case shouldn't come up.

Contributor

I understand this is intentional blocking for user input. My concern is that if the dialog gets closed unexpectedly (crash, force quit, etc.), the worker thread would hang indefinitely. Would adding a timeout with a check for dialog validity be a reasonable safeguard, or is that overkill for this use case?

Contributor Author

The way this is parented in Qt, if the dialog gets closed the worker gets terminated and cleaned up; it is owned by the dialog. It's using the Qt constructs appropriately, so it should unblock on deletion.
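
A minimal sketch of the confirmation round-trip under discussion, assuming PySide6; the class and method names are illustrative rather than the PR's actual implementation:

import threading
from typing import Optional

from PySide6.QtCore import QObject, QThread, Signal, Slot
from PySide6.QtWidgets import QDialog, QMessageBox, QWidget


class SubmissionWorker(QThread):
    confirmation_requested = Signal(str, bool)

    def __init__(self, parent: Optional[QObject] = None):
        # Parenting the worker to the dialog ties its lifetime to the dialog.
        super().__init__(parent)
        self._confirmation_event = threading.Event()
        self._confirmation_result = False

    def ask_user(self, message: str, default_response: bool) -> bool:
        # Runs on the worker thread: hand the question to the UI thread and block.
        self._confirmation_event.clear()
        self.confirmation_requested.emit(message, default_response)
        self._confirmation_event.wait()
        return self._confirmation_result

    def provide_confirmation(self, accepted: bool) -> None:
        # Called on the UI thread by the dialog's slot.
        self._confirmation_result = accepted
        self._confirmation_event.set()

    def cancel(self) -> None:
        # Also releases a worker that is currently blocked waiting for input.
        self._confirmation_result = False
        self._confirmation_event.set()


class SubmitJobProgressDialogSketch(QDialog):
    def __init__(self, parent: Optional[QWidget] = None):
        super().__init__(parent)
        self._worker = SubmissionWorker(parent=self)
        # Cross-thread signal: the slot below runs on the UI thread via a queued connection.
        self._worker.confirmation_requested.connect(self._on_confirmation_requested)

    @Slot(str, bool)
    def _on_confirmation_requested(self, message: str, _default_response: bool) -> None:
        accepted = (
            QMessageBox.question(self, "Confirm", message) == QMessageBox.StandardButton.Yes
        )
        self._worker.provide_confirmation(accepted)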

Comment thread src/deadline/client/ui/controllers/_async_task.py
@justinsaws force-pushed the refctor/qt_mvc_refactor branch from 68c23c6 to 45d654c on January 30, 2026 17:26
@justinsaws force-pushed the refctor/qt_mvc_refactor branch 2 times, most recently from 7d654e9 to 33f2900 on February 13, 2026 00:43
@justinsaws force-pushed the refctor/qt_mvc_refactor branch from 33f2900 to 48ad4e0 on February 20, 2026 16:02
Comment thread requirements-testing.txt
# OpenJD model library for job template parsing in mock backend
openjd-model >= 0.8.0; python_version >= '3.9'

# GUI testing dependencies
Contributor

Why only Windows and Darwin? Do we not perform GUI tests on Linux?

Contributor Author

The Linux test runners don't have a windowing system or UI libraries installed, so Qt will crash the second it tries to create the event queue. Adding and setting up those dependencies on the Linux runners would be non-trivial, and the difference between our three platforms is essentially none in the UI.

I think our time would be better spent implementing validation testing of the UI elements instead of worrying about unit tests here.
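
As an illustrative aside, GUI test modules could be skipped on headless runners with a pytest marker like the following; the exact mechanism used by this repository may differ:

import sys

import pytest

# Qt aborts on the Linux CI runners because no display server is available,
# so skip the GUI test module there entirely.
pytestmark = pytest.mark.skipif(
    sys.platform == "linux",
    reason="Linux CI runners have no windowing system for Qt",
)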

@justinsaws force-pushed the refctor/qt_mvc_refactor branch 3 times, most recently from c7753d7 to 24f5baa on March 3, 2026 17:28
refactor: introduce controller-based architecture for Qt UI async operations

Replace direct Python threading with DeadlineUIController singleton
pattern for async AWS API operations. Adds AsyncTaskRunner for automatic
cancellation of superseded requests, JobSubmissionWorker for job
submission, and dedicated DeadlineThreadPool.

Refactors deadline_config_dialog, submit_job_progress_dialog, and
shared_job_settings_tab to use the new controller pattern, improving
thread safety and simplifying cancellation handling.

Signed-off-by: Justin Sawatzky <132946620+justinsaws@users.noreply.github.com>
@justinsaws force-pushed the refctor/qt_mvc_refactor branch from 24f5baa to 6bcbdff on March 3, 2026 19:42
on_create_job_bundle_callback: OnCreateJobBundleCallback,
parent: Optional[QWidget] = None,
f: Qt.WindowFlags = Qt.WindowFlags(),
f: Any = Qt.WindowFlags(),
Contributor

nit - this likely could've remained Qt.WindowFlags

Contributor Author

Yeah, I don't know why it would have changed this. I'll swap that back.

logger.info("Canceling submission...")
self.status_label.setText(tr("Canceling submission..."))
# Wait for worker to finish with event processing
while self._worker.isRunning():
Contributor

Nit: When this busy wait is running while submitting, will this freeze up the UI? Would it make more sense to add a QThread.wait(timeout) with a fallback?

Contributor

Instead of adding a timeout here, we could get around this issue altogether with a signal-based approach like we are doing elsewhere with the AsyncTaskRunner, since JobSubmissionWorker is a Qt thread, right?

Contributor Author

The line below is what prevents this from blocking the UI. It releases control to the event queue to process background events and then comes back to re-evaluate this condition. Timeouts don't make sense here because we don't want to force-terminate threads; that can leave the application in an undefined state and isn't advised in desktop applications.

A signal-based approach wouldn't work here because this is on a destruction/close event. This object is going away, so it needs to clean up now. The threads are just making API calls, so I think the risk here is low. Also, this isn't a long-running application.

Contributor

Makes sense, thanks for clarifying.
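
A minimal sketch of the close-time wait described above, assuming a QThread-based worker with a cancel() method; the function name is illustrative:

from PySide6.QtWidgets import QApplication


def wait_for_worker_to_finish(worker) -> None:
    # Ask the worker to stop, then keep the Qt event loop serviced so the
    # queued signals it emits while shutting down are still delivered.
    worker.cancel()
    while worker.isRunning():
        QApplication.processEvents()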

@sonarqubecloud (Bot) commented Mar 6, 2026

@justinsaws merged commit f82cc31 into aws-deadline:mainline on Mar 6, 2026
26 of 27 checks passed