fix(tui): contain Windows shell process trees #2498
Code Review
This pull request introduces Windows Job Objects to manage and reliably terminate child processes and their descendants on Windows, preventing orphaned processes from leaking or blocking output collection. The changes include updating the windows crate features in Cargo.toml, implementing a WindowsJob wrapper, integrating it into BackgroundShell and ShellManager, and adding a test to verify the behavior. Feedback suggests improving the robustness of terminate_windows_job by falling back to killing the immediate child process if terminating the job object fails.
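The fallback suggested in the review can be sketched roughly as follows. This is a minimal, portable illustration and not the PR's actual `terminate_windows_job`: the `Terminate` trait, the `Cleanup` enum, `terminate_tree`, and the `Fake` test double are hypothetical names introduced only for this example. The real code would call into the Job Object API instead of the fake.

```rust
use std::io;
use std::process::Child;

/// Hypothetical abstraction: anything that can be force-terminated.
trait Terminate {
    fn terminate(&mut self) -> io::Result<()>;
}

/// The immediate child can always be asked to die via `Child::kill`.
impl Terminate for Child {
    fn terminate(&mut self) -> io::Result<()> {
        self.kill()
    }
}

/// Which cleanup path ended up being used.
#[derive(Debug, PartialEq, Eq)]
enum Cleanup {
    JobTerminated,
    ChildKilledAfterJobFailure,
}

/// Try to terminate the whole job first; if that fails (for example because
/// terminate access is denied), fall back to killing the immediate child.
fn terminate_tree(job: &mut dyn Terminate, child: &mut dyn Terminate) -> io::Result<Cleanup> {
    match job.terminate() {
        Ok(()) => Ok(Cleanup::JobTerminated),
        Err(job_err) => {
            child.terminate().map_err(|child_err| {
                io::Error::new(
                    child_err.kind(),
                    format!(
                        "job terminate failed ({}); child kill failed ({})",
                        job_err, child_err
                    ),
                )
            })?;
            Ok(Cleanup::ChildKilledAfterJobFailure)
        }
    }
}

/// Test double (assumption for this sketch): records calls and can be told to fail.
struct Fake {
    fail: bool,
    calls: u32,
}

impl Terminate for Fake {
    fn terminate(&mut self) -> io::Result<()> {
        self.calls += 1;
        if self.fail {
            Err(io::Error::new(io::ErrorKind::PermissionDenied, "access denied"))
        } else {
            Ok(())
        }
    }
}
```

Returning which path was taken, rather than a bare `()`, makes the fallback observable in tests without needing a real access-denied job handle.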
Addressed the review feedback in commit What changed:
Validation:
cargo fmt --check
git diff --check
cargo check -p codewhale-tui
cargo test -p codewhale-tui background_collection_does_not_block_on_detached_descendant_pipe -- --nocapture
Paulo Aboim Pinto
Correction: I removed the temporary turn-dispatch diagnostics from this PR branch after confirming they are debug-only instrumentation. The PR now contains only the Windows process-tree containment fix. The dispatch diagnostics were moved to the local DEV worktree only, in local commit
Paulo Aboim Pinto
6ea6eea to 3ab06d9
Harvested both Windows job-object commits into The scope is Windows-only and the PR CI was green when reviewed; on the triage branch I also ran the orphaned subprocess regression and
Added regression coverage to address the confidence gap around Windows job cleanup fallback behavior. What changed:
Validation run locally on Windows:
Paulo Aboim Pinto
Follow-up harvest complete: the v0.8.50 triage branch #2504 now includes the final #2498 head through Harvest commits on #2504:
Thanks for catching the gap before #2504 moved further.
Summary
Partial #1812.
This PR confines Windows shell process trees to a Job Object so that exec_shell output collection cannot block forever when a descendant process outlives the immediate shell and keeps inherited stdout/stderr pipe handles open.
Why
I captured a live Windows freeze where CodeWhale was stuck after starting this foreground shell command:
The immediate shell process had already exited, but an orphaned findstr.exe descendant was still alive and holding inherited pipe handles. BackgroundShell::collect_output() then joined the stdout/stderr reader threads, but those readers never saw EOF until the descendant was killed. Killing only findstr.exe immediately released the frozen TUI, and the log emitted the delayed tool.exec.end after about 18.8 minutes.
This is a separate freeze mode from the crossterm input-poll freeze discussed in #1812. It is still in the same Windows/shell family, but the blocking point here is shell output collection after process-tree leakage.
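The reader-side hang can be illustrated with an analogy in portable Rust. This sketch uses `std::sync::mpsc` channels as a stand-in for OS pipes (an assumption made so the example runs anywhere): a receiver's iterator ends only when every sender clone is dropped, much as a pipe reader sees EOF only when every write handle is closed. The function name `simulate_reader` and the variable names are hypothetical.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Returns whether the reader finished while the "descendant" still held a
/// writer, plus the lines the reader eventually collected.
fn simulate_reader() -> (bool, Vec<String>) {
    let (shell_tx, rx) = mpsc::channel::<String>();
    // The descendant "inherits" a copy of the write end.
    let descendant_tx = shell_tx.clone();

    // Reader thread: collects until all writers are gone (the EOF analogue).
    let reader = thread::spawn(move || rx.iter().collect::<Vec<String>>());

    shell_tx.send("shell output".to_string()).unwrap();
    drop(shell_tx); // the immediate shell exits

    // Give the reader time to drain; it still cannot finish, because the
    // descendant's writer is open.
    thread::sleep(Duration::from_millis(50));
    let finished_early = reader.is_finished();

    drop(descendant_tx); // the descendant is killed, releasing its writer
    let lines = reader.join().unwrap();
    (finished_early, lines)
}
```

Joining the reader before the last writer is released would block indefinitely, which mirrors the shape of the freeze described above.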
What changed
WindowsJob wrapper around Job Objects.
JOB_OBJECT_LIMIT_KILL_ON_JOB_CLOSE.
get_output returns quickly.
Scope
This should fix the observed exec_shell hang where a descendant process keeps stdout/stderr pipes open after the shell exits.
It does not claim to solve every Windows TUI freeze. In particular, it does not replace the crossterm event::poll(timeout) investigation/mitigation, and it does not address context-pressure or model-streaming stalls.
Validation
Local Windows note: the GNU toolchain needed C:\msys64\mingw64\bin on PATH for dlltool.exe. The test command was also run with CARGO_INCREMENTAL=0 and RUSTFLAGS="-C debuginfo=0" after the local disk filled during test binary linking.
Paulo Aboim Pinto
Greptile Summary
This PR introduces a WindowsJob RAII wrapper around Windows Job Objects to contain shell process trees during non-PTY exec_shell runs. The immediate child is assigned to a job with JOB_OBJECT_LIMIT_KILL_ON_JOB_CLOSE set so that descendants holding inherited pipe handles are reliably killed before stdout/stderr reader threads are joined, eliminating the hang observed with orphaned findstr.exe and similar detached descendants.
WindowsJob wrapper with attach_to_child, terminate, and Drop (via CloseHandle) covering all three execution paths: execute_sync_sandboxed, execute_interactive_sandboxed, and start_background/collect_output.
TerminateJobObject first, JOB_OBJECT_LIMIT_KILL_ON_JOB_CLOSE as an automatic fallback when the handle is closed, and a per-child kill() fallback when termination access is denied.
TerminateJobObject-denied fallback, kill-on-close as a secondary defence, and the end-to-end get_output non-block scenario.
Confidence Score: 5/5
Safe to merge — the Windows-only job-object path is well-isolated behind cfg(windows) guards, all three execution paths drop the job handle before joining reader threads, and the KILL_ON_JOB_CLOSE fallback fires correctly even when explicit TerminateJobObject is denied.
All three shell execution paths (sync sandboxed, interactive sandboxed, background) now drop the WindowsJob handle before joining stdout/stderr reader threads, making KILL_ON_JOB_CLOSE an effective second-line defence. Attach failures are surfaced via tracing::warn rather than silently swallowed. The drop ordering fix from a prior review round has been applied consistently. Four targeted regression tests, including a deny-terminate scenario via DuplicateHandle, give good confidence the fix holds under access-constrained environments.
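The drop-before-join ordering described here can be sketched in portable Rust. This is an illustration under stated assumptions, not the PR's code: `FakeJob` and `Background` are hypothetical types, and an `mpsc` sender stands in for a pipe write handle held by a descendant. Dropping the fake job releases those writers, which plays the role of kill-on-close ending the descendants.

```rust
use std::sync::mpsc::{self, Sender};
use std::thread::{self, JoinHandle};

/// Hypothetical stand-in for a job handle: while it is alive, the
/// "descendants" in the job keep their writer ends open.
struct FakeJob {
    descendant_writers: Vec<Sender<String>>,
}

impl Drop for FakeJob {
    fn drop(&mut self) {
        // Simulates kill-on-close: closing the handle ends every process in
        // the job, which releases the writers they were holding.
        self.descendant_writers.clear();
    }
}

/// Hypothetical background shell: one job plus one reader thread.
struct Background {
    job: Option<FakeJob>,
    reader: Option<JoinHandle<Vec<String>>>,
}

impl Background {
    fn start() -> Self {
        let (tx, rx) = mpsc::channel::<String>();
        tx.send("line from shell".to_string()).unwrap();
        let job = FakeJob { descendant_writers: vec![tx] };
        let reader = thread::spawn(move || rx.iter().collect::<Vec<String>>());
        Background { job: Some(job), reader: Some(reader) }
    }

    fn collect_output(&mut self) -> Vec<String> {
        // Release the job first. Joining the reader before this point would
        // block, because the job's descendants still hold a writer.
        drop(self.job.take());
        self.reader
            .take()
            .map(|handle| handle.join().expect("reader thread panicked"))
            .unwrap_or_default()
    }
}
```

Keeping the job in an `Option` lets `collect_output` control exactly when the handle is released, instead of relying on field drop order at the end of the struct's lifetime.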
No files require special attention.
Important Files Changed
Sequence Diagram
sequenceDiagram
    participant SM as ShellManager
    participant CH as Child Process
    participant WJ as WindowsJob (KILL_ON_JOB_CLOSE)
    participant RT as Reader Threads
    participant DC as Descendant Processes
    SM->>CH: spawn()
    SM->>WJ: CreateJobObjectW + SetInformationJobObject
    SM->>WJ: AssignProcessToJobObject(child)
    CH-->>DC: spawns descendants (inherit pipes)
    CH->>CH: exits (immediate shell)
    DC-->>RT: hold pipe write-ends open
    Note over SM: poll() detects shell exit
    SM->>WJ: TerminateJobObject (explicit)
    WJ->>DC: kill all job processes
    SM->>WJ: drop handle (CloseHandle)
    Note over WJ: KILL_ON_JOB_CLOSE fires as fallback
    WJ-->>DC: kill remaining descendants
    DC-->>RT: pipe write-ends closed, EOF
    SM->>RT: join stdout_thread
    SM->>RT: join stderr_thread
    RT-->>SM: output collected (no hang)
Reviews (7): Last reviewed commit: "fix(tui): close Windows job before foreg..."