Commit 390eb0d

dangtrivan15 and claude committed
todo: document zombie reaping gap in orphan_reaper
The orphan reaper kills live orphaned processes but cannot clean up zombies (state Z) when the container PID 1 doesn't call waitpid(). Documents known container-side and code-side solutions.

Relates to AutoForgeAI#222.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 0303dad commit 390eb0d

1 file changed

Lines changed: 10 additions & 0 deletions

server/services/orphan_reaper.py

@@ -12,6 +12,16 @@
 3. Have been orphaned for at least 30 seconds (grace period)
 
 Only active on Linux (containers). No-op on macOS/Windows.
+
+TODO: The reaper kills live orphans but cannot clean up zombie processes
+(state Z). Zombies occur when terminated children are reparented to PID 1
+but PID 1 never calls waitpid(). This happens in containerized deployments
+where PID 1 is not a proper init (e.g. no tini/dumb-init). Zombies don't
+consume memory but accumulate PID table entries. Known solutions:
+- Container-side: use tini or dumb-init as PID 1 (ENTRYPOINT ["tini", "--"])
+- Code-side: prctl(PR_SET_CHILD_SUBREAPER) to adopt orphans into this
+process, then reap via SIGCHLD + os.waitpid(-1, WNOHANG)
+See: https://github.com/AutoForgeAI/autoforge/pull/222
 """
 
 import asyncio
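
The code-side option named in the TODO could look roughly like the sketch below. It is not part of this commit. The helper names `become_subreaper` and `reap_zombies` are hypothetical, `prctl` is reached through `ctypes` because the Python standard library has no wrapper for it, and the constant value 36 is taken from `<linux/prctl.h>`.

```python
import ctypes
import os
import sys

# Value of PR_SET_CHILD_SUBREAPER in <linux/prctl.h>.
PR_SET_CHILD_SUBREAPER = 36


def become_subreaper() -> bool:
    """Ask the kernel to reparent orphaned descendants to this process.

    Hypothetical helper. Returns True on success, False on non-Linux
    platforms or if the prctl call fails.
    """
    if not sys.platform.startswith("linux"):
        return False
    libc = ctypes.CDLL(None, use_errno=True)
    # prctl takes unsigned long arguments after the option code.
    rc = libc.prctl(
        PR_SET_CHILD_SUBREAPER,
        ctypes.c_ulong(1),
        ctypes.c_ulong(0),
        ctypes.c_ulong(0),
        ctypes.c_ulong(0),
    )
    return rc == 0


def reap_zombies() -> list[tuple[int, int]]:
    """Collect every child that has already exited, without blocking.

    Hypothetical helper. Returns a list of (pid, raw wait status) pairs.
    """
    reaped = []
    while True:
        try:
            pid, status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # this process has no children at all
        if pid == 0:
            break  # children exist, but none has exited yet
        reaped.append((pid, status))
    return reaped
```

In this sketch, `become_subreaper()` would be called once at startup, and `reap_zombies()` from a SIGCHLD handler or a periodic task. One design caveat: `os.waitpid(-1, ...)` collects any child of the process, so it can consume exit statuses that `subprocess` or `asyncio` expected to collect themselves. A real implementation would likely need to coordinate with those.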