8-agent architecture review (2026-03-15/16). 102 raw findings deduplicated. Agents: pipeline, prompt, error, security, agent-loop, ux-flow, decomposer, verification.
- File: `forge_agent.py:1486` (`_tool_bash`)
- Problem: `asyncio.create_subprocess_shell` with `stdout=PIPE` keeps the pipe open for a backgrounded child. `communicate()` blocks until the 120s timeout. Verifier hits this 2-3x = 360s+ wasted.
- Fix: When the command ends with `&`, redirect to `/dev/null`: `cmd + " > /dev/null 2>&1"` before backgrounding. Or add a `start_server` tool.
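
A minimal sketch of the redirect fix, assuming the rewrite happens on the command string before it reaches the shell. The helper name `detach_background_output` is hypothetical, and the "already redirects" check is a crude heuristic, not something taken from `forge_agent.py`.

```python
def detach_background_output(cmd: str) -> str:
    """Insert an output redirect before a trailing '&' so the child
    does not inherit (and hold open) the parent's stdout/stderr pipes."""
    stripped = cmd.rstrip()
    # A trailing '&&' is an incomplete AND-list, not backgrounding.
    if not stripped.endswith("&") or stripped.endswith("&&"):
        return cmd
    body = stripped[:-1].rstrip()
    # Crude heuristic: if the command already redirects something,
    # leave it alone rather than clobber the caller's redirect.
    if ">" in body:
        return cmd
    return f"{body} > /dev/null 2>&1 &"


print(detach_background_output("python -m http.server 8000 &"))
```

Foreground commands pass through unchanged, so the helper could be applied to every bash call unconditionally.
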
- File: `forge_orchestrator.py:302-450`
- Problem: Decomposer uses a 10-turn LLM agent with a Flask-centric example for ALL projects. For a 1-file game it over-thinks, produces malformed JSON, and returns 0 tasks. No fallback: user gets "try a more specific description."
- Fix: (a) Add deterministic fast-path for 1-3 file projects. (b) Extract JSON from decomposer's text response when write_file wasn't called. (c) Fallback single-task from spec.md.
- File: `forge_cli.py:2277-2450`
- Problem: Pre-verify → integration-check (3 agents) → re-verify → gate → functional repair. No aggregate time limit. Verifier bash timeout (UX1) makes this 6+ minutes.
- Fix: Add 120s total pipeline budget. Skip later stages when budget exhausted.
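
One way the aggregate budget could look, as a sketch only. `run_with_budget` and the `(name, callable)` stage shape are assumptions for illustration; the real stages in `forge_cli.py` may have a different signature.

```python
import time


def run_with_budget(stages, budget_s=120.0, clock=time.monotonic):
    """Run (name, stage) pairs in order; skip stages once the budget is spent.

    Each stage receives the remaining seconds so it can cap its own timeouts.
    Returns (ran, skipped) lists of stage names.
    """
    deadline = clock() + budget_s
    ran, skipped = [], []
    for name, stage in stages:
        remaining = deadline - clock()
        if remaining <= 0:
            skipped.append(name)
            continue
        stage(remaining)
        ran.append(name)
    return ran, skipped
```

Passing the remaining time into each stage matters here: a stage that starts with 5s left should not launch a 120s bash timeout.
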
- File: `forge_cli.py:350-420` (run loop)
- Problem: User types "Build me a Breakout game" → routed to chat assistant, not build pipeline. Only `/plan` or `/build` trigger actual builds.
- Fix: Detect build-intent phrases ("build", "create", "make") and route to `_guided_build`.
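
A rough sketch of build-intent detection, assuming a plain regex is acceptable as a first pass. The pattern and the name `looks_like_build_request` are hypothetical; anchoring the verb at the start of the message is a design choice to avoid routing questions like "how do I make X faster?" into the build pipeline.

```python
import re

# Hypothetical matcher: an imperative build verb at the start of the message,
# optionally preceded by a polite prefix.
_BUILD_INTENT_RE = re.compile(
    r"^\s*(?:please\s+|can you\s+|could you\s+)?(?:build|create|make)\b",
    re.IGNORECASE,
)


def looks_like_build_request(message: str) -> bool:
    """Return True when the message reads as a request to build something."""
    return _BUILD_INTENT_RE.search(message) is not None
```
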
- File: `forge_cli.py` (`/new` command)
- Problem: `/new game` in a directory with an old `.forge/tasks.json` contaminates the new project with stale tasks and file references from previous builds.
- Fix: Clear `.forge/state/` when creating a new project in an existing directory.
- File: `forge_orchestrator.py:438-452`
- Problem: When decomposer outputs JSON in text (common with Nova Lite) instead of calling write_file, the JSON recovery code is never reached. Valid task JSON is silently discarded.
- Fix: Check `decomp_result.output` for JSON when `tasks.json` doesn't exist on disk.
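
A sketch of how JSON could be recovered from free-form model text. This uses `json.JSONDecoder.raw_decode` from the standard library rather than whatever recovery code the orchestrator currently has; `extract_first_json` is a hypothetical helper name.

```python
import json


def extract_first_json(text: str):
    """Return the first JSON object or array embedded in text, or None.

    Tries to decode at every '{' or '[' and returns the first success,
    so prose before and after the JSON is ignored.
    """
    decoder = json.JSONDecoder()
    for start, ch in enumerate(text):
        if ch not in "{[":
            continue
        try:
            value, _end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            continue
        return value
    return None


reply = 'Here is the plan:\n{"tasks": [{"id": 1, "file": "game.html"}]}\nDone.'
print(extract_first_json(reply))
```
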
- File: `forge_agent.py:1472-1478`
- Problem: `asyncio.create_subprocess_shell()` passes LLM-provided commands directly to the shell. `RiskClassifier` regex is fundamentally bypassable via base64 encoding (`echo cm0gLXJmIC8= | base64 -d | sh`), variable indirection, hex escapes, here-documents, and `$()` substitution.
- Fix: Run bash commands inside Docker containers, or use a command allowlist instead of denylist. At minimum add detection for base64 piping, here-docs, and `$()` patterns.
- Severity: CRITICAL (complete sandbox bypass)
- File: `forge_agent.py:1703-1711`
- Problem: `shlex.quote('/tmp/test.py')` returns `/tmp/test.py` (unquoted for simple paths). Interpolated into `py_compile.compile({safe_path}, ...)` this becomes `py_compile.compile(/tmp/test.py, ...)`; Python interprets `/tmp/test.py` as division, not a string. EVERY Python file gets a false `"Syntax issue detected: SyntaxError: invalid syntax"`. Same bug affects `json.load`/`yaml.safe_load` checks.
- Evidence: `python3 -c "import py_compile; py_compile.compile(/tmp/test.py, doraise=True)"` → `SyntaxError: invalid syntax` (the py_compile command itself fails, not the user's file).
- Fix: Use `repr(str(path))` instead of `shlex.quote(str(path))` for Python-level quoting. `repr()` always produces a quoted string like `'/tmp/test.py'`.
- Impact: Every agent turn that writes a .py/.json/.yaml file wastes the model's attention on a phantom syntax error. Models may spend turns "fixing" code that was already correct.
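
The difference can be reproduced without touching the filesystem. The snippet below only compiles the generated one-liners (it does not run them); `is_valid_python` is a helper introduced for this illustration.

```python
import shlex

path = "/tmp/test.py"

# shlex.quote leaves "simple" paths unquoted, so the generated Python is invalid.
broken = f"py_compile.compile({shlex.quote(path)}, doraise=True)"
# repr() always yields a Python string literal.
fixed = f"py_compile.compile({repr(path)}, doraise=True)"


def is_valid_python(source: str) -> bool:
    """Return True if source parses as Python (nothing is executed)."""
    try:
        compile(source, "<snippet>", "exec")
        return True
    except SyntaxError:
        return False


print(broken, is_valid_python(broken))
print(fixed, is_valid_python(fixed))
```
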
- File: `forge_agent.py:843` vs `forge_agent.py:1725`
- Problem: Post-turn fix injection checks for `"SYNTAX ERROR"` but `_auto_verify` returns `"Syntax issue detected:"`. Strings never match. The entire syntax error safety net (lines 850-862) never triggers.
- Fix: Change line 843 to `if result_str and "Syntax issue" in result_str:`
- File: `model_router.py:870-876`
- Problem: `stream_send()` returns `usage={"input_tokens": 0, "output_tokens": 0}`. All cost tracking is broken when streaming is enabled (the default).
- Fix: Parse the `metadata` event in Bedrock streams, `message_stop` for Anthropic, or `stream_options={"include_usage": True}` for OpenAI.
- File: `forge_pipeline.py:553-589`
- Problem: `_find_role_for_task` returns the first formation role for any task without `metadata.agent`, regardless of wave. A testing task without an agent hint gets `backend-impl` with full write tools instead of the tester's readonly policy.
- Fix: Match tasks to roles by wave index: pass `wave_index` to `_find_role_for_task`.
- File: `forge_pipeline.py:347-352`
- Problem: Failed task lookup uses `task.metadata.get("agent")` but `agent_results` is keyed by `role_name:task_id`. Key mismatch means failures are never detected, `_block_dependents` is never called, and dependent tasks run against broken state.
- Fix: Check `task.status == "failed"` (already updated by executor) instead of reverse-looking up in `agent_results`.
- File: `prompt_builder.py:46,91,202`, `forge_agent.py:112,305`, `forge_cli.py:1731`
- Problem: Four different chunk thresholds (80, 100, 120, 150 lines) appear across SLIM prompt, FULL prompt, FOCUSED prompt, and tool descriptions. Models see conflicting numbers in a single conversation turn.
- Fix: Standardize: 80 lines for SLIM tier, 120 for FOCUSED/FULL. Update all tool descriptions to match.
- File: `forge_cli.py:1753-1756`
- Problem: Adjacent instructions: "Include ALL functions in your FIRST write_file call" vs "NEVER write more than 80 lines." Logical contradiction: models resolve it nondeterministically.
- Fix: Replace with chunking-aware completeness: "Include ALL functions. Write first ~80 lines via write_file, then IMMEDIATELY call append_file for the rest. Get ALL functions down before moving to the next file."
- File: `forge_pipeline.py:208-214`
- Problem: Artifacts >2KB are truncated to a preview. Dependent agents see a partial file and write code assuming the full content exists.
- Fix: Append "You MUST call read_file('{path}') to see full content." Raise inline threshold to 4KB.
- File: `forge_preview.py:229-305`
- Problem: All 14 stack detectors bind preview servers to `0.0.0.0`. Unauthenticated LLM-generated apps are directly accessible from the network. Contradicts deployer's `127.0.0.1` security invariant.
- Fix: Change all server commands to bind `127.0.0.1`. Cloudflare tunnel connects to localhost.
- File: `forge_agent.py:424`
- Problem: PathSandbox allows writes to `/tmp`. Agent can write a script to `/tmp/exploit.sh` and execute it via bash, bypassing all sandbox restrictions.
- Fix: Remove `/tmp` from `extra_allowed` or restrict to a unique `/tmp/forge-{session_id}/` subdirectory.
- File: `forge_preview.py:670-674`
- Problem: Preview servers run as the `hercules` user with full filesystem, network, and process access. No containerization, resource limits, or seccomp filters.
- Fix: Run preview servers inside Docker containers with `--read-only --memory=512m --cpus=1 --network=none`.
- File: `forge_deployer.py:248-262`
- Problem: `domain` parameter interpolated directly into nginx config. Semicolons, braces, or `include` directives can inject arbitrary nginx configuration.
- Fix: Validate domain against strict regex: `^[a-z0-9]([a-z0-9-]*[a-z0-9])?(\.[a-z0-9]([a-z0-9-]*[a-z0-9])?)*$`
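
A sketch of the validation using the regex from the fix. `validate_domain` is a hypothetical name, and the 253-character cap (the DNS name length limit) is an addition not stated in the finding. `fullmatch` is used because `$` with `match` would still accept a trailing newline.

```python
import re

DOMAIN_RE = re.compile(
    r"^[a-z0-9]([a-z0-9-]*[a-z0-9])?(\.[a-z0-9]([a-z0-9-]*[a-z0-9])?)*$"
)


def validate_domain(domain: str) -> str:
    """Return domain unchanged if it is a plain lowercase hostname, else raise."""
    # fullmatch rejects "example.com\n", which a bare '$' would let through.
    if len(domain) > 253 or DOMAIN_RE.fullmatch(domain) is None:
        raise ValueError(f"invalid domain: {domain!r}")
    return domain


print(validate_domain("preview.example.com"))
```
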
- File: `forge_guards.py:311-318`
- Problem: Sandbox validates resolved path, but the actual write uses a separately resolved path. A symlink planted between check and write follows the symlink, escaping the sandbox.
- Fix: Use `os.path.realpath()` immediately before write and re-validate. Reject symlinks pointing outside project root.
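
A sketch of the re-validation step, assuming the writer then opens the returned path rather than the original one. `resolve_inside_root` is a hypothetical helper. This narrows the window but does not fully close the race; opening with `O_NOFOLLOW` or writing via a directory file descriptor would be the stricter option.

```python
import os


def resolve_inside_root(path: str, project_root: str) -> str:
    """Resolve symlinks in path and require the result to sit under project_root.

    Returns the resolved path; the caller should write to THIS path so the
    check and the write refer to the same location.
    """
    root = os.path.realpath(project_root)
    real = os.path.realpath(path)
    if os.path.commonpath([root, real]) != root:
        raise PermissionError(f"{path!r} resolves outside the project root")
    return real
```
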
- File: `prompt_builder.py:40-56`
- Problem: SLIM says "Read existing files before editing" but not "read files you DEPEND ON before writing NEW code." Nova Lite (32K) is most prone to hallucinating imports.
- Fix: Add: "Before writing code that imports/uses other files, read them first."
- File: `prompt_builder.py:40-56`
- Problem: FOCUSED has "Before finishing, read back files you created." SLIM has nothing, so Nova Lite never self-checks for logical errors.
- Fix: Add one-line: "Before finishing, read back key files and verify imports match exports."
- File: `forge_agent.py:229` vs `forge_agent.py:330-332`
- Problem: BUILT_IN_TOOLS: `"required": []`. SLIM_TOOLS: `"required": ["path"]`. Nova Lite agent calling `list_directory()` without path fails, wastes a turn.
- Fix: Change SLIM_TOOLS to `"required": []`.
- File: `prompt_builder.py:233-271`, `forge_cli.py:1774`
- Problem: A0 ("wait for approval") injected into automated builds where no human is present. Agent wastes turns describing actions instead of executing.
- Fix: Override autonomy to A4/A5 in the automated build path.
- File: `forge_agent.py:599-606`
- Problem: Compaction runs but nothing checks whether context size actually decreased. Burns all 3 retry attempts on futile compaction loops.
- Fix: After compaction, check `_estimate_tokens(messages)` decreased. If not, break immediately.
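
The progress check could be sketched as below. `compact_until_fits` and its callable parameters are hypothetical stand-ins for the agent's compaction routine and `_estimate_tokens`; only the "stop when size did not shrink" logic is the point.

```python
def compact_until_fits(messages, limit, compact, estimate_tokens, max_attempts=3):
    """Compact repeatedly, but bail out as soon as an attempt makes no progress.

    Returns (messages, fits) where fits says whether the result is within limit.
    """
    size = estimate_tokens(messages)
    for _ in range(max_attempts):
        if size <= limit:
            break
        candidate = compact(messages)
        new_size = estimate_tokens(candidate)
        if new_size >= size:
            break  # compaction did not help; further retries are futile
        messages, size = candidate, new_size
    return messages, size <= limit
```
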
- File: `forge_agent.py:896-926`
- Problem: Recursive `self.run()` modifies `self.model_config`, `self.max_turns`. If the recursive call throws, the originals are never restored.
- Fix: Wrap in `try/finally` that always restores.
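
One possible shape for the restore logic is a small context manager, shown here as a sketch. `temporary_attrs` is a hypothetical helper; an inline `try/finally` around the recursive call would work equally well.

```python
from contextlib import contextmanager


@contextmanager
def temporary_attrs(obj, **overrides):
    """Set attributes on obj for the duration of the block, then restore them,
    even if the block raises."""
    saved = {name: getattr(obj, name) for name in overrides}
    for name, value in overrides.items():
        setattr(obj, name, value)
    try:
        yield obj
    finally:
        for name, value in saved.items():
            setattr(obj, name, value)
```

Under this sketch the escalation path would wrap the recursive call as `with temporary_attrs(self, model_config=..., max_turns=...):`, with the concrete values left to the existing code.
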
- File: `forge_orchestrator.py:89-123`
- Problem: Heuristic escapes valid closing quotes when the next char isn't `,]}:`. Can corrupt task descriptions.
- Fix: Only apply `_fix_inner_quotes()` after initial `json.loads()` has failed.
- File: `forge_agent.py:1480-1482`
- Problem: `asyncio.TimeoutError` caught but `proc` continues running. Zombie processes accumulate, hold ports.
- Fix: Add `proc.kill(); await proc.wait()` in the timeout handler.
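
A self-contained sketch of the timeout handler, assuming a POSIX shell. `run_with_timeout` is a hypothetical wrapper, not the signature of `_tool_bash`. Note that `proc.kill()` only signals the shell itself; grandchildren may need a process group (`start_new_session=True` plus `os.killpg`) to be cleaned up as well.

```python
import asyncio


async def run_with_timeout(cmd: str, timeout: float):
    """Run cmd in a shell; on timeout, kill and reap the process.

    Returns (returncode, output); returncode is None when the command timed out.
    """
    proc = await asyncio.create_subprocess_shell(
        cmd,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.STDOUT,
    )
    try:
        out, _ = await asyncio.wait_for(proc.communicate(), timeout=timeout)
        return proc.returncode, out.decode(errors="replace")
    except asyncio.TimeoutError:
        proc.kill()        # stop the process instead of leaving it running
        await proc.wait()  # reap it so no zombie is left behind
        return None, f"timed out after {timeout}s"


print(asyncio.run(run_with_timeout("echo hi", 5)))
```
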
- File: `model_router.py:811-815`
- Problem: `except Exception` with no logging falls back to non-streaming. Hides auth errors, SDK bugs, and streaming failures.
- Fix: Log the exception. Only catch transient errors (connection, stream interruption).
- File: `forge_tasks.py:462-465`
- Problem: Blocked (non-active) dependencies increment in-degree but never resolve. Dependents appear as false-positive cycles.
- Fix: Skip blocked deps (treat as resolved) or exclude their dependents with "blocked by upstream" status.
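
A sketch of the second option (exclude dependents of blocked tasks, then run the cycle check on what remains). `plan_order` and the dict-of-lists task shape are assumptions for illustration, not the data model in `forge_tasks.py`; every dependency id is assumed to be a key in `tasks`.

```python
from collections import deque


def plan_order(tasks, blocked):
    """Return (order, blocked_by_upstream, cyclic) for a dependency map.

    tasks maps task id -> list of dependency ids.
    blocked is the set of task ids that will not run.
    """
    # Propagate "blocked" downstream until nothing changes.
    held = set(blocked)
    changed = True
    while changed:
        changed = False
        for tid, deps in tasks.items():
            if tid not in held and any(d in held for d in deps):
                held.add(tid)
                changed = True

    # Kahn's algorithm over the remaining (active) tasks only.
    active = {t: deps for t, deps in tasks.items() if t not in held}
    indegree = {t: len(deps) for t, deps in active.items()}
    dependents = {t: [] for t in active}
    for tid, deps in active.items():
        for dep in deps:
            dependents[dep].append(tid)

    queue = deque(t for t, n in indegree.items() if n == 0)
    order = []
    while queue:
        tid = queue.popleft()
        order.append(tid)
        for child in dependents[tid]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)

    cyclic = set(active) - set(order)  # anything left over is a genuine cycle
    return order, held - set(blocked), cyclic
```

With this split, "blocked by upstream" and "cycle" become separate outcomes, so a blocked dependency no longer shows up as a false-positive cycle.
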
- File: `forge_comms.py:85,137-144`
- Problem: Append-only list with no cap. Grows proportionally to build duration.
- Fix: Use `collections.deque(maxlen=500)` or add `prune()` between waves.
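
The `deque` option in miniature. `MessageLog` is a hypothetical stand-in for whatever class owns the list in `forge_comms.py`; the point is only that `maxlen` drops the oldest entries automatically.

```python
from collections import deque


class MessageLog:
    """Bounded message log: once maxlen is reached, the oldest entry is dropped."""

    def __init__(self, maxlen: int = 500):
        self._messages = deque(maxlen=maxlen)

    def append(self, message) -> None:
        self._messages.append(message)

    def recent(self, n: int) -> list:
        """Return up to the n most recent messages, oldest first."""
        if n <= 0:
            return []
        return list(self._messages)[-n:]
```
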
- File: `forge_pipeline.py:797`
- Problem: `r"\{.*\}"` with `re.DOTALL` matches from first `{` to last `}`, can include non-JSON text between separate objects.
- Fix: Try non-greedy `r"\{.*?\}"` first, fall back to greedy.
- File: `forge_pipeline.py:433-434`
- Problem: Two tasks throwing exceptions both get key `"unknown:task_id"`. First exception's result overwritten silently.
- Fix: Always use `f"{role_name}:{task.id}"` as the key.
- File: `forge_agent.py:1478`
- Problem: `{**os.environ, ...}` passes all secrets (AWS keys, API tokens) to LLM-generated bash commands. Agent can read them with `echo $SECRET`.
- Fix: Create minimal env with only `PATH`, `HOME`, `LANG`, `TERM`. Don't pass full `os.environ`.
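
A sketch of the minimal environment, assuming the four variables named in the fix are sufficient for the bash tool. `minimal_env` and its `extra` parameter are hypothetical.

```python
import os

SAFE_ENV_KEYS = ("PATH", "HOME", "LANG", "TERM")


def minimal_env(source=None, extra=None):
    """Build a subprocess environment from an allowlist of variables.

    source defaults to os.environ; extra lets the caller add explicit,
    non-secret variables.
    """
    source = os.environ if source is None else source
    env = {key: source[key] for key in SAFE_ENV_KEYS if key in source}
    env.update(extra or {})
    return env
```

The result would be passed as `env=minimal_env()` to `asyncio.create_subprocess_shell` in place of the spread of `os.environ`.
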
- File: `forge_guards.py:44-80`
- Problem: Agent can prefix any command with `sudo` for root privileges. Not flagged as HIGH risk.
- Fix: Add `sudo` to `_HIGH_PATTERNS`.
- File: `forge_guards.py:757-964`
- Problem: 25 successful `think` calls escalate from A0 to A3 (Trusted). Agent can game escalation thresholds.
- Fix: Only count write/execute tool calls toward escalation. Exclude `think`, `list_directory`, reads.
- File: `forge_agent.py:1425-1432`
- Problem: Missing: `dd`, `install`, `rsync`, `wget -O`, `curl -o`, `tar -x`, `unzip`, `patch`, `chmod`, `ln -s`.
- Fix: Add missing patterns. For readonly mode, use a positive allowlist instead.
- File: `forge_guards.py:112-122,273-279`
- Problem: Missing: `~/.aws/credentials`, `~/.kube/config`, `~/.docker/config.json`, `~/.gnupg/`, `~/.netrc`, `.pem`/`.key` files, `/etc/ssl/private/`.
- Fix: Expand both deny lists.
- File: `forge_deployer.py:351-357`
- Problem: No `--read-only`, `--cap-drop=ALL`, `--memory`, `--cpus`, `--user`, `--security-opt=no-new-privileges`.
- Fix: Add security flags to `docker run` command.
- File: `benchmark_nova_models.py:1290-1299`
- Problem: ~88 tokens of SQLite threading advice injected even for HTML/CSS tasks. Wastes context on 32K models.
- Fix: Wrap in `if any(f.endswith('.py') for f in expected_files)`.
- File: `forge_verify.py:254-271`
- Problem: Server passes TCP port check but HTTP request fails (not yet ready). No retry, so the result is a false negative.
- Fix: Add 2-3 retry loop with 1s intervals for `_check_http`.
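
The retry could be a thin wrapper around the existing check, sketched here. `check_with_retry` is hypothetical and takes the check as a zero-argument callable returning a bool, which is an assumption about how `_check_http` would be adapted.

```python
import time


def check_with_retry(check, attempts=3, delay_s=1.0, sleep=time.sleep):
    """Call check() up to attempts times, sleeping delay_s between failures.

    Returns True on the first success, False if every attempt fails.
    """
    for attempt in range(attempts):
        if check():
            return True
        if attempt < attempts - 1:
            sleep(delay_s)  # give the server a moment to finish starting
    return False
```
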
- File: `forge_cli.py:1789-1827`
- Problem: After no-write + stub retries fail, task is marked failed. No automatic escalation to a better model.
- Fix: Wire `escalation_model` into CLI retry path.
- File: `forge_agent.py:1196-1210`
- Problem: `edit_file` blocks on unread files. `write_file` only warns. Agent can overwrite files without reading them.
- Fix: Block `write_file` on existing, unread files. Allow for new file creation.
- File: `forge_agent.py:1619-1638`
- Problem: `edit_file` enforces `_files_read` check and `claim_file()`. `search_replace_all` has neither, so it can modify unread files and files claimed by other agents.
- Fix: Add same `_files_read` check and `build_context.claim_file()` pattern.
- File: `forge_agent.py:1269-1298`
- Problem: `write_file` and `append_file` run `_unescape_content` on content. `edit_file` doesn't apply it to `old_string` or `new_string`. Nova's double-escaping issue can cause `\n` literal mismatches.
- Fix: Apply `_unescape_content` to both `old_string` and `new_string` in `_tool_edit_file`.
- File: `forge_agent.py:499,917`
- Problem: Recursive `self.run()` creates fresh `artifacts = {}`. Escalated result only has files from escalation, not the original run's files (which ARE on disk).
- Fix: Merge: `escalated_result.artifacts = {**artifacts, **escalated_result.artifacts}`
- File: `forge_agent.py:1845-1858`
- Problem: Loop finds a safe cut point and sets `i`, but `i` is never used. All of `middle` is compressed regardless. Tool pair protection is dead code.
- Fix: Use `i` to split `middle` into compress vs keep-verbatim.
- File: `config.py:125`
- Problem: Tasks with no files specified (e.g., "set up project structure") get base=6, hard=10. Insufficient for discovery + bash commands.
- Fix: Increase 0-file base to 10-15. These tasks often need MORE discovery, not less.
- File: `tests/unit/test_sprint5_comprehensive.py:359`
- Problem: Test asserts `"syntax" in result.lower()` which passes on the ERROR message too. Can't distinguish correct behavior from broken behavior.
- Fix: Assert `"syntax ok" in result.lower()` for valid-file tests (the literal must be lowercase, since it is compared against the lowercased result).
- File: `forge_preview.py:697-703`
- Problem: Quick Tunnel (`*.trycloudflare.com`) is publicly accessible with no auth.
- Fix: Use authenticated Cloudflare Access, or add HTTP basic auth via reverse proxy.
- Adaptive turn budgets (compute_turn_budget)
- ConvergenceTracker (disables writes after 5 idle turns)
- Verify phase with hard budget (soft//4 turns)
- Escalation budget reduction (half original, min 8)
- Hard limit tightening (`max(n+4, n*1.3)` not `n*2`)
- Soft-limit message injection reverted
- Benchmark aligned to CLI path (PromptBuilder + model-aware tools)
- Completeness directive in user prompt