fix: add droplet reality check to all cataractae #504
Merged
added 7 commits
May 9, 2026 22:27
The previous ct filter implementation used a direct-exec approach (opencode run --format json) which doesn't work because opencode run requires an existing session ID and doesn't produce output on stdout.

The new implementation spawns opencode in a tmux session, mirroring how cataractae work:
1. Create temp workdir with CONTEXT.md + AGENTS.md + agent identity
2. Spawn opencode run in tmux with --dangerously-skip-permissions --agent filter
3. Use pipe-pane to capture PTY output to a log file
4. Wait for session exit (poll with 10min timeout)
5. Read log, strip ANSI, extract agent response
6. Clean up tmux session and temp files

Key changes:
- filter.go: Replaced callFilterAgent with filterAgentTmux/filterAgentResume
- filter_agent.go: New file with tmux-based spawning logic
- preset.go: Removed FormatArgs (was --format json, doesn't work with opencode)
- dashboard_web.go: Updated filterResume to use invokeFilterNew
- filter_test.go: Kept structural tests, callFilterAgent now returns error

Also unset OPENCODE_SERVER_* env vars in tmux sessions to prevent 'session not found' errors (same fix as cataractae).

Replaces the incomplete FormatArgs fix (PR #500) with a proper architectural solution.
…lter

The PTY log approach for extracting agent responses was fundamentally broken: pipe-pane captures raw terminal output mixed with TUI chrome, and the heuristic extraction (cleanANSI + isTUICrome) consistently truncated responses or captured TUI artifacts.

Solution: instruct the filter agent to write its response to RESPONSE.md in the workdir. After the tmux session exits, read the file directly. No ANSI stripping, no extraction heuristics, no truncated paragraphs.

Changes:
- filterAgentsMD() now instructs the agent to write RESPONSE.md
- filterAgentTmux() reads RESPONSE.md instead of PTY log
- filterAgentResume() uses tmux display-message to find workdir, then reads RESPONSE.md instead of polling PTY log for size changes
- Removed cleanANSI(), extractFilterResponse(), isTUICrome() functions
- Removed pipe-pane log setup and PTY parsing logic
- Added tmuxSessionWorkdir() helper for --resume path
- Added dropVec0Objects() to prevent 'no such module: vec0' errors
- Updated fakeagent to write RESPONSE.md when --agent filter is present
- Updated tests: removed deprecated callFilterAgent tests, updated FormatArgs test for empty (removed) format args, simplified integration tests
The root cause of all previous failures was OPENCODE_SERVER_* env vars causing 'Session not found' errors. Unsetting those vars allows opencode run to work correctly as a direct subprocess.

opencode run --format json produces NDJSON output with:
- type:'text' events containing the agent's response
- sessionID for --resume support

This is dramatically simpler and more reliable than the tmux approach:
- No PTY log parsing, no ANSI stripping
- No RESPONSE.md file redirect with timing issues
- No tmux session management
- Direct JSON parsing of stdout, just like the original design intended

The tmux approach was a workaround for 'Session not found' errors that turned out to be caused by OPENCODE_SERVER_* vars leaking into the tmux session. Now that we know the root cause, the direct-exec approach works correctly.

Changes:
- filter_agent.go: complete rewrite from tmux to direct-exec
  - filterAgentRun() runs opencode as subprocess, parses NDJSON
  - filterAgentRunResume() uses -s flag for session continuation
  - buildFilterRunCommand() constructs args and env
  - Env vars OPENCODE_SERVER_*, OPENCODE_PID, OPENCODE are unset
  - --format json flag captures output programmatically
  - Removed tmux, pipe-pane, RESPONSE.md, cleanANSI, extractFilterResponse
  - Removed isSessionAlive, shellQuote, homeDir, minimalTmuxEnv
- filter.go: updated invokeFilterNew/Resume to use new functions
  - callFilterAgent deprecated stub now references filterAgentRun
- filter_test.go: updated for new architecture
  - Replaced tmux-based tests with direct-exec tests
  - Added unsetEnvPrefix, buildFilterRunCommand tests
  - Removed ResponseFileName test (no longer applicable)
- fakeagent: added --agent filter detection for test mode
…attern

- Remove stale --file flag reference (flag was removed)
- Replace tmux wrapper pattern with direct ct droplet add --filter
- Note that ct filter now uses opencode run --format json
…taractae

Post-incident fix after LLMem Go rewrite shipped with vec0 migration bug, missing dependency import, test timeouts, and CLI flag incompat.

Architect INSTRUCTIONS.md:
- Add Migration Surface Analysis section (mandatory for rewrites)
- Require enumeration of every migration scenario (existing DBs, config, CLI invocations, plugins/hooks)
- Require specific verification commands for each scenario
- Require dependency verification: brief dependency → go.mod/package.json
- Require breaking change matrix with callers

Reviewer INSTRUCTIONS.md:
- Add migration compatibility check to migration_safety rubric
- Add dependency verification check (brief mentions X, is X imported?)
- Add test timeout discipline check (30s max for I/O tests)

QA INSTRUCTIONS.md:
- Add migration compatibility test requirement for rewrites
- Add dependency verification check
- Add test timeout discipline check
- Add deploy-and-verify gate for rewrites: MUST run core commands against existing data, not just test suite
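The dependency verification check ("brief mentions X, is X imported?") is described here as an instruction to the cataractae, not as tooling. If one wanted to mechanize it for Go projects, a naive sketch could compare the dependencies named in the brief against the `require` directives in go.mod. The helper names and the example.com module paths below are hypothetical, and the line-based parsing is deliberately simplistic.

```go
package main

import (
	"fmt"
	"strings"
)

// requiredModules extracts module paths from go.mod require directives.
// Naive line-based parse: handles the single-line and block forms only.
func requiredModules(goMod string) map[string]bool {
	mods := map[string]bool{}
	inBlock := false
	for _, raw := range strings.Split(goMod, "\n") {
		line := strings.TrimSpace(raw)
		if i := strings.Index(line, "//"); i >= 0 {
			line = strings.TrimSpace(line[:i]) // drop trailing comments
		}
		switch {
		case line == "":
			// blank or comment-only line
		case line == "require (":
			inBlock = true
		case inBlock && line == ")":
			inBlock = false
		case inBlock:
			mods[strings.Fields(line)[0]] = true
		case strings.HasPrefix(line, "require "):
			if f := strings.Fields(line); len(f) >= 2 {
				mods[f[1]] = true
			}
		}
	}
	return mods
}

// missingDeps returns the brief's dependencies that go.mod does not require.
func missingDeps(briefDeps []string, goMod string) []string {
	have := requiredModules(goMod)
	var missing []string
	for _, dep := range briefDeps {
		if !have[dep] {
			missing = append(missing, dep)
		}
	}
	return missing
}

func main() {
	// Hypothetical go.mod and brief contents.
	goMod := "module example.com/app\n\nrequire (\n\texample.com/sqlite v1.2.3\n)\n"
	brief := []string{"example.com/sqlite", "example.com/vecstore"}
	fmt.Println("missing from go.mod:", missingDeps(brief, goMod))
}
```

A dependency listed in go.mod can still be unused in code, so this would only catch the "never added at all" case.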
Every cataractae now reads the original droplet AND the architect brief, then cross-references them. The brief is a contract, but the droplet is the source of truth.

Key additions:
- Architect: extract implicit requirements from droplet (migration, dependency viability, interface compatibility)
- Implementer: file issues for gaps between droplet and brief before implementing
- Reviewer: flag gaps between droplet and implementation even if brief is followed
- QA: test against what the droplet asked for, not just what the brief specified. Verify CLI compat, data compat, dependency imports
- Security: check security surface from rewrite migration paths
- Docs: document breaking changes and migration paths for rewrites

This closes the gap where the pipeline checked code against briefs but never checked briefs against reality.
Summary
Building on PR #503 (mechanical migration/dependency gates), this adds a droplet reality check to every cataractae in the pipeline.
The core problem: every cataractae treated the architect's brief as the source of truth. But the brief is a translation of the droplet, and translations lose information — especially implicit requirements like "existing data must work" or "the plugin interface must stay the same."
This PR adds a "Droplet Reality Check" section to all six cataractae (Architect, Implementer, Reviewer, QA, Security, Docs).
The key insight: "follows the brief" and "delivers what was requested" are different metrics. The pipeline was only measuring the first one. Now every cataractae measures both.
Combined with #503's mechanical gates (migration compatibility test, dependency cross-reference, deploy-and-verify, 30s test timeout), this should prevent the class of bugs where the pipeline faithfully implements the wrong requirements.