
fix: replace ct filter with opencode run --format json (direct-exec) #502

Closed
MichielDean wants to merge 6 commits into main from fix/tmux-filter

Conversation

@MichielDean
Owner

@MichielDean MichielDean commented May 10, 2026

Summary

Replaces the broken ct filter direct-exec approach with a working implementation using opencode run --format json.

Root cause: opencode run requires OPENCODE_SERVER_* env vars to be unset. When these vars are set (which they are in the running lobsterdog session), opencode connects to an existing server and fails with "Session not found." Unsetting OPENCODE_SERVER_USERNAME, OPENCODE_SERVER_PASSWORD, OPENCODE_PID, and OPENCODE allows opencode to start a fresh session.
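
To make the root cause concrete, here is a minimal Go sketch of scrubbing those variables from a subprocess environment. The helper names `scrubEnv` and `dropName` are hypothetical and not taken from this PR; only the variable names come from the description above. It assumes `OPENCODE` must match exactly, so unrelated variables that merely begin with `OPENCODE_` are kept.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// scrubEnv returns a copy of env (entries in "KEY=VALUE" form) without any
// variable whose name starts with one of prefixes or equals one of exact.
// Hypothetical helper for illustration; not the PR's implementation.
func scrubEnv(env, prefixes, exact []string) []string {
	out := make([]string, 0, len(env))
	for _, kv := range env {
		name, _, _ := strings.Cut(kv, "=")
		if dropName(name, prefixes, exact) {
			continue
		}
		out = append(out, kv)
	}
	return out
}

// dropName reports whether name matches a prefix or an exact name.
func dropName(name string, prefixes, exact []string) bool {
	for _, p := range prefixes {
		if strings.HasPrefix(name, p) {
			return true
		}
	}
	for _, e := range exact {
		if name == e {
			return true
		}
	}
	return false
}

func main() {
	clean := scrubEnv(os.Environ(),
		[]string{"OPENCODE_SERVER_"},
		[]string{"OPENCODE_PID", "OPENCODE"})
	fmt.Printf("%d variables kept\n", len(clean))
}
```

The returned slice can be assigned to the `Env` field of an `exec.Cmd`, so the child process starts without the inherited server settings.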

Solution: opencode run --format json --dangerously-skip-permissions with those env vars unset produces clean NDJSON output with type:"text" events and sessionID for --resume support. No tmux, no PTY parsing, no file redirect — just a subprocess with stdout capture.

Key changes:

  • filter_agent.go: Complete rewrite from tmux to direct-exec
    • filterAgentRun() runs opencode as subprocess, parses NDJSON from stdout
    • filterAgentRunResume() uses -s flag for session continuation
    • buildFilterRunCommand() constructs args and env, unsets OPENCODE_SERVER_*
    • --format json flag captures output programmatically
    • Removed: tmux session management, pipe-pane, PTY parsing, RESPONSE.md file redirect, cleanANSI, extractFilterResponse, isTUIChrome
  • filter.go: Updated invokeFilterNew/Resume to use new functions
  • filter_test.go: Updated for new architecture, replaced tmux tests with direct-exec tests
  • internal/provider/preset_test.go: Updated FormatArgs test (opencode preset no longer has FormatArgs)
  • internal/testutil/fakeagent: Added --agent filter detection for test mode
  • SKILL.md: Removed stale --file flag reference, updated tmux wrapper to direct command
  • references/commands.md: Removed tmux wrapper pattern
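
The argument construction attributed to `buildFilterRunCommand()` might look roughly like the sketch below. It uses only flags that appear in this PR (`--format json`, `--dangerously-skip-permissions`, `--agent filter`, `--model`, `-s`); the `buildRunArgs` name, its signature, the flag order, and whether `--agent filter` is passed on every call are assumptions for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// buildRunArgs assembles the argument list for `opencode run`.
// Hypothetical helper: the function and the argument order are assumptions.
func buildRunArgs(model, sessionID, prompt string) []string {
	args := []string{"run", "--format", "json", "--dangerously-skip-permissions", "--agent", "filter"}
	if model != "" {
		args = append(args, "--model", model)
	}
	if sessionID != "" {
		// -s continues an existing session, as described for filterAgentRunResume().
		args = append(args, "-s", sessionID)
	}
	return append(args, prompt)
}

func main() {
	args := buildRunArgs("ollama/glm-5.1:cloud", "", "Say OK")
	fmt.Println("opencode", strings.Join(args, " "))
}
```

Passing the slice to `exec.Command("opencode", args...)` avoids shell quoting entirely, which may be why helpers such as `shellQuote` were no longer needed after the rewrite.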

Testing

  • All unit tests pass
  • Manual test: unset OPENCODE_SERVER_* OPENCODE_PID OPENCODE && opencode run --format json --dangerously-skip-permissions --model ollama/glm-5.1:cloud 'Say OK' produces correct NDJSON output
  • --resume with session ID works correctly
  • ct filter with the new code produces correct JSON/text output

Lobsterdog Contributors added 4 commits May 9, 2026 22:27
The previous ct filter implementation used a direct-exec approach
(opencode run --format json) which doesn't work because opencode run
requires an existing session ID and doesn't produce output on stdout.

The new implementation spawns opencode in a tmux session, mirroring
how cataractae work:

1. Create temp workdir with CONTEXT.md + AGENTS.md + agent identity
2. Spawn opencode run in tmux with --dangerously-skip-permissions --agent filter
3. Use pipe-pane to capture PTY output to a log file
4. Wait for session exit (poll with 10min timeout)
5. Read log, strip ANSI, extract agent response
6. Clean up tmux session and temp files

Key changes:
- filter.go: Replaced callFilterAgent with filterAgentTmux/filterAgentResume
- filter_agent.go: New file with tmux-based spawning logic
- preset.go: Removed FormatArgs (was --format json, doesn't work with opencode)
- dashboard_web.go: Updated filterResume to use invokeFilterNew
- filter_test.go: Kept structural tests, callFilterAgent now returns error

Also unset OPENCODE_SERVER_* env vars in tmux sessions to prevent
'session not found' errors (same fix as cataractae).

Replaces the incomplete FormatArgs fix (PR #500) with a proper
architectural solution.
…lter

The PTY log approach for extracting agent responses was fundamentally
broken — pipe-pane captures raw terminal output mixed with TUI chrome,
and the heuristic extraction (cleanANSI + isTUICrome) consistently
truncated responses or captured TUI artifacts.

Solution: instruct the filter agent to write its response to RESPONSE.md
in the workdir. After the tmux session exits, read the file directly.
No ANSI stripping, no extraction heuristics, no truncated paragraphs.

Changes:
- filterAgentsMD() now instructs the agent to write RESPONSE.md
- filterAgentTmux() reads RESPONSE.md instead of PTY log
- filterAgentResume() uses tmux display-message to find workdir,
  then reads RESPONSE.md instead of polling PTY log for size changes
- Removed cleanANSI(), extractFilterResponse(), isTUICrome() functions
- Removed pipe-pane log setup and PTY parsing logic
- Added tmuxSessionWorkdir() helper for --resume path
- Added dropVec0Objects() to prevent 'no such module: vec0' errors
- Updated fakeagent to write RESPONSE.md when --agent filter is present
- Updated tests: removed deprecated callFilterAgent tests, updated
  FormatArgs test for empty (removed) format args, simplified
  integration tests

The root cause of all previous failures was OPENCODE_SERVER_* env vars
causing 'Session not found' errors. Unsetting those vars allows
opencode run to work correctly as a direct subprocess.

opencode run --format json produces NDJSON output with:
- type:'text' events containing the agent's response
- sessionID for --resume support

This is dramatically simpler and more reliable than the tmux approach:
- No PTY log parsing, no ANSI stripping
- No RESPONSE.md file redirect with timing issues
- No tmux session management
- Direct JSON parsing of stdout, just like the original design intended

The tmux approach was a workaround for 'Session not found' errors that
turned out to be caused by OPENCODE_SERVER_* vars leaking into the
tmux session. Now that we know the root cause, the direct-exec approach
works correctly.

Changes:
- filter_agent.go: complete rewrite from tmux to direct-exec
  - filterAgentRun() runs opencode as subprocess, parses NDJSON
  - filterAgentRunResume() uses -s flag for session continuation
  - buildFilterRunCommand() constructs args and env
  - Env vars OPENCODE_SERVER_*, OPENCODE_PID, OPENCODE are unset
  - --format json flag captures output programmatically
  - Removed tmux, pipe-pane, RESPONSE.md, cleanANSI, extractFilterResponse
  - Removed isSessionAlive, shellQuote, homeDir, minimalTmuxEnv
- filter.go: updated invokeFilterNew/Resume to use new functions
  - callFilterAgent deprecated stub now references filterAgentRun
- filter_test.go: updated for new architecture
  - Replaced tmux-based tests with direct-exec tests
  - Added unsetEnvPrefix, buildFilterRunCommand tests
  - Removed ResponseFileName test (no longer applicable)
- fakeagent: added --agent filter detection for test mode
…attern

- Remove stale --file flag reference (flag was removed)
- Replace tmux wrapper pattern with direct ct droplet add --filter
- Note that ct filter now uses opencode run --format json
@MichielDean MichielDean changed the title from "feat: replace ct filter direct-exec with tmux-based spawning" to "fix: replace ct filter with opencode run --format json (direct-exec)" May 11, 2026
Lobsterdog Contributors added 2 commits May 10, 2026 21:59
Codifies the env var pollution root cause and fixes that were discovered
through the PR #500 → #502 → direct-exec arc. Includes:
- Session not found: OPENCODE_SERVER_* env var pollution
- Empty response: missing --dangerously-skip-permissions or invalid model
- Timeout: CT_FILTER_TIMEOUT env var

The upgrade test used a stale config with workflow_path instead of
aqueduct and no aqueducts section. Since the config validator
requires both aqueducts and repo-level aqueduct refs, the
castellarius service failed to start with this config.

Updated the stale config to include valid current-schema fields
(aqueducts + aqueduct refs) while keeping the stale unknown keys
(old_binary_path, legacy_agent_timeout) that the YAML parser should
silently ignore. Also added trailing newlines to Go source files.
@MichielDean
Owner Author

Closing this PR as superseded. The filter changes (opencode run --format json direct-exec) have already been merged to main via #503 and #504. The remaining CI failure (installer-tests) is fixed in #505.

MichielDean added a commit that referenced this pull request May 22, 2026
## Summary

The installer upgrade test used a stale config with `workflow_path`
(deprecated key) instead of `aqueduct` and was missing the `aqueducts:`
section. Since the config validator requires both `aqueducts:` and
repo-level `aqueduct:` refs, the castellarius service failed to start
with this config.

This fixes the **installer-tests** CI failure on PR #502 (and main).

## Changes

- Updated the stale config in the upgrade test scenario to include valid
current-schema fields (`aqueducts:` section + `aqueduct:` refs on repos)
- Preserved the stale unknown keys (`old_binary_path`,
`legacy_agent_timeout`) that the YAML parser silently ignores — this is
the actual upgrade scenario being tested
- Removed `workflow_path` (no longer a valid key) and replaced with
`aqueduct: default`

## Testing

- All unit tests pass locally
- The installer-tests CI should now pass since the upgrade scenario uses
a valid config

Fixes the failing installer-tests check on #502.

---------

Co-authored-by: Lobsterdog Contributors <noreply@lobsterdog.dev>