Surface input-blocked Symphony sessions#66
Conversation
Summary: - Treat Codex MCP elicitation and input-required events as blocked sessions instead of retryable worker exits. - Keep blocked issues claimed, visible in snapshots, and listed on the dashboard/API until their Linear state changes. - Add presenter and dashboard payloads for blocked issue details. - Cover app-server MCP elicitation and orchestrator blocked-state paths. Rationale: - Operator-input blockers should pause for human action instead of burning retries or moving issues to terminal states. - Demo operators need blocked sessions visible in the dashboard and API. Tests: - mise exec -- mix format - mise exec -- mix test test/symphony_elixir/app_server_test.exs test/symphony_elixir/extensions_test.exs test/symphony_elixir/orchestrator_status_test.exs Co-authored-by: Codex <codex@openai.com>
f61a471 to
ea86b01
Compare
gpt-cmdr
left a comment
There was a problem hiding this comment.
Code Review: Surface input-blocked Symphony sessions
Architecture is solid — blocking instead of retrying input-waiting sessions is the right call. A few things to address before merge:
Critical
1. Stall timeout may fire on sessions that should be blocked (high confidence)
reconcile_stalled_running_issues compares elapsed_ms > stall_timeout_ms without guarding for the input-blocked condition. If a running session emits :turn_input_required but hasn't exited yet, the stall path will fire and call restart_stalled_issue before the blocked state is populated.
Fix: restart_stalled_issue/5 should check input_required_blocker?(running_entry.last_codex_event) and route to stop_and_block_issue rather than schedule_issue_retry in that case. Also suppress stall fire entirely for entries already in state.blocked.
2. Blocked state is ephemeral — undocumented behavior on restart
state.blocked is in-memory only (%{}). On orchestrator restart, blocked sessions silently re-enter as dispatch candidates. This is acceptable but should be documented. Consider logging a warning when a session has been blocked longer than a configurable threshold.
3. MCP elicitation clause ordering — verify placement
The new needs_input?("mcpServer/elicitation/request", payload) clause must appear before the generic defp needs_input?(_method, _payload), do: false fallback. CI green suggests it's correct, but worth a visual confirm since Elixir won't error on an unreachable clause.
Test Coverage Gaps
4. Normal-exit :input_required → blocked path not tested
Tests exercise the stall path but not the happy path: agent exits normally with {:input_required, issue} → handle_normal_agent_exit → issue appears in state.blocked. Add a test that injects a running_entry with completion: %{outcome: :input_required}, sends :DOWN with :normal, and asserts the issue is in state.blocked.
5. Presenter counts map may be missing blocked field
Dashboard adds a blocked table, but verify the metric cards include counts.blocked so the summary numbers don't undercount visible work.
6. static_snapshot fixture missing blocked key
extensions_test.exs static_snapshot/0 doesn't include a blocked key, so the dashboard liveview rendering of blocked entries has no test coverage.
Overall: approve with the above addressed. The stall-guard issue (#1) is the highest-priority fix since without it sessions can oscillate between stalling and retrying — the exact problem this PR solves.
Context
Codex app-server sessions can request operator input or MCP elicitation. These should pause visibly instead of retrying until exhausted.
TL;DR
Show input-blocked Symphony sessions in state, API, and dashboard.
Summary
Alternatives
Test Plan
make -C elixir allmise exec -- mix formatmise exec -- mix test test/symphony_elixir/app_server_test.exs test/symphony_elixir/extensions_test.exs test/symphony_elixir/orchestrator_status_test.exs