[rollout-trace] Add rollout trace capture and reducer#17982
cassirer-openai wants to merge 6 commits into main
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d364b9651e
    let Some(source_item_id) = self.latest_assistant_message_item_for_turn(
        &observed.child_thread_id,
        &observed.child_codex_turn_id,
    ) else {
        bail!(
            "agent result edge {} could not find a child result message",
            observed.edge_id
        );
Avoid hard-failing agent-result edges without child message
queue_agent_result_interaction_edge bails when no assistant message exists for the child turn. Child threads can terminate (e.g., aborted/failed) before producing any assistant message, yet still emit AgentResultObserved. In that valid case, replay_bundle errors and no state.json is produced. Fall back to a non-message source anchor instead of bailing.
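One way to picture the suggested fallback is the sketch below. It is not the PR's code: `EdgeSource`, `Reducer`, and `result_edge_source` are hypothetical names, and the map stands in for whatever state `latest_assistant_message_item_for_turn` actually consults. The idea is only that a missing assistant message yields a turn-level anchor rather than an error.

```rust
use std::collections::HashMap;

/// Hypothetical anchor type: where an agent-result edge originates in the child thread.
#[derive(Debug, PartialEq, Eq)]
enum EdgeSource {
    /// The child's latest assistant message, when one exists.
    AssistantMessage(String),
    /// Fallback: anchor on the child turn itself (e.g. aborted or failed before replying).
    Turn { thread_id: String, turn_id: String },
}

/// Hypothetical stand-in for reducer state:
/// maps (thread_id, turn_id) to the latest assistant message item id.
struct Reducer {
    latest_assistant_message: HashMap<(String, String), String>,
}

impl Reducer {
    /// Pick a source anchor without failing when the child produced no message.
    fn result_edge_source(&self, thread_id: &str, turn_id: &str) -> EdgeSource {
        let key = (thread_id.to_string(), turn_id.to_string());
        match self.latest_assistant_message.get(&key) {
            Some(item_id) => EdgeSource::AssistantMessage(item_id.clone()),
            // No assistant message: fall back to the turn instead of bailing.
            None => EdgeSource::Turn {
                thread_id: key.0,
                turn_id: key.1,
            },
        }
    }
}
```

With this shape, replay can still produce state.json for a child that terminated early, and a consumer can tell the two anchor kinds apart.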
    let previous_items = self
        .rollout
        .inference_calls
        .values()
        .find(|inference| {
            inference.upstream_request_id.as_deref() == Some(previous_response_id)
        })
Restrict previous_response_id lookup to the same thread
Incremental request reconstruction matches previous_response_id across all inference_calls without checking thread_id. In multi-thread traces, identical upstream IDs (common with fixtures or non-global provider IDs) can pull the wrong thread’s prefix, corrupting conversation linkage or causing mismatch failures. Filter candidates by the current thread.
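A minimal sketch of the thread-scoped lookup follows. `InferenceCall` here is a simplified, hypothetical struct (the real type and its field names beyond `upstream_request_id` are assumptions), and the free function `previous_items` only illustrates adding the thread check to the predicate.

```rust
/// Hypothetical, simplified view of a recorded inference call.
#[derive(Debug)]
struct InferenceCall {
    thread_id: String,
    upstream_request_id: Option<String>,
    items: Vec<String>,
}

/// Find the previous call's items, matching on both the thread and the upstream id.
fn previous_items<'a>(
    calls: &'a [InferenceCall],
    current_thread_id: &str,
    previous_response_id: &str,
) -> Option<&'a [String]> {
    calls
        .iter()
        .find(|call| {
            // Scope the match to the current thread so a colliding upstream id
            // from another thread cannot be selected.
            call.thread_id == current_thread_id
                && call.upstream_request_id.as_deref() == Some(previous_response_id)
        })
        .map(|call| call.items.as_slice())
}
```

If two threads both recorded a call with the same upstream id, this returns only the entry belonging to the thread being reconstructed.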
Summary
Adds opt-in rollout tracing for Codex sessions.
A trace records raw runtime evidence into a local bundle, then reduces that bundle into a semantic state.json graph. The goal is to make complex failures inspectable across model requests, compaction, code-mode exec, nested tool calls, terminal operations, and multi-agent v2 child threads.

The best review entry point is the new rollout trace README. It has the system diagrams and explains the main invariant: hot-path code observes first, while the offline reducer interprets later.
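As a rough illustration of that invariant (not the crate's actual schema), the sketch below separates an append-only recorder from a reducer that folds the log into a graph. `RawEvent`, `ReducedState`, `record`, and `reduce` are all hypothetical names chosen for this example.

```rust
use std::collections::BTreeMap;

/// Hypothetical raw observations emitted on the hot path.
#[derive(Debug, Clone)]
enum RawEvent {
    ThreadStarted { thread_id: String },
    AgentSpawned { parent_thread_id: String, child_thread_id: String },
}

/// Hypothetical reduced state: thread id -> ids of the child threads it spawned.
#[derive(Debug, Default, PartialEq)]
struct ReducedState {
    children: BTreeMap<String, Vec<String>>,
}

/// Hot path: append the observation and do no interpretation.
fn record(log: &mut Vec<RawEvent>, event: RawEvent) {
    log.push(event);
}

/// Offline: fold the raw log into a parent/child graph.
fn reduce(log: &[RawEvent]) -> ReducedState {
    let mut state = ReducedState::default();
    for event in log {
        match event {
            RawEvent::ThreadStarted { thread_id } => {
                state.children.entry(thread_id.clone()).or_default();
            }
            RawEvent::AgentSpawned { parent_thread_id, child_thread_id } => {
                state
                    .children
                    .entry(parent_thread_id.clone())
                    .or_default()
                    .push(child_thread_id.clone());
                // Make sure the child appears as a node even without its own start event.
                state.children.entry(child_thread_id.clone()).or_default();
            }
        }
    }
    state
}
```

Keeping `record` free of interpretation means the recorder stays cheap, and the reducer can be rerun later over the same raw log.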
At a high level:
- codex-core emits best-effort raw observations through a thin recorder.
- codex-rollout-trace owns the bundle schema, writer, reduced model, and reducer.
- state.json contains the parent/child graph and spawn/task/result/close edges.

This also adds a debug CLI command to replay a raw bundle and write state.json.

Review Guide
The commits are ordered by layer:
- eff9328 Add rollout trace crate
  Schema, README, writer, raw payload model, reduced model, reducer, and reducer tests.
- 856ef51 Add core rollout trace recorder
  Core recorder facade plus inference and compaction capture.
- 2abf9dc Trace tool and code-mode runtime boundaries
  Tool dispatch capture, code cells, terminal/write_stdin/exec relationships, and code-mode provenance.
- c657343 Trace rollout sessions and multi-agent v2 edges
  Session trace creation/inheritance and multi-agent v2 graph edges, including close-agent shutdown edges.
- 55d7e43 Add debug trace reduction command
  CLI entrypoint for reducing a bundle to state.json.