
[rollout-trace] Add rollout trace capture and reducer #17982

Open
cassirer-openai wants to merge 6 commits into main from codex/rollout-trace-submission


cassirer-openai commented Apr 15, 2026

Summary

Adds opt-in rollout tracing for Codex sessions.

A trace records raw runtime evidence into a local bundle, then reduces that bundle into a semantic state.json graph. The goal is to make complex failures inspectable across model requests, compaction, code-mode exec, nested tool calls, terminal operations, and multi-agent v2 child threads.

The best review entry point is the new rollout trace README. It has the system diagrams and explains the main invariant: hot-path code observes first, while the offline reducer interprets later.

At a high level:

  • codex-core emits best-effort raw observations through a thin recorder.
  • codex-rollout-trace owns the bundle schema, writer, reduced model, and reducer.
  • Raw payloads preserve exact evidence; reduced objects describe what the model saw, what Codex did, and how information moved.
  • Multi-agent support is v2-only. Spawned child threads share the root trace bundle, so one state.json contains the parent/child graph and spawn/task/result/close edges.

This also adds:

codex debug trace-reduce <trace-bundle>

to replay a raw bundle and write state.json.
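The observe-first, reduce-later split described above can be sketched roughly as follows. This is a minimal illustration, not the crate's actual schema or API: `RawEvent`, `Recorder`, `ThreadSummary`, and `reduce` are hypothetical names, and the real bundle is written to disk rather than held in memory.

```rust
use std::collections::BTreeMap;

/// Hypothetical raw observation. The real bundle schema lives in
/// codex-rollout-trace and is richer than this.
#[derive(Debug, Clone, PartialEq)]
enum RawEvent {
    InferenceStarted { thread_id: String, call_id: String },
    InferenceCompleted { call_id: String, output_items: usize },
    ToolCallObserved { thread_id: String, tool: String },
}

/// Hot-path side (hypothetical): append only, no interpretation.
#[derive(Default)]
struct Recorder {
    events: Vec<RawEvent>,
}

impl Recorder {
    /// Records the event exactly as observed. A real recorder would write to
    /// the bundle and ignore I/O errors so tracing never fails the session.
    fn observe(&mut self, event: RawEvent) {
        self.events.push(event);
    }
}

/// Offline side (hypothetical): a reduced per-thread summary.
#[derive(Debug, Default, PartialEq)]
struct ThreadSummary {
    inference_calls: usize,
    output_items: usize,
    tool_calls: usize,
}

/// Replays raw events in order and derives the reduced state.
fn reduce(events: &[RawEvent]) -> BTreeMap<String, ThreadSummary> {
    let mut call_to_thread: BTreeMap<String, String> = BTreeMap::new();
    let mut state: BTreeMap<String, ThreadSummary> = BTreeMap::new();
    for event in events {
        match event {
            RawEvent::InferenceStarted { thread_id, call_id } => {
                call_to_thread.insert(call_id.clone(), thread_id.clone());
                state.entry(thread_id.clone()).or_default().inference_calls += 1;
            }
            RawEvent::InferenceCompleted { call_id, output_items } => {
                // The completion event carries only the call id, so the owning
                // thread is resolved here, at reduce time, not on the hot path.
                if let Some(thread_id) = call_to_thread.get(call_id) {
                    state.entry(thread_id.clone()).or_default().output_items += *output_items;
                }
            }
            RawEvent::ToolCallObserved { thread_id, .. } => {
                state.entry(thread_id.clone()).or_default().tool_calls += 1;
            }
        }
    }
    state
}
```

The point of the split is that joining a completion back to its thread happens in `reduce`, so the recorder never needs to know how events relate to each other.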

Review Guide

The commits are ordered by layer:

  1. eff9328 Add rollout trace crate
    Schema, README, writer, raw payload model, reduced model, reducer, and reducer tests.

  2. 856ef51 Add core rollout trace recorder
    Core recorder facade plus inference and compaction capture.

  3. 2abf9dc Trace tool and code-mode runtime boundaries
    Tool dispatch capture, code cells, terminal/write_stdin/exec relationships, and code-mode provenance.

  4. c657343 Trace rollout sessions and multi-agent v2 edges
    Session trace creation/inheritance and multi-agent v2 graph edges, including close-agent shutdown edges.

  5. 55d7e43 Add debug trace reduction command
    CLI entrypoint for reducing a bundle to state.json.

cassirer-openai force-pushed the codex/rollout-trace-submission branch 2 times, most recently from 0274455 to 9ccb870, on April 15, 2026 23:02
cassirer-openai force-pushed the codex/rollout-trace-submission branch from 9ccb870 to 55d7e43 on April 15, 2026 23:10
cassirer-openai requested a review from jif-oai on April 15, 2026 23:35
cassirer-openai marked this pull request as ready for review on April 16, 2026 00:16

chatgpt-codex-connector (bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d364b9651e

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +334 to +341
let Some(source_item_id) = self.latest_assistant_message_item_for_turn(
    &observed.child_thread_id,
    &observed.child_codex_turn_id,
) else {
    bail!(
        "agent result edge {} could not find a child result message",
        observed.edge_id
    );

P1: Avoid hard-failing agent-result edges without child message

queue_agent_result_interaction_edge bails when no assistant message exists for the child turn. Child threads can terminate (e.g., aborted/failed) before producing any assistant message, yet still emit AgentResultObserved. In that valid case, replay_bundle errors and no state.json is produced. Fall back to a non-message source anchor instead of bailing.
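One possible shape for that fallback is sketched below. It is only an illustration under stated assumptions: `ResultSource`, `Item`, and `result_source` are hypothetical names, the lookup helper is a free-function stand-in for the reducer method, and the actual reducer may anchor the edge differently.

```rust
/// Hypothetical source anchor for an agent-result edge.
#[derive(Debug, Clone, PartialEq)]
enum ResultSource {
    /// The child's latest assistant message item for the turn.
    MessageItem(String),
    /// Fallback when the child turn ended without an assistant message,
    /// for example because it was aborted or failed.
    ChildTurn { thread_id: String, turn_id: String },
}

/// Hypothetical reduced conversation item.
struct Item {
    thread_id: String,
    turn_id: String,
    item_id: String,
    is_assistant_message: bool,
}

/// Stand-in for the reducer's lookup: the last assistant message in the turn.
fn latest_assistant_message_item_for_turn<'a>(
    items: &'a [Item],
    thread_id: &str,
    turn_id: &str,
) -> Option<&'a str> {
    items
        .iter()
        .rev()
        .find(|i| i.is_assistant_message && i.thread_id == thread_id && i.turn_id == turn_id)
        .map(|i| i.item_id.as_str())
}

/// Picks a source anchor without failing the whole replay.
fn result_source(items: &[Item], thread_id: &str, turn_id: &str) -> ResultSource {
    match latest_assistant_message_item_for_turn(items, thread_id, turn_id) {
        Some(item_id) => ResultSource::MessageItem(item_id.to_string()),
        // No message exists: anchor on the child turn itself instead of bailing.
        None => ResultSource::ChildTurn {
            thread_id: thread_id.to_string(),
            turn_id: turn_id.to_string(),
        },
    }
}
```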


Comment on lines +73 to +79
let previous_items = self
    .rollout
    .inference_calls
    .values()
    .find(|inference| {
        inference.upstream_request_id.as_deref() == Some(previous_response_id)
    })

P1: Restrict previous_response_id lookup to the same thread

Incremental request reconstruction matches previous_response_id across all inference_calls without checking thread_id. In multi-thread traces, identical upstream IDs (common with fixtures or non-global provider IDs) can pull the wrong thread’s prefix, corrupting conversation linkage or causing mismatch failures. Filter candidates by the current thread.
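A thread-scoped lookup could look roughly like the sketch below. `InferenceCall` here is a simplified, hypothetical stand-in for the reduced model's type, and `find_previous_call` is an illustrative free function, not the reducer's actual method.

```rust
/// Simplified, hypothetical stand-in for a reduced inference call.
struct InferenceCall {
    thread_id: String,
    upstream_request_id: Option<String>,
    items: Vec<String>,
}

/// Finds the call whose upstream id matches `previous_response_id`,
/// considering only calls that belong to the current thread.
fn find_previous_call<'a>(
    calls: &'a [InferenceCall],
    current_thread_id: &str,
    previous_response_id: &str,
) -> Option<&'a InferenceCall> {
    calls.iter().find(|inference| {
        // The thread check comes first so an identical upstream id on another
        // thread can never supply the wrong conversation prefix.
        inference.thread_id == current_thread_id
            && inference.upstream_request_id.as_deref() == Some(previous_response_id)
    })
}
```

With two threads that both report the upstream id `resp_1`, this returns the call on the requesting thread, and returns `None` for a thread that has no such call rather than borrowing another thread's items.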

