Skip to content

Add CodeAct-style trajectory exporter for training data #101

Description

@evansenter

Background

From analyzing the Opus 4.5 training methods document:

"CodeActInstruct dataset (7,139 trajectories) includes critical filtering for self-improvement: trajectories where the model encounters errors but rectifies in later turns are preserved, explicitly promoting self-debugging capability."

The session logs already contain this data. We should be able to export it in a format suitable for training.

Proposal

Add a trajectory export feature that:

1. Extracts tool-use trajectories

Convert session logs to Thought-Action-Observation format:

Thought: I need to find the config file
Action: Read(path="config.json")
Observation: {"error": "File not found"}
Thought: Maybe it's in a subdirectory
Action: exec(command="find . -name config.json")
Observation: ./src/config.json
Action: Read(path="./src/config.json")
Observation: {"api_key": "..."}

2. Filters for error→recovery patterns

Identify trajectories where:

  • An error occurred (tool returned error, assertion failed, etc.)
  • The agent successfully recovered (subsequent success on related task)
  • Mark these as high-value training examples

3. Outputs in standard format

# Export all trajectories from last 7 days
agent-session-analytics-cli export-trajectories --since 7d --format codeact

# Export only error-recovery patterns
agent-session-analytics-cli export-trajectories --filter error-recovery --format jsonl

# Export with metadata for training
agent-session-analytics-cli export-trajectories --include-metadata --output trajectories.jsonl

Output Format Options

CodeAct JSONL

{
  "session_id": "abc123",
  "trajectory": [
    {"role": "thought", "content": "..."},
    {"role": "action", "tool": "Read", "args": {...}},
    {"role": "observation", "content": "..."}
  ],
  "outcome": "success",
  "has_error_recovery": true,
  "error_recovery_spans": [[2, 5]]
}

Anthropic Messages Format

For direct use with Claude fine-tuning (if/when available).

Value

  1. Training data generation: Build custom training sets from real usage
  2. Quality analysis: Identify which sessions had good error recovery
  3. Pattern mining: Find common failure modes and successful recovery strategies

Related Work

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions