AgentTrace

AgentTrace is an open dataset of tool-using language-model agent traces with execution telemetry. Each trace records model-generation steps, tool calls, wall-clock timing, OS-level resource usage, tool inputs and outputs, reasoning content, and reproducibility metadata.

The repository contains the dataset, collection code, analysis scripts, and the deterministic NL2Bash fixture needed to replay the local command-line tasks.

Dataset Files

| File | Source tasks | Model | Traces |
| --- | --- | --- | --- |
| datasets/mbpp_0_6B_20260328T133144Z.jsonl | MBPP test split | Qwen3-0.6B | 500 |
| datasets/mbpp_1_7B_20260403T211347Z.jsonl | MBPP test split | Qwen3-1.7B | 500 |
| datasets/nl2bash_0_6B_20260328T133144Z.jsonl | NL2Bash / InterCode curated | Qwen3-0.6B | 200 |
| datasets/nl2bash_1_7B_20260403T211347Z.jsonl | NL2Bash / InterCode curated | Qwen3-1.7B | 200 |

Total: 1,400 traces.

Trace Schema

Each JSONL row is one agent run with:

  • trace_id, timestamp_utc, prompt, model, total_duration_ms
  • spans[]: tool invocations with tool_name, tool_input, tool_output, timing, exit code, and resource telemetry
  • llm_steps[]: model-generation steps with visible output, reasoning content, parsed tool calls, and token counts when available
  • metadata: dataset source, task id, run id, serving configuration, model artifact, platform, hardware, and collection version metadata
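
As a minimal sketch of reading rows with the fields listed above, the snippet below counts tool invocations per tool_name and sums total_duration_ms. It uses only field names named in the schema (total_duration_ms, spans[], tool_name). The helper name summarize_traces, the sample rows, and the tool name "bash" are hypothetical and only for illustration.

```python
import json
from collections import Counter
from typing import Iterable


def summarize_traces(lines: Iterable[str]) -> dict:
    """Count tool calls per tool_name and sum total_duration_ms over JSONL rows."""
    tool_counts = Counter()
    total_ms = 0.0
    n_traces = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between rows
        row = json.loads(line)
        n_traces += 1
        total_ms += row.get("total_duration_ms", 0)
        for span in row.get("spans", []):
            tool_counts[span.get("tool_name", "<unknown>")] += 1
    return {
        "traces": n_traces,
        "total_duration_ms": total_ms,
        "tool_calls": dict(tool_counts),
    }


# Hypothetical sample rows; real rows carry many more fields.
sample = [
    json.dumps({"trace_id": "a", "total_duration_ms": 120, "spans": [{"tool_name": "bash"}]}),
    json.dumps({"trace_id": "b", "total_duration_ms": 80, "spans": []}),
]
print(summarize_traces(sample))
```

To run it on a release file, pass an open file handle instead of the sample list, for example `summarize_traces(open("datasets/mbpp_0_6B_20260328T133144Z.jsonl"))`.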

Telemetry fields include user CPU time, system CPU time, peak resident-set size, disk read bytes, and disk write bytes. Peak-memory units differ by OS: getrusage reports ru_maxrss in bytes on macOS and in kilobytes on Linux, so the collector normalizes ru_maxrss to bytes.
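
The unit normalization can be sketched as follows. This is an illustrative helper, not the collector's actual code: the function name maxrss_to_bytes is hypothetical, and it assumes every non-macOS platform reports kilobytes, as Linux does.

```python
import sys


def maxrss_to_bytes(ru_maxrss: int, platform: str = sys.platform) -> int:
    """Convert a getrusage ru_maxrss value to bytes.

    macOS ("darwin") already reports bytes; Linux reports kilobytes.
    Assumption: any other platform is treated like Linux.
    """
    if platform == "darwin":
        return ru_maxrss
    return ru_maxrss * 1024


print(maxrss_to_bytes(2048, "linux"))
print(maxrss_to_bytes(2048, "darwin"))
```

On a Unix host, the input value would typically come from the standard library, for example `resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss`.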

Quick Start

uv sync
uv run python analyze.py datasets/mbpp_0_6B_20260328T133144Z.jsonl
uv run python analyze_deep.py datasets/mbpp_0_6B_20260328T133144Z.jsonl datasets/nl2bash_0_6B_20260328T133144Z.jsonl

Load the Hugging Face dataset:

import json
from datasets import load_dataset

ds = load_dataset("pagarsky/agent-trace")["train"]
row = ds[0]
# Nested fields are stored as JSON-encoded strings; decode them before use.
spans = json.loads(row["spans_json"])
llm_steps = json.loads(row["llm_steps_json"])
metadata = json.loads(row["metadata_json"])

The data/agenttrace.parquet file is a normalized convenience view for the Hugging Face dataset viewer and the datasets library. The raw release artifacts remain available under datasets/*.jsonl.

Generate plots into a local directory:

uv run python plots.py --outdir figures

Replay And Collection

Start a local llama-server compatible with OpenAI-style tool calling, then run:

./scripts/llama-server.sh start
uv run python collect.py --dataset mbpp -n 10 --model Qwen/Qwen3-0.6B --output datasets/traces.jsonl
uv run python collect.py --dataset nl2bash -n 10 --model Qwen/Qwen3-0.6B --output datasets/traces.jsonl

For full runs:

./scripts/collect-all.sh Qwen/Qwen3-1.7B

The NL2Bash tasks use a deterministic fixture under testdata/. Verify it with --check, or regenerate it by running the script without flags:

python scripts/generate-testdata.py --check
python scripts/generate-testdata.py

Data Sources And Licensing

Code in this repository is released under Apache-2.0. The trace dataset is released as openly as possible under the same license, subject to any applicable terms of the upstream prompt sources used to generate traces: MBPP and the curated NL2Bash/InterCode task set. Users should respect the licenses and terms of those upstream datasets.

The traces contain model-generated reasoning and tool outputs. Host-specific paths have been normalized to /testdata or redacted placeholders where applicable. A small number of early NL2Bash traces accidentally captured output lines from repo-local files outside the /testdata fixture; those line contents are masked with <redacted ...> markers, but no trace files or rows were removed.

Citation

A paper citation will be added after publication. Until then, please cite the dataset and code repository:

@misc{paharskyi_agenttrace_2026,
  title        = {AgentTrace: Tool-Using Model Telemetry Dataset},
  author       = {Paharskyi, Oleksii and Haina, Heorhii},
  year         = {2026},
  howpublished = {GitHub and Hugging Face},
  url          = {https://github.com/pagarsky/agent-trace},
  note         = {Dataset and code: https://huggingface.co/datasets/pagarsky/agent-trace}
}
