AgentTrace is an open dataset of tool-using language-model agent traces with execution telemetry. Each trace records model-generation steps, tool calls, wall-clock timing, OS-level resource usage, tool inputs and outputs, reasoning content, and reproducibility metadata.
The repository contains the dataset, collection code, analysis scripts, and the deterministic NL2Bash fixture needed to replay the local command-line tasks.
- GitHub: https://github.com/pagarsky/agent-trace
- Hugging Face: https://huggingface.co/datasets/pagarsky/agent-trace
| File | Source tasks | Model | Traces |
|---|---|---|---|
| datasets/mbpp_0_6B_20260328T133144Z.jsonl | MBPP test split | Qwen3-0.6B | 500 |
| datasets/mbpp_1_7B_20260403T211347Z.jsonl | MBPP test split | Qwen3-1.7B | 500 |
| datasets/nl2bash_0_6B_20260328T133144Z.jsonl | NL2Bash / InterCode curated | Qwen3-0.6B | 200 |
| datasets/nl2bash_1_7B_20260403T211347Z.jsonl | NL2Bash / InterCode curated | Qwen3-1.7B | 200 |
Total: 1,400 traces.
Each JSONL row is one agent run with:
- trace_id, timestamp_utc, prompt, model, total_duration_ms
- spans[]: tool invocations with tool_name, tool_input, tool_output, timing, exit code, and resource telemetry
- llm_steps[]: model-generation steps with visible output, reasoning content, parsed tool calls, and token counts when available
- metadata: dataset source, task id, run id, serving configuration, model artifact, platform, hardware, and collection version metadata
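As a minimal sketch of working with this row layout, the snippet below tallies tool invocations by spans[].tool_name across JSONL rows. The helper name count_tool_calls and the sample tool names are hypothetical and used only for illustration; real rows carry many more fields than the synthetic ones here.

```python
import json
from collections import Counter
from typing import Iterable


def count_tool_calls(lines: Iterable[str]) -> Counter:
    """Tally spans[].tool_name across JSONL rows (one agent run per line)."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between rows
        row = json.loads(line)
        for span in row.get("spans", []):
            # Fall back to a placeholder if a span lacks tool_name.
            counts[span.get("tool_name", "<unknown>")] += 1
    return counts


# Synthetic rows for illustration only; the tool names are assumptions.
sample = [
    json.dumps({"trace_id": "t1", "spans": [{"tool_name": "bash"}, {"tool_name": "bash"}]}),
    json.dumps({"trace_id": "t2", "spans": [{"tool_name": "python"}]}),
]
print(count_tool_calls(sample))
```

Because the function accepts any iterable of lines, an open file handle for one of the datasets/*.jsonl files can be passed in directly.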
Telemetry fields include user CPU time, system CPU time, peak resident-set size, disk read bytes, and disk write bytes. Memory accounting differs between macOS and Linux at the OS API level: getrusage reports ru_maxrss in bytes on macOS and in kilobytes on Linux. The collector normalizes ru_maxrss to bytes.
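The unit normalization can be sketched as follows. This is an illustration of the idea, not the collector's actual code: the function name maxrss_to_bytes is hypothetical, and it assumes the documented getrusage behavior (bytes on macOS, kilobytes elsewhere).

```python
import resource
import sys


def maxrss_to_bytes(ru_maxrss: int, platform: str = sys.platform) -> int:
    """Convert a getrusage ru_maxrss value to bytes.

    Assumption: macOS ("darwin") already reports bytes, while Linux and
    other Unix platforms report kilobytes (1024-byte units).
    """
    if platform == "darwin":
        return ru_maxrss
    return ru_maxrss * 1024


# Peak RSS of terminated child processes, normalized to bytes.
usage = resource.getrusage(resource.RUSAGE_CHILDREN)
print(maxrss_to_bytes(usage.ru_maxrss))
```

Passing the platform explicitly keeps the conversion testable on any host.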
uv sync
uv run python analyze.py datasets/mbpp_0_6B_20260328T133144Z.jsonl
uv run python analyze_deep.py datasets/mbpp_0_6B_20260328T133144Z.jsonl datasets/nl2bash_0_6B_20260328T133144Z.jsonl
Load the Hugging Face dataset:
import json
from datasets import load_dataset
ds = load_dataset("pagarsky/agent-trace")["train"]
row = ds[0]
spans = json.loads(row["spans_json"])
llm_steps = json.loads(row["llm_steps_json"])
metadata = json.loads(row["metadata_json"])
The data/agenttrace.parquet file is a normalized convenience view for the Hugging Face dataset viewer and the datasets library. The raw release artifacts remain available under datasets/*.jsonl.
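When processing many rows, the three JSON-encoded columns can be decoded in one place. The helper below is a hedged sketch: decode_row is a hypothetical name, and it assumes the spans_json, llm_steps_json, and metadata_json columns hold JSON strings, as the loading example suggests.

```python
import json
from typing import Any, Mapping

# Column names taken from the loading example; other columns pass through.
JSON_COLUMNS = ("spans_json", "llm_steps_json", "metadata_json")


def decode_row(row: Mapping[str, Any]) -> dict:
    """Return a copy of row with each *_json string column parsed.

    Parsed values are stored under the column name without the _json suffix.
    """
    decoded = dict(row)
    for col in JSON_COLUMNS:
        if isinstance(decoded.get(col), str):
            decoded[col[: -len("_json")]] = json.loads(decoded.pop(col))
    return decoded


# Synthetic row for illustration only; the values are made up.
example = {
    "trace_id": "t1",
    "spans_json": '[{"tool_name": "bash"}]',
    "llm_steps_json": "[]",
    "metadata_json": '{"task_id": "demo"}',
}
print(decode_row(example))
```

With the datasets library, the same function can be applied per row, for example inside a loop over ds.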
Generate plots into a local directory:
uv run python plots.py --outdir figures
Start a local llama-server compatible with OpenAI-style tool calling, then run:
./scripts/llama-server.sh start
uv run python collect.py --dataset mbpp -n 10 --model Qwen/Qwen3-0.6B --output datasets/traces.jsonl
uv run python collect.py --dataset nl2bash -n 10 --model Qwen/Qwen3-0.6B --output datasets/traces.jsonl
For full runs:
./scripts/collect-all.sh Qwen/Qwen3-1.7B
The NL2Bash tasks use a deterministic fixture under testdata/; regenerate or verify it with:
python scripts/generate-testdata.py --check
python scripts/generate-testdata.py
Code in this repository is released under Apache-2.0. The trace dataset is released as openly as possible under the same license, subject to any applicable terms of the upstream prompt sources used to generate traces: MBPP and the curated NL2Bash/InterCode task set. Users should respect the licenses and terms of those upstream datasets.
The traces contain model-generated reasoning and tool outputs. Host-specific paths have been normalized to /testdata or redacted placeholders where applicable. A small number of early NL2Bash traces accidentally captured output lines from repo-local files outside the /testdata fixture; those line contents are masked with <redacted ...> markers, but no trace files or rows were removed.
A paper citation will be added after publication. Until then, please cite the dataset and code repository:
@misc{paharskyi_agenttrace_2026,
title = {AgentTrace: Tool-Using Model Telemetry Dataset},
author = {Paharskyi, Oleksii and Haina, Heorhii},
year = {2026},
howpublished = {GitHub and Hugging Face},
url = {https://github.com/pagarsky/agent-trace},
note = {Dataset and code: https://huggingface.co/datasets/pagarsky/agent-trace}
}