A Pi extension that adds a structured_return tool alongside bash, returning compact parsed results with full logs — 60–95% fewer tokens without losing signal.
A failing test run, before and after:
Raw pytest output (262 tokens):

```text
============================= test session starts ==============================
platform darwin -- Python 3.14.2, pytest-9.0.2
collecting ... collected 3 items

test_math.py::test_adds_two_numbers_correctly PASSED                     [ 33%]
test_math.py::test_multiplies_two_numbers_correctly FAILED               [ 66%]
test_math.py::test_does_not_divide_by_zero FAILED                        [100%]

=================================== FAILURES ===================================
____________________ test_multiplies_two_numbers_correctly _____________________

    def test_multiplies_two_numbers_correctly():
>       assert 3 * 4 == 99
E       assert (3 * 4) == 99

test_math.py:5: AssertionError
_________________________ test_does_not_divide_by_zero _________________________

    def test_does_not_divide_by_zero():
>       result = 1 / 0
                 ^^^^^
E       ZeroDivisionError: division by zero

test_math.py:8: ZeroDivisionError
=========================== short test summary info ============================
FAILED test_math.py::test_multiplies_two_numbers_correctly
FAILED test_math.py::test_does_not_divide_by_zero - ZeroDivisionError: ...
========================= 2 failed, 1 passed in 0.01s ==========================
```
Structured result returned to the model (56 tokens):

```text
pytest test_math.py --junitxml=.tmp/report.xml → cwd: project
2 failed, 1 passed
test_math.py:5 assert (3 * 4) == 99
test_math.py:8 ZeroDivisionError: division by zero
```
262 → 56 tokens on a 3-test example. Real test suites are much larger — the reduction scales with them, saving thousands of tokens per run.
```
pi install npm:@robhowley/pi-structured-return
```

`structured_return` is a separate tool, not a wrapper around bash. Intercepting bash to silently rewrite commands would override a primitive the model and platform both rely on. Pi's philosophy is to extend rather than obfuscate: features are built on top of the platform, not hidden inside it. A dedicated tool honors that. It adds to the available surface, keeps bash honest, and leaves the choice explicit. The skill guides the model toward it; nothing is hijacked to get there.
Measured with cl100k_base (tiktoken). All benchmarks use tiny fixtures — reduction grows with real-world output.
Benchmark: 3 tests — 1 passing, 1 assertion failure, 1 unexpected error.
| Tool | Raw | Structured | Reduction | Notes |
|---|---|---|---|---|
| `mvn test` | 1063 | 86 | 92% | build lifecycle noise with surefire stack traces per failure |
| `node --test` | 629 | 64 | 90% | strips full stack traces, assertion internals, timing; preserves expected/actual |
| `npx ava` | 483 | 56 | 88% | source snippets, diffs, full stack traces stripped; expected/actual preserved |
| `go test` | 394 | 48 | 88% | stack traces, goroutine frames, panic recovery noise stripped; file:line + expected/actual preserved |
| `dotnet test` | 487 | 107 | 78% | build header and VSTest output with per-failure stack traces |
| `npx vitest` | 348 | 75 | 78% | source diff with inline arrows and ANSI color codes per failure |
| `python -m unittest` | 231 | 52 | 78% | full tracebacks with source annotations; expected/actual from AssertionError |
| `cargo test` | 285 | 68 | 76% | cargo progress + test binary output with panic traces per failure |
| `pytest` | 289 | 71 | 75% | verbose output with source snippets and summary footer |
| `rspec` | 212 | 55 | 74% | default output with backtrace |
| `gradle test` | 263 | 81 | 69% | gradle console output with build lifecycle noise |
| `npx mocha` | 180 | 55 | 69% | stack traces + assertion diff formatting; expected/actual preserved |
| `npx jest` | 309 | 99 | 68% | source annotations with deep jest-circus stack traces per failure |
| `ruby (minitest)` | 168 | 59 | 65% | default output with backtrace |
Benchmark: 1 file, 1–2 errors. Reduction scales with error count since raw output includes source snippets, caret indicators, and annotations per error.
| Tool | Raw | Structured | Reduction | Notes |
|---|---|---|---|---|
| `dotnet build` | 383 | 53 | 86% | strips restore/timing noise, deduplicates repeated error lines, absolute paths relativized |
| `npx jsonlint` | 148 | 28 | 81% | strips stack trace, source pointer line; preserves line number and expecting message |
| `tidy` | 233 | 51 | 78% | strips remediation advice, accessibility tips, reformatted HTML output, Info lines |
| `cargo build` | 225 | 77 | 66% | rustc error annotations with code spans and help text per error |
| `swiftc` | 161 | 58 | 64% | source annotations with backtick markers deduplicated |
| `gcc / clang` | 109 | 77 | 29% | strips source snippets, caret indicators, line numbers from gutter |
| `javac` | 79 | 66 | 16% | strips source snippets, caret indicators; folds symbol/location into message |
Benchmark: 1 file, 1–2 violations. Reduction is a conservative lower bound — scales with file and error count since raw output repeats paths, source snippets, and annotations per violation.
| Tool | Raw | Structured | Reduction | Notes |
|---|---|---|---|---|
| `isort --check` | 143 | 29 | 80% | strips diff hunks, absolute paths, timestamps; lists files with unsorted imports |
| `black --check` | 155 | 31 | 80% | strips diff hunks, emoji, timestamps; lists files needing reformatting |
| `ruff check` | 107 | 52 | 51% | source context + help text per error |
| `shellcheck` | 224 | 117 | 48% | strips source snippets, carets, suggestions, wiki URLs |
| `npx htmlhint` | 174 | 92 | 47% | strips ANSI codes, source evidence, rule descriptions, URLs |
| `vale` | 141 | 79 | 44% | strips ANSI codes, Action/Span metadata, column-aligned formatting |
| `markdownlint` | 199 | 117 | 41% | strips context quotes, URLs, fix info, error ranges |
| `pyright` | 100 | 59 | 41% | strips version, timing, absolute paths; detail lines collapsed |
| `rubocop` | 149 | 90 | 40% | strips source snippets, caret indicators, summary line |
| `tsc` | 107 | 72 | 33% | vs `--pretty true` default; source snippets and underlines stripped |
| `stylelint` | 70 | 51 | 27% | strips summary footer and fix hint |
| `pylint` | 141 | 120 | 15% | strips header, score line, separator; scales with error count |
| `prettier --check` | 38 | 33 | 13% | strips preamble, [warn] prefixes, footer hint; scales with file count |
| `hadolint` | 178 | 156 | 12% | strips ANSI color codes and level labels; measured vs colored output |
| `eslint` | 64 | 59 | 8% | already compact formatter |
| `mypy` | 75 | 72 | 4% | mypy text is already compact; notes folded into parent errors |
| Tool | Raw | Structured | Reduction | Notes |
|---|---|---|---|---|
| `bandit` | 402 | 99 | 75% | strips source snippets, CWE URLs, run metrics, confidence labels |
| `npm audit` | 158 | 50 | 68% | strips advisory URLs, fix instructions, CVSS vectors; advisory titles joined per package |
dbt output is the noisiest tool in this repo relative to useful signal. Every run prints version info, adapter registration, project stats, concurrency settings, and per-node start/finish lines — all before any result.
The numbers below use 3–4 model toy examples; real projects run hundreds of models where the noise scales linearly and reduction compounds.
| Tool | Raw | Structured | Reduction | Notes |
|---|---|---|---|---|
| `dbt run` (success) | 428 | 20 | 95% | version, adapter, concurrency, per-model start/finish — all noise on success |
| `dbt run` (failure) | 618 | 198 | 68% | error messages, model paths, compiled code paths preserved |
| `dbt test` | 720 | 274 | 62% | unit test diff tables preserved verbatim; preamble stripped |
| `dbt compile` | 775 | 683 | 12% | compiled SQL is the signal and returned verbatim |
At 12 models, run failures hit 85% reduction. An 18-model DAG success: 1,645 → 20 tokens (99%).
Evaluated for structured parsing but raw output is already compact enough that a parser adds no reduction (or goes negative). Use bash instead of structured_return for these tools.
| Tool | Raw tokens | Format | Why no parser |
|---|---|---|---|
| `go build` | 85 | `file:line:col: message` | one line per error, no decoration |
| `flake8` | 75 | `file:line:col: CODE message` | no JSON without a plugin; text is already one line per violation |
| `yamllint` | 72 | `file:line:col level message (rule)` | filename printed once; one line per issue |
| `golangci-lint` | 59 | `file:line:col: message (linter)` | text output already minimal; JSON includes massive linter report |
| `go vet` | ~60 | `file:line:col: message` | same format as go build |
| `vulture` | 58 | `file:line: message (confidence%)` | single line per finding |
| `pydocstyle` | 48 | file:line context + `CODE: message` | two lines per issue; structured format would repeat file paths |
- The agent runs commands through `structured_return` when it would reduce noise and token usage.
- Full output is captured and stored as a log.
- A parser converts noisy CLI output into a compact structured result. If no parser matches, the last 200 lines and the log path are returned as a fallback.
- The agent receives the structured result in context — signal only, no noise.
- The full log is always available on disk for both the agent and humans to inspect.
- Run `/sr-stats` to see how many tokens structured-return has saved — both in the current session and across all sessions.
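The fallback step above can be sketched in a few lines. This is an illustration of the described behavior, not the extension's actual implementation; `fallbackResult` and the field names are hypothetical, chosen to mirror the result shape documented later.

```typescript
import fs from "node:fs";
import os from "node:os";
import path from "node:path";

// Hypothetical sketch: when no parser matches, return the last 200
// lines of the captured log plus the path to the full log on disk.
function fallbackResult(logPath: string) {
  const lines = fs.readFileSync(logPath, "utf8").split("\n");
  return {
    status: "error" as const,
    summary: "no parser matched; returning raw tail",
    rawTail: lines.slice(-200).join("\n"),
    logPath,
  };
}

// Demo against a throwaway 500-line log file.
const logFile = path.join(os.tmpdir(), "sr-demo.log");
fs.writeFileSync(
  logFile,
  Array.from({ length: 500 }, (_, i) => `line ${i}`).join("\n"),
);
const result = fallbackResult(logFile);
console.log(result.rawTail.split("\n").length); // 200
```

The point of keeping the full log on disk is that the tail is a bounded context cost while nothing is lost: the agent (or a human) can always open `logPath` for the rest.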
Run `/sr-parsers` in a pi session to see all registered parsers with their match rules. Run `/sr-stats` to see token savings for the current session and lifetime.
Built-in parsers cover common tools. For everything else — internal CLIs, custom test runners, proprietary lint tools — add a `.pi/structured-return.json` to your project root.
Why: keeps token costs low for tools the built-ins don't know about, without forking the package.
Two options:
Route a project-specific command to an existing parser. Use this when your tool's output already matches a supported format (e.g. a test runner that emits JUnit XML).
```jsonc
// .pi/structured-return.json
{
  "parsers": [
    {
      "id": "acme-tests",
      "match": { "argvIncludes": ["acme", "test"] },
      "parseAs": "junit-xml"
    }
  ]
}
```

Point to a local `.ts` file for tools with unique output formats.
```jsonc
// .pi/structured-return.json
{
  "parsers": [
    {
      "id": "foo-json",
      "match": { "argvIncludes": ["foo-cli", "check"] },
      "module": "parsers/foo-cli.ts"
    }
  ]
}
```

```typescript
// .pi/parsers/foo-cli.ts
import fs from "node:fs";
import type { RunContext } from "@robhowley/pi-structured-return/types";

export default {
  id: "foo-json",
  async parse(ctx: RunContext) {
    const data = JSON.parse(fs.readFileSync(ctx.stdoutPath, "utf8"));
    return {
      tool: "foo-cli",
      status: data.ok ? "pass" : "fail",
      summary: data.ok ? "passed" : `${data.errors.length} errors`,
      failures: data.errors.map((e, i) => ({
        id: e.id ?? `error-${i}`,
        file: e.file,
        line: e.line,
        message: e.message,
      })),
      logPath: ctx.logPath,
    };
  },
};
```

The parser receives a `RunContext` (command, argv, cwd, stdout/stderr paths, artifact paths, log path) and returns a `ParsedResult`. Match rules support `argvIncludes` (array of required tokens) or `regex` (tested against the full argv string).
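The match semantics described above can be sketched as follows — a minimal reading of the docs, assuming `argvIncludes` means "every token appears somewhere in argv" and `regex` is tested against the space-joined argv string:

```typescript
// Sketch of match-rule evaluation; the extension's exact semantics may
// differ (e.g. ordering or substring matching of argv tokens).
type MatchRule = { argvIncludes?: string[]; regex?: string };

function ruleMatches(rule: MatchRule, argv: string[]): boolean {
  if (rule.argvIncludes) {
    // Every required token must be present in argv.
    return rule.argvIncludes.every((token) => argv.includes(token));
  }
  if (rule.regex) {
    // Regex is tested against the full command line.
    return new RegExp(rule.regex).test(argv.join(" "));
  }
  return false;
}

console.log(ruleMatches({ argvIncludes: ["acme", "test"] }, ["acme", "test", "--fast"])); // true
console.log(ruleMatches({ regex: "pytest\\b" }, ["python", "-m", "pytest", "-q"])); // true
```

First matching parser wins, so project-local rules in `.pi/structured-return.json` should be specific enough not to shadow the built-ins.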
Every parser returns the same shape. The model always knows where to look.
| Field | Type | Description |
|---|---|---|
| `tool` | `string` | Name of the tool that ran (eslint, pytest, etc.) |
| `exitCode` | `number` | Raw process exit code |
| `status` | `pass \| fail \| error` | Normalized outcome |
| `summary` | `string` | One-line human+model readable result (`3 failed, 12 passed`) |
| `cwd` | `string` | Working directory — anchor for resolving relative paths in failures |
| `failures` | `{ id, file?, line?, message?, rule? }[]` | Per-failure details with relative file paths |
| `artifact` | `string?` | Path to the saved report file, if one was written |
| `logPath` | `string` | Path to full stdout+stderr log |
| `rawTail` | `string?` | Last 200 lines of log, included on fallback when no parser matched |