Your AI assistant is flying blind. Every time it needs to understand your project, it burns thousands of tokens reading file bodies just to figure out what the files are for.
- One line on top of every
.py:# AX: TAG | SUM: one-line summary | SIG: version-tag - Grep-friendly architecture β
grep -l '^# AX: PIPE'lists every pipeline;grep -l '^# AX: CONN'every external connector. Your system's shape is inspectable with a shell one-liner. - Behaviour-version signal β bump the
SIGfield on meaningful change; downstream consumers grepSIG: v5to find every site that relied on the old behavior - Pre-commit hook included (bash, no deps) β enforces AX-1 (presence) and AX-3 (tag allowlist). Soft-warn by default;
AX_STRICT=1blocks. - 5-check lifecycle spec (
SPEC.md) β AX-1 presence, AX-2 SIG bump on behavior change, AX-3 tag allowlist, AX-4 no-filler SUM, AX-5 multi-concern split hint - 16-tag starter vocabulary (CORE, PIPE, CONN, DOMAIN, INFRA, MEM, PROMPT, TEST, MIGR, SCRIPT, UTIL, SEC, LLM, IO, ENTRY, DAEMON) β adopt, extend, or replace
- Measured ~30 % reduction in context tokens on agent triage tasks across a ~350-file production codebase
- Zero runtime cost, zero build step, zero dependencies. One comment per file.
- MIT licensed.
You pay a context tax every day. Claude Code and Codex CLI waste time and tokens opening files that are irrelevant to the task at hand, simply to figure out what those files do. A filename like handler.py tells the agent nothing, so it opens the file. And the next one. And the next one.
The cost compounds. Triage tasks that should take one focused read turn into five speculative ones. Context windows fill with irrelevant code. Agents make worse decisions because they spent their attention budget on orientation, not reasoning.
Agents read codebases the way humans do β by skimming file lists. But they are flawless at processing structured metadata. One dense, machine-readable line at the top of every file turns "guess what this is" into "know what this is." The agent orients from headers and only loads bodies when it needs the detail.
You are formatting your codebase for human eyes. Machines do most of the reading now. Give them headers.
# AX: CORE | SUM: spawn CLI Brain sessions with MCP via --mcp-config | SIG: v5
# AX: CONN | SUM: Telegram connector β send, chunk, approve, retry, digest | SIG: tgconn03
# AX: PIPE | SUM: classify raw gateway events into brief artifacts with LLM + hashing | SIG: cls04That is the entire proposal.
# AX: <TAG> | SUM: <one-line summary> | SIG: <version-tag>
- TAG β one word from a project-level allowlist (β€ 20 tags, agent-relevant file roles).
- SUM β one dense sentence: what the file does, not why. Target β€ 120 chars total.
- SIG β short version tag (
v1,v5,tgconn03). Human-picked, not hash. Bump on meaningful behavior change; stable through refactors.
Adopt, extend, or replace. Consistent vocabulary within one codebase matters more than any particular word choice.
| TAG | Meaning |
|---|---|
CORE |
Core platform β main loop, orchestrator, config |
PIPE |
Scheduled or event-driven pipeline |
CONN |
External-world connector (API client, queue consumer) |
DOMAIN |
Domain logic β models, tool handlers, business rules |
INFRA |
Infrastructure β migrations, deploy, backups |
MEM |
Memory / storage / persistence |
PROMPT |
Prompt builders or prompt data |
TEST |
Tests |
MIGR |
Database migrations |
SCRIPT |
Ops scripts / maintenance tooling |
UTIL |
Shared utilities |
SEC |
Security / authentication / signing |
LLM |
LLM wrappers / budget limiters |
IO |
File / network / disk adapters |
ENTRY |
CLI entry point |
DAEMON |
Long-running daemon |
Keep the list short. A list of 100 tags is a search index, not a vocabulary.
Drop the hook into any Python repo:
cp hooks/pre-commit-ax ~/your-project/.git/hooks/pre-commit
chmod +x ~/your-project/.git/hooks/pre-commitSoft-warns on staged .py files with missing or invalid headers. Set AX_STRICT=1 to block the commit instead of warning.
To skip the header on a specific file (throwaway scripts, vendored code), add # noqa: AX anywhere in the first 5 lines.
Running in production across a ~350 Python file codebase. After six months of gradual adoption:
- ~30 % reduction in context tokens on agent triage tasks. Agents skim headers instead of opening files.
grep -l '^# AX: CONN'became the canonical way to list every external connector β replacing a docs page that drifted within weeks of every refactor.- Header drift (TAG changes not matching behavior changes) became the earliest signal that a file was becoming a grab-bag and needed splitting.
Coverage is gradual β adopt as you touch files, do not bulk-rewrite history. git blame is valuable.
- Public signature change, return-type change, invariant change β bump.
- New handler, new tool, new field β bump.
- Pure rename, docstring change, whitespace β do not bump.
- Breaking bug fix where callers must re-validate assumptions β bump.
Each file has its own SIG history. No repo-wide version numbers.
ax-headers is part of a trio. Any of them work standalone; together they compound:
- full-review β adversarial code review. 0.80+ recall on bug detection vs 0.40 for the standard "parallel specialists" pattern. Detects header drift during review.
- raisin β dense Python authoring style for LLMs. ~50 % fewer tokens, same test coverage.
MIT. Built by @Oldrich333.
