These rules define how an AI coding agent should plan, execute, verify, communicate, and recover when working in a real codebase. Optimize for correctness, minimalism, and developer experience.
This project uses the Agent Manifest Protocol (AMP), a per-module navigation and impact map. Every module has a `_MANIFEST.md`.
> [Open Codebase Map](agent_actions/_MANIFEST.md)
- Start Here: Use this file for high-level context.
- Find the module: Follow the root `_MANIFEST.md` link above to find the module you need.
- Read the manifest: Each module's `_MANIFEST.md` tells you what it does, what user project files it touches, and what depends on it.
- Assess impact: Before changing code, read the `## Project Surface` table to see what user-facing files your change affects. Read `## Dependencies` to see what other modules will be impacted.
- Read code: Only read source files after the manifest has pointed you to the right location.
Every `_MANIFEST.md` has these sections in this order:

| Section | Purpose | Parseable |
|---|---|---|
| `# {Module Name}` | Module name and one-line description | Yes |
| `## Sub-Modules` | Links to nested package manifests (optional) | Yes |
| `## Modules` | Table of source files, types, exports, signals | Yes |
| `## Project Surface` | What user project files this module reads/writes/validates | Yes |
| `## Dependencies` | What depends on this module and what it depends on | Yes |
| `## Notes` | Design decisions, gotchas, diagrams (optional) | No |
The parseable sections use standardized table headers so tooling can extract JSON from the markdown. See `bin/manifest-to-json.py`.
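As a minimal sketch of what "parseable" means here, a single pipe-table can be turned into JSON records roughly like this; the real extraction logic lives in `bin/manifest-to-json.py`, and the example row (`load_config`) is hypothetical:

```python
import json


def parse_manifest_table(markdown: str) -> list[dict]:
    """Parse the first pipe-table in a manifest section into a list of dicts."""
    rows = [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in markdown.splitlines()
        if line.strip().startswith("|")
    ]
    if len(rows) < 2:
        return []
    header, body = rows[0], rows[2:]  # rows[1] is the |---| separator line
    return [dict(zip(header, row)) for row in body]


# Hypothetical Project Surface fragment:
surface = parse_manifest_table("""
| Symbol | File | Interaction | Config Key |
|--------|------|-------------|------------|
| load_config | agent_actions.yml | reads | - |
""")
print(json.dumps(surface, indent=2))
```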
The `## Project Surface` table maps framework symbols to user project files:
| Symbol | File | Interaction | Config Key |
|--------|------|-------------|------------|
These are the stable paths that exist in every user's project:

| Path | What it is |
|---|---|
| `agent_actions.yml` | Project config |
| `agent_config/{workflow}.yml` | Workflow definition |
| `prompt_store/{workflow}.md` | Prompt templates |
| `schema/{workflow}/{action}.yml` | Output schemas |
| `tools/{workflow}/*.py` | UDF tool scripts |
| `seed_data/*.json` | Reference data |
| `agent_io/staging/` | Input data |
| `agent_io/target/{action}/` | Output data per action |
| `agent_io/store/` | Durable database (SQLite) |
| `.env` | Environment variables |
Use this to answer: "I'm changing this code — what user files does it affect?" Or in reverse: "Something broke in my config — what framework code touches it?"
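Both questions reduce to simple lookups once the Project Surface tables have been extracted to records. A sketch, assuming the records are in memory (the symbol names are illustrative):

```python
# Hypothetical extracted Project Surface records (symbol -> file mappings).
surface = [
    {"Symbol": "load_config", "File": "agent_actions.yml"},
    {"Symbol": "run_workflow", "File": "agent_config/{workflow}.yml"},
    {"Symbol": "run_workflow", "File": "prompt_store/{workflow}.md"},
]


def files_touched_by(symbol: str) -> list[str]:
    """Forward query: I'm changing this symbol; which user files does it affect?"""
    return [r["File"] for r in surface if r["Symbol"] == symbol]


def symbols_touching(path: str) -> list[str]:
    """Reverse query: this file broke; which framework symbols touch it?"""
    return [r["Symbol"] for r in surface if r["File"] == path]
```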
When making code changes:
- Add/remove a module: Update the `## Modules` table.
- Change what a symbol reads/writes: Update the `## Project Surface` table.
- Add/remove a module dependency: Update the `## Dependencies` table.
- New sub-package: Create a `_MANIFEST.md` in it and link from the parent's `## Sub-Modules`.
- Install: `uv sync`
- Activate: `source .venv/bin/activate`
- Test: `pytest`
- Lint: `ruff check .`
- Format: `ruff format .`
- Correctness over cleverness: Prefer boring, readable solutions that are easy to maintain.
- Smallest change that works: Minimize blast radius; don't refactor adjacent code unless it meaningfully reduces risk or complexity.
- Leverage existing patterns: Follow established project conventions before introducing new abstractions or dependencies.
- Prove it works: "Seems right" is not done. Validate with tests/build/lint and/or a reliable manual repro.
- Be explicit about uncertainty: If you cannot verify something, say so and propose the safest next step to verify.
- Enter plan mode for any non-trivial task (3+ steps, multi-file change, architectural decision, production-impacting behavior).
- Include verification steps in the plan (not as an afterthought).
- If new information invalidates the plan: stop, update the plan, then continue.
- Write a crisp spec first when requirements are ambiguous (inputs/outputs, edge cases, success criteria).
- Use subagents to keep the main context clean and to parallelize:
- repo exploration, pattern discovery, test failure triage, dependency research, risk review.
- Give each subagent one focused objective and a concrete deliverable:
- "Find where X is implemented and list files + key functions" beats "look around."
- Merge subagent outputs into a short, actionable synthesis before coding.
- Prefer thin vertical slices over big-bang changes.
- Land work in small, verifiable increments:
- implement → test → verify → then expand.
- When feasible, keep changes behind:
- feature flags, config switches, or safe defaults.
- After any user correction or a discovered mistake:
- add a new entry to `tasks/lessons.md` capturing the failure mode, the detection signal, and a prevention rule.
- Review `tasks/lessons.md` at session start and before major refactors.
- Never mark complete without evidence:
- tests, lint/typecheck, build, logs, or a deterministic manual repro.
- Compare behavior baseline vs changed behavior when relevant.
- Ask: "Would a staff engineer approve this diff and the verification story?"
- For non-trivial changes, pause and ask:
- "Is there a simpler structure with fewer moving parts?"
- If the fix is hacky, rewrite it the elegant way if it does not expand scope materially.
- Do not over-engineer simple fixes; keep momentum and clarity.
- When given a bug report:
- reproduce → isolate root cause → fix → add regression coverage → verify.
- Do not offload debugging work to the user unless truly blocked.
- If blocked, ask for one missing detail with a recommended default and explain what changes based on the answer.
- Plan First
- Write a checklist to `tasks/todo.md` for any non-trivial work.
- Include "Verify" tasks explicitly (lint/tests/build/manual checks).
- Define Success
- Add acceptance criteria (what must be true when done).
- Track Progress
- Mark items complete as you go; keep one "in progress" item at a time.
- Checkpoint Notes
- Capture discoveries, decisions, and constraints as you learn them.
- Document Results
- Add a short "Results" section: what changed, where, how verified.
- Capture Lessons
- Update `tasks/lessons.md` after corrections or postmortems.
- Lead with outcome and impact, not process.
- Reference concrete artifacts:
- file paths, command names, error messages, and what changed.
- Avoid dumping large logs; summarize and point to where evidence lives.
When you must ask:
- Ask exactly one targeted question.
- Provide a recommended default.
- State what would change depending on the answer.
- If you inferred requirements, list them briefly.
- If you could not run verification, say why and how to verify.
- Always include:
- what you ran (tests/lint/build), and the outcome.
- If you didn't run something, give a minimal command list the user can run.
- Don't narrate every step.
- Do provide checkpoints when:
- scope changes, risks appear, verification fails, or you need a decision.
- Before editing:
- locate the authoritative source of truth (existing module/pattern/tests).
- Prefer small, local reads (targeted files) over scanning the whole repo.
- Maintain a short running "Working Notes" section in `tasks/todo.md`:
- key constraints, invariants, decisions, and discovered pitfalls.
- When context gets large:
- compress into a brief summary and discard raw noise.
- Prefer explicit names and direct control flow.
- Avoid clever meta-programming unless the project already uses it.
- Leave code easier to read than you found it.
- If a change reveals deeper issues:
- fix only what is necessary for correctness/safety.
- log follow-ups as TODOs/issues rather than expanding the current task.
If anything unexpected happens (test failures, build errors, behavior regressions):
- stop adding features
- preserve evidence (error output, repro steps)
- return to diagnosis and re-plan
- Reproduce reliably (test, script, or minimal steps).
- Localize the failure (which layer: UI, API, DB, network, build tooling).
- Reduce to a minimal failing case (smaller input, fewer steps).
- Fix root cause (not symptoms).
- Guard with regression coverage (test or invariant checks).
- Verify end-to-end for the original report.
- Prefer "safe default + warning" over partial behavior.
- Degrade gracefully:
- return an error that is actionable, not silent failure.
- Avoid broad refactors as "fixes."
- Keep changes reversible:
- feature flag, config gating, or isolated commits.
- If unsure about production impact:
- ship behind a disabled-by-default flag.
- Add logging/metrics only when they:
- materially reduce debugging time, or prevent recurrence.
- Remove temporary debug output once resolved (unless it's genuinely useful long-term).
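The reversibility guidance above can be sketched as a disabled-by-default flag; the `FLAG_` environment-variable convention and helper name are illustrative, not a framework API:

```python
import os


def flag_enabled(name: str, default: str = "false") -> bool:
    """Disabled-by-default feature flag read from the environment."""
    raw = os.environ.get(f"FLAG_{name.upper()}", default)
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def parse(record: str) -> str:
    # The risky new path ships dark; the safe default stays live.
    if flag_enabled("new_parser"):
        return record.strip().lower()  # hypothetical new behavior
    return record.strip()              # current behavior, unchanged
```

Rolling back is then a config change, not a revert.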
- Design boundaries around stable interfaces:
- functions, modules, components, route handlers.
- Prefer adding optional parameters over duplicating code paths.
- Keep error semantics consistent (throw vs return error vs empty result).
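A sketch of "optional parameters over duplicated code paths"; the function and its fields are illustrative:

```python
def render_report(rows: list[dict], *, include_totals: bool = False) -> list[str]:
    """One entry point; new behavior is opt-in instead of a cloned function."""
    lines = [f"{r['name']}: {r['count']}" for r in rows]
    if include_totals:  # callers that don't opt in see identical output
        lines.append(f"total: {sum(r['count'] for r in rows)}")
    return lines
```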
- Add the smallest test that would have caught the bug.
- Prefer:
- unit tests for pure logic,
- integration tests for DB/network boundaries,
- E2E only for critical user flows.
- Avoid brittle tests tied to incidental implementation details.
- Avoid suppressions (`any`, type-ignores) unless the project explicitly permits them and you have no alternative.
- Encode invariants where they belong:
- validation at boundaries, not scattered checks.
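An example of "the smallest test that would have caught the bug", using a hypothetical slug helper; the bug it pins is invented for illustration:

```python
def slugify(title: str) -> str:
    """Hypothetical helper: lowercase words joined with hyphens."""
    return "-".join(title.lower().split())


def test_slugify_keeps_trailing_digits():
    # Regression pin: suppose "Release 2" once rendered as "release-";
    # this single assertion is the smallest test that catches that bug.
    assert slugify("Release 2") == "release-2"
```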
- Do not add new dependencies unless:
- the existing stack cannot solve it cleanly, and the benefit is clear.
- Prefer standard library / existing utilities.
- Never introduce secret material into code, logs, or chat output.
- Treat user input as untrusted:
- validate, sanitize, and constrain.
- Prefer least privilege (especially for DB access and server-side actions).
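A minimal sketch of validating untrusted input once at the boundary instead of scattering checks; the action names and limits are illustrative:

```python
ALLOWED_ACTIONS = {"extract", "transform", "load"}


def parse_request(payload: dict) -> dict:
    """Validate and constrain untrusted input at the API boundary."""
    action = str(payload.get("action", "")).strip().lower()
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"unknown action: {action!r}")
    limit = int(payload.get("limit", 100))
    if not 1 <= limit <= 1000:  # constrain, don't trust
        raise ValueError("limit must be between 1 and 1000")
    return {"action": action, "limit": limit}
```

Code past this point can rely on the invariants instead of re-checking them.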
- Avoid premature optimization.
- Do fix:
- obvious N+1 patterns, accidental unbounded loops, repeated heavy computation.
- Measure when in doubt; don't guess.
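An illustrative shape of the N+1 fix, with an in-memory dict standing in for the database; the helper names are hypothetical:

```python
def fetch_owners_bulk(owner_ids: set[int], db: dict[int, str]) -> dict[int, str]:
    """One round trip for the whole set (stands in for WHERE id IN (...))."""
    return {oid: db[oid] for oid in owner_ids}


def attach_owners(records: list[dict], db: dict[int, str]) -> list[dict]:
    # One bulk query for the collection, instead of one query per record.
    owners = fetch_owners_bulk({r["owner_id"] for r in records}, db)
    return [{**r, "owner": owners[r["owner_id"]]} for r in records]
```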
- Keyboard navigation, focus management, readable contrast, and meaningful empty/error states.
- Prefer clear copy and predictable interactions over fancy effects.
| Layer | Tool | Level |
|---|---|---|
| CLI user output | `click.echo()` / `console.print()` | N/A |
| Operational flow events | `logger.info` | Start, complete, skip |
| Recoverable failures | `logger.warning` | Retries, fallbacks, degradation |
| Unrecoverable errors | `logger.error` | Failures that affect output |
| Debug diagnostics | `logger.debug` | Exception traces, internal state |
Rules:
- Business logic: `logger = logging.getLogger(__name__)`; never `print()`.
- CLI: `click.echo()` for user output, `logger.*` for operational logging.
- Silent `except: pass` must log at `debug` or `warning` (except in destructors and safety utilities).
- `print()` is permitted only in standalone scripts (`__main__`), docstring examples, and dedicated debug handlers.
- Keep commits atomic and describable; avoid "misc fixes" bundles.
- Don't rewrite history unless explicitly requested.
- Don't mix formatting-only changes with behavioral changes unless the repo standard requires it.
- Treat generated files carefully:
- only commit them if the project expects it.
A task is done when:
- Behavior matches acceptance criteria.
- Tests/lint/typecheck/build (as relevant) pass or you have a documented reason they were not run.
- Risky changes have a rollback/flag strategy (when applicable).
- The code follows existing conventions and is readable.
- A short verification story exists: "what changed + how we know it works."
- Restate goal + acceptance criteria
- Locate existing implementation / patterns
- Design: minimal approach + key decisions
- Implement smallest safe slice
- Add/adjust tests
- Run verification (lint/tests/build/manual repro)
- Summarize changes + verification story
- Record lessons (if any)
- Repro steps:
- Expected vs actual:
- Root cause:
- Fix:
- Regression coverage:
- Verification performed:
- Risk/rollback notes: