Citation-gated tax verification built on Claude. Four-layer knowledge base. Caught a $19,000 state income tax error that TurboTax defended as correct.
This agent verifies tax filings against a structured knowledge base — federal code, state statutes, IRS publications, and prior-year precedents — and refuses to return a verdict without a citation chain. Every claim traces to a source. No citation, no answer.
Filing software optimizes for completion, not accuracy. When it told me a calculation was correct, I didn't trust the confidence — I trusted the citation. The agent flagged a $19,000 state income tax discrepancy the software had defended. The discrepancy was real.
That's the gap this is built for: not replacing tax software, but pressure-testing it with a system that has to show its work.
knowledge_base/
├── federal/ # IRC sections, Treasury regulations
├── state/ # State statutes and DOR guidance
├── irs_pubs/ # IRS publications (Pub 17, 525, 590, etc.)
└── precedents/ # Prior rulings and edge-case notes
agent/
├── retriever.py # Four-layer KB query with ranked retrieval
├── verifier.py # Claude prompt chain — claim → citation → verdict
└── reporter.py # Structured output with source attribution
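A minimal sketch of what a ranked query across the four layers could look like. This is not the code in retriever.py: the `Passage` type, the `retrieve` function, and the term-overlap score are all hypothetical stand-ins, and a real retriever would likely use a stronger ranking method.

```python
from dataclasses import dataclass

# Layer names mirror the knowledge_base/ directories above.
LAYERS = ("federal", "state", "irs_pubs", "precedents")


@dataclass(frozen=True)
class Passage:
    layer: str   # one of LAYERS
    source: str  # citation label (hypothetical format)
    text: str


def score(query: str, passage: Passage) -> int:
    """Toy relevance score: number of distinct query terms found in the passage."""
    terms = set(query.lower().split())
    words = set(passage.text.lower().split())
    return len(terms & words)


def retrieve(query: str, kb: list[Passage], top_k: int = 3) -> list[Passage]:
    """Return up to top_k passages with a nonzero score, best match first."""
    hits = [(score(query, p), p) for p in kb]
    hits = [(s, p) for s, p in hits if s > 0]
    # sorted() is stable, so ties keep their original KB order.
    hits = sorted(hits, key=lambda sp: sp[0], reverse=True)
    return [p for _, p in hits[:top_k]]
```

Passages that share no terms with the query are dropped rather than returned with a zero score, so an empty result can be treated downstream as "nothing to cite".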
The verification loop: parse the filing input → retrieve relevant KB passages → prompt Claude to verify each claim → block any assertion that lacks a grounded citation → return a structured report.
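The gating step in that loop can be sketched roughly as follows. This is an illustration under assumptions, not the contents of verifier.py: `Verdict`, `gate`, and the `ask_model` callable (a stand-in for the Claude call, assumed to return a dict with `supported`, `citations`, and `reasoning` keys) are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Verdict:
    claim: str
    status: str  # "VERIFIED", "DISPUTED", or "BLOCKED"
    citations: list[str] = field(default_factory=list)
    reasoning: str = ""


def gate(
    claim: str,
    passages: dict[str, str],  # citation label -> retrieved passage text
    ask_model: Callable[[str, dict[str, str]], dict],  # hypothetical model wrapper
) -> Verdict:
    """Verify one claim and block any answer that lacks a grounded citation."""
    if not passages:
        return Verdict(claim, "BLOCKED", reasoning="No KB passages retrieved.")
    answer = ask_model(claim, passages)
    # Keep only citations that point at passages we actually retrieved.
    cited = [c for c in answer.get("citations", []) if c in passages]
    if not cited:
        return Verdict(claim, "BLOCKED", reasoning="Model cited no retrieved source.")
    status = "VERIFIED" if answer.get("supported") else "DISPUTED"
    return Verdict(claim, status, cited, answer.get("reasoning", ""))
```

The check runs outside the model: a citation only counts if its label matches a passage that was actually retrieved, so an invented reference is treated the same as no reference at all.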
# Clone and install
git clone https://github.com/mitwilli-create/tax-verification-agent
cd tax-verification-agent
pip install -r requirements.txt
# Set your Anthropic API key
export ANTHROPIC_API_KEY=your_key_here
# Run verification against a filing input
python agent/verifier.py --input examples/sample_filing.json
# Output: structured report with citations
python agent/reporter.py --input examples/sample_filing.json --output report.md
Requirements: Python 3.10+, Anthropic API access, filing data in JSON format (schema in examples/).
CLAIM: State income exclusion applied — $X excluded from AGI
STATUS: ⚠ DISPUTED
CITATION REQUIRED: Yes
KB MATCH: [state_statute_ref] — exclusion applies only if condition Y is met
VERDICT: Condition Y not present in filing. Exclusion not valid.
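If you want to post-process entries like the one above, a small parser is enough. This sketch assumes each entry is a block of `KEY: value` lines exactly as shown. `parse_entry` and `is_disputed` are hypothetical helpers, not part of reporter.py.

```python
def parse_entry(block: str) -> dict[str, str]:
    """Split a report entry into a dict keyed by field name (CLAIM, STATUS, ...)."""
    entry: dict[str, str] = {}
    for line in block.strip().splitlines():
        # Split on the first colon only, so values may contain colons.
        key, sep, value = line.partition(":")
        if sep:
            entry[key.strip()] = value.strip()
    return entry


def is_disputed(entry: dict[str, str]) -> bool:
    """True when the STATUS field marks the claim as disputed."""
    return "DISPUTED" in entry.get("STATUS", "")
```

Filtering a report down to the entries where `is_disputed` is true gives a short list of claims to review by hand.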
Applied AI architecture: citation-gated retrieval isn't a prompt trick. It's a constraint layer that forces the model to ground every output in a retrievable source. It's the same pattern I used building the Executive RAG pipeline at Google: the system's credibility comes from what it refuses to say without evidence.
Real-world validation: the $19,000 catch wasn't a benchmark. It was a live filing, with a confident piece of commercial software on one side and, on the other, a structured agent that disagreed and cited a source for the disagreement.
Production thinking on a personal project: four-layer KB, structured output, source attribution. Built the way I'd build it if it were going to production.
claude anthropic tax-verification rag citation-grounded applied-ai llm-agent python
Built by Mitchell Williams — GitHub · LinkedIn · thestorytellermitch.com