tax-code-pipeline 🏛️

A production-ready retrieval-augmented generation (RAG) pipeline for indexing and querying Title 26 of the U.S. Code (the Internal Revenue Code). The project focuses on three engineering challenges: handling hierarchical legal structure, maintaining metadata integrity, and making the system observable.

🏗 System Architecture

ETL & Ingestion: Custom Python-based parser for Title 26 XML. Implements cleaning logic to strip boilerplate while preserving statutory hierarchy.
Hierarchical Indexing: Replaces fixed-length windowing with chunking along statutory section boundaries, so each retrieved chunk keeps its legal context.
Vector Store: Qdrant (Dockerized) for metadata-filtered vector search.
Observability: Arize Phoenix (OpenTelemetry) integration for trace logging, latency tracking, and retrieval debugging.
Evaluation: Quantitative benchmarking via Ragas (Faithfulness, Answer Relevancy, and Context Precision).
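
To make the ingestion and hierarchical indexing steps concrete, here is a minimal sketch of section-based chunking using only the standard library. It is not the repository's parser: the tag names (`section`, `subsection`, `heading`, `text`), the `num` attribute, the `Chunk` dataclass, and the `chunk_sections` helper are all assumptions for illustration, and the sample body text is placeholder content rather than statutory language. The real Title 26 XML schema is likely richer (namespaces, deeper nesting).

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Simplified, hypothetical stand-in for Title 26 XML. Body text is placeholder only.
SAMPLE_XML = """
<title num="26">
  <section num="61">
    <heading>Gross income defined</heading>
    <subsection num="a">
      <heading>General definition</heading>
      <text>Placeholder body text
            for subsection (a).</text>
    </subsection>
    <subsection num="b">
      <heading>Cross references</heading>
      <text>Placeholder body text for subsection (b).</text>
    </subsection>
  </section>
</title>
"""


@dataclass
class Chunk:
    citation: str            # e.g. "26 U.S.C. 61(a)"
    heading_path: list[str]  # headings from section down to subsection
    text: str                # cleaned body text of the subsection


def chunk_sections(xml_text: str) -> list[Chunk]:
    """Emit one chunk per subsection, carrying its parent section's metadata."""
    root = ET.fromstring(xml_text.strip())
    title_num = root.get("num")
    chunks: list[Chunk] = []
    for section in root.iter("section"):
        sec_num = section.get("num")
        sec_heading = (section.findtext("heading") or "").strip()
        for sub in section.findall("subsection"):
            sub_heading = (sub.findtext("heading") or "").strip()
            # Collapse line breaks and runs of whitespace left over from the markup.
            body = " ".join((sub.findtext("text") or "").split())
            chunks.append(
                Chunk(
                    citation=f"{title_num} U.S.C. {sec_num}({sub.get('num')})",
                    heading_path=[sec_heading, sub_heading],
                    text=body,
                )
            )
    return chunks


chunks = chunk_sections(SAMPLE_XML)
for chunk in chunks:
    print(chunk.citation, "|", " > ".join(chunk.heading_path), "|", chunk.text)
```

Because each chunk stores its citation and heading path, a retrieved passage can be traced back to its place in the statute instead of appearing as an anonymous text window.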

🛠 Tech Stack

Python 3.10+
Infrastructure: Docker, Docker Compose
Vector DB: Qdrant
Observability: Arize Phoenix / OpenTelemetry
Orchestration: LlamaIndex (or LangChain)
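
As an illustration of metadata-filtered vector search, the sketch below builds a JSON request body in the shape accepted by Qdrant's REST points search endpoint. The payload key `section`, the helper name `build_search_request`, and the three-dimensional query vector are assumptions made for this example; the repository may use the Python client or an orchestration-layer wrapper instead.

```python
import json


def build_search_request(query_vector, section, limit=5):
    """Build a search body that restricts results to a single statutory section.

    Intended for POST /collections/<collection_name>/points/search.
    The payload field name "section" is hypothetical.
    """
    return {
        "vector": query_vector,
        # "must" conditions are ANDed together; "match" tests exact payload equality.
        "filter": {"must": [{"key": "section", "match": {"value": section}}]},
        "limit": limit,
        "with_payload": True,  # return stored metadata alongside each hit
    }


body = build_search_request([0.1, 0.2, 0.3], section="61", limit=3)
print(json.dumps(body, indent=2))
```

Filtering on payload fields at query time narrows the candidate set before similarity ranking, which is useful when a question names a specific section.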

📂 Repository Structure

.
├── src/
│   ├── ingestion/    # XML/HTML parsing & cleaning
│   ├── processing/   # Hierarchical chunking & embedding
│   ├── retrieval/    # Hybrid search & reranking logic
│   └── api/          # FastAPI entry points
├── infra/            # Docker Compose & DB configs
├── eval/             # Ragas evaluation scripts
└── data/             # Raw Title 26 sources (git-ignored)
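
The structure above mentions hybrid search but does not say how dense and keyword results are combined. One common option is reciprocal rank fusion (RRF), sketched here as an assumption rather than as the repository's method. The function name, the chunk IDs, and the constant `k=60` (a conventional default) are illustrative.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of IDs; each list contributes 1 / (k + rank) per ID."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID for deterministic output.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical chunk IDs returned by a dense retriever and a keyword retriever.
dense = ["26-61-a", "26-162-a", "26-63-b"]
sparse = ["26-61-a", "26-170-c", "26-162-a"]

fused = reciprocal_rank_fusion([dense, sparse])
print(fused)
```

RRF uses only rank positions, so it avoids having to normalize similarity scores that come from different retrievers on different scales.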
