A scalable, asynchronous, AI-powered document processing and question-answering system built with FastAPI, Python, and LLM frameworks. The service lets users upload structured and unstructured documents, run semantic search over them, and ask natural-language questions using embeddings and a vector database.
- Upload and process CSV, Excel, PDF, and JSON documents
- Clean and preprocess data using Pandas / NumPy
- Generate embeddings using LLM frameworks (LangChain)
- Store and retrieve vectors using a Vector Database (FAISS / Chroma)
- Ask natural language questions over uploaded documents
- Fully async REST APIs for high performance
- Clean, modular, and scalable backend architecture
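The text-chunking step in the features above can be sketched as a simple character-window splitter with overlap (the `chunk_text` helper below is a hypothetical illustration; in practice a LangChain splitter such as `RecursiveCharacterTextSplitter` would typically be used):

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping character windows ready for embedding."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    chunks = []
    step = chunk_size - overlap  # advance by chunk_size minus the overlap
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
    return chunks
```

Overlapping windows help preserve context that would otherwise be cut at chunk boundaries, at the cost of slightly more embeddings to store.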
Client
↓
FastAPI (Async REST APIs)
↓
Document Service
├── File Processors (CSV / Excel / PDF / JSON)
├── Data Cleaning (Pandas / NumPy)
├── Text Chunking
↓
Embedding Service (LangChain)
↓
Vector Store (FAISS / Chroma)
↓
LLM (Semantic Search & Q&A)
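The flow above (document → embedding → vector store → search) can be sketched end to end with in-memory stand-ins. `FakeEmbedder` and `InMemoryVectorStore` below are illustrative placeholders for the LangChain embedding model and FAISS/Chroma, not the service's actual classes:

```python
import math

class FakeEmbedder:
    """Stand-in for a LangChain embedding model: hashes characters into a small unit vector."""
    def embed(self, text: str, dim: int = 8) -> list[float]:
        vec = [0.0] * dim
        for i, ch in enumerate(text):
            vec[(i + ord(ch)) % dim] += 1.0
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        return [v / norm for v in vec]

class InMemoryVectorStore:
    """Stand-in for FAISS/Chroma: brute-force cosine-similarity search."""
    def __init__(self) -> None:
        self._items: list[tuple[list[float], str]] = []

    def add(self, vector: list[float], text: str) -> None:
        self._items.append((vector, text))

    def search(self, query: list[float], k: int = 1) -> list[str]:
        # Vectors are unit-normalized, so the dot product is the cosine similarity.
        scored = sorted(
            self._items,
            key=lambda item: -sum(a * b for a, b in zip(query, item[0])),
        )
        return [text for _, text in scored[:k]]

# Wire the pipeline: embed each chunk, store it, then query.
embedder = FakeEmbedder()
store = InMemoryVectorStore()
for chunk in ["invoices are due in 30 days", "refunds take 5 business days"]:
    store.add(embedder.embed(chunk), chunk)
results = store.search(embedder.embed("invoices are due in 30 days"), k=1)
```

In the real service, the retrieved chunks would then be passed to the LLM as context for answering the user's question.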
- Python 3.10+
- FastAPI
- Pydantic
- AsyncIO
- LangChain
- OpenAI / Vertex AI (pluggable)
- Pandas
- NumPy
- PostgreSQL / MySQL (metadata)
- FAISS / ChromaDB (vector storage)
- Docker
- Docker Compose
- Git
ai-document-intelligence/
├── app/
│ ├── api/ # API routes
│ ├── services/ # Business logic
│ ├── processors/ # File processors
│ ├── models/ # DB models
│ ├── schemas/ # API schemas
│ ├── core/ # Config & logging
│ ├── db/ # DB session
│ └── main.py # App entry point
├── embeddings/ # Vector DB storage
├── data/ # Sample documents
├── tests/ # Unit & API tests
├── Dockerfile
├── docker-compose.yml
└── README.md
POST /api/v1/documents/upload
Supports:
- CSV
- Excel
- PDF
- JSON
What happens internally:
- File is validated and parsed
- Data is cleaned using Pandas
- Text is chunked and embedded
- Embeddings are stored in vector DB
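The validate-and-clean steps above can be sketched for a CSV upload as follows. The function name and cleaning rules here are assumptions for illustration, not the service's actual code; it uses pandas as described in the feature list:

```python
import io

import pandas as pd

def process_csv_upload(raw_bytes: bytes) -> list[str]:
    """Validate, clean, and flatten a CSV upload into text rows ready for chunking."""
    # 1. Validate and parse the raw bytes as CSV.
    try:
        df = pd.read_csv(io.BytesIO(raw_bytes))
    except pd.errors.ParserError as exc:
        raise ValueError(f"Invalid CSV: {exc}") from exc
    # 2. Clean: drop fully-empty rows, trim whitespace in string columns.
    df = df.dropna(how="all")
    for col in df.select_dtypes(include="object"):
        df[col] = df[col].str.strip()
    # 3. Flatten each row into a "col: value" line for chunking and embedding.
    return [
        "; ".join(f"{col}: {val}" for col, val in row.items() if pd.notna(val))
        for _, row in df.iterrows()
    ]
```

Each returned line is a self-describing text record, so the downstream chunking and embedding steps can treat tabular and free-text documents uniformly.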