A FastAPI-based multi-tool agent system that supports document ingestion, vector search, intelligent query handling, and retrieval evaluation using OpenAI and ChromaDB.
This project demonstrates how to build a modular agent architecture with multiple embedding backends, tool definitions, and automated evaluation pipelines.
- Environment variables are managed using pydantic-settings.
- Configure your OpenAI key, model names, and other values in the
.envfile. - Settings are automatically loaded at runtime.
Two ingestion APIs allow you to store documents in ChromaDB using different embedding strategies:
-
Sentence Transformer Embeddings
- Uses ChromaDB’s default sentence transformer for embeddings.
- Stores the embeddings in a persistent client on disk.
-
OpenAI Embeddings
- Uses OpenAI’s embedding models for ingestion.
- Stored in a separate persistent client for comparison.
Ingestion Process:
- Document Loader: Uses
pymupdffor TXT/document parsing. - Chunking Strategy: Fixed size chunks of 100 tokens with 50 token overlap.
- VectorStore: Persistent client of chromaDB ensures embeddings are saved locally.
- Provides an API to handle user queries.
- Accepts structured input via a Pydantic model.
- Loads
bot.jsoncontaining:- System instructions / LLM prompt.
- Tool definitions.
Agent Workflow:
- Initializes the
agentobject usingAgentclass with the system prompt, tool definitions, and chat history. - The
run()method calls the OpenAI responses API with:- query
- LLM decides which tool to invoke in run method.
- The selected tool is dynamically executed and returns results.
- The agent run method recursively re-runs with updated chat history until a final output is generated.
- Provides an API to evaluate retrieval quality from both embedding approaches (Sentence Transformer vs OpenAI Embeddings).
- Measures:
- Best Model (via LLM-as-a-judge).
- Latency (time required to retrieve chunks).
- LLM outputs which embedding method is more suitable.
- FastAPI – Async Web framework
- ChromaDB – Vector database
- OpenAI – LLM & embeddings
- pydantic-settings – Settings management
- pymupdf – Document loading
- uv – Dependency management
- Docker – Containerization
After starting the server, visit Swagger UI (http://0.0.0.0:8000/docs) to explore the 4 APIs:
POST rag/chroma/ingest– Ingest the data with default sentence transformer.POST rag/chroma/ingest/openai– Ingest the data with OpenAI embeddings.POST rag/message-agent– Query via Message Agent.POST rag/evaluate– Outputs Best Model for retrieval and latency.
git clone https://github.com/sunilvepanjeri/multi-agent.git