A Retrieval-Augmented Generation (RAG) chatbot designed to answer questions about company policies from uploaded PDF or TXT documents. This project demonstrates the use of LangChain, an Ollama LLM and embeddings, and the Chroma vector database to provide contextual answers with source citations.
✅ Upload PDF or TXT files containing company policies.
✅ Semantic search with embeddings + vector database (Chroma).
✅ LLM-based answer generation using Ollama.
✅ Retrieves relevant document chunks and cites sources.
✅ Maintains conversation history within the session.
🌐 Interactive web UI built with Streamlit.
This project is designed to showcase skills in:
- Document ingestion and chunking.
- Embedding & semantic search for relevant content.
- Retrieval-Augmented Generation (RAG).
- Session-based conversation memory.
- Interactive, user-friendly UI for querying documents.
| Component | Tool / Library |
|---|---|
| LLM & Embeddings | Ollama |
| Vector Store | Chroma |
| Python Framework | LangChain |
| Web Interface | Streamlit |
| Document Loaders | PyPDFLoader (PDF), TextLoader (TXT) |
Start Ollama and pull the required models:

```shell
ollama serve
ollama pull nomic-embed-text
ollama pull llama3.2
```

Clone the repository and set up a virtual environment:

```shell
git clone https://github.com/MirazulHasan/COMPANY_POLICY_ASSISTANT_CHATBOT.git
cd company-policy-chatbot
python -m venv venv
source venv/bin/activate   # Linux/macOS
venv\Scripts\activate      # Windows
```

You can install all dependencies using pip:
```shell
pip install -r requirements.txt
```

Or manually:

```shell
pip install streamlit
pip install langchain
pip install langchain_community
pip install langchain_text_splitters
pip install langchain_chroma
pip install langchain_ollama
pip install chromadb
pip install pypdf
```

Python 3.11+ recommended.
Start Ollama (required for embeddings and chat):

```shell
ollama serve
```

Launch the app:

```shell
streamlit run policy_chatbot.py
```

Open your browser: http://localhost:8501
- Click "Upload Policy Docs" in the sidebar. Accepts PDF or TXT files.
- Use the chat input to ask anything like:
  - What is the company’s leave policy?
  - How can employees request remote work?
- Answers come only from uploaded docs.
- Click "View Sources" to see file + page + snippet.
- Conversation history is maintained within the session.
- Each message shows in the chat interface.
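The "View Sources" behavior above can be illustrated with a minimal stand-in for the vector-store lookup. This sketch is not the app's actual Chroma/Ollama code: it uses hand-written toy vectors and a plain cosine-similarity ranking, and the chunk fields (`source`, `page`, `snippet`) and sample filenames are illustrative assumptions that mirror what the UI surfaces.

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k(query_vec, chunks, k=4):
    # Rank stored chunks by similarity to the query vector,
    # as the vector store does for the top-4 retrieval step.
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c["vector"]), reverse=True)
    return ranked[:k]

# Toy chunks carrying the metadata shown under "View Sources".
chunks = [
    {"vector": [1.0, 0.0], "source": "leave_policy.pdf", "page": 2,
     "snippet": "Employees accrue 20 days of paid leave per year..."},
    {"vector": [0.0, 1.0], "source": "remote_work.txt", "page": 1,
     "snippet": "Remote work requests are submitted to the line manager..."},
]

for hit in top_k([0.9, 0.1], chunks, k=1):
    print(f'{hit["source"]} (p. {hit["page"]}): {hit["snippet"]}')
```

In the real app the query vector comes from the `nomic-embed-text` embedding model and the ranking happens inside Chroma; only the citation metadata flow is shown here.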
- Load Documents → Read PDFs/TXTs
- Split → Chunk into 1000-char segments with overlap
- Embed → Convert to vectors using `nomic-embed-text`
- Store → In Chroma vector DB
- Query → Retrieve top-4 relevant chunks
- Generate → LLM answers using only retrieved context
- Cite → Show source file + page + snippet
- Chat → Session memory preserved
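The Split step above can be sketched as a simple character-based splitter. This is a simplified stand-in for LangChain's text splitter, assuming plain fixed-size windows; the exact overlap value (200 here) is an illustrative choice, not taken from the app.

```python
def split_text(text, chunk_size=1000, overlap=200):
    # Produce fixed-size character chunks; each chunk repeats the
    # last `overlap` characters of the previous one so that context
    # spanning a boundary is not lost.
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

parts = split_text("x" * 2500, chunk_size=1000, overlap=200)
```

With a 2500-character input this yields chunks starting at offsets 0, 800, 1600, and 2400, so adjacent chunks share a 200-character overlap.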
- `policy_chatbot.py` – Main Streamlit + RAG app
- `requirements.txt` – Python dependencies
- Ensure `nomic-embed-text` and `llama3.2` are pulled via Ollama.
- Session memory only (in-browser). For persistence: add SQLite or JSON logging.
- Fully local & private — no data leaves your machine.
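The JSON-logging option mentioned in the notes could look like the sketch below. The file name `chat_history.json` and the message shape (`role`/`content` dicts, as Streamlit chat apps commonly use) are assumptions for illustration, not part of the project.

```python
import json
from pathlib import Path

HISTORY_FILE = Path("chat_history.json")  # hypothetical location

def save_history(messages, path=HISTORY_FILE):
    # Persist the session's chat messages as pretty-printed JSON.
    path.write_text(json.dumps(messages, indent=2), encoding="utf-8")

def load_history(path=HISTORY_FILE):
    # Reload previous messages, or start a fresh history if none exist.
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    return []

save_history([{"role": "user", "content": "What is the leave policy?"}])
restored = load_history()
```

On startup the app could seed its session state from `load_history()` and call `save_history()` after each exchange; everything stays on disk locally, in keeping with the privacy note above.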
© 2025 Md. Mirazul Hasan
All Rights Reserved.
For educational and internal use.