A chatbot system for school-related queries, built on the Mistral LLM and the Milvus vector database. The system processes PDF documents (FAQs, handbooks, etc.), chunks them intelligently, and returns accurate responses to student queries. It implements a RAG (Retrieval-Augmented Generation) architecture.
- PDF document processing and intelligent text chunking
- Vector similarity search using Milvus
- Integration with Mistral LLM for natural language understanding
- Specialized handling of FAQ-style documents
- Support for multiple document types and formats
- Metadata extraction and categorization
- Question-Answer pair extraction from documents
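For FAQ-style documents, question-answer pair extraction can start with a simple pattern match. The sketch below is illustrative only: the `Q:`/`A:` layout and the `extract_qa_pairs` helper are assumptions, not the project's actual API.

```python
import re

def extract_qa_pairs(text: str) -> list[tuple[str, str]]:
    """Extract (question, answer) pairs from FAQ-style text.

    Assumes questions start with "Q:" and answers with "A:", a common
    FAQ layout; real documents may need additional patterns.
    """
    pattern = re.compile(r"Q:\s*(.+?)\s*A:\s*(.+?)(?=\nQ:|\Z)", re.DOTALL)
    return [(q.strip(), a.strip()) for q, a in pattern.findall(text)]

faq = """Q: When does enrollment open?
A: Enrollment opens on August 1st.
Q: Where is the admissions office?
A: In the main building, room 101."""

pairs = extract_qa_pairs(faq)
```

Each extracted pair can then be chunked and embedded as a single unit, which keeps the answer attached to its question at retrieval time.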
- Python 3.x
- Milvus server running locally or remotely
- Sufficient storage for model weights and document embeddings
- GPU recommended for better performance
- Clone the repository:

  ```bash
  git clone https://github.com/yourusername/school_chatbot.git
  cd school_chatbot
  ```

- Install the required packages:

  ```bash
  pip install -e .
  ```

- Set up Milvus:
  - Follow the official Milvus installation guide
  - Start the Milvus server
- Configure the application:
  - Copy `config/config.yaml.example` to `config/config.yaml`
  - Update the configuration values as needed:
    - Milvus connection settings
    - Model paths and parameters
    - Processing directories
    - API and GUI settings
```
school_chatbot/
├── src/
│   ├── data_processing/   # PDF processing and chunking
│   ├── db/                # Milvus client and database operations
│   ├── llm/               # Mistral model integration
│   └── utils/             # Utility functions
├── tests/                 # Test files and test data
├── config/                # Configuration files
└── setup.py               # Package setup file
```
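The `db/` layer delegates similarity search to Milvus; the ranking it performs can be illustrated with a pure-Python cosine-similarity sketch. The 3-dimensional vectors and document IDs below are made up for illustration (real embeddings are 384-dimensional per the config).

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec: list[float], stored: list[tuple[str, list[float]]], k: int = 2):
    """Rank stored (doc_id, vector) pairs by similarity to the query."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in stored]
    return sorted(scored, key=lambda s: s[1], reverse=True)[:k]

# Hypothetical embeddings for illustration only
stored = [
    ("faq_1", [0.9, 0.1, 0.0]),
    ("handbook_2", [0.0, 1.0, 0.2]),
    ("faq_3", [0.8, 0.2, 0.1]),
]
results = top_k([1.0, 0.0, 0.0], stored)
```

Milvus does the same kind of ranking at scale with approximate-nearest-neighbor indexes instead of a brute-force scan.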
- Process PDF documents:

  ```python
  from src.data_processing.pdf_processor import PDFProcessor

  processor = PDFProcessor(input_dir="data/raw", output_dir="data/processed")
  processor.process_directory()
  ```

- Load processed documents into Milvus:

  ```python
  from src.data_loading.data_loader import DataLoader

  loader = DataLoader()
  loader.load_directory("data/processed")
  ```

- Query the system:
  ```python
  from src.llm.mistral_client import MistralClient
  from src.db.milvus_client import MilvusClient

  # Initialize clients
  mistral = MistralClient()
  milvus = MilvusClient()

  # Search for relevant context (query_embedding is the user's query
  # encoded with the configured embedding model)
  results = milvus.search(query_embedding)

  # Generate a response using Mistral
  response = mistral.generate_response(query, context=results)
  ```

The test suite includes:
- PDF processing tests
- Text chunking tests
- Milvus integration tests
- Data loading tests
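Chunking behavior is worth pinning down with explicit assertions. Below is a self-contained sketch in the pytest style; the `chunk_text` helper here is illustrative, not the project's actual chunker.

```python
def chunk_text(text: str, max_chars: int = 100, overlap: int = 20) -> list[str]:
    """Split text into fixed-size character windows with overlap (illustrative)."""
    if max_chars <= overlap:
        raise ValueError("max_chars must exceed overlap")
    step = max_chars - overlap
    return [text[i:i + max_chars] for i in range(0, len(text), step)]

def test_chunk_text_overlap():
    text = "x" * 250
    chunks = chunk_text(text, max_chars=100, overlap=20)
    assert len(chunks) == 4                   # windows start at 0, 80, 160, 240
    assert all(len(c) <= 100 for c in chunks)
    assert chunks[0][-20:] == chunks[1][:20]  # consecutive chunks overlap
```

Tests like this catch off-by-one regressions in chunk boundaries, which otherwise surface only as degraded retrieval quality.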
Key configuration options in `config.yaml`:

```yaml
milvus:
  host: "localhost"
  port: 19530
  collection_name: "school_docs"

model:
  name: "Mistral-9B-Instruct"
  path: "/path/to/model/weights"
  context_length: 4096

embedding:
  model_name: "all-MiniLM-L6-v2"
  dimension: 384
  batch_size: 32
```
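Inside the application these values would typically be read with PyYAML's `yaml.safe_load`; the stdlib-only sketch below skips the file I/O and shows one way the resulting dict could map onto typed settings (all class and variable names are illustrative, not the project's actual code).

```python
from dataclasses import dataclass

@dataclass
class MilvusConfig:
    host: str = "localhost"
    port: int = 19530
    collection_name: str = "school_docs"

@dataclass
class EmbeddingConfig:
    model_name: str = "all-MiniLM-L6-v2"
    dimension: int = 384
    batch_size: int = 32

# What yaml.safe_load(...) might return for the `milvus` section,
# hard-coded here to keep the sketch self-contained
raw = {"host": "localhost", "port": 19530, "collection_name": "school_docs"}

milvus_cfg = MilvusConfig(**raw)
embedding_cfg = EmbeddingConfig()
```

Dataclasses give typo-safe attribute access (`milvus_cfg.port`) and sensible defaults, so a missing key fails loudly at startup rather than deep inside a request.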