Emre-Dinc/DBSGPT_DEMO

School Knowledge Base Chatbot

A chatbot system that answers school-related queries using the Mistral LLM and the Milvus vector database. The system processes PDF documents (such as FAQs and handbooks), splits them into chunks, and retrieves the most relevant chunks to ground its responses to student queries. It implements a RAG (Retrieval-Augmented Generation) architecture.
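To make the retrieve-then-generate idea concrete, here is a minimal sketch of how retrieved chunks might be assembled into a prompt for the LLM. The helper name `build_prompt`, the prompt wording, and the character budget are illustrative assumptions, not the repository's actual implementation.

```python
from __future__ import annotations


def build_prompt(question: str, chunks: list[str], max_chars: int = 2000) -> str:
    """Assemble a RAG prompt from retrieved chunks (hypothetical helper)."""
    context_parts = []
    used = 0
    for chunk in chunks:
        # Stop adding context once the assumed character budget is spent.
        if used + len(chunk) > max_chars:
            break
        context_parts.append(chunk)
        used += len(chunk)
    context = "\n---\n".join(context_parts)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


prompt = build_prompt(
    "When does term start?",
    ["Term starts in September.", "Exams are held in June."],
)
```

Keeping the context inside a fixed budget is one simple way to stay within the model's context length; a real system would more likely count tokens than characters.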

Features

  • PDF document processing and intelligent text chunking
  • Vector similarity search using Milvus
  • Integration with Mistral LLM for natural language understanding
  • Specialized handling of FAQ-style documents
  • Support for multiple document types and formats
  • Metadata extraction and categorization
  • Question-Answer pair extraction from documents
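
As an illustration of question-answer pair extraction from FAQ-style text, the sketch below uses a regular expression. It assumes a layout in which each question starts with `Q:` and each answer with `A:`; the function name `extract_qa_pairs` and that layout are assumptions made for the example, and the project's own extractor may work differently.

```python
from __future__ import annotations

import re

# Assumed FAQ layout: each question starts with "Q:" and each answer with "A:".
QA_PATTERN = re.compile(
    r"Q:\s*(?P<question>.+?)\s*A:\s*(?P<answer>.+?)(?=\s*Q:|\Z)",
    re.DOTALL,
)


def extract_qa_pairs(text: str) -> list[dict[str, str]]:
    """Return the question/answer pairs found in FAQ-style text."""
    pairs = []
    for match in QA_PATTERN.finditer(text):
        pairs.append({
            # Collapse line breaks that PDF extraction leaves inside a sentence.
            "question": " ".join(match["question"].split()),
            "answer": " ".join(match["answer"].split()),
        })
    return pairs


faq = """Q: When does the library open?
A: The library opens at 8 am
on weekdays.
Q: Where is the registrar's office?
A: In the main building."""

pairs = extract_qa_pairs(faq)
```

Storing each pair as its own record keeps a question and its answer together, instead of letting a generic chunker split them.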

Prerequisites

  • Python 3.x
  • Milvus server running locally or remotely
  • Sufficient storage for model weights and document embeddings
  • GPU recommended for better performance

Installation

  1. Clone the repository:
git clone https://github.com/yourusername/school_chatbot.git
cd school_chatbot
  2. Install the required packages:
pip install -e .
  3. Set up Milvus:
  4. Configure the application:
  • Copy config/config.yaml.example to config/config.yaml
  • Update the configuration values as needed:
    • Milvus connection settings
    • Model paths and parameters
    • Processing directories
    • API and GUI settings

πŸ“ Project Structure

school_chatbot/
├── src/
│   ├── data_processing/     # PDF processing and chunking
│   ├── db/                  # Milvus client and database operations
│   ├── llm/                 # Mistral model integration
│   └── utils/               # Utility functions
├── tests/                   # Test files and test data
├── config/                  # Configuration files
└── setup.py                 # Package setup file

🔧 Usage

  1. Process PDF documents:
from src.data_processing.pdf_processor import PDFProcessor

processor = PDFProcessor(input_dir="data/raw", output_dir="data/processed")
processor.process_directory()
  2. Load processed documents into Milvus:
from src.data_loading.data_loader import DataLoader

loader = DataLoader()
loader.load_directory("data/processed")
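  3. Query the system:
from src.llm.mistral_client import MistralClient
from src.db.milvus_client import MilvusClient

# Initialize clients
mistral = MistralClient()
milvus = MilvusClient()

# `query` is the user's question as a string; `query_embedding` is its vector,
# computed beforehand with the configured embedding model (not shown here).
# Search for relevant context
results = milvus.search(query_embedding)

# Generate a response with Mistral, grounded in the retrieved context
response = mistral.generate_response(query, context=results)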
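
The `milvus.search(query_embedding)` call hands similarity search over to Milvus. To show what such a search computes, here is a brute-force stand-in in plain Python that ranks documents by cosine similarity. The function names and the toy 3-dimensional vectors are made up for the example; Milvus uses indexes rather than a linear scan, and the metric it applies depends on how the collection is configured.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_embedding, documents, k=2):
    """Return the texts of the k documents most similar to the query.

    `documents` is a list of (text, embedding) pairs.
    """
    scored = [
        (cosine_similarity(query_embedding, embedding), text)
        for text, embedding in documents
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


# Toy data: real embeddings would have hundreds of dimensions.
documents = [
    ("Library opening hours", [0.9, 0.1, 0.0]),
    ("Cafeteria menu", [0.0, 0.2, 0.9]),
    ("Library borrowing rules", [0.8, 0.3, 0.1]),
]
results = top_k([1.0, 0.0, 0.0], documents, k=2)
```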

The test suite includes:

  • PDF processing tests
  • Text chunking tests
  • Milvus integration tests
  • Data loading tests
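
The repository's tests are not reproduced here. As an illustration of what a text chunking test might check, the sketch below defines a toy sentence-based chunker (`chunk_text`, a hypothetical stand-in for the project's chunking code) and a pytest-style test asserting that chunks respect the size limit and that no text is lost.

```python
from __future__ import annotations

import re


def chunk_text(text: str, max_chars: int = 200) -> list[str]:
    """Toy chunker: pack whole sentences into chunks of at most max_chars.

    A single sentence longer than max_chars is kept whole in its own chunk.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)  # current chunk is full; start a new one
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def test_chunks_respect_size_and_keep_all_text():
    text = "Classes start at 8 am. Lunch is at noon. The library closes at 6 pm."
    chunks = chunk_text(text, max_chars=45)
    # No chunk exceeds the limit (every sentence here is shorter than it).
    assert all(len(chunk) <= 45 for chunk in chunks)
    # Joining the chunks restores the original text, so nothing was dropped.
    assert " ".join(chunks) == text
```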

βš™οΈ Configuration

Key configuration options in config.yaml:

milvus:
  host: "localhost"
  port: 19530
  collection_name: "school_docs"

model:
  name: "Mistral-7B-Instruct"
  path: "/path/to/model/weights"
  context_length: 4096

embedding:
  model_name: "all-MiniLM-L6-v2"
  dimension: 384
  batch_size: 32
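
The `dimension` value has to match the size of the vectors the embedding model produces, since a vector collection is created with a fixed dimension. The sketch below shows one way to catch a mismatch early. It works on a plain dict that mirrors the `embedding` section above so that it runs without a YAML parser; in practice the dict would come from loading `config.yaml`. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EmbeddingConfig:
    model_name: str
    dimension: int
    batch_size: int


# Known output sizes; all-MiniLM-L6-v2 produces 384-dimensional vectors.
KNOWN_DIMENSIONS = {"all-MiniLM-L6-v2": 384}


def load_embedding_config(raw: dict) -> EmbeddingConfig:
    """Build typed settings from the `embedding` section and validate them."""
    cfg = EmbeddingConfig(**raw["embedding"])
    expected = KNOWN_DIMENSIONS.get(cfg.model_name)
    if expected is not None and cfg.dimension != expected:
        raise ValueError(
            f"{cfg.model_name} produces {expected}-d vectors, "
            f"but dimension is set to {cfg.dimension}"
        )
    if cfg.batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return cfg


raw_config = {
    "embedding": {
        "model_name": "all-MiniLM-L6-v2",
        "dimension": 384,
        "batch_size": 32,
    }
}
config = load_embedding_config(raw_config)
```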
