
LectureHub Chatbot - Complete Documentation

A comprehensive guide to the LectureHub Chatbot, a Streamlit-based RAG (Retrieval-Augmented Generation) chatbot for educational content.

Table of Contents

  1. Overview
  2. Architecture
  3. Installation
  4. Configuration
  5. Usage
  6. Development
  7. Testing
  8. Deployment
  9. Troubleshooting

Overview

The LectureHub Chatbot is a RAG application that lets students and educators query educational content in natural language. It combines:

  • Retrieval-Augmented Generation (RAG) for accurate, context-aware responses
  • PostgreSQL with pgvector for efficient vector storage and similarity search
  • Google Gemini LLM for natural language processing (with mock fallback)
  • Streamlit for an intuitive web interface
  • Modular architecture for maintainability and extensibility

Key Features

  • Multi-format Document Support: PDF, Markdown, and Python code
  • Smart Question Handling: Accepts all questions without strict relevance filtering
  • Vector Similarity Search: Fast and accurate document retrieval
  • Docker Integration: Easy setup with containerized database
  • Modular Design: Clean, maintainable codebase
  • Mock LLM Support: Works without API key for testing
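
The vector similarity search listed above can be illustrated with a minimal, dependency-free sketch. It ranks stored chunk embeddings by cosine similarity to a query embedding. In the application this ranking is delegated to pgvector, so the function names `cosine_similarity` and `top_k` and the toy three-dimensional vectors are assumptions made only for illustration.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def top_k(query: Sequence[float],
          chunks: List[Tuple[str, Sequence[float]]],
          k: int = 2) -> List[Tuple[str, float]]:
    """Rank (text, embedding) pairs by similarity to the query, best first."""
    scored = [(text, cosine_similarity(query, emb)) for text, emb in chunks]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy embeddings (hypothetical values, not produced by a real model).
chunks = [
    ("Caesar cipher shifts letters", [1.0, 0.0, 0.0]),
    ("Streamlit renders the UI", [0.0, 1.0, 0.0]),
    ("Shift ciphers and modular arithmetic", [0.9, 0.1, 0.0]),
]
results = top_k([1.0, 0.0, 0.0], chunks, k=2)
```

Here `results` holds the two chunks whose embeddings point in nearly the same direction as the query, which is the property a retriever relies on when choosing context for the LLM.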

Architecture

Project Structure

lecturehub-chatbot/
├── src/                          # Main source code
│   ├── core/                     # Core application components
│   │   ├── config.py            # Configuration constants
│   │   ├── main.py              # Main entry point
│   │   ├── chatbot_logic.py     # Core chatbot logic
│   │   └── rag_chatbot.py       # Main orchestrator
│   ├── database/                 # Database operations
│   │   ├── database.py          # PostgreSQL and pgvector
│   │   └── vectorstore.py       # Vector store management
│   ├── llm/                      # LLM and language processing
│   │   ├── llm_chain.py         # LLM chain management
│   │   ├── mock_llm_chain.py    # Mock LLM for testing
│   │   └── keyword_extractor.py # Keyword extraction
│   ├── ui/                       # User interface
│   │   ├── chat_manager.py      # Chat management
│   │   └── ui_components.py     # UI components
│   └── utils/                    # Utilities
│       └── document_loader.py   # Document processing
├── tests/                        # Test files
├── docs/                         # Documentation
├── config/                       # Configuration files
├── docker/                       # Docker setup
└── data/                         # Data files

Core Components

1. RAGChatbot (src/core/rag_chatbot.py)

The main orchestrator that coordinates all components:

  • Manages application lifecycle
  • Coordinates database, LLM, and UI components
  • Handles error scenarios gracefully

2. Database Management (src/database/)

  • DatabaseManager: PostgreSQL connection and pgvector operations
  • VectorStoreManager: Vector store operations and embedding management

3. LLM Processing (src/llm/)

  • LLMChainBuilder: Creates and manages QA chains
  • KeywordExtractor: Extracts keywords for relevance checking
  • MockLLMChainBuilder: Provides mock responses for testing
  • SmartRelevanceChecker: Determines if questions are relevant (currently disabled)
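
The mock fallback mentioned above can be sketched as a simple selection step: use a real chain when an API key is present, otherwise fall back to a mock. The classes `GeminiChain` and `MockChain` and the function `select_chain` below are hypothetical stand-ins, not the project's `LLMChainBuilder` or `MockLLMChainBuilder`, whose actual interfaces are not shown in this document.

```python
import os
from typing import Mapping, Optional, Union


class MockChain:
    """Stand-in for a mock chain: returns a canned answer, needs no API key."""

    def answer(self, question: str) -> str:
        return f"[mock] No LLM configured; received: {question}"


class GeminiChain:
    """Stand-in for a real chain; the actual API call is deliberately omitted."""

    def __init__(self, api_key: str) -> None:
        self.api_key = api_key

    def answer(self, question: str) -> str:
        raise NotImplementedError("call the LLM provider here")


def select_chain(
    env: Optional[Mapping[str, str]] = None,
) -> Union[GeminiChain, MockChain]:
    """Return a real chain when GEMINI_API_KEY is set, otherwise the mock."""
    env = os.environ if env is None else env
    api_key = env.get("GEMINI_API_KEY", "").strip()
    return GeminiChain(api_key) if api_key else MockChain()


chain = select_chain({})  # no key provided, so the mock is selected
reply = chain.answer("What is a Caesar cipher?")
```

Passing the environment as a parameter keeps the selection logic testable without touching the real process environment.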

4. User Interface (src/ui/)

  • ChatManager: Manages chat history and interactions
  • SidebarManager: Handles configuration UI
  • MainUIManager: Manages main UI components

5. Utilities (src/utils/)

  • DocumentLoader: Loads and processes documents
  • DocumentProcessor: Handles text chunking and metadata
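
To make the text chunking step concrete, here is a minimal sketch of fixed-size chunking with overlap. It is an illustration under assumptions: the function name `chunk_text`, the character-based windows, and the default sizes are hypothetical, and the project's `DocumentProcessor` may split text differently.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> List[str]:
    """Split text into fixed-size character windows that overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


pieces = chunk_text("abcdefghij", chunk_size=4, overlap=2)
# pieces == ["abcd", "cdef", "efgh", "ghij"]
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of storing some text twice.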

Installation

Prerequisites

  • Python 3.8 or higher
  • Docker and Docker Compose (for database)
  • Google Gemini API key (optional - mock LLM available)

Step-by-Step Installation

  1. Clone the repository:

    git clone <repository-url>
    cd lecturehub-chatbot
  2. Install Python dependencies:

    pip install -r requirements.txt
  3. Start the database (make the setup script executable first):

    chmod +x ./docker/docker-setup.sh
    ./docker/docker-setup.sh start
  4. Configure environment (optional):

    cp config/env.template .env
    # Edit .env with your GEMINI_API_KEY (optional)
  5. Test the setup:

    ./docker/docker-setup.sh test
Configuration

Environment Variables

The application uses environment variables for configuration. Key variables include:

# Database Configuration
DB_HOST=localhost
DB_PORT=5432
DB_NAME=embedding
DB_USER=root
DB_PASSWORD=root_password

# LLM Configuration (Optional)
GEMINI_API_KEY=your_api_key_here
GEMINI_MODEL=gemini-2.5-flash-lite

# Application Configuration
PROBLEM_ID=ma_de_001
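
As a rough illustration of how these variables could be read, the sketch below loads them into a typed object. It assumes the values listed above act as defaults when a variable is unset. The names `AppConfig` and `load_config` are hypothetical and are not taken from the project's `config.py`.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class AppConfig:
    db_host: str
    db_port: int
    db_name: str
    db_user: str
    db_password: str
    gemini_api_key: Optional[str]
    gemini_model: str
    problem_id: str


def load_config(env: Optional[Mapping[str, str]] = None) -> AppConfig:
    """Build an AppConfig from environment variables with fallback defaults."""
    env = os.environ if env is None else env
    return AppConfig(
        db_host=env.get("DB_HOST", "localhost"),
        db_port=int(env.get("DB_PORT", "5432")),
        db_name=env.get("DB_NAME", "embedding"),
        db_user=env.get("DB_USER", "root"),
        db_password=env.get("DB_PASSWORD", "root_password"),
        # An empty or missing key means "no real LLM configured".
        gemini_api_key=env.get("GEMINI_API_KEY") or None,
        gemini_model=env.get("GEMINI_MODEL", "gemini-2.5-flash-lite"),
        problem_id=env.get("PROBLEM_ID", "ma_de_001"),
    )


config = load_config({"DB_PORT": "6543"})
```

Converting `DB_PORT` to an integer at load time surfaces a malformed value immediately instead of at the first database connection.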

Configuration Files

  • config/env.template: Template for environment variables
  • config/example.env: Example configuration
  • docker/docker.env: Docker-specific configuration

Database Configuration

The application supports two database configuration methods:

  1. Individual Parameters:

    db = DatabaseManager(
        host="localhost",
        port="5432",
        database="embedding",
        user="root",
        password="root_password"
    )
  2. Connection String:

    db = DatabaseManager(
        connection_string="postgresql+psycopg://user:pass@host:5432/db"
    )
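
The two methods are related: individual parameters can be assembled into the connection-string form. The following sketch shows one way to do that. The helper `build_connection_string` is hypothetical, and how `DatabaseManager` combines its arguments internally is not documented here.

```python
from urllib.parse import quote_plus


def build_connection_string(host: str, port: str, database: str,
                            user: str, password: str) -> str:
    """Assemble a postgresql+psycopg URL, percent-escaping the credentials."""
    return (
        f"postgresql+psycopg://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )


url = build_connection_string(
    "localhost", "5432", "embedding", "root", "root_password"
)
# url == "postgresql+psycopg://root:root_password@localhost:5432/embedding"
```

Escaping matters when a password contains characters such as `@` or `/`, which would otherwise be parsed as part of the URL structure.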

Usage

Running the Application

# Development mode
python main.py

# Production mode
streamlit run main.py

Using the Chatbot

  1. Start the application and navigate to the web interface
  2. Configure settings in the sidebar:
    • Database connection parameters
    • Google Gemini API key (optional)
    • Application settings
  3. Ingest documents by clicking "Ingest dữ liệu" ("Ingest data")
  4. Ask questions about your educational content

Document Requirements

Place these files in the data/lectures/mmceasar2/ directory:

  • mmceasar2.pdf - Problem statement
  • mmceasar2.md - Lecture content
  • mmceasar2.py - Solution code
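
Before ingesting, it can help to confirm that all three files are in place. The check below is a small sketch using only the standard library. The function name `missing_documents` is hypothetical and not part of the project.

```python
from pathlib import Path
from typing import List

# File names taken from the list above.
REQUIRED_FILES = ("mmceasar2.pdf", "mmceasar2.md", "mmceasar2.py")


def missing_documents(lecture_dir: Path) -> List[str]:
    """Return the required file names that are absent from lecture_dir."""
    return [name for name in REQUIRED_FILES
            if not (lecture_dir / name).is_file()]


missing = missing_documents(Path("data/lectures/mmceasar2"))
if missing:
    print("Missing files:", ", ".join(missing))
```

Run it from the repository root so the relative path resolves to the expected directory.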

Chat Interface

The chatbot provides:

  • Real-time responses based on your documents
  • Source citations showing which documents were used
  • Flexible question handling - accepts all questions
  • Chat history for continued conversations

Development

Project Setup

  1. Install in development mode:

    pip install -e .
  2. Set up pre-commit hooks:

    pre-commit install

Adding New Features

1. New Document Types

Extend the DocumentLoader class in src/utils/document_loader.py:

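from typing import Any, List

def _load_new_format(self, file_path: str) -> List[Any]:
    """Load a new document type and return the parsed documents."""
    # Implementation for new document type
    raise NotImplementedError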
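
One common way to keep new formats easy to add is to dispatch on the file suffix. The sketch below illustrates that pattern under assumptions: `LOADERS`, `load_text`, and `load_document` are hypothetical names, the documents are represented as plain strings, and the real `DocumentLoader` may be organized differently.

```python
from pathlib import Path
from typing import Callable, Dict, List


def load_text(path: Path) -> List[str]:
    """Read a UTF-8 text file and return it as a single-document list."""
    return [path.read_text(encoding="utf-8")]


# Map each supported suffix to the function that loads it.
LOADERS: Dict[str, Callable[[Path], List[str]]] = {
    ".md": load_text,
    ".py": load_text,
}


def load_document(path: Path) -> List[str]:
    """Pick a loader from the file suffix, failing clearly on unknown types."""
    loader = LOADERS.get(path.suffix.lower())
    if loader is None:
        raise ValueError(f"Unsupported document type: {path.suffix!r}")
    return loader(path)


# Registering a new format is then a one-line change (".txt" is hypothetical).
LOADERS[".txt"] = load_text
```

With this layout, supporting another format means writing one loader function and adding one dictionary entry, without touching the dispatch logic.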

2. New UI Components

Add components to src/ui/ui_components.py:

class NewUIManager:
    def render_new_component(self):
        # Implementation
        pass

3. New LLM Models

Extend LLMChainBuilder in src/llm/llm_chain.py:

def build_new_chain(self, retriever) -> "NewChain":
    # Implementation for new chain type; NewChain is a placeholder return type
    pass

Code Style

  • Follow PEP 8 guidelines
  • Use type hints for all function parameters and return values
  • Write docstrings for all public functions and classes
  • Keep functions small and focused

Testing

Running Tests

# Run all tests
python -m pytest tests/

# Run specific test file
python -m pytest tests/test_components.py

# Run with coverage
python -m pytest tests/ --cov=src

Test Structure

  • tests/test_components.py: Tests for individual components
  • tests/test_database_config.py: Database configuration tests
  • tests/test_docker_db.py: Docker database integration tests
  • tests/test_relevance.py: Relevance checker tests

Writing Tests

def test_new_feature():
    """Test description."""
    # Arrange
    component = Component()
    
    # Act
    result = component.method()
    
    # Assert
    assert result == expected_value
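
The template above uses placeholders (`Component`, `expected_value`). For a concrete instance of the same Arrange/Act/Assert shape, here is a self-contained sketch. The helper `normalize_question` is hypothetical and exists only so the test has something to exercise.

```python
def normalize_question(question: str) -> str:
    """Collapse whitespace and lowercase a question (hypothetical helper)."""
    return " ".join(question.split()).lower()


def test_normalize_question_collapses_whitespace():
    """Extra spaces and mixed case should not change the normalized form."""
    # Arrange
    raw = "  What   is a Caesar   Cipher? "
    expected = "what is a caesar cipher?"

    # Act
    result = normalize_question(raw)

    # Assert
    assert result == expected
```

Saved in a file under tests/ whose name starts with `test_`, this is picked up by `python -m pytest tests/` through pytest's default discovery rules.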

Deployment

Docker Deployment

  1. Build the application:

    docker build -t lecturehub-chatbot .
  2. Run with Docker Compose:

    docker-compose -f docker/docker-compose.yml up -d

Production Considerations

  • Environment Variables: Use production environment variables
  • Database: Use production PostgreSQL instance
  • Security: Implement proper authentication and authorization
  • Monitoring: Add logging and monitoring
  • Scaling: Consider load balancing for multiple users

Troubleshooting

Common Issues

1. Database Connection Issues

# Check database status
./docker/docker-setup.sh status

# View database logs
./docker/docker-setup.sh logs

# Restart database
./docker/docker-setup.sh restart

2. LLM API Issues

  • Verify GEMINI_API_KEY is set correctly (optional)
  • Check API quota and limits
  • Ensure network connectivity
  • Tip: Application works with mock LLM without API key

3. Document Loading Issues

  • Verify required files exist in data/lectures/mmceasar2/
  • Check file permissions
  • Ensure file formats are supported

4. Import Errors

  • Verify Python path includes src/ directory
  • Check that all dependencies are installed
  • Ensure __init__.py files exist in all packages
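
The last check can be automated. The sketch below walks a source tree and reports directories that contain Python modules but no `__init__.py`. The function name `packages_missing_init` is hypothetical, and the sketch assumes the project relies on regular (non-namespace) packages.

```python
from pathlib import Path
from typing import List


def packages_missing_init(src_root: Path) -> List[Path]:
    """Return directories under src_root with .py files but no __init__.py."""
    missing = []
    for directory in sorted(p for p in src_root.rglob("*") if p.is_dir()):
        has_modules = any(directory.glob("*.py"))
        if has_modules and not (directory / "__init__.py").exists():
            missing.append(directory)
    return missing


for path in packages_missing_init(Path("src")):
    print("No __init__.py in", path)
```

Run it from the repository root. An empty result means every directory under src/ that holds modules is importable as a package.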

Debug Mode

Enable debug logging by setting:

export LOG_LEVEL=DEBUG
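
How the application consumes `LOG_LEVEL` is not shown in this document. As an assumption, a typical pattern with the standard `logging` module looks like the sketch below, where `configure_logging` is a hypothetical helper name.

```python
import logging
import os


def configure_logging(default: str = "INFO") -> int:
    """Configure root logging from LOG_LEVEL and return the numeric level."""
    name = os.environ.get("LOG_LEVEL", default).upper()
    level = getattr(logging, name, None)
    if not isinstance(level, int):
        level = logging.INFO  # fall back when LOG_LEVEL is not a level name
    logging.basicConfig(
        level=level,
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
    )
    # basicConfig is a no-op if handlers already exist, so set the level too.
    logging.getLogger().setLevel(level)
    return level
```

Calling `configure_logging()` once at startup, before other modules emit log records, keeps the level consistent across the application.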

Getting Help

  1. Check the logs for error messages
  2. Verify configuration settings
  3. Test individual components
  4. Consult the project documentation

Recent Changes

Latest Updates

  • Migrated from psycopg2 to psycopg: Updated all database connections
  • Disabled strict relevance checking: Chatbot now accepts all questions
  • Added mock LLM support: Works without API key for testing
  • Improved error handling: Better fallback mechanisms
  • Updated documentation: Comprehensive guides and examples

Key Improvements

  • No more "Xin lỗi, tôi chỉ hỗ trợ hỏi về bài giảng này thôi." ("Sorry, I only support questions about this lecture.") responses
  • Works without Google API key
  • Better database compatibility
  • Enhanced user experience

Contributing

Development Workflow

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests for new functionality
  5. Update documentation
  6. Submit a pull request

Code Review Process

  • All code must pass tests
  • Documentation must be updated
  • Code style must follow project guidelines
  • Security considerations must be addressed

License

This project is licensed under the MIT License. See the LICENSE file for details.

Support

For support and questions:

  • Check the documentation
  • Review the troubleshooting section
  • Open an issue on GitHub
  • Contact the development team

Happy coding with LectureHub Chatbot!