A comprehensive guide to the LectureHub Chatbot, a Streamlit-based RAG (Retrieval-Augmented Generation) chatbot for educational content.
- Overview
- Architecture
- Installation
- Configuration
- Usage
- Development
- Testing
- Deployment
- Troubleshooting
The LectureHub Chatbot is a RAG application that helps students and educators interact with educational content. It combines:
- Retrieval-Augmented Generation (RAG) for accurate, context-aware responses
- PostgreSQL with pgvector for efficient vector storage and similarity search
- Google Gemini LLM for natural language processing (with mock fallback)
- Streamlit for an intuitive web interface
- Modular architecture for maintainability and extensibility
Key features:
- Multi-format document support: PDF, Markdown, and Python code
- Smart question handling: Accepts all questions without strict relevance filtering
- Vector similarity search: Fast and accurate document retrieval
- Docker integration: Easy setup with a containerized database
- Modular design: Clean, maintainable codebase
- Mock LLM support: Works without an API key for testing
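To make the vector similarity search step concrete, here is a minimal pure-Python sketch of ranking documents by cosine similarity. It is an illustration only: in the application this ranking is delegated to pgvector, and the function names and toy three-dimensional embeddings below are assumptions, not project code.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float],
    docs: List[Tuple[str, Sequence[float]]],
    k: int = 2,
) -> List[Tuple[str, float]]:
    """Return the k (doc_id, score) pairs most similar to the query, best first."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in docs]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Toy embeddings (hypothetical); real embeddings have hundreds of dimensions.
docs = [
    ("lecture.md", [1.0, 0.0, 0.0]),
    ("problem.pdf", [0.7, 0.7, 0.0]),
    ("solution.py", [0.0, 0.0, 1.0]),
]
results = top_k([1.0, 0.1, 0.0], docs, k=2)
print([doc_id for doc_id, _ in results])  # ['lecture.md', 'problem.pdf']
```

The retrieved chunks would then be passed to the LLM as context, which is the "retrieval-augmented" part of RAG.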
lecturehub-chatbot/
├── src/                          # Main source code
│   ├── core/                     # Core application components
│   │   ├── config.py             # Configuration constants
│   │   ├── main.py               # Main entry point
│   │   ├── chatbot_logic.py      # Core chatbot logic
│   │   └── rag_chatbot.py        # Main orchestrator
│   ├── database/                 # Database operations
│   │   ├── database.py           # PostgreSQL and pgvector
│   │   └── vectorstore.py        # Vector store management
│   ├── llm/                      # LLM and language processing
│   │   ├── llm_chain.py          # LLM chain management
│   │   ├── mock_llm_chain.py     # Mock LLM for testing
│   │   └── keyword_extractor.py  # Keyword extraction
│   ├── ui/                       # User interface
│   │   ├── chat_manager.py       # Chat management
│   │   └── ui_components.py      # UI components
│   └── utils/                    # Utilities
│       └── document_loader.py    # Document processing
├── tests/                        # Test files
├── docs/                         # Documentation
├── config/                       # Configuration files
├── docker/                       # Docker setup
└── data/                         # Data files
rag_chatbot.py is the main orchestrator that coordinates all components. It:
- Manages the application lifecycle
- Coordinates the database, LLM, and UI components
- Handles error scenarios gracefully
The supporting components are:
- DatabaseManager: PostgreSQL connection and pgvector operations
- VectorStoreManager: Vector store operations and embedding management
- LLMChainBuilder: Creates and manages QA chains
- KeywordExtractor: Extracts keywords for relevance checking
- MockLLMChainBuilder: Provides mock responses for testing
- SmartRelevanceChecker: Determines whether questions are relevant (currently disabled)
- ChatManager: Manages chat history and interactions
- SidebarManager: Handles the configuration UI
- MainUIManager: Manages the main UI components
- DocumentLoader: Loads and processes documents
- DocumentProcessor: Handles text chunking and metadata
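The text chunking that DocumentProcessor is described as handling can be pictured with a small sketch. This is not the project's implementation: the function name, the character-based splitting, the default sizes, and the metadata keys are all assumptions chosen for illustration.

```python
from typing import Dict, List


def chunk_text(
    text: str,
    source: str,
    chunk_size: int = 200,
    overlap: int = 50,
) -> List[Dict[str, object]]:
    """Split text into overlapping character windows, each tagged with metadata."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("chunk_size must be positive and overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    chunks: List[Dict[str, object]] = []
    for index, start in enumerate(range(0, len(text), step)):
        chunks.append(
            {
                "text": text[start:start + chunk_size],
                "metadata": {"source": source, "chunk": index, "start": start},
            }
        )
        # Stop once a window reaches the end, so no tail-only duplicate is emitted.
        if start + chunk_size >= len(text):
            break
    return chunks


for chunk in chunk_text("abcdefghij", source="example.md", chunk_size=4, overlap=2):
    print(chunk["metadata"]["chunk"], chunk["text"])
```

Overlap keeps a sentence that straddles a boundary visible in both neighboring chunks, at the cost of storing some text twice.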
- Python 3.8 or higher
- Docker and Docker Compose (for database)
- Google Gemini API key (optional - mock LLM available)
- Clone the repository:
git clone <repository-url>
cd lecturehub-chatbot
- Install Python dependencies:
pip install -r requirements.txt
- Start the database:
./docker/docker-setup.sh start
- Configure environment (optional):
cp env.template .env
# Edit .env with your GEMINI_API_KEY (optional)
- Test the setup:
chmod +x ./docker/docker-setup.sh
./docker/docker-setup.sh test
The application uses environment variables for configuration. Key variables include:
# Database Configuration
DB_HOST=localhost
DB_PORT=5432
DB_NAME=embedding
DB_USER=root
DB_PASSWORD=root_password
# LLM Configuration (Optional)
GEMINI_API_KEY=your_api_key_here
GEMINI_MODEL=gemini-2.5-flash-lite
# Application Configuration
PROBLEM_ID=ma_de_001
Configuration files:
- config/env.template: Template for environment variables
- config/example.env: Example configuration
- docker/docker.env: Docker-specific configuration
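As a rough sketch of how these variables might be read at startup, the snippet below collects them into a typed settings object. The Settings class and load_settings function are hypothetical names, and the defaults simply mirror the sample values above; the project's own config.py may differ.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Typed view of the environment variables listed above (hypothetical class)."""
    db_host: str
    db_port: int
    db_name: str
    db_user: str
    db_password: str
    gemini_api_key: Optional[str]
    gemini_model: str
    problem_id: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from a mapping, falling back to the sample values."""
    return Settings(
        db_host=env.get("DB_HOST", "localhost"),
        db_port=int(env.get("DB_PORT", "5432")),
        db_name=env.get("DB_NAME", "embedding"),
        db_user=env.get("DB_USER", "root"),
        db_password=env.get("DB_PASSWORD", "root_password"),
        # An empty or missing key is treated as "not configured".
        gemini_api_key=env.get("GEMINI_API_KEY") or None,
        gemini_model=env.get("GEMINI_MODEL", "gemini-2.5-flash-lite"),
        problem_id=env.get("PROBLEM_ID", "ma_de_001"),
    )


settings = load_settings({"DB_PORT": "6543"})
print(settings.db_host, settings.db_port, settings.gemini_api_key)
```

Passing the mapping as a parameter keeps the function easy to test without touching the real process environment.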
The application supports two database configuration methods:
- Individual parameters:
db = DatabaseManager(
    host="localhost",
    port="5432",
    database="embedding",
    user="root",
    password="root_password"
)
- Connection string:
db = DatabaseManager(
    connection_string="postgresql+psycopg://user:pass@host:5432/db"
)
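The two methods describe the same connection. The sketch below shows one way the individual parameters could be combined into a URL of the shape shown above; build_connection_string is a hypothetical helper, not part of DatabaseManager. User and password are percent-encoded so that characters such as "@" or "/" do not break the URL.

```python
from urllib.parse import quote_plus


def build_connection_string(host: str, port: str, database: str, user: str, password: str) -> str:
    """Assemble a postgresql+psycopg URL, escaping the credentials."""
    return (
        f"postgresql+psycopg://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )


print(build_connection_string("localhost", "5432", "embedding", "root", "root_password"))
# postgresql+psycopg://root:root_password@localhost:5432/embedding
```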
# Development mode
python main.py
# Production mode
streamlit run main.py
- Start the application and open the web interface
- Configure settings in the sidebar:
  - Database connection parameters
  - Google Gemini API key (optional)
  - Application settings
- Ingest documents by clicking "Ingest dữ liệu" ("Ingest data")
- Ask questions about your educational content
Place these files in the data/lectures/mmceasar2/ directory:
- mmceasar2.pdf: Problem statement
- mmceasar2.md: Lecture content
- mmceasar2.py: Solution code
The chatbot provides:
- Real-time responses based on your documents
- Source citations showing which documents were used
- Flexible question handling: accepts all questions
- Chat history for continued conversations
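To illustrate how source citations and chat history can fit together, here is a small sketch. ChatTurn and format_answer are hypothetical names used only for this example and are not taken from chat_manager.py.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ChatTurn:
    """One question/answer exchange plus the documents the answer drew on."""
    question: str
    answer: str
    sources: List[str] = field(default_factory=list)


def format_answer(turn: ChatTurn) -> str:
    """Append a de-duplicated source list to the answer, keeping first-seen order."""
    if not turn.sources:
        return turn.answer
    unique = list(dict.fromkeys(turn.sources))
    return f"{turn.answer}\n\nSources: {', '.join(unique)}"


# The history is just an ordered list of turns; placeholder text is used here.
history: List[ChatTurn] = []
history.append(
    ChatTurn(
        question="(user question)",
        answer="(generated answer)",
        sources=["mmceasar2.md", "mmceasar2.md", "mmceasar2.pdf"],
    )
)
print(format_answer(history[-1]))
```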
- Install in development mode:
pip install -e .
- Set up pre-commit hooks:
pre-commit install
To support a new document type, extend the DocumentLoader class in src/utils/document_loader.py:
from typing import Any, List

def _load_new_format(self, file_path: str) -> List[Any]:
    # Implementation for new document type
    pass
To add UI components, add them to src/ui/ui_components.py:
class NewUIManager:
    def render_new_component(self):
        # Implementation
        pass
To add a new chain type, extend LLMChainBuilder in src/llm/llm_chain.py:
def build_new_chain(self, retriever) -> NewChain:
    # Implementation for new chain type (NewChain is a placeholder type)
    pass
Code style guidelines:
- Follow PEP 8 guidelines
- Use type hints for all function parameters and return values
- Write docstrings for all public functions and classes
- Keep functions small and focused
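As an example of these guidelines applied to the loader extension point described earlier, the sketch below dispatches on file extension using small, typed, documented functions. The LOADERS registry and both function names are assumptions for illustration. PDF handling is left out because it needs a third-party parser.

```python
from pathlib import Path
from typing import Callable, Dict, List


def load_text(path: Path) -> List[str]:
    """Read a text-based file and return its content as a single-item list."""
    return [path.read_text(encoding="utf-8")]


# Hypothetical registry mapping a file extension to its loader function.
LOADERS: Dict[str, Callable[[Path], List[str]]] = {
    ".md": load_text,
    ".py": load_text,
}


def load_document(path: Path) -> List[str]:
    """Dispatch to the loader registered for the file's extension."""
    loader = LOADERS.get(path.suffix.lower())
    if loader is None:
        raise ValueError(f"Unsupported document type: {path.suffix!r}")
    return loader(path)
```

Supporting another format then means writing one loader function and adding one registry entry.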
# Run all tests
python -m pytest tests/
# Run specific test file
python tests/test_components.py
# Run with coverage
python -m pytest tests/ --cov=src
Test files:
- tests/test_components.py: Tests for individual components
- tests/test_database_config.py: Database configuration tests
- tests/test_docker_db.py: Docker database integration tests
- tests/test_relevance.py: Relevance checker tests
def test_new_feature():
    """Test description."""
    # Arrange (Component and expected_value are placeholders for your own code)
    component = Component()
    # Act
    result = component.method()
    # Assert
    assert result == expected_value
- Build the application:
docker build -t lecturehub-chatbot .
- Run with Docker Compose:
docker-compose -f docker/docker-compose.yml up -d
For production deployments, consider:
- Environment variables: Use production environment variables
- Database: Use a production PostgreSQL instance
- Security: Implement proper authentication and authorization
- Monitoring: Add logging and monitoring
- Scaling: Consider load balancing for multiple users
# Check database status
./docker/docker-setup.sh status
# View database logs
./docker/docker-setup.sh logs
# Restart database
./docker/docker-setup.sh restart
If LLM API calls fail:
- Verify GEMINI_API_KEY is set correctly (optional)
- Check API quota and limits
- Ensure network connectivity
- Tip: The application works with the mock LLM when no API key is set
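The mock fallback mentioned in the tip can be pictured as a simple selection at startup. The sketch below is an assumption about the general pattern, not the project's code: MockChain, GeminiChain, and select_chain are hypothetical names, and the real API call is deliberately left out.

```python
import os
from typing import Mapping, Protocol


class AnswerChain(Protocol):
    """Anything that can turn a question into an answer string."""
    def answer(self, question: str) -> str: ...


class MockChain:
    """Stand-in that returns a canned reply without calling any API."""
    def answer(self, question: str) -> str:
        return f"[mock] No API key configured; received question: {question}"


class GeminiChain:
    """Placeholder for an LLM-backed chain; the actual API call is not shown."""
    def __init__(self, api_key: str) -> None:
        self.api_key = api_key

    def answer(self, question: str) -> str:
        raise NotImplementedError("Real LLM call omitted from this sketch")


def select_chain(env: Mapping[str, str] = os.environ) -> AnswerChain:
    """Use the real chain only when a non-blank GEMINI_API_KEY is present."""
    api_key = env.get("GEMINI_API_KEY", "").strip()
    return GeminiChain(api_key) if api_key else MockChain()


print(select_chain({}).answer("Does this work without a key?"))
```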
If document loading fails:
- Verify required files exist in data/lectures/mmceasar2/
- Check file permissions
- Ensure file formats are supported
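A quick way to run the first check is a short script like the one below. It is a sketch that assumes the three file names listed in the usage instructions; missing_files is a hypothetical helper.

```python
from pathlib import Path
from typing import List, Sequence

# File names taken from the usage instructions in this guide.
REQUIRED_FILES = ("mmceasar2.pdf", "mmceasar2.md", "mmceasar2.py")


def missing_files(directory: Path, required: Sequence[str] = REQUIRED_FILES) -> List[str]:
    """Return the required names that are absent or are not regular files."""
    return [name for name in required if not (directory / name).is_file()]


if __name__ == "__main__":
    missing = missing_files(Path("data/lectures/mmceasar2"))
    print("All files present" if not missing else f"Missing: {', '.join(missing)}")
```

Run it from the repository root so the relative path resolves correctly.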
If imports fail:
- Verify the Python path includes the src/ directory
- Check that all dependencies are installed
- Ensure __init__.py files exist in all packages
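If the src/ directory is not on the Python path, one workaround is to add it at the top of the entry script. This is a minimal sketch with a hypothetical helper name; installing the package in development mode is usually the cleaner fix.

```python
import sys
from pathlib import Path


def ensure_on_path(directory: Path) -> bool:
    """Prepend directory to sys.path if absent; return True when it was added."""
    resolved = str(directory.resolve())
    if resolved in sys.path:
        return False
    sys.path.insert(0, resolved)
    return True


# Relative to the current working directory, assumed to be the repository root.
ensure_on_path(Path("src"))
```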
Enable debug logging by setting:
export LOG_LEVEL=DEBUG
If the problem persists:
- Check the logs for error messages
- Verify configuration settings
- Test individual components
- Consult the project documentation
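For reference, the LOG_LEVEL variable used for debug logging could be honored with the standard logging module roughly as follows. This is a sketch of the general pattern, not the application's actual logging setup; resolve_log_level and the "lecturehub" logger name are assumptions.

```python
import logging
import os
from typing import Mapping


def resolve_log_level(env: Mapping[str, str] = os.environ, default: int = logging.INFO) -> int:
    """Map LOG_LEVEL (case-insensitive) to a logging constant, else use the default."""
    name = env.get("LOG_LEVEL", "").strip().upper()
    level = getattr(logging, name, None) if name else None
    return level if isinstance(level, int) else default


logging.basicConfig(
    level=resolve_log_level(),
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)
logging.getLogger("lecturehub").debug("Debug logging is enabled")
```

Unknown values fall back to INFO instead of raising, so a typo in the variable does not prevent startup.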
Recent changes:
- Migrated from psycopg2 to psycopg: Updated all database connections
- Disabled strict relevance checking: The chatbot now accepts all questions
- Added mock LLM support: Works without an API key for testing
- Improved error handling: Better fallback mechanisms
- Updated documentation: Comprehensive guides and examples
Resulting improvements:
- No more "Xin lỗi, tôi chỉ hỗ trợ hỏi về bài giảng này thôi." ("Sorry, I only support questions about this lecture.") responses
- Works without a Google API key
- Better database compatibility
- Enhanced user experience
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests for new functionality
- Update documentation
- Submit a pull request
- All code must pass tests
- Documentation must be updated
- Code style must follow project guidelines
- Security considerations must be addressed
This project is licensed under the MIT License. See the LICENSE file for details.
For support and questions:
- Check the documentation
- Review the troubleshooting section
- Open an issue on GitHub
- Contact the development team
Happy coding with LectureHub Chatbot!