chatbot_project/ │ ├── app/ │ ├── main.py │ ├── pages/ │ │ ├── chat.py │ │ ├── document_upload.py │ │ └── search.py │ └── components/ │ ├── chatbot.py │ └── document_viewer.py │ ├── backend/ │ ├── database/ │ │ ├── mongodb_client.py │ │ └── models.py │ ├── document_processing/ │ │ ├── pdf_processor.py │ │ ├── jsonl_processor.py │ │ └── batch_processor.py │ ├── search/ │ │ ├── semantic_search.py │ │ └── filtering.py │ ├── ai_models/ │ │ ├── response_generator.py │ │ ├── function_caller.py │ │ └── model_loader.py │ └── utils/ │ ├── text_splitter.py │ └── metadata_extractor.py │ ├── config/ │ ├── app_config.py │ └── logging_config.py │ ├── tests/ │ ├── test_document_processing.py │ ├── test_search.py │ ├── test_database.py │ └── test_ai_models.py │ ├── requirements.txt └── README.md
- Set up the main Streamlit application
- Configure page layout and navigation
- Import and render other pages/components
- Implement the chat interface
- Handle user input and display bot responses
- Integrate with backend chatbot component
- Create file upload interface for PDFs and JSONL files
- Trigger document processing on upload
- Display upload status and processing results
- Implement search interface with filters
- Connect to backend search functionality
- Display search results
- Create a reusable Streamlit component for the chatbot interface
- Handle chat history and message display
- Implement a component to view processed documents
- Display document metadata and content snippets
- Set up MongoDB connection
- Implement CRUD operations for documents and metadata
- Define data models for documents, metadata, and chat history
- Implement PDF parsing and text extraction
- Handle PDF-specific metadata extraction
- Implement JSONL parsing and data extraction
- Handle JSONL-specific metadata extraction
- Implement batch processing logic for large documents
- Manage processing queue and status updates
- Implement semantic search functionality
- Integrate with a vector database or similarity search library
- Implement metadata-based filtering
- Combine filtering with semantic search results
- Implement text splitting algorithms
- Handle different splitting strategies (by sentence, paragraph, etc.)
- Implement generic metadata extraction
- Handle different metadata fields and formats
- Implement interface with language model API or local model
- Handle context management for coherent conversations
- Generate responses based on user input and relevant document content
- Define a set of functions that the AI can call
- Implement logic for parsing AI function calls and executing appropriate actions
- Handle error cases and unexpected inputs in function calls
- Manage initialization and loading of AI models
- Implement model switching logic if multiple models are used
- Handle model-specific configurations and optimizations
- Store application-wide configuration
- Include settings for Streamlit, MongoDB, and processing options
- Configure logging for the application
- Set up different log levels and outputs
- Unit tests for PDF and JSONL processors
- Test batch processing functionality
- Unit tests for semantic search and filtering
- Test search result accuracy and performance
- Unit tests for MongoDB operations
- Test data model integrity and CRUD operations
- List all Python package dependencies with versions
- Provide project overview and setup instructions
- Include usage guidelines and contribution information