A terminal-based movie recommendation system powered by the ChromaDB vector database and sentence-transformer embeddings. It provides an interactive CLI with rich visualizations and advanced search capabilities.
- Smart Recommendations: Get personalized movie recommendations based on semantic similarity
- Advanced Search: Combine semantic search with metadata filters (year, rating, runtime)
- Genre Exploration: Browse movies by genre with beautiful table displays
- Director Discovery: Find all movies by a director (demonstrates join-like operations)
- Rich UI: Beautiful terminal interface with colors, tables, and progress indicators
- CRUD Operations: Full Create, Read, Update, Delete functionality for movies
- Database Insights: View statistics and genre distribution with terminal bar charts
- Query Benchmarking: Test and analyze query performance
- Indexing Education: Learn about HNSW vector indexing and tuning parameters
- Database Peek: Quick view of database records
Movie Recommendation System/
├── main.py                 # Entry point with interactive menus
├── database.py             # ChromaDB interaction layer
├── operations.py           # User-facing features
├── admin.py                # Admin panel functions
├── config.py               # Configuration and constants
├── ingest_data.py          # Data ingestion script
├── requirements.txt        # Python dependencies
├── tmdb_5000_movies.csv    # Movie dataset
├── tmdb_5000_credits.csv   # Credits dataset
└── movie_db/               # ChromaDB storage (created after ingestion)
- Python 3.8 or higher
- pip package manager
1. Clone or navigate to the project directory
   cd "Movie Recommendation System"
2. Create a virtual environment (recommended)
   python -m venv .venv
   .venv\Scripts\activate # On Windows
   # source .venv/bin/activate # On Linux/Mac
3. Install dependencies
   pip install -r requirements.txt
4. Ingest the movie data (first time only)
   python ingest_data.py
   This will:
   - Load movie and credits data from CSV files
   - Extract director information from crew data
   - Create rich text documents for embedding
   - Store everything in ChromaDB
   Ingestion takes a few minutes, depending on your system.
   If you already have a database without director info:
   python fix_director_metadata.py
   This updates existing records with director information.
5. Run the application
   python main.py
6. Try sample inputs
   - Quick examples: QUICK_START.md
   - Comprehensive guide: SAMPLE_INPUTS.md
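The ingestion step above extracts a director from the crew data and builds a text document for embedding. The sketch below shows one way this could look. It assumes the crew column of tmdb_5000_credits.csv holds a JSON list of objects with "job" and "name" keys; the function names and the document layout are hypothetical and may differ from what ingest_data.py actually does.

```python
import json


def extract_director(crew_json: str) -> str:
    """Return the name of the first crew member whose job is 'Director'."""
    try:
        crew = json.loads(crew_json)
    except (TypeError, json.JSONDecodeError):
        return "Unknown"
    if not isinstance(crew, list):
        return "Unknown"
    for member in crew:
        if isinstance(member, dict) and member.get("job") == "Director":
            return member.get("name", "Unknown")
    return "Unknown"


def build_document(title: str, overview: str, genres: str, director: str) -> str:
    """Join the fields into one string for the embedding model (hypothetical layout)."""
    return f"Title: {title}. Genres: {genres}. Director: {director}. Overview: {overview}"
```

Falling back to "Unknown" keeps every record's metadata shape uniform, which simplifies later filtering by director.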
- PROJECT_STATUS.md - Complete project overview and status
- QUICK_START.md - Quick reference guide
- SAMPLE_INPUTS.md - Best inputs to try
- IMPORTANT_NOTES.md - Understanding recommendation behavior
- INTERACTIVE_INDEXING_GUIDE.md - Advanced indexing benchmark feature (NEW)
- Get Movie Recommendations
  - Enter a movie title you like
  - Specify how many recommendations you want
  - View similar movies with similarity scores
- Advanced Search
  - Enter a semantic search query (e.g., "a thriller about artificial intelligence")
  - Optionally add filters:
    - Release year (after/before a specific year)
    - Minimum rating (0-10)
    - Maximum runtime (in minutes)
  - Combines vector search with metadata filtering
- Find Movies by Genre
  - Select or type a genre
  - Browse all movies in that genre
  - Autocomplete suggestions for common genres
- Explore by Director
  - Enter a movie title
  - System finds the director
  - Shows all other movies by that director
  - Demonstrates "join-like" operations in a vector DB
- List Sample Movies
  - View a sample of movies from the database
  - Specify how many to display
- Admin Panel
  - Password protected (default: admin123)
  - Access advanced database operations
- Create Movie: Add a new movie with all details
- Read Movie by ID: View full details of a specific movie
- Update Movie: Modify existing movie information
- Delete Movie: Remove a movie from the database
- Peek at Database: Quick view of first N records
- Database Insights:
  - Total movie count
  - Top 10 genres with bar chart visualization
- Benchmark Queries:
  - Run performance tests
  - View average, min, max query times
  - Learn about HNSW indexing and tuning
- Interactive Indexing Benchmark (NEW):
  - Dynamically test different distance functions (cosine, L2, IP)
  - Recreate collection with new settings
  - Real-time progress tracking during re-ingestion
  - Comprehensive performance benchmarking
  - Compare results across different configurations
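To make the three distance functions in the indexing benchmark concrete, here is a small pure-Python sketch. It follows the definitions as I understand them from the ChromaDB documentation (squared L2, one minus the inner product, and one minus cosine similarity); the function names are illustrative and not part of this project or of ChromaDB.

```python
import math
from typing import Sequence


def squared_l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance: sum of squared component differences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def inner_product_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """One minus the dot product; smaller means more similar."""
    return 1.0 - sum(x * y for x, y in zip(a, b))


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """One minus cosine similarity; ignores vector length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm
```

For unit-length embeddings the three functions rank neighbours identically, so differences in benchmark results mostly show up when vectors are not normalized.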
Edit config.py to customize:
# Database settings
DB_PATH = "movie_db"
COLLECTION_NAME = "movies"
# Admin password
ADMIN_PASSWORD = "admin123" # Change this!
# Search settings
DEFAULT_RESULTS = 6
MAX_SEARCH_RESULTS = 50
# Display settings
TOP_GENRES_COUNT = 10
BENCHMARK_ITERATIONS = 10
- Semantic Search: Uses sentence transformers to create embeddings
- Metadata Filtering: Combines vector search with structured queries
- HNSW Indexing: Fast approximate nearest neighbor search
- Batch Processing: Efficient data ingestion
- Complex Filters: $and, $or, $gte, $lte operators
- Hybrid Search: Semantic + metadata filtering in a single query
- Join Simulation: Multi-step queries to simulate relational joins
- Performance Monitoring: Built-in benchmarking tools
- Rich Tables: Beautiful formatted tables with colors
- Interactive Menus: Questionary-powered selection menus
- Progress Indicators: Visual feedback for long operations
- Styled Panels: Information displayed in bordered panels
- Color Coding: Consistent color scheme throughout
Each movie in the database contains:
{
"id": "unique_id",
"metadata": {
"title": "Movie Title",
"overview": "Plot summary",
"genres": "Action, Thriller",
"release_date": "2024-01-01",
"vote_average": 7.5,
"runtime": 120,
"director": "Director Name"
},
"document": "Rich text for embedding..."
}
"a movie about time travel and paradoxes"
"romantic comedy set in New York"
"dark psychological thriller"
Query: "space exploration"
Filters:
- After 2010
- Rating > 7.0
- Runtime < 150 minutes
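The filters above can be expressed as a ChromaDB where clause. The helper below is a minimal sketch, not the project's actual code: build_where is a hypothetical name, and release_year is an assumed numeric metadata field, since the release_date shown in the data model is a string and ChromaDB's comparison operators work on numbers.

```python
from typing import Any, Dict, List, Optional


def build_where(
    min_year: Optional[int] = None,
    min_rating: Optional[float] = None,
    max_runtime: Optional[int] = None,
) -> Optional[Dict[str, Any]]:
    """Build a metadata filter; returns None when no filter was requested."""
    conditions: List[Dict[str, Any]] = []
    if min_year is not None:
        # Assumption: a numeric release_year field exists in the metadata.
        conditions.append({"release_year": {"$gt": min_year}})
    if min_rating is not None:
        conditions.append({"vote_average": {"$gt": min_rating}})
    if max_runtime is not None:
        conditions.append({"runtime": {"$lt": max_runtime}})
    if not conditions:
        return None
    if len(conditions) == 1:
        # $and needs at least two sub-conditions, so pass a single one as-is.
        return conditions[0]
    return {"$and": conditions}
```

The result can be passed alongside the semantic query, for example collection.query(query_texts=["space exploration"], n_results=6, where=build_where(2010, 7.0, 150)).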
Recommendations:
- Inception - Mind-bending sci-fi thriller
- The Dark Knight - Superhero/crime film
- Pulp Fiction - Tarantino classic
- The Matrix - Sci-fi action
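A recommendation for one of these titles can be sketched as a lookup followed by a similarity query. This is an assumed approach, not necessarily how operations.py does it: the recommend function is hypothetical, it relies only on the get and query methods of a ChromaDB collection, and it returns raw distances rather than the similarity scores the UI displays.

```python
from typing import Any, List, Tuple


def recommend(collection: Any, title: str, n: int = 6) -> List[Tuple[str, float]]:
    """Return up to n (title, distance) pairs for movies similar to the given title."""
    # Look up the seed movie and its stored document text.
    seed = collection.get(where={"title": title}, include=["documents"])
    if not seed["ids"]:
        return []
    seed_id, seed_doc = seed["ids"][0], seed["documents"][0]

    # Ask for one extra result because the seed usually matches itself.
    result = collection.query(
        query_texts=[seed_doc],
        n_results=n + 1,
        include=["metadatas", "distances"],
    )
    pairs = [
        (meta["title"], dist)
        for movie_id, meta, dist in zip(
            result["ids"][0], result["metadatas"][0], result["distances"][0]
        )
        if movie_id != seed_id
    ]
    return pairs[:n]
```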
Director Exploration:
- Inception → Christopher Nolan filmography
- Pulp Fiction → Quentin Tarantino films
- Avatar → James Cameron movies
- Jurassic Park → Steven Spielberg films
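The director exploration above is a two-step, join-like lookup: find the movie's director, then fetch every movie with that director. The sketch below illustrates the idea under the assumption that title and director are stored as metadata fields, as in the data model; movies_by_same_director is a hypothetical name, not a function from this project.

```python
from typing import Any, List


def movies_by_same_director(collection: Any, title: str) -> List[str]:
    """Return titles of other movies that share the given movie's director."""
    # Step 1: find the seed movie and read its director from the metadata.
    seed = collection.get(where={"title": title}, include=["metadatas"])
    if not seed["metadatas"]:
        return []
    director = seed["metadatas"][0].get("director")
    if not director:
        return []

    # Step 2: fetch every movie whose director field matches.
    result = collection.get(where={"director": director}, include=["metadatas"])
    return sorted(m["title"] for m in result["metadatas"] if m["title"] != title)
```

Both steps are plain metadata lookups, so no embedding is computed; the "join" happens in application code rather than inside the database.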
See SAMPLE_INPUTS.md for comprehensive examples showcasing ChromaDB's full capabilities.
- config.py: Centralized configuration
- database.py: Low-level ChromaDB operations
- operations.py: User-facing features with rich UI
- admin.py: Admin operations and analytics
- main.py: Application flow and menu system
- ingest_data.py: Data loading and preprocessing
1. Add database functions to database.py
2. Create user-facing wrappers in operations.py or admin.py
3. Add menu options in main.py
4. Update configuration in config.py if needed
- First Run: You must run ingest_data.py before using the application
- Database Location: ChromaDB stores data in the ./movie_db/ directory
- Re-ingestion: Delete the movie_db/ folder to re-ingest data
- Performance: Query speed depends on dataset size and system resources
- Director Data: Extracted from crew information during ingestion
- Change the default admin password in config.py
- Admin panel is password protected
- Destructive operations require confirmation
"Collection not found" error
- Run python ingest_data.py to create the database
Slow queries
- Normal for first query (model loading)
- Subsequent queries should be faster
- Check benchmark results in admin panel
Import errors
- Ensure all dependencies are installed: pip install -r requirements.txt
- Activate the virtual environment if using one
CSV file not found
- Ensure tmdb_5000_movies.csv and tmdb_5000_credits.csv are in the project directory
- ChromaDB: Vector database for embeddings
- Sentence Transformers: Text embedding models
- Rich: Terminal formatting and tables
- Questionary: Interactive prompts
- Pandas: Data processing
- Python 3.8+: Core language
The application includes educational content about:
- Vector database indexing (HNSW algorithm)
- Semantic search vs keyword search
- Metadata filtering in vector databases
- Query performance optimization
- Embedding-based recommendations
Access this through the Admin Panel → Benchmark Queries option.
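For readers who want to reproduce the average, min, and max timings outside the admin panel, the following is a minimal timing sketch. The benchmark function and its run_query parameter are hypothetical; the project's own benchmarking code may measure things differently.

```python
import statistics
import time
from typing import Callable, Dict


def benchmark(run_query: Callable[[], object], iterations: int = 10) -> Dict[str, float]:
    """Call run_query repeatedly and report timings in milliseconds."""
    if iterations < 1:
        raise ValueError("iterations must be at least 1")
    timings_ms = []
    for _ in range(iterations):
        start = time.perf_counter()
        run_query()
        timings_ms.append((time.perf_counter() - start) * 1000.0)
    return {
        "avg_ms": statistics.mean(timings_ms),
        "min_ms": min(timings_ms),
        "max_ms": max(timings_ms),
    }
```

Because the first query also loads the embedding model, calling run_query once before benchmarking gives numbers that better reflect steady-state performance.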
This project uses the TMDB 5000 Movie Dataset. Please refer to the dataset's license for usage terms.
Feel free to enhance this project by:
- Adding new search features
- Improving the UI
- Optimizing query performance
- Adding more analytics
- Expanding the dataset
Enjoy exploring movies with AI-powered recommendations!