likhith1253/Movie_Recommendation_System

🎬 Movie Recommendation System

A terminal-based movie recommendation system built on the ChromaDB vector database and sentence-transformer embeddings. It provides an interactive CLI with colored tables, progress indicators, and semantic search that can be combined with metadata filters.

✨ Features

🎯 User Features

  • Smart Recommendations: Get personalized movie recommendations based on semantic similarity
  • Advanced Search: Combine semantic search with metadata filters (year, rating, runtime)
  • Genre Exploration: Browse movies by genre with beautiful table displays
  • Director Discovery: Find all movies by a director (demonstrates join-like operations)
  • Rich UI: Beautiful terminal interface with colors, tables, and progress indicators

⚙️ Admin Features

  • CRUD Operations: Full Create, Read, Update, Delete functionality for movies
  • Database Insights: View statistics and genre distribution with terminal bar charts
  • Query Benchmarking: Test and analyze query performance
  • Indexing Education: Learn about HNSW vector indexing and tuning parameters
  • Database Peek: Quick view of database records
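As an illustration of the Database Insights feature, the sketch below counts genres and draws a plain-text bar chart. It assumes genres are stored as a comma-separated string (as in the data schema shown later in this README); the function names `genre_counts` and `render_bars` are hypothetical, and the real admin panel may use Rich for rendering instead.

```python
from collections import Counter


def genre_counts(metadatas, top_n=10):
    """Count genres across records whose 'genres' field is a comma-separated string."""
    counter = Counter()
    for meta in metadatas:
        for genre in meta.get("genres", "").split(","):
            genre = genre.strip()
            if genre:
                counter[genre] += 1
    return counter.most_common(top_n)


def render_bars(counts, width=30):
    """Render (label, count) pairs as text bars scaled to the largest count."""
    if not counts:
        return []
    largest = counts[0][1]  # most_common() returns pairs sorted by count, descending
    label_width = max(len(label) for label, _ in counts)
    lines = []
    for label, count in counts:
        bar = "#" * max(1, round(width * count / largest))
        lines.append(f"{label:<{label_width}} {bar} {count}")
    return lines


# Made-up sample records, for illustration only.
sample = [
    {"genres": "Action, Thriller"},
    {"genres": "Action, Science Fiction"},
    {"genres": "Drama"},
]
for line in render_bars(genre_counts(sample)):
    print(line)
```

With a real collection, the `metadatas` list would come from a call such as `collection.get(include=["metadatas"])`.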

📁 Project Structure

Movie Recommendation System/
├── main.py                 # Entry point with interactive menus
├── database.py             # ChromaDB interaction layer
├── operations.py           # User-facing features
├── admin.py                # Admin panel functions
├── config.py               # Configuration and constants
├── ingest_data.py          # Data ingestion script
├── requirements.txt        # Python dependencies
├── tmdb_5000_movies.csv    # Movie dataset
├── tmdb_5000_credits.csv   # Credits dataset
└── movie_db/               # ChromaDB storage (created after ingestion)

🚀 Installation

Prerequisites

  • Python 3.8 or higher
  • pip package manager

Setup Steps

  1. Clone or navigate to the project directory

    cd "Movie Recommendation System"
  2. Create a virtual environment (recommended)

    python -m venv .venv
    .venv\Scripts\activate  # On Windows
    # source .venv/bin/activate  # On Linux/Mac
  3. Install dependencies

    pip install -r requirements.txt
  4. Ingest the movie data (first time only)

    python ingest_data.py

    This will:

    • Load movie and credits data from CSV files
    • Extract director information from crew data
    • Create rich text documents for embedding
    • Store everything in ChromaDB
    • Take a few minutes, depending on your system

    If you already have a database without director info:

    python fix_director_metadata.py

    This updates existing records with director information.

  5. Run the application

    python main.py
  6. Try sample inputs (see QUICK_START.md and SAMPLE_INPUTS.md)

    • Quick examples: QUICK_START.md
    • Comprehensive guide: SAMPLE_INPUTS.md

📖 Key Documentation to Read

  1. PROJECT_STATUS.md - Complete project overview and status
  2. QUICK_START.md - Quick reference guide
  3. SAMPLE_INPUTS.md - Best inputs to try
  4. IMPORTANT_NOTES.md - Understanding recommendation behavior
  5. INTERACTIVE_INDEXING_GUIDE.md - Advanced indexing benchmark feature (NEW)

🎮 Usage Guide

Main Menu Options

  1. 🎯 Get Movie Recommendations

    • Enter a movie title you like
    • Specify how many recommendations you want
    • View similar movies with similarity scores
  2. 🔍 Advanced Search

    • Enter a semantic search query (e.g., "a thriller about artificial intelligence")
    • Optionally add filters:
      • Release year (after/before a specific year)
      • Minimum rating (0-10)
      • Maximum runtime (in minutes)
    • Combines vector search with metadata filtering
  3. 🎭 Find Movies by Genre

    • Select or type a genre
    • Browse all movies in that genre
    • Autocomplete suggestions for common genres
  4. 🎬 Explore by Director

    • Enter a movie title
    • System finds the director
    • Shows all other movies by that director
    • Demonstrates "join-like" operations in vector DB
  5. 📋 List Sample Movies

    • View a sample of movies from the database
    • Specify how many to display
  6. ⚙️ Admin Panel

    • Password protected (default: admin123)
    • Access advanced database operations

Admin Panel Options

  • ➕ Create Movie: Add a new movie with all details
  • 📖 Read Movie by ID: View full details of a specific movie
  • ✏️ Update Movie: Modify existing movie information
  • 🗑️ Delete Movie: Remove a movie from the database
  • 👀 Peek at Database: Quick view of first N records
  • 📊 Database Insights:
    • Total movie count
    • Top 10 genres with bar chart visualization
  • ⚡ Benchmark Queries:
    • Run performance tests
    • View average, min, max query times
    • Learn about HNSW indexing and tuning
  • 🔬 Interactive Indexing Benchmark (NEW):
    • Dynamically test different distance functions (cosine, L2, IP)
    • Recreate collection with new settings
    • Real-time progress tracking during re-ingestion
    • Comprehensive performance benchmarking
    • Compare results across different configurations
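For reference, the three distance functions named in the Interactive Indexing Benchmark can be written out in plain Python. This is a minimal sketch based on the definitions in the ChromaDB documentation (squared L2, one minus inner product, one minus cosine similarity), not the project's own code; the function names are chosen here for illustration.

```python
import math


def l2_squared(a, b):
    """Squared Euclidean distance: sum of (a_i - b_i)^2."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def inner_product_distance(a, b):
    """Inner-product distance: 1 - sum of a_i * b_i."""
    return 1.0 - sum(x * y for x, y in zip(a, b))


def cosine_distance(a, b):
    """Cosine distance: 1 - cosine similarity of a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / norm


# Two toy 2-D "embeddings" that point in orthogonal directions.
u, v = [1.0, 0.0], [0.0, 1.0]
print(l2_squared(u, v), inner_product_distance(u, v), cosine_distance(u, v))
```

In all three cases a smaller value means a closer match. For unit-length embeddings, cosine and inner-product distance give the same ranking, so differences between them in a benchmark usually come from how the embeddings are normalized.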

🔧 Configuration

Edit config.py to customize:

# Database settings
DB_PATH = "movie_db"
COLLECTION_NAME = "movies"

# Admin password
ADMIN_PASSWORD = "admin123"  # Change this!

# Search settings
DEFAULT_RESULTS = 6
MAX_SEARCH_RESULTS = 50

# Display settings
TOP_GENRES_COUNT = 10
BENCHMARK_ITERATIONS = 10

🎨 Technical Highlights

Vector Database Features

  1. Semantic Search: Uses sentence transformers to create embeddings
  2. Metadata Filtering: Combines vector search with structured queries
  3. HNSW Indexing: Fast approximate nearest neighbor search
  4. Batch Processing: Efficient data ingestion
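Batch processing during ingestion could look roughly like the sketch below. The helper name `batched` and the batch size are assumptions made for illustration; the actual logic in ingest_data.py is not shown in this README.

```python
def batched(items, batch_size):
    """Yield consecutive slices of `items`, each with at most `batch_size` elements."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Made-up IDs standing in for movie records.
movie_ids = [f"movie_{n}" for n in range(5)]
for chunk in batched(movie_ids, 2):
    # With a real ChromaDB collection, each chunk would be passed to something like
    # collection.add(ids=..., documents=..., metadatas=...) instead of printed.
    print(chunk)
```

Adding records in chunks keeps memory use bounded and avoids sending one very large request to the database.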

Advanced ChromaDB Operations

  • Complex Filters: $and, $or, $gte, $lte operators
  • Hybrid Search: Semantic + metadata filtering in single query
  • Join Simulation: Multi-step queries to simulate relational joins
  • Performance Monitoring: Built-in benchmarking tools
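A filter for the hybrid search described above might be assembled as follows. This is a sketch under stated assumptions: `vote_average` and `runtime` come from the data schema in this README, while `release_year` is a hypothetical numeric field (the range operators `$gte` and `$lte` compare numbers, so a date stored as a string would need a numeric companion field). `build_where` is not a function from this project.

```python
def build_where(year_from=None, year_to=None, min_rating=None, max_runtime=None):
    """Build a ChromaDB-style `where` filter from optional search constraints."""
    conditions = []
    if year_from is not None:
        conditions.append({"release_year": {"$gte": year_from}})  # hypothetical field
    if year_to is not None:
        conditions.append({"release_year": {"$lte": year_to}})    # hypothetical field
    if min_rating is not None:
        conditions.append({"vote_average": {"$gte": min_rating}})
    if max_runtime is not None:
        conditions.append({"runtime": {"$lte": max_runtime}})

    if not conditions:
        return None           # no filter: pure semantic search
    if len(conditions) == 1:
        return conditions[0]  # $and expects at least two expressions
    return {"$and": conditions}


print(build_where(year_from=2010, min_rating=7.0, max_runtime=150))
```

The resulting dictionary would be passed as the `where` argument of `collection.query(...)` alongside `query_texts`, so the semantic ranking and the metadata filter are applied in a single call.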

UI/UX Features

  • Rich Tables: Beautiful formatted tables with colors
  • Interactive Menus: Questionary-powered selection menus
  • Progress Indicators: Visual feedback for long operations
  • Styled Panels: Information displayed in bordered panels
  • Color Coding: Consistent color scheme throughout

📊 Data Schema

Each movie in the database contains:

{
    "id": "unique_id",
    "metadata": {
        "title": "Movie Title",
        "overview": "Plot summary",
        "genres": "Action, Thriller",
        "release_date": "2024-01-01",
        "vote_average": 7.5,
        "runtime": 120,
        "director": "Director Name"
    },
    "document": "Rich text for embedding..."
}
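The "document" field is the text that gets embedded. The exact template used by ingest_data.py is not shown here, so the following is only a plausible sketch: `build_document` and its layout are assumptions, and only the metadata keys are taken from the schema above.

```python
def build_document(meta):
    """Combine selected metadata fields into one text block for embedding."""
    parts = [
        f"Title: {meta['title']}",
        f"Genres: {meta.get('genres', '')}",
        f"Director: {meta.get('director', 'Unknown')}",
        f"Overview: {meta.get('overview', '')}",
    ]
    return "\n".join(parts)


# Placeholder values copied from the schema example above.
record_meta = {
    "title": "Movie Title",
    "overview": "Plot summary",
    "genres": "Action, Thriller",
    "director": "Director Name",
}
print(build_document(record_meta))
```

Putting the title, genres, and director into the embedded text lets a query such as "dark psychological thriller" match on genre words as well as on the plot summary.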

🔍 Example Queries

Semantic Search

"a movie about time travel and paradoxes"
"romantic comedy set in New York"
"dark psychological thriller"
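Queries like these would typically be sent with `collection.query(query_texts=[...], n_results=...)`, which returns one inner list per query text. The sketch below flattens such a result for display; `flatten_query_result` is a hypothetical helper, and the sample dictionary is made up to mirror the shape of a ChromaDB query result.

```python
def flatten_query_result(result):
    """Turn the nested result of a single-text query into a flat list of rows."""
    ids = result["ids"][0]
    metadatas = result["metadatas"][0]
    distances = result["distances"][0]
    return [
        {"id": movie_id, "title": meta.get("title"), "distance": distance}
        for movie_id, meta, distance in zip(ids, metadatas, distances)
    ]


# Made-up result for one query text, for illustration only.
fake_result = {
    "ids": [["1", "2"]],
    "metadatas": [[{"title": "Movie A"}, {"title": "Movie B"}]],
    "distances": [[0.1, 0.4]],
}
for row in flatten_query_result(fake_result):
    print(row)
```

Rows arrive ordered by increasing distance, so the first row is the closest semantic match.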

Advanced Search with Filters

Query: "space exploration"
Filters:
  - After 2010
  - Rating > 7.0
  - Runtime < 150 minutes

Best Sample Inputs for Testing

Recommendations:

  • Inception - Mind-bending sci-fi thriller
  • The Dark Knight - Superhero/crime film
  • Pulp Fiction - Tarantino classic
  • The Matrix - Sci-fi action

Director Exploration:

  • Inception → Christopher Nolan filmography
  • Pulp Fiction → Quentin Tarantino films
  • Avatar → James Cameron movies
  • Jurassic Park → Steven Spielberg films

See SAMPLE_INPUTS.md for comprehensive examples showcasing ChromaDB's full capabilities.
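The director lookup is a two-step query: first find the movie to learn its director, then fetch every record with that director. The sketch below assumes a collection object exposing `get(where=..., include=...)` as ChromaDB collections do; `movies_by_same_director` is a hypothetical name, and `InMemoryCollection` is only a stand-in so the example runs without a database.

```python
def movies_by_same_director(collection, title):
    """Return (director, other_titles) for the director of the movie named `title`."""
    first = collection.get(where={"title": title}, include=["metadatas"])
    if not first["metadatas"]:
        return None, []
    director = first["metadatas"][0].get("director")
    if not director:
        return None, []
    second = collection.get(where={"director": director}, include=["metadatas"])
    others = [m["title"] for m in second["metadatas"] if m["title"] != title]
    return director, others


class InMemoryCollection:
    """Stand-in that mimics only `get` with simple equality filters."""

    def __init__(self, metadatas):
        self._metadatas = metadatas

    def get(self, where=None, include=None):
        where = where or {}
        matches = [
            m for m in self._metadatas
            if all(m.get(key) == value for key, value in where.items())
        ]
        return {"ids": [str(i) for i in range(len(matches))], "metadatas": matches}


demo = InMemoryCollection([
    {"title": "Inception", "director": "Christopher Nolan"},
    {"title": "The Dark Knight", "director": "Christopher Nolan"},
    {"title": "Avatar", "director": "James Cameron"},
])
print(movies_by_same_director(demo, "Inception"))
```

Because a vector database has no JOIN, the relationship is resolved in application code with two metadata lookups, which is why the README calls the operation "join-like".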

🛠️ Development

Module Responsibilities

  • config.py: Centralized configuration
  • database.py: Low-level ChromaDB operations
  • operations.py: User-facing features with rich UI
  • admin.py: Admin operations and analytics
  • main.py: Application flow and menu system
  • ingest_data.py: Data loading and preprocessing

Adding New Features

  1. Add database functions to database.py
  2. Create user-facing wrappers in operations.py or admin.py
  3. Add menu options in main.py
  4. Update configuration in config.py if needed

📝 Notes

  • First Run: Must run ingest_data.py before using the application
  • Database Location: ChromaDB stores data in ./movie_db/ directory
  • Re-ingestion: Delete the movie_db/ folder, then run ingest_data.py again
  • Performance: Query speed depends on dataset size and system resources
  • Director Data: Extracted from crew information during ingestion
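Director extraction can be sketched as below. In the TMDB 5000 credits file the crew column holds a JSON array of objects with "job" and "name" keys; the function name `extract_director` and the "Unknown" fallback are assumptions, not necessarily what ingest_data.py does.

```python
import json


def extract_director(crew_json):
    """Return the first crew member whose job is 'Director', or 'Unknown'."""
    try:
        crew = json.loads(crew_json)
    except (TypeError, ValueError):
        return "Unknown"  # missing cell (e.g. NaN) or malformed JSON
    if not isinstance(crew, list):
        return "Unknown"
    for member in crew:
        if isinstance(member, dict) and member.get("job") == "Director":
            return member.get("name", "Unknown")
    return "Unknown"


# Shortened crew entry, for illustration only.
sample_crew = '[{"job": "Editor", "name": "A. Editor"}, {"job": "Director", "name": "James Cameron"}]'
print(extract_director(sample_crew))
```

Movies with several credited directors would yield only the first one under this approach.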

🔐 Security

  • Change the default admin password in config.py
  • Admin panel is password protected
  • Destructive operations require confirmation
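One way to avoid keeping the real password in config.py is to read it from an environment variable and compare it in constant time. This is a suggestion rather than the project's current behavior; the variable name `MOVIE_ADMIN_PASSWORD` and the function `check_admin_password` are hypothetical.

```python
import hmac
import os


def check_admin_password(candidate, expected=None):
    """Compare `candidate` with the expected password without leaking timing information."""
    if expected is None:
        # Hypothetical variable name; falls back to the documented default.
        expected = os.environ.get("MOVIE_ADMIN_PASSWORD", "admin123")
    return hmac.compare_digest(candidate.encode("utf-8"), expected.encode("utf-8"))


print(check_admin_password("wrong-guess", expected="s3cret"))
```

For a local CLI this is mostly hygiene, but it keeps the secret out of version control.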

🐛 Troubleshooting

"Collection not found" error

  • Run python ingest_data.py to create the database

Slow queries

  • The first query is slower because the embedding model has to load
  • Subsequent queries should be faster
  • Check benchmark results in admin panel
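To separate model-loading time from steady-state query time, a benchmark can run a warm-up call before measuring. The following is a generic sketch with hypothetical names, not the admin panel's implementation; the default of 10 iterations mirrors BENCHMARK_ITERATIONS in config.py.

```python
import time


def benchmark(func, iterations=10, warmup=1):
    """Call `func` repeatedly and return average, minimum, and maximum time in seconds."""
    if iterations <= 0:
        raise ValueError("iterations must be positive")
    for _ in range(warmup):
        func()  # absorbs one-time costs such as model loading
    timings = []
    for _ in range(iterations):
        start = time.perf_counter()
        func()
        timings.append(time.perf_counter() - start)
    return {
        "avg": sum(timings) / len(timings),
        "min": min(timings),
        "max": max(timings),
    }


# Stand-in workload; a real run would wrap a collection.query(...) call in a lambda.
stats = benchmark(lambda: sum(range(10_000)), iterations=5)
print({name: f"{seconds * 1000:.3f} ms" for name, seconds in stats.items()})
```

If the average stays high after the warm-up, the cause is more likely dataset size or hardware than model loading.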

Import errors

  • Ensure all dependencies are installed: pip install -r requirements.txt
  • Activate virtual environment if using one

CSV file not found

  • Ensure tmdb_5000_movies.csv and tmdb_5000_credits.csv are in the project directory

📚 Technologies Used

  • ChromaDB: Vector database for embeddings
  • Sentence Transformers: Text embedding models
  • Rich: Terminal formatting and tables
  • Questionary: Interactive prompts
  • Pandas: Data processing
  • Python 3.8+: Core language

🎓 Learning Resources

The application includes educational content about:

  • Vector database indexing (HNSW algorithm)
  • Semantic search vs keyword search
  • Metadata filtering in vector databases
  • Query performance optimization
  • Embedding-based recommendations

Access this through the Admin Panel → Benchmark Queries option.

📄 License

This project uses the TMDB 5000 Movie Dataset. Please refer to the dataset's license for usage terms.

🤝 Contributing

Feel free to enhance this project by:

  • Adding new search features
  • Improving the UI
  • Optimizing query performance
  • Adding more analytics
  • Expanding the dataset

Enjoy exploring movies with AI-powered recommendations! 🎬🍿
