Discover movies that match your mood, not just your keyword.
FilmFind is an AI-powered movie and TV series recommendation system that understands natural language queries, interprets emotional intent, and delivers personalized recommendations using semantic search, hybrid embeddings, and LLM-powered re-ranking.
- Overview
- Key Features
- What Makes FilmFind Unique
- Architecture
- System Diagrams
- Tech Stack
- Getting Started
- Project Structure
- API Documentation
- Contributing
- License
FilmFind goes beyond traditional movie recommendation systems by:
- Understanding Natural Language: Ask in plain English like "dark sci-fi movies like Interstellar with less romance"
- Semantic Search: Uses vector embeddings to understand themes, tones, and emotions
- Hybrid Intelligence: Combines semantic similarity, metadata filtering, and LLM reasoning
- Explainable AI: Each recommendation comes with reasoning and match scores
- Multi-Signal Scoring: Balances semantic similarity, popularity, ratings, and recency
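As an illustration of multi-signal scoring, the sketch below blends the four signals named above into one number with a weighted average. The weights, the `Signals` type, and the `combined_score` function are assumptions made for this example, not values or names taken from the FilmFind codebase.

```python
from dataclasses import dataclass

# Hypothetical weights for illustration only; tune them for your own data.
WEIGHTS = {"semantic": 0.6, "popularity": 0.15, "rating": 0.15, "recency": 0.1}


@dataclass
class Signals:
    semantic: float    # similarity to the query, assumed scaled to [0, 1]
    popularity: float  # popularity, assumed normalized to [0, 1]
    rating: float      # e.g. a 0-10 rating divided by 10
    recency: float     # 1.0 for new releases, decaying toward 0.0


def combined_score(signals: Signals, weights: dict[str, float] = WEIGHTS) -> float:
    """Return the weighted average of all signals."""
    total = sum(weights.values())
    return sum(w * getattr(signals, name) for name, w in weights.items()) / total


close_match = Signals(semantic=0.9, popularity=0.2, rating=0.7, recency=0.1)
blockbuster = Signals(semantic=0.4, popularity=1.0, rating=0.8, recency=0.9)
print(combined_score(close_match) > combined_score(blockbuster))
```

With these example weights, a strong semantic match outranks a popular but loosely related title, which is the balance the scoring step is meant to strike.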
"Shows like Stranger Things but with more horror elements"
"Movies about F1 racing with intense competition and personal rivalries"
"Mystery and magical adventures like Harry Potter with school settings"
"Lighthearted sitcoms like Friends about group of friends navigating life and relationships"
"Series like Dark"
- Natural language processing (NLP) to extract intent, themes, and constraints
- Emotion-aware classification across 8 emotional dimensions
- Reference title detection and similarity matching
- Multi-language support (English, Hindi, Korean, Telugu, etc.)
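To make intent extraction concrete, here is a toy, regex-based parser that pulls a reference title and "more X" / "less X" constraints out of a query. It is a deliberately simple stand-in, not the NLP or LLM pipeline FilmFind uses; `ParsedQuery` and `parse_query` are hypothetical names.

```python
import re
from dataclasses import dataclass, field


@dataclass
class ParsedQuery:
    reference_titles: list[str] = field(default_factory=list)
    boosted: list[str] = field(default_factory=list)
    excluded: list[str] = field(default_factory=list)


# A run of capitalized words after "like", e.g. "like Stranger Things".
_REFERENCE = re.compile(r"\blike ((?:[A-Z][\w']*)(?: [A-Z][\w']*)*)")
_MORE = re.compile(r"\bmore (\w+)")
_LESS = re.compile(r"\b(?:less|without) (\w+)")


def parse_query(query: str) -> ParsedQuery:
    """Extract reference titles and simple more/less constraints."""
    return ParsedQuery(
        reference_titles=_REFERENCE.findall(query),
        boosted=_MORE.findall(query),
        excluded=_LESS.findall(query),
    )


print(parse_query("dark sci-fi movies like Interstellar with less romance"))
```

A capitalization heuristic like this breaks on lowercase titles and non-English queries, which is one reason a real system would lean on an NER model or an LLM instead.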
- Semantic Vector Search: FAISS-powered similarity search
- Hybrid Embeddings: Combines plot, themes, genres, cast, and emotional vectors
- Multi-Signal Ranking: Balances semantic similarity, popularity, ratings, and metadata
- Smart Filtering: Year range, language, genre, streaming services, runtime
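The idea behind filtered similarity search can be shown without FAISS. The sketch below applies metadata filters first and then ranks the remaining titles by cosine similarity using brute force in pure Python. The catalog entries, their two-dimensional vectors, and the `search` function are invented for the example; a FAISS index does the same ranking far more efficiently on real 768-dimensional embeddings.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity, returning 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Toy catalog with made-up vectors (not real embeddings).
CATALOG = [
    {"title": "Film A", "year": 2014, "language": "en", "vector": [1.0, 0.0]},
    {"title": "Film B", "year": 2005, "language": "en", "vector": [0.9, 0.1]},
    {"title": "Film C", "year": 2019, "language": "ko", "vector": [0.0, 1.0]},
]


def search(query_vec, catalog, k=3, year_min=None, language=None):
    """Filter by metadata, then return the top-k (title, score) pairs."""
    candidates = [
        m for m in catalog
        if (year_min is None or m["year"] >= year_min)
        and (language is None or m["language"] == language)
    ]
    scored = [(cosine(query_vec, m["vector"]), m["title"]) for m in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(title, score) for score, title in scored[:k]]


print(search([1.0, 0.0], CATALOG, k=2, year_min=2010))
```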
- Uses Groq API (Llama 3.1 70B) or Ollama for intelligent re-ranking
- Contextual understanding of nuanced queries
- Generates human-readable explanations for each recommendation
- Cost-optimized with caching and free tier APIs
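Caching is what keeps LLM calls within a free tier: identical queries should not trigger a second re-ranking request. The sketch below is an in-memory cache with a time-to-live, keyed on a normalized query. `TTLCache`, the `rerank:` key prefix, and the TTL value are assumptions for illustration; the project itself uses Redis for this.

```python
import hashlib
import time


class TTLCache:
    """Minimal in-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested
        self._store: dict[str, tuple[float, object]] = {}

    @staticmethod
    def key_for(query: str) -> str:
        # Normalize case and whitespace so trivially different queries share a key.
        normalized = " ".join(query.lower().split())
        return "rerank:" + hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def get(self, query: str):
        key = self.key_for(query)
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._store[key]
            return None
        return value

    def set(self, query: str, value) -> None:
        self._store[self.key_for(query)] = (self._clock() + self._ttl, value)


cache = TTLCache(ttl_seconds=3600)
cache.set("Series like Dark", ["cached result"])
print(cache.get("  series LIKE dark "))
```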
- 10,000+ movies and TV shows from TMDB
- Cast, crew, keywords, genres, ratings, popularity
- Streaming availability (Netflix, Prime, Disney+, etc.)
- Posters, backdrops, trailers
| Feature | Letterboxd | FilmCrave | MOVIERECS.AI | FilmFind |
|---|---|---|---|---|
| Natural Language Queries | ❌ Basic search | ❌ No | | ✅ Deep intent extraction |
| Semantic Vector Search | ❌ | ❌ | ❌ | ✅ FAISS-powered |
| Emotion-Aware Matching | ❌ | ❌ | | ✅ 8-dimensional emotion vectors |
| LLM Re-Ranking | ❌ | ❌ | ❌ | ✅ RAG with Llama 3.1 |
| Complex Multi-Condition Queries | ❌ | ❌ | ❌ | ✅ Fully supported |
| Explainable Recommendations | ❌ | ❌ | ❌ | ✅ XAI with reasoning |
| Multi-Language Support | ❌ | ❌ | ❌ | ✅ 10+ languages |
| Streaming Provider Filters | ❌ | ❌ | | ✅ Full integration |
| Cost | Social only | Paid | Limited free | ✅ 100% Free Tier |
- Emotion-Aware Engine: Scores movies across 8 emotional dimensions (Joy, Fear, Sadness, Awe, Thrill, Hope, Dark tone, Romance)
- Hybrid Vector Embeddings: Combines semantic, emotional, genre, and cast vectors into a unified representation
- LLM Query Rewrite: Transforms queries into optimized search vectors with theme extraction
- Multi-Agent System: Specialized agents for intent, emotion, filtering, retrieval, and re-ranking
- Explainable AI: Every recommendation includes thematic similarity %, emotional match %, and reasoning
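One plausible way to build a hybrid vector is to normalize each component, weight it, and concatenate. The sketch below does this for a semantic vector and the eight emotion scores listed above. The `emotion_weight` value and the function names are assumptions, and the genre and cast components are left out for brevity.

```python
import math

# The eight emotional dimensions named above.
EMOTIONS = ("joy", "fear", "sadness", "awe", "thrill", "hope", "dark_tone", "romance")


def l2_normalize(vec: list[float]) -> list[float]:
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def emotion_vector(scores: dict[str, float]) -> list[float]:
    """Order the scores by EMOTIONS, defaulting missing dimensions to 0.0."""
    return [float(scores.get(name, 0.0)) for name in EMOTIONS]


def hybrid_embedding(semantic, emotion_scores, emotion_weight=0.3):
    """Concatenate weighted, L2-normalized semantic and emotion vectors."""
    sem = [x * (1.0 - emotion_weight) for x in l2_normalize(semantic)]
    emo = [x * emotion_weight for x in l2_normalize(emotion_vector(emotion_scores))]
    return sem + emo


vec = hybrid_embedding([3.0, 4.0], {"fear": 0.8, "dark_tone": 0.6})
print(len(vec))
```

Normalizing before weighting keeps one component from dominating simply because its raw values are larger. With a 768-dimensional semantic embedding, the result would have 776 dimensions.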
FilmFind uses a multi-layered AI pipeline with the following components:
User Query → NLP Understanding → Semantic Retrieval → Multi-Signal Scoring → LLM Re-Ranking → Explainable Output
- Data Pipeline
  - TMDB API integration for movie metadata
  - Embedding generation using sentence-transformers
  - Vector database (FAISS) for similarity search
  - PostgreSQL/Supabase for metadata storage
- Intelligence Layer
  - NLP Engine: Query parsing and intent extraction
  - Embedding Service: Semantic vector generation
  - Vector Search: FAISS similarity retrieval
  - Scoring Engine: Multi-signal ranking
  - LLM Re-Ranker: Contextual re-ranking with Groq/Ollama
- API & Backend
  - FastAPI REST API
  - Redis caching (Upstash)
  - Background jobs for data updates
  - Rate limiting and monitoring
- Frontend
  - Next.js 14+ with App Router
  - TailwindCSS + ShadCN UI
  - Real-time search with debouncing
  - Responsive design
- Infrastructure
  - AWS ECS (Docker) for backend
  - Vercel for frontend
  - AWS RDS PostgreSQL
  - AWS S3 + CloudFront
  - Upstash Redis
Complete end-to-end architecture showing all components and data flow.
High-level flow of data through the system from query to recommendations.
Step-by-step processing pipeline with all validation and filtering stages.
Interaction sequence between all system components during a search request.
- Python 3.11+: Core language
- FastAPI: High-performance async API framework
- SQLAlchemy: ORM for database operations
- PostgreSQL: Primary database (AWS RDS or Supabase)
- FAISS: Vector similarity search (Facebook AI)
- Redis: Caching layer (Upstash free tier)
- APScheduler: Background job scheduling
- sentence-transformers/all-mpnet-base-v2: Semantic embeddings (768-dim)
- Groq API: LLM for query understanding and re-ranking (free tier: 30 req/min)
- Ollama: Local LLM alternative (Llama 3.2, unlimited)
- spaCy: NLP for text processing and entity extraction
- Next.js 14+: React framework with App Router
- TypeScript: Type-safe JavaScript
- TailwindCSS: Utility-first CSS framework
- ShadCN UI: Beautiful accessible components
- Zustand: Lightweight state management
- Docker: Containerization
- GitHub Actions: CI/CD pipeline (2,000 min/month free)
- AWS ECS: Container orchestration (750 hours/month free for 12 months)
- Vercel: Frontend hosting (free forever for hobby projects)
- AWS RDS: Database (t3.micro, 750 hours/month free for 12 months)
- AWS S3: Object storage (5GB free)
- AWS CloudFront: CDN (1TB transfer/month free)
- Sentry: Error monitoring (5k events/month free)
- TMDB API: Movie metadata, cast, crew, keywords (free tier)
- IMDb Datasets: Additional ratings and metadata (free on Kaggle)
Following the Single Responsibility Principle, the TMDB service is split into three focused modules:
from app.services.TMDB import TMDBAPIClient
client = TMDBAPIClient(api_key="your_key")
movie = client.get_movie(movie_id=550)
popular = client.get_popular_movies(page=1)
- ✅ Handles all TMDB API requests
- ✅ Built-in rate limiting (40 requests/10s)
- ✅ Automatic error handling
- ✅ Uses HTTPClient utility for retry logic
from app.services.TMDB import TMDBDataValidator
validator = TMDBDataValidator()
is_valid = validator.validate_movie(raw_data)
cleaned = validator.clean_movie_data(raw_data)
- ✅ Validates required fields
- ✅ Normalizes data structure
- ✅ Handles missing/invalid dates
- ✅ Extracts genres, cast, keywords
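The validation and cleaning steps can be sketched as below. This is not the project's `TMDBDataValidator`; the required-field list and output shape are assumptions. It relies only on two facts about TMDB movie payloads: `release_date` is a `YYYY-MM-DD` string (possibly empty), and `genres` is a list of objects with a `name` key.

```python
from datetime import date

# Assumed minimum set of fields a record must have to be usable.
REQUIRED_FIELDS = ("id", "title")


def parse_release_date(value):
    """Return a date, or None when the value is missing or malformed."""
    if not value:
        return None
    try:
        return date.fromisoformat(value)
    except (TypeError, ValueError):
        return None


def validate_movie(raw: dict) -> bool:
    return all(raw.get(name) not in (None, "") for name in REQUIRED_FIELDS)


def clean_movie_data(raw: dict) -> dict:
    """Normalize a raw payload into a flat, predictable structure."""
    return {
        "id": raw["id"],
        "title": raw["title"].strip(),
        "release_date": parse_release_date(raw.get("release_date")),
        "genres": [g["name"] for g in raw.get("genres", [])],
    }


raw = {"id": 1, "title": " Example ", "release_date": "", "genres": [{"id": 18, "name": "Drama"}]}
if validate_movie(raw):
    print(clean_movie_data(raw))
```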
from app.services.TMDB import TMDBService
with TMDBService() as service:
    movie = service.fetch_movie(550)  # Fetch + validate + clean
    popular = service.fetch_popular_movies()  # Batch fetch with validation
    genres = service.get_all_genres()
- ✅ Simple interface to complex operations
- ✅ Automatic validation and cleaning
- ✅ Context manager for resource cleanup
- ✅ Batch operations with pagination
We've built a comprehensive utilities module following SOLID principles:
from app.utils import HTTPClient
client = HTTPClient(base_url="https://api.example.com", timeout=30)
data = client.get_json("/endpoint", params={"key": "value"})
- ✅ Automatic retry with exponential backoff
- ✅ Built-in logging and error handling
- ✅ Context manager support
- ✅ Reusable across all services
from app.utils import RateLimiter
limiter = RateLimiter(max_requests=30, time_window=60)
limiter.check_and_wait()  # Automatically waits if limit exceeded
- ✅ Sliding window algorithm
- ✅ Configurable limits
- ✅ Safe for single-threaded apps (no locking for concurrent callers)
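A sliding-window limiter can be sketched with a deque of request timestamps. The class below mirrors the `max_requests`, `time_window`, and `check_and_wait` names from the usage above, but the body is an illustrative guess rather than the project's code. The injectable `clock` and `sleep` parameters are additions for testability, and the class is not thread-safe.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most max_requests calls in any time_window seconds."""

    def __init__(self, max_requests, time_window, clock=time.monotonic, sleep=time.sleep):
        self.max_requests = max_requests
        self.time_window = time_window
        self._clock = clock
        self._sleep = sleep
        self._timestamps = deque()

    def _evict(self, now):
        # Drop timestamps that have fallen out of the window.
        while self._timestamps and now - self._timestamps[0] >= self.time_window:
            self._timestamps.popleft()

    def check_and_wait(self):
        now = self._clock()
        self._evict(now)
        if len(self._timestamps) >= self.max_requests:
            # Sleep until the oldest request leaves the window.
            self._sleep(self.time_window - (now - self._timestamps[0]))
            now = self._clock()
            self._evict(now)
        self._timestamps.append(now)


limiter = SlidingWindowLimiter(max_requests=30, time_window=60)
limiter.check_and_wait()  # Returns immediately; the window is empty.
```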
from app.utils import setup_logger, get_logger
setup_logger("logs/app.log", "INFO")
logger = get_logger(__name__)
logger.info("Application started")
- ✅ Console + file logging
- ✅ Log rotation (10 MB)
- ✅ Color-coded output
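Console plus rotating file logging can be done with the standard library alone. The sketch below keeps the `setup_logger` and `get_logger` names from the usage above, but the body is an assumption; the project's own implementation may use a different logging library. Color output is omitted, and `backupCount=3` is an arbitrary choice.

```python
import logging
import sys
from logging.handlers import RotatingFileHandler
from pathlib import Path


def setup_logger(log_file: str, level: str = "INFO") -> None:
    """Attach a console handler and a 10 MB rotating file handler to the root logger."""
    Path(log_file).parent.mkdir(parents=True, exist_ok=True)
    formatter = logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
    handlers = (
        RotatingFileHandler(log_file, maxBytes=10 * 1024 * 1024, backupCount=3),
        logging.StreamHandler(sys.stderr),
    )
    root = logging.getLogger()
    root.setLevel(level)
    for handler in handlers:
        handler.setFormatter(formatter)
        root.addHandler(handler)


def get_logger(name: str) -> logging.Logger:
    return logging.getLogger(name)
```

Calling `setup_logger` more than once would attach duplicate handlers, so it is meant to run once at application startup.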
from app.utils import retry_with_backoff
@retry_with_backoff(max_retries=3, initial_delay=1.0)
def fetch_data():
    return api.get("/data")
- ✅ Configurable retries
- ✅ Exponential backoff
- ✅ Custom exception handling
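A retry decorator with exponential backoff fits in a few lines. The version below accepts the `max_retries` and `initial_delay` arguments shown above. The `factor`, `exceptions`, and `sleep` parameters are assumptions added for this sketch and may not match the project's signature.

```python
import functools
import time


def retry_with_backoff(max_retries=3, initial_delay=1.0, factor=2.0,
                       exceptions=(Exception,), sleep=time.sleep):
    """Retry the wrapped function, multiplying the delay by factor after each failure."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            delay = initial_delay
            for attempt in range(max_retries + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == max_retries:
                        raise  # Out of retries: surface the last error.
                    sleep(delay)
                    delay *= factor
        return wrapper
    return decorator


@retry_with_backoff(max_retries=2, initial_delay=0.01, exceptions=(ConnectionError,))
def always_ok():
    return "ok"


print(always_ok())
```

Restricting `exceptions` to transient errors such as `ConnectionError` avoids retrying failures that will never succeed, for example a 404 response or a validation error.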
Centralized constants for better maintainability:
- 🔗 API URLs (TMDB, Groq, Ollama)
- 🎯 LLM models and configurations
- 📊 Scoring weights and dimensions
- 🗄️ Cache TTLs and key prefixes
- 🌍 Supported languages and genres
- ⚙️ All magic numbers and strings in one place
- ✅ Single Responsibility Principle - Each class has one clear purpose
- ✅ Dependency Injection - Services don't create their dependencies
- ✅ Facade Pattern - Simple interfaces to complex subsystems
- ✅ Strategy Pattern - Multiple ingestion strategies
- ✅ Decorator Pattern - Retry logic via decorators
- ✅ Context Manager - Proper resource cleanup
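Three of these patterns (dependency injection, facade, and context manager) combine naturally, as the sketch below shows. `StubClient`, `StubValidator`, and `MovieFacade` are hypothetical stand-ins written for this example. They are not the project's `TMDBAPIClient`, `TMDBDataValidator`, or `TMDBService`, and the stub returns canned data instead of calling TMDB.

```python
class StubClient:
    """Stand-in API client that returns canned data and tracks cleanup."""

    def __init__(self):
        self.closed = False

    def get_movie(self, movie_id):
        return {"id": movie_id, "title": "  Example Title  "}

    def close(self):
        self.closed = True


class StubValidator:
    def validate_movie(self, raw):
        return bool(raw.get("id")) and bool(raw.get("title"))

    def clean_movie_data(self, raw):
        return {**raw, "title": raw["title"].strip()}


class MovieFacade:
    """Facade: one call hides the fetch, validate, and clean steps."""

    def __init__(self, client, validator):
        # Dependencies are injected rather than constructed here.
        self._client = client
        self._validator = validator

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self._client.close()  # Release resources even if the block raised.
        return False

    def fetch_movie(self, movie_id):
        raw = self._client.get_movie(movie_id)
        if not self._validator.validate_movie(raw):
            return None
        return self._validator.clean_movie_data(raw)


client = StubClient()
with MovieFacade(client, StubValidator()) as service:
    movie = service.fetch_movie(550)
print(movie, client.closed)
```

Because the facade receives its collaborators, tests can pass in stubs like these and never touch the network.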
- Python 3.11 or higher
- Node.js 18+ and npm/yarn
- PostgreSQL 14+ (or Supabase account)
- Redis (local or Upstash account)
- TMDB API key (free)
- Groq API key (free tier)
git clone https://github.com/yourusername/filmfind.git
cd filmfind
cd backend
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Create .env file
cat > .env << EOF
TMDB_API_KEY=your_tmdb_key
GROQ_API_KEY=your_groq_key
DATABASE_URL=postgresql://user:password@localhost:5432/filmfind
REDIS_URL=redis://localhost:6379
VECTOR_MODEL=sentence-transformers/all-mpnet-base-v2
LLM_PROVIDER=groq
EOF
cd ../frontend
# Install dependencies
npm install
# Create .env.local file
cat > .env.local << EOF
NEXT_PUBLIC_API_URL=http://localhost:8000
EOF
cd ../backend
# Run migrations
alembic upgrade head
# Optional: Seed with sample data
python scripts/seed_data.py
# Fetch movies from TMDB
python scripts/ingest_tmdb.py --limit 10000
# Generate embeddings
python scripts/generate_embeddings.py
# Build vector index
python scripts/build_index.py
# Terminal 1: Start backend
cd backend
uvicorn app.main:app --reload --port 8000
# Terminal 2: Start frontend
cd frontend
npm run dev
Open http://localhost:3000 in your browser.
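To show how the backend `.env` values from the setup steps might be consumed, here is a standard-library sketch that reads them from the environment and fails fast when a required one is missing. The project structure indicates that `config.py` uses Pydantic, so treat `Settings` and `load_settings` here as a simplified, hypothetical equivalent. Which keys count as required is also an assumption.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    tmdb_api_key: str
    groq_api_key: str
    database_url: str
    redis_url: str = "redis://localhost:6379"
    vector_model: str = "sentence-transformers/all-mpnet-base-v2"
    llm_provider: str = "groq"


def load_settings(env=None) -> Settings:
    """Build Settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    required = ("TMDB_API_KEY", "GROQ_API_KEY", "DATABASE_URL")
    missing = [key for key in required if not env.get(key)]
    if missing:
        raise RuntimeError("Missing required settings: " + ", ".join(missing))
    defaults = Settings(env["TMDB_API_KEY"], env["GROQ_API_KEY"], env["DATABASE_URL"])
    return Settings(
        tmdb_api_key=defaults.tmdb_api_key,
        groq_api_key=defaults.groq_api_key,
        database_url=defaults.database_url,
        redis_url=env.get("REDIS_URL", defaults.redis_url),
        vector_model=env.get("VECTOR_MODEL", defaults.vector_model),
        llm_provider=env.get("LLM_PROVIDER", defaults.llm_provider),
    )
```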
filmfind/
├── backend/
│ ├── app/
│ │ ├── api/
│ │ │ ├── routes/
│ │ │ │ ├── search.py # Search endpoints
│ │ │ │ ├── movies.py # Movie detail endpoints
│ │ │ │ └── filters.py # Filter endpoints
│ │ │ └── dependencies.py # Dependency injection
│ │ ├── core/
│ │ │ ├── config.py # Environment settings (Pydantic)
│ │ │ ├── constants.py # Application constants ✨
│ │ │ ├── database.py # Database connection
│ │ │ └── cache.py # Redis cache wrapper
│ │ ├── models/
│ │ │ ├── movie.py # Movie ORM models (SQLAlchemy)
│ │ │ └── user.py # User models (optional)
│ │ ├── services/
│ │ │ ├── TMDB/ # TMDB Service Module ✅ Module 1.1
│ │ │ │ ├── __init__.py # Module exports
│ │ │ │ ├── tmdb_client.py # API HTTP client (SRP)
│ │ │ │ ├── tmdb_validator.py # Data validation & cleaning (SRP)
│ │ │ │ └── tmdb_service.py # High-level facade
│ │ │ ├── embedding_service.py # Embedding generation
│ │ │ ├── vector_search.py # FAISS vector search
│ │ │ ├── query_parser.py # NLP query parsing
│ │ │ ├── reranker.py # LLM re-ranking
│ │ │ └── scoring_engine.py # Multi-signal scoring
│ │ ├── schemas/
│ │ │ ├── search.py # Search request/response (Pydantic)
│ │ │ └── movie.py # Movie schemas (Pydantic)
│ │ ├── utils/ # Reusable utilities ✨
│ │ │ ├── rate_limiter.py # Rate limiting utility
│ │ │ ├── http_client.py # HTTP client with retry
│ │ │ ├── logger.py # Logging setup
│ │ │ └── retry.py # Retry decorator
│ │ └── main.py # FastAPI app entry point
│ ├── scripts/
│ │ ├── ingest_tmdb.py # Data ingestion ✅ Module 1.1
│ │ ├── generate_embeddings.py # Embedding generation
│ │ └── build_index.py # Vector index builder
│ ├── tests/
│ │ ├── test_search.py # Search endpoint tests
│ │ └── test_embeddings.py # Embedding tests
│ ├── data/
│ │ ├── raw/ # Raw TMDB JSON data
│ │ ├── processed/ # Cleaned data
│ │ └── embeddings/ # Vector embeddings
│ ├── logs/ # Application logs
│ ├── requirements.txt # Python dependencies
│ ├── .env.example # Environment template
│ ├── Dockerfile # Docker configuration
│ └── README.md # Backend documentation
│
├── frontend/
│ ├── app/
│ │ ├── page.tsx # Home page
│ │ ├── search/
│ │ │ └── page.tsx # Search page
│ │ ├── movie/[id]/
│ │ │ └── page.tsx # Movie detail page
│ │ └── layout.tsx # Root layout
│ ├── components/
│ │ ├── SearchBar.tsx # Search input component
│ │ ├── MovieCard.tsx # Movie card component
│ │ ├── FilterPanel.tsx # Filter sidebar
│ │ └── ui/ # ShadCN UI components
│ ├── lib/
│ │ ├── api.ts # API client
│ │ └── utils.ts # Utility functions
│ ├── hooks/
│ │ └── useSearch.ts # Search hook
│ ├── package.json
│ └── next.config.js
│
├── images/ # Architecture diagrams
│ ├── System Archeitecture.png
│ ├── Flow Diagram.png
│ ├── Flow Chart.png
│ └── Sequence-diagram.png
│
├── docs/
│ ├── architecture.md # Architecture documentation
│ ├── api.md # API documentation
│ └── deployment.md # Deployment guide
│
├── .github/
│ └── workflows/
│ └── ci-cd.yml # GitHub Actions workflow
│
├── docker-compose.yml # Docker compose configuration
├── plan.md # Implementation plan
├── Project Overview # Technical design doc
└── README.md # This file
Development: http://localhost:8000
Production: https://api.filmfind.com
POST /api/search
Content-Type: application/json
{
"query": "dark sci-fi movies like Interstellar with less romance",
"limit": 10,
"filters": {
"year_min": 2010,
"year_max": 2024,
"language": "en",
"genres": ["Science Fiction"]
}
}
Response:
{
"results": [
{
"id": 157336,
"title": "Interstellar",
"overview": "The adventures of a group of explorers...",
"rating": 8.4,
"match_score": 0.95,
"similarity_explanation": "Strong thematic match: space exploration, time dilation...",
"poster_url": "https://image.tmdb.org/...",
"genres": ["Science Fiction", "Drama"],
"release_date": "2014-11-07"
}
],
"count": 10,
"query_interpretation": {
"themes": ["space", "dark", "science fiction"],
"excluded": ["romance"],
"reference_movies": ["Interstellar"]
}
}
GET /api/similar/{movie_id}?limit=10
GET /api/movie/{movie_id}
POST /api/filter
Content-Type: application/json
{
"genres": ["Thriller", "Mystery"],
"year_min": 2015,
"rating_min": 7.0,
"language": "en",
"streaming_providers": ["Netflix", "Prime Video"]
}
GET /api/trending?limit=20&time_window=week
For complete API documentation, see docs/api.md or visit /docs (Swagger UI) when running the backend.
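A client for the search endpoint can be sketched with the standard library. The helper below builds the `POST /api/search` request in the shape documented above, and a second helper pulls titles and match scores out of a response body. `build_search_request` and `top_titles` are hypothetical names, and nothing is sent over the network here.

```python
import json
import urllib.request


def build_search_request(base_url, query, limit=10, filters=None):
    """Build (but do not send) a POST /api/search request."""
    payload = {"query": query, "limit": limit}
    if filters:
        payload["filters"] = filters
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/search",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def top_titles(response_body: dict):
    """Return (title, match_score) pairs from a search response."""
    return [(r["title"], r["match_score"]) for r in response_body.get("results", [])]


request = build_search_request(
    "http://localhost:8000",
    "dark sci-fi movies like Interstellar with less romance",
    filters={"year_min": 2010, "language": "en"},
)
print(request.get_method(), request.full_url)
```

With the backend running, passing `request` to `urllib.request.urlopen` and decoding the body with `json.load` would give a dictionary that `top_titles` can read.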
- Mobile app (React Native)
- Episode-level recommendations for TV shows
- Real-time collaborative filtering
- Multi-user social recommendations
- Integration with more streaming services
- Podcast and documentary support
- ✅ Search response time < 500ms
- ✅ 90%+ relevant results for test queries
- ✅ Frontend Lighthouse score > 90
- ✅ API uptime > 99%
- ✅ 10,000+ movies indexed
- ✅ Support for 10+ languages
- ✅ Zero monthly costs (within free tier limits)
- ✅ Cache hit rate > 70%
- ✅ LLM calls within Groq free tier (30 req/min)
Query: "Shows like Stranger Things but with more horror elements"
FilmFind understands:
- Reference: Stranger Things
- Enhancement: More horror/darker tone
- Themes: Supernatural, group of kids, 80s setting
- Recommended: The Twilight Zone, Locke & Key, Dark, Archive 81
Query: "Movies about F1 racing with intense competition and personal rivalries"
FilmFind understands:
- Themes: Formula 1, racing, competition, rivalry
- Tone: Intense, dramatic
- Sport: Motorsport/F1
- Recommended: Rush, Ford v Ferrari, Senna, Grand Prix, Days of Thunder
Query: "Mystery and magical adventures like Harry Potter with school settings"
FilmFind understands:
- Reference: Harry Potter
- Themes: Magic, mystery, coming-of-age
- Setting: School/academy
- Genre: Fantasy + Mystery
- Recommended: The Chronicles of Narnia, Percy Jackson, His Dark Materials, The Magicians, A Discovery of Witches
Query: "Lighthearted sitcoms like Friends about group of friends navigating life and relationships"
FilmFind understands:
- Reference: Friends
- Themes: Friendship, relationships, comedy of life
- Tone: Lighthearted, feel-good
- Genre: Sitcom
- Recommended: How I Met Your Mother, New Girl, Brooklyn Nine-Nine, The Big Bang Theory, Modern Family
We welcome contributions! Here's how you can help:
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
- Write clear, commented code
- Follow PEP 8 for Python code
- Use TypeScript for frontend code
- Write tests for new features
- Update documentation as needed
- 🐛 Bug fixes
- ✨ New features
- 📝 Documentation improvements
- 🎨 UI/UX enhancements
- 🔧 Performance optimizations
- 🧪 Test coverage
- 🌍 Internationalization
This project is licensed under the MIT License - see the LICENSE file for details.
Dheeraj Srirama
- GitHub: @dheerajsrirama
- LinkedIn: Dheeraj Srirama
- Email: sriramadheeraj@gmail.com
- TMDB for the comprehensive movie database API
- Groq for providing free tier LLM API access
- Sentence Transformers for excellent embedding models
- FastAPI for the amazing Python framework
- Next.js for the powerful React framework
- ShadCN UI for beautiful UI components
👨‍💻 Author
Dheeraj Srirama
- 🌐 Portfolio: dheerajsrirama.netlify.app
- 💼 LinkedIn: dheerajsrirama
- 🐙 GitHub: @dheerajram13
- 📧 Email: sriramadheeraj@gmail.com
If you have any questions, issues, or suggestions:
- 🐛 Issues: GitHub Issues
If you find FilmFind useful, please consider:
- Giving it a ⭐ on GitHub
- Sharing it with others
- Contributing to the project
- Reporting bugs and suggesting features




