A simple yet powerful command-line application for exploring IMDB movie data using PostgreSQL.
The project demonstrates SQL data modeling, joins, filtering, and query optimization on a real-world dataset of over one million movies.
- Explore movies by title, genre, length, rating and year range
- Sort results dynamically by rating, release year or length
- Fetch top-rated movies or movies within a specific time frame
- Secure, parameterized queries to prevent SQL injection
- Lightweight Python CLI interface
This project was created as a hands-on exercise in:
- Data modeling and normalization in PostgreSQL
- Writing optimized and safe SQL queries
- Integrating a Python CLI interface with a relational database
It uses cleaned IMDB datasets (not included due to file size limits).
You can download them from IMDB datasets.
cinemadb/
│
├── app/
│ ├── main.py # CLI interface
│ ├── db.py # PostgreSQL connection setup
│ ├── queries.py # SQL queries and dynamic filters
│
├── data/
│ ├── check_title.basics.py # Validates and cleans title.basics.tsv
│ ├── null.py # Converts values to PostgreSQL NULL format
│
├── sql/
│ ├── create_tables.sql # Schema definition for movies and ratings
│
├── requirements.txt
├── README.md
├── LICENSE
└── .gitignore
Before importing IMDB data into PostgreSQL, the raw .tsv files must be validated and cleaned.
This ensures correct column formatting and replaces invalid or malformed values.
All preprocessing scripts are located in the data/ folder.
Run the check_title.basics.py script to verify column types, handle malformed rows, and produce two output files:
title.basics.cleaned.csv→ valid and cleaned datatitle.basics.errors.csv→ rows that failed validation
Example:
cd data
python3 check_title.basics.pyNext, run the null.py script on the cleaned file. This replaces all "\N" strings with proper \N for PostgreSQL import.
Example:
python3 null.py title.basics.cleaned.csvThis creates a new file:
title.basics.cleaned.null.csv- Create a PostgreSQL database:
createdb moviesimdb
- Create the required tables:
psql -d moviesimdb -f sql/create_tables.sql
- Import the movie titles:
\copy moviesimdb(tconst, titleType, primaryTitle, originalTitle, isAdult, startYear, endYear, runtimeMinutes, genres) FROM '/home/path/to/your/file/title.basics.cleaned.null.csv' WITH (FORMAT csv, HEADER true, NULL '\N');
- Import the movie ratings:
\copy ratings(tconst, averageRating, numVotes) FROM '/home/path/to/your/file/title.ratings.tsv' WITH (FORMAT csv, HEADER true, DELIMITER E'\t', NULL '\N');
- Verify that the tables were populated correctly:
SELECT COUNT(*) FROM moviesimdb LIMIT 10; SELECT COUNT(*) FROM ratings LIMIT 10;
- Create and activate a virtual environment:
python3 -m venv venv source venv/bin/activate - Install dependencies:
pip install -r requirements.txt
- Run the CLI:
python3 -m app.main
Here is an example output of the Cinemadb app:
- PostgreSQL 15+
- Python 3.10+
- psycopg2 (PostgreSQL driver)
- tabulate (CLI table formatting)
- Linux (Kubuntu)
