🎬 Movie Exploration System (PostgreSQL + Python CLI)

A simple yet powerful command-line application for exploring IMDB movie data using PostgreSQL.
The project demonstrates SQL data modeling, joins, filtering, and query optimization on a real-world dataset of over one million movies.

🚀 Features

- Explore movies by title, genre, length, rating and year range
- Sort results dynamically by rating, release year or length
- Fetch top-rated movies or movies within a specific time frame
- Secure, parameterized queries to prevent SQL injection
- Lightweight Python CLI interface
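The filtering, sorting and injection-safety features above can be illustrated with a small query builder. This is a minimal sketch, not the code in `queries.py`: the `MovieFilter` class and `build_movie_query` function are hypothetical, and the table and column names are assumed from the `\copy` commands in the Database Setup section. User input only ever reaches PostgreSQL as `%s` parameters, while the sort column is picked from a fixed whitelist, because identifiers such as column names cannot be passed as parameters.

```python
from dataclasses import dataclass
from typing import Optional

# Allowed sort keys mapped to fixed column expressions (names assumed from the
# \copy commands). ORDER BY is only ever built from this whitelist.
SORT_COLUMNS = {
    "rating": "r.averageRating",
    "year": "m.startYear",
    "length": "m.runtimeMinutes",
}


@dataclass
class MovieFilter:
    """Hypothetical filter object; every filter field is optional."""
    title: Optional[str] = None
    genre: Optional[str] = None
    min_rating: Optional[float] = None
    year_from: Optional[int] = None
    year_to: Optional[int] = None
    max_runtime: Optional[int] = None
    sort_by: str = "rating"
    descending: bool = True
    limit: int = 20


def build_movie_query(flt: MovieFilter) -> tuple[str, list]:
    """Return (sql, params); user values only travel as %s parameters."""
    clauses: list[str] = []
    params: list = []
    if flt.title:
        clauses.append("m.primaryTitle ILIKE %s")
        params.append(f"%{flt.title}%")  # case-insensitive substring match
    if flt.genre:
        # Assumes genres is IMDb's comma-separated text, e.g. "Drama,Romance".
        clauses.append("%s = ANY(string_to_array(m.genres, ','))")
        params.append(flt.genre)
    if flt.min_rating is not None:
        clauses.append("r.averageRating >= %s")
        params.append(flt.min_rating)
    if flt.year_from is not None:
        clauses.append("m.startYear >= %s")
        params.append(flt.year_from)
    if flt.year_to is not None:
        clauses.append("m.startYear <= %s")
        params.append(flt.year_to)
    if flt.max_runtime is not None:
        clauses.append("m.runtimeMinutes <= %s")
        params.append(flt.max_runtime)

    if flt.sort_by not in SORT_COLUMNS:
        raise ValueError(f"unsupported sort key: {flt.sort_by!r}")
    direction = "DESC" if flt.descending else "ASC"

    where = " WHERE " + " AND ".join(clauses) if clauses else ""
    sql = (
        "SELECT m.primaryTitle, m.startYear, m.runtimeMinutes, r.averageRating"
        " FROM moviesimdb m JOIN ratings r ON r.tconst = m.tconst"
        f"{where} ORDER BY {SORT_COLUMNS[flt.sort_by]} {direction} NULLS LAST"
        " LIMIT %s"
    )
    params.append(flt.limit)
    return sql, params
```

With psycopg2 the pair would be passed as `cur.execute(sql, params)`. The inner `JOIN` skips titles that have no rating; a `LEFT JOIN` would keep them.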

🧩 About the Project

This project was created as a hands-on exercise in:

- Data modeling and normalization in PostgreSQL
- Writing optimized and safe SQL queries
- Integrating a Python CLI interface with a relational database

It uses cleaned IMDB datasets, which are not included in this repository due to file size limits.
You can download the raw files from the official IMDb datasets page: https://datasets.imdbws.com/
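The two raw files used in this README can be fetched with the standard library alone. This is a minimal sketch, assuming the official download host https://datasets.imdbws.com/ and its gzip-compressed file names (`title.basics.tsv.gz`, `title.ratings.tsv.gz`); `fetch_dataset` and `gunzip` are hypothetical helpers, not part of this repository.

```python
import gzip
import shutil
import urllib.request
from pathlib import Path

# Official IMDb dataset host; each file is published as <name>.gz.
BASE_URL = "https://datasets.imdbws.com/"


def gunzip(src: Path, dst: Path) -> Path:
    """Stream-decompress a .gz file so the large TSV never sits in memory."""
    with gzip.open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dst


def fetch_dataset(name: str, dest_dir: Path) -> Path:
    """Download <name>.gz into dest_dir and return the decompressed path."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    archive = dest_dir / f"{name}.gz"
    urllib.request.urlretrieve(BASE_URL + archive.name, archive)
    return gunzip(archive, dest_dir / name)
```

For example, `fetch_dataset("title.basics.tsv", Path("data"))` would leave `data/title.basics.tsv` next to the preprocessing scripts.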

🧱 Project Structure

```
cinemadb/
│
├── app/
│   ├── main.py            # CLI interface
│   ├── db.py              # PostgreSQL connection setup
│   ├── queries.py         # SQL queries and dynamic filters
│
├── data/
│   ├── check_title.basics.py   # Validates and cleans title.basics.tsv
│   ├── null.py                 # Converts values to PostgreSQL NULL format
│
├── sql/
│   ├── create_tables.sql       # Schema definition for movies and ratings
│
├── requirements.txt
├── README.md
├── LICENSE
└── .gitignore
```
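`db.py` is described only as the connection setup. One common shape for such a module is sketched below; it is an assumption, not the repository's code. The `PG*` names are the standard libpq environment variables, and `moviesimdb` is the database created in the Database Setup section. Settings that are not provided are left out, so libpq falls back to its own defaults; `connection_params` and `get_cursor` are hypothetical names.

```python
import os
from contextlib import contextmanager


def connection_params(env=None) -> dict:
    """Build psycopg2.connect() keyword arguments from PG* variables."""
    env = os.environ if env is None else env
    params = {"dbname": env.get("PGDATABASE", "moviesimdb")}
    for key, var in (("user", "PGUSER"), ("password", "PGPASSWORD"),
                     ("host", "PGHOST"), ("port", "PGPORT")):
        if env.get(var):  # only pass settings that are actually set
            params[key] = env[var]
    return params


@contextmanager
def get_cursor(env=None):
    """Yield a cursor; commit on success, roll back on error, always close."""
    import psycopg2  # imported lazily so connection_params works without the driver

    conn = psycopg2.connect(**connection_params(env))
    try:
        with conn.cursor() as cur:
            yield cur
        conn.commit()
    except Exception:
        conn.rollback()
        raise
    finally:
        conn.close()
```

A caller would then write `with get_cursor() as cur: cur.execute(sql, params)`.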

🧼 Data Cleaning (Preprocessing)

Before importing IMDB data into PostgreSQL, the raw .tsv files must be validated and cleaned.
This ensures correct column formatting and replaces invalid or malformed values.

All preprocessing scripts are located in the data/ folder.

1️⃣ Step 1 – Validate and clean `title.basics.tsv`

Run the `check_title.basics.py` script to verify column types, handle malformed rows, and produce two output files:

- `title.basics.cleaned.csv` → valid and cleaned data
- `title.basics.errors.csv` → rows that failed validation

Example:

```
cd data
python3 check_title.basics.py
```
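The kind of checks `check_title.basics.py` performs can be sketched as follows. This is a hypothetical reimplementation, not the script itself: the rules (a `tt` + digits `tconst`, `isAdult` of `0` or `1`, four-digit years, integer runtimes, with `\N` allowed where IMDb leaves a value empty) are assumptions based on the column layout used in the `\copy` command. IMDb's TSV files do not quote fields and some titles contain double quotes, so the file is read with `csv.QUOTE_NONE` to keep those quotes as literal characters.

```python
import csv
import re
from pathlib import Path

COLUMNS = ["tconst", "titleType", "primaryTitle", "originalTitle", "isAdult",
           "startYear", "endYear", "runtimeMinutes", "genres"]
NULL = r"\N"  # IMDb's marker for a missing value
TCONST = re.compile(r"tt[0-9]+")
YEAR = re.compile(r"[0-9]{4}")
INT = re.compile(r"[0-9]+")


def row_errors(row: list[str]) -> list[str]:
    """Return the problems found in one row (an empty list means valid)."""
    if len(row) != len(COLUMNS):
        return [f"expected {len(COLUMNS)} columns, got {len(row)}"]
    rec = dict(zip(COLUMNS, row))
    errors = []
    if not TCONST.fullmatch(rec["tconst"]):
        errors.append("bad tconst")
    if rec["isAdult"] not in ("0", "1"):
        errors.append("bad isAdult")
    for col in ("startYear", "endYear"):
        if rec[col] != NULL and not YEAR.fullmatch(rec[col]):
            errors.append(f"bad {col}")
    if rec["runtimeMinutes"] != NULL and not INT.fullmatch(rec["runtimeMinutes"]):
        errors.append("bad runtimeMinutes")
    return errors


def split_file(src: Path, cleaned: Path, rejected: Path) -> tuple[int, int]:
    """Write valid rows to `cleaned`, invalid ones (plus a reason) to `rejected`."""
    ok = bad = 0
    with open(src, newline="", encoding="utf-8") as fin, \
         open(cleaned, "w", newline="", encoding="utf-8") as fok, \
         open(rejected, "w", newline="", encoding="utf-8") as ferr:
        reader = csv.reader(fin, delimiter="\t", quoting=csv.QUOTE_NONE)
        good_out, bad_out = csv.writer(fok), csv.writer(ferr)
        header = next(reader)
        good_out.writerow(header)
        bad_out.writerow(header + ["error"])
        for row in reader:
            errors = row_errors(row)
            if errors:
                bad_out.writerow(row + ["; ".join(errors)])
                bad += 1
            else:
                good_out.writerow(row)
                ok += 1
    return ok, bad
```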

2️⃣ Step 2 – Convert "\N" values to PostgreSQL NULL format

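Next, run the `null.py` script on the cleaned file. It replaces every quoted `"\N"` value with a bare, unquoted `\N`: in CSV mode, PostgreSQL's import treats only an unquoted value matching the `NULL` string as NULL, and would load a quoted `"\N"` as the literal text `\N`.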

Example:

```
python3 null.py title.basics.cleaned.csv
```

This creates a new file, `title.basics.cleaned.null.csv`.
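A minimal sketch of this conversion, assuming the problem is that the cleaned CSV contains quoted `"\N"` fields. Re-writing the file with Python's `csv` module and `QUOTE_MINIMAL` leaves `\N` unquoted while still quoting titles that contain commas or quotes. `rewrite_nulls` and `null_output_path` are hypothetical names; the real `null.py` may instead do a plain text replacement.

```python
import csv

NULL = r"\N"  # the NULL string passed to \copy


def null_output_path(src: str) -> str:
    """Map title.basics.cleaned.csv to title.basics.cleaned.null.csv."""
    return src.removesuffix(".csv") + ".null.csv"


def rewrite_nulls(src: str, dst: str) -> int:
    """Copy src to dst with minimal quoting; return how many NULL fields were seen."""
    count = 0
    with open(src, newline="", encoding="utf-8") as fin, \
         open(dst, "w", newline="", encoding="utf-8") as fout:
        # QUOTE_MINIMAL only quotes fields containing the delimiter, a quote
        # or a line break, so \N is always written bare.
        writer = csv.writer(fout, quoting=csv.QUOTE_MINIMAL)
        for row in csv.reader(fin):  # the reader strips existing quotes
            count += row.count(NULL)
            writer.writerow(row)
    return count
```

Mirroring the command above: `rewrite_nulls("title.basics.cleaned.csv", null_output_path("title.basics.cleaned.csv"))`.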

🏗️ Database Setup

Create a PostgreSQL database:
```
createdb moviesimdb
```

Create the required tables:

```
psql -d moviesimdb -f sql/create_tables.sql
```

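Connect with `psql -d moviesimdb`, then import the movie titles. Because `\copy` is a psql meta-command, the whole command must be written on a single line:

```
\copy moviesimdb(tconst, titleType, primaryTitle, originalTitle, isAdult, startYear, endYear, runtimeMinutes, genres) FROM '/home/path/to/your/file/title.basics.cleaned.null.csv' WITH (FORMAT csv, HEADER true, NULL '\N');
```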

Import the movie ratings (again on a single line):

```
\copy ratings(tconst, averageRating, numVotes) FROM '/home/path/to/your/file/title.ratings.tsv' WITH (FORMAT csv, HEADER true, DELIMITER E'\t', NULL '\N');
```
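If you would rather load the files from Python than from a psql session, psycopg2's `cursor.copy_expert()` can run the same `COPY` options with the data streamed from the client, much like `\copy`. A sketch, assuming the same table and column names as above; `load_file` is a hypothetical helper, and the caller is responsible for committing the transaction.

```python
from pathlib import Path

# Same options as the \copy commands; FROM STDIN streams the file from the
# client, so the PostgreSQL server never needs access to the local path.
BASICS_COPY = (
    "COPY moviesimdb (tconst, titleType, primaryTitle, originalTitle, isAdult, "
    "startYear, endYear, runtimeMinutes, genres) "
    "FROM STDIN WITH (FORMAT csv, HEADER true, NULL '\\N')"
)
RATINGS_COPY = (
    "COPY ratings (tconst, averageRating, numVotes) "
    "FROM STDIN WITH (FORMAT csv, HEADER true, DELIMITER E'\\t', NULL '\\N')"
)


def load_file(cur, copy_sql: str, path: Path) -> None:
    """Stream a local file into PostgreSQL through a psycopg2 cursor."""
    with open(path, encoding="utf-8") as f:
        cur.copy_expert(copy_sql, f)
```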

Verify that the tables were populated correctly by counting their rows:

```
SELECT COUNT(*) FROM moviesimdb;
SELECT COUNT(*) FROM ratings;
```

To preview a few rows instead, use `SELECT * FROM moviesimdb LIMIT 10;`.
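One way to judge whether the counts are right is to compare them with the number of records in the source files. A minimal sketch (the `count_data_rows` helper is hypothetical): it counts CSV records rather than physical lines, since a quoted field may span several lines, and skips the header row just as `HEADER true` does.

```python
import csv
from pathlib import Path


def count_data_rows(path: Path, delimiter: str = ",") -> int:
    """Count CSV records (not physical lines), excluding the header row."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f, delimiter=delimiter)
        next(reader, None)  # skip the header, like HEADER true
        return sum(1 for _ in reader)
```

The result for `title.basics.cleaned.null.csv` should match `SELECT COUNT(*) FROM moviesimdb;`; for the ratings file, pass `delimiter="\t"`.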

🐍 Python Setup

Create and activate a virtual environment:

```
python3 -m venv venv
source venv/bin/activate
```

Install dependencies:
```
pip install -r requirements.txt
```
Run the CLI:
```
python3 -m app.main
```
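The README does not document the CLI's options, so the following is only a hypothetical sketch of how the listed features (title, genre, rating, year range and length filters, plus dynamic sorting) could map onto command-line flags with `argparse`. None of these flags are known to exist in `app.main`; `build_parser` and `parse` are illustrative names.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical flags mirroring the feature list, not app.main's interface."""
    parser = argparse.ArgumentParser(
        prog="cinemadb", description="Explore IMDb movies stored in PostgreSQL.")
    parser.add_argument("--title", help="substring of the primary title")
    parser.add_argument("--genre", help="a single genre, e.g. Drama")
    parser.add_argument("--min-rating", type=float, help="minimum average rating")
    parser.add_argument("--year-from", type=int, help="earliest release year")
    parser.add_argument("--year-to", type=int, help="latest release year")
    parser.add_argument("--max-runtime", type=int, help="maximum length in minutes")
    parser.add_argument("--sort", choices=["rating", "year", "length"], default="rating")
    parser.add_argument("--asc", action="store_true", help="sort ascending (default: descending)")
    parser.add_argument("--limit", type=int, default=20)
    return parser


def parse(argv=None) -> argparse.Namespace:
    """Parse arguments and reject an inverted year range."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.year_from is not None and args.year_to is not None and args.year_from > args.year_to:
        parser.error("--year-from must not be later than --year-to")  # exits with status 2
    return args
```

Under this sketch, `parse(["--genre", "Drama", "--year-from", "1990", "--sort", "year"])` would return a namespace ready to be turned into a parameterized query.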

🖥️ Example CLI Output

Here is an example output of the Cinemadb app:

🧰 Technologies Used

- PostgreSQL 15+
- Python 3.10+
- psycopg2 (PostgreSQL driver)
- tabulate (CLI table formatting)
- Linux (Kubuntu)
