📊 Netflix Titles Data Analysis & Data Cleaning

📌 Project Overview

This project focuses on cleaning, exploring, and visualizing the Netflix Titles dataset.
The goal is to demonstrate a data analyst–oriented workflow, starting from raw data and ending with clear, interpretable insights supported by visualizations.

Instead of building predictive models, this project emphasizes:

- data cleaning
- exploratory data analysis (EDA)
- effective data visualization

📁 Dataset

Source: Netflix Titles Dataset
Content: Movies and TV Shows available on Netflix
Main features:
- type (Movie / TV Show)
- release_year
- duration
- rating
- country
- listed_in (genres)
- date_added

🧹 Data Cleaning Process

The raw dataset contains missing values, inconsistent formats, and multi-value categorical columns.
The following cleaning steps were applied:

- Removed duplicate records
- Standardized column names
- Converted date columns to proper datetime format
- Handled missing values using simple and explainable strategies
- Cleaned and separated the duration column
- Identified and handled multi-label categorical features (listed_in)
- Dropped columns that were not useful for analysis

The focus was on clarity, reproducibility, and realistic data cleaning decisions.
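
The cleaning steps listed above could look roughly like the sketch below. It is an illustration under assumptions, not the notebook's actual code: the sample rows are made up, the capitalized column names exist only to show the renaming step, and filling a missing country with "Unknown" is one possible "simple and explainable" strategy rather than the one necessarily used here.

```python
import pandas as pd

# Tiny stand-in for netflix_titles.csv; the values are made up for illustration.
raw = pd.DataFrame({
    "Type": ["Movie", "TV Show", "Movie", "Movie"],
    "Title": ["A", "B", "C", "A"],
    "Country": ["India", None, "United States", "India"],
    "Date_Added": ["September 9, 2019", " July 1, 2020", None, "September 9, 2019"],
    "Duration": ["93 min", "2 Seasons", "118 min", "93 min"],
})

# Remove exact duplicate rows (the last row repeats the first).
df = raw.drop_duplicates().copy()

# Standardize column names: trimmed, lowercase, underscores instead of spaces.
df.columns = df.columns.str.strip().str.lower().str.replace(" ", "_", regex=False)

# Parse dates; missing or unparseable values become NaT instead of raising.
df["date_added"] = pd.to_datetime(
    df["date_added"].str.strip(), format="%B %d, %Y", errors="coerce"
)

# Assumed strategy: label missing countries explicitly rather than dropping rows.
df["country"] = df["country"].fillna("Unknown")

# Split "93 min" / "2 Seasons" into a numeric value and a unit.
parts = df["duration"].str.extract(r"(?P<duration_value>\d+)\s*(?P<duration_unit>\w+)")
df["duration_value"] = pd.to_numeric(parts["duration_value"])
df["duration_unit"] = parts["duration_unit"].replace({"Seasons": "Season"})
```

Keeping the unit in its own column means movie minutes and TV show season counts are never mixed in the same numeric analysis.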

🔍 Exploratory Data Analysis (EDA)

After cleaning, exploratory analysis was performed to understand patterns and trends in the data.

Key questions explored:

- How is Netflix content distributed between Movies and TV Shows?
- How has Netflix content changed over time?
- Which content ratings are most common?
- What are the most frequent genres?
- How are movie durations distributed?

📊 Key Visual Insights

🎬 Movies vs TV Shows Distribution

Movies dominate the Netflix catalog compared to TV Shows.

📈 Content Growth Over Time

The number of titles added to Netflix increased significantly after 2015.
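
A count of titles added per year, the kind of series behind a growth chart like this, can be sketched as follows. The dates are invented for illustration, and `year_added` is a hypothetical helper column, not necessarily one used in the notebook.

```python
import pandas as pd

# Made-up dates standing in for the cleaned date_added column.
df = pd.DataFrame({
    "date_added": pd.to_datetime([
        "2014-03-01", "2016-05-10", "2016-11-02",
        "2019-01-15", "2019-07-30", "2019-12-01", None,
    ])
})

added_per_year = (
    df.dropna(subset=["date_added"])           # rows without a date cannot be placed on the timeline
      .assign(year_added=lambda d: d["date_added"].dt.year)
      .groupby("year_added")
      .size()                                  # number of titles per year
)
```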

⏱ Movie Duration Distribution

Most movies cluster around 90-120 minutes (standard feature-length).
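
One way to check a clustering claim like this is to bin movie runtimes and count titles per bin. The sketch below uses invented rows, and the bin edges are an assumption chosen for illustration.

```python
import pandas as pd

# Made-up rows; only movies carry a duration in minutes.
df = pd.DataFrame({
    "type": ["Movie", "Movie", "Movie", "Movie", "TV Show"],
    "duration": ["88 min", "95 min", "104 min", "131 min", "2 Seasons"],
})

movies = df[df["type"] == "Movie"].copy()
movies["minutes"] = movies["duration"].str.extract(r"(\d+)", expand=False).astype(int)

# Assumed bin edges in minutes; each bin is open on the left and closed on the right.
bins = [0, 60, 90, 120, 150, 400]
binned = pd.cut(movies["minutes"], bins=bins).value_counts().sort_index()
```

With real data, the bin with the highest count shows where runtimes concentrate.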

🎭 Top 10 Genres

International Movies and Dramas are the most common genres on Netflix.
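
Because listed_in holds several comma-separated genres per title, counting genres requires splitting each entry first. A minimal sketch with made-up rows, assuming the comma-separated format described in the Dataset section:

```python
import pandas as pd

# Made-up rows; each title can carry more than one genre.
df = pd.DataFrame({"listed_in": [
    "Dramas, International Movies",
    "Comedies, International Movies",
    "Dramas",
    "Kids' TV",
]})

genre_counts = (
    df["listed_in"]
      .str.split(",")    # one list of genres per title
      .explode()         # one row per (title, genre) pair
      .str.strip()       # drop the space left after each comma
      .value_counts()
)
top_genres = genre_counts.head(10)
```

The counts add up to more than the number of titles, since a title with two genres is counted once under each.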

📊 Movie vs TV Show Trend

Both content types have grown over time, with movies consistently leading.
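
A per-year comparison of the two content types can be built with a cross-tabulation. The sketch below uses invented rows and assumes the year is taken from date_added; the notebook may group by a different column.

```python
import pandas as pd

# Made-up rows standing in for the cleaned dataset.
df = pd.DataFrame({
    "type": ["Movie", "Movie", "TV Show", "Movie", "TV Show", "Movie"],
    "date_added": pd.to_datetime([
        "2016-01-05", "2016-08-20", "2016-09-01",
        "2019-02-11", "2019-03-03", "2019-10-30",
    ]),
})

# Rows are years, columns are content types, cells are title counts.
trend = pd.crosstab(df["date_added"].dt.year, df["type"])
```

Calling `trend.plot()` on a table like this draws one line per content type, which makes it easy to see whether one type leads in every year.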

📈 Key Insights

- Netflix expanded its catalog rapidly after 2015
- Movies make up the majority of the catalog
- Certain content ratings appear much more frequently than others
- Movie durations cluster around 90-120 minutes

🛠 Tools & Libraries

- Python
- NumPy
- Pandas
- Matplotlib
- Seaborn
- Jupyter Notebook

📌 What I Learned

- Cleaning real-world datasets with missing and inconsistent values
- Handling categorical and multi-label features
- Structuring an EDA-focused data analysis project
- Communicating insights through clear visualizations

📂 Repository Structure

📦 netflix-data-analysis
┣ 📁 images
┃ ┣ type_distribution.png
┃ ┣ content_over_time.png
┃ ┣ release_year_histogram.png
┃ ┣ top_genres.png
┃ ┗ type_by_year_trend.png
┣ 📄 nb.ipynb
┣ 📄 netflix_titles.csv
┗ 📄 README.md
