iremcimen/netflix-analysis


📊 Netflix Titles Data Analysis & Data Cleaning

📌 Project Overview

This project focuses on cleaning, exploring, and visualizing the Netflix Titles dataset.
The goal is to demonstrate a data-analyst-oriented workflow that starts from raw data and ends with clear, interpretable insights supported by visualizations.

Instead of building predictive models, this project emphasizes:

  • data cleaning
  • exploratory data analysis (EDA)
  • effective data visualization

📁 Dataset

  • Source: Netflix Titles Dataset
  • Content: Movies and TV Shows available on Netflix
  • Main features:
    • type (Movie / TV Show)
    • release_year
    • duration
    • rating
    • country
    • listed_in (genres)
    • date_added

🧹 Data Cleaning Process

The raw dataset contains missing values, inconsistent formats, and multi-value categorical columns.
The following cleaning steps were applied:

  • Removed duplicate records
  • Standardized column names
  • Converted date columns to proper datetime format
  • Handled missing values using simple and explainable strategies
  • Cleaned and separated the duration column
  • Identified and handled multi-label categorical features (listed_in)
  • Dropped columns that were not useful for analysis
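The cleaning steps above can be sketched roughly as follows with pandas. This is not the notebook's actual code. Several details are assumptions made for illustration: the raw column names in the toy rows (`Date Added`, `Description`, ...), the `"%B %d, %Y"` date layout, filling missing categorical fields with `"Unknown"`, dropping `description`, and the helper name `clean_netflix`. Duration and genre handling are left out to keep the sketch short.

```python
import pandas as pd

# Assumed: categorical columns where a missing value can be labeled explicitly
TEXT_COLUMNS = ["director", "cast", "country", "rating"]
# Assumed: columns judged not useful for this analysis
DROP_COLUMNS = ["description"]


def clean_netflix(raw: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical cleaning pass; not the original notebook's code."""
    df = raw.copy()

    # Standardize column names: trim, lowercase, spaces -> underscores
    df.columns = df.columns.str.strip().str.lower().str.replace(" ", "_")

    # Remove exact duplicate rows and renumber the index
    df = df.drop_duplicates().reset_index(drop=True)

    # Parse dates such as "September 25, 2021"; stray spaces are stripped
    # and unparseable values become NaT instead of raising an error
    df["date_added"] = pd.to_datetime(
        df["date_added"].str.strip(), format="%B %d, %Y", errors="coerce"
    )

    # Simple, explainable strategy: label missing categorical values explicitly
    for col in TEXT_COLUMNS:
        if col in df.columns:
            df[col] = df[col].fillna("Unknown")

    # Drop columns not used in the analysis (ignored if already absent)
    return df.drop(columns=DROP_COLUMNS, errors="ignore")


# Toy rows (hypothetical, not taken from the real dataset)
raw = pd.DataFrame({
    "Type": ["Movie", "Movie", "TV Show"],
    "Title": ["A", "A", "B"],
    "Country": ["United States", "United States", None],
    "Date Added": ["September 25, 2021", "September 25, 2021", " August 4, 2020"],
    "Rating": ["PG-13", "PG-13", None],
    "Description": ["x", "x", "y"],
})
cleaned = clean_netflix(raw)
print(cleaned)
```

Using `errors="coerce"` keeps a single malformed date from stopping the whole run. The resulting `NaT` values can then be counted and handled explicitly.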

The focus was on clarity, reproducibility, and realistic data cleaning decisions.


🔍 Exploratory Data Analysis (EDA)

After cleaning, exploratory analysis was performed to understand patterns and trends in the data.

Key questions explored:

  • How is Netflix content distributed between Movies and TV Shows?
  • How has Netflix content changed over time?
  • Which content ratings are most common?
  • What are the most frequent genres?
  • How are movie durations distributed?

📊 Key Visual Insights

🎬 Movies vs TV Shows Distribution

[Figure: Movies vs TV Shows]

Movies dominate the Netflix catalog, outnumbering TV Shows.
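One way to back this observation with a number is to compute each content type's share of the catalog. The sketch below assumes a cleaned DataFrame with a `type` column. The helper `type_share` and the toy rows are hypothetical.

```python
import pandas as pd


def type_share(df: pd.DataFrame) -> pd.Series:
    """Percentage of titles per content type (hypothetical helper)."""
    # normalize=True returns fractions that sum to 1; scale them to percent
    return df["type"].value_counts(normalize=True).mul(100).round(1)


# Toy data for illustration only
titles = pd.DataFrame({"type": ["Movie", "Movie", "Movie", "TV Show"]})
shares = type_share(titles)
print(shares)
```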


📈 Content Growth Over Time

[Figure: Content Over Time]

The number of titles added to Netflix increased significantly after 2015.
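The yearly counts behind a chart like this can be computed from `date_added` once it has been parsed as a datetime. This sketch works under that assumption. The helper `titles_added_per_year` and the sample dates are hypothetical, and the 2015 cut-off simply mirrors the sentence above.

```python
import pandas as pd


def titles_added_per_year(date_added: pd.Series) -> pd.Series:
    """Count titles by the calendar year they were added (hypothetical helper)."""
    # Rows without a date are excluded rather than guessed
    return date_added.dropna().dt.year.value_counts().sort_index()


# Hypothetical sample; in practice pass the cleaned date_added column
dates = pd.to_datetime(
    pd.Series(["2014-05-01", "2016-01-15", "2016-07-01", "2019-03-10", None])
)
per_year = titles_added_per_year(dates)

# Compare totals up to and including 2015 with totals after 2015
before = per_year[per_year.index <= 2015].sum()
after = per_year[per_year.index > 2015].sum()
print(per_year)
print(f"added up to 2015: {before}, after 2015: {after}")
```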


⏱ Movie Duration Distribution

[Figure: Movie Duration]

Most movies run between roughly 90 and 120 minutes, the standard feature length.
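The `duration` column mixes minutes for movies (e.g. `"90 min"`) with seasons for TV Shows (e.g. `"2 Seasons"`). It therefore has to be split into a number and a unit before movie lengths can be analyzed. Below is a minimal sketch that assumes this text format. The helpers `split_duration` and `share_in_range` and the toy rows are hypothetical.

```python
import pandas as pd


def split_duration(df: pd.DataFrame) -> pd.DataFrame:
    """Split "90 min" / "2 Seasons" into numeric value and unit columns."""
    out = df.copy()
    # Capture the leading integer and the word after it; non-matches become NaN
    parts = out["duration"].str.extract(r"^\s*(\d+)\s*([A-Za-z]+)\s*$")
    out["duration_value"] = pd.to_numeric(parts[0], errors="coerce")
    # Normalize "Season"/"Seasons" to "season"; "min" is unchanged
    out["duration_unit"] = parts[1].str.lower().str.rstrip("s")
    return out


def share_in_range(df: pd.DataFrame, low: int = 90, high: int = 120) -> float:
    """Fraction of movies whose length lies in [low, high] minutes."""
    movies = df[(df["type"] == "Movie") & (df["duration_unit"] == "min")]
    return float(movies["duration_value"].between(low, high).mean())


# Toy rows for illustration only
titles = pd.DataFrame({
    "type": ["Movie", "Movie", "Movie", "TV Show", "TV Show"],
    "duration": ["90 min", "125 min", "100 min", "2 Seasons", "1 Season"],
})
parsed = split_duration(titles)
print(parsed)
print(share_in_range(parsed))
```

Movies and TV Shows are kept in the same table, but only rows whose unit is minutes enter the duration statistics. This keeps season counts out of the movie distribution.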


🎭 Top 10 Genres

[Figure: Top Genres]

International Movies and Dramas are the most common genres on Netflix.
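Because `listed_in` holds several comma-separated genres per title, counting genres first requires splitting each value and giving every genre its own row. The sketch below shows that approach with `Series.str.split` and `Series.explode`. The helper `top_genres` and the sample values are hypothetical.

```python
import pandas as pd


def top_genres(listed_in: pd.Series, n: int = 10) -> pd.Series:
    """Count individual genres in a multi-label, comma-separated column."""
    genres = (
        listed_in.dropna()
        .str.split(",")  # "Dramas, Comedies" -> ["Dramas", " Comedies"]
        .explode()       # one row per genre
        .str.strip()     # remove the space left after each comma
    )
    return genres.value_counts().head(n)


# Hypothetical sample values
listed_in = pd.Series([
    "International Movies, Dramas",
    "Dramas, Comedies",
    "International Movies, Dramas",
    None,
])
top = top_genres(listed_in)
print(top)
```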


📊 Movie vs TV Show Trend

[Figure: Type Trend]

Both content types have grown over time, with movies consistently leading.
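A year-by-type table like the one behind this chart can be built with `pd.crosstab`. The sketch assumes the trend is measured by `release_year`. Pointing `year_col` at a year column derived from `date_added` would track additions to the catalog instead. The helper `type_by_year` and the toy rows are hypothetical.

```python
import pandas as pd


def type_by_year(df: pd.DataFrame, year_col: str = "release_year") -> pd.DataFrame:
    """Title counts with years as rows and content types as columns."""
    return pd.crosstab(df[year_col], df["type"])


# Toy rows for illustration only
titles = pd.DataFrame({
    "release_year": [2018, 2018, 2019, 2019, 2019],
    "type": ["Movie", "TV Show", "Movie", "Movie", "TV Show"],
})
table = type_by_year(titles)
# True when movies are at least as numerous as TV Shows in every year
movies_lead = bool((table["Movie"] >= table["TV Show"]).all())
print(table)
print(movies_lead)
```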


📈 Key Insights

  • Netflix expanded its catalog rapidly after 2015
  • Movies make up the majority of the catalog
  • Certain content ratings appear much more frequently than others
  • Movie durations cluster around the standard feature length of 90 to 120 minutes
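To make the point about ratings more concrete, one option is to ask how few ratings account for most titles. The sketch below assumes a cleaned `rating` column. The helper `ratings_covering`, the 75% threshold, and the toy values are illustrative choices, not figures from this analysis.

```python
import pandas as pd


def ratings_covering(rating: pd.Series, threshold: float = 0.75) -> pd.Series:
    """Smallest set of most frequent ratings whose combined share reaches threshold."""
    shares = rating.value_counts(normalize=True)  # sorted from most to least common
    cumulative = shares.cumsum()
    # Ratings whose running share is still below the threshold, plus the one that reaches it
    k = int((cumulative < threshold).sum()) + 1
    return shares.head(k)


# Hypothetical values for illustration only
ratings = pd.Series(["TV-MA"] * 5 + ["TV-14"] * 3 + ["PG", "G"])
core = ratings_covering(ratings)
print(core)
```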

🛠 Tools & Libraries

  • Python
  • NumPy
  • Pandas
  • Matplotlib
  • Seaborn
  • Jupyter Notebook

📌 What I Learned

  • Cleaning real-world datasets with missing and inconsistent values
  • Handling categorical and multi-label features
  • Structuring an EDA-focused data analysis project
  • Communicating insights through clear visualizations

📂 Repository Structure

📦 netflix-data-analysis
┣ 📁 images
┃ ┣ type_distribution.png
┃ ┣ content_over_time.png
┃ ┣ release_year_histogram.png
┃ ┣ top_genres.png
┃ ┗ type_by_year_trend.png
┣ 📄 nb.ipynb
┣ 📄 netflix_titles.csv
┗ 📄 README.md

About

Data cleaning and exploratory data analysis of the Netflix Titles dataset with clear visual insights.
