This project focuses on cleaning, exploring, and visualizing the Netflix Titles dataset.
The goal is to demonstrate a data analyst–oriented workflow, starting from raw data and ending with clear, interpretable insights supported by visualizations.
Instead of building predictive models, this project emphasizes:
- data cleaning
- exploratory data analysis (EDA)
- effective data visualization
- Source: Netflix Titles Dataset
- Content: Movies and TV Shows available on Netflix
- Main features:
  - `type` (Movie / TV Show)
  - `release_year`
  - `duration`
  - `rating`
  - `country`
  - `listed_in` (genres)
  - `date_added`
The raw dataset contains missing values, inconsistent formats, and multi-value categorical columns.
The following cleaning steps were applied:
- Removed duplicate records
- Standardized column names
- Converted date columns to proper `datetime` format
- Handled missing values using simple and explainable strategies
- Cleaned and separated the `duration` column
- Identified and handled multi-label categorical features (`listed_in`)
- Dropped columns that were not useful for analysis
The focus was on clarity, reproducibility, and realistic data cleaning decisions.
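The cleaning steps listed above could look roughly like the following in pandas. This is a minimal sketch, not the notebook's actual code: the sample rows are made up, and the `clean_titles` helper, the `"Unknown"` fill value, the `%B %d, %Y` date format, and the `duration_value` / `duration_unit` / `genres` column names are all assumptions chosen for illustration.

```python
import pandas as pd

# Tiny made-up sample standing in for netflix_titles.csv (illustrative values only).
raw = pd.DataFrame({
    "Show ID": ["s1", "s2", "s2", "s3"],
    "Type": ["Movie", "TV Show", "TV Show", "Movie"],
    "Date Added": ["September 25, 2021", " July 1, 2020", " July 1, 2020", None],
    "Duration": ["90 min", "2 Seasons", "2 Seasons", "125 min"],
    "Country": ["United States", None, None, "India"],
    "Listed In": ["Dramas, International Movies", "Kids' TV", "Kids' TV", "Comedies"],
})


def clean_titles(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper applying the cleaning steps in order."""
    # Remove exact duplicate records.
    out = df.drop_duplicates().copy()
    # Standardize column names to lowercase snake_case.
    out.columns = out.columns.str.strip().str.lower().str.replace(" ", "_")
    # Parse dates (assumed format); missing or unparseable values become NaT.
    out["date_added"] = pd.to_datetime(
        out["date_added"].str.strip(), format="%B %d, %Y", errors="coerce"
    )
    # Fill missing categorical values with an explicit label (assumed strategy).
    out["country"] = out["country"].fillna("Unknown")
    # Split "90 min" / "2 Seasons" into a numeric value and a unit.
    parts = out["duration"].str.extract(r"(?P<value>\d+)\s*(?P<unit>\w+)")
    out["duration_value"] = pd.to_numeric(parts["value"]).astype("Int64")
    out["duration_unit"] = parts["unit"].str.lower().str.rstrip("s")
    # Turn the multi-label genre string into a list of genres.
    out["genres"] = out["listed_in"].str.split(", ")
    # Drop the raw columns that have been replaced.
    return out.drop(columns=["duration", "listed_in"]).reset_index(drop=True)


cleaned = clean_titles(raw)
print(cleaned)
```

Keeping the steps in one function makes the cleaning order explicit and easy to rerun on a fresh copy of the raw file.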
After cleaning, exploratory analysis was performed to understand patterns and trends in the data.
- How is Netflix content distributed between Movies and TV Shows?
- How has Netflix content changed over time?
- Which content ratings are most common?
- What are the most frequent genres?
- How are movie durations distributed?
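Each of the questions above maps to a short pandas aggregation. The sketch below uses made-up rows rather than the real dataset, and the `duration_minutes` column name is an assumption about how the cleaned duration might be stored.

```python
import pandas as pd

# Made-up rows in the shape of the cleaned data; not real Netflix figures.
titles = pd.DataFrame({
    "type": ["Movie", "Movie", "TV Show", "Movie"],
    "date_added": pd.to_datetime(
        ["2016-03-01", "2019-07-15", "2019-11-02", "2020-01-10"]
    ),
    "rating": ["TV-MA", "PG-13", "TV-MA", "TV-14"],
    "listed_in": ["Dramas, International Movies", "Comedies", "TV Dramas", "Dramas"],
    "duration_minutes": [95, 110, None, 102],  # assumed column; None for TV shows
})

# Movies vs. TV Shows.
type_counts = titles["type"].value_counts()

# Titles added per year, in chronological order.
added_per_year = titles["date_added"].dt.year.value_counts().sort_index()

# Most common content ratings.
rating_counts = titles["rating"].value_counts()

# Most frequent genres: split the multi-label column, then count one row per genre.
genre_counts = titles["listed_in"].str.split(", ").explode().value_counts()

# Summary statistics for movie durations only.
movie_durations = titles.loc[titles["type"] == "Movie", "duration_minutes"].describe()

print(type_counts, added_per_year, rating_counts, genre_counts, movie_durations, sep="\n\n")
```

Using `explode` for `listed_in` counts a title once under each of its genres, which is why genre totals can exceed the number of titles.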
Movies dominate the Netflix catalog compared to TV Shows.
The number of titles added to Netflix increased significantly after 2015.
Most movies cluster around 90-120 minutes (standard feature-length).
International Movies and Dramas are the most common genres on Netflix.
Both content types have grown over time, with movies consistently leading.
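As an illustration of how a duration chart like the one described above might be produced, here is a minimal Matplotlib sketch. The runtimes are made up, and the 30-minute bin width, labels, and title are assumptions rather than the settings used in the notebook.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up movie runtimes in minutes; not taken from the real dataset.
durations = pd.Series([88, 92, 95, 101, 104, 110, 118, 125, 140])

fig, ax = plt.subplots(figsize=(6, 4))
# 30-minute bins from 60 to 180 minutes (an assumed choice).
counts, bin_edges, _ = ax.hist(
    durations, bins=list(range(60, 181, 30)), edgecolor="black"
)
ax.set_xlabel("Duration (minutes)")
ax.set_ylabel("Number of movies")
ax.set_title("Movie duration distribution (toy data)")
fig.tight_layout()
```

Calling `fig.savefig(...)` with a path under `images/` would write the chart to disk alongside the other figures.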
- Netflix has rapidly expanded its content in recent years
- Movies make up the majority of the catalog
- Certain content ratings appear much more frequently than others
- Movie durations show a clear clustering pattern
- Python
- NumPy
- Pandas
- Matplotlib
- Seaborn
- Jupyter Notebook
- Cleaning real-world datasets with missing and inconsistent values
- Handling categorical and multi-label features
- Structuring an EDA-focused data analysis project
- Communicating insights through clear visualizations
📦 netflix-data-analysis
┣ 📁 images
┃ ┣ type_distribution.png
┃ ┣ content_over_time.png
┃ ┣ release_year_histogram.png
┃ ┣ top_genres.png
┃ ┗ type_by_year_trend.png
┣ 📄 nb.ipynb
┣ 📄 netflix_titles.csv
┗ 📄 README.md




