akhtarrkhan/Movie-Analysis-Correlation-Python

Movies Correlation Analysis

Overview

This project analyzes relationships between different movie features such as budget, gross revenue, votes, score and runtime using Python.

The main goal is to understand which factors are most strongly correlated with a movie’s gross revenue.


Dataset

  • Source: Kaggle Movies Dataset (https://www.kaggle.com/danielgrijalvas/movies)
  • Contains information on movies including:
    • name, rating, genre, year, released
    • score, votes
    • director, writer, star
    • country, budget, gross, company, runtime

Tools Used

  • Python
  • Pandas
  • NumPy
  • Matplotlib
  • Seaborn
  • Jupyter Notebook (VS Code)

Workflow

  1. Loaded and explored the dataset
  2. Checked missing values
  3. Cleaned data (dropped missing values for analysis)
  4. Converted data types where required
  5. Extracted correct year from the released column
  6. Created scatter plot of budget vs gross
  7. Built regression plot to observe trend
  8. Generated correlation matrix for numeric features
  9. Identified highly correlated feature pairs
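The cleaning and correlation steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code. It assumes that the `released` values contain a four-digit year (for example "June 13, 1980 (United States)"), and the helper names, the `year_correct` column and the 0.5 threshold are hypothetical choices made for the example.

```python
import numpy as np
import pandas as pd


def missing_report(df: pd.DataFrame) -> pd.Series:
    """Count missing values per column, largest first (step 2)."""
    return df.isna().sum().sort_values(ascending=False)


def clean_movies(df: pd.DataFrame) -> pd.DataFrame:
    """Drop incomplete rows, fix dtypes and derive the release year (steps 3-5)."""
    cleaned = df.dropna().copy()

    # Whole-number columns are read as float when they contain NaN,
    # so they can be cast back to integers once those rows are gone.
    for col in ("budget", "gross", "votes"):
        cleaned[col] = cleaned[col].astype("int64")

    # Assumption: `released` holds text with a four-digit year in it.
    # `year_correct` is a hypothetical column name for this sketch.
    year = cleaned["released"].astype(str).str.extract(r"(\d{4})", expand=False)
    cleaned["year_correct"] = pd.to_numeric(year, errors="coerce")
    return cleaned


def top_correlated_pairs(df: pd.DataFrame, threshold: float = 0.5) -> pd.Series:
    """List numeric feature pairs whose |Pearson r| meets the threshold (steps 8-9)."""
    corr = df.select_dtypes(include="number").corr()

    # Keep only the upper triangle so each pair appears once
    # and self-correlations (the diagonal) are excluded.
    upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(upper).stack().dropna()

    pairs = pairs[pairs.abs() >= threshold]
    return pairs.sort_values(ascending=False)
```

With the dataset file in the working directory, this could be used as `top_correlated_pairs(clean_movies(pd.read_csv("movies.csv")))`.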

Key Findings

  • Budget and gross revenue show a strong positive correlation
  • Votes are also strongly correlated with gross revenue
  • Other numeric features show moderate or weak relationships
  • Correlation analysis helps identify candidate drivers of movie revenue
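Findings of this kind can be checked by ranking every numeric feature by its correlation with gross revenue. The snippet below is a small sketch of that idea. The function name is hypothetical, and it uses the Pearson coefficient, which is the pandas default and may or may not match what the notebook does.

```python
import pandas as pd


def correlation_with_gross(df: pd.DataFrame, target: str = "gross") -> pd.Series:
    """Rank numeric features by the strength of their correlation with `target`."""
    numeric = df.select_dtypes(include="number")

    # Take the target's column of the correlation matrix and drop
    # the trivial correlation of the target with itself.
    corr = numeric.corr()[target].drop(target)

    # Sort by absolute value so strong negative relationships rank high too.
    return corr.sort_values(key=lambda s: s.abs(), ascending=False)
```

The features at the top of the returned Series are the ones most strongly related to gross revenue, whether positively or negatively.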

Visualization

  • Scatter plot (Budget vs Gross)
  • Regression plot
  • Heatmap of correlation matrix
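The plots listed above could be produced along these lines with Seaborn and Matplotlib. This is a sketch with hypothetical function names and styling choices, not the notebook's exact code. `sns.regplot` draws the scatter points and the fitted regression line on the same axes, so one call covers both the scatter and the regression view.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns


def plot_budget_vs_gross(df: pd.DataFrame):
    """Scatter plot of budget against gross with a fitted regression line."""
    fig, ax = plt.subplots(figsize=(8, 5))
    sns.regplot(
        data=df,
        x="budget",
        y="gross",
        ax=ax,
        scatter_kws={"alpha": 0.4},   # fade points so dense areas stay readable
        line_kws={"color": "red"},    # make the trend line stand out
    )
    ax.set_title("Budget vs Gross")
    ax.set_xlabel("Budget")
    ax.set_ylabel("Gross revenue")
    return ax


def plot_correlation_heatmap(df: pd.DataFrame):
    """Annotated heatmap of the correlation matrix of the numeric columns."""
    corr = df.select_dtypes(include="number").corr()
    fig, ax = plt.subplots(figsize=(7, 6))
    # Fix the colour scale to [-1, 1] so colours are comparable across runs.
    sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm",
                vmin=-1, vmax=1, ax=ax)
    ax.set_title("Correlation matrix of numeric features")
    return ax
```

In a notebook the figures render automatically after each call; in a plain script, `plt.show()` would be needed afterwards.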

Important Note

Only numeric features were used for correlation analysis to ensure meaningful results.


Files

  • movies_analysis_correlation.ipynb → main project notebook
  • movies.csv → dataset

How to Run

  1. Download the repository
  2. Open the notebook in VS Code or Jupyter
  3. Install the required libraries (Pandas, NumPy, Matplotlib, Seaborn)
  4. Run all cells

Future Improvements

  • Handle missing data more strategically instead of dropping rows
  • Apply log transformation for skewed variables
  • Build predictive models for gross revenue
  • Explore genre/company-level insights
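As one illustration of the ideas above, a log transformation of skewed columns might look like the sketch below. It is not part of the current notebook. The function name and the `log_` column prefix are hypothetical, and `np.log1p` (the logarithm of 1 + x) is chosen here only so that zero values do not produce minus infinity.

```python
import numpy as np
import pandas as pd


def add_log_columns(df: pd.DataFrame, cols=("budget", "gross")) -> pd.DataFrame:
    """Return a copy of `df` with a log1p-transformed version of each column."""
    out = df.copy()
    for col in cols:
        # log1p(x) = log(1 + x); defined at x = 0, unlike a plain log.
        out[f"log_{col}"] = np.log1p(out[col])
    return out
```

Comparing `df["budget"].skew()` with the skew of the new `log_budget` column gives a quick check of whether the transformation reduced the skew, and the correlation analysis can then be repeated on the transformed columns.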

Author

Akhtar R Khan
