Movies Correlation Analysis

Overview

This project uses Python to analyze the relationships between movie features such as budget, gross revenue, votes, score, and runtime.

The main goal is to understand which factors are most strongly correlated with a movie’s gross revenue.


Dataset

  • Source: Kaggle Movies Dataset (https://www.kaggle.com/danielgrijalvas/movies)
  • Contains information on movies including:
    • name, rating, genre, year, released
    • score, votes
    • director, writer, star
    • country, budget, gross, company, runtime
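A minimal sketch of the first two workflow steps (loading the data and checking for missing values) with pandas, assuming movies.csv is in the working directory. The helper names load_movies and missing_report are hypothetical and not taken from the notebook; the tiny in-memory CSV is a toy example with made-up values, not real data.

```python
import io

import pandas as pd


def load_movies(path="movies.csv"):
    """Read the dataset; the default path assumes the CSV sits next to the notebook."""
    return pd.read_csv(path)


def missing_report(df):
    """Count missing values per column and express them as a percentage of rows."""
    counts = df.isnull().sum()
    report = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(df) * 100).round(2),
    })
    # Columns with the most gaps first.
    return report.sort_values("missing", ascending=False)


# Toy example with made-up values (not from the real dataset).
toy = pd.read_csv(io.StringIO("name,budget,gross\nA,100,250\nB,,90\nC,50,\n"))
print(missing_report(toy))
```

On the real file, calling missing_report(load_movies()) gives the same kind of per-column summary, and DataFrame.info() is a quick complement for checking the column types.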

Tools Used

  • Python
  • Pandas
  • NumPy
  • Matplotlib
  • Seaborn
  • Jupyter Notebook (VS Code)

Workflow

  1. Loaded and explored the dataset
  2. Checked for missing values in each column
  3. Cleaned the data by dropping rows with missing values
  4. Converted data types where required
  5. Extracted the correct release year from the released column
  6. Created a scatter plot of budget vs gross
  7. Built a regression plot to observe the trend
  8. Generated a correlation matrix for the numeric features
  9. Identified highly correlated feature pairs
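The cleaning and conversion steps (3 to 5) might look like this in pandas. This is a sketch under assumptions: the columns converted (budget, gross), the format of the released strings, and the new column name year_correct are illustrative choices, not necessarily what the notebook does.

```python
import pandas as pd


def clean_movies(df):
    """Drop incomplete rows, cast money columns to integers and
    derive a release year from the text in `released`."""
    cleaned = df.dropna().copy()
    # Assumption: budget and gross are loaded as floats because of gaps;
    # once the NaNs are gone they can be stored as whole numbers.
    for col in ("budget", "gross"):
        cleaned[col] = cleaned[col].astype("int64")
    # Assumption: `released` holds strings such as "June 13, 1980 (United States)",
    # so the first four-digit number is taken as the release year.
    year_text = cleaned["released"].astype(str).str.extract(r"(\d{4})", expand=False)
    # Rows without a four-digit number become <NA> instead of raising an error.
    cleaned["year_correct"] = pd.to_numeric(year_text, errors="coerce").astype("Int64")
    return cleaned


# Toy frame with made-up values to show the effect.
toy = pd.DataFrame({
    "name": ["A", "B", "C"],
    "budget": [100.0, None, 50.0],
    "gross": [250.0, 90.0, 120.0],
    "released": ["June 13, 1980 (United States)", "July 2, 1980 (Japan)", "1981 (France)"],
})
print(clean_movies(toy))
```

Using the nullable "Int64" type for the derived year keeps the function from failing on an unexpected released value; such rows can then be inspected or dropped explicitly.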

Key Findings

  • Budget and gross revenue show a strong positive correlation
  • Votes are also strongly correlated with gross revenue
  • Other numeric features show moderate or weak relationships
  • Correlation analysis highlights the features most closely associated with revenue, although correlation alone does not show that these features cause higher revenue
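One way to produce a ranking like this is to correlate every numeric column with gross and sort by absolute value. A sketch, assuming a cleaned DataFrame and Pearson correlation (the pandas default); correlations_with is a hypothetical helper. The toy frame is constructed so the order is easy to verify by hand and says nothing about the real dataset.

```python
import pandas as pd


def correlations_with(df, target="gross", method="pearson"):
    """Correlation of every numeric column with `target`, strongest first."""
    numeric = df.select_dtypes(include="number")
    corr = numeric.corr(method=method)[target].drop(target)
    # Order by absolute strength so strong negative relationships also rank high.
    return corr.reindex(corr.abs().sort_values(ascending=False).index)


# Toy numbers chosen only to illustrate the ordering.
toy = pd.DataFrame({
    "name": ["A", "B", "C", "D", "E"],  # text column, ignored by select_dtypes
    "budget": [1, 2, 3, 4, 5],
    "gross": [2, 4, 6, 8, 10],
    "votes": [1, 3, 2, 5, 4],
    "runtime": [3, 3, 3, 3, 4],
})
print(correlations_with(toy))
```

For this toy frame the order is budget (1.0), votes (0.8), then runtime (about 0.71).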

Visualization

  • Scatter plot (Budget vs Gross)
  • Regression plot
  • Heatmap of correlation matrix
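The three plots can be drawn side by side with Matplotlib and Seaborn. A sketch, assuming a cleaned DataFrame with numeric budget and gross columns; plot_overview, the figure size, and the colour choices are illustrative rather than taken from the notebook.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns


def plot_overview(df):
    """Scatter plot, regression plot and correlation heatmap in one figure."""
    fig, axes = plt.subplots(1, 3, figsize=(18, 5))

    # 1. Raw relationship between budget and gross.
    axes[0].scatter(df["budget"], df["gross"], alpha=0.5)
    axes[0].set(title="Budget vs Gross", xlabel="Budget", ylabel="Gross")

    # 2. Same data with a fitted linear regression line and confidence band.
    sns.regplot(data=df, x="budget", y="gross", ax=axes[1],
                scatter_kws={"alpha": 0.3}, line_kws={"color": "red"})
    axes[1].set_title("Budget vs Gross (regression)")

    # 3. Correlation matrix restricted to numeric columns.
    corr = df.select_dtypes(include="number").corr()
    sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm", ax=axes[2])
    axes[2].set_title("Correlation matrix")

    fig.tight_layout()
    return fig


# Toy data, for illustration only.
toy = pd.DataFrame({"budget": [1, 2, 3, 4, 5], "gross": [2, 5, 5, 9, 11], "votes": [1, 3, 2, 5, 4]})
fig = plot_overview(toy)
```

In a notebook the returned figure is displayed automatically; in a plain script, call plt.show() or fig.savefig(...) afterwards.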

Important Note

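Only numeric features were used in the correlation analysis, because correlation coefficients are defined for numeric values. Categorical columns such as genre or company would need to be encoded before they could be included.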
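Restricting the analysis to numeric columns and then listing strongly correlated pairs (workflow step 9) could be done as follows. A sketch only: high_corr_pairs and its 0.5 default threshold are hypothetical choices. Each pair is reported once, and self-correlations on the diagonal are skipped.

```python
from itertools import combinations

import pandas as pd


def high_corr_pairs(df, threshold=0.5):
    """Feature pairs whose absolute correlation exceeds `threshold`, strongest first."""
    # Non-numeric columns (name, genre, company, ...) are excluded here.
    corr = df.select_dtypes(include="number").corr()
    rows = [
        (a, b, corr.loc[a, b])
        for a, b in combinations(corr.columns, 2)  # each unordered pair once
        if abs(corr.loc[a, b]) > threshold          # NaN comparisons are False
    ]
    result = pd.DataFrame(rows, columns=["feature_1", "feature_2", "corr"])
    return result.sort_values("corr", key=abs, ascending=False, ignore_index=True)


# Toy numbers chosen only to illustrate the output.
toy = pd.DataFrame({
    "name": ["A", "B", "C", "D", "E"],
    "budget": [1, 2, 3, 4, 5],
    "gross": [2, 4, 6, 8, 10],
    "votes": [1, 3, 2, 5, 4],
    "runtime": [3, 3, 3, 3, 4],
})
print(high_corr_pairs(toy, threshold=0.75))
```

Using combinations instead of filtering the full matrix avoids listing both (budget, gross) and (gross, budget).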


Files

  • movies_analysis_correlation.ipynb → main project notebook
  • movies.csv → dataset

How to Run

  1. Clone or download the repository
  2. Open the notebook in VS Code or Jupyter
  3. Install the required libraries (pandas, numpy, matplotlib, seaborn)
  4. Run all cells
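Before running the notebook, a quick check from Python can show which of the required libraries are missing. A minimal sketch using the standard library's importlib.util.find_spec; the pip hint it prints assumes a pip-based environment.

```python
from importlib.util import find_spec

REQUIRED = ("pandas", "numpy", "matplotlib", "seaborn")


def missing_libraries(names=REQUIRED):
    """Return the names from `names` that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_libraries()
if missing:
    print("Missing libraries, install with: pip install " + " ".join(missing))
else:
    print("All required libraries are installed.")
```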

Future Improvements

  • Handle missing data more strategically (for example, by imputation) instead of dropping rows
  • Apply a log transformation to skewed variables
  • Build predictive models for gross revenue
  • Explore genre/company-level insights
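The first two improvements could start from something like the sketch below: median imputation in place of dropping rows, and a log1p transform for skewed, non-negative columns. Both helpers (impute_median, log_transform) are hypothetical, and which columns to treat is an assumption left to the analysis.

```python
import numpy as np
import pandas as pd


def impute_median(df, columns):
    """Fill gaps in `columns` with each column's median instead of dropping rows."""
    filled = df.copy()
    for col in columns:
        filled[col] = filled[col].fillna(filled[col].median())
    return filled


def log_transform(df, columns):
    """Add log1p copies of non-negative, right-skewed columns (log1p keeps 0 at 0)."""
    out = df.copy()
    for col in columns:
        out[f"log_{col}"] = np.log1p(out[col])
    return out


# Toy values, for illustration only.
toy = pd.DataFrame({"budget": [10.0, None, 30.0], "gross": [0.0, 100.0, 1000.0]})
prepared = log_transform(impute_median(toy, ["budget"]), ["budget", "gross"])
print(prepared)
```

Comparing Series.skew() on a column before and after the transform is one simple way to check whether it reduced the skew.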

Author

Akhtar R Khan