Reel Patterns is a data science project exploring hidden patterns in movie data. Using large datasets, we analyze actor communities, franchise sequel dynamics, and the link between film success and soundtrack popularity.

OrF8/Reel-Patterns

🎬 Reel Patterns

Reel Patterns is a data science project that uncovers hidden structures and surprising trends in cinema.
Instead of asking “What makes a movie successful?”, we dive into unconventional questions like:

  • Which actors form tight-knit cliques?
  • When should a franchise stop producing sequels before brand fatigue sets in?
  • Do blockbuster movies also create hit soundtracks?

This project was created by Or Forshmit, Noam Kimhi and Adir Tuval as part of the course 67978: A Needle in a Data Haystack – Introduction to Data Science at the Hebrew University of Jerusalem (HUJI).

Full paper available here.

🎓 Final Grade: 100

streamlit pandas numpy matplotlib seaborn plotly networkx
scipy spotipy dotenv tqdm rapidfuzz requests



📍 Overview

Reel Patterns explores cinema data through three unconventional lenses:

Actor communities – uncovering hidden cliques and bridge actors using collaboration networks.

Franchise dynamics – analyzing how sequel performance evolves and identifying when franchises fall into brand fatigue.

Soundtrack correlations – testing whether popular soundtracks align with box-office success and audience ratings.

By combining large-scale datasets with graph analysis, statistical modeling, and interactive dashboards, we reveal structures and trends that go beyond traditional movie success metrics.
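As a rough illustration of the franchise lens, the sketch below averages return on investment per sequel index and reports the first index where the average drops under a threshold. The rows, the ROI definition `(revenue - budget) / budget`, and the threshold of 1.0 are all assumptions made for this example; they are not taken from the project's code or paper.

```python
from collections import defaultdict

# Hypothetical rows: (franchise, sequel_index, budget, revenue), in millions.
FILMS = [
    ("Franchise A", 1, 50.0, 200.0),
    ("Franchise A", 2, 80.0, 240.0),
    ("Franchise A", 3, 120.0, 150.0),
    ("Franchise B", 1, 30.0, 90.0),
    ("Franchise B", 2, 60.0, 66.0),
]


def roi(budget, revenue):
    """Return on investment as a fraction of the budget (assumed definition)."""
    return (revenue - budget) / budget


def mean_roi_by_sequel_index(films):
    """Average ROI across franchises for each sequel index, in index order."""
    buckets = defaultdict(list)
    for _franchise, index, budget, revenue in films:
        buckets[index].append(roi(budget, revenue))
    return {index: sum(vals) / len(vals) for index, vals in sorted(buckets.items())}


def first_fatigue_index(mean_roi, threshold=1.0):
    """First sequel index whose mean ROI is below the threshold, else None."""
    for index, value in mean_roi.items():
        if value < threshold:
            return index
    return None


summary = mean_roi_by_sequel_index(FILMS)
print(summary)
print(first_fatigue_index(summary))
```

A single threshold is a blunt instrument; it is used here only to make the idea of "fatigue" concrete on toy data.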


👾 Features

  • Actor community detection using collaboration graphs
  • Franchise sequel performance analysis
  • Soundtrack vs. movie success correlations
  • Interactive visualizations with Streamlit in the web app

    Watch Demo


📁 Project Structure

└── Reel-Patterns/
    ├── Curtain call, please
    │   ├── constants.py
    │   ├── curtain_call_visualizations.py
    │   ├── preprocessing.py
    │   └── query_wikidata_script.py
    ├── README
    ├── Reel Hits Meet Real Hits
    │   ├── organize_data.py
    │   └── reel_hits_viz.py
    ├── What Can I Say, We Cliqued
    │   ├── constants.py
    │   ├── organize_data.py
    │   └── streamlit_app.py
    ├── data
    │   ├── collabs.csv
    │   ├── LICENSE.md
    │   └── entire_data_link
    ├── figures
    │   ├── corr_pop_rating.png
    │   ├── movies_per_sequel_index.png
    │   ├── prob_of_success_audience_rating.png
    │   ├── prob_of_success_critic_rating.png
    │   ├── prob_of_success_roi.png
    │   ├── reel_hits_pearson_heatmap.png
    │   └── reel_hits_spearman_heatmap.png
    ├── LICENSE
    └── requirements.txt

📂 Project Index

REEL-PATTERNS

Root
  • requirements.txt ❯ Python dependencies required to run the project
  • LICENSE ❯ License for the code and non-data files (MIT)

Reel Hits Meet Real Hits
  • organize_data.py ❯ Collects and prepares soundtrack and movie data for analysis
  • reel_hits_viz.py ❯ Creates visualizations of correlations between soundtracks and movies

Curtain call, please
  • query_wikidata_script.py ❯ Script to fetch additional metadata from Wikidata
  • constants.py ❯ Constants and parameters for franchise analysis
  • curtain_call_visualizations.py ❯ Visualizations of sequel success and franchise dynamics
  • preprocessing.py ❯ Data cleaning and preparation for sequel performance analysis

What Can I Say, We Cliqued
  • streamlit_app.py ❯ Interactive Streamlit dashboard for actor collaboration networks
  • constants.py ❯ Constants and parameters for actor network analysis
  • organize_data.py ❯ Prepares collaboration data for graph-based analysis
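
To make "collaboration data for graph-based analysis" concrete, here is a minimal sketch that turns cast lists into weighted actor pairs and then groups actors who share at least two films. The cast data is invented, the layout of data/collabs.csv is not assumed, and connected components on a thresholded graph are a deliberately crude stand-in for real community detection (the project lists networkx among its dependencies, and this sketch does not reproduce whatever algorithm it uses).

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical cast lists, for illustration only.
CASTS = {
    "Movie 1": ["Ann", "Bob", "Cy"],
    "Movie 2": ["Ann", "Bob"],
    "Movie 3": ["Bob", "Cy", "Ann"],
    "Movie 4": ["Dee", "Eli"],
    "Movie 5": ["Dee", "Eli"],
    "Movie 6": ["Cy", "Dee"],
}


def collaboration_weights(casts):
    """Count how many films each unordered pair of actors shares."""
    weights = Counter()
    for cast in casts.values():
        for a, b in combinations(sorted(set(cast)), 2):
            weights[(a, b)] += 1
    return weights


def groups(weights, min_shared=2):
    """Connected components, keeping only edges with at least min_shared films."""
    adjacency = defaultdict(set)
    for (a, b), shared in weights.items():
        if shared >= min_shared:
            adjacency[a].add(b)
            adjacency[b].add(a)
    seen, result = set(), []
    for start in sorted(adjacency):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:  # iterative depth-first search
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        result.append(sorted(component))
    return result


weights = collaboration_weights(CASTS)
print(groups(weights))                 # groups joined by repeat collaborations
print(groups(weights, min_shared=1))   # a single shared film is enough to link
```

In this toy data, lowering `min_shared` to 1 lets the one film shared by Cy and Dee merge the two groups, which is the kind of role a bridge actor plays.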

🚀 Getting Started

☑️ Prerequisites

Before getting started with Reel-Patterns, ensure your runtime environment meets the following requirements:

  • Python 3.9
  • pip

Also, make sure to download the available data from Google Drive.

⚙️ Installation

Install Reel-Patterns as follows:

  1. Clone the Reel-Patterns repository:
     git clone https://github.com/OrF8/Reel-Patterns
  2. Navigate to the project directory:
     cd Reel-Patterns
  3. Install the project dependencies using pip:
     pip install -r requirements.txt

🤖 Usage

Most modules in this repository are designed to collect, preprocess, and analyze data, and then generate plots or visualizations. They are not meant to be long-running services, but rather scripts that prepare results and figures.
The main exception is the Streamlit app, which allows interactive exploration of results.

You can run one of the modules directly, for example:

python "Reel Hits Meet Real Hits/reel_hits_viz.py"

This will collect the relevant data, process it, and produce the associated plots.
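
The figures directory contains both a Pearson and a Spearman heatmap. The sketch below shows, on invented numbers, why the two coefficients can disagree: Pearson measures linear association, while Spearman applies the same formula to ranks and so captures any monotonic relationship. The variable names and values are hypothetical, and the helper functions are written from scratch for illustration rather than copied from reel_hits_viz.py.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation; assumes equal lengths and non-constant inputs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


# Hypothetical values: popularity rises steadily, box office has one outlier.
soundtrack_popularity = [10, 20, 30, 40, 50]
box_office = [1, 2, 4, 8, 100]

print(round(pearson(soundtrack_popularity, box_office), 3))
print(round(spearman(soundtrack_popularity, box_office), 3))
```

Because the toy relationship is monotonic but far from linear, Spearman is at its maximum while Pearson is noticeably lower.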

To explore the results of the What Can I Say, We Cliqued section interactively, use the web app or run the app locally:

python -m streamlit run "What Can I Say, We Cliqued/streamlit_app.py"

🔰 Contributing

⚠️ Contribution Policy

This is an academic course project, not a community-driven open-source project.
We are not seeking pull requests or code contributions.
However, we welcome:

  • Feedback on our analysis and results
  • Suggestions for further exploration
  • Questions or discussions about the methods used

🎗 License

This project's code and non-dataset material are licensed under the MIT License.
See the LICENSE file for details.
Datasets are provided under their original licenses (see data/LICENSE.md).

📊 Data Licensing

This project combines multiple external datasets. To stay compliant with licensing and API terms, we distinguish between data we can share directly and data you must fetch yourself.

✅ Included in this repository

⚠️ Not included (must be fetched by users)

  • Spotify API data → Under the Spotify Developer Terms of Service, we cannot redistribute Spotify-derived datasets (e.g., album or track popularity).
    Instead, we provide code using Spotipy so you can re-fetch the data yourself with your own API key.

    Notice: Our script uses heuristics to guess the correct soundtrack album, so it made some mistakes (about 5%), which we fixed manually.
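
The project's actual matching heuristics are not described here. As a hedged illustration of what such a heuristic can look like, and why it can pick the wrong album, the sketch below scores candidate album names against a movie title. It uses the standard library's `difflib` as a stand-in for rapidfuzz, makes no Spotify API calls, and the hint words, bonus, and `min_score` cutoff are assumptions chosen for the example.

```python
from difflib import SequenceMatcher

# Assumed hint words that suggest an album is a film soundtrack.
SOUNDTRACK_HINTS = ("soundtrack", "motion picture", "score")


def normalize(text):
    """Lowercase and collapse whitespace so comparisons are less brittle."""
    return " ".join(text.lower().split())


def score(movie_title, album_name):
    """Title similarity in [0, 1], plus 0.5 if the album looks like a soundtrack."""
    title, album = normalize(movie_title), normalize(album_name)
    if title in album:
        similarity = 1.0
    else:
        similarity = SequenceMatcher(None, title, album).ratio()
    bonus = 0.5 if any(hint in album for hint in SOUNDTRACK_HINTS) else 0.0
    return similarity + bonus


def pick_album(movie_title, album_names, min_score=0.6):
    """Best-scoring album name, or None when nothing clears min_score."""
    best = max(album_names, key=lambda name: score(movie_title, name), default=None)
    if best is None or score(movie_title, best) < min_score:
        return None
    return best


# Hypothetical candidates, as a search result might return them.
candidates = [
    "Example Movie Night Party Mix",
    "Example Movie (Original Motion Picture Soundtrack)",
    "Unrelated Album",
]
print(pick_album("Example Movie", candidates))
```

Without the soundtrack bonus, the party mix would score as high as the real soundtrack, which is the kind of ambiguity that makes a manual review pass worthwhile.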

Note: All datasets were processed (cleaned, merged, filtered) for analysis in this project.
Processing does not change their original licensing terms. More information can be found in the data license file.


🙌 Acknowledgments

