Reel Patterns is a data science project that uncovers hidden structures and surprising trends in cinema.
Instead of asking "What makes a movie successful?", we dive into unconventional questions like:
- Which actors form tight-knit cliques?
- When should a franchise stop producing sequels before brand fatigue sets in?
- Do blockbuster movies also create hit soundtracks?
This project was created by Or Forshmit, Noam Kimhi and Adir Tuval as part of the course 67978: A Needle in a Data Haystack - Introduction to Data Science at the Hebrew University of Jerusalem (HUJI).
Full paper available here.
Final Grade: 100
- Overview
- Features
- Project Structure
- Getting Started
- Contributing
- License
- Acknowledgments
Reel Patterns explores cinema data through three unconventional lenses:
- Actor communities: uncovering hidden cliques and bridge actors using collaboration networks.
- Franchise dynamics: analyzing how sequel performance evolves and identifying when franchises fall into brand fatigue.
- Soundtrack correlations: testing whether popular soundtracks align with box-office success and audience ratings.
By combining large-scale datasets with graph analysis, statistical modeling, and interactive dashboards, we reveal structures and trends that go beyond traditional movie success metrics.
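To make the collaboration-network idea concrete, here is a minimal sketch using only the Python standard library. It is not the project's implementation: the cast data is a made-up toy example, the function names are hypothetical, and connected components are used as a crude stand-in for real community detection.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical toy data: movie title -> cast list (not taken from the project's datasets).
CASTS = {
    "Movie A": ["Ann", "Bob", "Cy"],
    "Movie B": ["Ann", "Bob"],
    "Movie C": ["Cy", "Dee"],
    "Movie D": ["Eve", "Fay"],
}


def build_collaboration_edges(casts):
    """Count how many movies each pair of actors shares."""
    edges = Counter()
    for cast in casts.values():
        for a, b in combinations(sorted(set(cast)), 2):
            edges[(a, b)] += 1
    return edges


def connected_groups(edges, min_weight=1):
    """Group actors linked by pairs that share at least `min_weight` movies."""
    neighbors = defaultdict(set)
    for (a, b), weight in edges.items():
        if weight >= min_weight:
            neighbors[a].add(b)
            neighbors[b].add(a)

    seen, groups = set(), []
    for start in sorted(neighbors):
        if start in seen:
            continue
        # Depth-first traversal collects everyone reachable from `start`.
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(neighbors[node] - group)
        seen |= group
        groups.append(sorted(group))
    return groups


edges = build_collaboration_edges(CASTS)
groups = connected_groups(edges)                   # any shared movie counts
strong_groups = connected_groups(edges, min_weight=2)  # repeat collaborators only
print(groups)
print(strong_groups)
```

In this toy graph, Cy is the only link between Dee and the Ann/Bob pair, which is the kind of role the overview calls a bridge actor. Raising `min_weight` keeps only repeat collaborations, which is one simple way to surface tighter groups.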
- Actor community detection using collaboration graphs
- Franchise sequel performance analysis
- Soundtrack vs. movie success correlations
- Interactive visualizations with Streamlit in the web app
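As an illustration of the franchise sequel analysis listed above, the sketch below computes the share of "successful" movies at each sequel index. Everything here is an assumption made for illustration: the records are invented, the function names are hypothetical, and success is defined as a return on investment (ROI) of at least 1.0, which may differ from the definition used in the paper.

```python
from collections import defaultdict

# Hypothetical records: (franchise, sequel_index, budget, revenue), in millions.
# The numbers are invented for illustration and are not project data.
MOVIES = [
    ("Saga X", 1, 50, 200),
    ("Saga X", 2, 80, 240),
    ("Saga X", 3, 100, 150),
    ("Saga Y", 1, 20, 100),
    ("Saga Y", 2, 40, 60),
    ("Saga Y", 3, 60, 50),
]

# Assumed success criterion: profit at least equal to the budget.
ROI_THRESHOLD = 1.0


def roi(budget, revenue):
    """Return on investment: profit divided by budget."""
    return (revenue - budget) / budget


def success_rate_by_sequel_index(movies, threshold=ROI_THRESHOLD):
    """Fraction of movies at each sequel index whose ROI meets the threshold."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for _franchise, index, budget, revenue in movies:
        totals[index] += 1
        if roi(budget, revenue) >= threshold:
            hits[index] += 1
    return {index: hits[index] / totals[index] for index in sorted(totals)}


rates = success_rate_by_sequel_index(MOVIES)
print(rates)
```

A success rate that drops as the sequel index grows is the kind of pattern one would read as brand fatigue. With real data, the threshold and the success metric (ROI, audience rating, critic rating) would need to be chosen deliberately.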
└── Reel-Patterns/
    ├── Curtain call, please
    │   ├── constants.py
    │   ├── curtain_call_visualizations.py
    │   ├── preprocessing.py
    │   └── query_wikidata_script.py
    ├── README
    ├── Reel Hits Meet Real Hits
    │   ├── organize_data.py
    │   └── reel_hits_viz.py
    ├── What Can I Say, We Cliqued
    │   ├── constants.py
    │   ├── organize_data.py
    │   └── streamlit_app.py
    ├── data
    │   ├── collabs.csv
    │   ├── LICENSE.md
    │   └── entire_data_link
    ├── figures
    │   ├── corr_pop_rating.png
    │   ├── movies_per_sequel_index.png
    │   ├── prob_of_success_audience_rating.png
    │   ├── prob_of_success_critic_rating.png
    │   ├── prob_of_success_roi.png
    │   ├── reel_hits_pearson_heatmap.png
    │   └── reel_hits_spearman_heatmap.png
    ├── LICENSE
    └── requirements.txt
REEL-PATTERNS
Root
- requirements.txt: Python dependencies required to run the project
- LICENSE: License for the code and non-data files (MIT)
Reel Hits Meet Real Hits
- organize_data.py: Collects and prepares soundtrack and movie data for analysis
- reel_hits_viz.py: Creates visualizations of correlations between soundtracks and movies
Curtain call, please
- query_wikidata_script.py: Script to fetch additional metadata from Wikidata
- constants.py: Constants and parameters for franchise analysis
- curtain_call_visualizations.py: Visualizations of sequel success and franchise dynamics
- preprocessing.py: Data cleaning and preparation for sequel performance analysis
What Can I Say, We Cliqued
- streamlit_app.py: Interactive Streamlit dashboard for actor collaboration networks
- constants.py: Constants and parameters for actor network analysis
- organize_data.py: Prepares collaboration data for graph-based analysis
Before getting started with Reel-Patterns, ensure your runtime environment meets the following requirements:
- Python 3.9
- pip
Also, make sure to download the available data from Google Drive.
Install Reel-Patterns as follows:
- Clone the Reel-Patterns repository:
    git clone https://github.com/OrF8/Reel-Patterns
- Navigate to the project directory:
    cd Reel-Patterns
- Install the dependencies:
    pip install -r requirements.txt
Most modules in this repository are designed to collect, preprocess, and analyze data, and then generate plots or visualizations.
They are not meant to be long-running services, but rather scripts that prepare results and figures.
The main exception is the Streamlit app, which allows interactive exploration of results.
You can run one of the modules directly, for example:
    python "Reel Hits Meet Real Hits/reel_hits_viz.py"
This will collect the relevant data, process it, and produce the associated plots.
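The figure names in this repository (`reel_hits_pearson_heatmap.png`, `reel_hits_spearman_heatmap.png`) suggest that the soundtrack analysis compares Pearson and Spearman correlations. The following is a rough, standard-library-only sketch of those two coefficients, not the code in `reel_hits_viz.py`. The sample values and variable names are invented for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


# Hypothetical values: soundtrack popularity scores and box-office revenue.
soundtrack_popularity = [10, 20, 30, 40, 50]
box_office = [1, 4, 9, 16, 25]

pearson_r = pearson(soundtrack_popularity, box_office)
spearman_r = spearman(soundtrack_popularity, box_office)
print(round(pearson_r, 3), round(spearman_r, 3))
```

In this toy data the relationship is monotonic but not linear, so the Spearman coefficient is 1 while the Pearson coefficient is slightly lower. That difference is one reason to look at both heatmaps rather than only one.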
To explore the results of the "What can I say? We cliqued" section interactively, you can use the web app or run the app locally:
    python -m streamlit run "What Can I Say, We Cliqued/streamlit_app.py"
- Join the Discussions: Share your insights, provide feedback, or ask questions.
- Report Issues: Submit bugs found in the Reel Patterns project.
This is an academic course project, not a community-driven open-source project.
We are not seeking pull requests or code contributions.
However, we welcome:
- Feedback on our analysis and results
- Suggestions for further exploration
- Questions or discussions about the methods used
This project's code and non-dataset material are licensed under the MIT License.
See the LICENSE file for details.
Datasets are provided under their original licenses (see data/LICENSE.md).
This project combines multiple external datasets. To stay compliant with licensing and API terms, we distinguish between data we can share directly and data you must fetch yourself.
- Kaggle TMDB dataset: Licensed under the ODC Attribution License (ODC-By v1.0).
- Kaggle Rotten Tomatoes (RT) dataset: Licensed under CC0 1.0 (Public Domain).
- IMDb datasets: Provided under IMDb non-commercial terms. Shared here strictly for academic and research purposes.
- Wikidata: Licensed under CC0 1.0 (Public Domain).
- Spotify API data: Due to the Spotify Developer Terms of Service, we cannot redistribute Spotify-derived datasets (e.g., album or track popularity). Instead, we provide code using Spotipy so you can re-fetch the data yourself with your own API key.
Notice: Our script uses heuristics to guess the correct soundtrack album. The heuristics picked the wrong album in about 5% of cases, and we corrected those manually.
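To give a feel for what an album-matching heuristic can look like, here is a hypothetical sketch. It is not the project's script and makes no Spotify API calls: the candidate dictionaries (with assumed `name` and `year` keys), the keyword list, and the scoring weights are all illustrative assumptions.

```python
from difflib import SequenceMatcher

# Assumed keywords that often appear in soundtrack album names.
SOUNDTRACK_HINTS = ("soundtrack", "original motion picture", "original score")


def score_album(movie_title, movie_year, album):
    """Score a candidate album; higher means more likely the movie's soundtrack."""
    name = album["name"].lower()
    title = movie_title.lower()
    similarity = SequenceMatcher(None, title, name).ratio()  # fuzzy title match, 0..1
    contains_title = 1.0 if title in name else 0.0
    has_hint = 1.0 if any(hint in name for hint in SOUNDTRACK_HINTS) else 0.0
    year_close = 1.0 if abs(album["year"] - movie_year) <= 1 else 0.0
    return similarity + contains_title + has_hint + year_close


def pick_album(movie_title, movie_year, candidates):
    """Return the best-scoring candidate, or None if there are no candidates."""
    if not candidates:
        return None
    return max(candidates, key=lambda album: score_album(movie_title, movie_year, album))


# Hypothetical search results for a fictional movie.
candidates = [
    {"name": "Space Quest Tribute Covers", "year": 2019},
    {"name": "Space Quest (Original Motion Picture Soundtrack)", "year": 2014},
    {"name": "Quest for Space", "year": 2014},
]

best = pick_album("Space Quest", 2014, candidates)
print(best["name"])
```

Any heuristic of this kind will occasionally prefer a cover album, a re-release, or an unrelated record with a similar name, which is why a manual review pass like the one described above remains necessary.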
Note: All datasets were processed (cleaned, merged, filtered) for analysis in this project.
Processing does not change their original licensing terms.
More information can be found in the data license file.
- Information courtesy of IMDb. Used with permission.
- Rotten Tomatoes dataset on Kaggle.
- TMDB dataset on Kaggle.
- Wikidata (CC0 public domain data).
- Spotify API for soundtrack data (queried via Spotipy).
- Streamlit for powering the interactive web app.