🎬 Reel Patterns

Reel Patterns is a data science project that uncovers hidden structures and surprising trends in cinema.
Instead of asking “What makes a movie successful?”, we dive into unconventional questions like:

Which actors form tight-knit cliques?
When should a franchise stop producing sequels before brand fatigue sets in?
Do blockbuster movies also create hit soundtracks?

This project was created by Or Forshmit, Noam Kimhi and Adir Tuval as part of the course 67978: A Needle in a Data Haystack – Introduction to Data Science at the Hebrew University of Jerusalem (HUJI).

The full paper is available in the repository as Reel Patterns - by Noam Kimhi, Or Forshmit and Adir Tuval.pdf.

🎓 Final Grade: 100

📍 Overview

Reel Patterns explores cinema data through three unconventional lenses:

Actor communities – uncovering hidden cliques and bridge actors using collaboration networks.

Franchise dynamics – analyzing how sequel performance evolves and identifying when franchises fall into brand fatigue.

Soundtrack correlations – testing whether popular soundtracks align with box-office success and audience ratings.

By combining large-scale datasets with graph analysis, statistical modeling, and interactive dashboards, we reveal structures and trends that go beyond traditional movie success metrics.
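
To give a feel for the graph-analysis side, here is a minimal sketch of how a collaboration network can be built from cast lists and searched for cliques. It is an illustration only, not the project's actual pipeline: the function names, the toy cast data, and the choice of the basic Bron-Kerbosch recursion are all assumptions made for this example.

```python
from collections import defaultdict
from itertools import combinations


def build_collab_graph(casts):
    """Map each actor to the set of actors they shared at least one film with."""
    graph = defaultdict(set)
    for cast in casts.values():
        for a, b in combinations(sorted(set(cast)), 2):
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)


def maximal_cliques(graph):
    """Enumerate maximal cliques with the basic Bron-Kerbosch recursion."""
    cliques = []

    def expand(current, candidates, excluded):
        # No actor left to add and none already handled: current is maximal.
        if not candidates and not excluded:
            cliques.append(sorted(current))
            return
        for actor in list(candidates):
            neighbours = graph[actor]
            expand(current | {actor}, candidates & neighbours, excluded & neighbours)
            candidates = candidates - {actor}
            excluded = excluded | {actor}

    expand(set(), set(graph), set())
    return sorted(cliques)


# Hypothetical toy data: film title -> cast list.
toy_casts = {
    "Film A": ["Ana", "Ben", "Cy"],
    "Film B": ["Ben", "Cy", "Dee"],
    "Film C": ["Dee", "Eli"],
}
toy_graph = build_collab_graph(toy_casts)
toy_cliques = maximal_cliques(toy_graph)
print(toy_cliques)
```

In this toy graph, Dee belongs to two different cliques, which is the kind of position a bridge actor occupies. The basic recursion is fine for small graphs; a real dataset would likely need a graph library and a scalable community detection method.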

👾 Features

Actor community detection using collaboration graphs
Franchise sequel performance analysis
Soundtrack vs. movie success correlations
Interactive visualizations with Streamlit in the web app
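
The soundtrack analysis relies on correlation measures (the figures folder contains Pearson and Spearman heatmaps). As a rough sketch of what these two coefficients compute, the snippet below implements both in plain Python. The function names and the sample numbers are hypothetical and are not taken from the project's code or data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def average_ranks(values):
    """Rank values from 1, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = mean_rank
        start = end + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical values, for illustration only.
soundtrack_popularity = [10, 20, 30, 40, 50]
box_office_musd = [1, 4, 9, 16, 1000]

print(pearson(soundtrack_popularity, box_office_musd))
print(spearman(soundtrack_popularity, box_office_musd))
```

Because the sample relationship is monotone but far from linear, the Spearman value is higher than the Pearson value, which is one reason to report both.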

📁 Project Structure

└── Reel-Patterns/
    ├── Curtain Call, Please
    │   ├── constants.py
    │   ├── curtain_call_visualizations.py
    │   ├── preprocessing.py
    │   └── query_wikidata_script.py
    ├── README.md
    ├── Reel Hits Meet Real Hits
    │   ├── organize_data.py
    │   └── reel_hits_viz.py
    ├── What Can I Say, We Cliqued
    │   ├── constants.py
    │   ├── organize_data.py
    │   └── streamlit_app.py
    ├── data
    │   ├── collabs.csv
    │   ├── LICENSE.md
    │   └── entire_data_link
    ├── figures
    │   ├── corr_pop_rating.png
    │   ├── movies_per_sequel_index.png
    │   ├── prob_of_success_audience_rating.png
    │   ├── prob_of_success_critic_rating.png
    │   ├── prob_of_success_roi.png
    │   ├── reel_hits_pearson_heatmap.png
    │   └── reel_hits_spearman_heatmap.png
    ├── LICENSE
    └── requirements.txt

📂 Project Index

REEL-PATTERNS

Root

requirements.txt ❯ Python dependencies required to run the project

LICENSE ❯ License for the code and non-data files (MIT)

Reel Hits Meet Real Hits

organize_data.py ❯ Collects and prepares soundtrack and movie data for analysis

reel_hits_viz.py ❯ Creates visualizations of correlations between soundtracks and movies

Curtain Call, Please

query_wikidata_script.py ❯ Script to fetch additional metadata from Wikidata

constants.py ❯ Constants and parameters for franchise analysis

curtain_call_visualizations.py ❯ Visualizations of sequel success and franchise dynamics

preprocessing.py ❯ Data cleaning and preparation for sequel performance analysis

What Can I Say, We Cliqued

streamlit_app.py ❯ Interactive Streamlit dashboard for actor collaboration networks

constants.py ❯ Constants and parameters for actor network analysis

organize_data.py ❯ Prepares collaboration data for graph-based analysis

🚀 Getting Started

☑️ Prerequisites

Before getting started with Reel-Patterns, ensure your runtime environment meets the following requirements:

Python 3.9
pip

Also, make sure to download the available data from Google Drive.

⚙️ Installation

Install Reel-Patterns as follows:

Clone the Reel-Patterns repository:

git clone https://github.com/OrF8/Reel-Patterns

Navigate to the project directory:

cd Reel-Patterns

Install the project dependencies using pip:

pip install -r requirements.txt

🤖 Usage

Most modules in this repository are designed to collect, preprocess, and analyze data, and then generate plots or visualizations. They are not meant to be long-running services, but rather scripts that prepare results and figures.
The main exception is the Streamlit app, which allows interactive exploration of results.

You can run one of the modules directly, for example:

python "Reel Hits Meet Real Hits/reel_hits_viz.py"

This will collect the relevant data, process it, and produce the associated plots.

To explore the results of the What Can I Say, We Cliqued section interactively, use the web app or run the app locally:

python -m streamlit run "What Can I Say, We Cliqued/streamlit_app.py"

🔰 Contributing

💬 Join the Discussions: Share your insights, provide feedback, or ask questions.
🐛 Report Issues: Submit bugs found for the Reel Patterns project.

⚠️ Contribution Policy

This is an academic course project, not a community-driven open-source project.
We are not seeking pull requests or code contributions.
However, we welcome:

Feedback on our analysis and results
Suggestions for further exploration
Questions or discussions about the methods used

🎗 License

This project’s code and non-dataset material are licensed under the MIT License.
See the LICENSE file for details.
Datasets are provided under their original licenses (see data/LICENSE.md).

📊 Data Licensing

This project combines multiple external datasets. To stay compliant with licensing and API terms, we distinguish between data we can share directly and data you must fetch yourself.

✅ Included in this repository

Kaggle TMDB dataset → Licensed under the ODC Attribution License (ODC-By v1.0).
Kaggle Rotten Tomatoes (RT) dataset → Licensed under CC0 1.0 (Public Domain).
IMDb datasets → Provided under IMDb non-commercial terms and shared here strictly for academic and research purposes.
Wikidata → Licensed under CC0 1.0 (Public Domain).

⚠️ Not included (must be fetched by users)

Spotify API data → Due to Spotify Developer Terms of Service, we cannot redistribute Spotify-derived datasets (e.g., album or track popularity).
Instead, we provide code using Spotipy so you can re-fetch the data yourself with your own API key.

Notice: Our script uses heuristics to guess the correct soundtrack album for each movie. About 5% of these guesses were wrong, and we corrected them manually.
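
For readers who want to re-fetch the data, the sketch below shows roughly what such a lookup could look like with Spotipy. It is not the project's script: the matching heuristic, the function names, and the search query format are assumptions made for illustration. The Spotipy calls used (SpotifyClientCredentials, Spotify.search, Spotify.album) are part of the library, and the credentials are read from the SPOTIPY_CLIENT_ID and SPOTIPY_CLIENT_SECRET environment variables, which you must set to your own API key.

```python
import os


def pick_soundtrack_album(movie_title, albums):
    """Pick the most plausible soundtrack album (hypothetical heuristic).

    Only albums whose name contains the film title are considered; among
    those, names mentioning "soundtrack" or "motion picture" are preferred.
    Returns None when no album name contains the title.
    """
    title = movie_title.lower()
    best, best_score = None, 0
    for album in albums:
        name = album["name"].lower()
        if title not in name:
            continue
        score = 1
        if "soundtrack" in name or "motion picture" in name:
            score += 1
        if score > best_score:
            best, best_score = album, score
    return best


def fetch_soundtrack_popularity(movie_title):
    """Search Spotify for a film's soundtrack album and return its popularity."""
    # Imported lazily so the heuristic above can be used without Spotipy.
    import spotipy
    from spotipy.oauth2 import SpotifyClientCredentials

    auth = SpotifyClientCredentials(
        client_id=os.environ["SPOTIPY_CLIENT_ID"],
        client_secret=os.environ["SPOTIPY_CLIENT_SECRET"],
    )
    sp = spotipy.Spotify(auth_manager=auth)
    # The query format is an assumption; other phrasings may match better.
    results = sp.search(q=f"{movie_title} soundtrack", type="album", limit=10)
    album = pick_soundtrack_album(movie_title, results["albums"]["items"])
    if album is None:
        return None
    # Search results are simplified album objects; fetch the full one.
    return sp.album(album["id"])["popularity"]


# Hypothetical search results, used here to exercise the heuristic offline.
sample_albums = [
    {"id": "1", "name": "Inception (Deluxe Fan Mix)"},
    {"id": "2", "name": "Inception (Music from the Motion Picture)"},
    {"id": "3", "name": "Greatest Hits"},
]
chosen = pick_soundtrack_album("Inception", sample_albums)
print(chosen["name"])
```

Any heuristic of this kind will mislabel some albums, so a manual review pass over the matches is still advisable.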

Note: All datasets were processed (cleaned, merged, filtered) for analysis in this project.
Processing does not change their original licensing terms. More information can be found in the data license file.

🙌 Acknowledgments

Information courtesy of IMDb. Used with permission.
Rotten Tomatoes dataset on Kaggle.
TMDB dataset on Kaggle.
Wikidata (CC0 public domain data).
Spotify API for soundtrack data (queried via Spotipy).
Streamlit for powering the interactive web app.
