Predictive win probability for National Hockey League (NHL) games using neural networks trained on game state and play-by-play data.
This project was originally developed alongside Alex Hagood for CPT_S 475: Data Science at Washington State University, taught by Assefaw Gebremedhin. It was inspired by visualizations like ESPN's Win Probability graph for basketball and football. Similarly, this model provides live estimates of NHL game outcomes.
The play-by-play data is sourced directly from the NHL API using hockey-scraper and represents every in-game event between the 2007-2008 and 2023-2024 seasons.
Additional libraries include:
- Keras and TensorFlow for the neural network implementation.
- Dash and Plotly for the web dashboard and visualization.
The three regulation periods of each game are divided into 30-second time slices, each representing the game state at the end of its interval: each team's Elo, total score, shots on goal, hits, strength, and penalty minutes. Regulation and overtime periods are also split into sliding windows of 3 plays to capture the plays leading up to scoring opportunities in golden-goal overtime. The slices and the play windows are each fed into their own LSTM neural network, which calculate regulation and overtime win probabilities separately. The two sets of probabilities are then concatenated for the final graph.
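The slicing and windowing described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the project's actual preprocessing: the function names `slice_totals` and `sliding_windows`, the event codes, and the choice to represent events as (seconds elapsed, value) pairs are all hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate

SLICE_SECONDS = 30
REGULATION_SECONDS = 3 * 20 * 60  # three 20-minute regulation periods


def slice_totals(event_times, event_values):
    """Running total of a stat (e.g. goals) at the end of each 30-second slice.

    event_times are seconds elapsed since the opening faceoff (hypothetical input format).
    """
    pairs = sorted(zip(event_times, event_values))
    times = [t for t, _ in pairs]
    # running[k] is the total after the first k events; running[0] is 0.
    running = [0] + list(accumulate(v for _, v in pairs))
    edges = range(SLICE_SECONDS, REGULATION_SECONDS + 1, SLICE_SECONDS)
    # bisect_right counts the events that occurred at or before each slice end.
    return [running[bisect_right(times, edge)] for edge in edges]


def sliding_windows(plays, size=3):
    """All consecutive windows of `size` plays, in order."""
    return [plays[i:i + size] for i in range(len(plays) - size + 1)]


# Three goals by one team, at 0:10, 0:45, and 59:59 of regulation.
totals = slice_totals([10, 45, 3599], [1, 1, 1])
windows = sliding_windows(["FAC", "SHOT", "BLOCK", "GOAL"])
```

Here `totals` has one entry per slice (120 for regulation), and `windows` holds the two overlapping 3-play sequences. In a real pipeline, each slice would carry the full feature set listed above rather than a single running total.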
The Dash app allows users to generate graphs by filtering by Home and Away Teams, then selecting a Game from the dropdown.
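The filtering step can be illustrated with a small helper that builds the Game dropdown options from the selected teams. This is a sketch only: the `game_options` function, the record fields, and the sample games are hypothetical and are not taken from the app's code. The returned list of `label`/`value` dictionaries is the shape a Dash dropdown accepts for its options.

```python
# Hypothetical game records; the real app reads these from its dataset.
GAMES = [
    {"game_id": 1, "home": "SEA", "away": "VAN", "date": "2024-01-01"},
    {"game_id": 2, "home": "VAN", "away": "SEA", "date": "2024-01-05"},
    {"game_id": 3, "home": "SEA", "away": "EDM", "date": "2024-01-09"},
]


def game_options(games, home=None, away=None):
    """Return dropdown options for games matching the selected teams.

    A filter left as None matches every team.
    """
    return [
        {"label": f'{g["date"]}: {g["away"]} @ {g["home"]}', "value": g["game_id"]}
        for g in games
        if (home is None or g["home"] == home)
        and (away is None or g["away"] == away)
    ]


seattle_home = game_options(GAMES, home="SEA")
seattle_vs_vancouver = game_options(GAMES, home="SEA", away="VAN")
```

In a Dash app, a function like this would typically be called from a callback that takes the two team dropdown values as inputs and writes the result to the Game dropdown's options.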
Built and tested with Python 3.11.9; other versions may also work.
To run only the dashboard with the included data and models, install the packages from requirements.txt:
pip install -r requirements.txt
To scrape and clean the NHL data and retrain the models, install both requirements.txt and requirements-dev.txt:
pip install -r requirements.txt -r requirements-dev.txt
Start the dashboard:
python app.py
The web app will be available at http://localhost:8050/
The container uses the python:3.11.9-slim image and installs packages required for prediction and visualization.
docker build -t nhl-meter .
docker run -h localhost -p 8050:8050 -d nhl-meter
- Scrape play-by-play and shift data from the NHL API.
Note that this may take multiple days to complete.
python ./dev/data/scrape.py
- Clean and reduce the raw data.
python ./dev/data/raw_to_reduced.py
- Train the regulation model using lstm.ipynb and the overtime model using lstm_ot.ipynb. The new models are ready to be used with the dashboard.
To run the test files using pytest:
pytest tests/test_callbacks.py
pytest tests/test_prediction.py
test_callbacks.py will test the Dash app functionality.
test_prediction.py will run the appropriate win prediction models on all games (~22,000) in the dataset.
