Commit 6e0bc86

Merge branch 'main' of github.com:Donsven/Predictive_Modeling
feat: added data folders.
2 parents e78f3c9 + 5aa5d4b commit 6e0bc86

4 files changed

Lines changed: 138 additions & 0 deletions

TODO_PIPELINES/Logan_Pipelines.MD

Lines changed: 36 additions & 0 deletions
@@ -0,0 +1,36 @@
1+
# Logan's Pipeline
2+
3+
## APIs
4+
Kalshi: https://docs.kalshi.com/welcome
5+
- Official Kalshi API for market data, plus the ability to execute trades
6+
7+
AlphaVantage: https://www.alphavantage.co/?gad_source=1&gad_campaignid=22373825259&gbraid=0AAAAA-6A8dPvpXp69gCs4SmebapE-mJk2&gclid=CjwKCAiAw9vIBhBBEiwAraSATkGjWa40za-SG4O4qmMSt-BuigN8prHF3KIa2RwbuNDU_Y8Qi1SmVhoC_k4QAvD_BwE#about
8+
- Provides real-time and historical financial market data across asset classes (stocks, ETFs, mutual funds, crypto)
9+
10+
OddsJam : https://oddsjam.com/odds-api
11+
- Offers real-time betting lines from over 100 sportsbooks
12+
13+
Yfinance : https://ranaroussi.github.io/yfinance/
14+
- Unofficial Python library for fetching financial data from Yahoo Finance
15+
16+
SportRadar : https://developer.sportradar.com/getting-started/docs/get-started
17+
- API for sports data, including NBA, NFL, NCAAF, etc.
18+
19+
## High-Level Control Flow
20+
21+
1. Collect data from the APIs and store it in a database
22+
- MongoDB
23+
- PostgreSQL
24+
2. Clean and Preprocess Data
25+
- NumPy
26+
- Pandas
27+
3. Train the model using data
28+
- scikit-learn
29+
- TensorFlow
30+
- TensorBoard (experimentation and testing)
31+
4. Backtest our model on historical data and evaluate performance
32+
- Matplotlib
33+
5. Deploy the model, adjust bet sizes, and handle risk management
34+
35+
36+
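
Step 5 of the control flow above mentions adjusting bet sizes and handling risk. The notes do not say which sizing rule is intended, so the following is only a sketch of one common option: a fractional Kelly stake for a binary contract that costs `market_price` and pays 1 if the event occurs. The function names, the `kelly_multiplier`, and the `max_fraction` cap are all assumptions made for illustration, and fees are ignored.

```python
def kelly_fraction(model_prob: float, market_price: float) -> float:
    """Full-Kelly bankroll fraction for buying a contract that pays 1 on success.

    Assumes the contract costs `market_price` (between 0 and 1) and that
    `model_prob` is the model's estimated probability of the event.
    """
    if not 0.0 < market_price < 1.0:
        raise ValueError("market_price must be strictly between 0 and 1")
    if not 0.0 <= model_prob <= 1.0:
        raise ValueError("model_prob must be between 0 and 1")
    edge = model_prob - market_price
    # No positive edge means no bet.
    return max(0.0, edge / (1.0 - market_price))


def stake(bankroll: float, model_prob: float, market_price: float,
          kelly_multiplier: float = 0.5, max_fraction: float = 0.05) -> float:
    """Scale the Kelly fraction down and cap it (hypothetical risk limits)."""
    fraction = kelly_multiplier * kelly_fraction(model_prob, market_price)
    return bankroll * min(fraction, max_fraction)


# Model says 52%, market price is 0.50, bankroll is 1000.
suggested = stake(1000.0, 0.52, 0.50)
```

Scaling the full Kelly fraction down and capping it are both ways of limiting exposure when the model's probabilities are themselves uncertain.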

TODO_PIPELINES/lillie_pipeline.md

Lines changed: 27 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,27 @@
1+
## APIs/Data Resources:
2+
1. https://collegefootballdata.com/
3+
- advertises free data on NCAA football
4+
- not sure if it is just FBS or FCS also
5+
- not sure on its speed for updates or accuracy but seems like a starting place
6+
2. https://www.basketball-reference.com/
7+
- another supposedly free site on NBA, ABA, G League, and WNBA
8+
- team, player, league information
9+
- not sure how the data is downloaded
10+
3. https://www.sports-reference.com/
11+
- broader sports coverage than basketball-reference
12+
- covers more sports, including baseball, pro and college football, and pro and college basketball
13+
- not sure on update time but has good historical data
14+
- not betting specific
15+
4. https://www.kaggle.com/datasets/ehallmar/nba-historical-stats-and-betting-data
16+
- moneyline betting information for NBA games
17+
5. https://www.kaggle.com/datasets/scottfree/sports-lines
18+
- betting information for line, over/under, and game results for select seasons of select sports
19+
- offers variety, plus an AlphaPy Python model for analyzing trends in the game results
20+
21+
## High-Level Workflow
22+
1. Collect data from APIs or data resources
23+
2. Filter and clean the data down to the desired values and parameters
24+
3. Split the data into train and test
25+
4. Fit a linear regression model
26+
5. Evaluate accuracy --> RMSE, MAE, R^2
27+
6. Adjust and improve
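
Steps 3 to 5 of the workflow above (split, fit a linear regression, evaluate with RMSE, MAE, and R^2) can be sketched with the standard library alone. The data here is synthetic and noiseless, the helper names are made up for illustration, and a real run would more likely use pandas and scikit-learn on the collected data.

```python
import math
import random


def train_test_split(rows, test_ratio=0.25, seed=0):
    """Shuffle a copy of the rows and cut it into train and test parts."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_ratio))
    return shuffled[:cut], shuffled[cut:]


def fit_line(points):
    """Ordinary least squares for a single feature: y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def evaluate(points, slope, intercept):
    """Compute RMSE, MAE, and R^2 on held-out points."""
    errors = [y - (slope * x + intercept) for x, y in points]
    n = len(errors)
    mean_y = sum(y for _, y in points) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((y - mean_y) ** 2 for _, y in points)
    return {
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(e) for e in errors) / n,
        "r2": 1.0 - ss_res / ss_tot,
    }


# Synthetic stand-in for cleaned data: y = 2x + 1 with no noise.
data = [(float(x), 2.0 * x + 1.0) for x in range(20)]
train, test = train_test_split(data)
slope, intercept = fit_line(train)
metrics = evaluate(test, slope, intercept)
```

Evaluating on the held-out `test` rows rather than on `train` is what keeps the reported metrics from flattering the model.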
Lines changed: 43 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,43 @@
1+
# APIs
2+
## Basketball APIs
3+
NBA: https://github.com/swar/nba_api
4+
Scraping: https://www.basketball-reference.com/
5+
6+
## Baseball APIs
7+
MLB: https://statsapi.mlb.com/
8+
9+
## Kalshi API
10+
https://docs.kalshi.com/welcome?redirect=%2F
11+
12+
# Control Flow
13+
14+
## 1. Collect Data
15+
Get NBA and MLB data from the APIs, and possibly injury lists too. Use Python with aiohttp for asynchronous API calls.
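
A minimal sketch of the concurrent collection pattern, assuming three hypothetical sources named `nba`, `mlb`, and `injuries`. To stay runnable without network access, `fetch_json` is a stand-in coroutine rather than a real HTTP call; in practice its body would await an aiohttp request instead.

```python
import asyncio


async def fetch_json(source: str) -> dict:
    """Stand-in for an HTTP GET; a real version would await an aiohttp request."""
    await asyncio.sleep(0)  # yield control, as a network call would
    return {"source": source, "rows": []}


async def collect(sources: list[str]) -> dict[str, dict]:
    """Start all fetches at once and wait for every result."""
    results = await asyncio.gather(*(fetch_json(s) for s in sources))
    # gather preserves input order, so zip pairs each source with its payload.
    return dict(zip(sources, results))


payloads = asyncio.run(collect(["nba", "mlb", "injuries"]))
```

Because the requests overlap instead of running back to back, total wait time is roughly that of the slowest source rather than the sum of all of them.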
16+
17+
## 2. Clean Data and Normalization
18+
Pandas and NumPy to filter and clean data.
19+
20+
## 3. Feature Engineering
21+
Past performance (face-offs), Offensive/defensive ratings, Player availability, Pitch speed/batting stats (for MLB), Kalshi market metrics (price, volume, spread)
22+
23+
Scikit-Learn
24+
25+
## 4. Model Training
26+
Logistic Regression, Random Forest, Time-series models, Simple NN
27+
28+
PyTorch, Scikit-Learn
29+
30+
## 5. Test Predictions on Kalshi/Fine-Tuning
31+
For a few days, feed in custom parameters and record what the model predicts. Wait for the outcomes, then compare them against the predictions.
32+
33+
FastAPI, Docker
34+
35+
## 6. Track Record
36+
Track PnL
37+
38+
PostgreSQL/SQLite or Amazon S3 Bucket
39+
40+
## 7. Evaluate Performance
41+
Plot different graphs/metrics of wins vs. losses or net money
42+
43+
Pandas/Matplotlib or Streamlit
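
Steps 6 and 7 above (track PnL, then summarize wins, losses, and net money) could be prototyped with SQLite before committing to PostgreSQL or S3. This is a sketch only: the `bets` table, its columns, and the market names are hypothetical, and the database lives in memory.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE bets (
        id INTEGER PRIMARY KEY,
        market TEXT NOT NULL,
        stake REAL NOT NULL,
        payout REAL NOT NULL
    )
    """
)


def record_bet(conn, market, stake, payout):
    """Store one settled bet; payout is the amount returned (0 on a loss)."""
    conn.execute(
        "INSERT INTO bets (market, stake, payout) VALUES (?, ?, ?)",
        (market, stake, payout),
    )
    conn.commit()


def summary(conn):
    """Count wins and losses and compute net PnL over all recorded bets."""
    total, wins, net = conn.execute(
        "SELECT COUNT(*), "
        "COALESCE(SUM(payout > stake), 0), "
        "COALESCE(SUM(payout - stake), 0.0) "
        "FROM bets"
    ).fetchone()
    return {"bets": total, "wins": wins, "losses": total - wins, "net_pnl": net}


record_bet(conn, "HYPOTHETICAL-MARKET-A", 10.0, 18.0)
record_bet(conn, "HYPOTHETICAL-MARKET-B", 10.0, 0.0)
record_bet(conn, "HYPOTHETICAL-MARKET-C", 5.0, 9.0)
stats = summary(conn)
```

The same `summary` output could feed a Matplotlib chart or a Streamlit page for the plots mentioned in step 7.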

TODO_PIPELINES/sri_pipelines.md

Lines changed: 32 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,32 @@
1+
Sports:
2+
* https://www.mysportsfeeds.com/
3+
* https://www.thesportsdb.com/
4+
* https://developer.sportradar.com/getting-started/docs/get-started
5+
6+
Finance:
7+
* https://www.alphavantage.co/
8+
* https://finnhubio.github.io/
9+
* https://developer.yahoo.com/api/
10+
11+
Control flow:
12+
1. Data Retrieval
13+
- Python scripts to fetch sports and market data from the APIs
14+
- Store the API responses in a database (MongoDB)
15+
16+
2. Data Extraction/Processing
17+
- Extract relevant features such as player information, team statistics, market data, etc. (using NumPy and Pandas)
18+
- Store the tables using AWS if needed
19+
20+
3. Model Training
21+
- Model training logic, weight assignment and updating, etc.
22+
- Scikit-learn and TensorFlow
23+
24+
4. Model Deployment
25+
- Flask or FastAPI for real-time predictions given a user request
26+
27+
5. Frontend
28+
- React.js and Next.js to allow users to pick between sports and finance markets, input queries, view predictions, and visualize trends
29+
- No need for user accounts and login/logout authentication (yet)
30+
31+
Potential additions:
32+
After an event predicted by the model occurs, retrain the model based on the result of the event.
