# Falcons Stats

A Python-based web scraping and API project that:

- Automatically collects soccer statistics from the OCSL website
- Stores data in a SQLite database
- Provides stats via API endpoints for Ottawa Falcons soccer club teams

Note: This project is currently under active development. Some features may change or be incomplete.
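A minimal sketch of the table-scraping idea using only the standard library. The column layout and sample HTML here are illustrative assumptions, not the actual OCSL markup:

```python
from html.parser import HTMLParser

class StatsTableParser(HTMLParser):
    """Collect the cell text of every <tr> in an HTML stats table.

    The (Player, Team, Goals) layout is a hypothetical stand-in for
    whatever the OCSL pages actually serve.
    """

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th"):
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
        elif tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        # Only keep text that appears inside a cell of an open row.
        if self._in_cell and self._row is not None:
            self._row.append(data.strip())

html = """
<table>
  <tr><th>Player</th><th>Team</th><th>Goals</th></tr>
  <tr><td>A. Smith</td><td>Falcons</td><td>7</td></tr>
</table>
"""
parser = StatsTableParser()
parser.feed(html)
print(parser.rows)  # [['Player', 'Team', 'Goals'], ['A. Smith', 'Falcons', '7']]
```

In the real scrapers the HTML would come from an HTTP request rather than a string literal, and the parsed rows would be written to the SQLite database.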
## Tech Stack

- Python
- Flask
- SQLite
- AWS EC2
- Nginx
- Poetry
- Terraform
- GitHub Actions
## Local Development

- Ensure `poetry` is installed. See system requirements/installation guide.
- Install dependencies:

  ```shell
  poetry install
  ```

- Create a local `/instance/config.py` based on `/instance/config.example.py`.
- Initialize the database:

  ```shell
  poetry run init_db
  ```

- Seed the database with test data:

  ```shell
  poetry run seed_dev_db
  ```

- Start the development server:

  ```shell
  poetry run dev
  ```

Use `ipdb` for interactive debugging:

```python
import ipdb; ipdb.set_trace()
```

## CI/CD

Automated via GitHub Actions:

- `terraform-plan.yml` runs on pull requests to `main`
- `terraform-apply.yml` runs on merges to `main`, updates infrastructure
- `deploy.yml` runs on merges to `main`, updates production code
## Deployment

- Install dependencies:

  ```shell
  sudo dnf install git python3
  pip install poetry
  sudo dnf install nginx
  ```

- Set up the API and scheduler services with systemd:
`/etc/systemd/system/falcons-stats-api.service`

```ini
[Unit]
Description=Gunicorn service for Falcons Stats Flask API
After=network.target

[Service]
User=ssm-user
WorkingDirectory=/home/ssm-user/falcons-stats
ExecStart=/home/ssm-user/.local/bin/poetry run gunicorn --workers 2 --bind 0.0.0.0:8080 'falcons_stats:create_app()'
Restart=always

[Install]
WantedBy=multi-user.target
```

`/etc/systemd/system/falcons-stats-scheduler.service`

```ini
[Unit]
Description=Falcons Stats Scheduler Service
After=network.target

[Service]
User=ssm-user
WorkingDirectory=/home/ssm-user/falcons-stats
ExecStart=/home/ssm-user/.local/bin/poetry run scheduler
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target
```

- Nginx Reverse Proxy Configuration:
```shell
sudo apt update
sudo apt install -y nginx
```

`/etc/nginx/conf.d/falcons-stats-api.conf`

```nginx
server {
    listen 80;

    location / {
        proxy_pass http://127.0.0.1:8080;
    }
}
```

- Copy `instance/config.example.py` to `instance/config.py` and update it with production values.
- Run these commands to start the services:

  ```shell
  # Enable and start services
  sudo systemctl daemon-reload
  sudo systemctl enable falcons-stats-api falcons-stats-scheduler nginx
  sudo systemctl start falcons-stats-api falcons-stats-scheduler nginx
  ```

The services are now handled by systemd and should restart whenever the server restarts or crashes.
Check service status:

```shell
sudo systemctl status falcons-stats-api
sudo systemctl status falcons-stats-scheduler
```

Note: Most deployment tasks are now automated through GitHub Actions.
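The `falcons-stats-scheduler` unit runs `poetry run scheduler`, whose implementation is not shown here. One way such a periodic job loop might look, sketched with the stdlib `sched` module; the interval and job body are assumptions for illustration:

```python
import sched
import time

scheduler = sched.scheduler(time.time, time.sleep)

def scrape_stats():
    """Placeholder for the project's real scraping job."""
    print("scraping OCSL stats...")

def schedule_periodic(interval_seconds):
    """Run the job once, then queue it to run again after the interval."""
    scrape_stats()
    scheduler.enter(interval_seconds, 1, schedule_periodic, (interval_seconds,))

schedule_periodic(3600)   # runs the job once and queues the next run
# scheduler.run()         # would block here, repeating the job every hour
```

Because each run re-schedules itself, a crash inside `scrape_stats` would stop the loop, which is one reason the systemd unit above uses `Restart=always`.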
## Logging

- Structured, machine-parsable JSON logging (mostly following best practices from this guide)
- CloudWatch support in production
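Structured JSON log lines can be produced with the stdlib alone via a custom `logging.Formatter`. The field names below are illustrative, not necessarily the ones this project emits:

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object per line."""

    def format(self, record):
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            "timestamp": self.formatTime(record, "%Y-%m-%dT%H:%M:%S%z"),
        }
        if record.exc_info:
            payload["exc_info"] = self.formatException(record.exc_info)
        return json.dumps(payload)

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("falcons_stats")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("scrape finished")
# emits a line like:
# {"level": "INFO", "logger": "falcons_stats", "message": "scrape finished", "timestamp": "..."}
```

One JSON object per line keeps the output machine-parsable by CloudWatch Logs filters and similar tooling.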
## API Endpoints

- Access leading scorers via `/leading-scorers`
- Access leading keepers via `/leading-keepers`
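These endpoints presumably read aggregates out of the SQLite database. A rough sketch of a leading-scorers query using the stdlib `sqlite3` module; the schema and data are hypothetical stand-ins for the project's real tables:

```python
import sqlite3

# Hypothetical schema; the real tables in this project may differ.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE players (name TEXT, team TEXT, goals INTEGER)")
conn.executemany(
    "INSERT INTO players VALUES (?, ?, ?)",
    [("A. Smith", "Falcons", 7), ("B. Jones", "Falcons", 4), ("C. Lee", "Falcons", 9)],
)

def leading_scorers(conn, limit=10):
    """Return (name, goals) pairs ordered by goals, highest first."""
    cur = conn.execute(
        "SELECT name, goals FROM players ORDER BY goals DESC LIMIT ?", (limit,)
    )
    return cur.fetchall()

print(leading_scorers(conn, 2))  # [('C. Lee', 9), ('A. Smith', 7)]
```

A Flask view for `/leading-scorers` would essentially wrap a query like this and return the rows as JSON.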
## TODO

- Replace mock data with actual data in scrapers (waiting on the season to start for HTML tables to be present)
- Finish adding seeds for all Falcons teams/divisions (waiting on all teams to register)
- Better error handling and logging (scrapers, API endpoints)
- Add tests, and include them in the pipeline
- Add support for database migrations
- Scrape more data (schedules, team standings, etc.)
- Capture stats from all divisions/teams
- Run the scheduler service as more granular background jobs
- Enhance observability with performance metrics, execution tracking, and resource monitoring for scheduled tasks