Skip to content

Latest commit

 

History

History
248 lines (194 loc) · 5.55 KB

File metadata and controls

248 lines (194 loc) · 5.55 KB

WhoVoted Performance Optimization Guide

Quick Start

# Pull latest code
cd /opt/whovoted && git pull origin main

# Run optimization (interactive, with backups)
bash deploy/optimize_master.sh

# Monitor progress in another terminal
bash deploy/check_status.sh watch

What Gets Optimized

Problem 1: Household Popup (30s → <1s)

  • Issue: Address matching scans 500K+ rows
  • Fix: Indexes on lat/lng and address columns
  • Impact: 30x faster

Problem 2: Gazette Never Loads (∞ → <5s)

  • Issue: Correlated subqueries for new voters and flips
  • Fix: Denormalized columns + pre-computed cache
  • Impact: 100x+ faster

Problem 3: District Campaigns Slow

  • Issue: Multi-county queries with temp tables
  • Fix: Indexes + denormalization
  • Impact: 10x faster

Optimization Steps

Step 1: Add Indexes (30 seconds, safe)

/opt/whovoted/venv/bin/python3 deploy/optimize_step1_indexes.py

What it does:

  • Adds 9 critical indexes
  • Safe to run anytime
  • Immediate 2-5x speedup

Indexes added:

  • voters(lat, lng) - Household lookups
  • voters(address) - Address matching
  • voter_elections(vuid, election_date, party_voted) - History queries
  • Gender, age, county filters

Step 2: Denormalization (5-10 minutes, safe)

/opt/whovoted/venv/bin/python3 deploy/optimize_safe_denormalization.py

What it does:

  • Adds computed columns (NEVER modifies existing data)
  • Computes: is_new_voter, previous_party, has_flipped
  • Creates indexes on new columns

Safety:

  • Original data NEVER touched
  • Only ADDS new columns
  • Fully reversible
  • Creates backup automatically

Impact:

  • Gazette queries: 5 min → 5 sec
  • No more correlated subqueries

Step 3: Pre-compute Gazette (2-5 minutes)

/opt/whovoted/venv/bin/python3 deploy/optimize_step2_gazette.py

What it does:

  • Computes all gazette stats once
  • Saves to static JSON file
  • Gazette loads instantly

When to run:

  • After each scraper run
  • After geocoding batch completes

Monitoring

Check Status Once

bash deploy/check_status.sh

Watch Status (Auto-refresh)

bash deploy/check_status.sh watch

Shows:

  • Current stage
  • Progress bar
  • Elapsed time
  • Recent steps
  • Error details (if any)

Status File Location

/opt/whovoted/data/optimization_status.json

Safety & Rollback

Automatic Backups

Master script creates timestamped backups:

/opt/whovoted/data/whovoted.db.backup.YYYYMMDD_HHMMSS

Manual Backup

cp /opt/whovoted/data/whovoted.db /opt/whovoted/data/whovoted.db.backup

Rollback Denormalization

ALTER TABLE voter_elections DROP COLUMN is_new_voter;
ALTER TABLE voter_elections DROP COLUMN previous_party;
ALTER TABLE voter_elections DROP COLUMN previous_election_date;
ALTER TABLE voter_elections DROP COLUMN has_flipped;

Restore from Backup

cp /opt/whovoted/data/whovoted.db.backup /opt/whovoted/data/whovoted.db
sudo supervisorctl restart whovoted

Integration with Scraper

EVR Scraper

Already integrated - runs optimization after successful scrape.

Geocoding

Already integrated - runs optimization after batch completes.

Manual Upload

Run after processing:

/opt/whovoted/venv/bin/python3 deploy/optimize_step2_gazette.py

Testing

Test Household Popup

  1. Open map
  2. Zoom in to see individual markers
  3. Click a household
  4. Should load in <1 second

Test Gazette

  1. Click newspaper icon
  2. Should load instantly (if pre-computed)
  3. Or <10 seconds (if computing live)

Test District Campaigns

  1. Select Congressional District 15
  2. Should load stats in <5 seconds

Troubleshooting

"Database is locked"

  • App is running - stop it first:
    sudo supervisorctl stop whovoted
    # Run optimization
    sudo supervisorctl start whovoted

Optimization Stuck

  • Check status: bash deploy/check_status.sh
  • Check process: ps aux | grep optimize
  • Kill if needed: pkill -f optimize

Data Integrity Check Failed

  • Restore from backup immediately
  • Report issue before re-running

Gazette Still Slow

  • Check if denormalization ran:
    SELECT is_new_voter FROM voter_elections LIMIT 1;
  • If column doesn't exist, run Step 2

Performance Targets

Operation Before After Target
Household popup 30s <1s
Gazette load Never <5s
District stats 30s+ <5s
Heatmap load 5s 2s

Maintenance

After Each Scraper Run

Optimization runs automatically via hooks.

Manual Trigger

/opt/whovoted/venv/bin/python3 deploy/optimize_step2_gazette.py

Weekly

Check database size and consider VACUUM:

sqlite3 /opt/whovoted/data/whovoted.db "VACUUM;"

Architecture

Data Flow

Scraper → voter_elections (source data)
         ↓
Denormalization → computed columns (is_new_voter, etc)
         ↓
Gazette Pre-compute → static JSON cache
         ↓
API → Instant response

Tables

  • voters - Source data (NEVER modified)
  • voter_elections - Source data + computed columns
  • election_stats_cache - Pre-aggregated stats (future)
  • household_groups - Pre-computed households (future)

Files

  • /opt/whovoted/data/whovoted.db - Main database
  • /opt/whovoted/public/cache/gazette_insights.json - Cached gazette
  • /opt/whovoted/data/optimization_status.json - Status tracking

Support

If optimization fails or data is corrupted:

  1. Stop the app
  2. Restore from backup
  3. Check logs
  4. Report issue with status file contents