find_scores.py calculates leverage scores for election races to help prioritize donations based on potential impact. The system combines multiple data sources to rank races by their donation leverage.
Leverage Score = Competitiveness × Saturation
The system starts by fetching all upcoming elections from the Civic Engine API:
- Filters by date: by default, shows races in the next 18 months (configurable)
- Filters out past elections (configurable)
- Extracts individual races from elections
- Calculates days until election for prioritization
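The filtering steps above can be sketched as follows. This is a minimal illustration, not the script's actual code: the `upcoming_races` name, the dict shape (`"date"`, `"races"`), and the approximation of 18 months as 548 days are all assumptions made for the example.

```python
from datetime import date

def upcoming_races(elections, today, window_days=548, include_past=False):
    """Flatten elections into races inside the look-ahead window, soonest first.

    `elections` is a hypothetical list of dicts with a `date` (datetime.date)
    and a list of `races`; the real Civic Engine payload may look different.
    """
    races = []
    for election in elections:
        days_until = (election["date"] - today).days
        if days_until < 0 and not include_past:
            continue  # past election, filtered out by default
        if days_until > window_days:
            continue  # beyond the look-ahead window (about 18 months)
        for race in election["races"]:
            races.append({"race": race, "days_until": days_until})
    return sorted(races, key=lambda r: r["days_until"])
```

Passing `today` explicitly rather than calling `date.today()` inside the function keeps the sketch deterministic and easy to test.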
The system uses a tiered approach to determine competitiveness:
How it works:
- Searches Kalshi API for markets matching the race
- Validates market matches (state, office, district, year)
- Selects best matching market from all results
- Calculates competitiveness from market prices
- Weighted by match quality - poor matches are downweighted
For General Elections (2 candidates):
- Uses binary market price (e.g., "Will Democrat win?")
- Formula:
1 - abs(price - 50) / 50
- Example: 50% price = 1.0 (most competitive), 80% price = 0.4 (less competitive)
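As a small sketch of the binary-market formula, assuming the price is given in percent (0 to 100); the function name is hypothetical:

```python
def binary_competitiveness(price_percent: float) -> float:
    """Map a binary market price (0-100) to a competitiveness score in [0, 1]."""
    if not 0 <= price_percent <= 100:
        raise ValueError("price must be between 0 and 100")
    # 50 is a coin flip (score 1.0); 0 or 100 is a foregone conclusion (score 0.0).
    return 1 - abs(price_percent - 50) / 50
```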
For Primary Elections (3+ candidates):
- Uses entropy-based calculation considering ALL candidates
- Formula:
0.6 × entropy_score + 0.4 × gap_score
- Entropy:
-Σ(p_i × log(p_i)), where p_i is the probability of candidate i
- Higher entropy = more evenly distributed = more competitive
- Also considers gap between top 2 candidates
- Adjusts for number of candidates (more candidates = more competitive due to vote splitting)
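The multi-candidate formula could look roughly like the sketch below. Two details are assumptions, since they are not spelled out above: `entropy_score` is taken to be the entropy divided by its maximum, log(n), and `gap_score` is taken to be 1 minus the gap between the top two probabilities. The separate adjustment for the number of candidates is left out.

```python
import math

def primary_competitiveness(probs):
    """Score a multi-candidate market as 0.6 * entropy_score + 0.4 * gap_score."""
    total = sum(probs)
    if len(probs) < 2 or total <= 0:
        raise ValueError("need at least two candidates with positive total probability")
    p = [x / total for x in probs]  # renormalise so the probabilities sum to 1
    entropy = -sum(x * math.log(x) for x in p if x > 0)
    entropy_score = entropy / math.log(len(p))  # assumption: divide by max entropy
    top, second = sorted(p, reverse=True)[:2]
    gap_score = 1 - (top - second)  # assumption: a small gap means a high score
    return 0.6 * entropy_score + 0.4 * gap_score
```

Under these assumptions an evenly split field scores close to 1.0, and a field with one dominant candidate scores much lower.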
What if Kalshi market is a poor match?
- Competitiveness: System still uses it, but downweights it:
  - Weight = match_score if valid, or match_score × 0.5 if invalid
  - Poor matches contribute less to final competitiveness score
  - Warnings are displayed (e.g., "Year mismatch", "District mismatch")
  - Market validation status shows "POOR MATCH"
  - Data quality may be marked as "low" or "medium"
- Saturation: Poor matches are NOT used for saturation:
  - Only good matches (match_score ≥ 0.6 and is_valid = True) are used
  - Poor matches result in saturation = None → 1.0 (neutral, no penalty)
  - This prevents using wrong-race market activity for saturation calculation
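The two rules above (downweight for competitiveness, exclude for saturation) are simple enough to express directly. The helper names here are hypothetical; only the thresholds come from the text.

```python
def kalshi_weight(match_score: float, is_valid: bool) -> float:
    """Weight of the Kalshi source in the competitiveness average."""
    # Invalid (poor) matches keep half of their match score as weight.
    return match_score if is_valid else match_score * 0.5

def usable_for_saturation(match_score: float, is_valid: bool) -> bool:
    """Only good matches may feed the saturation proxy."""
    return is_valid and match_score >= 0.6
```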
Data Quality Indicators:
- High: Market volume > 100, good match
- Medium: Market volume 10-100, or poor match
- Low: Market volume < 10, or very poor match
When used: Always attempted, combined with other sources
How it works:
- Queries Civic Engine API for the position (e.g., "U.S. House - NC District 2")
- Gets all past races for that position
- Extracts winners from Candidacy.result = "WON"
- Maps winners to parties using FEC API
- Calculates competitiveness based on party consistency
Competitiveness Calculation:
- Same party wins all elections → Low competitiveness (0.3) - safe seat
- Parties alternate frequently → High competitiveness (0.8) - competitive
- Multiple parties → Very competitive (0.9)
Weight:
- Based on number of elections found: 0.3 for 1 election, up to 1.0 for 4+ elections
- More elections = higher weight = more confidence
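A rough sketch of the historical scoring and weighting follows. Several choices are assumptions: any history with exactly two winning parties is treated as "alternating", three or more as "multiple parties", and the weight for 2 elections (0.5) is a guess placed between the stated endpoints. The weight for 3 elections (0.7) is taken from the worked example in the weighted-averaging section.

```python
def historical_competitiveness(winning_parties):
    """Score a seat from the parties that won its past elections."""
    distinct = len(set(winning_parties))
    if distinct == 0:
        raise ValueError("no past winners supplied")
    if distinct == 1:
        return 0.3  # same party every time: safe seat
    if distinct == 2:
        return 0.8  # assumption: any two-party history counts as alternating
    return 0.9      # three or more parties have won

def historical_weight(num_elections: int) -> float:
    """Confidence weight: 0.3 for one election, 1.0 for four or more."""
    if num_elections < 1:
        return 0.0
    steps = {1: 0.3, 2: 0.5, 3: 0.7}  # 0.5 is an assumed intermediate value
    return steps.get(num_elections, 1.0)
```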
Data Quality:
- High: 3+ election cycles found
- Medium: 2 election cycles found
- Low: 1 election cycle found
Works for: All race types that have positions in Civic Engine (House, Senate, State Senate, State House, Governor, etc.)
When used: Always attempted, combined with other sources
How it works:
- Extracts state from race name
- For county races: Attempts to use county-specific data (future improvement - currently aggregates)
- For state/federal races: Aggregates county-level party affiliation data for the state
- Calculates average party split (Dem/Rep ratio)
- Competitiveness = how close to 50/50 split
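One way to turn a party split into a "closeness to 50/50" score is to reuse the shape of the binary-market formula. That choice, the unweighted mean over counties, and the `(dem, rep)` tuple input are all assumptions for illustration; the NANDA-based code may compute this differently.

```python
def party_split_competitiveness(county_rows):
    """Average the Democratic share across counties and score closeness to 50/50.

    `county_rows` is a hypothetical list of (dem_count, rep_count) tuples.
    Returns None when no county has usable data.
    """
    shares = [dem / (dem + rep) for dem, rep in county_rows if dem + rep > 0]
    if not shares:
        return None
    avg_share = sum(shares) / len(shares)
    # 0.5 is a perfectly even split (1.0); 0.0 or 1.0 is one-party (0.0).
    return 1 - abs(avg_share - 0.5) / 0.5
```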
Weight:
- Lower weight (0.2) since it's less accurate than Kalshi/historical
- State-level aggregation is less precise than district-specific data
Limitations:
- State-level aggregation for most races (not district-specific)
- County-specific matching not yet implemented (would require county name → FIPS mapping)
- Less accurate than Kalshi or historical data
- Data quality: Medium
County Races:
- Currently uses state-level aggregation (known limitation)
- Future improvement: Extract county name from race name and match to specific FIPS code
- Would provide more accurate competitiveness for county-level races
New Approach: All three sources (Kalshi, Historical, NANDA) are collected and combined with weighted averaging.
How it works:
- Collect all available sources (some may be missing)
- Each source has a weight:
- Kalshi: match_score (0.0-1.0), downweighted if poor match
- Historical: 0.3-1.0 based on number of elections
- NANDA: 0.2 (fixed, lower weight)
- Normalize weights so they sum to 1.0
- Calculate weighted average:
Σ(comp_i × weight_i) / Σ(weight_i)
Benefits:
- If one source is missing, others get more weight automatically
- Multiple sources provide more robust estimate
- Poor Kalshi matches contribute less but still provide some signal
- Historical data can supplement or validate Kalshi
Example:
- Kalshi (good match, weight 0.8) + Historical (3 elections, weight 0.7) + NANDA (weight 0.2)
- Total weight = 1.7, normalized to 1.0
- Final = (Kalshi × 0.47) + (Historical × 0.41) + (NANDA × 0.12)
Default: If no sources available at all, uses 0.5 (moderate competitiveness)
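The weighted average, including the 0.5 default, can be sketched as below. The function names and the `(competitiveness, weight)` tuple input are illustrative assumptions.

```python
def normalized_weights(weights):
    """Scale weights so they sum to 1.0."""
    total = sum(weights)
    return [w / total for w in weights]

def combine_competitiveness(sources, default=0.5):
    """Weighted average over (competitiveness, weight) pairs.

    Sources that are missing (None) or carry zero weight are skipped, so the
    remaining sources automatically receive a larger share.
    """
    usable = [(c, w) for c, w in sources if c is not None and w > 0]
    total = sum(w for _, w in usable)
    if total == 0:
        return default  # no sources at all: moderate competitiveness
    return sum(c * w for c, w in usable) / total
```

For the example weights 0.8, 0.7 and 0.2, `normalized_weights` yields roughly 0.47, 0.41 and 0.12, which matches the shares listed above.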
The saturation score is inversely related to how much fundraising has already occurred: the more money a race has raised, the lower its score.
Data Source: FEC API (Federal Election Commission)
How it works:
- Determines FEC cycle from election year
- Queries FEC API for all candidates in the race
- Sums total receipts (fundraising) for all candidates
- Calculates saturation score:
1 / log(1 + total_receipts)
Example values:
- $0 raised → saturation = 1.0 (highest, no saturation; handled as a special case, since log(1 + 0) = 0)
- $1M raised → saturation ≈ 0.14
- $10M raised → saturation ≈ 0.10
- $100M raised → saturation ≈ 0.09
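A hedged sketch of the receipts-based score follows. The log base and any scaling are not stated above, so the base is a parameter here (natural log by default) and absolute values from this sketch need not match the figures listed. The zero-receipts special case and the clamp to 1.0 for very small totals are assumptions.

```python
import math

def fec_saturation(total_receipts: float, log=math.log) -> float:
    """Inverse-log saturation score: more money raised gives a lower score."""
    if total_receipts < 0:
        raise ValueError("receipts cannot be negative")
    if total_receipts == 0:
        return 1.0  # nothing raised yet; also avoids dividing by log(1) = 0
    # Assumption: cap at 1.0 so tiny totals cannot exceed the zero-receipts score.
    return min(1.0, 1 / log(1 + total_receipts))
```

Whatever the base, the score falls monotonically as receipts grow, which is the property the leverage ranking relies on.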
Data Quality: High (actual campaign finance data)
Error Handling:
- Retry logic with exponential backoff (3 attempts)
- Handles rate limiting (429 errors)
- Distinguishes API errors from "no data" (returns conservative 0.5 for errors)
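The error-handling behaviour might be structured like this sketch. The `RateLimited` exception, the helper names, and the injected `sleep` are hypothetical; a real client would raise `RateLimited` when it sees an HTTP 429 response. Delays double from an assumed base of one second.

```python
import time

class RateLimited(Exception):
    """Hypothetical marker for an HTTP 429 response."""

def with_retries(call, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Run `call`, retrying transient failures with exponential backoff."""
    for attempt in range(attempts):
        try:
            return call()
        except (RateLimited, ConnectionError):
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller decide what to do
            sleep(base_delay * 2 ** attempt)

def saturation_or_default(fetch, default=0.5, sleep=time.sleep):
    """Return the fetched saturation, or a conservative default on API errors."""
    try:
        return with_retries(fetch, sleep=sleep)
    except (RateLimited, ConnectionError):
        return default  # an API failure is not the same as "no fundraising"
```

Keeping the "no data" case as a normal return value from `fetch`, and reserving exceptions for API failures, is what lets the caller tell the two apart.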
Data Source: Kalshi market volume/spread as proxy
IMPORTANT: This proxy ONLY works when a Kalshi market exists for the race (even if it's a poor match).
How it works:
- If Kalshi market found → uses market volume and bid-ask spread
- Formula:
log(1 + spread) / log(1 + volume)
- Logic:
- Low volume + high spread = less market attention = lower saturation = higher score
- High volume + low spread = more market attention = higher saturation = lower score
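The proxy formula can be sketched as follows. Because it is a ratio of two logarithms, the result does not depend on the log base. Returning `None` for a market with no volume is an assumption made to avoid dividing by zero, and the function name is hypothetical.

```python
import math

def kalshi_proxy_saturation(spread: float, volume: float):
    """Saturation proxy from bid-ask spread and traded volume."""
    if volume <= 0 or spread < 0:
        return None  # assumption: no usable market activity means no score
    # Wide spread and thin volume give a high score (little market attention).
    return math.log(1 + spread) / math.log(1 + volume)
```

Note that the ratio can exceed 1.0 when the spread is larger than the volume, so a caller may want to clamp it.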
What if Kalshi market is a poor match?
- System still uses it for saturation calculation
- Validation warnings are displayed
- Data quality may be marked as "medium" or "low"
- Warning: "Kalshi market volume/spread used as proxy - not actual fundraising data"
What if NO Kalshi market exists OR match is poor?
- Saturation cannot be calculated (set to None)
- Later set to 1.0 (neutral, no penalty) in leverage calculation
- Warning: "No saturation data available - Kalshi market not found" or "Poor Kalshi market match - saturation not calculated"
- Rationale: Using market activity from the wrong race doesn't make sense for saturation, so we use neutral value instead
Data Quality: Medium (proxy data, not actual finance data)
Data Source: None available
How it works:
- FEC doesn't cover local races
- Kalshi rarely has markets for local races
- NANDA doesn't cover local races
- Result: No saturation data available
Handling:
- Saturation set to None initially
- Later set to 1.0 (neutral, no penalty) in leverage calculation
- Warning: "No saturation data available for local race - no data sources cover local races"
- Leverage score = Competitiveness × 1.0 (only competitiveness matters)
Data Quality: None
Leverage Score = Competitiveness × Saturation
Special Cases:
- If saturation is None (no data): Set to 1.0 (neutral, no penalty)
- Time-based boost: Races within 90 days of the election get a 10% boost; races within 180 days get a 5% boost
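Putting the formula and the special cases together gives a sketch like the one below. Applying the boost as a multiplier on the product, and treating the 90 and 180 day boundaries as inclusive, are assumptions.

```python
def leverage_score(competitiveness: float, saturation, days_until: int) -> float:
    """Leverage = competitiveness x saturation, with a boost for near elections."""
    if saturation is None:
        saturation = 1.0  # no data: neutral, no penalty
    score = competitiveness * saturation
    if days_until <= 90:
        score *= 1.10  # 10% boost
    elif days_until <= 180:
        score *= 1.05  # 5% boost
    return score
```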
Detection:
- Kalshi markets with 3+ candidates indicate primary
- Civic Engine may classify as primary election type
Competitiveness Calculation:
- Uses entropy-based formula considering ALL candidates
- Accounts for vote splitting (more candidates = more competitive)
- Formula:
0.6 × entropy_score + 0.4 × gap_score
- Entropy measures distribution of probabilities across all candidates
Saturation:
- Same as general elections (FEC for federal, Kalshi proxy for state)
Detection:
- Kalshi markets with 1-2 candidates (binary markets)
- Most common election type
Competitiveness Calculation:
- Uses binary market price
- Formula:
1 - abs(price - 50) / 50
- 50% = most competitive, 0% or 100% = least competitive
Saturation:
- FEC data for federal races
- Kalshi proxy for state races (if market exists)
Current Handling (runoff elections):
- Treated same as general elections
- No special handling (could be improved in future)
Limitations:
- May not distinguish between primary and runoff
- Uses same competitiveness calculation as general elections
Current Handling (recall elections):
- Treated same as general elections
- No special handling (could be improved in future)
Limitations:
- May not distinguish recall from regular elections
- Uses same calculation methods
Types: President, U.S. Senate, U.S. House of Representatives
Competitiveness:
- Tier 1: Kalshi markets (if available)
- Tier 2: Historical election results (Civic Engine + FEC party mapping)
- Tier 3: NANDA state-level data (fallback)
Saturation:
- Always uses FEC data (actual campaign finance receipts)
- High data quality
- Retry logic for API failures
Advantages:
- Best data quality (actual finance data)
- Historical data available for most races
- Kalshi markets usually available
Types: Governor, State Senate, State House, Attorney General, Secretary of State
Competitiveness:
- Tier 1: Kalshi markets (if available)
- Tier 2: Historical election results (Civic Engine + FEC party mapping)
- Tier 3: NANDA state-level data (fallback)
Saturation:
- Kalshi proxy (if Kalshi market exists)
- Uses market volume and spread as indicators
- No saturation data if no Kalshi market exists (set to neutral 1.0)
Advantages:
- Historical data often available
- Kalshi markets sometimes available
Limitations:
- No actual campaign finance data (uses proxy)
- Proxy only works when Kalshi market exists
Types: County Supervisor, Sheriff, District Attorney, County Clerk, etc.
Competitiveness:
- Tier 1: Kalshi markets (rarely available)
- Tier 2: Historical election results (if position exists in Civic Engine)
- Tier 3: NANDA state-level data (fallback, less accurate)
Saturation:
- No data available (FEC doesn't cover, Kalshi rarely has markets)
- Set to neutral 1.0 (no penalty)
- Warning displayed
Limitations:
- Limited data sources
- May not have historical data in Civic Engine
- NANDA is state-level, not county-specific
Types: Mayor, City Council, City Attorney, etc.
Competitiveness:
- Tier 1: Kalshi markets (very rarely available)
- Tier 2: Historical election results (if position exists in Civic Engine)
- Tier 3: NANDA state-level data (fallback, less accurate)
Saturation:
- No data available (FEC doesn't cover, Kalshi rarely has markets)
- Set to neutral 1.0 (no penalty)
- Warning displayed
Limitations:
- Very limited data sources
- May not have historical data in Civic Engine
- NANDA is state-level, not city-specific
Types: School Board, Water District, Special Districts, etc.
Competitiveness:
- Tier 1: Kalshi markets (almost never available)
- Tier 2: Historical election results (if position exists in Civic Engine)
- Tier 3: NANDA state-level data (fallback, less accurate)
Saturation:
- No data available (FEC doesn't cover, Kalshi almost never has markets)
- Set to neutral 1.0 (no penalty)
- Warning displayed
Limitations:
- Extremely limited data sources
- May not have historical data in Civic Engine
- NANDA is state-level, not district-specific
When a Kalshi market is found, the system validates it using validate_kalshi_market_match():
Validation Criteria:
- State Match (30% of score)
  - Checks if state abbreviation or full name appears in market title/ticker
  - Example: "TN" or "Tennessee" in market for Tennessee race
- Office Type Match (30% of score)
  - For House: Looks for "HOUSE" in ticker or "house" in title
  - For Senate: Looks for "SENATE" in ticker or "senate" in title
- District Match (20% of score, House only)
  - Checks if district number appears in market
  - Example: District 7 in "HOUSETN7S" or "TN 7"
- Year Match (20% of score)
  - Checks if election year appears in market
  - Allows 2-year difference (e.g., 2024 market for 2026 race)
Validation Result:
- Good Match: All required criteria met (state + office, + district for House)
- Poor Match: Some criteria missing or mismatched
- Match Score: 0.0 to 1.0 (higher = better match)
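The scoring scheme can be illustrated with the sketch below. It is not the body of validate_kalshi_market_match(); the function name, its parameters, and the regular expressions are assumptions. In particular, races without a district are given the district credit automatically, and state matching is a plain substring test as described above, which can produce false positives for two-letter abbreviations.

```python
import re

def sketch_match_score(market_text, state_abbr, state_name, office, year, district=None):
    """Return (match_score, is_valid) for a market title/ticker string."""
    text = market_text.upper()
    state_ok = state_abbr.upper() in text or state_name.upper() in text
    office_ok = office.upper() in text
    # Any four-digit 20xx year within two years of the election counts.
    years = [int(y) for y in re.findall(r"(?<!\d)(20\d{2})(?!\d)", text)]
    year_ok = any(abs(y - year) <= 2 for y in years)
    if district is None:
        district_ok = True  # assumption: non-district races get the credit
    else:
        # The district number must stand alone, not sit inside a longer number.
        district_ok = re.search(rf"(?<!\d){district}(?!\d)", text) is not None
    score = 0.0
    score += 0.3 if state_ok else 0.0
    score += 0.3 if office_ok else 0.0
    score += 0.2 if district_ok else 0.0
    score += 0.2 if year_ok else 0.0
    is_valid = state_ok and office_ok and district_ok
    return round(score, 2), is_valid
```

A call such as `sketch_match_score("HOUSETN7S - Who will win TN 7 in 2026?", "TN", "Tennessee", "House", 2026, district=7)` satisfies all four criteria under these rules.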
Competitiveness:
- System still uses the market price to calculate competitiveness
- Validation warnings are displayed
- Data quality may be reduced (high → medium → low)
- Score may be less reliable
Saturation (State Races):
- System still uses market volume/spread for saturation proxy
- Validation warnings are displayed
- Data quality marked as "medium" (proxy data)
- Warning: "Kalshi market volume/spread used as proxy - not actual fundraising data"
Example Output:
⚠️ Kalshi Market Validation: POOR MATCH (score: 0.30)
📋 Market Validation Details:
- Year not found in market: looking for 2025
- District mismatch: looking for district 7, market may be for different district
Competitiveness:
- Falls back to historical election results (Tier 2)
- If no historical data, uses NANDA (Tier 3)
- If no NANDA, uses default 0.5
Saturation (State/Local Races):
- Cannot calculate saturation (no data source)
- Set to None, then 1.0 (neutral) in leverage calculation
- Warning: "No saturation data available"
Competitiveness data quality:
- High: Kalshi market with volume > 100, or 3+ historical elections
- Medium: Kalshi market with volume 10-100, or 2 historical elections, or NANDA data
- Low: Kalshi market with volume < 10, or 1 historical election, or default value
- None: No data available (shouldn't happen, defaults to 0.5)
Saturation data quality:
- High: FEC data (actual receipts) for federal races
- Medium: Kalshi proxy (market volume/spread) for state races
- Low: Kalshi proxy with low volume (< 10)
- None: No data available (local races, state races without Kalshi)
Race: "U.S. House of Representatives - North Carolina 2nd Congressional District"
Competitiveness:
- ✅ Kalshi market found: "NC-02 2024 General" (GOOD MATCH, score: 0.85)
- Market price: 52% → Competitiveness = 0.96 (very competitive)
- Data quality: High (volume > 100)
Saturation:
- ✅ FEC data available (cycle 2024)
- Total receipts: $8,500,000
- Saturation = 1 / log(1 + 8,500,000) ≈ 0.10
- Data quality: High (actual finance data)
Leverage Score: 0.96 × 0.10 = 0.096
Race: "Georgia State Senate - District 35"
Competitiveness:
- ⚠️ Kalshi market found: "Georgia Republican Senate nominee" (POOR MATCH, score: 0.30)
- Market price: 40% → Competitiveness = 0.80 (competitive)
- Data quality: Medium (poor match, but market exists)
Saturation:
- ⚠️ Kalshi proxy used (market volume: 50, spread: 8)
- Saturation = log(1 + 8) / log(1 + 50) ≈ 0.56
- Data quality: Medium (proxy data, not actual finance)
- Warning: "Kalshi market volume/spread used as proxy - not actual fundraising data"
Leverage Score: 0.80 × 0.56 ≈ 0.45
Race: "Alabama State House - District 38"
Competitiveness:
- ❌ No Kalshi market found
- ✅ Historical data found: 3 elections (2020, 2022, 2024)
- All won by same party (REP) → Competitiveness = 0.30 (safe seat)
- Data quality: High (3+ elections)
Saturation:
- ❌ No Kalshi market → No saturation data
- Set to 1.0 (neutral, no penalty)
- Warning: "No saturation data available - Kalshi market not found"
- Data quality: None
Leverage Score: 0.30 × 1.0 = 0.30
Race: "San Francisco City Council - District 3"
Competitiveness:
- ❌ No Kalshi market found
- ❌ No historical data in Civic Engine (local positions may not be tracked)
- ✅ NANDA data used (California state-level)
- Competitiveness = 0.65 (based on state party split)
- Data quality: Medium (state-level, not district-specific)
Saturation:
- ❌ No FEC data (doesn't cover local races)
- ❌ No Kalshi market
- Set to 1.0 (neutral, no penalty)
- Warning: "No saturation data available for local race - no data sources cover local races"
- Data quality: None
Leverage Score: 0.65 × 1.0 = 0.65
- Only works when market exists: If no Kalshi market, no saturation data for state/local races
- Not actual finance data: Proxy based on market attention, not real fundraising
- Poor matches still used: System uses poor matches but flags them with warnings
- No saturation data: FEC, Kalshi, and NANDA don't cover local races
- Limited competitiveness data: May only have state-level NANDA data
- May not have historical data: Civic Engine may not track all local positions
- May not distinguish primary from general: System uses same calculation
- Runoff handling: No special handling for runoff elections
- Multi-candidate primaries: Entropy calculation helps but may not capture all nuances
- Federal races: Best data quality (FEC + Kalshi + Historical)
- State races: Medium data quality (Kalshi proxy, may not have markets)
- Local races: Lowest data quality (limited sources, state-level aggregation)
- Federal races with Kalshi markets and FEC data
- State races with good Kalshi matches and historical data
- Look for "high" data quality indicators
- State races with poor Kalshi matches
- Races with only historical data (no Kalshi)
- Check warnings for data quality issues
- Local races (city council, county)
- Races with only NANDA data
- Races with "low" or "none" data quality
- Consider these scores as rough estimates
- Best for: Federal and state races with good data quality
- Use with caution: Local races, races with poor Kalshi matches
- Always check: Data quality indicators and warnings
- Better primary detection: Distinguish primary from general elections
- Runoff handling: Special handling for runoff elections
- Local race data: Integrate local campaign finance databases if available
- Better Kalshi matching: Improve validation to reject very poor matches
- Multi-cycle analysis: Consider multiple election cycles for better competitiveness assessment
- FEC API: Retry logic with exponential backoff (3 attempts, 1s/2s/4s delays)
- Handles 429 rate limit errors automatically
- Civic Engine API: Retry logic with exponential backoff (3 attempts)
- Distinguishes API errors from "no data" cases
- API errors return conservative defaults (0.5) instead of treating as "no fundraising"
- Warnings displayed for all data quality issues
- Processes races sequentially (could be parallelized in future)
- Caches NANDA data in memory
- Limits output to top 20 races by default
find_scores.py provides a comprehensive system for ranking election races by donation leverage. It works best for federal and state races with good data quality, and gracefully handles races with limited or no data by using neutral values and clear warnings.
The system prioritizes transparency through data quality indicators and warnings, allowing users to make informed decisions about which races to prioritize for donations.