Skip to content

Latest commit

 

History

History
165 lines (126 loc) · 5.28 KB

File metadata and controls

165 lines (126 loc) · 5.28 KB

District Assignment Status Report

Summary

Successfully downloaded and parsed all Texas Legislature district reference files on the server. Created precinct-to-district lookup system and updated 2.6 million voter records.

What Was Accomplished

1. Downloaded District Reference Files ✓

All files successfully downloaded to /opt/whovoted/data/district_reference/:

  • Congressional Districts (PLANC2333) - 38 districts
  • State Senate Districts (PLANS2168) - 31 districts
  • State House Districts (PLANH2316) - 150 districts

2. Parsed District Data ✓

  • Congressional: 38 districts, 10,106 precincts mapped
  • State Senate: 31 districts, counties mapped (precincts file needs different parsing)
  • State House: 150 districts, counties mapped (precincts file needs different parsing)

3. Created Lookup System ✓

  • Built precinct_district_lookup table with 9,654 county-precinct combinations
  • Indexed for fast O(1) lookups
  • Ready to assign districts based on County + Precinct

4. Updated Voter Records ✓

  • Processed 2,610,558 voters
  • 100% have county data
  • 100% have precinct data
  • Updated congressional districts for voters where precinct data matched

Current Issues

Issue 1: Low Congressional District Assignment (1.1%)

Problem: Only 28,091 out of 2.6M voters got congressional districts assigned

Root Cause: Precinct format mismatch

  • Lookup table has precincts like: "0001", "0024", "0036"
  • Voter table may have precincts like: "1", "24", "36" (without leading zeros)
  • Or county names don't match exactly (case, spacing, special characters)

Solution: Need to normalize precinct formats before matching

Issue 2: No State Senate/House Precinct Data

Problem: State Senate and House precinct files weren't parsed

Root Cause: The r365 precinct files for Senate/House have different structure than Congressional

Files on Server:

# Need to find these files:
ls -la /opt/whovoted/data/district_reference/*r365* | grep -E "(PLANS|PLANH)"

Solution: Update parser to handle Senate/House precinct file format

Issue 3: Voting History Column Names

Problem: Script expects voted_2024_general but actual column names are different

Need to check: What are the actual column names for voting history?

Next Steps

Step 1: Fix Precinct Format Matching

Create normalization function:

def normalize_precinct(precinct):
    """Normalize precinct format for matching."""
    if not precinct:
        return None
    # Remove leading zeros: "0001" -> "1"
    # Or add leading zeros: "1" -> "0001"
    # Standardize format based on what's in voter table
    return precinct.strip().lstrip('0') or '0'

Step 2: Check Actual Data Formats

-- Check precinct formats in voter table
SELECT DISTINCT precinct 
FROM voters 
WHERE county = 'Hidalgo' 
LIMIT 20;

-- Check precinct formats in lookup table
SELECT DISTINCT precinct 
FROM precinct_district_lookup 
WHERE county = 'Hidalgo' 
LIMIT 20;

-- Check voting history columns
PRAGMA table_info(voters);

Step 3: Parse State Senate/House Precincts

Find and parse the r365 files for Senate and House districts

Step 4: Re-run Assignment

After fixing format matching, re-run the assignment to get >95% coverage

Files Created

  1. deploy/parse_district_files_fixed.py - Parses XLS files
  2. deploy/build_vuid_district_lookup.py - Creates lookup and assigns districts
  3. deploy/fix_all_district_assignments.sh - Master script
  4. deploy/add_district_columns_to_voters.py - Adds missing columns
  5. DISTRICT_ASSIGNMENT_MASTER_PLAN.md - Complete plan
  6. DISTRICT_ASSIGNMENT_STATUS.md - This status report

Database State

Tables Created

  • precinct_district_lookup - 9,654 entries
  • Columns added to voters:
    • state_senate_district
    • state_house_district
    • congressional_district (already existed)

Voter Statistics

  • Total voters: 2,610,558
  • With county: 2,610,558 (100%)
  • With precinct: 2,610,557 (100%)
  • With congressional district: 28,091 (1.1%) ⚠️ LOW
  • With state senate district: 0 (0%)
  • With state house district: 0 (0%)
  • With geocoded address: 469,766 (18%)

Immediate Action Required

Run diagnostic queries to understand the format mismatch:

ssh -i deploy/whovoted-key.pem ubuntu@politiquera.com
cd /opt/whovoted

# Check precinct formats
python3 << 'EOF'
import sqlite3
conn = sqlite3.connect('data/whovoted.db')
cursor = conn.cursor()

print("Sample precincts from voters table (Hidalgo County):")
cursor.execute("SELECT DISTINCT precinct FROM voters WHERE county = 'Hidalgo' LIMIT 10")
for row in cursor.fetchall():
    print(f"  '{row[0]}'")

print("\nSample precincts from lookup table (Hidalgo County):")
cursor.execute("SELECT DISTINCT precinct FROM precinct_district_lookup WHERE county LIKE '%Hidalgo%' LIMIT 10")
for row in cursor.fetchall():
    print(f"  '{row[0]}'")

print("\nSample county names from voters:")
cursor.execute("SELECT DISTINCT county FROM voters LIMIT 10")
for row in cursor.fetchall():
    print(f"  '{row[0]}'")

print("\nSample county names from lookup:")
cursor.execute("SELECT DISTINCT county FROM precinct_district_lookup LIMIT 10")
for row in cursor.fetchall():
    print(f"  '{row[0]}'")

conn.close()
EOF

This will show us exactly what format differences exist and how to fix them.