District Assignment Status Report

Summary

Successfully downloaded and parsed all Texas Legislature district reference files on the server. Created precinct-to-district lookup system and updated 2.6 million voter records.

What Was Accomplished

1. Downloaded District Reference Files ✓

All files successfully downloaded to /opt/whovoted/data/district_reference/:

Congressional Districts (PLANC2333) - 38 districts
State Senate Districts (PLANS2168) - 31 districts
State House Districts (PLANH2316) - 150 districts

2. Parsed District Data ✓

Congressional: 38 districts, 10,106 precincts mapped
State Senate: 31 districts, counties mapped (precincts file needs different parsing)
State House: 150 districts, counties mapped (precincts file needs different parsing)

3. Created Lookup System ✓

Built precinct_district_lookup table with 9,654 county-precinct combinations
Indexed for fast O(1) lookups
Ready to assign districts based on County + Precinct

4. Updated Voter Records ✓

Processed 2,610,558 voters
100% have county data
100% have precinct data
Updated congressional districts for voters where precinct data matched

Current Issues

Issue 1: Low Congressional District Assignment (1.1%)

Problem: Only 28,091 out of 2.6M voters got congressional districts assigned

Root Cause: Precinct format mismatch

Lookup table has precincts like: "0001", "0024", "0036"
Voter table may have precincts like: "1", "24", "36" (without leading zeros)
Or county names don't match exactly (case, spacing, special characters)

Solution: Need to normalize precinct formats before matching

Issue 2: No State Senate/House Precinct Data

Problem: State Senate and House precinct files weren't parsed

Root Cause: The r365 precinct files for Senate/House have different structure than Congressional

Files on Server:

# Need to find these files:
ls -la /opt/whovoted/data/district_reference/*r365* | grep -E "(PLANS|PLANH)"

Solution: Update parser to handle Senate/House precinct file format

Issue 3: Voting History Column Names

Problem: Script expects voted_2024_general but actual column names are different

Need to check: What are the actual column names for voting history?

Next Steps

Step 1: Fix Precinct Format Matching

Create normalization function:

def normalize_precinct(precinct):
    """Normalize precinct format for matching."""
    if not precinct:
        return None
    # Remove leading zeros: "0001" -> "1"
    # Or add leading zeros: "1" -> "0001"
    # Standardize format based on what's in voter table
    return precinct.strip().lstrip('0') or '0'

Step 2: Check Actual Data Formats

-- Check precinct formats in voter table
SELECT DISTINCT precinct 
FROM voters 
WHERE county = 'Hidalgo' 
LIMIT 20;

-- Check precinct formats in lookup table
SELECT DISTINCT precinct 
FROM precinct_district_lookup 
WHERE county = 'Hidalgo' 
LIMIT 20;

-- Check voting history columns
PRAGMA table_info(voters);

Step 3: Parse State Senate/House Precincts

Find and parse the r365 files for Senate and House districts

Step 4: Re-run Assignment

After fixing format matching, re-run the assignment to get >95% coverage

Files Created

✓ deploy/parse_district_files_fixed.py - Parses XLS files
✓ deploy/build_vuid_district_lookup.py - Creates lookup and assigns districts
✓ deploy/fix_all_district_assignments.sh - Master script
✓ deploy/add_district_columns_to_voters.py - Adds missing columns
✓ DISTRICT_ASSIGNMENT_MASTER_PLAN.md - Complete plan
✓ DISTRICT_ASSIGNMENT_STATUS.md - This status report

Database State

Tables Created

precinct_district_lookup - 9,654 entries
Columns added to voters:
- state_senate_district
- state_house_district
- congressional_district (already existed)

Voter Statistics

Total voters: 2,610,558
With county: 2,610,558 (100%)
With precinct: 2,610,557 (100%)
With congressional district: 28,091 (1.1%) ⚠️ LOW
With state senate district: 0 (0%)
With state house district: 0 (0%)
With geocoded address: 469,766 (18%)

Immediate Action Required

Run diagnostic queries to understand the format mismatch:

ssh -i deploy/whovoted-key.pem ubuntu@politiquera.com
cd /opt/whovoted

# Check precinct formats
python3 << 'EOF'
import sqlite3
conn = sqlite3.connect('data/whovoted.db')
cursor = conn.cursor()

print("Sample precincts from voters table (Hidalgo County):")
cursor.execute("SELECT DISTINCT precinct FROM voters WHERE county = 'Hidalgo' LIMIT 10")
for row in cursor.fetchall():
    print(f"  '{row[0]}'")

print("\nSample precincts from lookup table (Hidalgo County):")
cursor.execute("SELECT DISTINCT precinct FROM precinct_district_lookup WHERE county LIKE '%Hidalgo%' LIMIT 10")
for row in cursor.fetchall():
    print(f"  '{row[0]}'")

print("\nSample county names from voters:")
cursor.execute("SELECT DISTINCT county FROM voters LIMIT 10")
for row in cursor.fetchall():
    print(f"  '{row[0]}'")

print("\nSample county names from lookup:")
cursor.execute("SELECT DISTINCT county FROM precinct_district_lookup LIMIT 10")
for row in cursor.fetchall():
    print(f"  '{row[0]}'")

conn.close()
EOF

This will show us exactly what format differences exist and how to fix them.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

District Assignment Status Report

Summary

What Was Accomplished

1. Downloaded District Reference Files ✓

2. Parsed District Data ✓

3. Created Lookup System ✓

4. Updated Voter Records ✓

Current Issues

Issue 1: Low Congressional District Assignment (1.1%)

Issue 2: No State Senate/House Precinct Data

Issue 3: Voting History Column Names

Next Steps

Step 1: Fix Precinct Format Matching

Step 2: Check Actual Data Formats

Step 3: Parse State Senate/House Precincts

Step 4: Re-run Assignment

Files Created

Database State

Tables Created

Voter Statistics

Immediate Action Required

FilesExpand file tree

DISTRICT_ASSIGNMENT_STATUS.md

Latest commit

History

DISTRICT_ASSIGNMENT_STATUS.md

File metadata and controls

District Assignment Status Report

Summary

What Was Accomplished

1. Downloaded District Reference Files ✓

2. Parsed District Data ✓

3. Created Lookup System ✓

4. Updated Voter Records ✓

Current Issues

Issue 1: Low Congressional District Assignment (1.1%)

Issue 2: No State Senate/House Precinct Data

Issue 3: Voting History Column Names

Next Steps

Step 1: Fix Precinct Format Matching

Step 2: Check Actual Data Formats

Step 3: Parse State Senate/House Precincts

Step 4: Re-run Assignment

Files Created

Database State

Tables Created

Voter Statistics

Immediate Action Required