LLM Timeout Fix - Complete

Problem Summary

The AI-powered search feature was hanging when accessed through the web browser, returning a 500 Internal Server Error. The issue occurred specifically when:

User was authenticated (Google OAuth)
Request reached the /api/llm/query endpoint
Logs showed "Question: ..." but never proceeded to "Converting question to SQL..."
Direct Python tests worked perfectly
HTTP requests through gunicorn workers hung indefinitely

Root Cause

The Ollama LLM calls were blocking gunicorn workers without an effective timeout. When called through HTTP/gunicorn, the synchronous ollama.generate() calls would hang, blocking the worker and preventing any response to the client.

The original code used signal-based timeouts (signal.SIGALRM), which:

Only work on Unix systems
Don't work reliably in multi-threaded environments (like gunicorn workers), because Python only allows signal handlers to be installed from the main thread
Can interfere with other signal handlers
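
The threading limitation is easy to reproduce. The following minimal sketch (not taken from the project code) tries to install a signal handler from a worker thread. CPython only allows signal.signal() in the main thread of the main interpreter and raises ValueError elsewhere. SIGINT is used instead of SIGALRM only so the sketch also runs on Windows, and install_handler_in_thread is a hypothetical name.

```python
import signal
import threading


def install_handler_in_thread():
    """Try to register a signal handler from a non-main thread.

    Returns the exception raised by signal.signal(), or None if it succeeded.
    """
    outcome = {"error": None}

    def target():
        try:
            # Registering any handler outside the main thread is rejected.
            signal.signal(signal.SIGINT, lambda signum, frame: None)
        except ValueError as exc:
            outcome["error"] = exc

    worker = threading.Thread(target=target)
    worker.start()
    worker.join()
    return outcome["error"]


if __name__ == "__main__":
    print(repr(install_handler_in_thread()))
```

Any timeout helper that calls signal.signal() or signal.alarm() inside a request handler running off the main thread would fail in the same way.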

Solution Implemented

1. Threading-Based Timeouts

Replaced signal-based timeouts with a threading approach that works on all platforms and in threaded environments:

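import threading

def run_with_timeout(func, args=(), kwargs=None, timeout=30):
    """Run func(*args, **kwargs) in a daemon thread, waiting at most `timeout` seconds."""
    kwargs = kwargs or {}  # the default None cannot be unpacked with **
    result = [None]
    exception = [None]

    def target():
        # Capture the outcome so the calling thread can return or re-raise it.
        try:
            result[0] = func(*args, **kwargs)
        except Exception as e:
            exception[0] = e

    thread = threading.Thread(target=target)
    thread.daemon = True  # do not keep the process alive for a hung call
    thread.start()
    thread.join(timeout)

    if thread.is_alive():
        # The thread cannot be killed; it keeps running in the background.
        raise TimeoutError(f"Operation timed out after {timeout} seconds")

    if exception[0]:
        raise exception[0]

    return result[0]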
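
For comparison, the same idea can be sketched with the standard library's concurrent.futures instead of a hand-managed thread. This is an alternative illustration, not the code in backend/llm_query.py; run_with_timeout_pool and slow_echo are hypothetical names.

```python
import concurrent.futures
import time


def run_with_timeout_pool(func, args=(), kwargs=None, timeout=30):
    """Hypothetical variant of run_with_timeout() built on ThreadPoolExecutor."""
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = executor.submit(func, *args, **(kwargs or {}))
    try:
        # result() re-raises any exception raised inside func.
        return future.result(timeout=timeout)
    except concurrent.futures.TimeoutError:
        # Normalize to the built-in TimeoutError on all Python versions.
        raise TimeoutError(f"Operation timed out after {timeout} seconds") from None
    finally:
        # Do not wait for a still-running call; it finishes in the background.
        executor.shutdown(wait=False)


def slow_echo(value, delay):
    """Stand-in for a slow LLM call."""
    time.sleep(delay)
    return value


if __name__ == "__main__":
    print(run_with_timeout_pool(slow_echo, args=("ok", 0.01), timeout=1))
    try:
        run_with_timeout_pool(slow_echo, args=("late", 0.5), timeout=0.05)
    except TimeoutError as exc:
        print(f"timed out: {exc}")
```

Like the threading version, this cannot cancel a call that is already running. One difference to weigh: on Python 3.9 and later, pool threads are not daemon threads and are joined at interpreter exit, so a hung call can delay process shutdown.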

2. Timeout Configuration

SQL Generation: 30-second timeout
- Critical operation that must complete
- Returns error to user if timeout occurs
Explanation Generation: 15-second timeout
- Non-critical, falls back to simple text such as "Found X voter(s) matching your criteria"
Suggestion Generation: 15-second timeout
- Non-critical, returns empty array if timeout
- User can still see results without suggestions
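
The split between critical and non-critical operations can be sketched as follows. This is an illustration under assumptions rather than the project's code: call_or_fallback, explain_or_default, and suggest_or_empty are hypothetical helpers, and the LLM calls are passed in as plain callables.

```python
import threading
import time

# Timeouts in seconds, taken from the list above.
SQL_TIMEOUT = 30          # critical: a timeout is reported to the user as an error
EXPLANATION_TIMEOUT = 15  # non-critical: fall back to simple text
SUGGESTION_TIMEOUT = 15   # non-critical: fall back to an empty list

_MISSING = object()


def call_or_fallback(func, timeout, fallback):
    """Run func() in a daemon thread; return `fallback` on timeout or error."""
    box = [_MISSING]

    def target():
        try:
            box[0] = func()
        except Exception:
            pass  # leave the sentinel in place so the fallback is used

    worker = threading.Thread(target=target, daemon=True)
    worker.start()
    worker.join(timeout)
    value = box[0]
    return fallback if value is _MISSING else value


def explain_or_default(explain, row_count, timeout=EXPLANATION_TIMEOUT):
    """Return the LLM explanation, or the simple fallback text."""
    fallback = f"Found {row_count} voter(s) matching your criteria"
    return call_or_fallback(explain, timeout, fallback)


def suggest_or_empty(suggest, timeout=SUGGESTION_TIMEOUT):
    """Return the LLM suggestions, or an empty list."""
    return call_or_fallback(suggest, timeout, [])


if __name__ == "__main__":
    def hung_llm():
        time.sleep(0.5)  # stand-in for a stalled model call
        return "unused"

    print(explain_or_default(hung_llm, row_count=3, timeout=0.05))
    print(suggest_or_empty(lambda: ["Filter by county"], timeout=1))
```

Only SQL generation surfaces a timeout to the user. The other two steps degrade quietly, so results are still shown.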

3. Enhanced Logging

Added comprehensive logging to track request flow:

logger.info("Calling Ollama with 30s timeout...")
response = run_with_timeout(call_ollama, timeout=30)
logger.info("Ollama call completed")

This helps identify exactly where requests hang if issues persist.
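
One way to make this kind of logging harder to forget is a small context manager that logs the start, end, and duration of each step. The sketch below is an assumption about how it could look; log_step and the logger name "llm_query" are hypothetical.

```python
import logging
import time
from contextlib import contextmanager

logger = logging.getLogger("llm_query")  # logger name is an assumption


@contextmanager
def log_step(name):
    """Log when a step starts and how long it took to finish or fail."""
    start = time.monotonic()
    logger.info("%s: started", name)
    try:
        yield
    except Exception:
        logger.exception("%s: failed after %.2fs", name, time.monotonic() - start)
        raise
    else:
        logger.info("%s: completed in %.2fs", name, time.monotonic() - start)


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    with log_step("Converting question to SQL"):
        time.sleep(0.1)  # stand-in for the timeout-wrapped Ollama call
```

A "started" line without a matching "completed" or "failed" line points at the step where a request stalled.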

4. Graceful Error Handling

All timeout errors are caught and handled gracefully:

except TimeoutError as e:
    logger.error(f"LLM query generation timed out: {e}")
    return {
        'error': 'Query generation timed out. Ollama may be overloaded or not responding.',
        'sql': None,
        'question': question
    }
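
For context, the handler above might sit in a wrapper like the one below. This is a hedged sketch: question_to_sql_safe and generate_sql are hypothetical names, generate_sql stands in for the timeout-wrapped LLM call, and the shape of the success payload is an assumption.

```python
import logging

logger = logging.getLogger(__name__)


def question_to_sql_safe(question, generate_sql):
    """Call generate_sql(question) and turn a TimeoutError into an error payload."""
    try:
        sql = generate_sql(question)
    except TimeoutError as e:
        logger.error("LLM query generation timed out: %s", e)
        return {
            'error': 'Query generation timed out. Ollama may be overloaded or not responding.',
            'sql': None,
            'question': question,
        }
    # Assumed success shape: same keys, with no error set.
    return {'error': None, 'sql': sql, 'question': question}


if __name__ == "__main__":
    def always_times_out(question):
        raise TimeoutError("Operation timed out after 30 seconds")

    print(question_to_sql_safe("Show me voters in TX-15", always_times_out))
```

Returning a payload with the same keys in both cases lets the endpoint and the frontend check a single 'error' field instead of handling exceptions.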

Files Modified

backend/llm_query.py
- Added run_with_timeout() helper function
- Updated question_to_sql() with 30s timeout
- Updated explain_results() with 15s timeout
- Updated suggest_followups() with 15s timeout
- Enhanced error handling for TimeoutError
deploy/test_user_queries.py (new)
- Tests exact user queries
- Comprehensive step-by-step testing
- Timing measurements for each operation
deploy/complete_llm_fix.sh (new)
- Complete deployment script
- Pulls code, clears cache, restarts services
- Runs tests and verifies endpoint
- Shows detailed progress
deploy/RUN_ON_SERVER.txt (new)
- Instructions for running deployment
- Alternative manual commands
- Troubleshooting guide

Testing

Local Testing (without Ollama)

cd WhoVoted
python deploy/test_user_queries.py

Expected: Will show Ollama not installed (normal for local dev)

Server Testing

ssh user@server
cd /var/www/politiquera
bash deploy/complete_llm_fix.sh

Expected output:

✓ Code updated
✓ Cache cleared
✓ Ollama verified
✓ Gunicorn restarted
✓ Tests passed
✓ Endpoint responding

Browser Testing

Go to https://politiquera.com
Sign in with Google
Click brain icon (🧠)
Test queries:
- "Show me Female voters in TX-15 who voted in 2024 but not 2026"
- "Show me which of my neighbors are Republican"

Expected: Results within 30-45 seconds

Deployment Instructions

Quick Deployment

cd /var/www/politiquera && bash deploy/complete_llm_fix.sh

Manual Deployment

# Pull code
cd /var/www/politiquera
git pull origin main

# Clear cache
find backend -type d -name __pycache__ -exec rm -rf {} +
find backend -name "*.pyc" -delete

# Restart gunicorn
pkill -9 -f "gunicorn.*app:app"
cd backend
source venv/bin/activate
nohup gunicorn -w 4 -b 127.0.0.1:5000 --timeout 120 app:app > logs/gunicorn.log 2>&1 &

# Test
cd /var/www/politiquera
python3 deploy/test_user_queries.py

Monitoring

Watch Logs in Real-Time

tail -f /var/www/politiquera/backend/logs/error.log

Check Gunicorn Status

ps aux | grep gunicorn

Check Ollama Status

ollama list

Test Ollama Directly

ollama run llama3.2:latest "Say hello"
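
The same check can be scripted from Python. The sketch below assumes Ollama is listening on its default local address (port 11434) and uses its /api/tags endpoint, which lists the installed models. ollama_is_up is a hypothetical helper, not part of the project.

```python
import urllib.error
import urllib.request


def ollama_is_up(base_url="http://127.0.0.1:11434", timeout=2.0):
    """Return True if GET {base_url}/api/tags answers with HTTP 200 in time."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, HTTP error status, or a malformed URL.
        return False


if __name__ == "__main__":
    print("Ollama reachable:", ollama_is_up())
```

A True result only shows that the server answers. It does not prove that generation with a specific model is fast enough, so the ollama run check above is still worth doing.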

Troubleshooting

If Still Hanging

Check logs for timeout messages

grep -i "timeout" /var/www/politiquera/backend/logs/error.log

Verify Ollama is responsive

```
time ollama list
```

Should complete in < 1 second

Test Ollama generation

```
time ollama run llama3.2:latest "Say hello"
```

Should complete in < 5 seconds

Check Ollama service

systemctl status ollama  # if using systemd

Restart Ollama if needed

systemctl restart ollama  # if using systemd
# or
pkill ollama && ollama serve &

If Getting Timeout Errors

This is actually good - it means the timeout is working! The issue is now with Ollama itself:

Check Ollama logs
Verify model is downloaded: ollama list
Try pulling model again: ollama pull llama3.2:latest
Check system resources (CPU, RAM, disk)
Consider using a smaller model if resources are limited

Success Criteria

✓ No more indefinite hanging
✓ Timeout errors returned within 30 seconds
✓ User sees error message instead of infinite loading
✓ Gunicorn workers are released once the timeout fires (the abandoned LLM call finishes in a background daemon thread)
✓ Other requests continue to work even if LLM times out

Next Steps

Deploy to production using complete_llm_fix.sh
Test with both user queries
Monitor logs for any timeout occurrences
If timeouts are frequent, investigate Ollama performance
Consider optimizing Ollama configuration or using smaller model

Commit History

ed6d22c - Fix LLM hanging issue with threading-based timeouts
a311b3f - Add deployment script for LLM fix
e5a7ae6 - Add comprehensive test and deployment scripts

Related Files

backend/llm_query.py - Core LLM functionality with timeouts
backend/app.py - LLM endpoint (lines 4720-4850)
public/search.js - Frontend AI search interface
deploy/test_user_queries.py - Test script for user queries
deploy/complete_llm_fix.sh - Complete deployment script
deploy/RUN_ON_SERVER.txt - Server deployment instructions
