The AI-powered search feature was hanging when accessed through the web browser, returning a 500 Internal Server Error. The issue occurred specifically when:
- User was authenticated (Google OAuth)
- Request reached the `/api/llm/query` endpoint
- Logs showed "Question: ..." but never proceeded to "Converting question to SQL..."
- Direct Python tests worked perfectly
- HTTP requests through gunicorn workers hung indefinitely
The Ollama LLM calls were blocking gunicorn worker threads without any timeout mechanism. When called through HTTP/gunicorn, the synchronous ollama.generate() calls would hang, blocking the worker and preventing any response to the client.
The original code used signal-based timeouts (`signal.SIGALRM`), which:
- Only work on Unix systems
- Don't work reliably in multi-threaded environments (like gunicorn workers)
- Can interfere with other signal handlers
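To see the failure mode concretely, here is a minimal standalone demonstration (Unix-only, since `SIGALRM` does not exist on Windows). This is not the project's actual code, just an illustration of why `signal.signal()` cannot arm a timeout from a worker thread:

```python
import signal
import threading

def install_alarm_handler():
    """Try to install a SIGALRM handler, as the old timeout code did."""
    try:
        signal.signal(signal.SIGALRM, lambda signum, frame: None)
        return "ok"
    except ValueError as e:
        # signal.signal() may only be called from the main thread
        return f"failed: {e}"

# In the main thread the handler installs fine (on Unix)...
main_result = install_alarm_handler()

# ...but from a worker thread -- e.g. a gunicorn request thread -- it raises
# ValueError, so the old timeout never armed and the request hung instead.
worker_results = []
worker = threading.Thread(target=lambda: worker_results.append(install_alarm_handler()))
worker.start()
worker.join()
print(main_result)        # "ok" on Unix
print(worker_results[0])  # e.g. "failed: signal only works in main thread ..."
```

This is exactly the situation inside a gunicorn worker handling a request: the timeout silently never arms, and the blocking Ollama call hangs forever.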
Replaced signal-based timeouts with a threading approach that works on all platforms and in threaded environments:
```python
import threading


def run_with_timeout(func, args=(), kwargs=None, timeout=30):
    """Run a function with a timeout using threading."""
    kwargs = kwargs or {}  # avoid **None when no kwargs are given
    result = [None]
    exception = [None]

    def target():
        try:
            result[0] = func(*args, **kwargs)
        except Exception as e:
            exception[0] = e

    thread = threading.Thread(target=target)
    thread.daemon = True
    thread.start()
    thread.join(timeout)

    if thread.is_alive():
        raise TimeoutError(f"Operation timed out after {timeout} seconds")
    if exception[0]:
        raise exception[0]
    return result[0]
```

Each LLM operation now runs under its own timeout tier:
- SQL Generation: 30-second timeout
  - Critical operation that must complete
  - Returns an error to the user if the timeout occurs
- Explanation Generation: 15-second timeout
  - Non-critical; falls back to simple text: "Found X voter(s) matching your criteria"
- Suggestion Generation: 15-second timeout
  - Non-critical; returns an empty array on timeout
  - User can still see results without suggestions
Added comprehensive logging to track request flow:
```python
logger.info("Calling Ollama with 30s timeout...")
response = run_with_timeout(call_ollama, timeout=30)
logger.info("Ollama call completed")
```

This helps identify exactly where requests hang if issues persist.
All timeout errors are caught and handled gracefully:
```python
except TimeoutError as e:
    logger.error(f"LLM query generation timed out: {e}")
    return {
        'error': 'Query generation timed out. Ollama may be overloaded or not responding.',
        'sql': None,
        'question': question
    }
```
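On the endpoint side, a result dict carrying an `error` key can be mapped to an HTTP response. The sketch below is illustrative (the status-code choice and function name are not taken from the actual `app.py`):

```python
def to_http_response(result):
    """Map an llm_query-style result dict to an (HTTP status, body) pair.

    Illustrative only; the real app.py wiring may differ.
    """
    if result.get('error'):
        # Surface timeouts as 504 so the client can distinguish a stalled
        # upstream LLM from a generic 500 Internal Server Error.
        return 504, result
    return 200, result

status, body = to_http_response({
    'error': 'Query generation timed out. Ollama may be overloaded or not responding.',
    'sql': None,
    'question': 'Show me Female voters in TX-15',
})
print(status)  # 504
```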
- `backend/llm_query.py`
  - Added `run_with_timeout()` helper function
  - Updated `question_to_sql()` with 30s timeout
  - Updated `explain_results()` with 15s timeout
  - Updated `suggest_followups()` with 15s timeout
  - Enhanced error handling for `TimeoutError`
  - Added detailed request-flow logging
- `deploy/test_user_queries.py` (new)
  - Tests exact user queries
  - Comprehensive step-by-step testing
  - Timing measurements for each operation
- `deploy/complete_llm_fix.sh` (new)
  - Complete deployment script
  - Pulls code, clears cache, restarts services
  - Runs tests and verifies endpoint
  - Shows detailed progress
- `deploy/RUN_ON_SERVER.txt` (new)
  - Instructions for running deployment
  - Alternative manual commands
  - Troubleshooting guide
```bash
cd WhoVoted
python deploy/test_user_queries.py
```

Expected: will show Ollama not installed (normal for local dev).
```bash
ssh user@server
cd /var/www/politiquera
bash deploy/complete_llm_fix.sh
```

Expected output:
- ✓ Code updated
- ✓ Cache cleared
- ✓ Ollama verified
- ✓ Gunicorn restarted
- ✓ Tests passed
- ✓ Endpoint responding
- Go to https://politiquera.com
- Sign in with Google
- Click brain icon (🧠)
- Test queries:
  - "Show me Female voters in TX-15 who voted in 2024 but not 2026"
  - "Show me which of my neighbors are Republican"
Expected: Results within 30-45 seconds
```bash
cd /var/www/politiquera && bash deploy/complete_llm_fix.sh
```

Or step by step:

```bash
# Pull code
cd /var/www/politiquera
git pull origin main

# Clear cache
find backend -type d -name __pycache__ -exec rm -rf {} +
find backend -name "*.pyc" -delete

# Restart gunicorn
pkill -9 -f "gunicorn.*app:app"
cd backend
source venv/bin/activate
nohup gunicorn -w 4 -b 127.0.0.1:5000 --timeout 120 app:app > logs/gunicorn.log 2>&1 &

# Test
cd /var/www/politiquera
python3 deploy/test_user_queries.py
```

To monitor and verify afterwards:

```bash
tail -f /var/www/politiquera/backend/logs/error.log
ps aux | grep gunicorn
ollama list
ollama run llama3.2:latest "Say hello"
```
- Check logs for timeout messages: `grep -i "timeout" /var/www/politiquera/backend/logs/error.log`
- Verify Ollama is responsive: `time ollama list` (should complete in under 1 second)
- Test Ollama generation: `time ollama run llama3.2:latest "Say hello"` (should complete in under 5 seconds)
- Check the Ollama service: `systemctl status ollama` (if using systemd)
- Restart Ollama if needed: `systemctl restart ollama` (if using systemd), or `pkill ollama && ollama serve &`
This is actually a good sign: it means the timeout is working. The issue is now with Ollama itself:
- Check Ollama logs
- Verify the model is downloaded: `ollama list`
- Try pulling the model again: `ollama pull llama3.2:latest`
- Check system resources (CPU, RAM, disk)
- Consider using a smaller model if resources are limited
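If timeouts persist, a quick programmatic health check against Ollama's local HTTP API (default port 11434, `GET /api/tags` per Ollama's documented REST API) can rule out a dead server before blaming the model. The base URL and model name below are assumptions; adjust them to your setup:

```python
import json
import urllib.error
import urllib.request

def ollama_models(base_url="http://127.0.0.1:11434", timeout=2):
    """Return the list of installed model names, or None if Ollama is unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            data = json.load(resp)
        return [m["name"] for m in data.get("models", [])]
    except (urllib.error.URLError, OSError, ValueError):
        return None

models = ollama_models()
if models is None:
    print("Ollama is not responding -- restart it before debugging timeouts")
elif "llama3.2:latest" not in models:
    print("Model missing -- run: ollama pull llama3.2:latest")
else:
    print("Ollama is up and the model is installed")
```

A check like this could also run inside `complete_llm_fix.sh`'s verification step, failing fast instead of waiting out a 30-second request timeout.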
- ✓ No more indefinite hanging
- ✓ Timeout errors returned within 30 seconds
- ✓ User sees error message instead of infinite loading
- ✓ Worker threads don't get stuck
- ✓ Other requests continue to work even if LLM times out
- Deploy to production using `complete_llm_fix.sh`
- Test with both user queries
- Monitor logs for any timeout occurrences
- If timeouts are frequent, investigate Ollama performance
- Consider optimizing the Ollama configuration or using a smaller model
- `ed6d22c` - Fix LLM hanging issue with threading-based timeouts
- `a311b3f` - Add deployment script for LLM fix
- `e5a7ae6` - Add comprehensive test and deployment scripts
- `backend/llm_query.py` - Core LLM functionality with timeouts
- `backend/app.py` - LLM endpoint (lines 4720-4850)
- `public/search.js` - Frontend AI search interface
- `deploy/test_user_queries.py` - Test script for user queries
- `deploy/complete_llm_fix.sh` - Complete deployment script
- `deploy/RUN_ON_SERVER.txt` - Server deployment instructions