RUNBOOK.md

Quick reference for common issues, fixes, and troubleshooting procedures for the Facebook Messenger AI Bot.

This runbook should be updated whenever new issues are discovered or procedures change. Keep it synchronized with the codebase and operational reality. See AGENTS.md and TESTING.md for instructions on when and how to update this runbook.

Common Issues & Fixes

Issue: Agent returns off-topic responses

Symptoms:

Agent responses are not based on the reference document
Agent provides information outside the knowledge base
High escalation rate due to out-of-scope queries

Diagnosis:

Check agent confidence scores in Logfire logs:

# If using Logfire cloud: View logs in Logfire dashboard
# If local: Check structured logs for confidence scores
grep -i "confidence" logs/*.log | grep -E "confidence=[0-9.]+"

Review agent service logs for low confidence patterns:

grep "low confidence\|escalat\|requires_escalation" logs/*.log

Check reference document is properly loaded:

uv run python -m src.cli.setup_cli verify  # If CLI has verify command

Fix:

Review reference document content for completeness
Update reference document if source website changed:
- Re-run CLI setup: uv run python -m src.cli.setup_cli setup
- Or manually update via Supabase dashboard
Verify agent system prompt in src/services/agent_service.py line 49
Adjust confidence threshold (currently 0.7) in agent logic if needed

Test with evaluation set:

uv run pytest tests/e2e/test_webhook_message_flow.py -v

Issue: Copilot SDK unavailable / fallback to OpenAI

Symptoms:

Health check returns non-200 status
Response latency increases significantly
Fallback to OpenAI is being used frequently

Diagnosis:

Check Copilot SDK availability:

curl -X GET http://localhost:5909/health -v

Check if Copilot CLI is running:
```
ps aux | grep copilot
```

Review Logfire logs for health check failures:

# Search for Copilot health check logs
grep -i "copilot.*health\|copilot.*available\|copilot.*unavailable" logs/*.log
# Or check Logfire dashboard for structured logs with correlation IDs

Check Copilot service logs in src/services/copilot_service.py initialization

Fix:

Verify GitHub Copilot CLI is installed:
```
github-copilot --version
```
Start Copilot CLI if not running:
```
github-copilot-cli start
```
Check COPILOT_CLI_HOST environment variable:
```
echo $COPILOT_CLI_HOST
```
Verify port 5909 is not blocked:
```
lsof -i :5909
```
Restart the FastAPI application if health check still fails
Monitor fallback rate - if > 5%, investigate Copilot CLI stability
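
To put a number on the fallback rate, the log lines can be counted with a short script. This is a minimal sketch under assumptions: it reuses the grep pattern for fallback events from this runbook and treats each "Agent response generated" line as one handled message. The function name `fallback_rate` is hypothetical, and the actual log wording in your deployment may differ.

```python
import re
from typing import Iterable

# Patterns borrowed from the grep commands in this runbook; the exact log
# wording is an assumption and may differ in your deployment.
FALLBACK_RE = re.compile(r"fallback.*openai|using.*openai.*fallback", re.IGNORECASE)
RESPONSE_RE = re.compile(r"agent response generated", re.IGNORECASE)


def fallback_rate(lines: Iterable[str]) -> float:
    """Return fallback events divided by generated responses (0.0 if none)."""
    fallbacks = 0
    responses = 0
    for line in lines:
        if FALLBACK_RE.search(line):
            fallbacks += 1
        if RESPONSE_RE.search(line):
            responses += 1
    return fallbacks / responses if responses else 0.0
```

Feeding it the lines of the files under logs/ gives a value that can be compared directly with the 5% threshold (0.05).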

Issue: Facebook API errors / rate limiting

Symptoms:

Messages not being sent to users
HTTP 429 (Too Many Requests) errors
HTTP 401 (Unauthorized) errors
HTTP 403 (Forbidden) errors

Diagnosis:

Check Facebook API response in logs:

grep "facebook_service\|send_message" logs/*.log | grep -i "error\|429\|401\|403"

Verify Facebook page access token:

echo $FACEBOOK_PAGE_ACCESS_TOKEN | wc -c  # Should be ~100+ characters

Check token expiration with Facebook Graph API:

curl "https://graph.facebook.com/v18.0/debug_token?input_token=$FACEBOOK_PAGE_ACCESS_TOKEN&access_token=$FACEBOOK_PAGE_ACCESS_TOKEN"

Check rate limit status:
```
grep "rate_limit\|429" logs/*.log
```

Fix:

For 401/403 errors (invalid or expired token):
- Generate new page access token from Facebook App Dashboard
- Update FACEBOOK_PAGE_ACCESS_TOKEN in Railway environment
- Restart application
For 429 errors (rate limiting):
- Implement exponential backoff (already done in src/services/facebook_service.py)
- Reduce concurrent message sends if applicable
- Check if another process is sending messages on same page
- Contact Facebook support if limit is too restrictive
For other HTTP errors:
- Check Facebook API documentation for error codes
- Verify page ID and recipient IDs are correct
- Check message content (too long, invalid format)

Issue: Database connection failures

Symptoms:

"Failed to connect to database" errors
Queries timing out
Connection pool exhausted errors
500 Internal Server Error on webhook requests

Diagnosis:

Check Supabase connection status:

# Assumes SUPABASE_URL includes the scheme, e.g. https://...
curl -X POST "$SUPABASE_URL/rest/v1/rpc/health_check" \
  -H "apikey: $SUPABASE_SERVICE_KEY"

Verify Supabase credentials:

echo "URL: $SUPABASE_URL"
echo "KEY length: ${#SUPABASE_SERVICE_KEY}"

Check database connection logs:

grep "database\|connection\|timeout" logs/*.log | grep -i error

Verify network connectivity:
```
ping <supabase-host>  # hostname only, without https://
```

Fix:

Verify SUPABASE_URL and SUPABASE_SERVICE_KEY in environment:
```
# In Railway dashboard: Settings → Environment
```
Check Supabase project status in Supabase dashboard
Restart application to refresh connection pool:
```
# In Railway: Deployment → Restart
```
Check if connection pool is exhausted:
- Review repository.py connection pooling settings
- May need to increase pool size if high concurrency
Check for long-running queries blocking connections
Review Supabase logs in Supabase dashboard for errors

Issue: Low confidence scores / high escalation rates

Symptoms:

Escalation rate > 20% (alert threshold)
Agent confidence consistently < 0.7
Many messages being routed to human review

Diagnosis:

Count escalated messages in logs (divide by the total number of handled messages to get the rate):

grep "requires_escalation" logs/*.log | grep true | wc -l

Review escalation reasons:
```
grep "escalation_reason" logs/*.log
```

Analyze confidence score distribution:

grep -ohE "confidence=[0-9.]+" logs/*.log | cut -d= -f2 | sort -n | tail -20

Check if reference document is empty or corrupted:

# Via Supabase: SELECT * FROM reference_documents LIMIT 1;
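
The grep pipelines above can be combined into one summary. The sketch below assumes key=value log fields such as `confidence=0.82` and `requires_escalation=true`, which is what the grep patterns in this runbook suggest. The function name `summarize` and the shape of the returned dictionary are hypothetical.

```python
import re
from statistics import mean
from typing import Iterable

# Assumed log field formats, inferred from the grep patterns in this runbook.
CONFIDENCE_RE = re.compile(r"confidence=([0-9]*\.?[0-9]+)")
ESCALATION_RE = re.compile(r"requires_escalation=(true|false)", re.IGNORECASE)


def summarize(lines: Iterable[str], threshold: float = 0.7) -> dict:
    """Summarize escalation rate and confidence scores found in log lines."""
    scores = []
    escalated = 0
    decisions = 0
    for line in lines:
        score_match = CONFIDENCE_RE.search(line)
        if score_match:
            scores.append(float(score_match.group(1)))
        decision_match = ESCALATION_RE.search(line)
        if decision_match:
            decisions += 1
            if decision_match.group(1).lower() == "true":
                escalated += 1
    return {
        # Share of logged decisions that were escalated.
        "escalation_rate": escalated / decisions if decisions else 0.0,
        # Average confidence, or None when no scores were logged.
        "mean_confidence": mean(scores) if scores else None,
        # Number of scores below the confidence threshold.
        "below_threshold": sum(1 for s in scores if s < threshold),
    }
```

An `escalation_rate` above 0.20 corresponds to the alert threshold listed in the symptoms.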

Fix:

If reference document is incomplete:
- Run setup again: uv run python -m src.cli.setup_cli setup
- Verify website is still accessible and content is relevant
- Check for website structure changes
If confidence threshold is too strict:
- Review current threshold (0.7) in src/services/agent_service.py line 57
- Consider lowering if threshold is unreasonably high
- Document any changes to RUNBOOK.md
If question types are out of scope:
- Update agent system prompt with new use cases
- Add examples to the agent context
- Review GUARDRAILS.md for escalation rules
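
For orientation, the threshold check can be pictured roughly as follows. This is an illustrative sketch, not the code in src/services/agent_service.py: `AgentReply`, `needs_escalation`, and `CONFIDENCE_THRESHOLD` are hypothetical names, and only the 0.7 value comes from this runbook. It also assumes a reply exactly at the threshold is not escalated.

```python
from dataclasses import dataclass

# Value taken from this runbook; the real setting lives in agent_service.py.
CONFIDENCE_THRESHOLD = 0.7


@dataclass
class AgentReply:
    """Hypothetical stand-in for the agent's structured response."""

    text: str
    confidence: float


def needs_escalation(reply: AgentReply, threshold: float = CONFIDENCE_THRESHOLD) -> bool:
    """Return True when the reply's confidence falls below the threshold."""
    return reply.confidence < threshold
```

Lowering the threshold reduces escalations but lets less certain answers reach users, so any change should be checked against the evaluation tests.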

Monitor after changes:

uv run pytest tests/integration/test_agent_integration.py -v

Issue: Message processing failures

Symptoms:

Webhook returns 200 but message not processed
Agent service times out
Message appears in logs but no response sent

Diagnosis:

Check webhook processing logs:

grep "POST /webhook\|process_message" logs/*.log

Verify message payload format:

grep "MessengerWebhookPayload" logs/*.log | grep -i error

Check for async task completion:

grep "background_task\|BackgroundTasks" logs/*.log

Check agent service performance:

grep "Agent response latency\|timeout" logs/*.log

Fix:

Verify webhook payload is valid JSON:
- Check Facebook webhook documentation
- Test with manual webhook call: uv run pytest tests/e2e/test_webhook_verification.py
Check agent service latency:
- If > 2s, investigate Copilot SDK response time
- May need to increase timeouts in config
Verify background tasks are completing:
- Check FastAPI app state: app.state.background_tasks
- Ensure no unhandled exceptions in task
Review message processing flow in src/api/webhook.py
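
When checking whether a payload is well formed, it can help to extract the sender and text by hand. The sketch below walks the `object` / `entry` / `messaging` structure used in the local test payload in this runbook. The function name `iter_text_messages` is hypothetical, and the project's own `MessengerWebhookPayload` model may validate more fields.

```python
from typing import Any, Dict, Iterator, Tuple


def iter_text_messages(payload: Dict[str, Any]) -> Iterator[Tuple[str, str]]:
    """Yield (sender_id, text) pairs from a Messenger webhook payload."""
    # Only page subscriptions carry Messenger events.
    if payload.get("object") != "page":
        return
    for entry in payload.get("entry", []):
        for event in entry.get("messaging", []):
            sender_id = event.get("sender", {}).get("id")
            text = event.get("message", {}).get("text")
            # Skip events without text, such as postbacks or attachments.
            if sender_id and text:
                yield sender_id, text
```

An empty result for a payload that should contain a message points at a format problem rather than an agent or database failure.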

Issue: Webhook verification fails

Symptoms:

Facebook webhook setup fails during verification
403 Forbidden on GET /webhook
Facebook shows "Webhook URL couldn't be validated"

Diagnosis:

Check verify token:
```
echo $FACEBOOK_VERIFY_TOKEN
```

Check webhook endpoint is responding:

curl -X GET "http://localhost:8000/webhook?hub.mode=subscribe&hub.challenge=test&hub.verify_token=$FACEBOOK_VERIFY_TOKEN"

Check webhook logs:

grep "verify_webhook\|webhook verification" logs/*.log

Fix:

Verify FACEBOOK_VERIFY_TOKEN matches in all three places:
- .env file locally
- Railway environment variables
- Facebook App Webhook Settings
Ensure webhook URL is publicly accessible:
- Railway URL should be: https://<railway-url>/webhook
- Test with curl from another machine
Check webhook endpoint code in src/api/webhook.py:
- Verify verification logic is correct
- Ensure token comparison is exact (case-sensitive)
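
As a reference point when reviewing the verification logic, the handshake can be sketched as a pure function. Facebook sends `hub.mode`, `hub.verify_token`, and `hub.challenge` on the GET request and expects the challenge echoed back. The function name `verify_webhook` and its return convention are assumptions, not the code in src/api/webhook.py.

```python
import hmac
from typing import Optional


def verify_webhook(
    mode: Optional[str],
    token: Optional[str],
    challenge: Optional[str],
    expected_token: str,
) -> Optional[str]:
    """Return the challenge to echo back, or None if verification fails."""
    if mode != "subscribe" or token is None or challenge is None:
        return None
    # compare_digest is an exact, case-sensitive, constant-time comparison.
    if not hmac.compare_digest(token.encode(), expected_token.encode()):
        return None
    return challenge
```

A handler built on this would respond 200 with the challenge as the body when a string is returned, and 403 when the result is None.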

Test webhook verification:

uv run pytest tests/e2e/test_webhook_verification.py -v

Debug Commands

Health Checks

# Check application health
curl http://localhost:8000/health

# Check Copilot SDK health
curl http://localhost:5909/health

# Check Supabase connection
curl -X POST https://<supabase-url>/rest/v1/rpc/health_check \
  -H "apikey: $SUPABASE_SERVICE_KEY"

# Check Facebook API connectivity
curl "https://graph.facebook.com/v18.0/me?access_token=$FACEBOOK_PAGE_ACCESS_TOKEN"

Viewing Logs

Local Development (Console Logs):

# View application logs with debug level
LOG_LEVEL=DEBUG uv run uvicorn src.main:app --reload

# Search logs for errors
grep -i error logs/*.log

# Search logs for specific service
grep "CopilotService\|FacebookService\|AgentService" logs/*.log

# Filter by log level
grep "\[ERROR\]\|\[WARNING\]" logs/*.log

# Real-time log monitoring
tail -f logs/*.log

Logfire Structured Logs:

# Logfire automatically instruments FastAPI, Pydantic, and services
# All logs are structured with correlation IDs for request tracing

# Search for correlation IDs to trace a request
grep "correlation_id" logs/*.log

# View request/response traces
grep "POST /webhook\|GET /health" logs/*.log

# View agent execution traces
grep "Processing agent response\|Agent response generated" logs/*.log

# View database operation timing
grep "database_query\|query_duration" logs/*.log

# If using Logfire cloud (with LOGFIRE_TOKEN set):
# - View logs in Logfire dashboard
# - Use correlation IDs to trace complete request flows
# - Filter by service, log level, or time range

Database Queries

# Pull the remote database schema (via Supabase CLI); this does not open a query session
supabase db pull

# Query bot configurations
# Via Supabase dashboard: SELECT * FROM bot_configurations;

# Query recent messages
# Via Supabase dashboard: SELECT * FROM message_history ORDER BY created_at DESC LIMIT 100;

# Query reference documents
# Via Supabase dashboard: SELECT id, bot_config_id, LENGTH(content) as content_size FROM reference_documents;

Test REPL conversation persistence

Test conversations started from the "Test the bot" step (in-flow) or from `uv run python -m src.cli.setup_cli test` are stored in Supabase.

Tables: test_sessions (one per REPL run: reference_doc_id, source_url, tone) and test_messages (each user/bot exchange).
Session ID: When a test REPL starts, the CLI prints Session ID: <uuid> — view in Supabase: test_sessions / test_messages. Use that UUID to filter.
View in Supabase: Open Table Editor (or SQL) → test_sessions for config, test_messages for history. Filter test_messages by test_session_id = the echoed session ID to see the current run.
If Supabase is unavailable during a test run, the CLI warns and the REPL continues without persisting.

Agent Service Debugging

# Run agent with debug output
LOG_LEVEL=DEBUG uv run uvicorn src.main:app --reload

# Test agent with sample message
uv run python -c "
from src.services.agent_service import MessengerAgentService
from src.services.copilot_service import CopilotService
from src.models.agent_models import AgentContext
import asyncio

async def test():
    copilot = CopilotService('http://localhost:5909')
    agent = MessengerAgentService(copilot)
    context = AgentContext(
        bot_config_id='test-123',
        reference_doc='# Test\nThis is a test document.',
        tone='professional',
        recent_messages=[]
    )
    response = await agent.respond(context, 'What is this about?')
    print(f'Response: {response}')

asyncio.run(test())
"

# Run evaluation tests
uv run pytest tests/unit/test_agent_service.py -v --hypothesis-show-statistics

# View agent execution traces in Logfire
grep -i "processing agent response\|agent response generated\|confidence\|escalation" logs/*.log

# Trace a specific request by correlation ID
# (Get correlation_id from webhook logs, then search)
grep "correlation_id=<id>" logs/*.log

Copilot SDK Debugging

# Check Copilot SDK status
uv run python -c "
from src.services.copilot_service import CopilotService
import asyncio

async def check():
    copilot = CopilotService('http://localhost:5909')
    available = await copilot.is_available()
    print(f'Copilot available: {available}')

asyncio.run(check())
"

# Test Copilot fallback to OpenAI
COPILOT_ENABLED=false uv run uvicorn src.main:app

# Monitor Copilot response times in Logfire logs
grep -i "copilot.*response\|copilot.*timing\|copilot.*duration" logs/*.log

# Check for fallback events
grep -i "fallback.*openai\|using.*openai.*fallback" logs/*.log

# View Copilot health check logs
grep -i "copilot.*health\|copilot.*available" logs/*.log

Facebook API Testing

# Test webhook by sending test message
uv run pytest tests/e2e/test_webhook_message_flow.py -v

# Verify page token is valid
curl "https://graph.facebook.com/v18.0/debug_token?input_token=$FACEBOOK_PAGE_ACCESS_TOKEN&access_token=$FACEBOOK_PAGE_ACCESS_TOKEN"

# Send test message to Facebook
# The token goes in the query string; a second -d would be appended to the JSON body and corrupt it
curl -X POST "https://graph.facebook.com/v18.0/me/messages?access_token=$FACEBOOK_PAGE_ACCESS_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "recipient": {"id": "<user_id>"},
    "message": {"text": "Test message"}
  }'

Testing & Coverage

# Run all tests
uv run pytest

# Run with coverage report
uv run pytest --cov=src --cov-report=term-missing

# Run specific test category
uv run pytest tests/unit/ -v
uv run pytest tests/integration/ -v
uv run pytest tests/e2e/ -v

# Run Hypothesis tests with statistics
uv run pytest tests/unit/test_hypothesis.py -v --hypothesis-show-statistics

# Run stateful tests
uv run pytest tests/stateful/ -v

Alert Thresholds

| Alert | Threshold | Action | Monitoring |
| --- | --- | --- | --- |
| Response Latency (p95) | > 2 seconds | Investigate Copilot SDK or database | Check Logfire traces for `agent_response_latency_seconds` or request timing |
| Error Rate | > 2% for 5 min | Page on-call engineer | Monitor HTTP 5xx errors in Logfire logs or Sentry |
| Escalation Rate | > 20% | Review agent prompt & reference doc | Count `requires_escalation=true` in Logfire logs |
| Copilot SDK Availability | < 99% uptime | Restart Copilot CLI or fallback to OpenAI | Monitor health check frequency in Logfire logs |
| Copilot Fallback Rate | > 5% | Investigate Copilot SDK stability | Count OpenAI fallback events in Logfire logs |
| Facebook API Error Rate | > 10% (any HTTP error) | Check token/rate limits | Monitor HTTP errors in Logfire `facebook_service` logs |
| Database Connection Failures | > 3 consecutive | Restart app or check Supabase | Monitor connection timeouts in Logfire logs |
| Message Processing Timeout | > 30 seconds | Investigate bottleneck (Copilot, DB, FB API) | Check task processing latency in Logfire traces |
| Logfire Logging Failures | Any | Check Logfire configuration | Monitor for missing correlation IDs or structured log format issues |
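
The rate and latency thresholds listed above can be checked programmatically once the metrics are collected. The following is a sketch under assumptions: the metric names are hypothetical, and only the numeric limits are copied from the alert thresholds.

```python
from typing import Dict, List

# Limits copied from the alert thresholds above; the metric names are hypothetical.
THRESHOLDS: Dict[str, float] = {
    "p95_latency_seconds": 2.0,
    "error_rate": 0.02,
    "escalation_rate": 0.20,
    "copilot_fallback_rate": 0.05,
    "facebook_error_rate": 0.10,
}


def breached(metrics: Dict[str, float]) -> List[str]:
    """Return the sorted names of metrics that exceed their threshold."""
    return sorted(
        name
        for name, limit in THRESHOLDS.items()
        # Metrics that were not reported are treated as healthy.
        if metrics.get(name, 0.0) > limit
    )
```

Treating missing metrics as healthy keeps the check quiet when a collector is down, so the "Logfire Logging Failures" alert still needs separate monitoring.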

Service-Specific Troubleshooting

CopilotService (`src/services/copilot_service.py`)

Health Check:

curl http://localhost:5909/health

Common Issues:

Service unavailable: Check Copilot CLI is running, port 5909 is accessible
Timeout errors: Increase timeout (default 2 seconds) in is_available() method
Fallback behavior: Automatically uses OpenAI when unavailable, logged with logfire.info()

Fallback Logic:

If is_available() returns False, agent uses OpenAI instead of Copilot
Fallback is logged with structured Logfire logs, monitor with: grep -i "fallback.*openai\|using.*openai.*fallback" logs/*.log
Logfire traces include timing, health check status, and fallback events with correlation IDs
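
The fallback decision described above amounts to one availability check before each request. The sketch below uses stub classes to show the control flow. Only `is_available()` appears in this runbook; `complete()`, `StubCopilot`, `StubOpenAI`, and `generate` are hypothetical names for illustration.

```python
class StubCopilot:
    """Hypothetical stand-in for CopilotService."""

    def __init__(self, available: bool) -> None:
        self._available = available

    async def is_available(self) -> bool:
        return self._available

    async def complete(self, prompt: str) -> str:
        return f"copilot: {prompt}"


class StubOpenAI:
    """Hypothetical stand-in for the OpenAI fallback client."""

    async def complete(self, prompt: str) -> str:
        return f"openai: {prompt}"


async def generate(prompt: str, copilot: StubCopilot, openai: StubOpenAI) -> str:
    """Use Copilot when its health check passes, otherwise fall back to OpenAI."""
    if await copilot.is_available():
        return await copilot.complete(prompt)
    # In the real service this is where the fallback event would be logged.
    return await openai.complete(prompt)
```

Running `asyncio.run(generate("hi", StubCopilot(False), StubOpenAI()))` exercises the fallback path without a running Copilot CLI.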

Logfire Logging:

Health check timing and availability status logged
API call success/failure with response times
Fallback events include context (reason, timing, response)
All logs include correlation IDs for request tracing

MessengerAgentService (`src/services/agent_service.py`)

Common Issues:

Low confidence scores: Reference document incomplete or question out of scope
Off-topic responses: Agent system prompt needs examples of in-scope queries
Response too long: Responses are truncated to 300 characters (the bot's own limit; the Messenger Send API accepts up to 2000 characters of text)
Escalation loops: Check escalation reason in logs
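
To see what the 300-character truncation does to a long answer, a helper along these lines can be used. This is a sketch, not the service's implementation: the function name `truncate_response`, the trailing ellipsis, and the choice to break at a word boundary are all assumptions.

```python
MAX_RESPONSE_CHARS = 300  # limit stated in this runbook


def truncate_response(text: str, limit: int = MAX_RESPONSE_CHARS, ellipsis: str = "...") -> str:
    """Shorten text to at most `limit` characters, ending with an ellipsis."""
    if len(text) <= limit:
        return text
    # Reserve room for the ellipsis so the result never exceeds the limit.
    cut = text[: limit - len(ellipsis)]
    # Prefer to break at the last space so a word is not split in half.
    head, sep, _ = cut.rpartition(" ")
    if sep and head:
        cut = head
    return cut.rstrip() + ellipsis
```

If users report answers that stop mid-thought, comparing the raw agent output with the truncated version shows whether this limit is the cause.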

Debugging:

Add debug output: LOG_LEVEL=DEBUG in environment
Review agent logs: grep -i "messenger.*agent\|processing agent response\|agent response generated" logs/*.log
Check confidence threshold: Currently 0.7 in respond() method
View structured Logfire logs for confidence scores, escalation decisions, and timing
Trace complete request flow using correlation IDs from Logfire logs

FacebookService (`src/services/facebook_service.py`)

Common Issues:

401 Unauthorized: Token expired or invalid
429 Rate Limited: Too many requests in short time
400 Bad Request: Message format invalid or recipient ID wrong

Debugging:

# Check token validity
curl "https://graph.facebook.com/v18.0/me?access_token=$FACEBOOK_PAGE_ACCESS_TOKEN"

# Test message send
curl -X POST "https://graph.facebook.com/v18.0/me/messages?access_token=$FACEBOOK_PAGE_ACCESS_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"recipient":{"id":"<user_id>"},"message":{"text":"test"}}'

Retry Logic:

Implements exponential backoff (max 3 retries)
Check retry logs: grep "retry\|backoff" logs/*.log
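
For reference when reading the retry logs, exponential backoff with three retries can be sketched as below. The 1-second base delay, the `RateLimitedError` exception, and the function names are assumptions for illustration; the actual values are in src/services/facebook_service.py.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


class RateLimitedError(Exception):
    """Hypothetical error raised when the API answers HTTP 429."""


def backoff_delays(max_retries: int = 3, base_delay: float = 1.0) -> List[float]:
    """Delays before each retry: base, 2 * base, 4 * base, ..."""
    return [base_delay * (2 ** attempt) for attempt in range(max_retries)]


def call_with_backoff(
    func: Callable[[], T],
    max_retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying on RateLimitedError with doubling delays."""
    for delay in backoff_delays(max_retries, base_delay):
        try:
            return func()
        except RateLimitedError:
            sleep(delay)
    # Final attempt after the last delay; any error propagates to the caller.
    return func()
```

With these defaults a message is attempted at most four times, and the waits between attempts are 1, 2, and 4 seconds.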

ScraperService (`src/services/scraper.py`)

Common Issues:

Timeout errors: Website too slow, network issues
Parse errors: HTML structure changed or unexpected format
Empty content: Website blocks scraping or content not found

Debugging:

# Test scraping manually
uv run python -c "
from src.services.scraper import scrape_website
import asyncio

async def test():
    chunks = await scrape_website('https://example.com')
    print(f'Chunks: {len(chunks)}')

asyncio.run(test())
"

# Check scraper logs
grep "ScraperService\|scrape_website\|chunk" logs/*.log

Repository (`src/db/repository.py`)

Common Issues:

Connection pool exhausted: Too many concurrent requests
Query timeouts: Large dataset or slow query
Foreign key constraints: Data integrity issues

Debugging:

# Check active connections in Supabase dashboard
# Settings → Database → Connections

# Monitor query performance in Logfire logs
grep -i "database.*query\|query.*duration\|database.*timing" logs/*.log

# Check for connection pool issues
grep -i "pool.*exhausted\|connection.*failed\|database.*error" logs/*.log

# View database operation traces with correlation IDs
grep "correlation_id" logs/*.log | grep -i "database\|repository"

Logfire Logging:

Database operation timing logged for all queries
Query success/failure rates tracked
Bot configuration lookups include timing
Message history storage metrics logged
All operations include correlation IDs for request tracing

Logfire Observability (`src/logging_config.py`)

Configuration:

Centralized logging setup in src/logging_config.py
Environment-aware: Console formatting for local, JSON for production
FastAPI and Pydantic instrumentation enabled automatically
Optional cloud logging with LOGFIRE_TOKEN environment variable

Common Issues:

Logs not appearing: Check LOG_LEVEL environment variable (default: INFO)
Missing correlation IDs: Ensure CorrelationIDMiddleware is first middleware
PII in logs: Verify logfire_enable_pii_masking=True in settings
Cloud logging not working: Check LOGFIRE_TOKEN is set correctly
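
When correlation IDs are missing, it helps to know what the middleware is expected to do. The following is a plain ASGI sketch, not the project's CorrelationIDMiddleware: the `x-correlation-id` header name, the context variable, and the class name are assumptions.

```python
import contextvars
import uuid

# Holds the ID for the request currently being handled.
correlation_id_var = contextvars.ContextVar("correlation_id", default=None)
HEADER = b"x-correlation-id"  # assumed header name


class CorrelationIDMiddlewareSketch:
    """Attach a correlation ID to each HTTP request and its response."""

    def __init__(self, app):
        self.app = app

    async def __call__(self, scope, receive, send):
        if scope["type"] != "http":
            await self.app(scope, receive, send)
            return
        # Reuse an incoming ID if present, otherwise generate a new one.
        headers = dict(scope.get("headers") or [])
        incoming = headers.get(HEADER)
        cid = incoming.decode() if incoming else str(uuid.uuid4())
        token = correlation_id_var.set(cid)

        async def send_with_header(message):
            # Echo the ID on the response so clients can report it.
            if message["type"] == "http.response.start":
                extra = [*message.get("headers", []), (HEADER, cid.encode())]
                message = {**message, "headers": extra}
            await send(message)

        try:
            await self.app(scope, receive, send_with_header)
        finally:
            correlation_id_var.reset(token)
```

Any logging call made while the wrapped app runs can read `correlation_id_var.get()`, which is why the middleware has to wrap every other layer that logs.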

Debugging:

# Check Logfire configuration
uv run python -c "
from src.config import get_settings
settings = get_settings()
print(f'Log Level: {settings.log_level}')
print(f'PII Masking: {settings.logfire_enable_pii_masking}')
print(f'Request Logging: {settings.logfire_enable_request_logging}')
print(f'Logfire Token Set: {bool(settings.logfire_token)}')
"

# Verify Logfire initialization
grep -i "logfire.*configure\|logfire.*initialized" logs/*.log

# Check for correlation IDs in logs
grep "correlation_id" logs/*.log | head -5

# View structured log format
tail -20 logs/*.log | grep -v "^$"

Logfire Features:

Request Tracing: Automatic FastAPI request/response tracing with timing
Pydantic Validation: Model validation errors logged automatically
PydanticAI Tracing: Agent execution and decision logging
Correlation IDs: Request tracing across all services
Structured Logs: JSON format in production for log aggregation
PII Masking: Automatic masking of sensitive data (tokens, PII)
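
To illustrate the kind of masking to expect in log output, here is a small regex-based sketch. It is not the project's `mask_pii()` function: the name `mask_pii_sketch` and both patterns are assumptions, and real masking may cover more cases.

```python
import re

# Illustrative patterns only; the project's masking may cover different cases.
_TOKEN_RE = re.compile(r"(access_token|apikey|token)=([^\s&]+)", re.IGNORECASE)
_EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def mask_pii_sketch(message: str) -> str:
    """Replace token values and email addresses with placeholders."""
    # Keep the parameter name so the log line stays readable.
    masked = _TOKEN_RE.sub(lambda m: f"{m.group(1)}=***", message)
    return _EMAIL_RE.sub("***@***", masked)
```

Running a suspicious log line through a function like this shows what a correctly masked line would look like for comparison.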

Quick Reference

Environment Variable Checklist

# Required variables (will fail without these)
✓ FACEBOOK_PAGE_ACCESS_TOKEN    # Long-lived page token
✓ FACEBOOK_VERIFY_TOKEN         # Custom webhook token
✓ SUPABASE_URL                  # Supabase project URL
✓ SUPABASE_SERVICE_KEY          # Service role key

# Optional but recommended
✓ COPILOT_CLI_HOST              # Default: http://localhost:5909
✓ COPILOT_ENABLED               # Default: True
✓ OPENAI_API_KEY                # Fallback LLM
✓ LOGFIRE_TOKEN                 # Logfire cloud logging (optional, enables cloud dashboard)
✓ LOG_LEVEL                     # Default: INFO (DEBUG, INFO, WARNING, ERROR, CRITICAL)
✓ LOGFIRE_ENABLE_PII_MASKING    # Default: True (mask sensitive data in logs)
✓ LOGFIRE_ENABLE_REQUEST_LOGGING # Default: True (HTTP request/response logging)
✓ SENTRY_DSN                    # Error tracking (optional)

# Check all are set
env | grep -i facebook
env | grep -i supabase
env | grep -i copilot
env | grep -i openai

Common Error Messages & Solutions

| Error | Cause | Solution |
| --- | --- | --- |
| "Failed to connect to Supabase" | Invalid credentials or network issue | Verify SUPABASE_URL and SUPABASE_SERVICE_KEY |
| "Copilot SDK unavailable, using OpenAI fallback" | Copilot CLI not running or unreachable | Start GitHub Copilot CLI: `github-copilot-cli start` |
| "Facebook API returned 401" | Invalid or expired token | Generate new page access token in Facebook App |
| "Webhook verification failed" | Token mismatch | Verify FACEBOOK_VERIFY_TOKEN matches Facebook settings |
| "Agent confidence too low" | Question out of scope or poor reference doc | Review reference document and update if needed |
| "Database connection timeout" | Connection pool exhausted or network issue | Restart app or check Supabase status |
| "Logfire configuration failed" | Invalid Logfire token or network issue | Verify LOGFIRE_TOKEN (if using cloud) or check local logging setup |
| "Missing correlation ID in logs" | CorrelationIDMiddleware not properly configured | Ensure middleware is added first in `src/main.py` |
| "PII detected in logs" | PII masking disabled or misconfigured | Set `LOGFIRE_ENABLE_PII_MASKING=True` and verify `mask_pii()` usage |

Deployment Verification Steps

Pre-deployment checklist:

# Format code
uv run ruff format .

# Run linter
uv run ruff check .

# Run all tests
uv run pytest --cov=src --cov-report=term-missing

Post-deployment (on Railway):

# Health check
curl https://<railway-url>/health

# Verify webhook endpoint
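# Quote the URL so the shell does not interpret "&"; a correct token should echo back test123
curl "https://<railway-url>/webhook?hub.mode=subscribe&hub.verify_token=$FACEBOOK_VERIFY_TOKEN&hub.challenge=test123"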

Environment variable verification:
- Check Railway Environment variables are set
- Verify no secrets in logs: grep -i token logs/*.log (should show masked tokens if PII masking enabled)
- Verify Logfire configuration: Check LOG_LEVEL, LOGFIRE_TOKEN (if using cloud), LOGFIRE_ENABLE_PII_MASKING

Webhook Configuration Verification

Facebook App Setup:
- Go to Facebook App → Messenger → Settings
- Webhook URL: https://<railway-url>/webhook
- Verify Token: Matches FACEBOOK_VERIFY_TOKEN
- Subscribe to: messages, messaging_postbacks

Test Webhook:

# Local testing
curl -X POST http://localhost:8000/webhook \
  -H "Content-Type: application/json" \
  -d '{"object":"page","entry":[{"messaging":[{"sender":{"id":"test"},"message":{"text":"hello"}}]}]}'

Verify in Facebook:
- Check webhook status in App Dashboard
- Should show "Verified" status
- Review recent requests/errors

Maintenance

This runbook is a living document and should be updated when:

New operational issues are discovered — Add to "Common Issues & Fixes" section
New debugging procedures are developed — Add to "Debug Commands" section
New alert thresholds are established — Update "Alert Thresholds" table
Service behavior changes — Update "Service-Specific Troubleshooting" section
Environment or deployment changes — Update relevant sections

For instructions on when and how to update this runbook, see:

AGENTS.md — "Operational Documentation Maintenance" section
TESTING.md — "Operational Issue Documentation" section
