
Production Deployment Guide

Nick edited this page Mar 10, 2026 · 3 revisions


This guide covers production deployment, configuration, monitoring, and maintenance for PATAS.


Pre-Deployment Checklist

Requirements

  • Database (PostgreSQL 12+ or SQLite for development)
  • Python 3.10+ environment
  • API keys (if using OpenAI/cloud models)
  • Network access to database and external APIs
  • Monitoring setup (Prometheus/Grafana recommended)
  • Backup strategy for database

Configuration

  • Review config.example.yaml and create production config
  • Set environment variables for sensitive data
  • Configure database connection string
  • Set up API keys (if using cloud models)
  • Configure logging level and output
  • Set up monitoring endpoints

Installation

Option 1: Docker (Recommended)

PATAS includes a production-ready Dockerfile with:

  • Multi-stage build for smaller image size
  • Non-root user for security
  • Health checks for orchestration
  • Proper signal handling

Build and run:

# Build production image
docker build -t patas:latest .

# Run with production settings
docker run -d \
  -p 8000:8000 \
  -e ENVIRONMENT=production \
  -e DATABASE_URL=postgresql+asyncpg://user:pass@host:5432/patas \
  -e API_KEYS=your-key:namespace \
  -e LLM_API_KEY=your-llm-key \
  patas:latest

Option 1b: Docker Compose (Full Stack)

Use docker-compose.prod.yml for a complete production stack:

# Create .env file with secrets
cat > .env << EOF
POSTGRES_PASSWORD=secure-password
API_KEYS=your-key:namespace
LLM_API_KEY=sk-your-key
GRAFANA_PASSWORD=admin
EOF

# Start all services
docker-compose -f docker-compose.prod.yml up -d

# Check status
docker-compose -f docker-compose.prod.yml ps

Included services:

  • PostgreSQL 15 with health checks
  • Redis for distributed caching/rate limiting
  • PATAS API with production configuration
  • Prometheus for metrics
  • Grafana for dashboards
  • Alertmanager for alerting

Option 2: Direct Installation

Install dependencies:

poetry install
# or
pip install -e .

Run application:

patas-api
# or
uvicorn app.api.run:app --host 0.0.0.0 --port 8000

Option 3: Kubernetes

See Scaling Guide for Kubernetes deployment examples.


Production Configuration

Database Setup

PostgreSQL configuration:

database:
  url: postgresql://user:password@host:5432/patas
  pool_size: 20
  max_overflow: 20
  pool_timeout: 30
  echo: false  # Disable SQL logging in production

Initialization:

# Initialize database schema
patas init-db
# or via API
curl -X POST http://localhost:8000/api/v1/init-db

Migrations:

# Check migration status
poetry run python scripts/run_migrations.py --status

# Run all pending migrations
poetry run python scripts/run_migrations.py

# Rollback specific migration (if needed)
poetry run python scripts/run_migrations.py --rollback 001

The migration runner:

  • Automatically detects pending migrations
  • Tracks applied migrations in schema_migrations table
  • Supports rollback with --rollback flag
  • Shows detailed status with --status flag

Security Configuration

API keys:

api_keys: "key1:namespace1,key2:namespace2"
default_rate_limit: 100  # Requests per minute

Environment variables:

export DATABASE_URL=postgresql://user:pass@host:5432/patas
export OPENAI_API_KEY=your-key-here
export PRIVACY_MODE=STRICT  # For on-premise

SSL/TLS:

  • Use HTTPS in production (nginx reverse proxy recommended)
  • Configure SSL certificates (Let's Encrypt, etc.)
  • Enable API key authentication

Monitoring Configuration

Prometheus metrics:

monitoring:
  metrics:
    enabled: true
    endpoint: "/metrics"
    port: 8000

Health checks:

  • Endpoint: /api/v1/health
  • Configure in load balancer/kubernetes
  • Check interval: 30 seconds
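
A small polling helper can gate a deployment step on the health endpoint. This is a sketch: the endpoint path comes from above, while the retry count and sleep interval are arbitrary defaults to adapt.

```shell
#!/usr/bin/env sh
# Poll the PATAS health endpoint until it responds, or fail after N attempts.
HEALTH_URL="${HEALTH_URL:-http://localhost:8000/api/v1/health}"
RETRIES="${RETRIES:-10}"
SLEEP="${SLEEP:-3}"

# One probe; non-zero exit on HTTP or connection failure
probe() { curl -fsS "$HEALTH_URL" > /dev/null 2>&1; }

wait_healthy() {
  i=0
  while [ "$i" -lt "$RETRIES" ]; do
    if probe; then
      echo "healthy"
      return 0
    fi
    i=$((i + 1))
    sleep "$SLEEP"
  done
  echo "unhealthy"
  return 1
}
```

Run `wait_healthy` after `docker run` (or a Kubernetes rollout) and proceed only on a zero exit code.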

Logging:

logging:
  level: INFO  # Use WARNING in production for less noise
  file: "/var/log/patas/patas.log"
  rotation_size: 10485760  # 10MB
  backup_count: 5

Deployment Strategies

Blue-Green Deployment

Strategy:

  1. Deploy new version to "green" environment
  2. Run health checks and smoke tests
  3. Switch traffic from "blue" to "green"
  4. Keep "blue" as backup for rollback

Implementation:

  • Use a load balancer to route traffic between environments
  • Deploy the new version alongside the old version
  • Switch traffic all at once, or gradually for a canary-style rollout
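
One way to implement the switch with nginx is to point an upstream at whichever color is live (ports are illustrative):

```nginx
# blue listens on 8000, green on 8001; edit and reload to switch
upstream patas_live {
    server 127.0.0.1:8001;   # green (new version)
    # server 127.0.0.1:8000; # blue (previous version, kept for rollback)
}

server {
    listen 80;
    location / {
        proxy_pass http://patas_live;
    }
}
```

After editing, `nginx -s reload` applies the change without dropping in-flight connections, and rollback is uncommenting the blue line again.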

Rolling Deployment

Strategy:

  1. Deploy new version to subset of instances
  2. Verify health and metrics
  3. Gradually roll out to all instances
  4. Roll back if issues detected

Kubernetes:

strategy:
  type: RollingUpdate
  rollingUpdate:
    maxSurge: 1
    maxUnavailable: 0

Zero-Downtime Deployment

Requirements:

  • Stateless API (no session state)
  • Database migrations are backward compatible
  • Health checks configured
  • Load balancer with health checks

Process:

  1. Deploy new version alongside old version
  2. Run database migrations (if needed)
  3. Gradually shift traffic to new version
  4. Monitor metrics and errors
  5. Complete rollout or rollback if issues

Database Management

Backups

Using backup script:

# Run backup
./scripts/backup_database.sh

# Verify latest backup
./scripts/backup_database.sh --verify

# Restore from backup
./scripts/backup_database.sh --restore /backups/patas/patas_20240101.sql.gz

Backup script features:

  • Automatic compression (gzip)
  • Checksum verification (SHA-256)
  • Configurable retention (default: 30 days)
  • Optional S3 upload for off-site storage
  • Detailed logging

Environment variables:

export DATABASE_URL="postgresql+asyncpg://user:pass@host:5432/patas"
export BACKUP_DIR="/backups/patas"
export RETENTION_DAYS=30
export S3_BUCKET="my-backup-bucket"  # Optional

Backup schedule (cron):

# Daily backup at 2 AM
0 2 * * * /path/to/scripts/backup_database.sh >> /var/log/patas-backup.log 2>&1

Backup recommendations:

  • Daily full backups
  • Keep last 30 days locally
  • Upload to off-site storage (S3, GCS)
  • Test restore procedure monthly

Archiving

Archive old messages:

-- Create archive table
CREATE TABLE messages_archive (LIKE messages INCLUDING ALL);

-- Archive then delete atomically; NOW() is fixed for the duration of a
-- transaction, so both statements use the same cutoff
BEGIN;
INSERT INTO messages_archive
SELECT * FROM messages
WHERE created_at < NOW() - INTERVAL '90 days';

DELETE FROM messages
WHERE created_at < NOW() - INTERVAL '90 days';
COMMIT;

Automated archiving:

# Run weekly via cron
0 2 * * 0 /path/to/archive_old_messages.sh
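
A sketch of what such an archive script might contain, reusing the SQL above. The `messages_archive` table must already exist; `DATABASE_URL` and `RETENTION_DAYS` are assumed environment variables, and `psql` expects a plain `postgresql://` URL (not the `+asyncpg` form).

```shell
#!/usr/bin/env sh
# Hypothetical archive_old_messages.sh: move rows older than RETENTION_DAYS
# from messages to messages_archive in a single transaction.
set -eu

DAYS="${RETENTION_DAYS:-90}"

archive_sql() {
  # NOW() is fixed within a transaction, so both statements share one cutoff
  printf "BEGIN;
INSERT INTO messages_archive
SELECT * FROM messages
WHERE created_at < NOW() - INTERVAL '%s days';
DELETE FROM messages
WHERE created_at < NOW() - INTERVAL '%s days';
COMMIT;" "$DAYS" "$DAYS"
}

# Only touch a database when one is configured
if [ -n "${DATABASE_URL:-}" ]; then
  psql "$DATABASE_URL" -c "$(archive_sql)"
fi
```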

Maintenance

Vacuum and analyze:

-- Regular maintenance
VACUUM ANALYZE messages;
VACUUM ANALYZE patterns;
VACUUM ANALYZE rules;
VACUUM ANALYZE rule_evaluations;

Index maintenance:

  • Monitor index usage
  • Remove unused indexes
  • Add indexes for slow queries
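
To find removal candidates, PostgreSQL's statistics views can list indexes that are never scanned (a sketch; verify before dropping, since statistics reset on restart and some indexes enforce constraints):

```sql
-- Indexes with zero scans since the last stats reset, largest first
SELECT schemaname,
       relname       AS table_name,
       indexrelname  AS index_name,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
       idx_scan
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;
```
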

Monitoring and Alerting

Key Metrics to Monitor

System health:

  • API latency (P95 < 500ms)
  • Error rate (< 1%)
  • System uptime (> 99.9%)
  • Database connection pool usage

Business metrics:

  • Patterns discovered per day
  • Rules active count
  • Average rule precision
  • False positive rate

Resource metrics:

  • CPU usage (< 70%)
  • Memory usage (< 80%)
  • Database connections (< 80% of pool)
  • Disk usage (< 80%)
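
As a starting point, the latency and error targets above can be checked ad hoc with PromQL, using the metric names that appear in the alerting rules in this guide:

```promql
# P95 API latency over the last 5 minutes, aggregated across instances
histogram_quantile(0.95, sum(rate(patas_api_latency_seconds_bucket[5m])) by (le))

# Error throughput over the last 5 minutes
rate(patas_api_errors_total[5m])
```

The `sum ... by (le)` aggregation keeps the quantile estimate correct when multiple instances export the same histogram buckets.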

Alerting Rules

Prometheus alerting:

groups:
  - name: patas_alerts
    rules:
      - alert: HighAPILatency
        expr: histogram_quantile(0.95, sum(rate(patas_api_latency_seconds_bucket[5m])) by (le)) > 1
        for: 5m
        annotations:
          summary: "High API latency detected"
      
      - alert: HighErrorRate
        expr: rate(patas_api_errors_total[5m]) > 0.01
        for: 5m
        annotations:
          summary: "High error rate detected"
      
      - alert: LowRulePrecision
        expr: avg(patas_rules_precision) < 0.90
        for: 1h
        annotations:
          summary: "Average rule precision below threshold"

Grafana Dashboards

Import dashboard:

  • Use provided grafana-dashboard.json
  • Import into Grafana
  • Customize based on your needs

Key dashboards:

  • System performance (latency, throughput, errors)
  • Business metrics (patterns, rules, precision)
  • Resource usage (CPU, memory, database)

Maintenance Procedures

Regular Maintenance

Daily:

  • Review error logs
  • Check system metrics
  • Monitor rule evaluations

Weekly:

  • Run pattern mining on new data
  • Review deprecated rules
  • Check database size and archiving needs

Monthly:

  • Review and optimize thresholds
  • Analyze performance trends
  • Test backup restore procedure
  • Review and update documentation

Upgrading PATAS

Process:

  1. Review release notes and breaking changes
  2. Test upgrade in staging environment
  3. Backup database
  4. Run database migrations
  5. Deploy new version (blue-green or rolling)
  6. Verify health and metrics
  7. Monitor for issues

Rollback plan:

  • Keep previous version available
  • Database migrations should be reversible
  • Have rollback procedure documented

Troubleshooting

Common issues:

  • High latency: Check database performance, increase resources
  • High error rate: Check logs, verify external services
  • Low pattern discovery: Adjust thresholds, check data quality
  • Database connection issues: Check pool size, network connectivity
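
For database connection issues, `pg_stat_activity` shows how many connections each state holds (a quick diagnostic, assuming direct psql access):

```sql
-- Connections by state; compare the total against pool_size + max_overflow
SELECT state, count(*)
FROM pg_stat_activity
WHERE datname = 'patas'
GROUP BY state
ORDER BY count(*) DESC;
```

Many connections stuck in `idle in transaction` usually point at application code holding transactions open, not at the pool size.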

Debug mode:

logging:
  level: DEBUG  # Enable for troubleshooting

Security Best Practices

API Security

  • Use HTTPS in production
  • Enable API key authentication
  • Implement rate limiting
  • Validate all inputs
  • Sanitize SQL queries (automatic in PATAS)

Data Security

  • Encrypt database connections (SSL/TLS)
  • Use strong database passwords
  • Enable PII redaction in STRICT mode
  • Regular security audits
  • Keep dependencies updated

Access Control

  • Limit database access to PATAS instances only
  • Use read-only database users for monitoring
  • Rotate API keys regularly
  • Use secrets management (Vault, AWS Secrets Manager, etc.)
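
A read-only monitoring user can be created along these lines (role name and password are placeholders; on PostgreSQL 14+ the built-in `pg_read_all_data` role is an alternative):

```sql
-- Read-only role for dashboards and monitoring queries
CREATE ROLE patas_monitor LOGIN PASSWORD 'change-me';
GRANT CONNECT ON DATABASE patas TO patas_monitor;
GRANT USAGE ON SCHEMA public TO patas_monitor;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO patas_monitor;
-- Cover tables created after this grant
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT SELECT ON TABLES TO patas_monitor;
```
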

See Secret Management for detailed best practices.


Performance Tuning

Database Tuning

PostgreSQL configuration:

shared_buffers = 4GB
effective_cache_size = 12GB
maintenance_work_mem = 1GB
checkpoint_completion_target = 0.9
wal_buffers = 16MB
default_statistics_target = 100
random_page_cost = 1.1
effective_io_concurrency = 200
work_mem = 20MB
min_wal_size = 1GB
max_wal_size = 4GB

Application Tuning

Connection pooling:

database:
  pool_size: 20  # Adjust based on load
  max_overflow: 20

Caching:

embedding_cache:
  max_size: 100000
  ttl_seconds: 86400

See Performance Guide for detailed tuning.


Disaster Recovery

Backup Strategy

  • Daily database backups
  • Test restore procedure monthly
  • Keep backups off-site
  • Document recovery procedures

Recovery Procedures

Database recovery:

# Restore from a compressed backup
gunzip -c backup.sql.gz | psql -h host -U user -d patas

Application recovery:

  • Redeploy from version control
  • Restore configuration
  • Verify health and metrics

RTO/RPO Targets

  • RTO (Recovery Time Objective): < 4 hours
  • RPO (Recovery Point Objective): < 24 hours (daily backups)

Additional Resources

For deployment questions or issues, please open an issue on GitHub.
