
PATAS Scaling Guide

This guide covers horizontal scaling strategies, multi-instance deployment, and distributed processing for PATAS.


Scaling Architecture

Single Instance (Current)

Architecture:

[PATAS Instance]
    ↓
[PostgreSQL]
    ↓
[Redis Cache] (optional)

Limitations:

  • Single point of failure
  • Limited by single server resources
  • Pattern mining runs on single instance
  • Database becomes bottleneck at scale

Multi-Instance (Recommended for Production)

Architecture:

[Load Balancer]
    ↓
[PATAS Instance 1] [PATAS Instance 2] [PATAS Instance 3]
    ↓                    ↓                    ↓
[PostgreSQL Primary] ← [PostgreSQL Replica] ← [PostgreSQL Replica]
    ↓
[Redis Cluster]

Benefits:

  • High availability (no single point of failure)
  • Horizontal scaling (add instances as needed)
  • Load distribution
  • Database read scaling (read replicas)

Horizontal Scaling Strategies

1. API Layer Scaling

Stateless API design:

  • PATAS API is stateless (no session state)
  • Any instance can handle any request
  • No session affinity required

Load balancer configuration:

upstream patas_backend {
    least_conn;  # Use least connections algorithm
    server patas1:8000;
    server patas2:8000;
    server patas3:8000;
}

server {
    listen 80;
    location / {
        proxy_pass http://patas_backend;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}

Health checks:

  • Configure the health check endpoint /api/v1/health (a minimal sketch follows this list)
  • Remove unhealthy instances from pool
  • Automatic failover
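
A minimal health endpoint sketch, assuming the API is served with FastAPI (the framework and route body are assumptions; adapt them to the actual application):

from fastapi import FastAPI

app = FastAPI()

@app.get("/api/v1/health")
async def health():
    # Load balancer and Kubernetes probes only need a 200 response;
    # add database/Redis checks here if deeper readiness is required
    return {"status": "ok"}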

Scaling triggers:

  • CPU usage > 70%
  • API latency > 500ms (P95)
  • Error rate > 1%
  • Request queue depth > 100

2. Database Scaling

Read replicas:

  • Use read replicas for evaluation queries
  • Reduces load on primary database
  • Improves concurrent evaluation performance

PostgreSQL replication setup:

-- On the primary (requires wal_level = logical)
CREATE PUBLICATION patas_publication;
ALTER PUBLICATION patas_publication ADD TABLE messages, patterns, rules, rule_evaluations;

-- On the replica (the table schemas must already exist there)
CREATE SUBSCRIPTION patas_subscription
CONNECTION 'host=primary_host dbname=patas user=replicator'
PUBLICATION patas_publication;

Connection routing:

# Route read queries to a replica and writes to the primary
from sqlalchemy.ext.asyncio import create_async_engine, async_sessionmaker

read_engine = create_async_engine(replica_url)
write_engine = create_async_engine(primary_url)

ReadSession = async_sessionmaker(read_engine, expire_on_commit=False)
WriteSession = async_sessionmaker(write_engine, expire_on_commit=False)

# Use ReadSession for evaluation queries and WriteSession for writes

Sharding (for very large deployments):

  • Partition the messages table by timestamp
  • Shard by time windows (e.g., monthly shards)
  • Route queries to the appropriate shard (see the sketch below)
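
A minimal routing sketch, assuming monthly shards named messages_YYYY_MM (the naming scheme and helper are hypothetical):

from datetime import datetime

def shard_table_for(ts: datetime) -> str:
    # Map a message timestamp to its monthly shard, e.g. messages_2026_03
    return f"messages_{ts:%Y_%m}"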

3. Distributed Caching

Redis cluster:

  • Shared embedding cache across all instances
  • Reduces redundant embedding API calls
  • Improves hit rate, since every instance benefits from entries cached by any other

Redis cluster setup:

redis:
  cluster:
    enabled: true
    nodes:
      - redis1:6379
      - redis2:6379
      - redis3:6379

Cache configuration:

embedding_cache:
  provider: redis
  cluster: true
  ttl_seconds: 86400
  max_size: 1000000
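
A usage sketch for the shared cache, assuming the redis-py cluster client; the key scheme and compute_embedding helper are hypothetical:

import json

from redis.cluster import RedisCluster

# Connect to any node; the client discovers the rest of the cluster
cache = RedisCluster(host="redis1", port=6379)

def get_embedding(text_hash, text):
    key = f"embedding:{text_hash}"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)
    embedding = compute_embedding(text)              # hypothetical helper
    cache.set(key, json.dumps(embedding), ex=86400)  # ex matches ttl_seconds above
    return embedding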

4. Pattern Mining Scaling

Time window partitioning:

  • Split data into time windows (daily/weekly)
  • Process windows in parallel
  • Merge results after completion

Parallel processing:

import asyncio

async def parallel_mining(db, days=30, window_days=5):
    # Split the lookback period into fixed-size windows of day offsets
    windows = [
        (i, i + window_days)
        for i in range(0, days, window_days)
    ]

    # Mine each window concurrently; if the mining helpers issue
    # concurrent queries, give each task its own database session
    tasks = [
        mine_patterns_window(db, start, end)
        for start, end in windows
    ]

    results = await asyncio.gather(*tasks)
    return merge_patterns(results)
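
For example, calling await parallel_mining(db, days=30, window_days=5) splits the last 30 days into six 5-day windows, mines them concurrently, and merges the results.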

Coordination:

  • Use database locks to prevent concurrent mining conflicts
  • Coordinate via external scheduler (Kubernetes CronJob, etc.)
  • Or run on dedicated mining instance

Distributed mining (future):

  • Partition data by time windows
  • Process on different instances
  • Merge results centrally

Multi-Instance Deployment

Docker Compose Example

version: '3.8'

services:
  patas1:
    image: patas:latest
    environment:
      - DATABASE_URL=postgresql://user:pass@postgres:5432/patas
      - REDIS_URL=redis://redis:6379
    ports:
      - "8001:8000"
  
  patas2:
    image: patas:latest
    environment:
      - DATABASE_URL=postgresql://user:pass@postgres:5432/patas
      - REDIS_URL=redis://redis:6379
    ports:
      - "8002:8000"
  
  patas3:
    image: patas:latest
    environment:
      - DATABASE_URL=postgresql://user:pass@postgres:5432/patas
      - REDIS_URL=redis://redis:6379
    ports:
      - "8003:8000"
  
  nginx:
    image: nginx:alpine
    ports:
      - "80:80"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf
  
  postgres:
    image: postgres:15
    environment:
      - POSTGRES_DB=patas
      - POSTGRES_USER=user
      - POSTGRES_PASSWORD=pass
    volumes:
      - postgres_data:/var/lib/postgresql/data
  
  redis:
    image: redis:7-alpine
    command: redis-server --appendonly yes
    volumes:
      - redis_data:/data

volumes:
  postgres_data:
  redis_data:

Kubernetes Deployment

apiVersion: apps/v1
kind: Deployment
metadata:
  name: patas
spec:
  replicas: 3
  selector:
    matchLabels:
      app: patas
  template:
    metadata:
      labels:
        app: patas
    spec:
      containers:
      - name: patas
        image: patas:latest
        env:
        - name: DATABASE_URL
          valueFrom:
            secretKeyRef:
              name: patas-secrets
              key: database-url
        - name: REDIS_URL
          value: "redis://redis-service:6379"
        ports:
        - containerPort: 8000
        resources:
          requests:
            memory: "2Gi"
            cpu: "2"
          limits:
            memory: "4Gi"
            cpu: "4"
        livenessProbe:
          httpGet:
            path: /api/v1/health
            port: 8000
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /api/v1/health
            port: 8000
          initialDelaySeconds: 10
          periodSeconds: 5
---
apiVersion: v1
kind: Service
metadata:
  name: patas-service
spec:
  selector:
    app: patas
  ports:
  - port: 80
    targetPort: 8000
  type: LoadBalancer
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: patas-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: patas
  minReplicas: 3
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80

Coordination and State Management

Pattern Mining Coordination

Option 1: Dedicated Mining Instance

  • One instance handles all pattern mining
  • Other instances handle API requests
  • Simplest approach, no coordination needed

Option 2: Database Locks

  • Use PostgreSQL advisory locks
  • Only one instance can mine at a time
  • Automatic coordination via database

from sqlalchemy import text

async def mine_with_lock(db, days=7):
    async with db.begin():
        # Acquire a session-level advisory lock; returns false if another
        # instance already holds it
        result = await db.execute(
            text("SELECT pg_try_advisory_lock(12345)")
        )
        if not result.scalar():
            raise Exception("Another instance is mining")

        try:
            # Run pattern mining while holding the lock
            return await mine_patterns(db, days=days)
        finally:
            # Release the session-level lock explicitly
            await db.execute(text("SELECT pg_advisory_unlock(12345)"))

Option 3: External Scheduler

  • Use Kubernetes CronJob, Airflow, etc.
  • Schedule pattern mining on single instance
  • No coordination needed

Rule Promotion Coordination

Database transactions:

  • Use database transactions for rule promotion
  • Prevents race conditions
  • Automatic rollback on conflicts

async def promote_rule_safely(db, rule_id):
    async with db.begin():
        # Lock the row so concurrent promotions cannot race
        rule = await db.get(Rule, rule_id, with_for_update=True)
        if rule is None or rule.status != RuleStatus.SHADOW:
            return False

        # Promote the rule; the transaction commits when the block exits
        rule.status = RuleStatus.ACTIVE
        return True

Monitoring Multi-Instance Deployments

Key Metrics

Per-instance metrics:

  • API latency per instance
  • Request rate per instance
  • Error rate per instance
  • Resource usage per instance

Aggregate metrics:

  • Total API requests across all instances
  • Average latency across instances
  • Database connection pool usage
  • Cache hit rate
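
A minimal export sketch, assuming metrics are scraped by Prometheus via the prometheus_client library (metric names and the instance label are hypothetical; Prometheus can also attach an instance label at scrape time):

from prometheus_client import Counter, Histogram, start_http_server

REQUESTS = Counter("patas_requests_total", "API requests", ["instance", "status"])
LATENCY = Histogram("patas_request_seconds", "API request latency", ["instance"])

start_http_server(9100)  # metrics endpoint scraped on each instance

def record_request(instance_id, status, duration_seconds):
    # Label by instance so per-instance and aggregate views both work
    REQUESTS.labels(instance=instance_id, status=status).inc()
    LATENCY.labels(instance=instance_id).observe(duration_seconds)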

Grafana Dashboard

Instance comparison:

  • Compare performance across instances
  • Identify underperforming instances
  • Track instance health

Load distribution:

  • Requests per instance
  • CPU/memory usage per instance
  • Database connections per instance

Best Practices

  1. Start with 3 instances for high availability
  2. Use load balancer with health checks
  3. Enable read replicas for database scaling
  4. Use Redis cluster for distributed caching
  5. Coordinate pattern mining (dedicated instance or locks)
  6. Monitor per-instance metrics for performance issues
  7. Scale based on metrics (CPU, latency, error rate)
  8. Use database transactions for state changes
  9. Archive old data to maintain performance
  10. Test failover scenarios regularly

Limitations and Future Enhancements

Current Limitations

  • Pattern mining is single-instance (can be parallelized by time windows)
  • No automatic service discovery
  • Manual coordination for pattern mining
  • Database becomes bottleneck at very large scales

Planned Enhancements

  • Distributed pattern mining: Automatic coordination across instances
  • Service discovery: Automatic instance registration/discovery
  • Database sharding: Automatic data partitioning
  • Message queue: Async processing for high-volume ingestion

Additional Resources

For scaling questions or issues, please open an issue on GitHub.
