
PATAS Scaling Guide

This guide covers horizontal scaling strategies, multi-instance deployment, and distributed processing for PATAS.


Scaling Architecture

Single Instance (Current)

Architecture:

[PATAS Instance]
    ↓
[PostgreSQL]
    ↓
[Redis Cache] (optional)

Limitations:

  • Single point of failure
  • Limited by single server resources
  • Pattern mining runs on single instance
  • Database becomes bottleneck at scale

Multi-Instance (Recommended for Production)

Architecture:

[Load Balancer]
    ↓
[PATAS Instance 1] [PATAS Instance 2] [PATAS Instance 3]
    ↓                    ↓                    ↓
[PostgreSQL Primary] ← [PostgreSQL Replica] ← [PostgreSQL Replica]
    ↓
[Redis Cluster]

Benefits:

  • High availability (no single point of failure)
  • Horizontal scaling (add instances as needed)
  • Load distribution
  • Database read scaling (read replicas)

Horizontal Scaling Strategies

1. API Layer Scaling

Stateless API design:

  • PATAS API is stateless (no session state)
  • Any instance can handle any request
  • No session affinity required

Load balancer configuration:

upstream patas_backend {
    least_conn;  # Use least connections algorithm
    server patas1:8000;
    server patas2:8000;
    server patas3:8000;
}

server {
    listen 80;
    location / {
        proxy_pass http://patas_backend;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}

Health checks:

  • Configure the health check endpoint /api/v1/health (a minimal sketch follows this list)
  • Remove unhealthy instances from pool
  • Automatic failover
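
A minimal health endpoint sketch, assuming the API is served with FastAPI (the framework and route body are assumptions; adapt them to the actual application):

from fastapi import FastAPI

app = FastAPI()

@app.get("/api/v1/health")
async def health():
    # Load balancer and Kubernetes probes only need a 200 response;
    # add database/Redis checks here if deeper readiness is required
    return {"status": "ok"}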

Scaling triggers:

  • CPU usage > 70%
  • API latency > 500ms (P95)
  • Error rate > 1%
  • Request queue depth > 100

2. Database Scaling

Read replicas:

  • Use read replicas for evaluation queries
  • Reduces load on primary database
  • Improves concurrent evaluation performance

PostgreSQL replication setup:

-- On the primary (requires wal_level = logical)
CREATE PUBLICATION patas_publication;
ALTER PUBLICATION patas_publication ADD TABLE messages, patterns, rules, rule_evaluations;

-- On the replica (the table schemas must already exist there)
CREATE SUBSCRIPTION patas_subscription
CONNECTION 'host=primary_host dbname=patas user=replicator'
PUBLICATION patas_publication;

Connection routing:

# Route read queries to a replica and writes to the primary
from sqlalchemy.ext.asyncio import create_async_engine, async_sessionmaker

read_engine = create_async_engine(replica_url)
write_engine = create_async_engine(primary_url)

ReadSession = async_sessionmaker(read_engine, expire_on_commit=False)
WriteSession = async_sessionmaker(write_engine, expire_on_commit=False)

# Use ReadSession for evaluation queries and WriteSession for writes

Sharding (for very large deployments):

  • Partition the messages table by timestamp
  • Shard by time windows (e.g., monthly shards)
  • Route queries to the appropriate shard (see the sketch below)
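
A minimal routing sketch, assuming monthly shards named messages_YYYY_MM (the naming scheme and helper are hypothetical):

from datetime import datetime

def shard_table_for(ts: datetime) -> str:
    # Map a message timestamp to its monthly shard, e.g. messages_2026_03
    return f"messages_{ts:%Y_%m}"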

3. Distributed Caching

Redis cluster:

  • Shared embedding cache across all instances
  • Reduces redundant embedding API calls
  • Improves hit rate, since every instance benefits from entries cached by any other

Redis cluster setup:

redis:
  cluster:
    enabled: true
    nodes:
      - redis1:6379
      - redis2:6379
      - redis3:6379

Cache configuration:

embedding_cache:
  provider: redis
  cluster: true
  ttl_seconds: 86400
  max_size: 1000000
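
A usage sketch for the shared cache, assuming the redis-py cluster client; the key scheme and compute_embedding helper are hypothetical:

import json

from redis.cluster import RedisCluster

# Connect to any node; the client discovers the rest of the cluster
cache = RedisCluster(host="redis1", port=6379)

def get_embedding(text_hash, text):
    key = f"embedding:{text_hash}"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)
    embedding = compute_embedding(text)              # hypothetical helper
    cache.set(key, json.dumps(embedding), ex=86400)  # ex matches ttl_seconds above
    return embedding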

4. Pattern Mining Scaling

Time window partitioning:

  • Split data into time windows (daily/weekly)
  • Process windows in parallel
  • Merge results after completion

Parallel processing:

import asyncio

async def parallel_mining(db, days=30, window_days=5):
    # Split the lookback period into fixed-size windows of day offsets
    windows = [
        (i, i + window_days)
        for i in range(0, days, window_days)
    ]

    # Mine each window concurrently; if the mining helpers issue
    # concurrent queries, give each task its own database session
    tasks = [
        mine_patterns_window(db, start, end)
        for start, end in windows
    ]

    results = await asyncio.gather(*tasks)
    return merge_patterns(results)
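
For example, calling await parallel_mining(db, days=30, window_days=5) splits the last 30 days into six 5-day windows, mines them concurrently, and merges the results.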

Coordination:

  • Use database locks to prevent concurrent mining conflicts
  • Coordinate via external scheduler (Kubernetes CronJob, etc.)
  • Or run on dedicated mining instance

Distributed mining (future):

  • Partition data by time windows
  • Process on different instances
  • Merge results centrally

Multi-Instance Deployment

Docker Compose Example

version: '3.8'

services:
  patas1:
    image: patas:latest
    environment:
      - DATABASE_URL=postgresql://user:pass@postgres:5432/patas
      - REDIS_URL=redis://redis:6379
    ports:
      - "8001:8000"
  
  patas2:
    image: patas:latest
    environment:
      - DATABASE_URL=postgresql://user:pass@postgres:5432/patas
      - REDIS_URL=redis://redis:6379
    ports:
      - "8002:8000"
  
  patas3:
    image: patas:latest
    environment:
      - DATABASE_URL=postgresql://user:pass@postgres:5432/patas
      - REDIS_URL=redis://redis:6379
    ports:
      - "8003:8000"
  
  nginx:
    image: nginx:alpine
    ports:
      - "80:80"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf
  
  postgres:
    image: postgres:15
    environment:
      - POSTGRES_DB=patas
      - POSTGRES_USER=user
      - POSTGRES_PASSWORD=pass
    volumes:
      - postgres_data:/var/lib/postgresql/data
  
  redis:
    image: redis:7-alpine
    command: redis-server --appendonly yes
    volumes:
      - redis_data:/data

volumes:
  postgres_data:
  redis_data:

Kubernetes Deployment

apiVersion: apps/v1
kind: Deployment
metadata:
  name: patas
spec:
  replicas: 3
  selector:
    matchLabels:
      app: patas
  template:
    metadata:
      labels:
        app: patas
    spec:
      containers:
      - name: patas
        image: patas:latest
        env:
        - name: DATABASE_URL
          valueFrom:
            secretKeyRef:
              name: patas-secrets
              key: database-url
        - name: REDIS_URL
          value: "redis://redis-service:6379"
        ports:
        - containerPort: 8000
        resources:
          requests:
            memory: "2Gi"
            cpu: "2"
          limits:
            memory: "4Gi"
            cpu: "4"
        livenessProbe:
          httpGet:
            path: /api/v1/health
            port: 8000
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /api/v1/health
            port: 8000
          initialDelaySeconds: 10
          periodSeconds: 5
---
apiVersion: v1
kind: Service
metadata:
  name: patas-service
spec:
  selector:
    app: patas
  ports:
  - port: 80
    targetPort: 8000
  type: LoadBalancer
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: patas-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: patas
  minReplicas: 3
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80

Coordination and State Management

Pattern Mining Coordination

Option 1: Dedicated Mining Instance

  • One instance handles all pattern mining
  • Other instances handle API requests
  • Simplest approach, no coordination needed

Option 2: Database Locks

  • Use PostgreSQL advisory locks
  • Only one instance can mine at a time
  • Automatic coordination via database

from sqlalchemy import text

async def mine_with_lock(db, days=7):
    async with db.begin():
        # Acquire a session-level advisory lock; returns false if another
        # instance already holds it
        result = await db.execute(
            text("SELECT pg_try_advisory_lock(12345)")
        )
        if not result.scalar():
            raise Exception("Another instance is mining")

        try:
            # Run pattern mining while holding the lock
            return await mine_patterns(db, days=days)
        finally:
            # Release the session-level lock explicitly
            await db.execute(text("SELECT pg_advisory_unlock(12345)"))

Option 3: External Scheduler

  • Use Kubernetes CronJob, Airflow, etc.
  • Schedule pattern mining on single instance
  • No coordination needed

Rule Promotion Coordination

Database transactions:

  • Use database transactions for rule promotion
  • Prevents race conditions
  • Automatic rollback on conflicts

async def promote_rule_safely(db, rule_id):
    async with db.begin():
        # Lock the row so concurrent promotions cannot race
        rule = await db.get(Rule, rule_id, with_for_update=True)
        if rule is None or rule.status != RuleStatus.SHADOW:
            return False

        # Promote the rule; the transaction commits when the block exits
        rule.status = RuleStatus.ACTIVE
        return True

Monitoring Multi-Instance Deployments

Key Metrics

Per-instance metrics:

  • API latency per instance
  • Request rate per instance
  • Error rate per instance
  • Resource usage per instance

Aggregate metrics:

  • Total API requests across all instances
  • Average latency across instances
  • Database connection pool usage
  • Cache hit rate
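
A minimal export sketch, assuming metrics are scraped by Prometheus via the prometheus_client library (metric names and the instance label are hypothetical; Prometheus can also attach an instance label at scrape time):

from prometheus_client import Counter, Histogram, start_http_server

REQUESTS = Counter("patas_requests_total", "API requests", ["instance", "status"])
LATENCY = Histogram("patas_request_seconds", "API request latency", ["instance"])

start_http_server(9100)  # metrics endpoint scraped on each instance

def record_request(instance_id, status, duration_seconds):
    # Label by instance so per-instance and aggregate views both work
    REQUESTS.labels(instance=instance_id, status=status).inc()
    LATENCY.labels(instance=instance_id).observe(duration_seconds)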

Grafana Dashboard

Instance comparison:

  • Compare performance across instances
  • Identify underperforming instances
  • Track instance health

Load distribution:

  • Requests per instance
  • CPU/memory usage per instance
  • Database connections per instance

Best Practices

  1. Start with 3 instances for high availability
  2. Use load balancer with health checks
  3. Enable read replicas for database scaling
  4. Use Redis cluster for distributed caching
  5. Coordinate pattern mining (dedicated instance or locks)
  6. Monitor per-instance metrics for performance issues
  7. Scale based on metrics (CPU, latency, error rate)
  8. Use database transactions for state changes
  9. Archive old data to maintain performance
  10. Test failover scenarios regularly

Limitations and Future Enhancements

Current Limitations

  • Pattern mining is single-instance (can be parallelized by time windows)
  • No automatic service discovery
  • Manual coordination for pattern mining
  • Database becomes bottleneck at very large scales

Planned Enhancements

  • Distributed pattern mining: Automatic coordination across instances
  • Service discovery: Automatic instance registration/discovery
  • Database sharding: Automatic data partitioning
  • Message queue: Async processing for high-volume ingestion

Additional Resources

For scaling questions or issues, please open an issue on GitHub.
