Scaling Guide
This guide covers horizontal scaling strategies, multi-instance deployment, and distributed processing for PATAS.
Single-instance architecture:
[PATAS Instance]
↓
[PostgreSQL]
↓
[Redis Cache] (optional)
Limitations:
- Single point of failure
- Limited by single server resources
- Pattern mining runs on single instance
- Database becomes bottleneck at scale
Multi-instance architecture:
[Load Balancer]
↓
[PATAS Instance 1] [PATAS Instance 2] [PATAS Instance 3]
↓ ↓ ↓
[PostgreSQL Primary] → [PostgreSQL Replica] → [PostgreSQL Replica]
↓
[Redis Cluster]
Benefits:
- High availability (no single point of failure)
- Horizontal scaling (add instances as needed)
- Load distribution
- Database read scaling (read replicas)
Stateless API design:
- PATAS API is stateless (no session state)
- Any instance can handle any request
- No session affinity required
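To make this concrete, here is a minimal sketch of a stateless endpoint (FastAPI-style; the get_db dependency, AsyncSessionLocal factory, and Rule model are illustrative assumptions, not necessarily the actual PATAS code). Every request loads what it needs from the database, so any instance behind the load balancer can serve it:
# Minimal sketch of a stateless endpoint (names are assumptions).
# All state comes from PostgreSQL per request; there are no in-memory
# sessions, so any instance behind the load balancer can handle the call.
from fastapi import Depends, FastAPI
from sqlalchemy.ext.asyncio import AsyncSession

app = FastAPI()

async def get_db():
    async with AsyncSessionLocal() as session:  # assumed session factory
        yield session

@app.get("/api/v1/rules/{rule_id}")
async def get_rule(rule_id: int, db: AsyncSession = Depends(get_db)):
    rule = await db.get(Rule, rule_id)  # Rule model assumed from PATAS
    return {"id": rule.id, "status": rule.status}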
Load balancer configuration:
upstream patas_backend {
    least_conn;  # Use least connections algorithm
    server patas1:8000;
    server patas2:8000;
    server patas3:8000;
}
server {
    listen 80;
    location / {
        proxy_pass http://patas_backend;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}
Health checks:
- Configure health check endpoint: /api/v1/health
- Remove unhealthy instances from pool
- Automatic failover
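The guide does not show the endpoint itself; as a rough sketch of what such a handler could look like (FastAPI-style, with an assumed AsyncSessionLocal session factory), it should verify database connectivity and return a non-200 status so the load balancer stops routing traffic to the instance:
# Hedged sketch of a health handler (implementation details are assumptions).
# A non-200 response tells the load balancer to drop this instance.
from fastapi import APIRouter, Response
from sqlalchemy import text

router = APIRouter()

@router.get("/api/v1/health")
async def health(response: Response):
    try:
        async with AsyncSessionLocal() as db:   # assumed session factory
            await db.execute(text("SELECT 1"))  # cheap connectivity check
        return {"status": "ok"}
    except Exception:
        response.status_code = 503              # marks the instance unhealthy
        return {"status": "unhealthy"}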
Scaling triggers:
- CPU usage > 70%
- API latency > 500ms (P95)
- Error rate > 1%
- Request queue depth > 100
Read replicas:
- Use read replicas for evaluation queries
- Reduces load on primary database
- Improves concurrent evaluation performance
PostgreSQL replication setup:
-- On primary
CREATE PUBLICATION patas_publication;
ALTER PUBLICATION patas_publication ADD TABLE messages, patterns, rules, rule_evaluations;
-- On replica
CREATE SUBSCRIPTION patas_subscription
    CONNECTION 'host=primary_host dbname=patas user=replicator'
    PUBLICATION patas_publication;
Connection routing:
from sqlalchemy.ext.asyncio import create_async_engine
# Route read queries to replicas and writes to the primary
read_engine = create_async_engine(replica_url)
write_engine = create_async_engine(primary_url)
read_db = AsyncSessionLocal(bind=read_engine)    # use for evaluation queries
write_db = AsyncSessionLocal(bind=write_engine)  # use for writes
Sharding (for very large deployments):
- Partition messages table by timestamp
- Shard by time windows (e.g., monthly shards)
- Route queries to appropriate shard
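As a sketch of the routing step (the monthly messages_YYYY_MM naming and the helper below are illustrative assumptions, not an existing PATAS scheme):
# Hedged sketch: route a query to a monthly shard of the messages table.
# The messages_YYYY_MM naming convention is assumed for illustration.
from datetime import datetime
from sqlalchemy import text

def shard_table_for(ts: datetime) -> str:
    return f"messages_{ts.year:04d}_{ts.month:02d}"

async def count_messages_since(db, start: datetime) -> int:
    table = shard_table_for(start)  # pick the shard that holds this window
    result = await db.execute(
        text(f"SELECT count(*) FROM {table} WHERE timestamp >= :start"),
        {"start": start},
    )
    return result.scalar_one()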
Redis cluster:
- Shared embedding cache across instances
- Reduces redundant API calls
- Improves performance
Redis cluster setup:
redis:
  cluster:
    enabled: true
    nodes:
      - redis1:6379
      - redis2:6379
      - redis3:6379
Cache configuration:
embedding_cache:
  provider: redis
  cluster: true
  ttl_seconds: 86400
  max_size: 1000000
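To illustrate what the shared cache buys, here is a hedged read-through sketch using redis.asyncio (the key scheme and the embed_text call are assumptions; a cluster client would replace the single-node connection shown here). Once any instance has embedded a text, the others skip the redundant API call:
# Hedged read-through sketch for the shared embedding cache (assumed names).
import hashlib
import json
import redis.asyncio as redis

r = redis.Redis(host="redis1", port=6379)  # use a cluster client in production

async def get_embedding(text_value: str) -> list[float]:
    key = "emb:" + hashlib.sha256(text_value.encode()).hexdigest()
    cached = await r.get(key)
    if cached is not None:
        return json.loads(cached)
    embedding = await embed_text(text_value)           # assumed embedding call
    await r.set(key, json.dumps(embedding), ex=86400)  # matches ttl_seconds
    return embedding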
Time window partitioning:
- Split data into time windows (daily/weekly)
- Process windows in parallel
- Merge results after completion
Parallel processing:
import asyncio

async def parallel_mining(db, days=30, window_days=5):
    # Split the lookback period into fixed-size windows (offsets in days)
    windows = [
        (i, i + window_days)
        for i in range(0, days, window_days)
    ]
    # Mine each window concurrently, then merge the resulting patterns
    tasks = [
        mine_patterns_window(db, start, end)
        for start, end in windows
    ]
    results = await asyncio.gather(*tasks)
    return merge_patterns(results)
Coordination:
- Use database locks to prevent concurrent mining conflicts
- Coordinate via external scheduler (Kubernetes CronJob, etc.)
- Or run on dedicated mining instance
Distributed mining (future):
- Partition data by time windows
- Process on different instances
- Merge results centrally
Docker Compose deployment:
version: '3.8'
services:
  patas1:
    image: patas:latest
    environment:
      - DATABASE_URL=postgresql://user:pass@postgres:5432/patas
      - REDIS_URL=redis://redis:6379
    ports:
      - "8001:8000"
  patas2:
    image: patas:latest
    environment:
      - DATABASE_URL=postgresql://user:pass@postgres:5432/patas
      - REDIS_URL=redis://redis:6379
    ports:
      - "8002:8000"
  patas3:
    image: patas:latest
    environment:
      - DATABASE_URL=postgresql://user:pass@postgres:5432/patas
      - REDIS_URL=redis://redis:6379
    ports:
      - "8003:8000"
  nginx:
    image: nginx:alpine
    ports:
      - "80:80"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf
  postgres:
    image: postgres:15
    environment:
      - POSTGRES_DB=patas
      - POSTGRES_USER=user
      - POSTGRES_PASSWORD=pass
    volumes:
      - postgres_data:/var/lib/postgresql/data
  redis:
    image: redis:7-alpine
    command: redis-server --appendonly yes
    volumes:
      - redis_data:/data
volumes:
  postgres_data:
  redis_data:
Kubernetes deployment:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: patas
spec:
  replicas: 3
  selector:
    matchLabels:
      app: patas
  template:
    metadata:
      labels:
        app: patas
    spec:
      containers:
        - name: patas
          image: patas:latest
          env:
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: patas-secrets
                  key: database-url
            - name: REDIS_URL
              value: "redis://redis-service:6379"
          ports:
            - containerPort: 8000
          resources:
            requests:
              memory: "2Gi"
              cpu: "2"
            limits:
              memory: "4Gi"
              cpu: "4"
          livenessProbe:
            httpGet:
              path: /api/v1/health
              port: 8000
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /api/v1/health
              port: 8000
            initialDelaySeconds: 10
            periodSeconds: 5
---
apiVersion: v1
kind: Service
metadata:
  name: patas-service
spec:
  selector:
    app: patas
  ports:
    - port: 80
      targetPort: 8000
  type: LoadBalancer
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: patas-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: patas
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
Pattern mining coordination options:
Option 1: Dedicated Mining Instance
- One instance handles all pattern mining
- Other instances handle API requests
- Simplest approach, no coordination needed
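One way to realize this (the PATAS_MINING_ENABLED variable below is a hypothetical convention used only for illustration, not an existing PATAS setting) is to gate the mining loop on an environment variable that is set on a single instance:
# Hedged sketch: run pattern mining only on the dedicated instance.
import asyncio
import os

async def mining_loop(db_factory, interval_seconds=3600):
    if os.getenv("PATAS_MINING_ENABLED", "false").lower() != "true":
        return  # API-only instances never start the loop
    while True:
        async with db_factory() as db:
            await mine_patterns(db, days=7)  # mining entry point from this guide
        await asyncio.sleep(interval_seconds)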
Option 2: Database Locks
- Use PostgreSQL advisory locks
- Only one instance can mine at a time
- Automatic coordination via database
from sqlalchemy import text

async def mine_with_lock(db, days=7):
    async with db.begin():
        # Try to acquire an advisory lock; returns False if another
        # instance already holds it
        result = await db.execute(
            text("SELECT pg_try_advisory_lock(12345)")
        )
        if not result.scalar():
            raise Exception("Another instance is mining")
        try:
            # Run pattern mining while holding the lock
            return await mine_patterns(db, days=days)
        finally:
            # Release lock
            await db.execute(text("SELECT pg_advisory_unlock(12345)"))
Option 3: External Scheduler
- Use Kubernetes CronJob, Airflow, etc.
- Schedule pattern mining on single instance
- No coordination needed
Database transactions:
- Use database transactions for rule promotion
- Prevents race conditions
- Automatic rollback on conflicts
async def promote_rule_safely(db, rule_id):
    async with db.begin():
        # Re-check rule status inside the transaction to avoid races
        rule = await db.get(Rule, rule_id)
        if rule.status != RuleStatus.SHADOW:
            return False
        # Promote the rule; the transaction commits when the block exits
        rule.status = RuleStatus.ACTIVE
        return True
Monitoring multi-instance deployments:
Per-instance metrics:
- API latency per instance
- Request rate per instance
- Error rate per instance
- Resource usage per instance
Aggregate metrics:
- Total API requests across all instances
- Average latency across instances
- Database connection pool usage
- Cache hit rate
Instance comparison:
- Compare performance across instances
- Identify underperforming instances
- Track instance health
Load distribution:
- Requests per instance
- CPU/memory usage per instance
- Database connections per instance
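As a hedged sketch (assuming Prometheus-style collection; the metric names are illustrative), per-instance metrics can be exported with an instance label so dashboards can both compare instances and aggregate across them:
# Hedged sketch: per-instance metrics with prometheus_client (names illustrative).
import os
from prometheus_client import Counter, Histogram, start_http_server

INSTANCE = os.getenv("HOSTNAME", "patas-unknown")  # container/pod name

REQUESTS = Counter("patas_requests_total", "API requests", ["instance", "path"])
LATENCY = Histogram("patas_request_seconds", "API latency", ["instance", "path"])

def record_request(path: str, seconds: float) -> None:
    REQUESTS.labels(instance=INSTANCE, path=path).inc()
    LATENCY.labels(instance=INSTANCE, path=path).observe(seconds)

start_http_server(9100)  # expose /metrics on each instance for scraping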
Best practices:
- Start with 3 instances for high availability
- Use load balancer with health checks
- Enable read replicas for database scaling
- Use Redis cluster for distributed caching
- Coordinate pattern mining (dedicated instance or locks)
- Monitor per-instance metrics for performance issues
- Scale based on metrics (CPU, latency, error rate)
- Use database transactions for state changes
- Archive old data to maintain performance
- Test failover scenarios regularly
Current limitations:
- Pattern mining is single-instance (can be parallelized by time windows)
- No automatic service discovery
- Manual coordination for pattern mining
- Database becomes bottleneck at very large scales
Future enhancements:
- Distributed pattern mining: Automatic coordination across instances
- Service discovery: Automatic instance registration/discovery
- Database sharding: Automatic data partitioning
- Message queue: Async processing for high-volume ingestion
See also:
- Performance Guide - Performance benchmarks
- Deployment Guide - Production deployment
- Monitoring Guide - Monitoring and alerting
For scaling questions or issues, please open an issue on GitHub.