
Performance Guide

Nick edited this page Mar 10, 2026 · 3 revisions

PATAS Performance Guide

This guide provides performance benchmarks, optimization strategies, and hardware recommendations for PATAS deployments.


Performance Benchmarks

Ingestion Performance

Baseline benchmarks (PostgreSQL, 8 CPU cores, 16GB RAM):

| Message Volume | Throughput | Latency (P95) | Notes |
|----------------|------------|---------------|-------|
| 1K messages | ~15,000 msg/s | <5ms | Small batches, optimal |
| 10K messages | ~13,000 msg/s | <10ms | Typical production load |
| 100K messages | ~12,000 msg/s | <20ms | Large batches |
| 1M messages | ~10,000 msg/s | <50ms | Sustained load |

Factors affecting ingestion:

  • Database connection pool size (default: 20)
  • Batch size (default: 10,000 messages per batch)
  • Network latency to database
  • Database write performance

Optimization tips:

  • Increase connection pool for high-volume ingestion
  • Use database write replicas for read-heavy workloads
  • Enable database connection pooling (PgBouncer for PostgreSQL)
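Since PATAS batches ingestion at 10,000 messages per batch by default, large datasets should be split into batches of that size on the client side as well. A minimal sketch (the `chunked` helper and the message payload shape are illustrative, not part of the PATAS client API):

```python
from typing import Iterator, List


def chunked(items: List[dict], batch_size: int = 10_000) -> Iterator[List[dict]]:
    """Yield successive batches, matching the default ingest batch size."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Example: 25,000 messages split into batches of 10k, 10k, and 5k,
# each of which would be sent as one call to the ingest endpoint.
messages = [{"id": i, "text": f"msg {i}"} for i in range(25_000)]
batches = list(chunked(messages))
```

Keeping each request at or below the server-side batch size avoids oversized payloads and keeps per-request latency predictable.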

Pattern Mining Performance

Two-stage pipeline benchmarks (100K messages, 15% spam):

| Configuration | Stage 1 Time | Stage 2 Time | Total Time | Cost Savings |
|---------------|--------------|--------------|------------|--------------|
| Two-stage (no LLM, no embeddings) | 5.2s | 0s | 5.2s | 100% |
| Two-stage (embeddings only) | 5.2s | 1.8s | 7.0s | 70% |
| Two-stage (embeddings + LLM) | 5.2s | 12.5s | 17.7s | 60% |
| Single-stage (embeddings + LLM) | N/A | 45.3s | 45.3s | 0% |

Scaling benchmarks:

| Message Volume | Two-Stage Time | Single-Stage Time | Speedup |
|----------------|----------------|-------------------|---------|
| 100K messages | 7.0s | 45.3s | 6.5x |
| 1M messages | 1.2 min | 8.5 min | 7.1x |
| 10M messages | 12 min | 95 min | 7.9x |

Pattern discovery results:

  • 100K messages: 38 patterns, 26 rules in 7 seconds
  • 1M messages: ~350 patterns, ~240 rules in 1.2 minutes
  • 10M messages: ~3,200 patterns, ~2,200 rules in 12 minutes

Rule Evaluation Performance

Shadow evaluation benchmarks:

| Rule Count | Messages Evaluated | Evaluation Time | Throughput |
|------------|--------------------|-----------------|------------|
| 10 rules | 100K messages | 0.8s | ~12.5 rules/sec |
| 100 rules | 100K messages | 7.2s | ~14 rules/sec |
| 1,000 rules | 100K messages | 68s | ~15 rules/sec |

Parallel evaluation:

  • Rules can be evaluated in parallel (limited by database connections)
  • Recommended: Evaluate in batches of 50-100 rules
  • Use read replicas for evaluation queries
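The batched parallel evaluation described above can be sketched with a thread pool. Here `evaluate_rule` is a stand-in for the real shadow-evaluation query, and `max_workers` should stay below the database connection pool size:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def evaluate_rule(rule_id: int) -> dict:
    # Stand-in for a real shadow-evaluation query (ideally against a read replica).
    return {"rule_id": rule_id, "matches": 0}


def evaluate_in_batches(rule_ids: List[int], batch_size: int = 50,
                        max_workers: int = 8) -> List[dict]:
    """Evaluate rules in batches of 50-100, with bounded parallelism per batch."""
    results = []
    for start in range(0, len(rule_ids), batch_size):
        batch = rule_ids[start:start + batch_size]
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            # pool.map preserves input order, so results stay aligned with rule_ids.
            results.extend(pool.map(evaluate_rule, batch))
    return results


results = evaluate_in_batches(list(range(120)))
```

Bounding both the batch size and the worker count keeps the number of concurrent database queries well under the pool limit.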

API Endpoint Performance

API latency benchmarks (P95, 8 CPU cores):

| Endpoint | Latency (P50) | Latency (P95) | Latency (P99) | Throughput |
|----------|---------------|---------------|---------------|------------|
| /api/v1/health | 2ms | 5ms | 10ms | 10,000 req/s |
| /api/v1/messages/ingest | 15ms | 45ms | 120ms | 2,000 req/s |
| /api/v1/patterns/mine | 7,000ms | 8,500ms | 12,000ms | 0.1 req/s |
| /api/v1/rules/eval-shadow | 800ms | 1,200ms | 2,000ms | 1 req/s |
| /api/v1/analyze (1000 msgs) | 500ms | 750ms | 1,200ms | 2 req/s |

Load testing results (1000 concurrent requests):

  • Health endpoint: 100% success, <10ms latency
  • Ingest endpoint: 99.8% success, <100ms P95 latency
  • Analyze endpoint: 99.5% success, <1s P95 latency

Hardware Recommendations

Small Deployment (<1M messages/day)

Requirements:

  • CPU: 4 cores
  • RAM: 8GB
  • Storage: 50GB SSD
  • Database: PostgreSQL (shared or dedicated)

Expected performance:

  • Ingestion: ~10,000 msg/s
  • Pattern mining: 100K messages in ~10s
  • Concurrent API requests: 100-200

Cost estimate:

  • Cloud: $50-100/month (AWS/GCP)
  • On-premise: Existing infrastructure

Medium Deployment (1-10M messages/day)

Requirements:

  • CPU: 8-16 cores
  • RAM: 16-32GB
  • Storage: 200GB SSD
  • Database: PostgreSQL (dedicated, 4-8 cores, 16GB RAM)
  • Cache: Redis (optional, 4GB)

Expected performance:

  • Ingestion: ~12,000 msg/s
  • Pattern mining: 1M messages in ~1.5 min
  • Concurrent API requests: 500-1000

Cost estimate:

  • Cloud: $200-500/month (AWS/GCP)
  • On-premise: $100-300/month (infrastructure)

Large Deployment (10-100M messages/day)

Requirements:

  • CPU: 16-32 cores
  • RAM: 32-64GB
  • Storage: 1TB SSD
  • Database: PostgreSQL cluster (primary + replicas)
  • Cache: Redis cluster (16GB+)
  • GPU: Optional, for embedding acceleration (24GB VRAM)

Expected performance:

  • Ingestion: ~13,000 msg/s (multiple instances)
  • Pattern mining: 10M messages in ~15 min
  • Concurrent API requests: 2000-5000

Cost estimate:

  • Cloud: $1,000-3,000/month (AWS/GCP)
  • On-premise: $500-1,500/month (infrastructure)

Very Large Deployment (>100M messages/day)

Requirements:

  • Architecture: Multi-instance, load balanced
  • CPU: 32+ cores per instance
  • RAM: 64GB+ per instance
  • Storage: 5TB+ SSD
  • Database: PostgreSQL cluster with sharding
  • Cache: Redis cluster (64GB+)
  • GPU: Recommended for embedding acceleration

Expected performance:

  • Ingestion: ~13,000 msg/s per instance (scale horizontally)
  • Pattern mining: Partitioned by time windows, parallel processing
  • Concurrent API requests: 10,000+ (with load balancer)

Cost estimate:

  • Cloud: $5,000-15,000/month (AWS/GCP)
  • On-premise: $2,000-8,000/month (infrastructure)

Optimization Strategies

Database Optimization

Connection pooling:

```yaml
database:
  pool_size: 20     # Increase for high concurrency
  max_overflow: 20
  pool_timeout: 30
```

Indexing:

  • Automatic indexes on frequently queried columns
  • Consider additional indexes for custom queries
  • Monitor index usage and remove unused indexes

Partitioning:

  • Partition messages table by timestamp (monthly/quarterly)
  • Improves query performance for large datasets
  • Easier data archival

Read replicas:

  • Use read replicas for evaluation queries
  • Reduces load on primary database
  • Improves concurrent evaluation performance
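One way to send evaluation queries to replicas while keeping writes on the primary is a small router. This `ReplicaRouter` class is a hypothetical sketch, not part of PATAS:

```python
import itertools
from typing import List


class ReplicaRouter:
    """Route writes to the primary and spread reads round-robin across replicas."""

    def __init__(self, primary: str, replicas: List[str]):
        self.primary = primary
        # Fall back to the primary when no replicas are configured.
        self._reads = itertools.cycle(replicas or [primary])

    def url_for(self, operation: str) -> str:
        return self.primary if operation == "write" else next(self._reads)


router = ReplicaRouter(
    "postgresql://primary/patas",
    ["postgresql://replica1/patas", "postgresql://replica2/patas"],
)
```

Evaluation and analytics queries then call `router.url_for("read")`, while ingestion keeps using the primary.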

Pattern Mining Optimization

Two-stage pipeline:

  • Always enable the two-stage pipeline (60-100% cost savings, depending on configuration)
  • Adjust suspiciousness_threshold based on spam distribution
    • Concentrated spam: 0.05-0.10
    • Distributed spam: 0.01-0.03 (default)
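As a config sketch (assuming the threshold lives under the same `pattern_mining` section as the chunk-size settings in this guide):

```yaml
pattern_mining:
  suspiciousness_threshold: 0.02  # Distributed spam (default range 0.01-0.03)
```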

Chunk size tuning:

```yaml
pattern_mining:
  stage1_chunk_size: 10000  # Increase for faster processing
  stage2_chunk_size: 1000   # Keep smaller for deep analysis
```

Semantic mining:

  • Use GPU-accelerated embeddings (5-10x faster)
  • Enable embedding cache (reduces redundant API calls)
  • Batch size optimization (2048 for OpenAI, 512-1024 for local)

LLM optimization:

  • Disable LLM if not needed (10-15% quality trade-off)
  • Use faster LLM models (gpt-4o-mini vs gpt-4)
  • Enable LLM response caching
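LLM response caching can be as simple as memoizing on the message text. This is a sketch with a stand-in classifier (the keyword rule stands in for a real model call such as gpt-4o-mini):

```python
from functools import lru_cache


@lru_cache(maxsize=10_000)
def classify_with_llm(message_text: str) -> str:
    # Stand-in for a real LLM call; identical inputs are served from the cache
    # instead of triggering a second (paid) request.
    return "spam" if "free money" in message_text.lower() else "ham"


classify_with_llm("Claim your FREE MONEY now")  # cache miss: one LLM call
classify_with_llm("Claim your FREE MONEY now")  # cache hit: no LLM call
```

For a distributed deployment the same idea applies, but the cache would live in Redis keyed on a hash of the prompt.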

API Optimization

Rate limiting:

```yaml
rate_limiting:
  enabled: true
  requests_per_minute: 1000
  burst_size: 100
```

Caching:

  • Enable embedding cache (reduces API calls)
  • Cache rule evaluation results (TTL: 1 hour)
  • Use Redis for distributed caching
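The 1-hour TTL for rule evaluation results can be sketched with a minimal in-process cache (Redis plays this role in a distributed setup; the `TTLCache` class here is illustrative, not a PATAS component):

```python
import time
from typing import Any, Dict, Tuple


class TTLCache:
    """Minimal in-process cache where entries expire after a fixed TTL."""

    def __init__(self, ttl_seconds: float = 3600):
        self.ttl = ttl_seconds
        self._store: Dict[str, Tuple[Any, float]] = {}

    def set(self, key: str, value: Any) -> None:
        self._store[key] = (value, time.monotonic() + self.ttl)

    def get(self, key: str, default: Any = None) -> Any:
        entry = self._store.get(key)
        if entry is None:
            return default
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._store[key]  # Expired: evict and miss.
            return default
        return value


cache = TTLCache(ttl_seconds=3600)  # Rule evaluation results, 1-hour TTL
cache.set("rule:42:eval", {"matches": 17})
```

With Redis the same pattern becomes `SET key value EX 3600`, which also makes the cache shared across instances.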

Load balancing:

  • Run multiple PATAS instances behind load balancer
  • Session affinity is not required (the API is stateless)
  • Health checks for automatic failover

Memory Optimization

Embedding cache:

```yaml
embedding_cache:
  max_size: 100000    # Adjust based on available memory
  ttl_seconds: 86400  # 24 hours
```

Memory usage per 1M messages:

  • Base system: ~500MB
  • Message storage: ~50MB
  • Embedding cache: ~1GB (if enabled)
  • Pattern mining: ~2GB peak (temporary)

Recommendations:

  • Monitor memory usage during pattern mining
  • Set cache size based on available memory
  • Archive old messages to reduce memory usage

Performance Monitoring

Key Metrics

System metrics:

  • patas_api_latency_seconds: API response latency
  • patas_api_requests_total: Total API requests
  • patas_api_errors_total: API errors
  • patas_patterns_discovered_total: Patterns discovered
  • patas_rules_active: Active rules count

Performance metrics:

  • patas_ingestion_throughput: Messages ingested per second
  • patas_mining_duration_seconds: Pattern mining duration
  • patas_evaluation_duration_seconds: Rule evaluation duration

Resource metrics:

  • patas_memory_usage_bytes: Memory usage
  • patas_cpu_usage_percent: CPU usage
  • patas_database_connections: Database connections

Monitoring Setup

Prometheus configuration:

```yaml
scrape_configs:
  - job_name: 'patas'
    static_configs:
      - targets: ['localhost:8000']
    metrics_path: '/metrics'
```

Grafana dashboards:

  • See Monitoring Guide for ready-made dashboards
  • Custom dashboards can be created from Prometheus metrics

Alerting:

  • API latency > 1s (P95)
  • Error rate > 1%
  • Memory usage > 80%
  • Database connections > 80% of pool
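The latency and error-rate thresholds above can be encoded as Prometheus alerting rules. This sketch uses the metric names listed earlier and assumes `patas_api_latency_seconds` is exported as a histogram (adjust the expressions to the actual metric types):

```yaml
groups:
  - name: patas-performance
    rules:
      - alert: HighApiLatencyP95
        expr: histogram_quantile(0.95, rate(patas_api_latency_seconds_bucket[5m])) > 1
        for: 5m
      - alert: HighErrorRate
        expr: rate(patas_api_errors_total[5m]) / rate(patas_api_requests_total[5m]) > 0.01
        for: 5m
```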

Troubleshooting Performance Issues

Slow Ingestion

Symptoms:

  • Ingestion throughput < 5,000 msg/s
  • High database connection usage
  • Database write latency > 100ms

Solutions:

  1. Increase database connection pool
  2. Use database write replicas
  3. Optimize database indexes
  4. Increase batch size
  5. Check network latency to database

Slow Pattern Mining

Symptoms:

  • Pattern mining takes > 30 minutes for 1M messages
  • High CPU usage during mining
  • Memory usage spikes

Solutions:

  1. Enable two-stage pipeline
  2. Increase chunk sizes
  3. Disable LLM if not needed
  4. Use GPU for embeddings
  5. Optimize database queries
  6. Archive old messages

High API Latency

Symptoms:

  • API latency > 500ms (P95)
  • High error rates
  • Timeout errors

Solutions:

  1. Increase server resources (CPU/RAM)
  2. Enable caching
  3. Use load balancer with multiple instances
  4. Optimize database queries
  5. Check external service latency (embeddings/LLM)

Memory Issues

Symptoms:

  • Out of memory errors
  • High memory usage (>80%)
  • System swapping

Solutions:

  1. Reduce embedding cache size
  2. Archive old messages
  3. Increase server RAM
  4. Optimize batch sizes
  5. Disable unnecessary features

Benchmarking Tools

Stress Testing

Using provided stress test script:

```bash
poetry run python scripts/stress_test_production.py \
  --dataset data/production_<your_platform>_logs.jsonl \
  --embedding-provider none \
  --llm-provider none
```

Load testing with Locust:

```bash
locust -f scripts/load_test.py --host=http://localhost:8000
```

Performance Profiling

Python profiling:

```bash
python -m cProfile -o profile.stats scripts/stress_test_production.py
```

Database query analysis:

  • Enable PostgreSQL slow query log
  • Use EXPLAIN ANALYZE for slow queries
  • Monitor database connection pool usage

Best Practices

  1. Always use two-stage pipeline for cost and performance benefits
  2. Enable embedding cache to reduce redundant API calls
  3. Use read replicas for evaluation queries
  4. Monitor key metrics (latency, throughput, error rate)
  5. Archive old messages to maintain performance
  6. Scale horizontally for very large deployments
  7. Use GPU for embedding acceleration if available
  8. Optimize database (indexing, partitioning, connection pooling)
  9. Set up alerting for performance degradation
  10. Benchmark regularly to track performance over time

Additional Resources

For performance questions or issues, please open an issue on GitHub.
