Performance Guide
This guide provides performance benchmarks, optimization strategies, and hardware recommendations for PATAS deployments.
Baseline benchmarks (PostgreSQL, 8 CPU cores, 16GB RAM):
| Message Volume | Throughput | Latency (P95) | Notes |
|---|---|---|---|
| 1K messages | ~15,000 msg/s | <5ms | Small batches, optimal |
| 10K messages | ~13,000 msg/s | <10ms | Typical production load |
| 100K messages | ~12,000 msg/s | <20ms | Large batches |
| 1M messages | ~10,000 msg/s | <50ms | Sustained load |
Factors affecting ingestion:
- Database connection pool size (default: 20)
- Batch size (default: 10,000 messages per batch)
- Network latency to database
- Database write performance
Optimization tips:
- Increase connection pool for high-volume ingestion
- Use database write replicas for read-heavy workloads
- Enable database connection pooling (PgBouncer for PostgreSQL)
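To make the batching concrete, here is a minimal client-side sketch that splits messages into the default batch size of 10,000 before sending them. `ingest_batch` is a hypothetical stand-in for the actual ingest call (e.g. a POST to the ingest endpoint), not part of the PATAS API:

```python
from typing import Callable, Iterator, List


def chunked(messages: List[dict], batch_size: int = 10_000) -> Iterator[List[dict]]:
    """Split messages into ingestion batches (PATAS default: 10,000 per batch)."""
    for start in range(0, len(messages), batch_size):
        yield messages[start:start + batch_size]


def ingest_all(messages: List[dict], ingest_batch: Callable[[List[dict]], None]) -> int:
    """Send all messages batch by batch; `ingest_batch` is a hypothetical client call."""
    sent = 0
    for batch in chunked(messages):
        ingest_batch(batch)  # e.g. POST /api/v1/messages/ingest
        sent += len(batch)
    return sent
```

Larger batches amortize per-request overhead, but keep each batch small enough that a single failed request does not force a large retry.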
Two-stage pipeline benchmarks (100K messages, 15% spam):
| Configuration | Stage 1 Time | Stage 2 Time | Total Time | Cost Savings |
|---|---|---|---|---|
| Two-stage (no LLM, no embeddings) | 5.2s | 0s | 5.2s | 100% |
| Two-stage (embeddings only) | 5.2s | 1.8s | 7.0s | 70% |
| Two-stage (embeddings + LLM) | 5.2s | 12.5s | 17.7s | 60% |
| Single-stage (embeddings + LLM) | N/A | 45.3s | 45.3s | 0% |
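The savings column follows from a simple model: Stage 1 scans every message cheaply, and the expensive Stage 2 only runs on the suspicious fraction. This illustrative sketch (rates and fractions are example values, not PATAS internals) shows the arithmetic:

```python
def two_stage_total(n_messages: int,
                    stage1_rate: float,
                    stage2_rate: float,
                    suspicious_frac: float) -> float:
    """Rough time model for the two-stage pipeline.

    Stage 1 processes all messages at `stage1_rate` msg/s; Stage 2
    processes only the flagged fraction at `stage2_rate` msg/s.
    All numbers are illustrative, not measured constants.
    """
    stage1_time = n_messages / stage1_rate
    stage2_time = (n_messages * suspicious_frac) / stage2_rate
    return stage1_time + stage2_time
```

With 15% of messages flagged, Stage 2 does ~85% less work than a single-stage run over the full corpus, which is where the cost-savings column comes from.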
Scaling benchmarks:
| Message Volume | Two-Stage Time | Single-Stage Time | Speedup |
|---|---|---|---|
| 100K messages | 7.0s | 45.3s | 6.5x |
| 1M messages | 1.2 min | 8.5 min | 7.1x |
| 10M messages | 12 min | 95 min | 7.9x |
Pattern discovery results:
- 100K messages: 38 patterns, 26 rules in 7 seconds
- 1M messages: ~350 patterns, ~240 rules in 1.2 minutes
- 10M messages: ~3,200 patterns, ~2,200 rules in 12 minutes
Shadow evaluation benchmarks:
| Rules Count | Messages Evaluated | Evaluation Time | Throughput |
|---|---|---|---|
| 10 rules | 100K messages | 0.8s | ~13 rules/sec |
| 100 rules | 100K messages | 7.2s | 14 rules/sec |
| 1,000 rules | 100K messages | 68s | 15 rules/sec |
Parallel evaluation:
- Rules can be evaluated in parallel (limited by database connections)
- Recommended: Evaluate in batches of 50-100 rules
- Use read replicas for evaluation queries
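A sketch of batched parallel evaluation, assuming a hypothetical `evaluate_rule` callable (the real evaluation runs against the database): rules are processed 50 at a time, with worker count bounded so concurrency never exceeds the connection pool.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def evaluate_rules_in_batches(rule_ids: List[str],
                              evaluate_rule: Callable[[str], dict],
                              batch_size: int = 50,
                              max_workers: int = 8) -> Dict[str, dict]:
    """Evaluate rules in parallel, batch by batch.

    `max_workers` bounds concurrency so it stays within the database
    connection pool; `evaluate_rule` is a stand-in for the real call.
    """
    results: Dict[str, dict] = {}
    for start in range(0, len(rule_ids), batch_size):
        batch = rule_ids[start:start + batch_size]
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            for rule_id, result in zip(batch, pool.map(evaluate_rule, batch)):
                results[rule_id] = result
    return results
```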
API latency benchmarks (8 CPU cores):
| Endpoint | Latency (P50) | Latency (P95) | Latency (P99) | Throughput |
|---|---|---|---|---|
| /api/v1/health | 2ms | 5ms | 10ms | 10,000 req/s |
| /api/v1/messages/ingest | 15ms | 45ms | 120ms | 2,000 req/s |
| /api/v1/patterns/mine | 7,000ms | 8,500ms | 12,000ms | 0.1 req/s |
| /api/v1/rules/eval-shadow | 800ms | 1,200ms | 2,000ms | 1 req/s |
| /api/v1/analyze (1000 msgs) | 500ms | 750ms | 1,200ms | 2 req/s |
Load testing results (1000 concurrent requests):
- Health endpoint: 100% success, <10ms latency
- Ingest endpoint: 99.8% success, <100ms P95 latency
- Analyze endpoint: 99.5% success, <1s P95 latency
Small deployment requirements:
- CPU: 4 cores
- RAM: 8GB
- Storage: 50GB SSD
- Database: PostgreSQL (shared or dedicated)
Expected performance:
- Ingestion: ~10,000 msg/s
- Pattern mining: 100K messages in ~10s
- Concurrent API requests: 100-200
Cost estimate:
- Cloud: $50-100/month (AWS/GCP)
- On-premise: Existing infrastructure
Medium deployment requirements:
- CPU: 8-16 cores
- RAM: 16-32GB
- Storage: 200GB SSD
- Database: PostgreSQL (dedicated, 4-8 cores, 16GB RAM)
- Cache: Redis (optional, 4GB)
Expected performance:
- Ingestion: ~12,000 msg/s
- Pattern mining: 1M messages in ~1.5 min
- Concurrent API requests: 500-1000
Cost estimate:
- Cloud: $200-500/month (AWS/GCP)
- On-premise: $100-300/month (infrastructure)
Large deployment requirements:
- CPU: 16-32 cores
- RAM: 32-64GB
- Storage: 1TB SSD
- Database: PostgreSQL cluster (primary + replicas)
- Cache: Redis cluster (16GB+)
- GPU: Optional, for embedding acceleration (24GB VRAM)
Expected performance:
- Ingestion: ~13,000 msg/s (multiple instances)
- Pattern mining: 10M messages in ~15 min
- Concurrent API requests: 2000-5000
Cost estimate:
- Cloud: $1,000-3,000/month (AWS/GCP)
- On-premise: $500-1,500/month (infrastructure)
Very large deployment requirements:
- Architecture: Multi-instance, load balanced
- CPU: 32+ cores per instance
- RAM: 64GB+ per instance
- Storage: 5TB+ SSD
- Database: PostgreSQL cluster with sharding
- Cache: Redis cluster (64GB+)
- GPU: Recommended for embedding acceleration
Expected performance:
- Ingestion: ~13,000 msg/s per instance (scale horizontally)
- Pattern mining: Partitioned by time windows, parallel processing
- Concurrent API requests: 10,000+ (with load balancer)
Cost estimate:
- Cloud: $5,000-15,000/month (AWS/GCP)
- On-premise: $2,000-8,000/month (infrastructure)
Connection pooling:
```yaml
database:
  pool_size: 20      # Increase for high concurrency
  max_overflow: 20
  pool_timeout: 30
```
Indexing:
- Automatic indexes on frequently queried columns
- Consider additional indexes for custom queries
- Monitor index usage and remove unused indexes
Partitioning:
- Partition the `messages` table by timestamp (monthly/quarterly)
- Improves query performance for large datasets
- Easier data archival
Read replicas:
- Use read replicas for evaluation queries
- Reduces load on primary database
- Improves concurrent evaluation performance
Two-stage pipeline:
- Always enable two-stage pipeline (70-90% cost savings)
- Adjust `suspiciousness_threshold` based on spam distribution:
  - Concentrated spam: 0.05-0.10
  - Distributed spam: 0.01-0.03 (default)
Chunk size tuning:
```yaml
pattern_mining:
  stage1_chunk_size: 10000   # Increase for faster processing
  stage2_chunk_size: 1000    # Keep smaller for deep analysis
```
Semantic mining:
- Use GPU-accelerated embeddings (5-10x faster)
- Enable embedding cache (reduces redundant API calls)
- Batch size optimization (2048 for OpenAI, 512-1024 for local)
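A minimal sketch of batched embedding with a cache in front, so only texts not seen before hit the provider. `embed_fn` is a hypothetical stand-in for the provider call, and 2048 matches the OpenAI batch hint above; this is not the internal PATAS implementation:

```python
from typing import Callable, Dict, List, Sequence


def embed_with_cache(texts: Sequence[str],
                     embed_fn: Callable[[List[str]], List[List[float]]],
                     cache: Dict[str, List[float]],
                     batch_size: int = 2048) -> List[List[float]]:
    """Embed texts, skipping anything already cached.

    Deduplicates the input, embeds only cache misses in provider-sized
    batches, then assembles results in the original order.
    """
    missing = [t for t in dict.fromkeys(texts) if t not in cache]
    for start in range(0, len(missing), batch_size):
        batch = missing[start:start + batch_size]
        for text, vector in zip(batch, embed_fn(batch)):
            cache[text] = vector
    return [cache[t] for t in texts]
```

On repeated mining runs over overlapping message windows, the cache hit rate is what drives the "reduces redundant API calls" savings.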
LLM optimization:
- Disable LLM if not needed (10-15% quality trade-off)
- Use faster LLM models (gpt-4o-mini vs gpt-4)
- Enable LLM response caching
Rate limiting:
```yaml
rate_limiting:
  enabled: true
  requests_per_minute: 1000
  burst_size: 100
```
Caching:
- Enable embedding cache (reduces API calls)
- Cache rule evaluation results (TTL: 1 hour)
- Use Redis for distributed caching
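The 1-hour TTL for rule evaluation results can be sketched as a minimal in-process cache (a distributed deployment would use Redis instead; this just shows the expiry logic):

```python
import time
from typing import Any, Dict, Optional, Tuple


class TTLCache:
    """Minimal TTL cache for rule evaluation results (default TTL: 1 hour)."""

    def __init__(self, ttl_seconds: float = 3600.0) -> None:
        self.ttl = ttl_seconds
        self._store: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Optional[Any]:
        entry = self._store.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if time.monotonic() - stored_at > self.ttl:
            del self._store[key]  # expired: evict and report a miss
            return None
        return value

    def set(self, key: str, value: Any) -> None:
        self._store[key] = (time.monotonic(), value)
```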
Load balancing:
- Run multiple PATAS instances behind load balancer
- Use session affinity (not required, stateless API)
- Health checks for automatic failover
Embedding cache:
```yaml
embedding_cache:
  max_size: 100000     # Adjust based on available memory
  ttl_seconds: 86400   # 24 hours
```
Memory usage per 1M messages:
- Base system: ~500MB
- Message storage: ~50MB
- Embedding cache: ~1GB (if enabled)
- Pattern mining: ~2GB peak (temporary)
Recommendations:
- Monitor memory usage during pattern mining
- Set cache size based on available memory
- Archive old messages to reduce memory usage
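A back-of-envelope estimator built from the per-1M-message figures above; these are rough guide values, not measurements of any particular workload:

```python
def estimate_memory_mb(million_messages: float,
                       embedding_cache: bool = True,
                       mining_active: bool = False) -> float:
    """Rough RAM estimate (MB) from the guide's per-1M-message figures."""
    total = 500.0                            # base system
    total += 50.0 * million_messages         # message storage overhead
    if embedding_cache:
        total += 1024.0 * million_messages   # ~1GB per 1M cached embeddings
    if mining_active:
        total += 2048.0                      # temporary peak during mining
    return total
```

For 1M messages with the cache enabled, expect roughly 1.6GB steady state, spiking past 3.5GB while mining runs.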
System metrics:
- `patas_api_latency_seconds`: API response latency
- `patas_api_requests_total`: Total API requests
- `patas_api_errors_total`: API errors
- `patas_patterns_discovered_total`: Patterns discovered
- `patas_rules_active`: Active rules count

Performance metrics:
- `patas_ingestion_throughput`: Messages ingested per second
- `patas_mining_duration_seconds`: Pattern mining duration
- `patas_evaluation_duration_seconds`: Rule evaluation duration

Resource metrics:
- `patas_memory_usage_bytes`: Memory usage
- `patas_cpu_usage_percent`: CPU usage
- `patas_database_connections`: Database connections
Prometheus configuration:
```yaml
scrape_configs:
  - job_name: 'patas'
    static_configs:
      - targets: ['localhost:8000']
    metrics_path: '/metrics'
```
Grafana dashboards:
- See Monitoring Guide for ready-made dashboards
- Custom dashboards can be created from Prometheus metrics
Alerting:
- API latency > 1s (P95)
- Error rate > 1%
- Memory usage > 80%
- Database connections > 80% of pool
Symptoms of slow ingestion:
- Ingestion throughput < 5,000 msg/s
- High database connection usage
- Database write latency > 100ms
Solutions:
- Increase database connection pool
- Use database write replicas
- Optimize database indexes
- Increase batch size
- Check network latency to database
Symptoms of slow pattern mining:
- Pattern mining takes > 30 minutes for 1M messages
- High CPU usage during mining
- Memory usage spikes
Solutions:
- Enable two-stage pipeline
- Increase chunk sizes
- Disable LLM if not needed
- Use GPU for embeddings
- Optimize database queries
- Archive old messages
Symptoms of high API latency:
- API latency > 500ms (P95)
- High error rates
- Timeout errors
Solutions:
- Increase server resources (CPU/RAM)
- Enable caching
- Use load balancer with multiple instances
- Optimize database queries
- Check external service latency (embeddings/LLM)
Symptoms of high memory usage:
- Out of memory errors
- High memory usage (>80%)
- System swapping
Solutions:
- Reduce embedding cache size
- Archive old messages
- Increase server RAM
- Optimize batch sizes
- Disable unnecessary features
Using the provided stress test script:
```shell
poetry run python scripts/stress_test_production.py \
  --dataset data/production_your_platform_logs.jsonl \
  --embedding-provider none \
  --llm-provider none
```
Load testing with Locust:
```shell
locust -f scripts/load_test.py --host=http://localhost:8000
```
Python profiling:
```shell
python -m cProfile -o profile.stats scripts/stress_test_production.py
```
Database query analysis:
- Enable the PostgreSQL slow query log
- Use `EXPLAIN ANALYZE` for slow queries
- Monitor database connection pool usage
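After a profiling run, the saved stats can be inspected programmatically with the standard library's `pstats`. This self-contained sketch profiles a stand-in function; in practice you would load `profile.stats` from the command above instead:

```python
import cProfile
import io
import pstats


def slow_function() -> int:
    # Stand-in for real PATAS work; profile your own entry point instead.
    return sum(i * i for i in range(100_000))


profiler = cProfile.Profile()
profiler.enable()
slow_function()
profiler.disable()

# Print the 10 most expensive functions by cumulative time.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream).sort_stats("cumulative")
stats.print_stats(10)
report = stream.getvalue()
print(report)
```

Sorting by cumulative time surfaces the call paths that dominate a mining or ingestion run, which is usually more useful than raw per-call time.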
Best practices:
- Always use the two-stage pipeline for cost and performance benefits
- Enable embedding cache to reduce redundant API calls
- Use read replicas for evaluation queries
- Monitor key metrics (latency, throughput, error rate)
- Archive old messages to maintain performance
- Scale horizontally for very large deployments
- Use GPU for embedding acceleration if available
- Optimize database (indexing, partitioning, connection pooling)
- Set up alerting for performance degradation
- Regular benchmarking to track performance over time
Related documentation:
- Scaling Guide - Horizontal scaling strategies
- Deployment Guide - Production deployment
- Monitoring Guide - Monitoring and alerting
- Configuration Guide - Configuration options
For performance questions or issues, please open an issue on GitHub.