Performance Guide
This guide provides performance benchmarks, optimization strategies, and hardware recommendations for PATAS deployments.
Baseline benchmarks (PostgreSQL, 8 CPU cores, 16GB RAM):
| Message Volume | Throughput | Latency (P95) | Notes |
|---|---|---|---|
| 1K messages | ~15,000 msg/s | <5ms | Small batches, optimal |
| 10K messages | ~13,000 msg/s | <10ms | Typical production load |
| 100K messages | ~12,000 msg/s | <20ms | Large batches |
| 1M messages | ~10,000 msg/s | <50ms | Sustained load |
Factors affecting ingestion:
- Database connection pool size (default: 20)
- Batch size (default: 10,000 messages per batch)
- Network latency to database
- Database write performance
Optimization tips:
- Increase connection pool for high-volume ingestion
- Use database write replicas for read-heavy workloads
- Enable database connection pooling (PgBouncer for PostgreSQL)
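To make the batching concrete, here is a minimal client-side sketch that splits messages into the default batch size of 10,000 before sending them. `ingest_batch` is a hypothetical stand-in for the actual ingest call (e.g. a POST to the ingest endpoint), not part of the PATAS API:

```python
from typing import Callable, Iterator, List


def chunked(messages: List[dict], batch_size: int = 10_000) -> Iterator[List[dict]]:
    """Split messages into ingestion batches (PATAS default: 10,000 per batch)."""
    for start in range(0, len(messages), batch_size):
        yield messages[start:start + batch_size]


def ingest_all(messages: List[dict], ingest_batch: Callable[[List[dict]], None]) -> int:
    """Send all messages batch by batch; `ingest_batch` is a hypothetical client call."""
    sent = 0
    for batch in chunked(messages):
        ingest_batch(batch)  # e.g. POST /api/v1/messages/ingest
        sent += len(batch)
    return sent
```

Larger batches amortize per-request overhead, but keep each batch small enough that a single failed request does not force a large retry.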
Two-stage pipeline benchmarks (100K messages, 15% spam):
| Configuration | Stage 1 Time | Stage 2 Time | Total Time | Cost Savings |
|---|---|---|---|---|
| Two-stage (no LLM, no embeddings) | 5.2s | 0s | 5.2s | 100% |
| Two-stage (embeddings only) | 5.2s | 1.8s | 7.0s | 70% |
| Two-stage (embeddings + LLM) | 5.2s | 12.5s | 17.7s | 60% |
| Single-stage (embeddings + LLM) | N/A | 45.3s | 45.3s | 0% |
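The savings column follows from a simple model: Stage 1 scans every message cheaply, and the expensive Stage 2 only runs on the suspicious fraction. This illustrative sketch (rates and fractions are example values, not PATAS internals) shows the arithmetic:

```python
def two_stage_total(n_messages: int,
                    stage1_rate: float,
                    stage2_rate: float,
                    suspicious_frac: float) -> float:
    """Rough time model for the two-stage pipeline.

    Stage 1 processes all messages at `stage1_rate` msg/s; Stage 2
    processes only the flagged fraction at `stage2_rate` msg/s.
    All numbers are illustrative, not measured constants.
    """
    stage1_time = n_messages / stage1_rate
    stage2_time = (n_messages * suspicious_frac) / stage2_rate
    return stage1_time + stage2_time
```

With 15% of messages flagged, Stage 2 does ~85% less work than a single-stage run over the full corpus, which is where the cost-savings column comes from.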
Scaling benchmarks:
| Message Volume | Two-Stage Time | Single-Stage Time | Speedup |
|---|---|---|---|
| 100K messages | 7.0s | 45.3s | 6.5x |
| 1M messages | 1.2 min | 8.5 min | 7.1x |
| 10M messages | 12 min | 95 min | 7.9x |
Pattern discovery results:
- 100K messages: 38 patterns, 26 rules in 7 seconds
- 1M messages: ~350 patterns, ~240 rules in 1.2 minutes
- 10M messages: ~3,200 patterns, ~2,200 rules in 12 minutes
Shadow evaluation benchmarks:
| Rules Count | Messages Evaluated | Evaluation Time | Throughput |
|---|---|---|---|
| 10 rules | 100K messages | 0.8s | ~13 rules/sec |
| 100 rules | 100K messages | 7.2s | 14 rules/sec |
| 1,000 rules | 100K messages | 68s | 15 rules/sec |
Parallel evaluation:
- Rules can be evaluated in parallel (limited by database connections)
- Recommended: Evaluate in batches of 50-100 rules
- Use read replicas for evaluation queries
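A sketch of batched parallel evaluation, assuming a hypothetical `evaluate_rule` callable (the real evaluation runs against the database): rules are processed 50 at a time, with worker count bounded so concurrency never exceeds the connection pool.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def evaluate_rules_in_batches(rule_ids: List[str],
                              evaluate_rule: Callable[[str], dict],
                              batch_size: int = 50,
                              max_workers: int = 8) -> Dict[str, dict]:
    """Evaluate rules in parallel, batch by batch.

    `max_workers` bounds concurrency so it stays within the database
    connection pool; `evaluate_rule` is a stand-in for the real call.
    """
    results: Dict[str, dict] = {}
    for start in range(0, len(rule_ids), batch_size):
        batch = rule_ids[start:start + batch_size]
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            for rule_id, result in zip(batch, pool.map(evaluate_rule, batch)):
                results[rule_id] = result
    return results
```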
API latency benchmarks (8 CPU cores):
| Endpoint | Latency (P50) | Latency (P95) | Latency (P99) | Throughput |
|---|---|---|---|---|
| /api/v1/health | 2ms | 5ms | 10ms | 10,000 req/s |
| /api/v1/messages/ingest | 15ms | 45ms | 120ms | 2,000 req/s |
| /api/v1/patterns/mine | 7,000ms | 8,500ms | 12,000ms | 0.1 req/s |
| /api/v1/rules/eval-shadow | 800ms | 1,200ms | 2,000ms | 1 req/s |
| /api/v1/analyze (1000 msgs) | 500ms | 750ms | 1,200ms | 2 req/s |
Load testing results (1000 concurrent requests):
- Health endpoint: 100% success, <10ms latency
- Ingest endpoint: 99.8% success, <100ms P95 latency
- Analyze endpoint: 99.5% success, <1s P95 latency
Small deployment requirements:
- CPU: 4 cores
- RAM: 8GB
- Storage: 50GB SSD
- Database: PostgreSQL (shared or dedicated)
Expected performance:
- Ingestion: ~10,000 msg/s
- Pattern mining: 100K messages in ~10s
- Concurrent API requests: 100-200
Cost estimate:
- Cloud: $50-100/month (AWS/GCP)
- On-premise: Existing infrastructure
Medium deployment requirements:
- CPU: 8-16 cores
- RAM: 16-32GB
- Storage: 200GB SSD
- Database: PostgreSQL (dedicated, 4-8 cores, 16GB RAM)
- Cache: Redis (optional, 4GB)
Expected performance:
- Ingestion: ~12,000 msg/s
- Pattern mining: 1M messages in ~1.5 min
- Concurrent API requests: 500-1000
Cost estimate:
- Cloud: $200-500/month (AWS/GCP)
- On-premise: $100-300/month (infrastructure)
Large deployment requirements:
- CPU: 16-32 cores
- RAM: 32-64GB
- Storage: 1TB SSD
- Database: PostgreSQL cluster (primary + replicas)
- Cache: Redis cluster (16GB+)
- GPU: Optional, for embedding acceleration (24GB VRAM)
Expected performance:
- Ingestion: ~13,000 msg/s (multiple instances)
- Pattern mining: 10M messages in ~15 min
- Concurrent API requests: 2000-5000
Cost estimate:
- Cloud: $1,000-3,000/month (AWS/GCP)
- On-premise: $500-1,500/month (infrastructure)
Very large deployment requirements:
- Architecture: Multi-instance, load balanced
- CPU: 32+ cores per instance
- RAM: 64GB+ per instance
- Storage: 5TB+ SSD
- Database: PostgreSQL cluster with sharding
- Cache: Redis cluster (64GB+)
- GPU: Recommended for embedding acceleration
Expected performance:
- Ingestion: ~13,000 msg/s per instance (scale horizontally)
- Pattern mining: Partitioned by time windows, parallel processing
- Concurrent API requests: 10,000+ (with load balancer)
Cost estimate:
- Cloud: $5,000-15,000/month (AWS/GCP)
- On-premise: $2,000-8,000/month (infrastructure)
Connection pooling:
```yaml
database:
  pool_size: 20      # Increase for high concurrency
  max_overflow: 20
  pool_timeout: 30
```
Indexing:
- Automatic indexes on frequently queried columns
- Consider additional indexes for custom queries
- Monitor index usage and remove unused indexes
Partitioning:
- Partition the `messages` table by timestamp (monthly/quarterly)
- Improves query performance for large datasets
- Easier data archival
Read replicas:
- Use read replicas for evaluation queries
- Reduces load on primary database
- Improves concurrent evaluation performance
Two-stage pipeline:
- Always enable two-stage pipeline (70-90% cost savings)
- Adjust `suspiciousness_threshold` based on spam distribution:
  - Concentrated spam: 0.05-0.10
  - Distributed spam: 0.01-0.03 (default)
Chunk size tuning:
```yaml
pattern_mining:
  stage1_chunk_size: 10000   # Increase for faster processing
  stage2_chunk_size: 1000    # Keep smaller for deep analysis
```
Semantic mining:
- Use GPU-accelerated embeddings (5-10x faster)
- Enable embedding cache (reduces redundant API calls)
- Batch size optimization (2048 for OpenAI, 512-1024 for local)
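A minimal sketch of batched embedding with a cache in front, so only texts not seen before hit the provider. `embed_fn` is a hypothetical stand-in for the provider call, and 2048 matches the OpenAI batch hint above; this is not the internal PATAS implementation:

```python
from typing import Callable, Dict, List, Sequence


def embed_with_cache(texts: Sequence[str],
                     embed_fn: Callable[[List[str]], List[List[float]]],
                     cache: Dict[str, List[float]],
                     batch_size: int = 2048) -> List[List[float]]:
    """Embed texts, skipping anything already cached.

    Deduplicates the input, embeds only cache misses in provider-sized
    batches, then assembles results in the original order.
    """
    missing = [t for t in dict.fromkeys(texts) if t not in cache]
    for start in range(0, len(missing), batch_size):
        batch = missing[start:start + batch_size]
        for text, vector in zip(batch, embed_fn(batch)):
            cache[text] = vector
    return [cache[t] for t in texts]
```

On repeated mining runs over overlapping message windows, the cache hit rate is what drives the "reduces redundant API calls" savings.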
LLM optimization:
- Disable LLM if not needed (10-15% quality trade-off)
- Use faster LLM models (gpt-4o-mini vs gpt-4)
- Enable LLM response caching
Rate limiting:
```yaml
rate_limiting:
  enabled: true
  requests_per_minute: 1000
  burst_size: 100
```
Caching:
- Enable embedding cache (reduces API calls)
- Cache rule evaluation results (TTL: 1 hour)
- Use Redis for distributed caching
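The 1-hour TTL for rule evaluation results can be sketched as a minimal in-process cache (a distributed deployment would use Redis instead; this just shows the expiry logic):

```python
import time
from typing import Any, Dict, Optional, Tuple


class TTLCache:
    """Minimal TTL cache for rule evaluation results (default TTL: 1 hour)."""

    def __init__(self, ttl_seconds: float = 3600.0) -> None:
        self.ttl = ttl_seconds
        self._store: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Optional[Any]:
        entry = self._store.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if time.monotonic() - stored_at > self.ttl:
            del self._store[key]  # expired: evict and report a miss
            return None
        return value

    def set(self, key: str, value: Any) -> None:
        self._store[key] = (time.monotonic(), value)
```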
Load balancing:
- Run multiple PATAS instances behind load balancer
- Use session affinity (not required, stateless API)
- Health checks for automatic failover
Embedding cache:
```yaml
embedding_cache:
  max_size: 100000     # Adjust based on available memory
  ttl_seconds: 86400   # 24 hours
```
Memory usage per 1M messages:
- Base system: ~500MB
- Message storage: ~50MB
- Embedding cache: ~1GB (if enabled)
- Pattern mining: ~2GB peak (temporary)
Recommendations:
- Monitor memory usage during pattern mining
- Set cache size based on available memory
- Archive old messages to reduce memory usage
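A back-of-envelope estimator built from the per-1M-message figures above; these are rough guide values, not measurements of any particular workload:

```python
def estimate_memory_mb(million_messages: float,
                       embedding_cache: bool = True,
                       mining_active: bool = False) -> float:
    """Rough RAM estimate (MB) from the guide's per-1M-message figures."""
    total = 500.0                            # base system
    total += 50.0 * million_messages         # message storage overhead
    if embedding_cache:
        total += 1024.0 * million_messages   # ~1GB per 1M cached embeddings
    if mining_active:
        total += 2048.0                      # temporary peak during mining
    return total
```

For 1M messages with the cache enabled, expect roughly 1.6GB steady state, spiking past 3.5GB while mining runs.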
System metrics:
- `patas_api_latency_seconds`: API response latency
- `patas_api_requests_total`: Total API requests
- `patas_api_errors_total`: API errors
- `patas_patterns_discovered_total`: Patterns discovered
- `patas_rules_active`: Active rules count

Performance metrics:
- `patas_ingestion_throughput`: Messages ingested per second
- `patas_mining_duration_seconds`: Pattern mining duration
- `patas_evaluation_duration_seconds`: Rule evaluation duration

Resource metrics:
- `patas_memory_usage_bytes`: Memory usage
- `patas_cpu_usage_percent`: CPU usage
- `patas_database_connections`: Database connections
Prometheus configuration:
```yaml
scrape_configs:
  - job_name: 'patas'
    static_configs:
      - targets: ['localhost:8000']
    metrics_path: '/metrics'
```
Grafana dashboards:
- See Monitoring Guide for ready-made dashboards
- Custom dashboards can be created from Prometheus metrics
Alerting:
- API latency > 1s (P95)
- Error rate > 1%
- Memory usage > 80%
- Database connections > 80% of pool
Symptoms of slow ingestion:
- Ingestion throughput < 5,000 msg/s
- High database connection usage
- Database write latency > 100ms
Solutions:
- Increase database connection pool
- Use database write replicas
- Optimize database indexes
- Increase batch size
- Check network latency to database
Symptoms of slow pattern mining:
- Pattern mining takes > 30 minutes for 1M messages
- High CPU usage during mining
- Memory usage spikes
Solutions:
- Enable two-stage pipeline
- Increase chunk sizes
- Disable LLM if not needed
- Use GPU for embeddings
- Optimize database queries
- Archive old messages
Symptoms of high API latency:
- API latency > 500ms (P95)
- High error rates
- Timeout errors
Solutions:
- Increase server resources (CPU/RAM)
- Enable caching
- Use load balancer with multiple instances
- Optimize database queries
- Check external service latency (embeddings/LLM)
Symptoms of high memory usage:
- Out of memory errors
- High memory usage (>80%)
- System swapping
Solutions:
- Reduce embedding cache size
- Archive old messages
- Increase server RAM
- Optimize batch sizes
- Disable unnecessary features
Using the provided stress test script:
```shell
poetry run python scripts/stress_test_production.py \
  --dataset data/production_your_platform_logs.jsonl \
  --embedding-provider none \
  --llm-provider none
```
Load testing with Locust:
```shell
locust -f scripts/load_test.py --host=http://localhost:8000
```
Python profiling:
```shell
python -m cProfile -o profile.stats scripts/stress_test_production.py
```
Database query analysis:
- Enable the PostgreSQL slow query log
- Use `EXPLAIN ANALYZE` for slow queries
- Monitor database connection pool usage
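After a profiling run, the saved stats can be inspected programmatically with the standard library's `pstats`. This self-contained sketch profiles a stand-in function; in practice you would load `profile.stats` from the command above instead:

```python
import cProfile
import io
import pstats


def slow_function() -> int:
    # Stand-in for real PATAS work; profile your own entry point instead.
    return sum(i * i for i in range(100_000))


profiler = cProfile.Profile()
profiler.enable()
slow_function()
profiler.disable()

# Print the 10 most expensive functions by cumulative time.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream).sort_stats("cumulative")
stats.print_stats(10)
report = stream.getvalue()
print(report)
```

Sorting by cumulative time surfaces the call paths that dominate a mining or ingestion run, which is usually more useful than raw per-call time.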
Best practices:
- Always use the two-stage pipeline for cost and performance benefits
- Enable embedding cache to reduce redundant API calls
- Use read replicas for evaluation queries
- Monitor key metrics (latency, throughput, error rate)
- Archive old messages to maintain performance
- Scale horizontally for very large deployments
- Use GPU for embedding acceleration if available
- Optimize database (indexing, partitioning, connection pooling)
- Set up alerting for performance degradation
- Regular benchmarking to track performance over time
Related documentation:
- Scaling Guide - Horizontal scaling strategies
- Deployment Guide - Production deployment
- Monitoring Guide - Monitoring and alerting
- Configuration Guide - Configuration options
For performance questions or issues, please open an issue on GitHub.