Alerting

PATAS Alerting Guide

This guide describes the alerting rules configured for PATAS monitoring.

Alert Rules

High False Positive Rate

Alert: HighFalsePositiveRate
Severity: Warning
Condition: False positive rate > 10% over 5 minutes
Description: Indicates that rules are incorrectly flagging legitimate messages as spam.

Action:

Review recent rule evaluations
Check for rules with high ham_hits
Consider deprecating problematic rules
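
The guide does not define how the false positive rate is computed. One plausible reading is the standard FP / (FP + TN): the share of legitimate (ham) messages in the 5-minute window that some rule flagged as spam. The sketch below assumes that reading. The Verdict record and both function names are hypothetical and are not part of PATAS.

```python
from dataclasses import dataclass

WINDOW_SECONDS = 5 * 60   # "over 5 minutes"
FP_RATE_THRESHOLD = 0.10  # default from the Thresholds section


@dataclass
class Verdict:
    """Hypothetical labeled outcome for one message."""
    timestamp: float       # Unix seconds when the message was evaluated
    flagged_as_spam: bool  # did any rule match it?
    is_spam: bool          # ground-truth label, e.g. from moderator review


def false_positive_rate(verdicts, now):
    """FP / (FP + TN), computed over ham messages seen in the last WINDOW_SECONDS."""
    recent_ham = [
        v for v in verdicts
        if 0 <= now - v.timestamp <= WINDOW_SECONDS and not v.is_spam
    ]
    if not recent_ham:
        return 0.0  # no ham in the window, so nothing can be a false positive
    false_positives = sum(1 for v in recent_ham if v.flagged_as_spam)
    return false_positives / len(recent_ham)


def high_false_positive_rate(verdicts, now):
    """Mirror of the HighFalsePositiveRate condition: rate strictly above 10%."""
    return false_positive_rate(verdicts, now) > FP_RATE_THRESHOLD
```

Dividing by all flagged messages instead (the false discovery rate) is an equally plausible reading. Check which definition alerts.yml actually uses before comparing numbers.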

Rule Precision Degradation

Alert: RulePrecisionDegradation
Severity: Critical
Condition: Rule precision drops by >10% compared to the previous evaluation
Description: A rule's precision has significantly degraded, indicating it may need to be deprecated.

Action:

Review the specific rule's evaluation metrics
Check for changes in spam patterns
Consider deprecating the rule if degradation persists
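
The guide does not say whether "drops by >10%" means a relative drop or a drop of 10 percentage points, and the two readings diverge for low-precision rules. The sketch below supports both. It assumes precision is spam hits divided by all hits. ham_hits comes from this guide, while spam_hits and the function names are hypothetical.

```python
PRECISION_DROP_THRESHOLD = 0.10  # default "Precision Drop" from the Thresholds section


def precision(spam_hits, ham_hits):
    """Share of a rule's matches that were really spam (0.0 if the rule never matched)."""
    total = spam_hits + ham_hits
    return spam_hits / total if total else 0.0


def precision_degraded(previous, current, relative=True):
    """Mirror of RulePrecisionDegradation: compare this evaluation with the previous one.

    relative=True  -> (previous - current) / previous > 0.10
    relative=False -> previous - current > 0.10  (percentage points)
    """
    if relative:
        if previous == 0:
            return False  # nothing to degrade from
        drop = (previous - current) / previous
    else:
        drop = previous - current
    return drop > PRECISION_DROP_THRESHOLD
```

For example, a rule whose precision moves from 0.50 to 0.44 has dropped 12% relative to its previous value but only 6 points in absolute terms. It fires under the first reading and not under the second.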

Service Unavailable

Alert: PATASServiceDown
Severity: Critical
Condition: PATAS API endpoint is not responding
Description: The PATAS service is down or unreachable.

Action:

Check service health and logs
Verify database connectivity
Check for resource constraints (CPU, memory, disk)
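
The guide does not say how PATASServiceDown detects that the API "is not responding." Below is a minimal external probe. It assumes a hypothetical health URL and treats any HTTP answer below 500 as "responding." The debounce helper, which waits for several consecutive failures, is also an assumption. It is there so that a single dropped request does not page anyone.

```python
import urllib.error
import urllib.request


def endpoint_is_up(url, timeout=2.0):
    """True if `url` answers with a status below 500 within `timeout` seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status < 500
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx/5xx; a 4xx still means the server answered.
        return err.code < 500
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure or timeout: treat as not responding.
        return False


def service_down(probe_results, required_failures=3):
    """Fire only if the last `required_failures` probes all failed."""
    recent = probe_results[-required_failures:]
    return len(recent) == required_failures and not any(recent)
```

To use it, call endpoint_is_up("http://patas.example:8000/health") on a schedule, append each result to a list, and pass that list to service_down. The host, port and path here are placeholders.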

High API Latency

Alert: HighAPILatency
Severity: Warning
Condition: 95th percentile API latency > 2 seconds over 5 minutes
Description: API responses are slower than expected.

Action:

Check database query performance
Review LLM/embedding API response times
Check for resource constraints
Consider scaling horizontally
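
The sketch below makes the "95th percentile over 5 minutes" condition concrete by computing a nearest-rank percentile over raw latency samples. This is an illustrative assumption. If the real rule is computed from histogram buckets, its quantile is an estimate and may differ slightly. The sample format and function names are hypothetical.

```python
LATENCY_THRESHOLD_SECONDS = 2.0  # default "API Latency" threshold
WINDOW_SECONDS = 5 * 60          # "over 5 minutes"


def p95(values):
    """95th percentile by the nearest-rank method."""
    if not values:
        raise ValueError("need at least one sample")
    ordered = sorted(values)
    # ceil(0.95 * n) done in integer arithmetic to avoid float rounding.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def high_api_latency(samples, now):
    """samples: (timestamp, latency_seconds) pairs. Fires if the window's p95 exceeds 2 s."""
    recent = [lat for ts, lat in samples if 0 <= now - ts <= WINDOW_SECONDS]
    return bool(recent) and p95(recent) > LATENCY_THRESHOLD_SECONDS
```

With 100 samples, the nearest rank is the 95th smallest value. Five slow requests out of 100 therefore do not fire the alert, but six do.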

Pattern Mining Failure

Alert: PatternMiningFailure
Severity: Warning
Condition: Pattern mining errors detected in last 5 minutes
Description: Pattern mining operations are encountering errors.

Action:

Check pattern mining logs
Verify database connectivity
Check LLM/embedding service availability
Review checkpoint status
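
The "errors detected in last 5 minutes" condition amounts to counting error events inside a sliding window. The sketch below assumes the pattern mining job records a Unix timestamp for each error. The list and function names are hypothetical.

```python
WINDOW_SECONDS = 5 * 60  # "in last 5 minutes"


def recent_errors(error_timestamps, now, window_seconds=WINDOW_SECONDS):
    """Error timestamps that fall inside [now - window_seconds, now]."""
    return [ts for ts in error_timestamps if 0 <= now - ts <= window_seconds]


def pattern_mining_failure(error_timestamps, now):
    """Mirror of PatternMiningFailure: any error in the window fires the alert."""
    return len(recent_errors(error_timestamps, now)) > 0
```

recent_errors returns the matching timestamps rather than only a boolean. This makes it easy to line them up against the pattern mining logs and checkpoint status listed above.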

Low Rule Coverage

Alert: LowRuleCoverage
Severity: Info
Condition: Average rule coverage < 1% for 30 minutes
Description: Rules are not matching a significant portion of spam traffic.

Action:

Review pattern mining results
Check if new spam patterns have emerged
Consider running pattern mining more frequently
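
LowRuleCoverage differs from the other alerts: its condition must hold continuously for 30 minutes rather than be measured over a window. The sketch below shows that "sustained" behavior. It assumes per-rule coverage is the fraction of spam messages a rule matches and that the average is taken across rules. The history format and function names are hypothetical.

```python
COVERAGE_THRESHOLD = 0.01  # default "Rule Coverage" threshold (1%)
SUSTAIN_SECONDS = 30 * 60  # "for 30 minutes"


def average_coverage(per_rule_coverage):
    """Mean of per-rule coverage fractions (0.0 when there are no rules)."""
    if not per_rule_coverage:
        return 0.0
    return sum(per_rule_coverage) / len(per_rule_coverage)


def low_rule_coverage(history, now):
    """history: time-ordered (timestamp, average_coverage) samples.

    Fires once the average has stayed below the threshold continuously for
    SUSTAIN_SECONDS; any sample at or above the threshold resets the clock.
    """
    breach_started = None
    for ts, coverage in history:
        if coverage < COVERAGE_THRESHOLD:
            if breach_started is None:
                breach_started = ts  # condition first became true here
        else:
            breach_started = None
    return breach_started is not None and now - breach_started >= SUSTAIN_SECONDS
```

A single healthy sample partway through restarts the 30-minute clock. That reset is what distinguishes "for 30 minutes" from "averaged over 30 minutes."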

Configuration

Alert rules are defined in alerts.yml and can be customized based on your requirements.

Thresholds

Adjust thresholds in alerts.yml based on your use case:

False Positive Rate: Default 10% (0.1) - adjust based on acceptable FP rate
Precision Drop: Default 10% (0.1) - adjust based on acceptable degradation
API Latency: Default 2 seconds - adjust based on SLA requirements
Rule Coverage: Default 1% (0.01) - adjust based on expected coverage
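
The percentages above map to fractions in alerts.yml (10% is 0.1). If you prototype threshold changes offline before editing that file, a small typed structure keeps the four defaults in one place and rejects out-of-range values. The class and field names below are hypothetical and are not read by PATAS.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class AlertThresholds:
    """Defaults from the Thresholds section; ratios are fractions, not percents."""
    false_positive_rate: float = 0.10     # HighFalsePositiveRate
    precision_drop: float = 0.10          # RulePrecisionDegradation
    api_latency_p95_seconds: float = 2.0  # HighAPILatency
    rule_coverage: float = 0.01           # LowRuleCoverage

    def __post_init__(self):
        for name in ("false_positive_rate", "precision_drop", "rule_coverage"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be a fraction in [0, 1], got {value}")
        if self.api_latency_p95_seconds <= 0:
            raise ValueError("api_latency_p95_seconds must be positive")


DEFAULTS = AlertThresholds()
# A stricter profile, e.g. for a deployment with a tight false-positive budget.
STRICT = replace(DEFAULTS, false_positive_rate=0.05)
```

Passing 10 instead of 0.1 (a percent instead of a fraction) raises ValueError, which catches the most common threshold typo.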

AlertManager Integration

Configure AlertManager to route alerts to your notification channels:

Define your notification receivers in the AlertManager configuration (typically alertmanager.yml), not in alerts.yml, which holds the alert rules
Configure email, Slack, PagerDuty, or other integrations
Test alert routing
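
AlertManager routes alerts by matching their labels. The sketch below roughly illustrates severity-based routing and one way to test alert routing before touching real channels: it maps each alert in this guide to a receiver. The receiver names and the routing table are hypothetical. Real routes are written in the AlertManager configuration, not in Python.

```python
# Hypothetical receivers; substitute the channels configured in AlertManager.
ROUTES = {
    "critical": "pagerduty-oncall",
    "warning": "slack-alerts",
    "info": "email-digest",
}
DEFAULT_RECEIVER = "email-digest"

# Alert names and severities as listed in this guide.
GUIDE_ALERTS = {
    "HighFalsePositiveRate": "Warning",
    "RulePrecisionDegradation": "Critical",
    "PATASServiceDown": "Critical",
    "HighAPILatency": "Warning",
    "PatternMiningFailure": "Warning",
    "LowRuleCoverage": "Info",
}


def route(labels):
    """Pick a receiver from the alert's severity label (case-insensitive)."""
    severity = labels.get("severity", "").lower()
    return ROUTES.get(severity, DEFAULT_RECEIVER)


def routing_table():
    """Alert name -> receiver, for eyeballing before enabling real notifications."""
    return {
        name: route({"alertname": name, "severity": severity})
        for name, severity in GUIDE_ALERTS.items()
    }
```

Printing routing_table() gives a one-line-per-alert summary that can be compared against the intended on-call setup.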

Best Practices

Start Conservative: Begin with thresholds that fire rarely, then tighten them based on how many alerts turn out to be false alarms
Monitor Trends: Watch for gradual degradation, not just threshold breaches
Document Actions: Keep a runbook for common alert scenarios
Regular Review: Periodically review and adjust alert thresholds based on historical data
