Alerting
This guide describes the alerting rules configured for PATAS monitoring.
Alert: HighFalsePositiveRate
Severity: Warning
Condition: False positive rate > 10% over 5 minutes
Description: Indicates that rules are incorrectly flagging legitimate messages as spam.
Action:
- Review recent rule evaluations
- Check for rules with high ham_hits
- Consider deprecating problematic rules
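As a starting point for the review, rules can be ranked by how much of their matched traffic was legitimate. The sketch below is illustrative only. The `RuleStats` record and its `spam_hits` field are assumptions (only `ham_hits` is named in this guide), and the 0.1 cut-off simply mirrors the alert threshold rather than any setting PATAS exposes.

```python
from dataclasses import dataclass


@dataclass
class RuleStats:
    """Hypothetical per-rule counters; only ham_hits is named in this guide."""
    rule_id: str
    spam_hits: int  # matches on messages labelled spam (assumed field)
    ham_hits: int   # matches on legitimate messages, i.e. false positives


def ham_share(stats: RuleStats) -> float:
    """Fraction of a rule's matches that landed on legitimate messages."""
    total = stats.spam_hits + stats.ham_hits
    return stats.ham_hits / total if total else 0.0


def rules_to_review(rules: list[RuleStats], max_ham_share: float = 0.1) -> list[RuleStats]:
    """Return rules whose ham share exceeds the cut-off, worst offenders first."""
    flagged = [r for r in rules if ham_share(r) > max_ham_share]
    return sorted(flagged, key=ham_share, reverse=True)


# Example with made-up counters.
sample = [
    RuleStats("rule-a", spam_hits=90, ham_hits=10),
    RuleStats("rule-b", spam_hits=50, ham_hits=50),
    RuleStats("rule-c", spam_hits=80, ham_hits=20),
]
for rule in rules_to_review(sample):
    print(rule.rule_id, round(ham_share(rule), 2))
```

Rules at the top of this list are the most likely candidates for deprecation.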
Alert: RulePrecisionDegradation
Severity: Critical
Condition: Rule precision drops by > 10% compared to the previous evaluation
Description: A rule's precision has significantly degraded, indicating it may need to be deprecated.
Action:
- Review the specific rule's evaluation metrics
- Check for changes in spam patterns
- Consider deprecating the rule if degradation persists
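The condition can be read either as an absolute drop (percentage points) or as a drop relative to the previous precision, and this guide does not say which one alerts.yml uses. The sketch below computes both so the difference is visible. The function names are hypothetical and the default of 0.1 is taken from the threshold listed in this guide.

```python
def precision_drops(previous: float, current: float) -> tuple[float, float]:
    """Return (absolute_drop, relative_drop) between two precision values."""
    absolute = previous - current
    # A relative drop is undefined when the previous precision was zero.
    relative = absolute / previous if previous > 0 else 0.0
    return absolute, relative


def is_degraded(previous: float, current: float,
                threshold: float = 0.1, relative: bool = False) -> bool:
    """True when the chosen kind of drop exceeds the threshold."""
    absolute_drop, relative_drop = precision_drops(previous, current)
    drop = relative_drop if relative else absolute_drop
    return drop > threshold


# Precision fell from 0.50 to 0.44: six points absolute, twelve percent relative.
print(is_degraded(0.50, 0.44))                 # absolute reading
print(is_degraded(0.50, 0.44, relative=True))  # relative reading
```

The two readings disagree most for rules that already had low precision, so it is worth confirming which one the deployed rule implements before tuning the threshold.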
Alert: PATASServiceDown
Severity: Critical
Condition: PATAS API endpoint is not responding
Description: The PATAS service is down or unreachable.
Action:
- Check service health and logs
- Verify database connectivity
- Check for resource constraints (CPU, memory, disk)
Alert: HighAPILatency
Severity: Warning
Condition: 95th percentile API latency > 2 seconds over 5 minutes
Description: API responses are slower than expected.
Action:
- Check database query performance
- Review LLM/embedding API response times
- Check for resource constraints
- Consider scaling horizontally
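To make the condition concrete, the sketch below computes a 95th percentile from raw latency samples using the nearest-rank method. This is an assumption about the calculation: the monitoring backend may estimate percentiles differently (for example from histogram buckets), and the function names here are hypothetical.

```python
import math


def percentile(samples: list[float], pct: int) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of values at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))  # 1-based rank
    return ordered[rank - 1]


def latency_alert_fires(samples_seconds: list[float], threshold_seconds: float = 2.0) -> bool:
    """True when the p95 of the window's latencies exceeds the threshold."""
    if not samples_seconds:
        return False
    return percentile(samples_seconds, 95) > threshold_seconds


# 100 requests in the window: 94 fast, 6 slow.
window = [0.1] * 94 + [3.0] * 6
print(percentile(window, 95), latency_alert_fires(window))
```

With only 5 slow requests out of 100 the 95th percentile stays at the fast value, so a small number of outliers does not trigger the alert on its own.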
Alert: PatternMiningFailure
Severity: Warning
Condition: Pattern mining errors detected in the last 5 minutes
Description: Pattern mining operations are encountering errors.
Action:
- Check pattern mining logs
- Verify database connectivity
- Check LLM/embedding service availability
- Review checkpoint status
Alert: LowRuleCoverage
Severity: Info
Condition: Average rule coverage < 1% for 30 minutes
Description: Rules are not matching a significant portion of spam traffic.
Action:
- Review pattern mining results
- Check if new spam patterns have emerged
- Consider running pattern mining more frequently
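Unlike a single threshold breach, this condition must hold for the whole 30-minute window before the alert fires. The sketch below shows one way to express that, assuming coverage is sampled as `(timestamp_seconds, value)` pairs. The sample format and function name are hypothetical, and the actual evaluation is done by the alerting backend.

```python
def sustained_below(samples: list[tuple[float, float]], threshold: float,
                    window_seconds: float, now: float) -> bool:
    """True when every sample in the trailing window is below the threshold
    and the recorded history reaches back at least to the window start."""
    window_start = now - window_seconds
    if not samples or min(t for t, _ in samples) > window_start:
        return False  # not enough history to cover the full window
    recent = [value for t, value in samples if t >= window_start]
    return bool(recent) and all(value < threshold for value in recent)


# Coverage sampled every 5 minutes for 30 minutes, always at 0.5%.
history = [(t, 0.005) for t in range(0, 1801, 300)]
print(sustained_below(history, threshold=0.01, window_seconds=1800, now=1800))
```

A single sample at or above 1% inside the window resets the condition, which is why brief dips in coverage do not raise this alert.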
Alert rules are defined in alerts.yml and can be customized based on your requirements.
Adjust thresholds in alerts.yml based on your use case:
- False Positive Rate: Default 10% (0.1) - adjust based on acceptable FP rate
- Precision Drop: Default 10% (0.1) - adjust based on acceptable degradation
- API Latency: Default 2 seconds - adjust based on SLA requirements
- Rule Coverage: Default 1% (0.01) - adjust based on expected coverage
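The effect of changing a threshold can be checked offline before editing alerts.yml. The sketch below is a minimal illustration: the defaults are the ones listed above, but the `Thresholds` class, the metric keys, and the `breached` helper are hypothetical and do not reflect the structure of alerts.yml. Evaluation windows (such as "over 5 minutes") are ignored here.

```python
import operator
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    """Defaults from this guide; field names are assumptions."""
    false_positive_rate: float = 0.10
    precision_drop: float = 0.10
    api_latency_seconds: float = 2.0
    rule_coverage: float = 0.01


# (alert name, metric key, comparison that means "breached")
CHECKS = [
    ("HighFalsePositiveRate", "false_positive_rate", operator.gt),
    ("RulePrecisionDegradation", "precision_drop", operator.gt),
    ("HighAPILatency", "api_latency_seconds", operator.gt),
    ("LowRuleCoverage", "rule_coverage", operator.lt),
]


def breached(metrics: dict[str, float], thresholds: Thresholds = Thresholds()) -> list[str]:
    """Return the names of alerts whose threshold the given metrics cross."""
    fired = []
    for alert, key, compare in CHECKS:
        if key in metrics and compare(metrics[key], getattr(thresholds, key)):
            fired.append(alert)
    return fired


observed = {
    "false_positive_rate": 0.12,
    "precision_drop": 0.05,
    "api_latency_seconds": 2.5,
    "rule_coverage": 0.02,
}
print(breached(observed))                                        # default thresholds
print(breached(observed, Thresholds(false_positive_rate=0.15)))  # relaxed FP threshold
```

Replaying a few days of historical metrics through a check like this gives a rough idea of how often each alert would have fired at a candidate threshold.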
Configure AlertManager to route alerts to your notification channels:
- Update alerts.yml with your notification receivers
- Configure email, Slack, PagerDuty, or other integrations
- Test alert routing
Best practices:
- Start Conservative: Begin with higher thresholds and adjust based on false positives
- Monitor Trends: Watch for gradual degradation, not just threshold breaches
- Document Actions: Keep a runbook for common alert scenarios
- Regular Review: Periodically review and adjust alert thresholds based on historical data