Safety Guide
Purpose: This document explains how PATAS should be integrated into spam detection infrastructure, with clear safety boundaries and enforcement recommendations.
PATAS is designed as a high-precision spam pattern signal engine, not as a standalone enforcement system.
Core Architecture: PATAS Core is a deterministic, rule-based engine. ML/LLM integration (if used) is optional and external, running inside your infrastructure for pattern discovery.
PATAS provides signals and patterns. You control enforcement decisions.
We strongly recommend:
Use the Conservative profile only for low-impact automatic actions:
- ✅ Spam labeling (marking messages as spam for filtering)
- ✅ Hiding / deprioritizing messages (reducing visibility without blocking)
- ✅ Rate-limiting (temporary throttling of suspicious senders)
- ✅ Soft actions that can be easily reversed
Metrics:
- Ham hit rate: ≤ 1.5%
- Spam recall: 20-40%
- Precision: ≥ 98%
Safety: Conservative profile is safe for automatic actions because it only includes SAFE_AUTO tier patterns with very low false positive rates.
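The metrics above can be checked against a labeled evaluation set. The sketch below assumes the usual definitions, which this guide does not spell out: ham hit rate is the share of ham messages matched by any pattern, spam recall is the share of spam messages matched, and precision is the share of matches that are spam. The names `ProfileMetrics`, `compute_metrics`, and `meets_conservative_targets` are hypothetical and not part of PATAS.

```python
from dataclasses import dataclass


@dataclass
class ProfileMetrics:
    ham_hit_rate: float
    spam_recall: float
    precision: float


def compute_metrics(samples):
    """samples: iterable of (is_spam, matched) boolean pairs."""
    tp = fp = spam_total = ham_total = 0
    for is_spam, matched in samples:
        if is_spam:
            spam_total += 1
            tp += bool(matched)
        else:
            ham_total += 1
            fp += bool(matched)
    return ProfileMetrics(
        ham_hit_rate=fp / ham_total if ham_total else 0.0,
        spam_recall=tp / spam_total if spam_total else 0.0,
        # With no matches precision is undefined; 0.0 is used here so that
        # an empty profile never passes the check below.
        precision=tp / (tp + fp) if (tp + fp) else 0.0,
    )


def meets_conservative_targets(m):
    # Thresholds taken from the Conservative metrics listed above.
    return m.ham_hit_rate <= 0.015 and m.precision >= 0.98
```

Spam recall is reported but not gated here, since the guide gives it as an expected range (20-40%) rather than a pass/fail threshold.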
Use Balanced and Aggressive profiles as signals only:
- ✅ Input features into your existing risk-scoring systems
- ✅ Prioritization for manual review queues
- ✅ Research / experiments for pattern discovery
- ❌ NOT for auto-ban or irreversible actions
Metrics:
- Balanced: 9.75% ham hit rate, 66.87% recall
- Aggressive: 19.12% ham hit rate, 72.84% recall
Why signals only: These profiles have higher recall but also higher false positive rates. They should be combined with other signals before making enforcement decisions.
Keep permanent or high-impact enforcement (account bans, global blocks, long-term penalties) under your existing systems that combine:
- ✅ User reports
- ✅ Account history and trust scores
- ✅ Device / network heuristics
- ✅ Behavioral patterns
- ✅ PATAS signals as just one of the inputs
We do NOT recommend wiring PATAS patterns directly into irreversible enforcement without combining them with your internal signals and policies.
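One way to keep PATAS from driving irreversible enforcement by itself is to cap its contribution to a combined risk score below the enforcement threshold. The sketch below is illustrative only: the weights, the threshold, and the inputs (`user_reports`, `trust_score`) are hypothetical stand-ins for whatever your own risk-scoring system uses.

```python
PATAS_WEIGHT = 0.3      # deliberately below BAN_THRESHOLD
REPORTS_WEIGHT = 0.4
TRUST_WEIGHT = 0.3
BAN_THRESHOLD = 0.7     # hypothetical cutoff for permanent action


def risk_score(patas_match, user_reports, trust_score):
    """Combine a PATAS match with other signals into a score in [0, 1].

    trust_score is assumed to range from 0.0 (untrusted) to 1.0 (trusted).
    """
    score = PATAS_WEIGHT if patas_match else 0.0
    score += REPORTS_WEIGHT * min(user_reports, 5) / 5
    score += TRUST_WEIGHT * (1.0 - trust_score)
    return score


def may_ban(patas_match, user_reports, trust_score):
    return risk_score(patas_match, user_reports, trust_score) >= BAN_THRESHOLD
```

Because `PATAS_WEIGHT` is smaller than `BAN_THRESHOLD`, a pattern match on a trusted account with no user reports can never reach the threshold alone.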
| Profile | Patterns | Ham Hit Rate | Spam Recall | Use Case | Auto-Ban? |
|---|---|---|---|---|---|
| Conservative | SAFE_AUTO only | ≤ 1.5% | 20-40% | Low-impact auto actions | ✅ Yes (low-risk) |
| Balanced | SAFE_AUTO + top REVIEW_ONLY | ≤ 12% | ≥ 60% | Signals, prioritization | ❌ No |
| Aggressive | All except FEATURE_ONLY | ≤ 20% | ≥ 70% | Research, experiments | ❌ No |
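The Patterns column of the table can be read as a filter over pattern tiers. The following is a minimal sketch of that filter, not the PATAS implementation: the pattern dictionaries, the `top_review` cutoff, and the choice to rank "top" REVIEW_ONLY patterns by precision are all assumptions made for illustration.

```python
def select_patterns(patterns, profile, top_review=10):
    """patterns: list of dicts with 'tier' and 'precision' keys (hypothetical shape)."""
    safe = [p for p in patterns if p["tier"] == "SAFE_AUTO"]
    if profile == "conservative":
        return safe
    if profile == "balanced":
        review = sorted(
            (p for p in patterns if p["tier"] == "REVIEW_ONLY"),
            key=lambda p: p["precision"],
            reverse=True,
        )
        # Assumption: "top" means highest precision.
        return safe + review[:top_review]
    if profile == "aggressive":
        return [p for p in patterns if p["tier"] != "FEATURE_ONLY"]
    raise ValueError(f"unknown profile: {profile!r}")
```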
PATAS classifies rules into three safety categories:
SAFE_AUTO:
- Precision: ≥ 95% (from shadow evaluation)
- False Positive Rate: ≤ 1% (precision ≥ 99%)
- Minimum hits: ≥ 10 (sufficient sample size)
- Usage: Can be auto-promoted to active rules for Conservative profile
- Criteria: Must pass deterministic checks AND (high precision evaluation OR whitelist patterns with AND conditions)
- Description: High confidence, low risk (<10%). Specific rules without false positives. Safe for automatic application.
REVIEW_ONLY:
- Precision: 90-98% OR ham hit rate 1-5%
- Usage: Requires manual review before promotion
- Use in: Balanced profile (top patterns only)
- Description: Good pattern, but may catch edge cases. Recommended for quarantine or manual review.
FEATURE_ONLY:
- Pattern characteristics: Too broad (e.g., "Hello", "work", "time", stop words, very short patterns <3 chars)
- SQL status: Automatically commented out to prevent accidental execution
- Usage: Insight only - useful for understanding attack trends, NOT for enforcement
- Description: Pattern is too broad and would cause many false positives. SQL is commented out. Useful for research and understanding spam evolution.
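The numeric criteria above can be sketched as a small classifier. This is a simplification under stated assumptions, not the PATAS classifier: it ignores the deterministic checks and the whitelist/AND-condition path, treats "false positive rate" and "ham hit rate" as the same number, uses a three-word stop list taken from the examples in the text, and returns `None` for patterns the stated criteria do not cover (the guide does not say how those are handled). The function name `classify_tier` is hypothetical.

```python
from typing import Optional

# Examples from the text; a real stop-word list would be much longer.
STOP_WORDS = {"hello", "work", "time"}


def classify_tier(pattern_text: str, precision: float,
                  ham_hit_rate: float, hits: int) -> Optional[str]:
    text = pattern_text.strip().lower()
    # Too broad: stop words and very short patterns are insight-only.
    if len(text) < 3 or text in STOP_WORDS:
        return "FEATURE_ONLY"
    # High precision, low false positives, enough samples.
    if precision >= 0.95 and ham_hit_rate <= 0.01 and hits >= 10:
        return "SAFE_AUTO"
    # Good pattern that still needs a human look.
    if precision >= 0.90 or 0.01 < ham_hit_rate <= 0.05:
        return "REVIEW_ONLY"
    return None  # not covered by the criteria listed above
```

The broadness check runs first so that a stop word with good measured precision on a small sample still cannot be promoted.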
Before deploying PATAS patterns to production:
`patas safety-eval`
This command:
- Evaluates all profiles against safety thresholds
- Generates `SAFETY_EVAL_REPORT.json`
- Exits with code 0 if all thresholds pass, non-zero if any threshold is violated
DO NOT DEPLOY if patas safety-eval fails.
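Because the command signals failure through its exit code, a deployment script can gate on it directly. The sketch below relies only on the exit-code behavior described above; the wrapper function `safety_eval_passes` is hypothetical, and a missing `patas` executable is treated as a failure.

```python
import subprocess


def safety_eval_passes(command=("patas", "safety-eval")):
    """Return True only if the command runs and exits with code 0."""
    try:
        result = subprocess.run(list(command))
    except FileNotFoundError:
        # The executable is not installed: treat this as a failed check.
        return False
    return result.returncode == 0
```

A deploy step would call `safety_eval_passes()` and abort when it returns `False`.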
- Deploy only Conservative profile patterns initially
- Monitor ham hit rate in production logs
- Track user complaints about false positives
- Use Balanced profile patterns as signals only
- Combine with existing risk-scoring systems
- Do NOT use Balanced patterns for auto-ban
- Aggressive profile is for offline analysis only
- Use for pattern discovery and experiments
- Never use for production enforcement
PATAS includes safety guardrails enforced in code:
- SafetyMode enum: `CONSERVATIVE`, `BALANCED`, `OFF`
  - Conservative mode: Only AUTO_SAFE patterns can trigger auto-ban
  - Balanced mode: Patterns produce scores/flags only, never auto-ban
- SQL safety: Only SELECT queries, whitelisted columns, no match-everything rules
- Rule Safety Classifier: Three-category system (AUTO_SAFE / REQUIRES_REVIEW / DANGEROUS)
  - AUTO_SAFE: Ready for Automation - Precision ≥ 95%, FPR ≤ 1%, whitelist patterns with AND conditions. Can be automatically applied.
  - REQUIRES_REVIEW: Requires Human Verification - Good pattern, but may catch edge cases. Recommended for quarantine or manual review.
  - DANGEROUS: Insight Only (High Risk) - Pattern too broad (e.g., "Hello", "work", "time"). SQL is commented out to prevent accidental execution. Useful for understanding attack trends.
- Auto-fix SQL errors: Simple SQL errors automatically fixed in shadow evaluation
- Stop words filtering: Common words blocked to prevent false positives
See app/v2_safety_mode.py, app/v2_sql_safety.py, and app/v2_rule_safety_classifier.py for implementation details.
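To make the SQL safety guardrail concrete, here is a deliberately naive checker for the three stated rules (SELECT only, whitelisted columns, no match-everything conditions). It is a regex-based illustration, not the code in app/v2_sql_safety.py and not a real SQL parser; the column whitelist and the function name `is_safe_rule_sql` are hypothetical.

```python
import re

ALLOWED_COLUMNS = {"id", "text", "sender_id"}   # hypothetical whitelist
WHERE_KEYWORDS = {"and", "or", "not", "like", "ilike", "in", "is", "null"}


def is_safe_rule_sql(sql: str) -> bool:
    stmt = sql.strip().rstrip(";")
    if ";" in stmt:                              # reject stacked statements
        return False
    if not re.match(r"(?i)select\b", stmt):      # SELECT queries only
        return False
    parts = re.split(r"(?i)\bwhere\b", stmt, maxsplit=1)
    if len(parts) != 2 or not parts[1].strip():
        return False                             # no WHERE clause matches everything
    where = parts[1]
    if re.search(r"(?i)\blike\s+'%+'", where):   # LIKE '%' matches everything
        return False
    where = re.sub(r"'(?:[^']|'')*'", "''", where)   # blank out string literals
    if re.search(r"(?i)\b1\s*=\s*1\b|\btrue\b", where):
        return False                             # trivially true condition
    identifiers = set(re.findall(r"[A-Za-z_][A-Za-z_0-9]*", where.lower()))
    columns = identifiers - WHERE_KEYWORDS
    # Require at least one column, and only whitelisted ones.
    return bool(columns) and columns <= ALLOWED_COLUMNS
```

The checker errs on the side of rejection: anything it does not recognize in the WHERE clause, including function calls, is treated as a non-whitelisted column.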
After deployment:
- Monitor ham hit rate in production logs
- Track user complaints about false positives
- Re-run `patas safety-eval` if:
  - Pattern mining discovers new patterns
  - Pattern quality thresholds change
  - Significant changes to pattern SQL generation
If safety thresholds are violated:
- Immediately disable affected patterns
- Review `SAFETY_EVAL_REPORT.json` for violations
- Fix issues before re-enabling patterns
- Re-run `patas safety-eval` to confirm fixes
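The monitoring and disable steps can be partly automated if production matches are later labeled, for example through manual review or confirmed user complaints. The sketch below flags patterns whose share of matched ham exceeds a limit. Applying the Conservative 1.5% ham hit rate limit per pattern is an assumption made here; the guide states it for the profile as a whole. The event shape and the function name `patterns_to_disable` are hypothetical.

```python
from collections import Counter


def patterns_to_disable(events, max_ham_hit_rate=0.015):
    """events: iterable of (matched_pattern_ids, is_ham) pairs from labeled logs.

    Returns the sorted IDs of patterns that matched more than
    max_ham_hit_rate of all ham messages seen.
    """
    ham_total = 0
    ham_hits = Counter()
    for pattern_ids, is_ham in events:
        if not is_ham:
            continue
        ham_total += 1
        ham_hits.update(set(pattern_ids))   # count each pattern once per message
    if ham_total == 0:
        return []
    return sorted(
        pid for pid, hits in ham_hits.items()
        if hits / ham_total > max_ham_hit_rate
    )
```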
TL;DR for Engineers:
- ✅ Use Conservative profile for low-impact auto actions (labeling, hiding, throttling)
- ⚠️ Use Balanced profile as signals only (NOT for auto-ban)
- ❌ Use Aggressive profile for research only (NOT for production)
- 🔒 Keep permanent enforcement (bans, blocks) in your existing systems
- ✅ Always run `patas safety-eval` before deployment
- 📊 Monitor ham hit rates and user complaints continuously
PATAS provides the "brain" for pattern detection. You control the "trigger" for enforcement.
Last Updated: 2025-11-21