Safety Profiles
PATAS organizes discovered patterns into three safety profiles based on precision, recall, and false positive rates. Each profile has specific use cases and enforcement recommendations.
Safety profiles determine which patterns are included and how they can be used:
| Profile | Patterns Included | Ham Hit Rate | Spam Recall | Use Case | Auto-Ban? |
|---|---|---|---|---|---|
| Conservative | SAFE_AUTO only | ≤ 1.5% | 20-40% | Low-impact auto actions | ✅ Yes (low-risk) |
| Balanced | SAFE_AUTO + top REVIEW_ONLY | ≤ 12% | ≥ 60% | Signals, prioritization | ❌ No |
| Aggressive | All except FEATURE_ONLY | ≤ 20% | ≥ 70% | Research, experiments | ❌ No |
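The table's limits can be written down as data and checked against measured metrics. The following is a minimal sketch, not the PATAS implementation: `ProfileLimits`, `PROFILE_LIMITS`, and `within_limits` are hypothetical names, and rates are expressed as fractions rather than percentages.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ProfileLimits:
    max_ham_hit_rate: float            # ceiling from the "Ham Hit Rate" column
    min_spam_recall: Optional[float]   # floor from the "Spam Recall" column, if any
    auto_ban_allowed: bool             # the "Auto-Ban?" column


# Values transcribed from the table. Conservative lists a typical recall
# range (20-40%) rather than a floor, so no floor is enforced here (assumption).
PROFILE_LIMITS = {
    "conservative": ProfileLimits(0.015, None, True),
    "balanced": ProfileLimits(0.12, 0.60, False),
    "aggressive": ProfileLimits(0.20, 0.70, False),
}


def within_limits(profile: str, ham_hit_rate: float, spam_recall: float) -> bool:
    """Return True if measured metrics satisfy the profile's table limits."""
    limits = PROFILE_LIMITS[profile]
    if ham_hit_rate > limits.max_ham_hit_rate:
        return False
    if limits.min_spam_recall is not None and spam_recall < limits.min_spam_recall:
        return False
    return True
```

For example, `within_limits("balanced", 0.10, 0.65)` holds, while a Balanced run with 50% recall does not.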
Before patterns are assigned to profiles, they are classified into quality tiers:
SAFE_AUTO tier
Criteria:
- Precision: ≥ 98%
- Ham hit rate: ≤ 1%
- Spam support: ≥ 50 matches
Usage:
- Can be auto-promoted to active rules
- Used in Conservative profile
- Safe for automatic actions (spam labeling, hiding, throttling)
Example:
- Pattern with 99% precision, 0.5% ham hit rate, 200 spam matches → SAFE_AUTO
REVIEW_ONLY tier
Criteria:
- Precision: 90-98% OR ham hit rate 1-5%
- Spam support: ≥ 20 matches
Usage:
- Requires manual review before promotion
- Used in Balanced profile (top patterns only, precision ≥ 95%)
- Signals only, not for auto-ban
Example:
- Pattern with 96% precision, 3% ham hit rate, 80 spam matches → REVIEW_ONLY
FEATURE_ONLY tier
Criteria:
- Precision: < 90% OR ham hit rate > 5%
Usage:
- ML features only, never standalone rules
- Used in Aggressive profile (for research)
- Never for production enforcement
Example:
- Pattern with 85% precision, 8% ham hit rate → FEATURE_ONLY
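The tier criteria above can be sketched as a single function. This is an illustration under stated assumptions, not the code in `app/v2_pattern_quality_tiers.py`: the `Tier` enum and `classify_tier` are hypothetical names, rates are fractions, and the order in which overlapping criteria are checked is an assumption, since the criteria as listed do not say which tier wins when a pattern meets one REVIEW_ONLY condition and one FEATURE_ONLY condition.

```python
from enum import Enum


class Tier(str, Enum):
    SAFE_AUTO = "SAFE_AUTO"
    REVIEW_ONLY = "REVIEW_ONLY"
    FEATURE_ONLY = "FEATURE_ONLY"


def classify_tier(precision: float, ham_hit_rate: float, spam_support: int) -> Tier:
    """Classify a pattern using the thresholds listed above (rates as fractions)."""
    # SAFE_AUTO requires all three criteria at once.
    if precision >= 0.98 and ham_hit_rate <= 0.01 and spam_support >= 50:
        return Tier.SAFE_AUTO
    # Assumption: a failing precision OR ham hit rate demotes the pattern
    # outright, even if the other metric falls in the REVIEW_ONLY band.
    if precision < 0.90 or ham_hit_rate > 0.05:
        return Tier.FEATURE_ONLY
    # Remaining patterns need enough spam support to justify manual review.
    if spam_support >= 20:
        return Tier.REVIEW_ONLY
    # Assumption: patterns with too little support fall back to the lowest tier.
    return Tier.FEATURE_ONLY
```

With these rules, the three example patterns above land in SAFE_AUTO, REVIEW_ONLY, and FEATURE_ONLY respectively.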
Conservative profile
Purpose: Safe for automatic actions on real traffic.
Patterns: Only SAFE_AUTO tier patterns.
Metrics:
- Ham hit rate: ≤ 1.5% (maximum acceptable)
- Spam recall: 20-40% (typical range)
- Precision: ≥ 98% (minimum)
Use Cases:
- ✅ Auto-spam labeling
- ✅ Automatic message hiding
- ✅ Throttling suspicious messages
- ✅ Low-risk auto-ban (with additional validation)
Auto-Ban: ✅ Yes, but only for low-risk scenarios. Always combine with existing anti-spam signals.
Risk Level: LOW - Safe for automatic actions.
Configuration:

    aggressiveness_profile: conservative
    safety_mode: CONSERVATIVE

Code Implementation:
- Only `SAFE_AUTO` patterns can trigger auto-ban
- Enforced in `app/v2_safety_mode.py`
- Safety guardrails prevent non-SAFE_AUTO patterns from auto-banning
Balanced profile
Purpose: Signals, prioritization, and soft actions only. NOT for hard bans.
Patterns: SAFE_AUTO + top REVIEW_ONLY patterns (precision ≥ 95%).
Metrics:
- Ham hit rate: ≤ 12% (maximum acceptable)
- Spam recall: ≥ 60% (typical range)
- Precision: ≥ 90% (minimum)
Use Cases:
- ✅ ML feature signals
- ✅ Prioritization for manual review
- ✅ Soft actions (flagging, deprioritization)
- ✅ Risk scoring (combine with other signals)
- ❌ NOT for auto-ban
Auto-Ban: ❌ No. Patterns produce scores/flags only, never auto-ban.
Risk Level: MEDIUM - Requires human oversight.
Configuration:

    aggressiveness_profile: balanced
    safety_mode: BALANCED

Code Implementation:
- Patterns produce scores/flags only
- `can_auto_ban()` returns `False` for all patterns in Balanced mode
- Enforced in `app/v2_safety_mode.py`
Aggressive profile
Purpose: Research and experimentation only. NEVER for production auto-actions.
Patterns: All patterns, excluding FEATURE_ONLY patterns with very low precision.
Metrics:
- Ham hit rate: ≤ 20% (maximum, but still risky)
- Spam recall: ≥ 70% (typical range)
- Precision: ≥ 85% (minimum)
Use Cases:
- ✅ Offline analysis
- ✅ Research experiments
- ✅ Pattern discovery
- ✅ Understanding spam evolution
- ❌ NEVER for production
- ❌ NEVER for auto-ban
Auto-Ban: ❌ No. Research only.
Risk Level: HIGH - Too many false positives for production.
Configuration:

    aggressiveness_profile: aggressive
    safety_mode: BALANCED  # Aggressive profile maps to BALANCED safety mode (no auto-ban)

Code Implementation:
- Aggressive profile maps to `SafetyMode.BALANCED` (no auto-ban)
- All patterns produce signals only
- Enforced in `app/v2_safety_mode.py`
PATAS includes safety guardrails enforced in code:

    from enum import Enum

    class SafetyMode(str, Enum):
        CONSERVATIVE = "conservative"  # Only SAFE_AUTO patterns can auto-ban
        BALANCED = "balanced"          # Patterns produce scores/flags only
        OFF = "off"                    # No auto decisions, only signals

- Conservative mode: Only `SAFE_AUTO` patterns can trigger auto-ban
- Balanced mode: Patterns produce scores/flags only, never auto-ban
- Aggressive profile: Maps to `BALANCED` safety mode (no auto-ban)
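The guardrail rules above can be sketched as a profile-to-mode mapping plus a single auto-ban check. This is a hedged illustration only: the document names `can_auto_ban()` but does not show its signature, so the `(mode, tier)` parameters and the `PROFILE_TO_MODE` dictionary are assumptions.

```python
from enum import Enum


class SafetyMode(str, Enum):
    CONSERVATIVE = "conservative"
    BALANCED = "balanced"
    OFF = "off"


# Mapping as described above: the aggressive profile gets no mode of its own
# and runs under BALANCED, which never auto-bans.
PROFILE_TO_MODE = {
    "conservative": SafetyMode.CONSERVATIVE,
    "balanced": SafetyMode.BALANCED,
    "aggressive": SafetyMode.BALANCED,
}


def can_auto_ban(mode: SafetyMode, tier: str) -> bool:
    """Hypothetical signature: only SAFE_AUTO patterns in Conservative mode qualify."""
    return mode is SafetyMode.CONSERVATIVE and tier == "SAFE_AUTO"
```

Keeping the decision in one function means every enforcement path asks the same question, so a non-SAFE_AUTO pattern cannot reach an auto-ban by a side route.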
Patterns are classified into tiers based on:
- `precision` (spam_matches / total_matches)
- `spam_support` (absolute number of spam matches)
- `ham_hit_rate` (ham_matches / total_ham_in_dataset)

Classification logic is in `app/v2_pattern_quality_tiers.py`.
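As a worked illustration of the three formulas, the sketch below computes them from raw match counts. `PatternMetrics` and `compute_metrics` are hypothetical names, and the code assumes every match is labeled either spam or ham, so that `total_matches = spam_matches + ham_matches`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PatternMetrics:
    precision: float     # spam_matches / total_matches
    spam_support: int    # absolute number of spam matches
    ham_hit_rate: float  # ham_matches / total_ham_in_dataset


def compute_metrics(spam_matches: int, ham_matches: int,
                    total_ham_in_dataset: int) -> PatternMetrics:
    """Compute the three tier inputs from raw counts."""
    # Assumption: every match is labeled either spam or ham.
    total_matches = spam_matches + ham_matches
    # Guard against empty denominators instead of raising ZeroDivisionError.
    precision = spam_matches / total_matches if total_matches else 0.0
    ham_hit_rate = ham_matches / total_ham_in_dataset if total_ham_in_dataset else 0.0
    return PatternMetrics(precision, spam_matches, ham_hit_rate)
```

For instance, a pattern that matches 198 spam and 2 ham messages in a dataset with 400 ham messages has a precision of 99% and a ham hit rate of 0.5%.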
- Deploy only Conservative profile patterns initially
- Use for low-impact auto actions (spam labeling, hiding, throttling)
- Monitor ham hit rate in production logs
- Track user complaints about false positives
- Use Balanced profile patterns as signals only
- Combine with existing risk-scoring systems
- Do NOT use Balanced patterns for auto-ban
- Use for prioritization and ML features
- Aggressive profile is for offline analysis only
- Use for pattern discovery and experiments
- Never use for production enforcement
- Never use for auto-ban
- Account bans, global blocks, and long-term penalties should be decided by existing systems
- Combine PATAS signals with:
  - User reports
  - Account history and trust scores
  - Device / network heuristics
  - Behavioral patterns
Before deploying PATAS patterns to production, run:

    patas safety-eval

This command:
- Evaluates all profiles against safety thresholds
- Generates `SAFETY_EVAL_REPORT.json`
- Exits with code 0 if all thresholds pass, non-zero if any threshold is violated

DO NOT DEPLOY if `patas safety-eval` fails.
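A deployment script can enforce this rule by checking the command's exit code. The following is a minimal sketch: `safety_eval_passed` and `_run_cli` are hypothetical helpers, and the only behavior taken from this document is that `patas safety-eval` exits with 0 on success and non-zero on a threshold violation.

```python
import subprocess
from typing import Callable, List


def _run_cli(argv: List[str]) -> int:
    """Run a command and return its exit code."""
    try:
        return subprocess.run(argv, check=False).returncode
    except FileNotFoundError:
        # Treat a missing executable as a failed check (conventional code 127).
        return 127


def safety_eval_passed(runner: Callable[[List[str]], int] = _run_cli) -> bool:
    """Return True only if `patas safety-eval` exits with code 0."""
    return runner(["patas", "safety-eval"]) == 0
```

The `runner` parameter exists so the gate can be exercised in tests without the CLI installed. A deploy step would call `safety_eval_passed()` and abort when it returns `False`.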
Environment variables:

    AGGRESSIVENESS_PROFILE=conservative  # conservative, balanced, or aggressive
    SAFETY_MODE=CONSERVATIVE             # CONSERVATIVE, BALANCED, or OFF

Config file:

    aggressiveness_profile: conservative
    safety_mode: CONSERVATIVE
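The two settings can be validated at startup so that a misconfigured combination fails early. This is a sketch under assumptions, not PATAS code: `load_safety_settings` is a hypothetical helper, the conservative defaults are an assumption, and so is the choice to reject any non-conservative profile paired with `CONSERVATIVE` mode (which follows from the rule that Balanced and Aggressive never auto-ban).

```python
import os
from typing import Mapping, Tuple

VALID_PROFILES = {"conservative", "balanced", "aggressive"}
VALID_MODES = {"CONSERVATIVE", "BALANCED", "OFF"}


def load_safety_settings(env: Mapping[str, str] = os.environ) -> Tuple[str, str]:
    """Read and validate both settings; the defaults are an assumption."""
    profile = env.get("AGGRESSIVENESS_PROFILE", "conservative").strip().lower()
    mode = env.get("SAFETY_MODE", "CONSERVATIVE").strip().upper()
    if profile not in VALID_PROFILES:
        raise ValueError(f"unknown AGGRESSIVENESS_PROFILE: {profile!r}")
    if mode not in VALID_MODES:
        raise ValueError(f"unknown SAFETY_MODE: {mode!r}")
    # Assumption: only the conservative profile may run in the
    # auto-ban-capable CONSERVATIVE mode.
    if mode == "CONSERVATIVE" and profile != "conservative":
        raise ValueError(f"profile {profile!r} must not use SAFETY_MODE=CONSERVATIVE")
    return profile, mode
```

Passing a plain dictionary instead of `os.environ` makes the function easy to test in isolation.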
- `app/v2_safety_mode.py` - Safety mode definitions and enforcement
- `app/v2_pattern_quality_tiers.py` - Pattern tier classification
- `app/v2_promotion.py` - Profile-based pattern filtering
- `tests/test_pattern_safety_profiles.py` - Safety profile tests
Last Updated: 2025-11-18