Safety Profiles

Nick edited this page Nov 18, 2025 · 2 revisions

PATAS organizes discovered patterns into three safety profiles based on precision, recall, and false positive rates. Each profile has specific use cases and enforcement recommendations.


Overview

Safety profiles determine which patterns are included and how they can be used:

| Profile | Patterns Included | Ham Hit Rate | Spam Recall | Use Case | Auto-Ban? |
|---|---|---|---|---|---|
| Conservative | SAFE_AUTO only | ≤ 1.5% | 20-40% | Low-impact auto actions | ✅ Yes (low-risk) |
| Balanced | SAFE_AUTO + top REVIEW_ONLY | ≤ 12% | ≥ 60% | Signals, prioritization | ❌ No |
| Aggressive | All except very-low-precision FEATURE_ONLY | ≤ 20% | ≥ 70% | Research, experiments | ❌ No |
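
The ham hit rate and spam recall limits in the table can be sketched as a small lookup plus a check. This is only an illustration: `ProfileThresholds`, `PROFILE_THRESHOLDS`, and `threshold_violations` are hypothetical names, not part of PATAS, and the Conservative recall is left unset because the table gives a typical range rather than a floor.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class ProfileThresholds:
    max_ham_hit_rate: float            # upper bound taken from the table
    min_spam_recall: Optional[float]   # None where the table gives only a typical range


# Values copied from the overview table; the structure itself is hypothetical.
PROFILE_THRESHOLDS: Dict[str, ProfileThresholds] = {
    "conservative": ProfileThresholds(max_ham_hit_rate=0.015, min_spam_recall=None),
    "balanced": ProfileThresholds(max_ham_hit_rate=0.12, min_spam_recall=0.60),
    "aggressive": ProfileThresholds(max_ham_hit_rate=0.20, min_spam_recall=0.70),
}


def threshold_violations(profile: str, ham_hit_rate: float, spam_recall: float) -> List[str]:
    """Return human-readable violations; an empty list means the profile passes."""
    limits = PROFILE_THRESHOLDS[profile]
    problems: List[str] = []
    if ham_hit_rate > limits.max_ham_hit_rate:
        problems.append(
            f"ham hit rate {ham_hit_rate:.3f} exceeds {limits.max_ham_hit_rate:.3f}"
        )
    if limits.min_spam_recall is not None and spam_recall < limits.min_spam_recall:
        problems.append(
            f"spam recall {spam_recall:.3f} is below {limits.min_spam_recall:.3f}"
        )
    return problems
```

Returning a list of violations, rather than a single boolean, keeps every failed limit visible when a profile misses more than one.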

Pattern Quality Tiers

Before patterns are assigned to profiles, they are classified into quality tiers:

SAFE_AUTO

Criteria:

  • Precision: ≥ 98%
  • Ham hit rate: ≤ 1%
  • Spam support: ≥ 50 matches

Usage:

  • Can be auto-promoted to active rules
  • Used in Conservative profile
  • Safe for automatic actions (spam labeling, hiding, throttling)

Example:

  • Pattern with 99% precision, 0.5% ham hit rate, 200 spam matches → SAFE_AUTO

REVIEW_ONLY

Criteria:

  • Precision: 90-98% OR ham hit rate 1-5%
  • Spam support: ≥ 20 matches

Usage:

  • Requires manual review before promotion
  • Used in Balanced profile (top patterns only, precision ≥ 95%)
  • Signals only, not for auto-ban

Example:

  • Pattern with 96% precision, 3% ham hit rate, 80 spam matches → REVIEW_ONLY

FEATURE_ONLY

Criteria:

  • Precision: < 90% OR ham hit rate > 5%

Usage:

  • ML features only, never standalone rules
  • Used in Aggressive profile for research (except patterns with very low precision)
  • Never for production enforcement

Example:

  • Pattern with 85% precision, 8% ham hit rate → FEATURE_ONLY
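
The three tiers can be read as a cascade: try the SAFE_AUTO criteria first, then REVIEW_ONLY, and fall through to FEATURE_ONLY. The sketch below is one possible reading under stated assumptions, not the actual logic in app/v2_pattern_quality_tiers.py. `PatternTier` and `classify_tier` are hypothetical names, and the page does not say what happens to a pattern with good precision but too little spam support; here it is assumed to fall to the next tier down.

```python
from enum import Enum


class PatternTier(str, Enum):
    # Hypothetical enum; the string values are illustrative only.
    SAFE_AUTO = "safe_auto"
    REVIEW_ONLY = "review_only"
    FEATURE_ONLY = "feature_only"


def classify_tier(precision: float, ham_hit_rate: float, spam_support: int) -> PatternTier:
    """Classify a pattern from rates given as fractions (0.98 means 98%)."""
    # SAFE_AUTO: all three criteria must hold.
    if precision >= 0.98 and ham_hit_rate <= 0.01 and spam_support >= 50:
        return PatternTier.SAFE_AUTO
    # REVIEW_ONLY: not SAFE_AUTO, but still within the 90% precision and
    # 5% ham hit rate limits, with at least 20 spam matches.
    if precision >= 0.90 and ham_hit_rate <= 0.05 and spam_support >= 20:
        return PatternTier.REVIEW_ONLY
    # Everything else (assumption: including low-support patterns).
    return PatternTier.FEATURE_ONLY
```

Under this reading, the three examples given for the tiers classify as stated: (0.99, 0.005, 200) is SAFE_AUTO, (0.96, 0.03, 80) is REVIEW_ONLY, and a pattern with 85% precision and an 8% ham hit rate is FEATURE_ONLY regardless of support.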

Conservative Profile

Purpose: Safe for automatic actions on real traffic.

Patterns: Only SAFE_AUTO tier patterns.

Metrics:

  • Ham hit rate: ≤ 1.5% (maximum acceptable)
  • Spam recall: 20-40% (typical range)
  • Precision: ≥ 98% (minimum)

Use Cases:

  • ✅ Auto-spam labeling
  • ✅ Automatic message hiding
  • ✅ Throttling suspicious messages
  • ✅ Low-risk auto-ban (with additional validation)

Auto-Ban: ✅ Yes, but only for low-risk scenarios. Always combine with existing anti-spam signals.

Risk Level: LOW - Safe for automatic actions.

Configuration:

aggressiveness_profile: conservative
safety_mode: CONSERVATIVE

Code Implementation:

  • Only SAFE_AUTO patterns can trigger auto-ban
  • Enforced in app/v2_safety_mode.py
  • Safety guardrails prevent non-SAFE_AUTO patterns from auto-banning

Balanced Profile

Purpose: Signals, prioritization, and soft actions only. NOT for hard bans.

Patterns: SAFE_AUTO + top REVIEW_ONLY patterns (precision ≥ 95%).

Metrics:

  • Ham hit rate: ≤ 12% (maximum acceptable)
  • Spam recall: ≥ 60% (typical)
  • Precision: ≥ 90% (minimum)

Use Cases:

  • ✅ ML feature signals
  • ✅ Prioritization for manual review
  • ✅ Soft actions (flagging, deprioritization)
  • ✅ Risk scoring (combine with other signals)
  • ❌ NOT for auto-ban

Auto-Ban: ❌ No. Patterns produce scores/flags only, never auto-ban.

Risk Level: MEDIUM - Requires human oversight.

Configuration:

aggressiveness_profile: balanced
safety_mode: BALANCED

Code Implementation:

  • Patterns produce scores/flags only
  • can_auto_ban() returns False for all patterns in Balanced mode
  • Enforced in app/v2_safety_mode.py

Aggressive Profile

Purpose: Research and experimentation only. NEVER for production auto-actions.

Patterns: All patterns, excluding FEATURE_ONLY patterns whose precision is very low (see the precision minimum under Metrics).

Metrics:

  • Ham hit rate: ≤ 20% (maximum, but still risky)
  • Spam recall: ≥ 70% (typical)
  • Precision: ≥ 85% (minimum)

Use Cases:

  • ✅ Offline analysis
  • ✅ Research experiments
  • ✅ Pattern discovery
  • ✅ Understanding spam evolution
  • ❌ NEVER for production
  • ❌ NEVER for auto-ban

Auto-Ban: ❌ No. Research only.

Risk Level: HIGH - Too many false positives for production.

Configuration:

aggressiveness_profile: aggressive
safety_mode: BALANCED  # Aggressive profile maps to BALANCED safety mode (no auto-ban)

Code Implementation:

  • Aggressive profile maps to SafetyMode.BALANCED (no auto-ban)
  • All patterns produce signals only
  • Enforced in app/v2_safety_mode.py
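
Across the three profile sections, the "Patterns" rule of each profile amounts to a filter over tiered patterns. A minimal sketch follows, assuming a hypothetical `Pattern` record and `select_patterns` helper (the real filtering lives in app/v2_promotion.py and may differ). "Very low precision" for the Aggressive profile is read here as below that profile's 85% precision minimum, which is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Pattern:
    # Hypothetical record; only the fields needed for filtering are shown.
    name: str
    tier: str          # "SAFE_AUTO", "REVIEW_ONLY", or "FEATURE_ONLY"
    precision: float   # fraction, e.g. 0.96 for 96%


def select_patterns(profile: str, patterns: List[Pattern]) -> List[Pattern]:
    """Return the patterns a profile would include, preserving input order."""
    if profile == "conservative":
        return [p for p in patterns if p.tier == "SAFE_AUTO"]
    if profile == "balanced":
        # SAFE_AUTO plus only the top REVIEW_ONLY patterns (precision >= 95%).
        return [
            p for p in patterns
            if p.tier == "SAFE_AUTO"
            or (p.tier == "REVIEW_ONLY" and p.precision >= 0.95)
        ]
    if profile == "aggressive":
        # Assumption: drop FEATURE_ONLY patterns below the 85% precision minimum.
        return [
            p for p in patterns
            if not (p.tier == "FEATURE_ONLY" and p.precision < 0.85)
        ]
    raise ValueError(f"unknown profile: {profile!r}")
```

Selection is independent of enforcement: a pattern included by the Balanced or Aggressive filter still only produces signals.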

Safety Guardrails

PATAS includes safety guardrails enforced in code:

SafetyMode Enum

from enum import Enum


class SafetyMode(str, Enum):
    CONSERVATIVE = "conservative"  # Only SAFE_AUTO patterns can auto-ban
    BALANCED = "balanced"          # Patterns produce scores/flags only
    OFF = "off"                    # No auto decisions, only signals

Enforcement Rules

  • Conservative mode: Only SAFE_AUTO patterns can trigger auto-ban
  • Balanced mode: Patterns produce scores/flags only, never auto-ban
  • Aggressive profile: Maps to BALANCED safety mode (no auto-ban)
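
The enforcement rules above can be sketched as a profile-to-mode mapping plus a single guard. This is an illustration under assumptions: the page names `can_auto_ban()` but not its signature, so the two-argument form here is hypothetical, as is `PROFILE_TO_SAFETY_MODE`. The enum is repeated only to keep the example self-contained.

```python
from enum import Enum


class SafetyMode(str, Enum):
    CONSERVATIVE = "conservative"
    BALANCED = "balanced"
    OFF = "off"


# Hypothetical mapping; note that the aggressive profile shares BALANCED.
PROFILE_TO_SAFETY_MODE = {
    "conservative": SafetyMode.CONSERVATIVE,
    "balanced": SafetyMode.BALANCED,
    "aggressive": SafetyMode.BALANCED,
}


def can_auto_ban(mode: SafetyMode, tier: str) -> bool:
    """Allow auto-ban only for SAFE_AUTO patterns in CONSERVATIVE mode."""
    return mode is SafetyMode.CONSERVATIVE and tier == "SAFE_AUTO"
```

Expressing the guard as one positive condition means any mode or tier added later is denied by default.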

Pattern Tier Classification

Patterns are classified into tiers based on:

  • precision (spam_matches / total_matches)
  • spam_support (absolute number of spam matches)
  • ham_hit_rate (ham_matches / total_ham_in_dataset)

The classification logic lives in app/v2_pattern_quality_tiers.py.
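
As a worked illustration of the three inputs, the sketch below derives them from raw match counts. `pattern_metrics` is a hypothetical helper, and it assumes every matched message is labeled either spam or ham, so that total_matches = spam_matches + ham_matches.

```python
from typing import Dict, Union


def pattern_metrics(
    spam_matches: int, ham_matches: int, total_ham_in_dataset: int
) -> Dict[str, Union[int, float]]:
    """Compute precision, spam_support, and ham_hit_rate from match counts."""
    # Assumption: matches are either spam or ham, nothing unlabeled.
    total_matches = spam_matches + ham_matches
    precision = spam_matches / total_matches if total_matches else 0.0
    ham_hit_rate = ham_matches / total_ham_in_dataset if total_ham_in_dataset else 0.0
    return {
        "precision": precision,
        "spam_support": spam_matches,
        "ham_hit_rate": ham_hit_rate,
    }
```

For example, a pattern that matches 198 spam and 2 ham messages in a dataset containing 400 ham messages has a precision of 198 / 200 = 0.99 and a ham hit rate of 2 / 400 = 0.005.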


Production Recommendations

1. Start with Conservative Profile

  • Deploy only Conservative profile patterns initially
  • Use for low-impact auto actions (spam labeling, hiding, throttling)
  • Monitor ham hit rate in production logs
  • Track user complaints about false positives

2. Gradually Add Balanced Profile

  • Use Balanced profile patterns as signals only
  • Combine with existing risk-scoring systems
  • Do NOT use Balanced patterns for auto-ban
  • Use for prioritization and ML features

3. Use Aggressive Profile for Research

  • Aggressive profile is for offline analysis only
  • Use for pattern discovery and experiments
  • Never use for production enforcement
  • Never use for auto-ban

4. Combine with Existing Systems

  • Account bans, global blocks, and long-term penalties should be decided by existing systems
  • Combine PATAS signals with:
    • User reports
    • Account history and trust scores
    • Device / network heuristics
    • Behavioral patterns

Running Safety Evaluation

Before deploying PATAS patterns to production, run:

patas safety-eval

This command:

  • Evaluates all profiles against safety thresholds
  • Generates SAFETY_EVAL_REPORT.json
  • Exits with code 0 if all thresholds pass, non-zero if any threshold is violated

DO NOT DEPLOY if patas safety-eval fails.
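
Because the command signals failure through its exit code, a deployment script can gate on it. The following is a minimal sketch, assuming `patas` is on the PATH; `safety_eval_passes` and `require_safety_eval` are hypothetical helper names, and the contents of SAFETY_EVAL_REPORT.json are deliberately not parsed because its schema is not described here.

```python
import subprocess
import sys
from typing import Sequence


def safety_eval_passes(command: Sequence[str] = ("patas", "safety-eval")) -> bool:
    """Run the safety evaluation and report whether it exited with code 0."""
    result = subprocess.run(list(command), check=False)
    return result.returncode == 0


def require_safety_eval(command: Sequence[str] = ("patas", "safety-eval")) -> None:
    """Abort the calling script with a non-zero exit if the evaluation fails."""
    if not safety_eval_passes(command):
        sys.exit("patas safety-eval failed: do not deploy")
```

Calling `require_safety_eval()` as the first step of a deployment script stops the script before any patterns are promoted. The `command` parameter exists only so the gate can be exercised with a stand-in command in tests.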


Configuration

Environment Variables

AGGRESSIVENESS_PROFILE=conservative  # conservative, balanced, or aggressive
SAFETY_MODE=CONSERVATIVE              # CONSERVATIVE, BALANCED, or OFF
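
A sketch of how these two variables might be read and validated at startup. `load_safety_settings` is a hypothetical helper, not PATAS code; falling back to the most restrictive values when a variable is unset, and accepting values case-insensitively, are both assumptions.

```python
import os
from typing import Mapping, Tuple

VALID_PROFILES = {"conservative", "balanced", "aggressive"}
VALID_SAFETY_MODES = {"CONSERVATIVE", "BALANCED", "OFF"}


def load_safety_settings(env: Mapping[str, str] = os.environ) -> Tuple[str, str]:
    """Return (aggressiveness_profile, safety_mode) after validating both."""
    # Assumption: unset variables fall back to the conservative settings.
    profile = env.get("AGGRESSIVENESS_PROFILE", "conservative").strip().lower()
    mode = env.get("SAFETY_MODE", "CONSERVATIVE").strip().upper()
    if profile not in VALID_PROFILES:
        raise ValueError(f"invalid AGGRESSIVENESS_PROFILE: {profile!r}")
    if mode not in VALID_SAFETY_MODES:
        raise ValueError(f"invalid SAFETY_MODE: {mode!r}")
    return profile, mode
```

Failing fast on an unrecognized value avoids silently running with a looser profile than intended.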

YAML Configuration

aggressiveness_profile: conservative
safety_mode: CONSERVATIVE

Code References

  • app/v2_safety_mode.py - Safety mode definitions and enforcement
  • app/v2_pattern_quality_tiers.py - Pattern tier classification
  • app/v2_promotion.py - Profile-based pattern filtering
  • tests/test_pattern_safety_profiles.py - Safety profile tests

Last Updated: 2025-11-18
