Safety Profiles

PATAS organizes discovered patterns into three safety profiles based on precision, recall, and false positive rates. Each profile has specific use cases and enforcement recommendations.

Overview

Safety profiles determine which patterns are included and how they can be used:

Profile	Patterns Included	Ham Hit Rate	Spam Recall	Use Case	Auto-Ban?
Conservative	SAFE_AUTO only	≤ 1.5%	20-40%	Low-impact auto actions	✅ Yes (low-risk)
Balanced	SAFE_AUTO + top REVIEW_ONLY	≤ 12%	≥ 60%	Signals, prioritization	❌ No
Aggressive	All except FEATURE_ONLY	≤ 20%	≥ 70%	Research, experiments	❌ No
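
The "Patterns Included" column can be read as a filter over tiered patterns. The following is a minimal sketch of that reading, not the project's actual filtering code: the `Pattern` dataclass and `patterns_for_profile` function are hypothetical names, "top REVIEW_ONLY" is taken to mean precision ≥ 95%, and the Aggressive branch follows the table's "All except FEATURE_ONLY" wording.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pattern:
    """Hypothetical container; the real pattern model may look different."""
    name: str
    tier: str         # "SAFE_AUTO", "REVIEW_ONLY", or "FEATURE_ONLY"
    precision: float  # fraction in [0, 1]


def patterns_for_profile(patterns, profile):
    """Return the patterns a profile would include, following the table above."""
    if profile == "conservative":
        return [p for p in patterns if p.tier == "SAFE_AUTO"]
    if profile == "balanced":
        # Assumption: "top" REVIEW_ONLY means precision >= 95%.
        return [
            p for p in patterns
            if p.tier == "SAFE_AUTO"
            or (p.tier == "REVIEW_ONLY" and p.precision >= 0.95)
        ]
    if profile == "aggressive":
        # Assumption: follows the table's "All except FEATURE_ONLY".
        return [p for p in patterns if p.tier != "FEATURE_ONLY"]
    raise ValueError(f"unknown profile: {profile!r}")
```

Each profile is a superset of the one before it, so moving from Conservative to Aggressive only ever adds patterns.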

Pattern Quality Tiers

Before patterns are assigned to profiles, they are classified into quality tiers:

SAFE_AUTO

Criteria:

Precision: ≥ 98%
Ham hit rate: ≤ 1%
Spam support: ≥ 50 matches

Usage:

Can be auto-promoted to active rules
Used in Conservative profile
Safe for automatic actions (spam labeling, hiding, throttling)

Example:

Pattern with 99% precision, 0.5% ham hit rate, 200 spam matches → SAFE_AUTO

REVIEW_ONLY

Criteria:

Precision: 90-98% OR ham hit rate 1-5%
Spam support: ≥ 20 matches

Usage:

Requires manual review before promotion
Used in Balanced profile (top patterns only, precision ≥ 95%)
Signals only, not for auto-ban

Example:

Pattern with 96% precision, 3% ham hit rate, 80 spam matches → REVIEW_ONLY

FEATURE_ONLY

Criteria:

Precision: < 90% OR ham hit rate > 5%

Usage:

ML features only, never standalone rules
Used in Aggressive profile (for research)
Never for production enforcement

Example:

Pattern with 85% precision, 8% ham hit rate → FEATURE_ONLY
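
The tier criteria above can be expressed as a small decision function. This is a sketch under stated assumptions rather than the logic in app/v2_pattern_quality_tiers.py: `classify_tier` is a hypothetical name, metrics are passed as fractions, and a pattern that meets the REVIEW_ONLY precision and ham-hit-rate bounds but has fewer than 20 spam matches is assumed to fall back to FEATURE_ONLY, a case the criteria do not spell out.

```python
def classify_tier(precision, ham_hit_rate, spam_support):
    """Map pattern metrics (rates as fractions in [0, 1]) to a tier name."""
    # SAFE_AUTO: all three criteria must hold.
    if precision >= 0.98 and ham_hit_rate <= 0.01 and spam_support >= 50:
        return "SAFE_AUTO"
    # FEATURE_ONLY: either metric is outside the REVIEW_ONLY bounds.
    if precision < 0.90 or ham_hit_rate > 0.05:
        return "FEATURE_ONLY"
    # REVIEW_ONLY requires at least 20 spam matches.
    if spam_support >= 20:
        return "REVIEW_ONLY"
    # Assumption: too little support for REVIEW_ONLY falls back to FEATURE_ONLY.
    return "FEATURE_ONLY"
```

Applied to the three examples above, `classify_tier(0.99, 0.005, 200)`, `classify_tier(0.96, 0.03, 80)`, and `classify_tier(0.85, 0.08, 0)` land in SAFE_AUTO, REVIEW_ONLY, and FEATURE_ONLY respectively.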

Conservative Profile

Purpose: Safe for automatic actions on real traffic.

Patterns: Only SAFE_AUTO tier patterns.

Metrics:

Ham hit rate: ≤ 1.5% (maximum acceptable)
Spam recall: 20-40% (typical range)
Precision: ≥ 98% (minimum)

Use Cases:

✅ Auto-spam labeling
✅ Automatic message hiding
✅ Throttling suspicious messages
✅ Low-risk auto-ban (with additional validation)

Auto-Ban: ✅ Yes, but only for low-risk scenarios. Always combine with existing anti-spam signals.

Risk Level: LOW - Safe for automatic actions.

Configuration:

aggressiveness_profile: conservative
safety_mode: CONSERVATIVE

Code Implementation:

Only SAFE_AUTO patterns can trigger auto-ban
Enforced in app/v2_safety_mode.py
Safety guardrails prevent non-SAFE_AUTO patterns from auto-banning

Balanced Profile

Purpose: Signals, prioritization, and soft actions only. NOT for hard bans.

Patterns: SAFE_AUTO + top REVIEW_ONLY patterns (precision ≥ 95%).

Metrics:

Ham hit rate: ≤ 12% (maximum acceptable)
Spam recall: ≥ 60% (typical range)
Precision: ≥ 90% (minimum)

Use Cases:

✅ ML feature signals
✅ Prioritization for manual review
✅ Soft actions (flagging, deprioritization)
✅ Risk scoring (combine with other signals)
❌ NOT for auto-ban

Auto-Ban: ❌ No. Patterns produce scores/flags only, never auto-ban.

Risk Level: MEDIUM - Requires human oversight.

Configuration:

aggressiveness_profile: balanced
safety_mode: BALANCED

Code Implementation:

Patterns produce scores/flags only
can_auto_ban() returns False for all patterns in Balanced mode
Enforced in app/v2_safety_mode.py

Aggressive Profile

Purpose: Research and experimentation only. NEVER for production auto-actions.

Patterns: All patterns except FEATURE_ONLY with very low precision.

Metrics:

Ham hit rate: ≤ 20% (maximum, but still risky)
Spam recall: ≥ 70% (typical range)
Precision: ≥ 85% (minimum)

Use Cases:

✅ Offline analysis
✅ Research experiments
✅ Pattern discovery
✅ Understanding spam evolution
❌ NEVER for production
❌ NEVER for auto-ban

Auto-Ban: ❌ No. Research only.

Risk Level: HIGH - Too many false positives for production.

Configuration:

aggressiveness_profile: aggressive
safety_mode: BALANCED  # Aggressive profile maps to BALANCED safety mode (no auto-ban)

Code Implementation:

Aggressive profile maps to SafetyMode.BALANCED (no auto-ban)
All patterns produce signals only
Enforced in app/v2_safety_mode.py

Safety Guardrails

PATAS includes safety guardrails enforced in code:

SafetyMode Enum

class SafetyMode(str, Enum):
    CONSERVATIVE = "conservative"  # Only SAFE_AUTO patterns can auto-ban
    BALANCED = "balanced"          # Patterns produce scores/flags only
    OFF = "off"                    # No auto decisions, only signals

Enforcement Rules

Conservative mode: Only SAFE_AUTO patterns can trigger auto-ban
Balanced mode: Patterns produce scores/flags only, never auto-ban
Aggressive profile: Maps to BALANCED safety mode (no auto-ban)
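
As an illustration of these rules, the sketch below maps each profile to a safety mode and gates auto-ban on both mode and tier. The enum is repeated so the block runs on its own. `PROFILE_TO_MODE` is a hypothetical name, and the signature of `can_auto_ban` here is an assumption; the real function in app/v2_safety_mode.py may take different arguments.

```python
from enum import Enum


class SafetyMode(str, Enum):
    CONSERVATIVE = "conservative"  # Only SAFE_AUTO patterns can auto-ban
    BALANCED = "balanced"          # Patterns produce scores/flags only
    OFF = "off"                    # No auto decisions, only signals


# Hypothetical lookup table: the Aggressive profile maps to BALANCED.
PROFILE_TO_MODE = {
    "conservative": SafetyMode.CONSERVATIVE,
    "balanced": SafetyMode.BALANCED,
    "aggressive": SafetyMode.BALANCED,
}


def can_auto_ban(tier: str, mode: SafetyMode) -> bool:
    """Assumed signature: auto-ban only for SAFE_AUTO in Conservative mode."""
    return mode is SafetyMode.CONSERVATIVE and tier == "SAFE_AUTO"
```

Writing the check as a single conjunction means every other combination of mode and tier, including any added later, defaults to "no auto-ban".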

Pattern Tier Classification

Patterns are classified into tiers based on:

precision (spam_matches / total_matches)
spam_support (absolute number of spam matches)
ham_hit_rate (ham_matches / total_ham_in_dataset)

Classification logic in app/v2_pattern_quality_tiers.py.
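
The three metrics can be computed from raw match counts. The sketch below uses a hypothetical `pattern_metrics` helper and assumes every matched message is labeled either spam or ham, so that total_matches = spam_matches + ham_matches; empty denominators are reported as 0.0 rather than raising.

```python
def pattern_metrics(spam_matches, ham_matches, total_ham_in_dataset):
    """Compute precision, spam_support, and ham_hit_rate from raw counts."""
    # Assumption: matches are either spam or ham, nothing else.
    total_matches = spam_matches + ham_matches
    precision = spam_matches / total_matches if total_matches else 0.0
    ham_hit_rate = (
        ham_matches / total_ham_in_dataset if total_ham_in_dataset else 0.0
    )
    return {
        "precision": precision,
        "spam_support": spam_matches,
        "ham_hit_rate": ham_hit_rate,
    }
```

For example, a pattern matching 198 spam and 2 ham messages in a dataset with 400 ham messages has a precision of 198/200 = 99% and a ham hit rate of 2/400 = 0.5%.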

Production Recommendations

1. Start with Conservative Profile

Deploy only Conservative profile patterns initially
Use for low-impact auto actions (spam labeling, hiding, throttling)
Monitor ham hit rate in production logs
Track user complaints about false positives

2. Gradually Add Balanced Profile

Use Balanced profile patterns as signals only
Combine with existing risk-scoring systems
Do NOT use Balanced patterns for auto-ban
Use for prioritization and ML features

3. Use Aggressive Profile for Research

Aggressive profile is for offline analysis only
Use for pattern discovery and experiments
Never use for production enforcement
Never use for auto-ban

4. Combine with Existing Systems

Account bans, global blocks, and long-term penalties should be decided by existing systems
Combine PATAS signals with:
- User reports
- Account history and trust scores
- Device / network heuristics
- Behavioral patterns

Running Safety Evaluation

Before deploying PATAS patterns to production:

patas safety-eval

This command:

Evaluates all profiles against safety thresholds
Generates SAFETY_EVAL_REPORT.json
Exits with code 0 if all thresholds pass, non-zero if any threshold is violated
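
The kind of threshold check this describes can be sketched as follows. This is not the implementation of patas safety-eval, and the report format is not assumed: `PROFILE_THRESHOLDS` and `threshold_violations` are hypothetical names, and the limits are copied from the profile sections in this document (the Conservative recall of 20-40% is a typical range, so it is not treated as a threshold).

```python
# Limits taken from the profile sections of this document, as fractions.
PROFILE_THRESHOLDS = {
    "conservative": {"max_ham_hit_rate": 0.015, "min_precision": 0.98},
    "balanced": {
        "max_ham_hit_rate": 0.12,
        "min_spam_recall": 0.60,
        "min_precision": 0.90,
    },
    "aggressive": {
        "max_ham_hit_rate": 0.20,
        "min_spam_recall": 0.70,
        "min_precision": 0.85,
    },
}


def threshold_violations(profile, metrics):
    """Return human-readable violations; an empty list means the profile passes."""
    violations = []
    for key, limit in PROFILE_THRESHOLDS[profile].items():
        bound, metric = key.split("_", 1)  # e.g. ("max", "ham_hit_rate")
        value = metrics[metric]
        if bound == "max" and value > limit:
            violations.append(f"{profile}: {metric}={value:.3f} exceeds {limit:.3f}")
        elif bound == "min" and value < limit:
            violations.append(f"{profile}: {metric}={value:.3f} is below {limit:.3f}")
    return violations
```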

DO NOT DEPLOY if patas safety-eval fails.
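
One way to enforce this rule in a deployment script is to gate on the command's exit code. The sketch below relies only on the documented behavior (exit code 0 on success, non-zero on any violation); `safety_eval_passed` and `main` are hypothetical helper names, and the `command` parameter exists only so the gate can be exercised without PATAS installed.

```python
import subprocess
import sys


def safety_eval_passed(command=("patas", "safety-eval")) -> bool:
    """Run the safety evaluation and report whether it exited with code 0."""
    result = subprocess.run(list(command))
    return result.returncode == 0


def main() -> int:
    """Return a process exit code: 0 to continue deploying, 1 to stop."""
    if not safety_eval_passed():
        print("patas safety-eval failed: do not deploy", file=sys.stderr)
        return 1
    return 0
```

A deploy script could end with `sys.exit(main())` so that a failed evaluation stops the pipeline.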

Configuration

Environment Variables

AGGRESSIVENESS_PROFILE=conservative  # conservative, balanced, or aggressive
SAFETY_MODE=CONSERVATIVE              # CONSERVATIVE, BALANCED, or OFF
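
A minimal sketch of reading and validating these variables is shown below. It is not PATAS's own configuration loader: `load_settings` is a hypothetical helper, and falling back to the conservative values when a variable is unset is an assumption chosen because it is the most restrictive option.

```python
import os

VALID_PROFILES = ("conservative", "balanced", "aggressive")
VALID_SAFETY_MODES = ("CONSERVATIVE", "BALANCED", "OFF")


def load_settings(env=None):
    """Read and validate the two settings from a mapping (default: os.environ)."""
    env = os.environ if env is None else env
    # Assumption: unset variables fall back to the conservative values.
    profile = env.get("AGGRESSIVENESS_PROFILE", "conservative").strip().lower()
    mode = env.get("SAFETY_MODE", "CONSERVATIVE").strip().upper()
    if profile not in VALID_PROFILES:
        raise ValueError(f"invalid AGGRESSIVENESS_PROFILE: {profile!r}")
    if mode not in VALID_SAFETY_MODES:
        raise ValueError(f"invalid SAFETY_MODE: {mode!r}")
    return {"aggressiveness_profile": profile, "safety_mode": mode}
```

Rejecting unknown values at startup avoids silently running with a mistyped profile name.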

YAML Configuration

aggressiveness_profile: conservative
safety_mode: CONSERVATIVE

Code References

app/v2_safety_mode.py - Safety mode definitions and enforcement
app/v2_pattern_quality_tiers.py - Pattern tier classification
app/v2_promotion.py - Profile-based pattern filtering
tests/test_pattern_safety_profiles.py - Safety profile tests

Last Updated: 2025-11-18
