Safety Guide

PATAS Safety & Enforcement Guide

Purpose: This document explains how PATAS should be integrated into spam detection infrastructure, with clear safety boundaries and enforcement recommendations.

Enforcement & Safety Model

PATAS is designed as a high-precision spam pattern signal engine, not as a standalone enforcement system.

Core Architecture: PATAS Core is a deterministic, rule-based engine. ML/LLM integration (if used) is optional and external, running inside your infrastructure for pattern discovery.

Core Principle

PATAS provides signals and patterns. You control enforcement decisions.

We strongly recommend:

✅ Conservative Profile → Low-Risk Automatic Actions

Use the Conservative profile only for low-impact automatic actions:

✅ Spam labeling (marking messages as spam for filtering)
✅ Hiding / deprioritizing messages (reducing visibility without blocking)
✅ Rate-limiting (temporary throttling of suspicious senders)
✅ Soft actions that can be easily reversed

Metrics:

Ham hit rate: ≤ 1.5%
Spam recall: 20-40%
Precision: ≥ 98%

Safety: The Conservative profile is safe for low-impact automatic actions because it includes only SAFE_AUTO tier patterns, which have very low false positive rates.
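The three metrics above can be computed from labeled evaluation counts. The following is a minimal sketch, assuming "ham hit rate" means the share of legitimate (ham) messages that a profile matches; the function name and arguments are hypothetical and are not part of the PATAS API.

```python
def profile_metrics(spam_matched: int, spam_total: int,
                    ham_matched: int, ham_total: int) -> dict[str, float]:
    """Compute ham hit rate, spam recall, and precision from labeled counts."""
    if spam_total <= 0 or ham_total <= 0:
        raise ValueError("both spam_total and ham_total must be positive")
    matched = spam_matched + ham_matched
    return {
        # Share of legitimate messages the profile matched (false positives).
        "ham_hit_rate": ham_matched / ham_total,
        # Share of spam messages the profile matched.
        "spam_recall": spam_matched / spam_total,
        # Share of matched messages that were actually spam.
        "precision": spam_matched / matched if matched else 0.0,
    }


metrics = profile_metrics(spam_matched=300, spam_total=1000,
                          ham_matched=5, ham_total=10_000)
```

Note that precision depends on the spam/ham mix of the evaluation set, so a low ham hit rate does not by itself guarantee a given precision.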

⚠️ Balanced / Aggressive Profiles → Signals Only

Use Balanced and Aggressive profiles as signals only:

✅ Input features into your existing risk-scoring systems
✅ Prioritization for manual review queues
✅ Research / experiments for pattern discovery
❌ NOT for auto-ban or irreversible actions

Metrics:

Balanced: 9.75% ham hit rate, 66.87% recall
Aggressive: 19.12% ham hit rate, 72.84% recall

Why signals only: These profiles have higher recall but also higher false positive rates. They should be combined with other signals before making enforcement decisions.

❌ Permanent / High-Impact Enforcement

Keep permanent or high-impact enforcement (account bans, global blocks, long-term penalties) under your existing systems that combine:

✅ User reports
✅ Account history and trust scores
✅ Device / network heuristics
✅ Behavioral patterns
✅ PATAS signals as just one of the inputs

We do NOT recommend wiring PATAS patterns directly into irreversible enforcement without combining them with your internal signals and policies.
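One way to treat PATAS as "just one of the inputs" is a weighted risk score in which a pattern match alone cannot reach the escalation threshold. The sketch below is illustrative only: the signal names, weights, and threshold are assumptions, not values shipped with PATAS.

```python
from dataclasses import dataclass


@dataclass
class AccountSignals:
    user_reports: int     # distinct user reports against the account
    trust_score: float    # 0.0 (untrusted) .. 1.0 (fully trusted)
    device_risk: float    # 0.0 .. 1.0 from device / network heuristics
    behavior_risk: float  # 0.0 .. 1.0 from behavioral patterns
    patas_match: bool     # True if any PATAS pattern matched


# Hypothetical weights. The PATAS weight stays below BAN_THRESHOLD so that
# a pattern match on its own can never trigger permanent enforcement.
WEIGHTS = {"reports": 0.30, "distrust": 0.20, "device": 0.15,
           "behavior": 0.15, "patas": 0.20}
BAN_THRESHOLD = 0.70


def risk_score(s: AccountSignals) -> float:
    reports = min(s.user_reports, 5) / 5  # saturate at 5 reports
    return (WEIGHTS["reports"] * reports
            + WEIGHTS["distrust"] * (1.0 - s.trust_score)
            + WEIGHTS["device"] * s.device_risk
            + WEIGHTS["behavior"] * s.behavior_risk
            + WEIGHTS["patas"] * float(s.patas_match))


def should_escalate_for_ban(s: AccountSignals) -> bool:
    """Escalate to the existing enforcement pipeline, not to an auto-ban."""
    return risk_score(s) >= BAN_THRESHOLD
```

With these weights, a trusted account with no reports and only a PATAS match scores 0.20 and is not escalated.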

Safety Profiles Overview

Profile	Patterns	Ham Hit Rate	Spam Recall	Use Case	Automatic Actions?
Conservative	SAFE_AUTO only	≤ 1.5%	20-40%	Low-impact auto actions	✅ Yes (low-impact only)
Balanced	SAFE_AUTO + top REVIEW_ONLY	≤ 12%	≥ 60%	Signals, prioritization	❌ No
Aggressive	All except FEATURE_ONLY	≤ 20%	≥ 70%	Research, experiments	❌ No
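The thresholds in the table can be captured as configuration and checked against measured values. This is a sketch under stated assumptions: the class, dictionary, and function names are hypothetical, and the lower bound of the Conservative recall range (20%) is used as its minimum.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProfileThresholds:
    max_ham_hit_rate: float
    min_spam_recall: float
    auto_actions_allowed: bool


# Values copied from the profile table in this guide.
PROFILE_THRESHOLDS = {
    "conservative": ProfileThresholds(0.015, 0.20, True),
    "balanced": ProfileThresholds(0.12, 0.60, False),
    "aggressive": ProfileThresholds(0.20, 0.70, False),
}


def threshold_violations(profile: str, ham_hit_rate: float,
                         spam_recall: float) -> list[str]:
    """Return a human-readable message for each violated threshold."""
    limits = PROFILE_THRESHOLDS[profile]
    problems = []
    if ham_hit_rate > limits.max_ham_hit_rate:
        problems.append(
            f"{profile}: ham hit rate {ham_hit_rate:.2%} "
            f"exceeds {limits.max_ham_hit_rate:.2%}")
    if spam_recall < limits.min_spam_recall:
        problems.append(
            f"{profile}: spam recall {spam_recall:.2%} "
            f"is below {limits.min_spam_recall:.2%}")
    return problems
```

For example, the Balanced metrics reported earlier (9.75% ham hit rate, 66.87% recall) produce no violations under these thresholds.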

Rule Safety Categories

PATAS classifies rules into three safety categories:

AUTO_SAFE (Ready for Automation)

Precision: ≥ 95% (from shadow evaluation)
False Positive Rate: ≤ 1%
Minimum hits: ≥ 10 (sufficient sample size)
Usage: Can be auto-promoted to active rules for Conservative profile
Criteria: Must pass deterministic checks AND (high precision evaluation OR whitelist patterns with AND conditions)
Description: High confidence, low risk (<10%). Specific rules with very few false positives. Safe for automatic application.

REQUIRES_REVIEW (Requires Human Verification)

Precision: 90-98% OR ham hit rate 1-5%
Usage: Requires manual review before promotion
Use in: Balanced profile (top patterns only)
Description: Good pattern, but may catch edge cases. Recommended for quarantine or manual review.

DANGEROUS (Insight Only - High Risk)

Pattern characteristics: Too broad (e.g., "Hello", "work", "time", stop words, very short patterns <3 chars)
SQL status: Automatically commented out to prevent accidental execution
Usage: Insight only - useful for understanding attack trends, NOT for enforcement
Description: Pattern is too broad and would cause many false positives. SQL is commented out. Useful for research and understanding spam evolution.
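The three categories can be read as a decision procedure. The sketch below is a simplified illustration, not the logic in app/v2_rule_safety_classifier.py: it uses the numeric criteria listed above, a tiny illustrative stop-word set, and hypothetical function names, and it omits the deterministic checks and whitelist conditions.

```python
from enum import Enum


class RuleSafety(Enum):
    AUTO_SAFE = "auto_safe"
    REQUIRES_REVIEW = "requires_review"
    DANGEROUS = "dangerous"


# Illustrative subset only; a real stop-word list would be much larger.
STOP_WORDS = {"hello", "work", "time"}


def is_too_broad(pattern: str) -> bool:
    """Flag stop words and very short patterns (< 3 characters)."""
    text = pattern.strip().lower()
    return len(text) < 3 or text in STOP_WORDS


def classify_rule(pattern: str, precision: float,
                  false_positive_rate: float, hits: int) -> RuleSafety:
    # Breadth is checked first: a broad pattern is never promoted,
    # whatever its measured precision.
    if is_too_broad(pattern):
        return RuleSafety.DANGEROUS
    if precision >= 0.95 and false_positive_rate <= 0.01 and hits >= 10:
        return RuleSafety.AUTO_SAFE
    return RuleSafety.REQUIRES_REVIEW
```

Checking breadth before the metrics means a pattern such as "Hello" stays insight-only even if it happened to score well on a small sample.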

Running Safety Evaluation

Before deploying PATAS patterns to production:

patas safety-eval

This command:

Evaluates all profiles against safety thresholds
Generates SAFETY_EVAL_REPORT.json
Exits with code 0 if all thresholds pass, non-zero if any threshold is violated

DO NOT DEPLOY if patas safety-eval fails.
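The exit-code contract makes it straightforward to gate a deployment script on the evaluation. Here is a minimal sketch; only the `patas safety-eval` command and its exit-code behavior come from this guide, while the wrapper function names are hypothetical.

```python
import subprocess
import sys
from typing import Sequence


def safety_gate(command: Sequence[str] = ("patas", "safety-eval")) -> bool:
    """Return True only if the command runs and exits with code 0."""
    try:
        result = subprocess.run(list(command), check=False)
    except FileNotFoundError:
        # A missing CLI is treated as a failed gate, not as a pass.
        return False
    return result.returncode == 0


def deploy_or_abort() -> None:
    """Abort with a non-zero exit code if the safety evaluation fails."""
    if not safety_gate():
        print("patas safety-eval failed; aborting deployment", file=sys.stderr)
        sys.exit(1)
    print("Safety thresholds passed; continuing deployment")
```

Calling `deploy_or_abort()` at the start of a release script keeps the pipeline fail-closed: any non-zero exit, or a missing binary, stops the rollout.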

Integration Recommendations

1. Start with Conservative Profile

Deploy only Conservative profile patterns initially
Monitor ham hit rate in production logs
Track user complaints about false positives

2. Gradually Add Balanced Profile

Use Balanced profile patterns as signals only
Combine with existing risk-scoring systems
Do NOT use Balanced patterns for auto-ban

3. Use Aggressive Profile for Research

Aggressive profile is for offline analysis only
Use for pattern discovery and experiments
Never use for production enforcement

Safety Guardrails in Code

PATAS includes safety guardrails enforced in code:

SafetyMode enum: CONSERVATIVE, BALANCED, OFF
Conservative mode: Only AUTO_SAFE patterns can trigger automatic actions
Balanced mode: Patterns produce scores/flags only, never auto-ban
SQL safety: Only SELECT queries, whitelisted columns, no match-everything rules
Rule Safety Classifier: Three-category system (AUTO_SAFE / REQUIRES_REVIEW / DANGEROUS)
- AUTO_SAFE: Ready for Automation - Precision ≥ 95%, FPR ≤ 1%, whitelist patterns with AND conditions. Can be automatically applied.
- REQUIRES_REVIEW: Requires Human Verification - Good pattern, but may catch edge cases. Recommended for quarantine or manual review.
- DANGEROUS: Insight Only (High Risk) - Pattern too broad (e.g., "Hello", "work", "time"). SQL is commented out to prevent accidental execution. Useful for understanding attack trends.
Auto-fix SQL errors: Simple SQL errors are fixed automatically during shadow evaluation
Stop words filtering: Common words are blocked to prevent false positives

See app/v2_safety_mode.py, app/v2_sql_safety.py, and app/v2_rule_safety_classifier.py for implementation details.
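To make the guardrails concrete, here is a rough sketch of how a mode check and a SELECT-only check could look. It is not the code in the files listed above: the function names are hypothetical, the OFF mode is treated as "no automatic action" purely as a fail-closed assumption, and the SQL check is a deliberately crude string test rather than a parser.

```python
from enum import Enum


class SafetyMode(Enum):
    CONSERVATIVE = "conservative"
    BALANCED = "balanced"
    OFF = "off"


def may_auto_act(mode: SafetyMode, rule_category: str) -> bool:
    """Allow automatic actions only for AUTO_SAFE rules in CONSERVATIVE mode.

    BALANCED and OFF both return False here; the OFF behavior is an
    assumption made for this sketch.
    """
    return mode is SafetyMode.CONSERVATIVE and rule_category == "AUTO_SAFE"


def is_select_only(sql: str) -> bool:
    """Accept a single statement that starts with SELECT.

    Any semicolon other than a trailing one is rejected, which also rejects
    legitimate queries containing ';' inside string literals.
    """
    statement = sql.strip().rstrip(";").strip()
    if ";" in statement:
        return False
    return statement.upper().startswith("SELECT ")
```

Erring on the side of rejection is intentional: a guardrail that occasionally refuses a valid query is cheaper than one that lets a destructive statement through.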

Monitoring & Rollback

Continuous Monitoring

After deployment:

Monitor ham hit rate in production logs
Track user complaints about false positives
Re-run patas safety-eval if:
- Pattern mining discovers new patterns
- Pattern quality thresholds change
- Significant changes to pattern SQL generation
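Production monitoring of the ham hit rate can be as simple as a running counter compared against the profile threshold. The sketch below assumes you can label a sample of production messages as ham (for example from appeals or manual review); the class name and the minimum sample size are hypothetical, and the 1.5% default is the Conservative threshold from this guide.

```python
from dataclasses import dataclass


@dataclass
class HamHitMonitor:
    max_ham_hit_rate: float = 0.015  # Conservative threshold from this guide
    min_samples: int = 1000          # hypothetical minimum before alerting
    ham_seen: int = 0
    ham_matched: int = 0

    def record(self, is_ham: bool, matched: bool) -> None:
        """Count one labeled message; spam messages are ignored here."""
        if is_ham:
            self.ham_seen += 1
            if matched:
                self.ham_matched += 1

    @property
    def ham_hit_rate(self) -> float:
        return self.ham_matched / self.ham_seen if self.ham_seen else 0.0

    def threshold_violated(self) -> bool:
        """True once enough ham was seen and the rate exceeds the limit."""
        return (self.ham_seen >= self.min_samples
                and self.ham_hit_rate > self.max_ham_hit_rate)
```

Requiring a minimum sample size avoids triggering a rollback on the first few false positives after a deployment.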

Rollback Procedure

If safety thresholds are violated:

Immediately disable affected patterns
Review SAFETY_EVAL_REPORT.json for violations
Fix issues before re-enabling patterns
Re-run patas safety-eval to confirm fixes

Summary

TL;DR for Engineers:

✅ Use Conservative profile for low-impact auto actions (labeling, hiding, throttling)
⚠️ Use Balanced profile as signals only (NOT for auto-ban)
❌ Use Aggressive profile for research only (NOT for production)
🔒 Keep permanent enforcement (bans, blocks) in your existing systems
✅ Always run patas safety-eval before deployment
📊 Monitor ham hit rates and user complaints continuously

PATAS provides the "brain" for pattern detection. You control the "trigger" for enforcement.

Last Updated: 2025-11-21
