Engineering Notes

Purpose: This document provides a concise technical overview of PATAS for engineering teams, focusing on safety, enforcement model, SQL rule constraints, LLM usage, and privacy.

Read This First

What PATAS Is

PATAS is a deterministic, rule-based spam pattern signal engine, not an enforcement system.

Key Points:

✅ Signal engine, not enforcement - PATAS provides patterns and signals. You control enforcement decisions.
✅ No ML/LLM dependency - Core is deterministic and rule-based. ML/LLM (if used) is optional and external.
✅ Safe Conservative profile - Recommended for production auto-actions (spam labeling, hiding, throttling).
✅ No outbound data flow - All processing happens inside your infrastructure by default.

Key Documentation

Safety Guide - Safety profiles, enforcement model, guardrails
Privacy and Data Protection - Privacy modes, data retention, on-prem deployment
LLM Usage - LLM role (optional), what data it sees, privacy guarantees

Minimal Safe Deployment Recipe

For first deployment, use these settings:

# config.yaml
privacy_mode: STRICT          # No external LLM, minimal logging
aggressiveness_profile: conservative  # Only SAFE_AUTO patterns
safety_mode: CONSERVATIVE     # Only SAFE_AUTO can trigger auto-actions
enable_llm: false             # No LLM required (deterministic rule-based)

Before deployment, always run:

patas safety-eval

This validates all profiles against safety thresholds. DO NOT DEPLOY if it fails (exit code ≠ 0).

Production recommendation:

Start with Conservative profile only for auto-actions
Use Balanced/Aggressive profiles as signals only (NOT for auto-ban)
Monitor ham hit rates and user complaints continuously

What PATAS Does

PATAS is a deterministic, rule-based pattern engine that:

Discovers spam patterns from historical message data (offline)
Generates SQL rules that match spam patterns (safe SELECT queries only)
Evaluates rules on historical data to measure precision/recall
Classifies rules into quality tiers (SAFE_AUTO / REVIEW_ONLY / FEATURE_ONLY)
Provides signals to your existing spam detection systems

Key Point: PATAS provides signals and patterns. You control enforcement decisions.

Safety Profiles

PATAS patterns are organized into three safety profiles:

Conservative Profile

Patterns: Only SAFE_AUTO tier (precision ≥98%, ham hit rate ≤1%)
Metrics: 24.91% recall, 1.11% ham rate
Use Case: ✅ Low-impact auto actions (spam labeling, hiding, throttling)
Auto-Ban: ✅ Yes (low-impact auto actions only; permanent bans stay in your own systems)

Balanced Profile

Patterns: SAFE_AUTO + top REVIEW_ONLY (precision ≥95%)
Metrics: 66.87% recall, 9.75% ham rate
Use Case: ⚠️ Signals only (ML features, prioritization, soft actions)
Auto-Ban: ❌ No (signals only)

Aggressive Profile

Patterns: All except FEATURE_ONLY
Metrics: 72.84% recall, 19.12% ham rate
Use Case: ❌ Research only (offline analysis, experiments)
Auto-Ban: ❌ No (research only)

Recommendation: Start with Conservative profile for production. Use Balanced/Aggressive as signals only.
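The split between profiles can be expressed as a small routing function in the system that consumes PATAS signals. The following is a minimal sketch, not part of PATAS itself: the function name `route_hit`, the action names, and the return labels are all hypothetical and chosen only for illustration.

```python
# Hypothetical names for the low-impact actions this document lists
# (spam labeling, hiding, throttling).
LOW_IMPACT_ACTIONS = {"label_spam", "hide", "throttle"}


def route_hit(profile: str, requested_action: str) -> str:
    """Decide how a pattern hit from a given safety profile may be used."""
    if profile == "conservative" and requested_action in LOW_IMPACT_ACTIONS:
        # Only Conservative hits may trigger an automatic, low-impact action.
        return "auto_action"
    if profile in ("conservative", "balanced"):
        # Everything else from these profiles is forwarded as a signal only.
        return "signal_only"
    if profile == "aggressive":
        # Aggressive hits are for offline research, never for production.
        return "offline_only"
    raise ValueError(f"unknown profile: {profile}")
```

Under this sketch a Conservative hit that requests a ban is downgraded to a signal, which leaves the final decision to your own enforcement system.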

SQL Rule Safety

Constraints

All SQL rules are validated for safety:

Only SELECT queries: No UPDATE/DELETE/INSERT/ALTER
Whitelisted tables: Only messages, reports (configurable)
Whitelisted columns: Only safe columns (id, text, is_spam, timestamp, language, sender, source, country, has_media, etc.)
No semicolons: Prevents command chaining
No subqueries: Prevents complex SQL injection
No match-everything: Rejects WHERE 1=1, an empty WHERE clause, and any rule matching >80% of messages
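To make the constraints above concrete, here is a simplified sketch of how such static checks could look. It is an illustration under stated assumptions, not the contents of app/v2_sql_safety.py: the function `find_violations`, the regular expressions, and the violation labels are hypothetical, the column check covers only the SELECT list, and the >80% coverage rule is left out because it needs data rather than static analysis.

```python
import re

ALLOWED_TABLES = {"messages", "reports"}
ALLOWED_COLUMNS = {"id", "text", "is_spam", "timestamp", "language",
                   "sender", "source", "country", "has_media"}
FORBIDDEN = re.compile(r"\b(UPDATE|DELETE|INSERT|ALTER)\b", re.IGNORECASE)
STRING_LITERAL = re.compile(r"'(?:[^']|'')*'")


def find_violations(sql: str) -> list[str]:
    """Return a list of violated constraints; an empty list means the rule passes."""
    problems = []
    if ";" in sql:
        problems.append("semicolon")
    # Blank out string literals so keywords inside LIKE patterns are ignored.
    bare = STRING_LITERAL.sub("''", sql)
    if not re.match(r"\s*SELECT\b", bare, re.IGNORECASE):
        problems.append("not a SELECT query")
    if FORBIDDEN.search(bare):
        problems.append("forbidden operation")
    if len(re.findall(r"\bSELECT\b", bare, re.IGNORECASE)) > 1:
        problems.append("subquery")
    tables = re.findall(r"\b(?:FROM|JOIN)\s+([A-Za-z_]\w*)", bare, re.IGNORECASE)
    if not tables or any(t.lower() not in ALLOWED_TABLES for t in tables):
        problems.append("table not whitelisted")
    select_list = re.match(r"\s*SELECT\s+(.*?)\s+FROM\b", bare,
                           re.IGNORECASE | re.DOTALL)
    columns = select_list.group(1).split(",") if select_list else []
    if not columns or any(c.strip().lower() not in ALLOWED_COLUMNS for c in columns):
        problems.append("column not whitelisted")
    where = re.search(r"\bWHERE\b(.*)", bare, re.IGNORECASE | re.DOTALL)
    if where is None or not where.group(1).strip():
        problems.append("empty WHERE")
    elif re.search(r"\b1\s*=\s*1\b", where.group(1)):
        problems.append("match-everything WHERE")
    return problems
```

A regex-based check like this is deliberately strict and will reject some harmless queries; a production validator would more likely use a real SQL parser.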

Validation Flow

LLM/Pattern Mining → SQL Rule Proposal
  ↓
SQL Safety Validation (app/v2_sql_safety.py)
  - Check whitelisted tables/columns
  - Check for dangerous operations
  - Check for match-everything patterns
  ↓
Coverage Check (if rule matches >80% of messages, reject)
  ↓
LLM Quality Validation (optional, if LLM available)
  - Assess false positive risks (low/medium/high)
  - Reject high-risk rules
  - Log warnings for medium-risk rules
  ↓
Offline Evaluation (test on historical data)
  ↓
Tier Classification (SAFE_AUTO / REVIEW_ONLY / FEATURE_ONLY)
  ↓
Safety Profile Assignment (Conservative / Balanced / Aggressive)
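The coverage check, offline evaluation, and tier classification steps in this flow can be sketched as follows. This is a hedged illustration, not the PATAS implementation: the names `RuleStats`, `evaluate`, and `classify` are hypothetical, ham hit rate is assumed to mean matched ham divided by all ham, and the 0.95 cut-off between REVIEW_ONLY and FEATURE_ONLY is an assumption borrowed from the Balanced profile description. Only the SAFE_AUTO bounds and the 80% coverage limit come from this document.

```python
from dataclasses import dataclass


@dataclass
class RuleStats:
    spam_hits: int    # spam messages the rule matched
    ham_hits: int     # legitimate messages the rule matched
    total_spam: int   # all spam messages in the evaluation set
    total_ham: int    # all legitimate messages in the evaluation set


def evaluate(stats: RuleStats) -> dict:
    """Compute precision, recall, ham hit rate, and coverage as fractions."""
    matched = stats.spam_hits + stats.ham_hits
    total = stats.total_spam + stats.total_ham
    return {
        "precision": stats.spam_hits / matched if matched else 0.0,
        "recall": stats.spam_hits / stats.total_spam if stats.total_spam else 0.0,
        "ham_hit_rate": stats.ham_hits / stats.total_ham if stats.total_ham else 0.0,
        "coverage": matched / total if total else 0.0,
    }


def classify(stats: RuleStats, review_min_precision: float = 0.95) -> str:
    """Map a rule's offline metrics to a tier, or reject it outright."""
    m = evaluate(stats)
    if m["coverage"] > 0.80:
        return "REJECTED"        # matches too much traffic to be a useful rule
    if m["precision"] >= 0.98 and m["ham_hit_rate"] <= 0.01:
        return "SAFE_AUTO"       # bounds stated for the Conservative profile
    if m["precision"] >= review_min_precision:
        return "REVIEW_ONLY"     # assumed cut-off, see the note above
    return "FEATURE_ONLY"
```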

Example Safe SQL Rule

SELECT id, is_spam FROM messages 
WHERE LOWER(text) LIKE '%earn money%' 
  OR LOWER(text) LIKE '%get rich%' 
  OR LOWER(text) LIKE '%make cash%'
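To see what this rule matches, it can be run against a throwaway in-memory SQLite table. The sketch below is only a demonstration: the sample rows are invented for illustration, the three-column table is a reduced stand-in for the real messages table, and the helper `run_rule` is hypothetical.

```python
import sqlite3

RULE = """
SELECT id, is_spam FROM messages
WHERE LOWER(text) LIKE '%earn money%'
   OR LOWER(text) LIKE '%get rich%'
   OR LOWER(text) LIKE '%make cash%'
"""

# Invented sample rows: (id, text, is_spam)
SAMPLE = [
    (1, "Earn money fast from home!", 1),
    (2, "Lunch at noon?", 0),
    (3, "GET RICH with this one trick", 1),
    (4, "How do I make cash deposits at the ATM?", 0),
]


def run_rule(rows):
    """Load rows into an in-memory table and return the rule's matches."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE messages (id INTEGER PRIMARY KEY, text TEXT, is_spam INTEGER)"
    )
    conn.executemany(
        "INSERT INTO messages (id, text, is_spam) VALUES (?, ?, ?)", rows
    )
    hits = conn.execute(RULE).fetchall()
    conn.close()
    return sorted(hits)


hits = run_rule(SAMPLE)
spam_hits = sum(is_spam for _, is_spam in hits)
print(hits, f"precision={spam_hits / len(hits):.2f}")
```

Row 4 is a legitimate message that still matches '%make cash%', which is why each rule is evaluated on historical data before it is assigned a tier.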

LLM Role and Quality (Optional / External)

Important: PATAS Core does NOT require or ship with an LLM. LLM integration (if used) is optional and external to PATAS Core, and it runs inside your infrastructure.

If You Use LLM (Optional)

If you integrate an LLM for pattern discovery:

✅ Runs inside your infrastructure - LLM runs on your servers, under your control
✅ Offline pattern discovery - Analyzes aggregated spam signals and a small set of sample spam messages (not the live message stream)
✅ SQL rule generation - Proposes safe SELECT queries
✅ Semantic pattern identification - Finds patterns based on meaning, not exact words
✅ Generates candidate patterns - LLM outputs go through same PATAS evaluation pipeline

What LLM Does NOT Do (Even If Integrated)

❌ Does NOT classify messages in real time
❌ Does NOT make ban/unban decisions
❌ Does NOT process individual messages online
❌ Does NOT send data to external services unless you explicitly configure it

Default: No LLM Required

PATAS Core works without an LLM:

✅ Rule-based pattern mining (default, no LLM required)
✅ Deterministic evaluation and tiering
✅ Safety profiles and guardrails
✅ No external API dependencies

LLM Data Privacy

What LLM Sees (during pattern mining):

Aggregated signals: Top URLs, top keywords, sample spam messages (limited to 10 examples)
NOT individual user messages in real-time
NOT full message history
NOT user identifiers (sender IDs, user names, etc.)
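As an illustration of the list above, the following sketch builds an aggregated payload from a batch of spam messages. It is an assumption-laden example rather than the PATAS code: `build_llm_payload`, the payload keys, the tokenization regexes, and the `top_n` parameter are hypothetical. The only parts taken from this document are the three kinds of signal and the limit of 10 sample messages.

```python
import re
from collections import Counter

URL_RE = re.compile(r"https?://\S+")
WORD_RE = re.compile(r"[a-z]{3,}")
MAX_SAMPLES = 10  # sample limit stated in this document


def build_llm_payload(spam_messages, top_n=5):
    """Aggregate spam messages into the signals an LLM would receive."""
    url_counts, keyword_counts, samples = Counter(), Counter(), []
    for message in spam_messages:
        text = message["text"]  # the only field read; sender is never touched
        url_counts.update(URL_RE.findall(text))
        # Count keywords on the text with URLs removed.
        keyword_counts.update(WORD_RE.findall(URL_RE.sub(" ", text).lower()))
        if len(samples) < MAX_SAMPLES:
            samples.append(text)
    return {
        "top_urls": url_counts.most_common(top_n),
        "top_keywords": keyword_counts.most_common(top_n),
        "sample_messages": samples,
    }
```

Dropping the sender field does not remove identifiers that appear inside the message text itself, so sample messages may still need scrubbing before they leave the database.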

Privacy Guarantees:

On-prem deployment by default
LLM provider configurable by operator (internal endpoint or external)
No hardcoded external calls
STRICT privacy mode available (disables external LLM by default)

Privacy & Data Protection

On-Prem Deployment

PATAS is designed for on-premises deployment by default:

✅ Runs on your infrastructure
✅ No telemetry or external calls unless explicitly configured
✅ All data stays within your infrastructure
✅ LLM provider is configurable (internal endpoint or external, operator's choice)

Privacy Modes

STANDARD Mode (default):

External LLM providers can be used (if configured by operator)
Full logging available for debugging
Operator controls all external endpoints

STRICT Mode:

External LLM providers disabled by default
Logs avoid storing full message texts (only ids + pattern ids / counts)
No telemetry or external calls unless explicitly configured
Only internal/on-prem LLM endpoints allowed
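The difference between the two modes can be sketched with two small helpers. These are hypothetical illustrations, not PATAS functions: `build_log_record`, `llm_endpoint_allowed`, the record field names, and the operator-supplied `internal_hosts` set are all assumptions.

```python
from urllib.parse import urlparse


def build_log_record(privacy_mode, message_id, pattern_id, text):
    """Build a log entry; STRICT mode keeps ids only and drops the message text."""
    record = {"message_id": message_id, "pattern_id": pattern_id}
    if privacy_mode == "STANDARD":
        record["text"] = text  # full logging for debugging
    elif privacy_mode != "STRICT":
        raise ValueError(f"unknown privacy mode: {privacy_mode}")
    return record


def llm_endpoint_allowed(privacy_mode, endpoint_url, internal_hosts):
    """In STRICT mode, allow only endpoints whose host the operator marked internal."""
    if privacy_mode == "STANDARD":
        return True
    return urlparse(endpoint_url).hostname in internal_hosts
```

For example, with `internal_hosts={"llm.internal"}`, STRICT mode would accept `http://llm.internal:8000/v1` and refuse any other host.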

Safety Evaluation

Pre-Deployment Check

Before deploying PATAS patterns to production:

patas safety-eval

This command:

Evaluates all profiles against safety thresholds
Generates SAFETY_EVAL_REPORT.json
Exits with code 0 if all thresholds pass, non-zero if any threshold is violated

DO NOT DEPLOY if patas safety-eval fails.
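In a deployment pipeline, this rule can be enforced by checking the exit code programmatically. The sketch below relies only on the documented behavior that `patas safety-eval` exits with 0 on success and non-zero on failure; the wrapper `safety_gate` is a hypothetical helper, and treating a missing executable as a failure is a design choice of this example.

```python
import subprocess


def safety_gate(command=("patas", "safety-eval")) -> bool:
    """Return True only if the safety evaluation command exits with code 0."""
    try:
        result = subprocess.run(list(command), check=False)
    except FileNotFoundError:
        # If the command cannot be found, fail closed rather than deploy.
        return False
    return result.returncode == 0
```

A deploy script would call `safety_gate()` first and abort when it returns False.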

Safety Thresholds

Conservative Profile:

Ham hit rate ≤ 1.5%
Spam recall ≥ 20% and ≤ 40%
Precision ≥ 98%

Balanced Profile:

Ham hit rate ≤ 12%
Spam recall ≥ 60%
Precision ≥ 90%

Aggressive Profile:

Ham hit rate ≤ 20%
Spam recall ≥ 70%
Precision ≥ 85%
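The thresholds above translate directly into a table-driven check. This is a sketch of the comparison only, not the logic inside `patas safety-eval`; the names `THRESHOLDS` and `threshold_violations` are hypothetical, and all values are percentages.

```python
# Thresholds copied from this document, in percent. None means no upper bound.
THRESHOLDS = {
    "conservative": {"max_ham": 1.5, "min_recall": 20.0, "max_recall": 40.0, "min_precision": 98.0},
    "balanced": {"max_ham": 12.0, "min_recall": 60.0, "max_recall": None, "min_precision": 90.0},
    "aggressive": {"max_ham": 20.0, "min_recall": 70.0, "max_recall": None, "min_precision": 85.0},
}


def threshold_violations(profile, ham_hit_rate, recall, precision):
    """Return human-readable violations for one profile; empty list means pass."""
    limits = THRESHOLDS[profile]
    problems = []
    if ham_hit_rate > limits["max_ham"]:
        problems.append(f"ham hit rate {ham_hit_rate}% > {limits['max_ham']}%")
    if recall < limits["min_recall"]:
        problems.append(f"recall {recall}% < {limits['min_recall']}%")
    if limits["max_recall"] is not None and recall > limits["max_recall"]:
        problems.append(f"recall {recall}% > {limits['max_recall']}%")
    if precision < limits["min_precision"]:
        problems.append(f"precision {precision}% < {limits['min_precision']}%")
    return problems
```

With the Conservative metrics reported earlier (24.91% recall, 1.11% ham hit rate) and any precision of at least 98%, this check returns an empty list.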

Integration Recommendations

1. Start with Conservative Profile

Deploy only Conservative profile patterns initially
Use for low-impact auto actions (spam labeling, hiding, throttling)
Monitor ham hit rate in production logs
Track user complaints about false positives

2. Gradually Add Balanced Profile

Use Balanced profile patterns as signals only
Combine with existing risk-scoring systems
Do NOT use Balanced patterns for auto-ban

3. Use Aggressive Profile for Research

Aggressive profile is for offline analysis only
Use for pattern discovery and experiments
Never use for production enforcement

4. Keep Permanent Enforcement in Your Systems

Account bans, global blocks, and long-term penalties should be decided by your existing systems
Combine PATAS signals with:
- User reports
- Account history and trust scores
- Device / network heuristics
- Behavioral patterns
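One way to read this recommendation is that a PATAS hit alone never triggers a permanent action and must be corroborated by an independent signal. The sketch below shows that idea on the consumer side. Everything in it is hypothetical: the `AccountSignals` fields, the 0.0 to 1.0 trust scale, and the default thresholds are placeholders that your own enforcement system would define.

```python
from dataclasses import dataclass


@dataclass
class AccountSignals:
    patas_conservative_hits: int  # Conservative-profile matches for this account
    user_reports: int             # reports filed by other users
    trust_score: float            # hypothetical scale: 0.0 untrusted .. 1.0 trusted


def escalate_for_permanent_action(signals: AccountSignals,
                                  min_reports: int = 3,
                                  max_trust: float = 0.2) -> bool:
    """Escalate only when PATAS hits are backed by an independent signal."""
    if signals.patas_conservative_hits == 0:
        return False
    # Placeholder thresholds; tune them in your own enforcement system.
    corroborated = (signals.user_reports >= min_reports
                    or signals.trust_score <= max_trust)
    return corroborated
```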

Key Takeaways

PATAS is a signal engine, not an enforcement system
- Provides patterns and signals
- You control enforcement decisions
Conservative profile is safe for low-impact auto actions
- 24.91% recall, 1.11% ham rate
- Only SAFE_AUTO patterns
- Suitable for spam labeling, hiding, throttling
Balanced/Aggressive profiles are signals only
- Higher recall but higher false positive rates
- Use as ML features, prioritization, research
- NOT for auto-ban
SQL rules are heavily constrained
- Only SELECT queries
- Whitelisted tables/columns
- No match-everything patterns
- Coverage limits (>80% = reject)
LLM (optional) is used offline for pattern discovery only
- NOT for real-time classification
- NOT for ban/unban decisions
- Sees only aggregated signals and up to 10 sample spam messages, not the live message stream
Privacy is configurable
- On-prem deployment by default
- STRICT mode available (no external LLM, minimal logging)
- Operator controls all external endpoints

Last Updated: 2025-11-21
