Architecture

PATAS Core v2 Architecture

Version: 2.0

This document provides a high-level overview of the PATAS Core v2 architecture: its data models, services, and data flow.

Overview

PATAS Core is a generic engine for:

Analyzing large corpora of messages (spam / not_spam)
Discovering spam patterns automatically
Generating machine-readable blocking rules
Evaluating rules on real traffic
Promoting good rules and deactivating bad ones

Key Principle: PATAS Core is generic and reusable. It works with abstract domain models and can be wrapped by different integration layers without modification.

Data Models

Message

Normalized message storage from logs or CSV imports.

Fields:

id - Internal ID
external_id - External message ID (for idempotence)
timestamp - Message timestamp
text - Message text content
meta - JSON metadata (channel, language, country, etc.)
is_spam - Optional spam label (True/False/None)
tas_action - Action taken ('blocked' / 'allowed')
user_complaint - User-reported spam
unbanned - Whether message/user was unbanned

Pattern

Discovered spam patterns.

Fields:

id - Pattern ID
type - Pattern type (URL, PHONE, TEXT, META, SIGNATURE, KEYWORD)
description - Human-readable description
examples - Representative message texts (JSON array)

Rule

SQL blocking rules with lifecycle management.

Fields:

id - Rule ID
pattern_id - Associated pattern (optional)
sql_expression - Safe SELECT query
status - Lifecycle state (candidate → shadow → active → deprecated)
origin - Origin ('llm', 'pattern_mining', 'manual')
created_at, updated_at - Timestamps

RuleEvaluation

Evaluation metrics for rules.

Fields:

id - Evaluation ID
rule_id - Associated rule
time_period_start, time_period_end - Evaluation window
hits_total - Total messages matched
spam_hits - Spam messages matched
ham_hits - Non-spam messages matched
precision - spam_hits / hits_total
recall - spam_hits / total spam messages in the evaluation window (requires the total spam count)
coverage - hits_total / total_messages
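
The metric definitions above can be worked through with a small example. This is a minimal sketch, not PATAS code: `RuleMetrics` and `compute_metrics` are hypothetical names, and it assumes `hits_total = spam_hits + ham_hits` (that is, every matched message is labeled), which the field list does not state.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RuleMetrics:
    precision: Optional[float]
    recall: Optional[float]
    coverage: Optional[float]


def compute_metrics(spam_hits: int, ham_hits: int,
                    total_spam: Optional[int], total_messages: int) -> RuleMetrics:
    """Derive precision, recall, and coverage from raw hit counts."""
    # Assumption: all matched messages carry a label.
    hits_total = spam_hits + ham_hits
    # Precision is undefined when the rule matched nothing.
    precision = spam_hits / hits_total if hits_total else None
    # Recall needs the total number of spam messages in the window.
    recall = spam_hits / total_spam if total_spam else None
    coverage = hits_total / total_messages if total_messages else None
    return RuleMetrics(precision, recall, coverage)


metrics = compute_metrics(spam_hits=95, ham_hits=5,
                          total_spam=200, total_messages=10_000)
print(metrics)
```

With these counts the rule matches 100 of 10,000 messages, 95 of them spam, so precision is 0.95, recall is 0.475, and coverage is 0.01.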

LLM Usage

Important: PATAS uses LLMs for offline pattern discovery only, not for real-time message classification.

LLM Role

✅ Pattern Discovery: Analyze aggregated spam signals to identify semantic patterns
✅ SQL Rule Generation: Propose SQL rules that catch spam variations
✅ SQL Quality Validation (optional): Assess false positive risks for generated rules
✅ Offline Only: All LLM processing happens during pattern mining, not during message evaluation

What LLMs Do NOT Do

❌ NOT used for real-time message classification
❌ NOT making ban/unban decisions
❌ NOT processing individual messages online

LLM Configuration

Provider: Configurable (OpenAI, local/on-prem endpoint, or disabled)
Privacy: On-prem deployment by default, no hardcoded external calls
Data: LLM sees only aggregated signals (top URLs, keywords, sample messages), not individual user messages
Validation: LLM quality validation (if enabled) uses the same client/API key as pattern mining

See LLM Usage for detailed documentation.

Core Services

Ingestion (`v2_ingestion.py`)

Purpose: Load messages from external sources into PATAS storage.

Components:

TASLogIngester - Ingest from TAS API or storage
CSVIngester - Ingest from CSV files
Idempotency handling via external_id

Flow:

Fetch messages from source
Normalize to Message model
Store in database (with deduplication)
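
One way to get the idempotent behavior described above is a unique constraint on external_id, so that re-ingesting the same batch adds nothing. The sketch below uses the standard-library `sqlite3` module; the table layout and the `ingest` function are assumptions for illustration and do not reflect the actual TASLogIngester or CSVIngester code.

```python
import sqlite3
from typing import Dict, Iterable


def ingest(conn: sqlite3.Connection, raw_messages: Iterable[Dict]) -> int:
    """Insert messages, skipping any external_id already stored.

    Returns the number of rows actually added.
    """
    # Hypothetical schema: a subset of the Message fields.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS messages (
               id INTEGER PRIMARY KEY,
               external_id TEXT UNIQUE,
               timestamp TEXT,
               text TEXT,
               is_spam INTEGER
           )"""
    )
    added = 0
    for raw in raw_messages:
        cur = conn.execute(
            "INSERT OR IGNORE INTO messages (external_id, timestamp, text, is_spam) "
            "VALUES (?, ?, ?, ?)",
            (raw["external_id"], raw.get("timestamp"),
             raw["text"].strip(), raw.get("is_spam")),
        )
        added += cur.rowcount  # 1 if inserted, 0 if ignored as a duplicate
    conn.commit()
    return added


conn = sqlite3.connect(":memory:")
batch = [
    {"external_id": "m1", "text": "  Buy now  ", "is_spam": True},
    {"external_id": "m2", "text": "hello"},
]
first_run = ingest(conn, batch)
second_run = ingest(conn, batch)  # same batch again: nothing new is stored
print(first_run, second_run)
```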

Pattern Mining (`v2_pattern_mining.py`)

Purpose: Discover spam patterns from message corpus.

Components:

PatternMiningPipeline - Main orchestration
PatternMiningEngine - Abstract interface (implemented by LLM engine)
Chunked processing for large datasets
Pre-aggregation before LLM calls

Flow:

Load messages from storage
Aggregate by type (URLs, keywords, etc.)
Cluster similar messages (semantic or exact)
Generate pattern descriptions via LLM
Create Pattern records
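
The aggregation step can be pictured as counting recurring URLs and keywords so that only compact summaries, not the raw corpus, are passed on to the LLM. The following is a rough sketch under that assumption; `aggregate_signals`, the regular expressions, and the output keys are hypothetical and much simpler than a real pipeline would need.

```python
import re
from collections import Counter
from typing import Dict, List, Tuple

URL_RE = re.compile(r"https?://\S+", re.IGNORECASE)
WORD_RE = re.compile(r"[a-z]{4,}")  # crude keyword filter: 4+ ASCII letters


def aggregate_signals(texts: List[str], top_n: int = 3) -> Dict[str, List[Tuple[str, int]]]:
    """Count URLs and keywords across messages and keep the most frequent."""
    urls: Counter = Counter()
    keywords: Counter = Counter()
    for text in texts:
        lowered = text.lower()
        urls.update(URL_RE.findall(lowered))
        # Strip URLs first so their fragments are not counted as keywords.
        without_urls = URL_RE.sub(" ", lowered)
        # Count each keyword once per message (document frequency).
        keywords.update(set(WORD_RE.findall(without_urls)))
    return {
        "top_urls": urls.most_common(top_n),
        "top_keywords": keywords.most_common(top_n),
    }


signals = aggregate_signals([
    "Win cash now http://spam.example/a",
    "win CASH fast http://spam.example/a",
    "hello friend",
])
print(signals)
```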

Rule Lifecycle (`v2_rule_lifecycle.py`)

Purpose: Manage rule state transitions.

State Machine:

candidate → Newly discovered, not yet evaluated
shadow → Evaluated on historical data, not active
active → Deployed and monitored
deprecated → Deactivated due to poor performance

Components:

RuleLifecycleService - State transitions
Validation before state changes
Audit logging
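
The state machine above can be sketched as a table of allowed transitions that is checked before every state change. This is an illustration only: `RuleStatus`, `ALLOWED_TRANSITIONS`, and `transition` are hypothetical names, and the table encodes just the linear path listed here. Whether RuleLifecycleService permits other moves (for example shadow to deprecated) is not stated.

```python
from enum import Enum


class RuleStatus(str, Enum):
    CANDIDATE = "candidate"
    SHADOW = "shadow"
    ACTIVE = "active"
    DEPRECATED = "deprecated"


# Assumption: only the linear path candidate -> shadow -> active -> deprecated.
ALLOWED_TRANSITIONS = {
    RuleStatus.CANDIDATE: {RuleStatus.SHADOW},
    RuleStatus.SHADOW: {RuleStatus.ACTIVE},
    RuleStatus.ACTIVE: {RuleStatus.DEPRECATED},
    RuleStatus.DEPRECATED: set(),
}


def transition(current: RuleStatus, target: RuleStatus) -> RuleStatus:
    """Return the new status, or raise if the move is not allowed."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target


status = transition(RuleStatus.CANDIDATE, RuleStatus.SHADOW)
print(status.value)
```

Keeping the transitions in a single table makes the validation step a one-line lookup and gives audit logging one place to hook into.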

Shadow Evaluation (`v2_shadow_evaluation.py`)

Purpose: Evaluate rules on historical data without deploying them.

Components:

ShadowEvaluationService - Run SQL queries on message storage
Compute metrics (precision, recall, coverage, ham_rate)
Store results in RuleEvaluation

Flow:

Load shadow rules
Execute sql_expression on message storage
Compute metrics
Store RuleEvaluation records
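
To make the flow concrete, here is a minimal sketch that runs one rule against an in-memory SQLite table and derives the metrics. It assumes that a rule's sql_expression selects the ids of matching rows from a `messages` table; the schema, the sample rows, and `evaluate_rule` are illustrative and are not the ShadowEvaluationService implementation.

```python
import sqlite3


def evaluate_rule(conn: sqlite3.Connection, sql_expression: str) -> dict:
    """Run one rule against stored messages and compute its metrics."""
    # Loading every label into memory is fine for a demo, not for a large corpus.
    labels = dict(conn.execute("SELECT id, is_spam FROM messages"))
    # Assumption: the rule's SELECT returns matching message ids in column 0.
    matched = {row[0] for row in conn.execute(sql_expression)}
    spam_hits = sum(1 for i in matched if labels.get(i) == 1)
    ham_hits = sum(1 for i in matched if labels.get(i) == 0)
    total_spam = sum(1 for label in labels.values() if label == 1)
    hits_total = len(matched)
    return {
        "hits_total": hits_total,
        "spam_hits": spam_hits,
        "ham_hits": ham_hits,
        "precision": spam_hits / hits_total if hits_total else None,
        "recall": spam_hits / total_spam if total_spam else None,
        "coverage": hits_total / len(labels) if labels else None,
    }


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (id INTEGER PRIMARY KEY, text TEXT, is_spam INTEGER)")
conn.executemany(
    "INSERT INTO messages (text, is_spam) VALUES (?, ?)",
    [
        ("free crypto at http://x.example", 1),
        ("free crypto giveaway", 1),
        ("crypto news digest", 0),
        ("lunch at noon?", 0),
        ("cheap pills", 1),
    ],
)
result = evaluate_rule(conn, "SELECT id FROM messages WHERE text LIKE '%crypto%'")
print(result)
```

The sample rule matches three of five messages, two of which are spam, so precision and recall are both 2/3 and coverage is 0.6.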

Promotion (`v2_promotion.py`)

Purpose: Promote good rules to active, deprecate bad ones.

Components:

PromotionService - Review evaluation metrics
Apply safety profile thresholds
Export rules to external systems (via RuleBackend)

Flow:

Load shadow rules with recent evaluations
Check metrics against safety profile thresholds
Promote to active if thresholds met
Export to external system (your platform, etc.)
Deprecate active rules if metrics degrade
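
The promotion decision can be sketched as a pure function of a rule's current status, its latest metrics, and a safety profile. The names below (`SafetyProfile`, `decide`) are hypothetical. The threshold values are the Conservative figures given under Safety Profiles, and how the false positive rate itself is computed is left to the caller because this document does not define it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SafetyProfile:
    name: str
    min_precision: float
    max_false_positive_rate: float


# Thresholds taken from the Conservative profile described in this document.
CONSERVATIVE = SafetyProfile("conservative", min_precision=0.98,
                             max_false_positive_rate=0.01)


def decide(status: str, precision: float, false_positive_rate: float,
           profile: SafetyProfile) -> str:
    """Return the status a rule should have after this review."""
    meets_thresholds = (
        precision >= profile.min_precision
        and false_positive_rate <= profile.max_false_positive_rate
    )
    if status == "shadow":
        # Promote only when thresholds are met; otherwise keep evaluating.
        return "active" if meets_thresholds else "shadow"
    if status == "active":
        # An active rule whose metrics degrade is deprecated.
        return "active" if meets_thresholds else "deprecated"
    return status  # candidate and deprecated rules are left untouched


print(decide("shadow", 0.99, 0.005, CONSERVATIVE))
print(decide("active", 0.90, 0.02, CONSERVATIVE))
```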

Rule Backend (`v2_rule_backend.py`)

Purpose: Export rules to external systems.

Interfaces:

RuleBackend - Abstract interface
SqlRuleBackend - Export as SQL
RolRuleBackend - Export as ROL (Rule Object Language)

Usage: Implement RuleBackend for your system (e.g., your platform's rule engine).

Data Flow

Pattern Discovery Flow

Messages (Storage)
    ↓
Pattern Mining Pipeline
    ↓
Aggregation (URLs, keywords, etc.)
    ↓
Clustering (semantic or exact)
    ↓
LLM Pattern Description
    ↓
Pattern Records
    ↓
SQL Rule Generation
    ↓
Rule Records (status: candidate)

Rule Evaluation Flow

Rule (status: shadow)
    ↓
Shadow Evaluation Service
    ↓
Execute SQL on Messages
    ↓
Compute Metrics
    ↓
RuleEvaluation Records
    ↓
Promotion Service
    ↓
Check Safety Thresholds
    ↓
Promote to active OR Keep in shadow

Production Flow

Active Rules
    ↓
Export via RuleBackend
    ↓
External System (your platform, etc.)
    ↓
Monitor Performance
    ↓
Re-evaluate Periodically
    ↓
Deprecate if Metrics Degrade

Extension Points

Rule Backend

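Implement the RuleBackend interface to export rules to your system:

class MyRuleBackend(RuleBackend):
    def export_rule(self, rule: Rule) -> str:
        # Convert the rule to your system's format.
        # Placeholder: pass the stored SQL through unchanged.
        formatted_rule = rule.sql_expression
        return formatted_rule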

LLM Engine

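Implement the PatternMiningEngine interface for custom LLM providers:

from typing import Dict, List

class MyLLMEngine(PatternMiningEngine):
    async def discover_patterns(self, signals: Dict) -> List[Pattern]:
        # Call your LLM provider with the aggregated signals here.
        patterns: List[Pattern] = []  # placeholder: fill from the provider's response
        return patterns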

Message Adapter

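Extend MessageRepository or create an adapter for custom message formats:

from typing import Dict

class MyMessageAdapter:
    def to_patas_message(self, raw_message: Dict) -> Message:
        # Map your raw fields onto the PATAS Message model.
        # The raw keys and keyword arguments below are illustrative.
        message = Message(external_id=raw_message["id"], text=raw_message["text"])
        return message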

Safety and Validation

SQL Safety

All SQL rules are validated:

Only SELECT queries allowed
No DDL/DELETE/UPDATE/INSERT
Syntax validation
"Match everything" detection

See SQL Rule Generation for details.
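
As a rough illustration of the four checks listed above, the sketch below combines string-level checks with a parse-only syntax check through SQLite's `EXPLAIN`. It is not the PATAS validator: `validate_rule_sql`, the keyword list, and the single-table schema are assumptions. The keyword check is deliberately blunt and would also reject a rule whose string literal contains a word such as "update".

```python
import re
import sqlite3
from typing import List

FORBIDDEN = re.compile(
    r"\b(insert|update|delete|drop|alter|create|truncate|attach|pragma)\b",
    re.IGNORECASE,
)


def validate_rule_sql(sql: str) -> List[str]:
    """Return a list of problems; an empty list means the rule passed."""
    problems: List[str] = []
    statement = sql.strip().rstrip(";").strip()
    if ";" in statement:
        problems.append("multiple statements")
    if not re.match(r"(?i)select\b", statement):
        problems.append("not a SELECT query")
    if FORBIDDEN.search(statement):
        problems.append("forbidden keyword")
    # "Match everything": no WHERE clause, or a trivially true one.
    where = re.search(r"(?i)\bwhere\b(.*)", statement, re.DOTALL)
    if where is None or re.fullmatch(r"\s*(1\s*=\s*1|true)\s*",
                                     where.group(1), re.IGNORECASE):
        problems.append("matches everything")
    if not problems:
        # Syntax check: EXPLAIN compiles the statement without running it.
        # The schema here is a hypothetical stand-in for message storage.
        conn = sqlite3.connect(":memory:")
        conn.execute("CREATE TABLE messages (id INTEGER PRIMARY KEY, text TEXT, is_spam INTEGER)")
        try:
            conn.execute("EXPLAIN " + statement)
        except sqlite3.Error as exc:
            problems.append(f"syntax error: {exc}")
        finally:
            conn.close()
    return problems


print(validate_rule_sql("SELECT id FROM messages WHERE text LIKE '%crypto%'"))
print(validate_rule_sql("DELETE FROM messages"))
```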

Safety Profiles

Three profiles with different risk tolerances:

Conservative: High precision (≥98%), low false positive rate (≤1%)
Balanced: Moderate precision (≥95%), higher recall
Aggressive: Maximum recall, higher false positive rate

See Safety Profiles for details.

API Layer

The API layer (app/api/) is a thin orchestration layer over Core services:

No business logic in API
Delegates to Core services
Pydantic models for request/response
FastAPI for HTTP handling

See API Reference for endpoint documentation.

CLI

The CLI (app/cli.py) provides command-line access to Core services:

patas ingest-logs - Ingest messages
patas mine-patterns - Discover patterns
patas eval-rules - Evaluate shadow rules
patas promote-rules - Promote/deprecate rules
patas safety-eval - Run safety evaluation

