
Architecture

Nick edited this page Nov 21, 2025 · 2 revisions

PATAS Core v2 Architecture

Version: 2.0

This document provides a high-level overview of PATAS Core v2 architecture, data models, services, and data flow.


Overview

PATAS Core is a generic engine for:

  • Analyzing large corpora of messages (spam / not_spam)
  • Discovering spam patterns automatically
  • Generating machine-readable blocking rules
  • Evaluating rules on real traffic
  • Promoting good rules and deactivating bad ones

Key Principle: PATAS Core is generic and reusable. It works with abstract domain models and can be wrapped by different integration layers without modification.


Data Models

Message

Normalized messages imported from logs or CSV files.

Fields:

  • id - Internal ID
  • external_id - External message ID (for idempotence)
  • timestamp - Message timestamp
  • text - Message text content
  • meta - JSON metadata (channel, language, country, etc.)
  • is_spam - Optional spam label (True/False/None)
  • tas_action - Action taken ('blocked' / 'allowed')
  • user_complaint - Whether a user reported the message as spam
  • unbanned - Whether message/user was unbanned

Pattern

Discovered spam patterns.

Fields:

  • id - Pattern ID
  • type - Pattern type (URL, PHONE, TEXT, META, SIGNATURE, KEYWORD)
  • description - Human-readable description
  • examples - Representative message texts (JSON array)

Rule

SQL blocking rules with lifecycle management.

Fields:

  • id - Rule ID
  • pattern_id - Associated pattern (optional)
  • sql_expression - Safe SELECT query
  • status - Lifecycle state (candidate → shadow → active → deprecated)
  • origin - Origin ('llm', 'pattern_mining', 'manual')
  • created_at, updated_at - Timestamps

RuleEvaluation

Evaluation metrics for rules.

Fields:

  • id - Evaluation ID
  • rule_id - Associated rule
  • time_period_start, time_period_end - Evaluation window
  • hits_total - Total messages matched
  • spam_hits - Spam messages matched
  • ham_hits - Non-spam messages matched
  • precision - spam_hits / hits_total
  • recall - spam_hits / total spam messages in the window (requires the total spam count)
  • coverage - hits_total / total_messages
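The three ratio metrics can be sketched as a small pure function over the counts for one evaluation window. This is a minimal illustration, not the PATAS implementation: `compute_metrics`, `WindowMetrics` and the `total_spam` / `total_messages` parameters are hypothetical names, and mapping a zero denominator to 0.0 is an assumed convention.

```python
from dataclasses import dataclass


def _ratio(numerator: int, denominator: int) -> float:
    # Treat an empty denominator as 0.0 rather than raising ZeroDivisionError
    return numerator / denominator if denominator else 0.0


@dataclass(frozen=True)
class WindowMetrics:
    precision: float  # spam_hits / hits_total
    recall: float     # spam_hits / total_spam
    coverage: float   # hits_total / total_messages


def compute_metrics(hits_total: int, spam_hits: int,
                    total_spam: int, total_messages: int) -> WindowMetrics:
    """Compute RuleEvaluation-style ratios for one evaluation window."""
    return WindowMetrics(
        precision=_ratio(spam_hits, hits_total),
        recall=_ratio(spam_hits, total_spam),
        coverage=_ratio(hits_total, total_messages),
    )
```

For example, a rule with 50 hits (48 of them spam) in a window of 10,000 messages containing 200 spam messages has precision 0.96, recall 0.24 and coverage 0.005.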

LLM Usage

Important: PATAS uses LLMs for offline pattern discovery only, not for real-time message classification.

LLM Role

  • Pattern Discovery: Analyze aggregated spam signals to identify semantic patterns
  • SQL Rule Generation: Propose SQL rules that catch spam variations
  • SQL Quality Validation (optional): Assess false positive risks for generated rules
  • Offline Only: All LLM processing happens during pattern mining, not during message evaluation

What LLMs Do NOT Do

  • NOT used for real-time message classification
  • NOT making ban/unban decisions
  • NOT processing individual messages online

LLM Configuration

  • Provider: Configurable (OpenAI, local/on-prem endpoint, or disabled)
  • Privacy: On-prem deployment by default, no hardcoded external calls
  • Data: LLM sees only aggregated signals (top URLs, keywords, a small set of sample messages), not the full stream of individual user messages
  • Validation: LLM quality validation (if enabled) uses the same client/API key as pattern mining

See LLM Usage for detailed documentation.


Core Services

Ingestion (v2_ingestion.py)

Purpose: Load messages from external sources into PATAS storage.

Components:

  • TASLogIngester - Ingest from TAS API or storage
  • CSVIngester - Ingest from CSV files
  • Idempotency handling via external_id

Flow:

  1. Fetch messages from source
  2. Normalize to Message model
  3. Store in database (with deduplication)
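The three ingestion steps, including idempotency via external_id, can be sketched with an in-memory store. `InMemoryMessageStore`, `normalize` and the raw record keys (`id`, `ts`, `text`, `channel`, `language`) are hypothetical stand-ins for the real ingesters, source formats and database; only a subset of the Message fields is modeled.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Dict, Iterable, Optional


@dataclass
class Message:
    # Subset of the PATAS Message fields, for illustration only
    external_id: str
    timestamp: datetime
    text: str
    meta: Dict[str, Any] = field(default_factory=dict)
    is_spam: Optional[bool] = None


def normalize(raw: Dict[str, Any]) -> Message:
    """Step 2: map a raw source record (hypothetical keys) to a Message."""
    return Message(
        external_id=str(raw["id"]),
        timestamp=datetime.fromisoformat(raw["ts"]),
        text=raw.get("text", "").strip(),
        meta={k: raw[k] for k in ("channel", "language") if k in raw},
        is_spam=raw.get("is_spam"),
    )


class InMemoryMessageStore:
    def __init__(self) -> None:
        self._by_external_id: Dict[str, Message] = {}

    def ingest(self, records: Iterable[Dict[str, Any]]) -> int:
        """Steps 1-3: normalize records and store those not seen before."""
        inserted = 0
        for raw in records:
            msg = normalize(raw)
            if msg.external_id in self._by_external_id:
                continue  # already ingested, so re-running an import is a no-op
            self._by_external_id[msg.external_id] = msg
            inserted += 1
        return inserted

    def __len__(self) -> int:
        return len(self._by_external_id)
```

Keying storage on external_id is what makes repeated or overlapping imports safe: ingesting the same batch twice stores each message once.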

Pattern Mining (v2_pattern_mining.py)

Purpose: Discover spam patterns from message corpus.

Components:

  • PatternMiningPipeline - Main orchestration
  • PatternMiningEngine - Abstract interface (implemented by LLM engine)
  • Chunked processing for large datasets
  • Pre-aggregation before LLM calls

Flow:

  1. Load messages from storage
  2. Aggregate by type (URLs, keywords, etc.)
  3. Cluster similar messages (semantic or exact)
  4. Generate pattern descriptions via LLM
  5. Create Pattern records
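Step 2 and the chunked pre-aggregation can be illustrated with a small aggregator that counts URL domains and keywords across spam messages in fixed-size chunks, so that only compact counters, not raw messages, are passed on to the LLM step. The regexes, `chunk_size`, `top_n` and the `aggregate_signals` name are assumptions for illustration, not the pipeline's actual aggregation logic.

```python
import re
from collections import Counter
from itertools import islice
from typing import Dict, Iterable, Iterator, List, Tuple

# Whole URL; group 1 captures the domain
URL_RE = re.compile(r"https?://([^/\s]+)\S*", re.IGNORECASE)
# Crude keyword tokenizer: lowercase words of 4+ letters
WORD_RE = re.compile(r"[a-z]{4,}")


def chunks(items: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield successive lists of at most `size` items."""
    it = iter(items)
    while batch := list(islice(it, size)):
        yield batch


def aggregate_signals(spam_texts: Iterable[str], chunk_size: int = 1000,
                      top_n: int = 10) -> Dict[str, List[Tuple[str, int]]]:
    """Count domains and keywords chunk by chunk; return the top entries."""
    domains: Counter = Counter()
    keywords: Counter = Counter()
    for batch in chunks(spam_texts, chunk_size):
        for text in batch:
            domains.update(d.lower() for d in URL_RE.findall(text))
            # Remove URLs before tokenizing so URL parts are not counted as keywords
            keywords.update(WORD_RE.findall(URL_RE.sub(" ", text).lower()))
    return {
        "top_domains": domains.most_common(top_n),
        "top_keywords": keywords.most_common(top_n),
    }
```

Only the returned summary would be handed to clustering and the LLM, which keeps prompt size bounded regardless of corpus size.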

Rule Lifecycle (v2_rule_lifecycle.py)

Purpose: Manage rule state transitions.

State Machine:

  • candidate → Newly discovered, not yet evaluated
  • shadow → Evaluated on historical data, not active
  • active → Deployed and under monitoring
  • deprecated → Deactivated due to poor performance

Components:

  • RuleLifecycleService - State transitions
  • Validation before state changes
  • Audit logging
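A minimal sketch of the state machine with validation before each change and an audit trail. The transition table encodes only the forward path listed above (candidate → shadow → active → deprecated); whether RuleLifecycleService allows other moves (for example shadow → deprecated) is not stated on this page, so the table is an assumption. `LifecycleSketch` and `InvalidTransition` are hypothetical names.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Set, Tuple


class RuleStatus(str, Enum):
    CANDIDATE = "candidate"
    SHADOW = "shadow"
    ACTIVE = "active"
    DEPRECATED = "deprecated"


# Assumed transition table: forward path only
ALLOWED: Dict[RuleStatus, Set[RuleStatus]] = {
    RuleStatus.CANDIDATE: {RuleStatus.SHADOW},
    RuleStatus.SHADOW: {RuleStatus.ACTIVE},
    RuleStatus.ACTIVE: {RuleStatus.DEPRECATED},
    RuleStatus.DEPRECATED: set(),
}


class InvalidTransition(Exception):
    pass


@dataclass
class LifecycleSketch:
    status: Dict[int, RuleStatus] = field(default_factory=dict)
    # (rule_id, from_status, to_status, reason)
    audit_log: List[Tuple[int, RuleStatus, RuleStatus, str]] = field(default_factory=list)

    def add_candidate(self, rule_id: int) -> None:
        if rule_id in self.status:
            raise InvalidTransition(f"rule {rule_id} already exists")
        self.status[rule_id] = RuleStatus.CANDIDATE

    def transition(self, rule_id: int, new: RuleStatus, reason: str) -> None:
        current = self.status[rule_id]
        # Validate before changing state
        if new not in ALLOWED[current]:
            raise InvalidTransition(
                f"rule {rule_id}: {current.value} -> {new.value} not allowed")
        self.status[rule_id] = new
        self.audit_log.append((rule_id, current, new, reason))
```

Keeping the allowed moves in one table makes the lifecycle easy to audit and to extend if more transitions are needed.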

Shadow Evaluation (v2_shadow_evaluation.py)

Purpose: Evaluate rules on historical data without deploying them.

Components:

  • ShadowEvaluationService - Run SQL queries on message storage
  • Compute metrics (precision, recall, coverage, ham_rate)
  • Store results in RuleEvaluation

Flow:

  1. Load shadow rules
  2. Execute sql_expression on message storage
  3. Compute metrics
  4. Store RuleEvaluation records
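The four steps can be sketched with Python's built-in sqlite3 module. This assumes the rule's sql_expression is a SELECT over a `messages` table that returns the `id` of each matched message; the real storage schema and query contract may differ. Matched ids are joined back to the is_spam labels; unlabeled messages count toward hits_total only. Interpolating the rule into the wrapper query is acceptable here only because rules have already passed SQL safety validation.

```python
import sqlite3
from typing import Dict


def shadow_evaluate(conn: sqlite3.Connection, sql_expression: str) -> Dict[str, float]:
    """Run one rule's SELECT against message storage and summarize its hits."""
    # Wrap the rule as a subquery so its matches can be aggregated
    query = f"""
        SELECT COUNT(*),
               COALESCE(SUM(m.is_spam = 1), 0),
               COALESCE(SUM(m.is_spam = 0), 0)
        FROM messages AS m
        WHERE m.id IN ({sql_expression})
    """
    hits_total, spam_hits, ham_hits = conn.execute(query).fetchone()
    (total_messages,) = conn.execute("SELECT COUNT(*) FROM messages").fetchone()
    (total_spam,) = conn.execute(
        "SELECT COUNT(*) FROM messages WHERE is_spam = 1").fetchone()
    return {
        "hits_total": hits_total,
        "spam_hits": spam_hits,
        "ham_hits": ham_hits,
        "precision": spam_hits / hits_total if hits_total else 0.0,
        "recall": spam_hits / total_spam if total_spam else 0.0,
        "coverage": hits_total / total_messages if total_messages else 0.0,
    }
```

The returned dictionary corresponds to the fields that would be written to a RuleEvaluation record for the window covered by the stored messages.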

Promotion (v2_promotion.py)

Purpose: Promote good rules to active, deprecate bad ones.

Components:

  • PromotionService - Review evaluation metrics
  • Apply safety profile thresholds
  • Export rules to external systems (via RuleBackend)

Flow:

  1. Load shadow rules with recent evaluations
  2. Check metrics against safety profile thresholds
  3. Promote to active if thresholds met
  4. Export to external system (your platform, etc.)
  5. Deprecate active rules if metrics degrade
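Steps 2, 3 and 5 can be sketched as a pure decision function over a rule's latest evaluation. The thresholds reuse the Conservative profile figures from Safety Profiles (precision ≥ 98%, false positive rate ≤ 1%); defining the false positive rate as ham_hits / total_ham, the `min_hits` evidence floor, and all names below are assumptions for illustration.

```python
from dataclasses import dataclass
from enum import Enum


@dataclass(frozen=True)
class Thresholds:
    min_precision: float
    max_fp_rate: float
    min_hits: int = 20  # hypothetical floor: fewer hits is not enough evidence


CONSERVATIVE = Thresholds(min_precision=0.98, max_fp_rate=0.01)


@dataclass(frozen=True)
class Evaluation:
    hits_total: int
    spam_hits: int
    ham_hits: int
    total_ham: int  # labeled non-spam messages in the window


class Decision(str, Enum):
    PROMOTE = "promote"
    KEEP_SHADOW = "keep_shadow"
    KEEP_ACTIVE = "keep_active"
    DEPRECATE = "deprecate"


def decide(status: str, ev: Evaluation, t: Thresholds = CONSERVATIVE) -> Decision:
    """Promote shadow rules that meet thresholds; deprecate degraded active ones."""
    if status not in ("shadow", "active"):
        raise ValueError(f"no promotion decision for status {status!r}")
    if ev.hits_total < max(t.min_hits, 1):
        # Not enough evidence either way: leave the rule where it is
        return Decision.KEEP_SHADOW if status == "shadow" else Decision.KEEP_ACTIVE
    precision = ev.spam_hits / ev.hits_total
    fp_rate = ev.ham_hits / ev.total_ham if ev.total_ham else 0.0
    ok = precision >= t.min_precision and fp_rate <= t.max_fp_rate
    if status == "shadow":
        return Decision.PROMOTE if ok else Decision.KEEP_SHADOW
    return Decision.KEEP_ACTIVE if ok else Decision.DEPRECATE
```

Treating low-volume windows as "no change" avoids deprecating an active rule just because the campaign it targets has gone quiet.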

Rule Backend (v2_rule_backend.py)

Purpose: Export rules to external systems.

Interfaces:

  • RuleBackend - Abstract interface
  • SqlRuleBackend - Export as SQL
  • RolRuleBackend - Export as ROL (Rule Object Language)

Usage: Implement RuleBackend for your system (e.g., your platform rule engine).


Data Flow

Pattern Discovery Flow

Messages (Storage)
    ↓
Pattern Mining Pipeline
    ↓
Aggregation (URLs, keywords, etc.)
    ↓
Clustering (semantic or exact)
    ↓
LLM Pattern Description
    ↓
Pattern Records
    ↓
SQL Rule Generation
    ↓
Rule Records (status: candidate)

Rule Evaluation Flow

Rule (status: shadow)
    ↓
Shadow Evaluation Service
    ↓
Execute SQL on Messages
    ↓
Compute Metrics
    ↓
RuleEvaluation Records
    ↓
Promotion Service
    ↓
Check Safety Thresholds
    ↓
Promote to active OR Keep in shadow

Production Flow

Active Rules
    ↓
Export via RuleBackend
    ↓
External System (your platform, etc.)
    ↓
Monitor Performance
    ↓
Re-evaluate Periodically
    ↓
Deprecate if Metrics Degrade

Extension Points

Rule Backend

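Implement the RuleBackend interface to export rules to your system:

class MyRuleBackend(RuleBackend):
    def export_rule(self, rule: Rule) -> str:
        # Convert the rule into your system's format; this example
        # simply prefixes the rule's SQL with an identifying comment
        formatted_rule = f"-- PATAS rule {rule.id} ({rule.status})\n{rule.sql_expression}"
        return formatted_rule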

LLM Engine

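Implement the PatternMiningEngine interface for custom LLM providers:

from typing import Dict, List

class MyLLMEngine(PatternMiningEngine):
    async def discover_patterns(self, signals: Dict) -> List[Pattern]:
        # Send the aggregated signals (top URLs, keywords, sample messages)
        # to your LLM provider and parse the response into Pattern objects
        patterns: List[Pattern] = []
        return patterns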

Message Adapter

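Extend MessageRepository or create an adapter for custom message formats:

from typing import Any, Dict

class MyMessageAdapter:
    def to_patas_message(self, raw_message: Dict[str, Any]) -> Message:
        # Map your raw fields onto the PATAS Message model (assumes keyword construction)
        return Message(external_id=str(raw_message["id"]), text=raw_message["text"],
                       timestamp=raw_message["timestamp"], meta=raw_message.get("meta", {}))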

Safety and Validation

SQL Safety

All SQL rules are validated:

  • Only SELECT queries allowed
  • No DDL/DELETE/UPDATE/INSERT
  • Syntax validation
  • "Match-everything" detection (rejects rules that would match every message)

See SQL Rule Generation for details.
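A deliberately simple sketch of these checks. It inspects the SQL text with regexes and delegates syntax validation to SQLite's EXPLAIN, which compiles a statement against an in-memory `messages` table without running it. The keyword list, the "no WHERE clause or a trivially true WHERE means match everything" heuristic, the schema, and the `validate_rule_sql` name are assumptions; a production validator should parse the SQL properly. Forbidden keywords inside string literals are also rejected, which errs on the safe side.

```python
import re
import sqlite3
from typing import List

FORBIDDEN = re.compile(
    r"\b(INSERT|UPDATE|DELETE|DROP|CREATE|ALTER|TRUNCATE|ATTACH|DETACH|PRAGMA|VACUUM)\b",
    re.IGNORECASE,
)
# WHERE clause that is trivially true, at the end of the query
TAUTOLOGY = re.compile(r"\bWHERE\s+(1\s*=\s*1|TRUE|1)\s*$", re.IGNORECASE)


def validate_rule_sql(sql: str) -> List[str]:
    """Return a list of problems; an empty list means the rule passed these checks."""
    problems: List[str] = []
    stripped = sql.strip().rstrip(";").strip()
    if ";" in stripped:
        problems.append("multiple statements are not allowed")
    if not re.match(r"(?is)^\s*SELECT\b", stripped):
        problems.append("only SELECT queries are allowed")
    if FORBIDDEN.search(stripped):
        problems.append("DDL/DML keywords are not allowed")
    if not re.search(r"\bWHERE\b", stripped, re.IGNORECASE) or TAUTOLOGY.search(stripped):
        problems.append("rule would match every message")
    if not problems:
        # Syntax and column check: compile the query without executing it
        conn = sqlite3.connect(":memory:")
        conn.execute(
            "CREATE TABLE messages (id INTEGER PRIMARY KEY, text TEXT, meta TEXT, is_spam INTEGER)")
        try:
            conn.execute("EXPLAIN " + stripped)
        except sqlite3.Error as exc:
            problems.append(f"syntax error: {exc}")
        finally:
            conn.close()
    return problems
```

Returning a list of problems instead of a boolean lets the caller log every reason a generated rule was rejected.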

Safety Profiles

Three profiles with different risk tolerances:

  • Conservative: High precision (≥98%), low false positive rate (≤1%)
  • Balanced: Moderate precision (≥95%), higher recall
  • Aggressive: Maximum recall, higher false positive rate

See Safety Profiles for details.


API Layer

The API layer (app/api/) is a thin orchestration layer over Core services:

  • No business logic in API
  • Delegates to Core services
  • Pydantic models for request/response
  • FastAPI for HTTP handling

See API Reference for endpoint documentation.


CLI

The CLI (app/cli.py) provides command-line access to Core services:

  • patas ingest-logs - Ingest messages
  • patas mine-patterns - Discover patterns
  • patas eval-rules - Evaluate shadow rules
  • patas promote-rules - Promote/deprecate rules
  • patas safety-eval - Run safety evaluation

Related Documentation

  • Code Overview - Detailed code structure
  • Configuration - Configuration options
  • Engineering Notes for Integration - your platform-specific guidance
  • Safety Profiles - Safety profile details
