Privacy and Data Protection

Privacy & Data Protection in PATAS

Purpose: This document describes how PATAS handles data privacy, what data it processes, and how to configure it for strict privacy requirements.

On-Prem Deployment by Default

PATAS is designed for on-premises deployment by default:

✅ Runs on your infrastructure
✅ No telemetry or external calls unless explicitly configured
✅ All data stays within your infrastructure
✅ LLM provider is configurable (internal endpoint or external, operator's choice)

Privacy Modes

STANDARD Mode (Default)

Configuration:

privacy_mode = "STANDARD"

Behavior:

✅ External LLM providers can be used (if configured by operator)
✅ Message texts can be included in test reports
✅ Full logging available for debugging
✅ Operator controls all external endpoints

Use Case: Development, testing, or when operator explicitly configures external services.

STRICT Mode

Configuration:

privacy_mode = "STRICT"

Behavior:

❌ External LLM providers disabled by default (an LLM is available only when the operator explicitly configures an internal endpoint)
❌ Logs avoid storing full message texts (only ids + pattern ids / counts)
❌ No telemetry or external calls unless explicitly configured
✅ Only internal/on-prem LLM endpoints allowed
✅ Minimal data retention

Use Case: Production deployment with strict privacy requirements.
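
The difference between the two modes can be sketched as a pair of predicates. This is a minimal illustration, not PATAS's actual API: the `PrivacyMode` enum and both helper functions are hypothetical names introduced here.

```python
from enum import Enum


class PrivacyMode(str, Enum):
    """Hypothetical enum mirroring the two privacy_mode values."""
    STANDARD = "STANDARD"
    STRICT = "STRICT"


def external_llm_allowed(mode: PrivacyMode) -> bool:
    """External LLM providers are permitted only in STANDARD mode."""
    return mode is PrivacyMode.STANDARD


def full_text_logging_allowed(mode: PrivacyMode) -> bool:
    """STRICT logs keep IDs and counts only, never full message texts."""
    return mode is not PrivacyMode.STRICT
```

Callers could consult these predicates before contacting an LLM provider or writing a log record, so that the mode is enforced in one place.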

Data Processing

What Data PATAS Processes

Message Data:

text: Message content (can be hashed or dropped in STRICT mode)
id: Message identifier
timestamp: Message timestamp
is_spam: Spam label (True/False/None)
meta: JSON metadata (channel, language, country, etc.)
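
As a rough sketch, the message fields above could be modeled as a dataclass. The class name `Message` and the choice of `Optional` defaults are assumptions made for illustration; the real PATAS schema may differ.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Dict, Optional


@dataclass
class Message:
    """Hypothetical record type for the message fields listed above."""
    id: str
    timestamp: datetime
    text: Optional[str] = None      # may be hashed or dropped in STRICT mode
    is_spam: Optional[bool] = None  # True / False / None (unlabeled)
    meta: Dict[str, Any] = field(default_factory=dict)  # channel, language, country, ...
```

Making `text` optional lets the same record type represent a message whose content has been dropped.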

Pattern Data:

Pattern descriptions
Pattern examples (can be truncated/hashed in STRICT mode)
SQL rules (safe SELECT queries only)

Evaluation Data:

Rule evaluation metrics (precision, recall, coverage)
Match counts (spam/ham hits)
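
The evaluation metrics can be derived from the match counts alone, which is why no message text needs to be stored with them. The sketch below assumes that coverage means the share of all evaluated messages a rule matches; that definition and the function name `rule_metrics` are assumptions, not taken from the PATAS source.

```python
from typing import Dict


def rule_metrics(spam_hits: int, ham_hits: int,
                 total_spam: int, total_messages: int) -> Dict[str, float]:
    """Compute precision, recall, and coverage from aggregate counts."""
    matched = spam_hits + ham_hits
    # Precision: share of matched messages that are spam.
    precision = spam_hits / matched if matched else 0.0
    # Recall: share of all spam messages that the rule matched.
    recall = spam_hits / total_spam if total_spam else 0.0
    # Coverage (assumed definition): share of all messages the rule matched.
    coverage = matched / total_messages if total_messages else 0.0
    return {"precision": precision, "recall": recall, "coverage": coverage}
```

For example, a rule with 45 spam hits and 5 ham hits, evaluated on 1000 messages of which 90 are spam, has precision 45/50, recall 45/90, and coverage 50/1000.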

Data Retention

Configurable Retention

Settings:

log_retention_days: int = 30  # How long to keep logs
report_retention_days: int = 90  # How long to keep reports

In STRICT Mode:

Raw message texts can be dropped after pattern mining
Only aggregated statistics and pattern IDs are retained
Logs contain only message IDs and pattern matches, not full texts
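
One way to apply a retention setting is to compute a cutoff timestamp and keep only newer entries. This is a sketch under assumptions: the `created_at` key and the `prune` helper are hypothetical, and a real deployment would more likely delete rows in the database than filter a list in memory.

```python
from datetime import datetime, timedelta
from typing import Any, Dict, List


def retention_cutoff(retention_days: int, now: datetime) -> datetime:
    """Entries created before this instant are past their retention window."""
    return now - timedelta(days=retention_days)


def prune(entries: List[Dict[str, Any]], retention_days: int,
          now: datetime) -> List[Dict[str, Any]]:
    """Return only the entries still inside the retention window."""
    cutoff = retention_cutoff(retention_days, now)
    return [e for e in entries if e["created_at"] >= cutoff]
```

The same function would serve both `log_retention_days` and `report_retention_days`, called with the relevant value.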

LLM & External Services

LLM Provider Configuration

Operator Controls:

LLM provider is configurable by operator
No hardcoded external endpoints
Operator chooses: internal endpoint, external API, or disabled

Configuration:

llm_provider: str = "openai"  # or "local", "none", "disabled"
llm_api_endpoint: str = ""  # Internal endpoint URL (if using local provider)

In STRICT Mode:

External LLM providers disabled by default
Only internal/on-prem endpoints allowed
Operator must explicitly configure internal endpoint if LLM is needed
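
The STRICT-mode rule could be enforced with a startup check along these lines. This is a sketch, not the PATAS implementation: `is_internal_endpoint`, `validate_strict_llm`, and the `.internal` / `.local` hostname suffixes are assumptions. A suffix match does not prove that a hostname resolves to an internal address, so operators would need to supply suffixes that match their own network.

```python
import ipaddress
from typing import Tuple
from urllib.parse import urlparse


def is_internal_endpoint(
    url: str, internal_suffixes: Tuple[str, ...] = (".internal", ".local")
) -> bool:
    """Heuristically decide whether a URL points at an internal host."""
    host = urlparse(url).hostname
    if not host:
        return False
    if host == "localhost":
        return True
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal: fall back to the (assumed) hostname suffix allowlist.
        return host.endswith(internal_suffixes)
    return ip.is_private or ip.is_loopback


def validate_strict_llm(provider: str, endpoint: str) -> None:
    """Raise ValueError if the LLM settings violate the STRICT-mode rule."""
    if provider in ("none", "disabled"):
        return
    if provider == "local" and is_internal_endpoint(endpoint):
        return
    raise ValueError(
        "STRICT mode allows only a disabled LLM or a local provider "
        "with an internal endpoint"
    )
```

Failing at startup, rather than at the first LLM call, means a misconfigured STRICT deployment never sends a request.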

Data Minimization

What Can Be Dropped/Hashed

At Ingestion:

User identifiers (sender IDs, user names) can be hashed or dropped
Message texts can be hashed (for pattern matching without storing full text)
Metadata can be anonymized
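
Hashing identifiers at ingestion might look like the following sketch, which uses a keyed hash (HMAC-SHA256) so that values cannot be recovered by hashing guesses without the key. The function name `hash_identifier` and the use of a secret key are assumptions; the source does not say which hashing scheme PATAS uses. A hash of a message text supports exact-match comparison only.

```python
import hashlib
import hmac


def hash_identifier(value: str, key: bytes) -> str:
    """Return a stable, keyed SHA-256 digest of an identifier as hex."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
```

The same input and key always yield the same digest, so hashed sender IDs can still be grouped and counted.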

In Logs:

Full message texts can be replaced with message IDs
Only pattern matches and counts stored
User identifiers can be hashed

Example Log Entry (STRICT Mode):

# In STRICT mode, logs only contain:
{
  "message_id": "abc123",
  "pattern_id": 42,
  "match_count": 5,
  # No "text" field, no "sender" field
}
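
A log entry of that shape could be produced by filtering each record through an allowlist of fields, so that anything not explicitly permitted is dropped. The helper name `strict_log_entry` is hypothetical; the three allowed keys are taken from the example above.

```python
from typing import Any, Dict

# Fields permitted in STRICT-mode logs, per the example above.
STRICT_LOG_FIELDS = ("message_id", "pattern_id", "match_count")


def strict_log_entry(record: Dict[str, Any]) -> Dict[str, Any]:
    """Keep only allowlisted fields; text, sender, and the rest are dropped."""
    return {k: record[k] for k in STRICT_LOG_FIELDS if k in record}
```

An allowlist is safer than a blocklist here: a field added to the record later stays out of the logs until someone deliberately permits it.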

No Data Leakage

Guarantees

No Hardcoded External Calls: PATAS contains no hardcoded calls that send data to external services
Operator Controls Endpoints: All external endpoints are configured by operator
On-Prem by Default: All processing happens on-premises unless operator configures otherwise
No Telemetry: PATAS does not send telemetry or usage statistics to external services

Compliance & Audit

Audit Trail

PATAS can maintain audit logs (if enabled):

Pattern creation/modification
Rule promotion/deprecation
Safety evaluation results
Configuration changes

In STRICT Mode:

Audit logs contain only IDs and metadata, not full message texts
User identifiers can be hashed in audit logs

Configuration Examples

STRICT Privacy Configuration

# .env or config file
PRIVACY_MODE=STRICT
# Use "none", or "local" with an internal endpoint
LLM_PROVIDER=none
# Shorter retention
LOG_RETENTION_DAYS=7
REPORT_RETENTION_DAYS=30
# Disable LLM entirely if not needed
ENABLE_LLM=false

STANDARD Privacy Configuration

# .env or config file
PRIVACY_MODE=STANDARD
# Use "openai", or "local" with an internal endpoint
LLM_PROVIDER=openai
LOG_RETENTION_DAYS=30
REPORT_RETENTION_DAYS=90
ENABLE_LLM=true
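
For illustration, these variables could be read into a typed settings object as sketched below. `PrivacySettings` and `load_settings` are hypothetical names, and the fallback values are assumptions based on the defaults shown earlier in this document (the `true` default for `ENABLE_LLM` in particular is a guess).

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class PrivacySettings:
    privacy_mode: str
    llm_provider: str
    log_retention_days: int
    report_retention_days: int
    enable_llm: bool


def load_settings(env: Mapping[str, str] = os.environ) -> PrivacySettings:
    """Build settings from environment variables, with assumed defaults."""
    enable_raw = env.get("ENABLE_LLM", "true").strip().lower()
    return PrivacySettings(
        privacy_mode=env.get("PRIVACY_MODE", "STANDARD").strip().upper(),
        llm_provider=env.get("LLM_PROVIDER", "openai").strip(),
        log_retention_days=int(env.get("LOG_RETENTION_DAYS", "30")),
        report_retention_days=int(env.get("REPORT_RETENTION_DAYS", "90")),
        enable_llm=enable_raw in ("1", "true", "yes"),
    )
```

Passing a plain dictionary instead of `os.environ` makes the loader easy to exercise in tests, for example `load_settings({"PRIVACY_MODE": "STRICT", "ENABLE_LLM": "false"})`.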

Data Flow

Ingestion

External Source (TAS logs, CSV)
  ↓
PATAS Ingestion
  ↓
[STRICT: Hash/drop user identifiers, truncate texts]
  ↓
Normalized Message Storage
  ↓
[STRICT: Only IDs + metadata stored]

Pattern Mining

Message Storage
  ↓
Pattern Mining Pipeline
  ↓
[STRICT: Aggregated signals only, no individual texts]
  ↓
LLM (if enabled, internal endpoint only in STRICT)
  ↓
Pattern + Rule Creation
  ↓
[STRICT: Examples truncated/hashed]

Evaluation

Rules
  ↓
Offline Evaluation
  ↓
[STRICT: Only metrics stored, no message texts]
  ↓
Safety Evaluation
  ↓
[STRICT: Only IDs + metrics in reports]

Summary

Privacy Guarantees:

✅ On-prem deployment by default
✅ No telemetry or external calls unless explicitly configured
✅ LLM provider configurable by operator (internal or external)
✅ STRICT mode: minimal data storage, no external LLM by default
✅ Data retention configurable
✅ User identifiers can be hashed/dropped
✅ Message texts can be hashed/truncated in STRICT mode

For Integration:

PATAS runs on your infrastructure
You control all external endpoints
STRICT mode available for production
No data leaves your infrastructure unless you explicitly configure it

Last Updated: 2025-11-18

