Privacy and Data Protection
Nick edited this page Nov 21, 2025 · 3 revisions
Purpose: This document describes how PATAS handles data privacy, what data it processes, and how to configure it for strict privacy requirements.
PATAS is designed for on-premises deployment by default:
- ✅ Runs on your infrastructure
- ✅ No telemetry or external calls unless explicitly configured
- ✅ All data stays within your infrastructure
- ✅ LLM provider is configurable (internal endpoint or external, operator's choice)
Configuration:
privacy_mode = "STANDARD"

Behavior:
- ✅ External LLM providers can be used (if configured by operator)
- ✅ Message texts can be included in test reports
- ✅ Full logging available for debugging
- ✅ Operator controls all external endpoints
Use Case: Development, testing, or when operator explicitly configures external services.
Configuration:
privacy_mode = "STRICT"

Behavior:
- ❌ External LLM providers are disabled by default; an LLM is available only if the operator explicitly configures an internal endpoint
- ❌ Logs do not store full message texts, only message IDs, pattern IDs, and counts
- ❌ No telemetry or external calls unless explicitly configured
- ✅ Only internal/on-prem LLM endpoints allowed
- ✅ Minimal data retention
Use Case: Production deployment with strict privacy requirements.
Message Data:
- text: Message content (can be hashed or dropped in STRICT mode)
- id: Message identifier
- timestamp: Message timestamp
- is_spam: Spam label (True/False/None)
- meta: JSON metadata (channel, language, country, etc.)
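The fields above can be pictured as a small record type. The following is a minimal sketch, not PATAS's actual schema: the class name `Message`, the helper `apply_strict_mode`, and the use of SHA-256 for hashing the text are all assumptions made for illustration.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass, field, replace
from datetime import datetime, timezone
from typing import Any, Optional


@dataclass(frozen=True)
class Message:
    """Hypothetical in-memory shape of one ingested message."""
    id: str
    timestamp: datetime
    text: Optional[str] = None        # may be hashed or dropped in STRICT mode
    is_spam: Optional[bool] = None    # True / False / None (unlabeled)
    meta: dict[str, Any] = field(default_factory=dict)


def apply_strict_mode(msg: Message, keep_hash: bool = True) -> Message:
    """Return a copy whose text is replaced by a SHA-256 digest, or dropped."""
    if msg.text is None:
        return msg
    if keep_hash:
        digest = hashlib.sha256(msg.text.encode("utf-8")).hexdigest()
        return replace(msg, text=digest)
    return replace(msg, text=None)
```

Returning a copy rather than mutating the record keeps the original available to the caller until it is explicitly discarded, which makes the point at which raw text disappears easy to audit.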
Pattern Data:
- Pattern descriptions
- Pattern examples (can be truncated/hashed in STRICT mode)
- SQL rules (safe SELECT queries only)
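The source does not say how "safe SELECT queries only" is enforced. As one possible reading, the sketch below shows a deliberately conservative textual check; the function name `is_safe_select`, the keyword list, and the `messages` table used in the usage line are hypothetical.

```python
import re

# Keywords that should never appear in a read-only rule (illustrative list).
_FORBIDDEN = re.compile(
    r"\b(insert|update|delete|drop|alter|create|truncate|grant|attach|pragma)\b",
    re.IGNORECASE,
)


def is_safe_select(sql: str) -> bool:
    """Conservative check: one statement, starts with SELECT, no write keywords."""
    stmt = sql.strip().rstrip(";").strip()
    if not stmt or ";" in stmt:        # reject empty input and stacked statements
        return False
    if "--" in stmt or "/*" in stmt:   # reject SQL comments outright
        return False
    if not re.match(r"select\b", stmt, re.IGNORECASE):
        return False
    return _FORBIDDEN.search(stmt) is None


print(is_safe_select("SELECT id FROM messages WHERE text LIKE '%free%'"))
```

A check like this errs on the side of rejection (for instance, it refuses a string literal that contains a semicolon). It is best treated as a first filter in front of a read-only database connection, not as a replacement for one.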
Evaluation Data:
- Rule evaluation metrics (precision, recall, coverage)
- Match counts (spam/ham hits)
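To show how the metrics relate to the match counts, here is a short sketch using the usual definitions: precision as spam hits over all hits, recall as spam hits over all spam, and coverage as all hits over all messages. Whether PATAS defines coverage exactly this way is an assumption, as are the names `RuleMetrics` and `compute_metrics`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RuleMetrics:
    precision: float
    recall: float
    coverage: float


def compute_metrics(spam_hits: int, ham_hits: int,
                    total_spam: int, total_messages: int) -> RuleMetrics:
    """Derive metrics from match counts; empty denominators yield 0.0."""
    matched = spam_hits + ham_hits
    precision = spam_hits / matched if matched else 0.0
    recall = spam_hits / total_spam if total_spam else 0.0
    coverage = matched / total_messages if total_messages else 0.0
    return RuleMetrics(precision, recall, coverage)
```

Because only counts go in, these metrics can be stored and reported without retaining any message text.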
Settings:
log_retention_days: int = 30 # How long to keep logs
report_retention_days: int = 90 # How long to keep reports

In STRICT Mode:
- Raw message texts can be dropped after pattern mining
- Only aggregated statistics and pattern IDs are retained
- Logs contain only message IDs and pattern matches, not full texts
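The retention settings imply a cutoff date before which logs or reports can be purged. The sketch below shows one way to compute it from `log_retention_days` or `report_retention_days`. The helper names and the `(id, created_at)` record shape are hypothetical, and the actual purge mechanism in PATAS is not described in this page.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone
from typing import Iterable, Optional


def retention_cutoff(retention_days: int, now: Optional[datetime] = None) -> datetime:
    """Return the timestamp before which records are considered expired."""
    if retention_days < 0:
        raise ValueError("retention_days must be non-negative")
    now = now or datetime.now(timezone.utc)
    return now - timedelta(days=retention_days)


def expired_ids(records: Iterable[tuple[str, datetime]], retention_days: int,
                now: Optional[datetime] = None) -> list[str]:
    """Select IDs of (id, created_at) records older than the retention window."""
    cutoff = retention_cutoff(retention_days, now)
    return [rec_id for rec_id, created_at in records if created_at < cutoff]
```

Passing `now` explicitly keeps the function deterministic in tests; timestamps are assumed to be timezone-aware UTC values.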
Operator Controls:
- LLM provider is configurable by operator
- No hardcoded external endpoints
- Operator chooses: internal endpoint, external API, or disabled
Configuration:
llm_provider: str = "openai" # or "local", "none", "disabled"
llm_api_endpoint: str = "" # Internal endpoint URL (if using local provider)

In STRICT Mode:
- External LLM providers disabled by default
- Only internal/on-prem endpoints allowed
- Operator must explicitly configure internal endpoint if LLM is needed
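The rules above amount to a small policy check at startup. The sketch below is one possible interpretation, reusing the `llm_provider` and `llm_api_endpoint` settings. The function name `resolve_llm_provider` and the `internal_hosts` allowlist are assumptions, as is the choice to raise an error instead of silently disabling the LLM, which the source does not specify.

```python
from __future__ import annotations

from typing import Optional
from urllib.parse import urlparse


def resolve_llm_provider(privacy_mode: str, llm_provider: str, llm_api_endpoint: str,
                         internal_hosts: frozenset[str] = frozenset()) -> Optional[str]:
    """Return the provider to use, or None if the LLM is off; raise on a violation."""
    provider = llm_provider.strip().lower()
    if provider in ("none", "disabled"):
        return None
    if privacy_mode.upper() != "STRICT":
        return provider  # STANDARD: operator's choice, internal or external
    # STRICT: only a local provider pointing at an allowlisted internal host.
    host = urlparse(llm_api_endpoint).hostname
    if provider == "local" and host is not None and host in internal_hosts:
        return provider
    raise ValueError(
        f"STRICT mode does not allow LLM provider {llm_provider!r} "
        f"with endpoint {llm_api_endpoint!r}"
    )
```

An explicit host allowlist is used here because "internal" cannot be inferred reliably from a URL alone; the operator states which hosts count as on-prem.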
At Ingestion:
- User identifiers (sender IDs, user names) can be hashed or dropped
- Message texts can be hashed (for pattern matching without storing full text)
- Metadata can be anonymized
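As an illustration of hashing or dropping fields at ingestion, the sketch below uses a keyed hash (HMAC-SHA256) from the Python standard library. The field names `sender_id` and `user_name`, the helper names, and the choice of HMAC over a plain digest are assumptions; the source only states that identifiers "can be hashed or dropped".

```python
import hashlib
import hmac


def pseudonymize(value: str, key: bytes) -> str:
    """Keyed SHA-256 (HMAC) of an identifier; same input and key give the same token."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def anonymize_record(record: dict, key: bytes,
                     hash_fields=("sender_id", "user_name"),
                     drop_fields=()) -> dict:
    """Return a copy with selected fields hashed or removed."""
    out = {}
    for name, value in record.items():
        if name in drop_fields:
            continue
        if name in hash_fields and value is not None:
            out[name] = pseudonymize(str(value), key)
        else:
            out[name] = value
    return out
```

A keyed hash still lets records from the same sender be grouped, while an attacker without the key cannot recover identifiers by hashing a list of candidate values, which is a realistic risk for short or enumerable IDs under an unkeyed digest.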
In Logs:
- Full message texts can be replaced with message IDs
- Only pattern matches and counts stored
- User identifiers can be hashed
Example Log Entry:
# In STRICT mode, logs only contain:
{
    "message_id": "abc123",
    "pattern_id": 42,
    "match_count": 5,
    # No "text" field, no "sender" field
}

- No Hardcoded External Calls: PATAS does not hardcode sending data to external services
- Operator Controls Endpoints: All external endpoints are configured by operator
- On-Prem by Default: All processing happens on-premises unless operator configures otherwise
- No Telemetry: PATAS does not send telemetry or usage statistics to external services
PATAS can maintain audit logs (if enabled):
- Pattern creation/modification
- Rule promotion/deprecation
- Safety evaluation results
- Configuration changes
In STRICT Mode:
- Audit logs contain only IDs and metadata, not full message texts
- User identifiers can be hashed in audit logs
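One way to guarantee that log and audit entries carry only IDs and counts is to filter every event through an allowlist before serialization. The sketch below assumes the three field names shown in the example log entry; the function name `build_log_entry` and the constant `STRICT_LOG_FIELDS` are hypothetical.

```python
import json

# Fields permitted in STRICT-mode entries (taken from the example log entry).
STRICT_LOG_FIELDS = ("message_id", "pattern_id", "match_count")


def build_log_entry(event: dict, privacy_mode: str) -> str:
    """Serialize an event; in STRICT mode keep only allowlisted fields."""
    if privacy_mode.upper() == "STRICT":
        event = {k: event[k] for k in STRICT_LOG_FIELDS if k in event}
    return json.dumps(event, sort_keys=True)
```

An allowlist is preferred over a denylist here: a field added to events later stays out of STRICT logs until someone deliberately permits it.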
# .env or config file
PRIVACY_MODE=STRICT
LLM_PROVIDER=none # or "local" with internal endpoint
LOG_RETENTION_DAYS=7 # Shorter retention
REPORT_RETENTION_DAYS=30
ENABLE_LLM=false # Disable LLM entirely if not needed

# .env or config file
PRIVACY_MODE=STANDARD
LLM_PROVIDER=openai # or "local" with internal endpoint
LOG_RETENTION_DAYS=30
REPORT_RETENTION_DAYS=90
ENABLE_LLM=true

External Source (TAS logs, CSV)
↓
PATAS Ingestion
↓
[STRICT: Hash/drop user identifiers, truncate texts]
↓
Normalized Message Storage
↓
[STRICT: Only IDs + metadata stored]

Message Storage
↓
Pattern Mining Pipeline
↓
[STRICT: Aggregated signals only, no individual texts]
↓
LLM (if enabled, internal endpoint only in STRICT)
↓
Pattern + Rule Creation
↓
[STRICT: Examples truncated/hashed]

Rules
↓
Offline Evaluation
↓
[STRICT: Only metrics stored, no message texts]
↓
Safety Evaluation
↓
[STRICT: Only IDs + metrics in reports]
Privacy Guarantees:
- ✅ On-prem deployment by default
- ✅ No telemetry or external calls unless explicitly configured
- ✅ LLM provider configurable by operator (internal or external)
- ✅ STRICT mode: minimal data storage, no external LLM by default
- ✅ Data retention configurable
- ✅ User identifiers can be hashed/dropped
- ✅ Message texts can be hashed/truncated in STRICT mode
For integration:
- PATAS runs on your infrastructure
- You control all external endpoints
- STRICT mode available for production
- No data leaves your infrastructure unless you explicitly configure it
Last Updated: 2025-11-18