Configuration Examples

Nick edited this page Nov 21, 2025 · 1 revision

This document provides example configurations for different deployment scenarios.

Model Provider Configurations

PATAS uses two separate engines that can be configured independently:

  • Embedding Engine – for semantic similarity, clustering, and pattern discovery
  • LLM Engine – for pattern explanation, rule generation, and LLM-based validation

Each engine supports the openai (cloud), local (on-premise), or none (disabled) provider.


Example 1: OpenAI Profile (Default)

Cloud-based deployment using OpenAI's managed services.

Environment Variables (.env)

# Embedding Engine
EMBEDDING_PROVIDER=openai
EMBEDDING_MODEL=text-embedding-3-small
OPENAI_API_KEY=sk-...

# LLM Engine
LLM_PROVIDER=openai
LLM_MODEL=gpt-4o-mini
# OPENAI_API_KEY is reused for LLM

# Database
DATABASE_URL=postgresql+asyncpg://user:pass@localhost/patas

Python Config (app/config.py)

# Embedding Engine
embedding_provider = "openai"
embedding_model = "text-embedding-3-small"
embedding_api_key = os.getenv("OPENAI_API_KEY")

# LLM Engine
llm_provider = "openai"
llm_model = "gpt-4o-mini"
llm_api_key = os.getenv("OPENAI_API_KEY")

Example 2: Local Profile (On-Premise)

On-premise deployment using local models. Recommended models:

  • Embeddings: BAAI/bge-m3 – multilingual, strong for RU/UK/EN spam logs
  • LLM: mistralai/Mistral-7B-Instruct-v0.2 – compact, Apache 2.0, good at structured JSON

Environment Variables (.env)

# Embedding Engine
EMBEDDING_PROVIDER=local
EMBEDDING_MODEL=BAAI/bge-m3
EMBEDDING_BASE_URL=http://localhost:8080/v1  # Your embedding service endpoint

# LLM Engine
LLM_PROVIDER=local
LLM_MODEL=mistralai/Mistral-7B-Instruct-v0.2
LLM_BASE_URL=http://localhost:8000/v1  # Your LLM service endpoint (vLLM/TGI/Ollama)

# Database
DATABASE_URL=postgresql+asyncpg://user:pass@localhost/patas

Python Config (app/config.py)

# Embedding Engine
embedding_provider = "local"
embedding_model = "BAAI/bge-m3"
embedding_base_url = "http://localhost:8080/v1"

# LLM Engine
llm_provider = "local"
llm_model = "mistralai/Mistral-7B-Instruct-v0.2"
llm_base_url = "http://localhost:8000/v1"

Integration Notes

When using local provider:

  1. Embedding Service: Must expose an OpenAI-compatible /embeddings endpoint or implement the LocalEmbeddingEngine interface.

  2. LLM Service: Must expose an OpenAI-compatible /chat/completions endpoint (vLLM, TGI, Ollama, or custom gateway).

  3. Model Identifiers: The *_model fields are just identifiers. For local models, use HuggingFace model IDs (e.g., BAAI/bge-m3) or your internal model names.

  4. Base URLs: Point to your inference stack endpoints. The exact wiring (vLLM/TGI/gateway) is left to the integrator.
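To make the "OpenAI-compatible endpoint" requirement concrete, here is what a request to a local /embeddings endpoint looks like on the wire. This is a stdlib-only sketch (embeddings_request is a hypothetical helper, and the request is only built here, not sent):

```python
import json
import urllib.request

def embeddings_request(base_url: str, model: str, texts: list[str]) -> urllib.request.Request:
    """Build an OpenAI-compatible POST {base_url}/embeddings request."""
    payload = {"model": model, "input": texts}
    return urllib.request.Request(
        url=f"{base_url}/embeddings",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = embeddings_request("http://localhost:8080/v1", "BAAI/bge-m3", ["hello world"])
```

Any inference stack that accepts this shape (vLLM, TGI with its OpenAI shim, a custom gateway) should work without code changes; only the base URL differs between profiles.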


Example 3: Mixed Profile

Use OpenAI for embeddings and a local LLM for rule generation.

Environment Variables (.env)

# Embedding Engine (OpenAI)
EMBEDDING_PROVIDER=openai
EMBEDDING_MODEL=text-embedding-3-small
OPENAI_API_KEY=sk-...

# LLM Engine (Local)
LLM_PROVIDER=local
LLM_MODEL=mistralai/Mistral-7B-Instruct-v0.2
LLM_BASE_URL=http://localhost:8000/v1

Example 4: Disabled AI (Deterministic Only)

Run PATAS without any AI models, using only deterministic pattern mining.

Environment Variables (.env)

# Embedding Engine (disabled)
EMBEDDING_PROVIDER=none

# LLM Engine (disabled)
LLM_PROVIDER=none

# Two-stage processing will skip Stage 2 (semantic analysis)
ENABLE_TWO_STAGE_PROCESSING=false
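The Stage 2 skip described above can be guarded with a small check. This is a sketch of the intended behavior (two_stage_enabled is an illustrative helper, not an actual PATAS function):

```python
import os

def two_stage_enabled() -> bool:
    """Stage 2 (semantic analysis) requires an embedding engine; skip it otherwise."""
    if os.getenv("EMBEDDING_PROVIDER", "none") == "none":
        return False
    return os.getenv("ENABLE_TWO_STAGE_PROCESSING", "true").lower() == "true"
```

With both the provider set to none and the flag set to false, the pipeline runs deterministic pattern mining only.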

YAML Configuration (config.example.yaml)

For YAML-based configuration:

pattern_mining:
  # Embedding Engine
  embedding_provider: openai  # or "local"
  embedding_model: text-embedding-3-small  # or "BAAI/bge-m3" for local
  embedding_base_url: ""  # For local: "http://localhost:8080/v1"
  embedding_api_key: "${OPENAI_API_KEY}"
  
  # LLM Engine
  llm_provider: openai  # or "local"
  llm_model: gpt-4o-mini  # or "mistralai/Mistral-7B-Instruct-v0.2" for local
  llm_base_url: ""  # For local: "http://localhost:8000/v1"
  llm_api_key: "${OPENAI_API_KEY}"
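The ${OPENAI_API_KEY} placeholder in the YAML above must be resolved before the file is parsed. A minimal approach, assuming the loader expands environment variables itself (expand_env is an illustrative helper, not part of PATAS):

```python
import os

def expand_env(yaml_text: str) -> str:
    """Replace ${VAR} placeholders with environment values before YAML parsing."""
    return os.path.expandvars(yaml_text)

os.environ["OPENAI_API_KEY"] = "sk-test"  # example value; never hard-code real keys
expanded = expand_env('llm_api_key: "${OPENAI_API_KEY}"')
```

os.path.expandvars leaves unknown ${VAR} references untouched, so a missing variable surfaces as a literal placeholder rather than an empty string, which makes misconfiguration easier to spot.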

Security Notes

  • Never commit API keys or secrets to version control.
  • Use environment variables or secret management systems.
  • For on-premise deployments, ensure inference endpoints are properly secured and authenticated.
  • In PRIVACY_MODE=STRICT, external LLM providers are disabled by default unless explicitly configured to internal endpoints.
