Configuration Examples

Nick edited this page Nov 21, 2025 · 1 revision

This document provides example configurations for different deployment scenarios.

Model Provider Configurations

PATAS uses two separate engines that can be configured independently:

  • Embedding Engine – for semantic similarity, clustering, and pattern discovery
  • LLM Engine – for pattern explanation, rule generation, and LLM-based validation

Each engine supports the openai (cloud), local (on-premise), or none (disabled) provider.


Example 1: OpenAI Profile (Default)

Cloud-based deployment using OpenAI's managed services.

Environment Variables (.env)

# Embedding Engine
EMBEDDING_PROVIDER=openai
EMBEDDING_MODEL=text-embedding-3-small
OPENAI_API_KEY=sk-...

# LLM Engine
LLM_PROVIDER=openai
LLM_MODEL=gpt-4o-mini
# OPENAI_API_KEY is reused for LLM

# Database
DATABASE_URL=postgresql+asyncpg://user:pass@localhost/patas

Python Config (app/config.py)

# Embedding Engine
embedding_provider = "openai"
embedding_model = "text-embedding-3-small"
embedding_api_key = os.getenv("OPENAI_API_KEY")

# LLM Engine
llm_provider = "openai"
llm_model = "gpt-4o-mini"
llm_api_key = os.getenv("OPENAI_API_KEY")

Example 2: Local Profile (On-Premise)

On-premise deployment using local models. Recommended models:

  • Embeddings: BAAI/bge-m3 – multilingual, strong for RU/UK/EN spam logs
  • LLM: mistralai/Mistral-7B-Instruct-v0.2 – compact, Apache 2.0, good at structured JSON

Environment Variables (.env)

# Embedding Engine
EMBEDDING_PROVIDER=local
EMBEDDING_MODEL=BAAI/bge-m3
EMBEDDING_BASE_URL=http://localhost:8080/v1  # Your embedding service endpoint

# LLM Engine
LLM_PROVIDER=local
LLM_MODEL=mistralai/Mistral-7B-Instruct-v0.2
LLM_BASE_URL=http://localhost:8000/v1  # Your LLM service endpoint (vLLM/TGI/Ollama)

# Database
DATABASE_URL=postgresql+asyncpg://user:pass@localhost/patas

Python Config (app/config.py)

# Embedding Engine
embedding_provider = "local"
embedding_model = "BAAI/bge-m3"
embedding_base_url = "http://localhost:8080/v1"

# LLM Engine
llm_provider = "local"
llm_model = "mistralai/Mistral-7B-Instruct-v0.2"
llm_base_url = "http://localhost:8000/v1"

Integration Notes

When using local provider:

  1. Embedding Service: Must expose an OpenAI-compatible /embeddings endpoint or implement the LocalEmbeddingEngine interface.

  2. LLM Service: Must expose an OpenAI-compatible /chat/completions endpoint (vLLM, TGI, Ollama, or custom gateway).

  3. Model Identifiers: The *_model fields are just identifiers. For local models, use HuggingFace model IDs (e.g., BAAI/bge-m3) or your internal model names.

  4. Base URLs: Point to your inference stack endpoints. The exact wiring (vLLM/TGI/gateway) is left to the integrator.
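To make the "OpenAI-compatible endpoint" requirement concrete, here is what a request to a local /embeddings endpoint looks like on the wire. This is a stdlib-only sketch (embeddings_request is a hypothetical helper, and the request is only built here, not sent):

```python
import json
import urllib.request

def embeddings_request(base_url: str, model: str, texts: list[str]) -> urllib.request.Request:
    """Build an OpenAI-compatible POST {base_url}/embeddings request."""
    payload = {"model": model, "input": texts}
    return urllib.request.Request(
        url=f"{base_url}/embeddings",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = embeddings_request("http://localhost:8080/v1", "BAAI/bge-m3", ["hello world"])
```

Any inference stack that accepts this shape (vLLM, TGI with its OpenAI shim, a custom gateway) should work without code changes; only the base URL differs between profiles.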


Example 3: Mixed Profile

Use OpenAI for embeddings and a local LLM for rule generation.

Environment Variables (.env)

# Embedding Engine (OpenAI)
EMBEDDING_PROVIDER=openai
EMBEDDING_MODEL=text-embedding-3-small
OPENAI_API_KEY=sk-...

# LLM Engine (Local)
LLM_PROVIDER=local
LLM_MODEL=mistralai/Mistral-7B-Instruct-v0.2
LLM_BASE_URL=http://localhost:8000/v1

Example 4: Disabled AI (Deterministic Only)

Run PATAS without any AI models, using only deterministic pattern mining.

Environment Variables (.env)

# Embedding Engine (disabled)
EMBEDDING_PROVIDER=none

# LLM Engine (disabled)
LLM_PROVIDER=none

# Two-stage processing will skip Stage 2 (semantic analysis)
ENABLE_TWO_STAGE_PROCESSING=false
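The Stage 2 skip described above can be guarded with a small check. This is a sketch of the intended behavior (two_stage_enabled is an illustrative helper, not an actual PATAS function):

```python
import os

def two_stage_enabled() -> bool:
    """Stage 2 (semantic analysis) requires an embedding engine; skip it otherwise."""
    if os.getenv("EMBEDDING_PROVIDER", "none") == "none":
        return False
    return os.getenv("ENABLE_TWO_STAGE_PROCESSING", "true").lower() == "true"
```

With both the provider set to none and the flag set to false, the pipeline runs deterministic pattern mining only.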

YAML Configuration (config.example.yaml)

For YAML-based configuration:

pattern_mining:
  # Embedding Engine
  embedding_provider: openai  # or "local"
  embedding_model: text-embedding-3-small  # or "BAAI/bge-m3" for local
  embedding_base_url: ""  # For local: "http://localhost:8080/v1"
  embedding_api_key: "${OPENAI_API_KEY}"
  
  # LLM Engine
  llm_provider: openai  # or "local"
  llm_model: gpt-4o-mini  # or "mistralai/Mistral-7B-Instruct-v0.2" for local
  llm_base_url: ""  # For local: "http://localhost:8000/v1"
  llm_api_key: "${OPENAI_API_KEY}"
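The ${OPENAI_API_KEY} placeholder in the YAML above must be resolved before the file is parsed. A minimal approach, assuming the loader expands environment variables itself (expand_env is an illustrative helper, not part of PATAS):

```python
import os

def expand_env(yaml_text: str) -> str:
    """Replace ${VAR} placeholders with environment values before YAML parsing."""
    return os.path.expandvars(yaml_text)

os.environ["OPENAI_API_KEY"] = "sk-test"  # example value; never hard-code real keys
expanded = expand_env('llm_api_key: "${OPENAI_API_KEY}"')
```

os.path.expandvars leaves unknown ${VAR} references untouched, so a missing variable surfaces as a literal placeholder rather than an empty string, which makes misconfiguration easier to spot.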

Security Notes

  • Never commit API keys or secrets to version control.
  • Use environment variables or secret management systems.
  • For on-premise deployments, ensure inference endpoints are properly secured and authenticated.
  • In PRIVACY_MODE=STRICT, external LLM providers are disabled by default unless explicitly configured to internal endpoints.
