# Configuration Examples
This document provides example configurations for different deployment scenarios.
PATAS uses two separate engines that can be configured independently:
- Embedding Engine – for semantic similarity, clustering, and pattern discovery
- LLM Engine – for pattern explanation, rule generation, and LLM-based validation
Each engine supports `openai` (cloud) or `local` (on-premise) providers.
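Assuming a purely env-driven setup, the two-engine split can be modeled as two independent settings objects. The environment variable names below mirror the examples in this document, but the `EngineSettings` dataclass and `load_engine` helper are illustrative sketches, not part of the PATAS API:

```python
import os
from dataclasses import dataclass


@dataclass
class EngineSettings:
    # Illustrative container, not a PATAS class.
    provider: str       # "openai" or "local"
    model: str
    base_url: str = ""  # only needed for local providers
    api_key: str = ""   # only needed for cloud providers


def load_engine(prefix: str) -> EngineSettings:
    """Read one engine's settings from env vars, e.g. EMBEDDING_* or LLM_*."""
    return EngineSettings(
        provider=os.getenv(f"{prefix}_PROVIDER", "openai"),
        model=os.getenv(f"{prefix}_MODEL", ""),
        base_url=os.getenv(f"{prefix}_BASE_URL", ""),
        api_key=os.getenv("OPENAI_API_KEY", ""),
    )


embedding = load_engine("EMBEDDING")
llm = load_engine("LLM")
```

Because the engines share no state, each can be swapped between cloud and local independently, as the scenarios below show.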
## Cloud Deployment (OpenAI)

Cloud-based deployment using OpenAI's managed services.
**.env:**

```bash
# Embedding Engine
EMBEDDING_PROVIDER=openai
EMBEDDING_MODEL=text-embedding-3-small
OPENAI_API_KEY=sk-...

# LLM Engine
LLM_PROVIDER=openai
LLM_MODEL=gpt-4o-mini
# OPENAI_API_KEY is reused for the LLM engine

# Database
DATABASE_URL=postgresql+asyncpg://user:pass@localhost/patas
```

**Python:**

```python
import os

# Embedding Engine
embedding_provider = "openai"
embedding_model = "text-embedding-3-small"
embedding_api_key = os.getenv("OPENAI_API_KEY")

# LLM Engine
llm_provider = "openai"
llm_model = "gpt-4o-mini"
llm_api_key = os.getenv("OPENAI_API_KEY")
```

## Local Deployment (On-Premise)

On-premise deployment using local models. Recommended models:
- Embeddings: `BAAI/bge-m3` – multilingual, strong for RU/UK/EN spam logs
- LLM: `mistralai/Mistral-7B-Instruct-v0.2` – compact, Apache 2.0 licensed, good at structured JSON output
**.env:**

```bash
# Embedding Engine
EMBEDDING_PROVIDER=local
EMBEDDING_MODEL=BAAI/bge-m3
EMBEDDING_BASE_URL=http://localhost:8080/v1  # Your embedding service endpoint

# LLM Engine
LLM_PROVIDER=local
LLM_MODEL=mistralai/Mistral-7B-Instruct-v0.2
LLM_BASE_URL=http://localhost:8000/v1  # Your LLM service endpoint (vLLM/TGI/Ollama)

# Database
DATABASE_URL=postgresql+asyncpg://user:pass@localhost/patas
```

**Python:**

```python
# Embedding Engine
embedding_provider = "local"
embedding_model = "BAAI/bge-m3"
embedding_base_url = "http://localhost:8080/v1"

# LLM Engine
llm_provider = "local"
llm_model = "mistralai/Mistral-7B-Instruct-v0.2"
llm_base_url = "http://localhost:8000/v1"
```

When using the `local` provider:
- Embedding Service: Must expose an OpenAI-compatible `/embeddings` endpoint or implement the `LocalEmbeddingEngine` interface.
- LLM Service: Must expose an OpenAI-compatible `/chat/completions` endpoint (vLLM, TGI, Ollama, or a custom gateway).
- Model Identifiers: The `*_model` fields are just identifiers. For local models, use HuggingFace model IDs (e.g., `BAAI/bge-m3`) or your internal model names.
- Base URLs: Point to your inference stack endpoints. The exact wiring (vLLM/TGI/gateway) is left to the integrator.
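Because both providers speak the OpenAI wire format, client code can share a single request path and vary only the URL and auth header. A hypothetical sketch (the `chat_endpoint` helper is not a PATAS function):

```python
def chat_endpoint(provider: str, base_url: str = "", api_key: str = ""):
    """Return (url, headers) for an OpenAI-compatible /chat/completions call.

    Hypothetical helper: shows why "openai" and "local" can share one code path.
    """
    if provider == "openai":
        url = "https://api.openai.com/v1/chat/completions"
        headers = {"Authorization": f"Bearer {api_key}"}
    elif provider == "local":
        # vLLM/TGI/Ollama gateways expose the same route under your base URL.
        url = base_url.rstrip("/") + "/chat/completions"
        headers = {}  # add auth here if your gateway requires it
    else:
        raise ValueError(f"unknown provider: {provider!r}")
    return url, headers
```

The same pattern applies to the embedding engine with `/embeddings` in place of `/chat/completions`.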
## Hybrid Deployment

Use OpenAI for embeddings and a local LLM for rule generation.

```bash
# Embedding Engine (OpenAI)
EMBEDDING_PROVIDER=openai
EMBEDDING_MODEL=text-embedding-3-small
OPENAI_API_KEY=sk-...

# LLM Engine (Local)
LLM_PROVIDER=local
LLM_MODEL=mistralai/Mistral-7B-Instruct-v0.2
LLM_BASE_URL=http://localhost:8000/v1
```

## No-AI Deployment

Run PATAS without any AI models, using only deterministic pattern mining.
```bash
# Embedding Engine (disabled)
EMBEDDING_PROVIDER=none

# LLM Engine (disabled)
LLM_PROVIDER=none

# Two-stage processing will skip Stage 2 (semantic analysis)
ENABLE_TWO_STAGE_PROCESSING=false
```

## YAML Configuration

For YAML-based configuration:
```yaml
pattern_mining:
  # Embedding Engine
  embedding_provider: openai               # or "local"
  embedding_model: text-embedding-3-small  # or "BAAI/bge-m3" for local
  embedding_base_url: ""                   # For local: "http://localhost:8080/v1"
  embedding_api_key: "${OPENAI_API_KEY}"

  # LLM Engine
  llm_provider: openai                     # or "local"
  llm_model: gpt-4o-mini                   # or "mistralai/Mistral-7B-Instruct-v0.2" for local
  llm_base_url: ""                         # For local: "http://localhost:8000/v1"
  llm_api_key: "${OPENAI_API_KEY}"
```

## Security Notes

- Never commit API keys or secrets to version control.
- Use environment variables or secret management systems.
- For on-premise deployments, ensure inference endpoints are properly secured and authenticated.
- In `PRIVACY_MODE=STRICT`, external LLM providers are disabled by default unless explicitly configured to internal endpoints.
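The `${OPENAI_API_KEY}` placeholders in the YAML example rely on the loader expanding environment variables. If your stack does not do this for you, a minimal standard-library sketch (illustrative, not the PATAS loader) is to expand the raw text before parsing:

```python
import os


def expand_env(text: str) -> str:
    """Expand ${VAR} placeholders in a raw config string from the environment.

    Minimal sketch; os.path.expandvars leaves unknown variables untouched,
    which makes missing secrets easy to spot in the parsed config.
    """
    return os.path.expandvars(text)
```

Expanding secrets at load time keeps them out of the file itself, consistent with the security notes above.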