# Local Model Integration
PATAS now supports local HTTP-based engines for both embeddings and LLM inference, enabling on-premise deployments without requiring heavy ML frameworks in the PATAS core.
PATAS uses two separate engines that can be configured independently:
- Embedding Engine – generates semantic embeddings for message similarity analysis, clustering, and pattern discovery
- LLM Engine – performs pattern explanation, rule generation, and LLM-based validation
Each engine supports two provider modes:

- `openai` (default) – uses OpenAI's managed API services
- `local` – uses on-premise models via HTTP endpoints
## Local HTTP Embedding Engine

The `LocalHttpEmbeddingEngine` calls a local/self-hosted HTTP endpoint for embedding generation.
- URL: `{base_url}/embeddings`
- Method: `POST`
- Request:

  ```json
  {
    "model": "<model_identifier>",
    "inputs": ["text1", "text2", ...]
  }
  ```

- Response:

  ```json
  {
    "embeddings": [
      [0.1, 0.2, ...],
      [0.3, 0.4, ...]
    ]
  }
  ```
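The contract above can be exercised with a minimal standalone client sketch. The function names and error behavior here are illustrative, not part of the PATAS API; only the payload shapes come from the contract:

```python
import json
import urllib.request


def build_embedding_request(model: str, texts: list[str]) -> dict:
    # Payload shape follows the contract: {"model": ..., "inputs": [...]}.
    return {"model": model, "inputs": texts}


def fetch_embeddings(base_url: str, model: str, texts: list[str],
                     api_key: str = "", timeout: float = 30.0) -> list[list[float]]:
    # Hypothetical standalone client; PATAS ships its own engine class.
    payload = json.dumps(build_embedding_request(model, texts)).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if api_key:  # optional bearer authentication, as in the engine config
        headers["Authorization"] = f"Bearer {api_key}"
    req = urllib.request.Request(f"{base_url}/embeddings", data=payload,
                                 headers=headers, method="POST")
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.loads(resp.read().decode("utf-8"))
    # One vector per input text, in order.
    return body["embeddings"]
```

A call like `fetch_embeddings("http://localhost:8080/v1", "BAAI/bge-m3", ["text1", "text2"])` returns one float vector per input.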
Configuration:

```toml
embedding_provider = "local"
embedding_model = "BAAI/bge-m3"  # or your model identifier
embedding_base_url = "http://localhost:8080/v1"
embedding_api_key = ""  # optional
embedding_timeout_seconds = 30.0
```

Features:

- Automatic batching (default: 512 texts per batch)
- Embedding cache support (same as the OpenAI engine)
- Error handling with logging
- Optional API key authentication
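The automatic batching described above amounts to chunking the input list before issuing requests. A minimal sketch, assuming only the 512-text default from this page (the helper name is illustrative):

```python
def batch_texts(texts: list[str], batch_size: int = 512) -> list[list[str]]:
    # Split inputs into batches no larger than batch_size, so each
    # HTTP request stays within the engine's default per-batch limit.
    return [texts[i:i + batch_size] for i in range(0, len(texts), batch_size)]
```

For example, 1100 input texts yield three batches of 512, 512, and 76.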
## Local HTTP LLM Engine

The `LocalHttpPatternMiningEngine` calls a local/self-hosted HTTP endpoint for LLM inference.
- URL: `{base_url}/chat/completions`
- Method: `POST`
- Request (OpenAI-compatible):

  ```json
  {
    "model": "<model_identifier>",
    "messages": [
      {"role": "system", "content": "..."},
      {"role": "user", "content": "..."}
    ],
    "max_tokens": 1500,
    "temperature": 0.0
  }
  ```

- Response (OpenAI-compatible):

  ```json
  {
    "choices": [
      {
        "message": {"role": "assistant", "content": "{...}"}
      }
    ]
  }
  ```
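Building the request and unpacking the response can be sketched as two small helpers. These are illustrative stand-ins, not PATAS internals; the defaults (1500 tokens, temperature 0.0) come from the contract above:

```python
def build_chat_request(model: str, system_prompt: str, user_prompt: str,
                       max_tokens: int = 1500, temperature: float = 0.0) -> dict:
    # OpenAI-compatible request body, matching the contract above.
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
        "max_tokens": max_tokens,
        "temperature": temperature,
    }


def extract_content(response: dict) -> str:
    # Pull the assistant message out of an OpenAI-compatible response.
    # Raises KeyError/IndexError on malformed payloads, which a real
    # engine should catch and log.
    return response["choices"][0]["message"]["content"]
```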
Configuration:

```toml
llm_provider = "local"
llm_model = "mistralai/Mistral-7B-Instruct-v0.2"  # or your model identifier
llm_base_url = "http://localhost:8000/v1"
llm_api_key = ""  # optional
llm_timeout_seconds = 30.0
```

Features:

- OpenAI-compatible API format
- Low temperature (0.0) for deterministic outputs
- Same prompt builder as the OpenAI engine
- Error handling with logging
- Optional API key authentication
## Recommended Models

For on-premise deployments, the following models are recommended as good defaults:

- Embeddings: `BAAI/bge-m3` – multilingual embedding model with strong performance on RU/UK/EN spam and abuse logs; well-suited for semantic clustering tasks.
- LLM: `mistralai/Mistral-7B-Instruct-v0.2` – compact 7B-parameter model, Apache 2.0 license; performs well at structured JSON generation and SQL-style filter generation at low temperature settings.
These models are not required by PATAS, but they provide a solid starting point for on-premise deployments.
## Supported Inference Stacks

The local HTTP engines are designed to work with common inference stacks:

- vLLM: OpenAI-compatible API server
- TGI (Text Generation Inference): Hugging Face's inference server
- Ollama: local model serving
- Custom endpoints: any OpenAI-compatible HTTP endpoint
The exact wiring to your inference stack is intentionally left to the integrator. PATAS only requires that the endpoints follow the contract described above.
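To make the serving-side contract concrete, here is a minimal, framework-free sketch of the request→response mapping an embeddings endpoint must implement. The dummy hash-based vectors and the 8-value dimension are placeholders for a real model; only the request/response shapes come from the contract:

```python
import hashlib

EMBED_DIM = 8  # real models use hundreds of dimensions; 8 keeps the sketch readable


def dummy_embed(text: str) -> list[float]:
    # Deterministic stand-in for a real embedding model:
    # derive EMBED_DIM floats in [0, 1) from a hash of the text.
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [b / 255.0 for b in digest[:EMBED_DIM]]


def handle_embeddings(request_body: dict) -> dict:
    # Maps the contract's request shape ({"model": ..., "inputs": [...]})
    # to its response shape ({"embeddings": [[...], ...]}).
    # Wire this into any HTTP framework (FastAPI, Flask, http.server).
    return {"embeddings": [dummy_embed(t) for t in request_body["inputs"]]}
```

Any endpoint that produces this mapping, with real vectors instead of hashes, satisfies the PATAS embedding contract.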
## Configuration Examples

Environment variables:

```shell
# Embedding Engine (Local)
EMBEDDING_PROVIDER=local
EMBEDDING_MODEL=BAAI/bge-m3
EMBEDDING_BASE_URL=http://localhost:8080/v1
EMBEDDING_TIMEOUT_SECONDS=30.0

# LLM Engine (Local)
LLM_PROVIDER=local
LLM_MODEL=mistralai/Mistral-7B-Instruct-v0.2
LLM_BASE_URL=http://localhost:8000/v1
LLM_TIMEOUT_SECONDS=30.0
```

Programmatic configuration:

```python
from app.config import settings

settings.embedding_provider = "local"
settings.embedding_model = "BAAI/bge-m3"
settings.embedding_base_url = "http://localhost:8080/v1"

settings.llm_provider = "local"
settings.llm_model = "mistralai/Mistral-7B-Instruct-v0.2"
settings.llm_base_url = "http://localhost:8000/v1"
```

Engine selection logic:

- If `provider="local"` and `base_url` is provided → uses the HTTP-based local engine
- If `provider="local"` and `base_url` is not provided:
  - Embedding engine falls back to `LocalEmbeddingEngine` (sentence-transformers)
  - LLM engine returns `None` with a warning
## Security Notes

- Never commit API keys or secrets to version control
- Use environment variables or a secret management system
- For on-premise deployments, ensure inference endpoints are properly secured and authenticated
- In `PRIVACY_MODE=STRICT`, external LLM providers are disabled by default unless explicitly configured to internal endpoints
## Testing

Comprehensive tests are available:

- `tests/test_v2_embedding_engine_local_http.py` – local HTTP embedding engine tests
- `tests/test_v2_llm_engine_local_http.py` – local HTTP LLM engine tests
Tests cover:
- Successful embedding/pattern generation
- Batching for large inputs
- HTTP error handling
- Malformed response handling
- Authentication with API keys