
Local Model Integration

Nick edited this page Mar 10, 2026 · 2 revisions

PATAS now supports local HTTP-based engines for both embeddings and LLM inference, enabling on-premise deployments without pulling heavy ML frameworks into the PATAS core.

Architecture

PATAS uses two separate engines that can be configured independently:

  1. Embedding Engine – generates semantic embeddings for message similarity analysis, clustering, and pattern discovery
  2. LLM Engine – performs pattern explanation, rule generation, and LLM-based validation

Each engine supports two provider modes:

  • openai (default) – uses OpenAI's managed API services
  • local – uses on-premise models via HTTP endpoints

Local HTTP Embedding Engine

The LocalHttpEmbeddingEngine calls a local/self-hosted HTTP endpoint for embedding generation.

Endpoint Contract

  • URL: {base_url}/embeddings
  • Method: POST
  • Request:
    {
      "model": "<model_identifier>",
      "inputs": ["text1", "text2", ...]
    }
  • Response:
    {
      "embeddings": [
        [0.1, 0.2, ...],
        [0.3, 0.4, ...]
      ]
    }
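The contract above can be sketched as a pair of small helpers: one builds the request body, the other parses a response. This is an illustrative sketch, not PATAS code; the function names are hypothetical.

```python
import json

def build_embedding_request(model, texts):
    # Request body matching the /embeddings contract: a model identifier
    # plus a list of input texts.
    return {"model": model, "inputs": list(texts)}

def parse_embedding_response(body):
    # The response carries one vector per input text, in the same order.
    return json.loads(body)["embeddings"]

payload = build_embedding_request("BAAI/bge-m3", ["text1", "text2"])
vectors = parse_embedding_response('{"embeddings": [[0.1, 0.2], [0.3, 0.4]]}')
```

Because the order of `embeddings` mirrors the order of `inputs`, callers can zip the two lists back together without extra bookkeeping.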

Configuration

embedding_provider = "local"
embedding_model = "BAAI/bge-m3"  # or your model identifier
embedding_base_url = "http://localhost:8080/v1"
embedding_api_key = ""  # optional
embedding_timeout_seconds = 30.0

Features

  • Automatic batching (default: 512 texts per batch)
  • Embedding cache support (same as OpenAI engine)
  • Error handling with logging
  • Optional API key authentication
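The automatic batching behavior can be illustrated with a minimal chunking helper (a sketch, assuming the documented default of 512 texts per batch; not the engine's actual implementation):

```python
def batches(texts, batch_size=512):
    # Split the input list into consecutive chunks of at most batch_size
    # texts, matching the engine's default batch size of 512.
    for i in range(0, len(texts), batch_size):
        yield texts[i:i + batch_size]
```

For example, 1025 texts would be sent as three requests: two full batches of 512 and a final batch of 1.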

Local HTTP LLM Engine

The LocalHttpPatternMiningEngine calls a local/self-hosted HTTP endpoint for LLM inference.

Endpoint Contract

  • URL: {base_url}/chat/completions
  • Method: POST
  • Request (OpenAI-compatible):
    {
      "model": "<model_identifier>",
      "messages": [
        {"role": "system", "content": "..."},
        {"role": "user", "content": "..."}
      ],
      "max_tokens": 1500,
      "temperature": 0.0
    }
  • Response (OpenAI-compatible):
    {
      "choices": [
        {
          "message": {
            "role": "assistant",
            "content": "{...}"
          }
        }
      ]
    }
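The request and response shapes above can likewise be sketched as helpers that assemble an OpenAI-compatible body and extract the assistant's reply. The function names are illustrative, not part of the PATAS API.

```python
import json

def build_chat_request(model, system_prompt, user_prompt, max_tokens=1500):
    # OpenAI-compatible chat body; temperature is pinned to 0.0, as the
    # engine does, to keep outputs as reproducible as possible.
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
        "max_tokens": max_tokens,
        "temperature": 0.0,
    }

def extract_content(response_body):
    # Pull the assistant message text out of an OpenAI-compatible response.
    return json.loads(response_body)["choices"][0]["message"]["content"]
```

The `content` field typically carries a JSON string (pattern explanation, generated rule, etc.), which the caller parses in a second step.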

Configuration

llm_provider = "local"
llm_model = "mistralai/Mistral-7B-Instruct-v0.2"  # or your model identifier
llm_base_url = "http://localhost:8000/v1"
llm_api_key = ""  # optional
llm_timeout_seconds = 30.0

Features

  • OpenAI-compatible API format
  • Fixed temperature of 0.0 (greedy decoding) for reproducible outputs
  • Same prompt builder as OpenAI engine
  • Error handling with logging
  • Optional API key authentication

Recommended Local Models

For on-premise deployments, the following models are recommended as good defaults:

  • Embeddings: BAAI/bge-m3 – multilingual embedding model, strong performance for RU/UK/EN spam and abuse logs, well-suited for semantic clustering tasks.

  • LLM: mistralai/Mistral-7B-Instruct-v0.2 – compact 7B parameter model, Apache 2.0 license, performs well at structured JSON generation and SQL-style filter generation at low temperature settings.

These models are not required by PATAS, but they provide a solid starting point for on-premise deployments.

Integration with Inference Stacks

The local HTTP engines are designed to work with common inference stacks:

  • vLLM: OpenAI-compatible API server
  • TGI (Text Generation Inference): HuggingFace's inference server
  • Ollama: Local model serving
  • Custom endpoints: Any OpenAI-compatible HTTP endpoint

The exact wiring to your inference stack is intentionally left to the integrator. PATAS only requires that the endpoints follow the contract described above.
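Since every stack listed above speaks the same HTTP contract, a client only needs to build a POST against `{base_url}` plus the endpoint path, attaching the key header when one is configured. A minimal stdlib sketch (the helper name is hypothetical):

```python
import json
import urllib.request

def make_request(base_url, path, payload, api_key=""):
    # Build an HTTP POST against a local OpenAI-compatible endpoint.
    # The Authorization header is only attached when an API key is set,
    # mirroring the optional-authentication behavior described above.
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )

req = make_request("http://localhost:8000/v1", "/chat/completions", {"model": "m"})
```

The same helper works for both engines; only the path (`/embeddings` vs. `/chat/completions`) and the payload differ.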

Example Configuration

Environment Variables (.env)

# Embedding Engine (Local)
EMBEDDING_PROVIDER=local
EMBEDDING_MODEL=BAAI/bge-m3
EMBEDDING_BASE_URL=http://localhost:8080/v1
EMBEDDING_TIMEOUT_SECONDS=30.0

# LLM Engine (Local)
LLM_PROVIDER=local
LLM_MODEL=mistralai/Mistral-7B-Instruct-v0.2
LLM_BASE_URL=http://localhost:8000/v1
LLM_TIMEOUT_SECONDS=30.0

Python Config

from app.config import settings

settings.embedding_provider = "local"
settings.embedding_model = "BAAI/bge-m3"
settings.embedding_base_url = "http://localhost:8080/v1"

settings.llm_provider = "local"
settings.llm_model = "mistralai/Mistral-7B-Instruct-v0.2"
settings.llm_base_url = "http://localhost:8000/v1"

Fallback Behavior

  • If provider="local" and base_url is provided → uses HTTP-based local engine
  • If provider="local" and base_url is not provided:
    • Embedding engine falls back to LocalEmbeddingEngine (sentence-transformers)
    • LLM engine returns None with a warning
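The fallback rules above amount to a small selection function. This sketch uses illustrative string names for the engines; only `LocalHttpEmbeddingEngine` and `LocalEmbeddingEngine` come from this page, and the non-local branch is an assumption.

```python
def select_embedding_engine(provider, base_url):
    # Mirrors the documented fallback: "local" with a base_url uses the
    # HTTP engine; "local" without one falls back to the in-process
    # sentence-transformers engine.
    if provider == "local":
        return "LocalHttpEmbeddingEngine" if base_url else "LocalEmbeddingEngine"
    return "OpenAIEmbeddingEngine"  # assumed default for provider="openai"
```

Note the asymmetry with the LLM side: the embedding engine always has an in-process fallback, while the LLM engine has none and returns None with a warning.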

Security Notes

  • Never commit API keys or secrets to version control
  • Use environment variables or secret management systems
  • For on-premise deployments, ensure inference endpoints are properly secured and authenticated
  • In PRIVACY_MODE=STRICT, external LLM providers are disabled by default unless explicitly configured to internal endpoints

Testing

Comprehensive tests are available:

  • tests/test_v2_embedding_engine_local_http.py – Local HTTP embedding engine tests
  • tests/test_v2_llm_engine_local_http.py – Local HTTP LLM engine tests

Tests cover:

  • Successful embedding/pattern generation
  • Batching for large inputs
  • HTTP error handling
  • Malformed response handling
  • Authentication with API keys
