
H-Ismael/ollamonym

ØllamØnym

Template-driven, session-stable anonymization and pseudonymization for teams that want to use LLM workflows while reducing data leakage risk.

Quick Demo Video

Why This Project Exists

Most teams want LLM-powered automation, but they cannot expose raw sensitive text to third-party providers without creating legal, security, and reputational risk.

ØllamØnym gives you a practical middle path:

  • Detect sensitive entities locally
  • Replace them deterministically with reversible tokens
  • Keep text structurally usable for downstream LLM tasks
  • Restore originals only when authorized

In short: LLM utility without shipping raw private identifiers by default.

A practical positioning for teams under GDPR/compliance pressure: pseudonymization is often more useful than hard redaction/anonymization for LLM workflows, because it preserves semantic continuity and referential meaning across long or distant queries while still reducing direct identifier exposure.

This service sits between your data and hosted or proprietary LLMs (OpenAI, Gemini, Claude, Grok, and similar providers), reducing direct exposure of sensitive text.

Where Sensitive Text Blocks Adoption of Hosted or Proprietary LLMs

Many enterprise teams already have valid LLM use cases, but security/legal controls prevent sending raw text to external model providers. Typical blocked scenarios:

  • Customer support and CRM copilots: ticket bodies include names, emails, phones, and addresses.
  • Sales and customer success assistants: notes include account identifiers, contract values, and personal contacts.
  • Legal contract analysis: drafts expose counterparties, clauses, obligations, and signatures.
  • Healthcare/insurance operations: notes and claims can contain direct identifiers and case-sensitive details.
  • Security/SOC workflows: incident narratives may include user IDs, endpoints, and internal infrastructure references.
  • Internal knowledge assistants: enterprise docs frequently contain confidential project, vendor, or employee data.

This creates a common enterprise bottleneck: product teams want high-end model quality, while governance teams require strong controls on sensitive text exposure.

ØllamØnym is designed to be the privacy middleware layer that resolves this tension:

  • Transform sensitive entities before model calls.
  • Preserve enough semantic structure to keep LLM output useful.
  • Keep deanonymization controlled and reversible only where authorized.

Core Value Proposition

  • Leakage-risk reduction: sensitive fields are transformed before downstream processing.
  • Operational realism: anonymized text remains readable and coherent.
  • GDPR-aligned pseudonymization value: preserve contextual meaning for downstream LLM tasks while reducing exposure of direct identifiers.
  • Reversible by design: exact restoration via mapping when needed.
  • Template extensibility: add domain entities without changing core pipeline.
  • Session consistency: repeated mentions stay stable within a session.
  • Provider flexibility: run with local quantized models today, swap model/provider config as needs evolve.

Key Features

  • Hybrid Detection Engine (Deterministic + Local Quantized LLM)
    • Deterministic rule extraction for patterned data (EMAIL, PHONE, links, etc.).
    • Local Ollama-hosted quantized LLM extraction for contextual entities (PERSON, ORG, domain entities).
    • Configurable model/runtime via template and environment.
  • Template-Driven Entity Taxonomy
    • Define PERSON, ORG, LINKS, PRODUCT, or any custom class in JSON.
  • Deterministic Placeholder Mode
    • Example token: <<PERSON:K7D2QH>>
  • Realistic Rendering Mode
    • Optional fake values for human-readable anonymized output.
  • Generic Post-Pass Alias Propagation
    • Moving-window + token-overlap propagation can link full and partial mentions in-session (e.g., Jensen Huang and Huang).
  • No LLM Offsets Required
    • LLM returns only (entity_id, text); span resolution is deterministic in code.
  • Robust Span Resolution
    • Boundary-safe matching prevents substring corruption (for example, avoids replacing com inside company).
  • Chunked + Bounded Parallel Inference
    • Handles long documents with predictable concurrency.
  • Model Runtime Observability
    • Response metadata includes requested/resolved model and quantization info.
  • Dockerized Deployment
    • FastAPI + Ollama stack with persistent model volume.
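The boundary-safe matching called out above can be illustrated with a small sketch (illustrative only; the mapping here is hypothetical and the real resolver handles many more cases):

```python
import re

def replace_spans(text: str, mapping: dict) -> str:
    """Replace detected entities with tokens using boundary-safe matching,
    so a short entity never corrupts a longer word that contains it."""
    # Longest originals first, so "Jensen Huang" is consumed before "Huang".
    for original, token in sorted(mapping.items(), key=lambda kv: -len(kv[0])):
        # Word-boundary lookarounds: match "com" standalone, not inside "company".
        pattern = r"(?<!\w)" + re.escape(original) + r"(?!\w)"
        text = re.sub(pattern, token, text)
    return text

mapping = {"com": "<<ORG:K7D2QH>>", "Jensen Huang": "<<PERSON:A2N72P>>"}
out = replace_spans("Jensen Huang left com; the company stayed.", mapping)
```

After the pass, `out` replaces the standalone `com` but leaves `company` untouched.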

⚠️ Caution:

  • Several components of this project were generated and/or refactored with agentic AI. Tests are in place, but caution is still required.

High-Level Architecture

  1. Detection Plane
    • Template compilation
    • Hybrid extraction: deterministic rules + local quantized LLM
    • Normalization and deduplication
  2. Transformation Plane
    • Deterministic span resolution
    • Placeholder insertion
    • Session-aware alias propagation (moving window + overlap policy)
    • Optional realistic rendering
  3. Reversal Plane
    • Token/fake back-mapping to exact original text
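The exact token-ID scheme is not documented here, but one plausible sketch of session-stable, secret-keyed derivation (using the PSEUDONYM_SECRET and TOKEN_ID_LEN concepts from the configuration section; assumption, not the actual implementation):

```python
import base64
import hashlib
import hmac

def make_token(secret: str, session_id: str, entity_class: str,
               original: str, id_len: int = 6) -> str:
    # HMAC keyed by the secret over (session, class, original text) gives
    # deterministic, session-stable IDs without storing a counter.
    msg = f"{session_id}|{entity_class}|{original}".encode()
    digest = hmac.new(secret.encode(), msg, hashlib.sha256).digest()
    ident = base64.b32encode(digest).decode()[:id_len]
    return f"<<{entity_class}:{ident}>>"

t1 = make_token("s3cret", "case-42", "PERSON", "Jensen Huang")
t2 = make_token("s3cret", "case-42", "PERSON", "Jensen Huang")
```

Repeated mentions in the same session map to the same token (`t1 == t2`), which is what makes the mapping reversible.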

Input / Output Example

Input (POST /v2/anonymize)

{
  "session_id": "case-42",
  "template_id": "default-pii-v1",
  "text": "Jensen Huang leads NVIDIA. Visit www.tech-private.com",
  "render_mode": "structural",
  "language": "auto"
}

Output (structural)

{
  "anonymized_text": "<<PERSON:XXXXXX>> leads <<ORG:XXXXXX>>. Visit <<LINKS:XXXXXX>>",
  "mapping": {
    "token_to_original": {
      "<<PERSON:XXXXXX>>": "Jensen Huang",
      "<<ORG:XXXXXX>>": "NVIDIA",
      "<<LINKS:XXXXXX>>": "www.tech-private.com"
    },
    "meta": {
      "session_id": "case-42",
      "template_id": "default-pii-v1",
      "template_version": 3,
      "render_mode": "structural"
    }
  }
}
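In structural mode only placeholder tokens need to cross the trust boundary; the mapping stays local and is applied to the model's answer on the way back. A client-side sketch of that round trip (hypothetical helper names, stubbed provider call):

```python
def call_llm(prompt: str) -> str:
    # Stand-in for the external provider call; echoes for the sketch.
    return f"Summary: {prompt}"

def restore(text: str, token_to_original: dict) -> str:
    # Tokens are unambiguous literals, so plain replacement is safe here.
    for token, original in token_to_original.items():
        text = text.replace(token, original)
    return text

mapping = {"<<PERSON:XXXXXX>>": "Jensen Huang", "<<ORG:XXXXXX>>": "NVIDIA"}
anonymized = "<<PERSON:XXXXXX>> leads <<ORG:XXXXXX>>."
result = restore(call_llm(anonymized), mapping)  # only tokens left the boundary
```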

Input (POST /v2/anonymize, realistic)

{
  "session_id": "string_id_test_23",
  "template_id": "default-pii-v1",
  "text": "AI Overview Jensen Huang is the co-founder, President, and CEO of NVIDIA ... you can find more at www.tech-private.com ... for now Jensen Huang is doing great",
  "render_mode": "realistic",
  "language": "auto"
}

Output (realistic)

{
  "anonymized_text": "AI Overview William Adams is the co-founder, President, and CEO of Elliott, Wilson and Terry and father of Shannon Gomez MD and Maria Thompson ... A pivotal figure in the AI revolution, Adams has guided Elliott, Wilson and Terry ... you can find more at www.blue-connect.com ... for now William Adams is doing great",
  "mapping": {
    "token_to_original": {
      "<<PERSON:A2N72P>>": "Jensen Huang",
      "<<ORG:T5YKPW>>": "NVIDIA",
      "<<PERSON:XG6QHD>>": "Huang",
      "<<LINKS:XA4JRC>>": "www.tech-private.com"
    },
    "token_to_fake": {
      "<<PERSON:A2N72P>>": "William Adams",
      "<<ORG:T5YKPW>>": "Elliott, Wilson and Terry",
      "<<PERSON:XG6QHD>>": "Adams",
      "<<LINKS:XA4JRC>>": "www.blue-connect.com"
    },
    "fake_to_token": {
      "William Adams": "<<PERSON:A2N72P>>",
      "Elliott, Wilson and Terry": "<<ORG:T5YKPW>>",
      "Adams": "<<PERSON:XG6QHD>>",
      "www.blue-connect.com": "<<LINKS:XA4JRC>>"
    },
    "meta": {
      "session_id": "string_id_test_23",
      "template_id": "default-pii-v1",
      "template_version": 3,
      "render_mode": "realistic",
      "model_runtime": {
        "requested_model": "llama3.1:8b-instruct-q4_K_M",
        "resolved_model": "llama3.1:latest",
        "quantization_level": "Q4_K_M"
      }
    }
  }
}
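In realistic mode reversal is a two-hop lookup, fake → token → original, which is what the fake_to_token and token_to_original maps above encode. A client-side sketch of those semantics (the service's POST /v2/deanonymize presumably does the equivalent; this is an assumption for illustration):

```python
def restore_realistic(text: str, fake_to_token: dict,
                      token_to_original: dict) -> str:
    # Longest fakes first, so "William Adams" is handled before "Adams".
    for fake in sorted(fake_to_token, key=len, reverse=True):
        text = text.replace(fake, fake_to_token[fake])
    # Second hop: tokens back to the exact originals.
    for token, original in token_to_original.items():
        text = text.replace(token, original)
    return text

fake_to_token = {"William Adams": "<<PERSON:A2N72P>>", "Adams": "<<PERSON:XG6QHD>>"}
token_to_original = {"<<PERSON:A2N72P>>": "Jensen Huang", "<<PERSON:XG6QHD>>": "Huang"}
out = restore_realistic("William Adams praised Adams.", fake_to_token, token_to_original)
```

Because full and partial mentions carry distinct tokens, the alias pair (`William Adams` / `Adams`) round-trips back to (`Jensen Huang` / `Huang`) exactly.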

Where This Is Most Useful

  • LLM preprocessing gateway for enterprise copilots
  • Support and CRM text handling before summarization/classification
  • Medical/legal document workflows requiring controlled exposure
  • Model evaluation datasets needing repeatable anonymization
  • Cross-team AI enablement where security/compliance gate AI usage

Quick Start

Prerequisites

  • Docker + Docker Compose
  • PSEUDONYM_SECRET environment variable

Run

export PSEUDONYM_SECRET="replace-with-a-strong-secret"
docker compose up --build

Health

curl http://localhost:8000/health

API Endpoints

  • POST /v2/anonymize
  • POST /v2/anonymize/stream (SSE-style stream for UI visualization)
  • POST /v2/deanonymize
  • GET /v2/templates
  • GET /v2/templates/{template_id}
  • POST /v2/templates/validate
  • POST /v2/templates/save (create/update custom templates)
  • DELETE /v2/templates/{template_id} (delete custom templates)
  • GET / (web UX demo)

Web pages:

  • / -> landing page
  • /web/demo.html -> interactive demo studio

Compare Llama vs Qwen

This repo includes model-specific templates that you can select via template_id:

  • default-pii-v1 (Llama baseline)
  • default-pii-qwen2.5-7b-v1
  • default-pii-qwen2.5-14b-v1

Pull models in Ollama (once):

ollama pull qwen2.5:7b-instruct-q4_K_M
ollama pull qwen2.5:14b-instruct-q4_K_M

Run the same text against different templates to compare extraction quality/latency:

curl -X POST http://localhost:8000/v2/anonymize \
  -H "Content-Type: application/json" \
  -d '{
    "session_id":"bench-1",
    "template_id":"default-pii-qwen2.5-14b-v1",
    "text":"Jensen Huang leads NVIDIA. Contact: john.doe@example.com",
    "render_mode":"structural",
    "language":"auto"
  }'

Configuration Highlights

Important env vars:

  • TEMPLATES_DIR (bundled/read-only templates source)
  • CUSTOM_TEMPLATES_DIR (writable directory for create/update/delete template APIs)
  • OLLAMA_BASE_URL
  • OLLAMA_FALLBACK_URLS
  • LLM_MODEL
  • OLLAMA_KEEP_ALIVE
  • LLM_NUM_PREDICT
  • LLM_TEMPERATURE
  • LLM_WARMUP_ENABLED (default true, runs warmup at startup)
  • LLM_WARMUP_MODEL (optional override; defaults to LLM_MODEL)
  • LLM_WARMUP_TIMEOUT (seconds for warmup request)
  • LLM_WARMUP_NUM_PREDICT (token budget for warmup call)
  • LLM_CONCURRENCY
  • CHUNK_CHAR_TARGET
  • CHUNK_MAX_PARALLEL
  • RULE_PREEXTRACT_ENABLED
  • TOKEN_ID_LEN
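A hedged example of wiring a few of these together (values illustrative only, not the service defaults; check docker-compose.yml before relying on them):

```sh
# Illustrative values; PSEUDONYM_SECRET is the only strictly required variable.
export PSEUDONYM_SECRET="replace-with-a-strong-secret"
export OLLAMA_BASE_URL="http://ollama:11434"
export LLM_MODEL="llama3.1:8b-instruct-q4_K_M"
export LLM_WARMUP_ENABLED=true
export LLM_CONCURRENCY=2
export CHUNK_CHAR_TARGET=1500
export CHUNK_MAX_PARALLEL=4
export TOKEN_ID_LEN=6   # matches the 6-character IDs in the examples above
```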

Template controls:

  • entity definitions and examples
  • placeholder format and pseudonym providers
  • per-entity fake strategy override (entity.fake_provider) and pseudo-entity pools (entity.use_pseudo_entities, entity.pseudo_entities)
  • post-pass alias policy (postpass_alias)
  • per-template model selection (template.llm.model)

Security and Compliance Positioning

  • Keeps sensitive originals out of downstream prompts by default.
  • Pseudonymization can be a strong GDPR/compliance selling point when full anonymization would destroy the context needed for useful LLM responses.
  • Preserves semantic continuity across distant mentions/queries better than blunt redaction, which improves downstream LLM utility.
  • Supports self-hosted/local inference stacks.
  • Allows strict control of what is reversible and by whom (through mapping handling policy).

Note: this reduces leakage risk materially, but final security posture still depends on infrastructure, access control, logging policy, and secret management.

Adaptability by Design

  • Add new entity classes in template JSON without pipeline rewrites.
  • Tune matching behavior through policy knobs:
    • postpass_alias.window_size
    • postpass_alias.min_overlap_tokens
    • postpass_alias.min_token_len
    • postpass_alias.entity_ids
  • Mix strict deterministic extraction with contextual LLM extraction per use case.
  • Keep consistent anonymization across non-exact mentions in the same session.
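The overlap policy above can be sketched as a token-set test (assumed semantics: a later mention links to an earlier in-window entity when they share enough sufficiently long tokens):

```python
def overlapping_alias(known_entity: str, candidate: str,
                      min_overlap_tokens: int = 1,
                      min_token_len: int = 3) -> bool:
    """Sketch of the post-pass overlap check; the real pass also applies
    the moving window and entity_ids filter before this comparison."""
    known = {t.lower() for t in known_entity.split() if len(t) >= min_token_len}
    cand = {t.lower() for t in candidate.split() if len(t) >= min_token_len}
    return len(known & cand) >= min_overlap_tokens

# "Huang" links back to "Jensen Huang"; a too-short fragment does not.
assert overlapping_alias("Jensen Huang", "Huang")
assert not overlapping_alias("Jensen Huang", "Jen")
```

Raising min_token_len or min_overlap_tokens makes linking stricter; lowering them recovers more partial mentions at the cost of false links.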

What’s Next

Product / UX

  • UX polish and extension for:
    • richer live token analytics and traces
    • collaborative template lifecycle workflows
    • advanced policy presets by compliance profile

Metrics and Governance

  • Detection quality dashboard (precision/recall by entity class)
  • Latency and throughput dashboards (p50/p95/p99)
  • Drift alerts for template/model changes
  • Session-memory effectiveness metrics (alias recovery success)
  • Audit-friendly anonymization/deanonymization event logs

Platform Evolution

  • Test on-premises servers (better hardware) to evaluate detection accuracy, generation fidelity when the LLM faker is enabled, and latency.
  • Pluggable provider strategies for advanced fake generation by entity family
  • Multi-tenant policy isolation
  • Optional external session memory (e.g., Redis) for horizontal scale

Repository Notes

  • Detailed implementation notes: README_IMPLEMENTATION.md
  • Build summary: IMPLEMENTATION.md
  • Full technical spec: spec_readme.md