Releases: laplaque/ai-anonymizing-proxy
v0.1.5-alpha — Pack-based PII detection with EU locale support
16 commits, 80+ files changed since v0.1.4-alpha
This release completes Phase 2a: the anonymizer now uses a modular pack-based architecture instead of a single monolithic pattern list. Eight locale and domain packs ship by default, covering US, DE, FR, NL, financial (EU), healthcare, secrets/tokens, and global patterns. The legacy compilePatterns() fallback has been removed.
Features
- Pack-based PII detection — modular architecture replacing the monolithic pattern list. Each pack is independently testable and configurable. (#65, #66, #71)
- EU locale packs — FR (NIR, SIRET, SIREN), DE (Steuer-ID, SVNR), NL, FINANCE_EU, HEALTHCARE (#65, #66, #71)
- Expanded SECRETS pack — 18 new token formats: GitLab, Slack, Stripe, NPM, PyPI, OpenAI, Docker, Google, Shopify, SendGrid, Groq, Twilio, Facebook, Amazon MWS, Cloudinary, PGP (#82)
Bug fixes
- SECRETS before GLOBAL ordering — SECRETS pack now runs first in the pipeline, preventing GLOBAL's `api_key` keyword matcher from stealing tokens that have distinctive structural prefixes (#78)
- DE pack space-tolerant regexes — Steuer-ID and SVNR patterns now accept optional spaces between digit groups (#72)
- US address false positives — street suffix now requires space separation; ordinal street names (42nd, 5th, etc.) now match correctly (#73, #79)
- SSN/SIREN cross-pattern interference — US SSN regex now requires hyphens, preventing false matches on French SIREN numbers (#75)
- CI benchmark output interleaving — benchmark stderr redirected to prevent log lines from corrupting benchmark stdout (#85)
- Benchmark badge gist ID — replaced placeholder with actual gist ID (#84)
Maintenance
- Removed legacy `compilePatterns()` fallback — all patterns now load exclusively through the pack system (#81)
- Documentation overhaul — updated all docs to reflect pack-based architecture; added GLOBAL, US, FR pack test set documents (#76, #83)
- Community readiness — added MIT LICENSE, CONTRIBUTING.md, coverage badge, license badge, and repo topics (#86)
- Test coverage — comprehensive test coverage for proxy and MITM components (#88)
Full PR list
#65, #66, #71, #72, #73, #75, #76, #78, #79, #81, #82, #83, #84, #85, #86, #88
Full Changelog: v0.1.4-alpha...v0.1.5-alpha
v0.1.4-alpha
What's Changed
- fix(streaming): deanonymize input_json_delta events in streamed tool output [closes #58] by @laplaque in #59
Full Changelog: v0.1.3-alpha...v0.1.4-alpha
v0.1.3-alpha — Streaming fix and Go security update
Bug Fixes
- fix(streaming): skip empty-text delta when accumulator is under suffix guard [closes #55] (#57)
— `StreamingDeanonymize` was emitting `content_block_delta` events with empty text when accumulated text was shorter than `tokenSuffixLen` (26 bytes). When the model response contained no PII token placeholders, clients using the Anthropic SDK received an empty response body. `processTextDelta` now returns early when `safeCutPoint` returns 0, holding text in the accumulator until flushed by a non-text-delta event or EOF.
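A minimal sketch of the guard described above. Only `tokenSuffixLen = 26` comes from the release note; the `safeCutPoint` body here is an assumed simplification of the real helper.

```go
package main

import "fmt"

const tokenSuffixLen = 26 // bytes held back in case a token straddles chunks

// safeCutPoint (sketch): how much of the accumulator is safe to emit now.
// Everything except the trailing tokenSuffixLen bytes can be flushed; the
// tail is kept because a token placeholder may still be completing.
func safeCutPoint(acc string) int {
	if len(acc) <= tokenSuffixLen {
		return 0
	}
	return len(acc) - tokenSuffixLen
}

func main() {
	// The #55 bug: emitting a delta even when the cut point was 0
	// produced empty-text content_block_delta events.
	acc := "short reply"
	if cut := safeCutPoint(acc); cut == 0 {
		fmt.Println("hold") // return early instead of emitting ""
	}
	acc = "a response long enough to emit something safely"
	cut := safeCutPoint(acc)
	fmt.Printf("emit %q, hold %q\n", acc[:cut], acc[cut:])
}
```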
Security
- chore: bump Go toolchain from 1.25.7 to 1.26.1 [closes #56] (#57)
  — Resolves 3 stdlib vulnerabilities flagged by `govulncheck`:
  - GO-2026-4598 — improper parsing of IPv6 literals in `net/url`
  - GO-2026-4600 — panic in name constraint checking for malformed certificates in `crypto/x509`
  - GO-2026-4601 — improper enforcement of email constraints in `crypto/x509`
Quality
- 2 new regression tests covering the #55 scenario (short response, no token match)
- `make check` and `make vulncheck` now both pass cleanly
Full Changelog: v0.1.1-alpha...v0.1.3-alpha
v0.1.2-alpha
What's Changed
- docs: README about story + fix Mermaid participant labels in tls-mitm by @laplaque in #17
- docs: Mermaid syntax incompatible with GitHub renderer in tls-mitm docs by @laplaque in #21
- add SonarQube integration by @laplaque in #23
- chore(repo): gitignore .claude/, CLAUDE.md, and .tmp/ by @laplaque in #35
- fix(proxy): reject non-prefix auth paths, close bypass [closes #18] by @laplaque in #36
- chore(ci): pin GitHub Actions to commit SHAs for supply chain security [closes #19] by @laplaque in #38
- feat(bench): add per-gate latency benchmarks and make benchmark target [closes #24] by @laplaque in #39
- feat(proxy): add RemoteAddr to all MITM, TUNNEL, and HTTP log lines [closes #20] by @laplaque in #40
- feat(proxy): add hashed RemoteAddr to MITM, TUNNEL, HTTP log lines [closes #20] by @laplaque in #41
- docs(bench): add benchmarks.md documenting latency benchmarks [closes #26] by @laplaque in #42
- chore(anonymizer): extract "data: " string literal to named constant [closes #27] by @laplaque in #43
- chore(proxy): extract Content-Encoding and bad gateway to constants [closes #28] by @laplaque in #44
- chore(config): extract duplicate privacy tokens strings to constants [closes #29] by @laplaque in #45
- chore(anonymizer): replace 8-param constructor with Options struct [closes #30] by @laplaque in #46
- refactor(anonymizer): reduce tokenForMatch cognitive complexity [closes #31] by @laplaque in #47
- refactor(config): reduce loadEnv cognitive complexity [closes #32] by @laplaque in #48
- refactor(proxy): reduce handleMITMTunnel cognitive complexity [closes #33] by @laplaque in #49
- refactor(anonymizer): decompose StreamingDeanonymize into testable helpers [closes #34] by @iamclaude697 in #51
- docs(anonymizer): update docs for StreamingDeanonymize decomposition by @iamclaude697 in #53
New Contributors
- @iamclaude697 made their first contribution in #51
Full Changelog: v0.1.0-alpha...v0.1.2-alpha
v0.1.1-alpha — Code quality pass and streaming refactor
What's new since v0.1.0-alpha
Refactoring & code quality (closes #26–#34)
- Extracted all magic string literals to named constants across proxy, config, and anonymizer packages
- Replaced 8-parameter `NewAnonymizer` constructor with an `Options` struct
- Reduced cognitive complexity in `tokenForMatch`, `loadEnv`, and `handleMITMTunnel` to within SonarQube quality gate thresholds
- Decomposed `StreamingDeanonymize` (complexity 113) into seven independently testable helpers in `streaming.go`: `readLoop`, `assembleLines`, `processLine`, `processTextDelta`, `safeCutPoint`, `flushRemainder`, `handleStreamEnd`
- Fixed hardcoded content block index in flush path — `lastIndex` now tracked per invocation so thinking and text blocks flush to the correct position
- Anonymizer package test coverage: 91.9%
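The Options-struct change follows a common Go idiom. A sketch under assumed names — the field set here (`TokenSuffixLen`, `OllamaURL`, `CacheEnabled`) is hypothetical, not the project's actual configuration surface:

```go
package main

import "fmt"

// Hypothetical options bundle replacing eight positional parameters.
type Options struct {
	TokenSuffixLen int
	OllamaURL      string
	CacheEnabled   bool
}

type Anonymizer struct{ opts Options }

// NewAnonymizer takes one Options value, so call sites stay readable and
// adding a field later is a non-breaking change.
func NewAnonymizer(opts Options) *Anonymizer {
	if opts.TokenSuffixLen == 0 {
		opts.TokenSuffixLen = 26 // default when unset
	}
	return &Anonymizer{opts: opts}
}

func main() {
	a := NewAnonymizer(Options{OllamaURL: "http://localhost:11434"})
	fmt.Println(a.opts.TokenSuffixLen) // prints 26 (defaulted)
}
```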
Documentation
- Added `docs/benchmarks.md` with latency baseline results
- Updated `docs/architecture.md`, `docs/anonymizer.md`, and `docs/development.md` to reflect the decomposed streaming pipeline
- Fixed inaccurate "64-byte sliding window" description in README
Known limitations
(unchanged from v0.1.0-alpha)
- Single-instance only
- Ollama as NER fallback
- mapid not yet implemented (32-bit hash space)
⚠️ Developer warning
Disable the proxy when using an AI assistant to work on the proxy source code itself.
This release was developed with assistance from Claude https://claude.ai (Anthropic)
v0.1.0-alpha — Core anonymization proxy with persistent cache and observability
First tagged release of the AI anonymizing proxy. Sits between AI clients (Claude Code, Cursor, VS Code extensions) and upstream LLM APIs, stripping PII from requests and restoring it in responses — transparently and without breaking streaming.
What's included
Anonymization pipeline
- Regex-based PII detection with per-pattern confidence scores (email, API key, SSN, credit card, phone, IPv4/IPv6, address)
- Bracket token format `[TYPE_<8hex>]` — deterministic, non-re-triggering, session-idempotent
- Async Ollama fallback for low-confidence matches (off the critical path)
- System instruction injection to prevent LLMs from substituting or interpreting tokens
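One way the `[TYPE_<8hex>]` format can be deterministic is to derive the 8 hex characters from a hash of the value. The keying scheme below is an assumption for illustration — the release notes only specify the token shape, not how the hex digits are produced.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// token sketches the bracket format [TYPE_<8hex>]: hashing the PII value
// makes the placeholder deterministic, so re-anonymizing the same value
// in a session yields the same token (session-idempotent).
func token(pii, typ string) string {
	sum := sha256.Sum256([]byte(pii))
	return fmt.Sprintf("[%s_%x]", typ, sum[:4]) // 4 bytes = 8 hex chars
}

func main() {
	fmt.Println(token("dev@example.com", "EMAIL"))
	// Idempotent: anonymizing twice yields the same placeholder.
	fmt.Println(token("dev@example.com", "EMAIL") == token("dev@example.com", "EMAIL")) // prints true
}
```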
Session management
- Session-scoped bidirectional token maps — deanonymization is session-local only, never cross-session
- Ephemeral sessions with deferred cleanup; no session leaks
- Streaming (SSE) support with sliding overlap window to handle tokens straddling chunk boundaries
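A simplified sketch of the sliding overlap window for streamed tokens. The scanning logic and window size here are assumptions (a naive bracket scan sized to the longest placeholder); the point is that carrying a tail between chunks lets a token split across a boundary still be found.

```go
package main

import (
	"fmt"
	"strings"
)

// window: keep up to one placeholder-length minus one byte between chunks,
// so a token can never be lost to a chunk boundary. Size is illustrative.
const window = len("[ADDRESS_0123abcd]") - 1

// scanChunks finds [ ... ] placeholders across chunk boundaries by
// prepending the carried-over tail of the previous chunk to each new one.
func scanChunks(chunks []string) ([]string, string) {
	var found []string
	carry := ""
	for _, c := range chunks {
		buf := carry + c
		for {
			i := strings.Index(buf, "[")
			j := strings.Index(buf, "]")
			if i < 0 || j < i {
				break
			}
			found = append(found, buf[i:j+1])
			buf = buf[j+1:] // consume so the token isn't emitted twice
		}
		if len(buf) > window {
			buf = buf[len(buf)-window:]
		}
		carry = buf
	}
	return found, carry
}

func main() {
	// The token straddles the chunk boundary mid-placeholder.
	chunks := []string{"mail [EMAIL_", "a1b2c3d4] sent"}
	found, _ := scanChunks(chunks)
	fmt.Println(found) // the straddling token is recovered whole
}
```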
Persistent cache
- Cross-session Ollama value cache backed by bbolt (embedded, no external dependencies)
- S3-FIFO in-memory eviction layer for bounded hot-path performance
- Cache survives process restarts; recurring PII values are protected from the first request of a new session
Proxy infrastructure
- MITM TLS interception with auto-generated CA; per-host certificate caching
- CONNECT tunnel handling with ALPN split (HTTPS → MITM, opaque protocols → plain tunnel)
- SSRF protection via `ssrfSafeDialContext` — DNS rebinding TOCTOU gap closed
- Configurable domain allowlist, auth domain bypass, telemetry path passthrough (no unnecessary anonymization of instrumentation)
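The core idea behind closing the DNS rebinding TOCTOU gap: resolve once, vet the result, then dial the vetted IP literal rather than the hostname, so a second lookup can't be rebound to an internal address. The helper below is a minimal sketch, not the project's `ssrfSafeDialContext`.

```go
package main

import (
	"fmt"
	"net"
)

// vet resolves a host once and rejects loopback, private, and link-local
// addresses. The caller dials the returned IP literal directly, so the
// check and the connection use the same resolution (no TOCTOU window).
func vet(host string) (string, error) {
	ips, err := net.LookupIP(host)
	if err != nil {
		return "", err
	}
	for _, ip := range ips {
		if ip.IsLoopback() || ip.IsPrivate() || ip.IsLinkLocalUnicast() {
			return "", fmt.Errorf("blocked address %s for %s", ip, host)
		}
	}
	return ips[0].String(), nil // dial this literal, not the hostname
}

func main() {
	// localhost resolves to a loopback address and must be rejected.
	if _, err := vet("localhost"); err != nil {
		fmt.Println("blocked:", err)
	}
}
```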
Observability
- Structured logging with session ID and token count on every anonymized request
- Per-type cache hit/miss counters, Ollama dispatch/error counters, fallback counters
- `/metrics` endpoint on management API (localhost only)
Management API
- Bearer token authentication
- Session inspection, manual flush, health check endpoints
Known limitations
- Single-instance only — session maps are in-process; horizontal scaling requires a shared session store (not planned yet)
- Ollama as NER fallback — contextual entity detection (names, orgs, locations) relies on a local Ollama instance; replacement with an embedded ONNX model is planned for the next milestone
- mapid not yet implemented — session map uses a 32-bit hash space; birthday collision risk is negligible at current traffic but is tracked for future hardening
⚠️ Developer warning
Disable the proxy when using an AI assistant to work on the proxy source code itself. The proxy will anonymize its own source files, producing tokens that are written to disk and are unrecoverable after the session ends.
This release was developed with assistance from Claude (Anthropic)