
Releases: laplaque/ai-anonymizing-proxy

v0.1.5-alpha — Pack-based PII detection with EU locale support

30 Mar 23:22
5e853cc


16 commits, 80+ files changed since v0.1.4-alpha

This release completes Phase 2a: the anonymizer now uses a modular pack-based architecture instead of a single monolithic pattern list. Eight locale and domain packs ship by default, covering US, DE, FR, NL, financial (EU), healthcare, secrets/tokens, and global patterns. The legacy compilePatterns() fallback has been removed.

Features

  • Pack-based PII detection — modular architecture replacing the monolithic pattern list. Each pack is independently testable and configurable. (#65, #66, #71)
  • EU locale packs — FR (NIR, SIRET, SIREN), DE (Steuer-ID, SVNR), NL, FINANCE_EU, HEALTHCARE (#65, #66, #71)
  • Expanded SECRETS pack — 18 new token formats: GitLab, Slack, Stripe, NPM, PyPI, OpenAI, Docker, Google, Shopify, SendGrid, Groq, Twilio, Facebook, Amazon MWS, Cloudinary, PGP (#82)

Bug fixes

  • SECRETS before GLOBAL ordering — SECRETS pack now runs first in the pipeline, preventing GLOBAL's api_key keyword matcher from stealing tokens that have distinctive structural prefixes (#78)
  • DE pack space-tolerant regexes — Steuer-ID and SVNR patterns now accept optional spaces between digit groups (#72)
  • US address false positives — street suffix now requires space separation; ordinal street names (42nd, 5th, etc.) now match correctly (#73, #79)
  • SSN/SIREN cross-pattern interference — US SSN regex now requires hyphens, preventing false matches on French SIREN numbers (#75)
  • CI benchmark output interleaving — benchmark stderr redirected to prevent log lines from corrupting benchmark stdout (#85)
  • Benchmark badge gist ID — replaced placeholder with actual gist ID (#84)

Maintenance

  • Removed legacy compilePatterns() fallback — all patterns now load exclusively through the pack system (#81)
  • Documentation overhaul — updated all docs to reflect pack-based architecture; added GLOBAL, US, FR pack test set documents (#76, #83)
  • Community readiness — added MIT LICENSE, CONTRIBUTING.md, coverage badge, license badge, and repo topics (#86)
  • Test coverage — comprehensive test coverage for proxy and MITM components (#88)

Full PR list

#65, #66, #71, #72, #73, #75, #76, #78, #79, #81, #82, #83, #84, #85, #86, #88

Full Changelog: v0.1.4-alpha...v0.1.5-alpha

v0.1.4-alpha

15 Mar 04:18
e93bd3e

Pre-release

What's Changed

  • fix(streaming): deanonymize input_json_delta events in streamed tool output [closes #58] by @laplaque in #59

Full Changelog: v0.1.3-alpha...v0.1.4-alpha

v0.1.3-alpha — Streaming fix and Go security update

14 Mar 15:53
a53a85f


Bug Fixes

  • fix(streaming): skip empty-text delta when accumulator is under suffix guard [closes #55] (#57)
    StreamingDeanonymize was emitting content_block_delta events with empty text when accumulated text was shorter than tokenSuffixLen (26 bytes). When the model response contained no PII token placeholders, clients using the Anthropic SDK received an empty response body. processTextDelta now returns early when safeCutPoint returns 0, holding text in the accumulator until flushed by a non-text-delta event or EOF.
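A minimal sketch of the fix, using hypothetical shapes modeled on the helper names mentioned above (`safeCutPoint`, `processTextDelta`); the real signatures may differ.

```go
package main

import "fmt"

const tokenSuffixLen = 26 // longest token tail that could straddle a chunk boundary

// safeCutPoint sketches the suffix guard: only text beyond the
// trailing window may be emitted; the rest might be a partial token.
func safeCutPoint(acc string) int {
	if len(acc) <= tokenSuffixLen {
		return 0
	}
	return len(acc) - tokenSuffixLen
}

// processTextDelta returns the emittable prefix and the retained accumulator.
// The #55 fix: when safeCutPoint is 0, return early and emit nothing,
// instead of sending a content_block_delta with empty text.
func processTextDelta(acc string) (emit, keep string) {
	cut := safeCutPoint(acc)
	if cut == 0 {
		return "", acc // held until flushed by a non-text-delta event or EOF
	}
	return acc[:cut], acc[cut:]
}

func main() {
	emit, keep := processTextDelta("short reply")
	fmt.Printf("emit=%q keep=%q\n", emit, keep) // emit="" keep="short reply"
}
```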

Security

  • chore: bump Go toolchain from 1.25.7 to 1.26.1 [closes #56] (#57)
    Resolves 3 stdlib vulnerabilities flagged by govulncheck:
    • GO-2026-4598 — improper parsing of IPv6 literals in net/url
    • GO-2026-4600 — panic in name constraint checking for malformed certificates in crypto/x509
    • GO-2026-4601 — improper enforcement of email constraints in crypto/x509

Quality

  • 2 new regression tests covering the #55 scenario (short response, no token match)
  • make check and make vulncheck now both pass cleanly

Full Changelog: v0.1.1-alpha...v0.1.3-alpha

v0.1.2-alpha

14 Mar 14:02
71d29f5

Pre-release

What's Changed

  • docs: README about story + fix Mermaid participant labels in tls-mitm by @laplaque in #17
  • docs: Mermaid syntax incompatible with GitHub renderer in tls-mitm docs by @laplaque in #21
  • add SonarQube integration by @laplaque in #23
  • chore(repo): gitignore .claude/, CLAUDE.md, and .tmp/ by @laplaque in #35
  • fix(proxy): reject non-prefix auth paths, close bypass [closes #18] by @laplaque in #36
  • chore(ci): pin GitHub Actions to commit SHAs for supply chain security [closes #19] by @laplaque in #38
  • feat(bench): add per-gate latency benchmarks and make benchmark target [closes #24] by @laplaque in #39
  • feat(proxy): add RemoteAddr to all MITM, TUNNEL, and HTTP log lines [closes #20] by @laplaque in #40
  • feat(proxy): add hashed RemoteAddr to MITM, TUNNEL, HTTP log lines [closes #20] by @laplaque in #41
  • docs(bench): add benchmarks.md documenting latency benchmarks [closes #26] by @laplaque in #42
  • chore(anonymizer): extract "data: " string literal to named constant [closes #27] by @laplaque in #43
  • chore(proxy): extract Content-Encoding and bad gateway to constants [closes #28] by @laplaque in #44
  • chore(config): extract duplicate privacy tokens strings to constants [closes #29] by @laplaque in #45
  • chore(anonymizer): replace 8-param constructor with Options struct [closes #30] by @laplaque in #46
  • refactor(anonymizer): reduce tokenForMatch cognitive complexity [closes #31] by @laplaque in #47
  • refactor(config): reduce loadEnv cognitive complexity [closes #32] by @laplaque in #48
  • refactor(proxy): reduce handleMITMTunnel cognitive complexity [closes #33] by @laplaque in #49
  • refactor(anonymizer): decompose StreamingDeanonymize into testable helpers [closes #34] by @iamclaude697 in #51
  • docs(anonymizer): update docs for StreamingDeanonymize decomposition by @iamclaude697 in #53

New Contributors

  • @iamclaude697 made their first contribution in #51
Full Changelog: v0.1.0-alpha...v0.1.2-alpha

v0.1.1-alpha — Code quality pass and streaming refactor

04 Mar 11:28
71d29f5


What's new since v0.1.0-alpha

Refactoring & code quality (closes #26–#34)

  • Extracted all magic string literals to named constants across proxy, config, and anonymizer packages
  • Replaced 8-parameter NewAnonymizer constructor with an Options struct
  • Reduced cognitive complexity in tokenForMatch, loadEnv, and handleMITMTunnel to within SonarQube quality gate thresholds
  • Decomposed StreamingDeanonymize (complexity 113) into seven independently testable helpers in streaming.go: readLoop, assembleLines, processLine, processTextDelta, safeCutPoint, flushRemainder, handleStreamEnd
  • Fixed hardcoded content block index in flush path — lastIndex now tracked per invocation so thinking and text blocks flush to the correct position
  • Anonymizer package test coverage: 91.9%
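The Options-struct refactor can be sketched as below; all field names are illustrative assumptions, not the project's actual `Options` definition. The win over an 8-parameter constructor: call sites name only what they set, and defaults live in one place.

```go
package main

import "fmt"

// Options is a hypothetical sketch of the struct that replaced the
// old positional NewAnonymizer parameters.
type Options struct {
	Patterns       []string
	TokenSuffixLen int
	OllamaURL      string
	CacheDir       string
}

type Anonymizer struct{ opts Options }

func NewAnonymizer(opts Options) *Anonymizer {
	if opts.TokenSuffixLen == 0 {
		opts.TokenSuffixLen = 26 // default applied here, not at every call site
	}
	return &Anonymizer{opts: opts}
}

func main() {
	a := NewAnonymizer(Options{OllamaURL: "http://localhost:11434"})
	fmt.Println(a.opts.TokenSuffixLen) // 26
}
```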

Documentation

  • Added docs/benchmarks.md with latency baseline results
  • Updated docs/architecture.md, docs/anonymizer.md, and docs/development.md to reflect the decomposed streaming pipeline
  • Fixed inaccurate "64-byte sliding window" description in README

Known limitations

(unchanged from v0.1.0-alpha)

  • Single-instance only
  • Ollama as NER fallback
  • mapid not yet implemented (32-bit hash space)

⚠️ Developer warning

Disable the proxy when using an AI assistant to work on the proxy source code itself.

This release was developed with assistance from Claude https://claude.ai (Anthropic)

v0.1.0-alpha — Core anonymization proxy with persistent cache and observability

28 Feb 03:05
dcca5c0


First tagged release of the AI anonymizing proxy. Sits between AI clients (Claude Code, Cursor, VS Code extensions) and upstream LLM APIs, stripping PII from requests and restoring it in responses — transparently and without breaking streaming.

What's included

Anonymization pipeline

  • Regex-based PII detection with per-pattern confidence scores (email, API key, SSN, credit card, phone, IPv4/IPv6, address)
  • Bracket token format [TYPE_<8hex>] — deterministic, non-re-triggering, session-idempotent
  • Async Ollama fallback for low-confidence matches (off the critical path)
  • System instruction injection to prevent LLMs from substituting or interpreting tokens
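A hedged sketch of the bracket token format: the hashing scheme shown (truncated SHA-256) is an assumption for illustration, not necessarily the project's derivation. The point is determinism: the same PII value yields the same `[TYPE_<8hex>]` token, so repeated occurrences collapse to one mapping.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// token sketches the [TYPE_<8hex>] format. Hashing the value makes the
// token deterministic within a session; the bracket shape is chosen so
// the token itself never re-triggers a PII pattern.
func token(pii, typ string) string {
	sum := sha256.Sum256([]byte(pii))
	return fmt.Sprintf("[%s_%x]", typ, sum[:4]) // 4 bytes -> 8 hex chars
}

func main() {
	t := token("alice@example.com", "EMAIL")
	fmt.Println(t)
	fmt.Println(t == token("alice@example.com", "EMAIL")) // true: deterministic
}
```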

Session management

  • Session-scoped bidirectional token maps — deanonymization is session-local only, never cross-session
  • Ephemeral sessions with deferred cleanup; no session leaks
  • Streaming (SSE) support with sliding overlap window to handle tokens straddling chunk boundaries
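The sliding overlap window can be sketched like this; `deanonymizeStream`, the carry-buffer size, and the token mapping are all illustrative assumptions. Holding back the last `overlap` bytes of each chunk means a token split across two SSE chunks is reassembled before replacement.

```go
package main

import (
	"fmt"
	"strings"
)

// replace applies the session's token->value mapping.
func replace(s string, m map[string]string) string {
	for tok, orig := range m {
		s = strings.ReplaceAll(s, tok, orig)
	}
	return s
}

// deanonymizeStream keeps a carry buffer of `overlap` bytes so a token
// straddling a chunk boundary is seen whole on the next iteration.
// overlap must be at least the maximum token length minus one.
func deanonymizeStream(chunks []string, overlap int, m map[string]string) string {
	var out strings.Builder
	carry := ""
	for _, c := range chunks {
		buf := replace(carry+c, m) // complete tokens resolved here
		cut := len(buf) - overlap
		if cut < 0 {
			cut = 0
		}
		out.WriteString(buf[:cut])
		carry = buf[cut:] // may hold a partial token for the next chunk
	}
	out.WriteString(replace(carry, m)) // final flush at EOF
	return out.String()
}

func main() {
	m := map[string]string{"[EMAIL_deadbeef]": "alice@example.com"}
	chunks := []string{"contact [EMAIL_de", "adbeef] today"}
	fmt.Println(deanonymizeStream(chunks, 16, m)) // contact alice@example.com today
}
```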

Persistent cache

  • Cross-session Ollama value cache backed by bbolt (embedded, no external dependencies)
  • S3-FIFO in-memory eviction layer for bounded hot-path performance
  • Cache survives process restarts; recurring PII values are protected from the first request of a new session
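A two-tier lookup sketch: the plain FIFO below stands in for S3-FIFO (which additionally maintains small, main, and ghost queues), and a map stands in for the bbolt bucket. Names and shapes are illustrative only.

```go
package main

import "fmt"

// memFIFO is a bounded in-memory front (stand-in for the S3-FIFO layer).
type memFIFO struct {
	cap   int
	order []string
	m     map[string]string
}

func newMemFIFO(cap int) *memFIFO { return &memFIFO{cap: cap, m: map[string]string{}} }

func (c *memFIFO) get(k string) (string, bool) { v, ok := c.m[k]; return v, ok }

func (c *memFIFO) put(k, v string) {
	if len(c.order) >= c.cap { // evict oldest; S3-FIFO would be smarter here
		old := c.order[0]
		c.order = c.order[1:]
		delete(c.m, old)
	}
	c.order = append(c.order, k)
	c.m[k] = v
}

// tiered layers the hot cache over a persistent store (stand-in for bbolt).
type tiered struct {
	hot  *memFIFO
	disk map[string]string
}

func (t *tiered) lookup(k string) (string, bool) {
	if v, ok := t.hot.get(k); ok {
		return v, true // hot-path hit, no disk touch
	}
	if v, ok := t.disk[k]; ok {
		t.hot.put(k, v) // promote so the next lookup is memory-only
		return v, true
	}
	return "", false
}

func main() {
	c := &tiered{hot: newMemFIFO(2), disk: map[string]string{"Alice Smith": "PERSON"}}
	v, ok := c.lookup("Alice Smith")
	fmt.Println(v, ok) // PERSON true
}
```

Because the disk layer persists, a value classified by Ollama in one session is already cached on the first request of the next, which is the restart-survival property described above.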

Proxy infrastructure

  • MITM TLS interception with auto-generated CA; per-host certificate caching
  • CONNECT tunnel handling with ALPN split (HTTPS → MITM, opaque protocols → plain tunnel)
  • SSRF protection via ssrfSafeDialContext — DNS rebinding TOCTOU gap closed
  • Configurable domain allowlist, auth domain bypass, telemetry path passthrough (no unnecessary anonymization of instrumentation)

Observability

  • Structured logging with session ID and token count on every anonymized request
  • Per-type cache hit/miss counters, Ollama dispatch/error counters, fallback counters
  • /metrics endpoint on management API (localhost only)

Management API

  • Bearer token authentication
  • Session inspection, manual flush, health check endpoints

Known limitations

  • Single-instance only — session maps are in-process; horizontal scaling requires a shared session store (not planned yet)
  • Ollama as NER fallback — contextual entity detection (names, orgs, locations) relies on a local Ollama instance; replacement with an embedded ONNX model is planned for the next milestone
  • mapid not yet implemented — session map uses a 32-bit hash space; birthday collision risk is negligible at current traffic but is tracked for future hardening
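The "negligible at current traffic" claim can be checked with a back-of-envelope birthday bound: for n distinct values hashed into a 32-bit space, P(collision) ≈ 1 − exp(−n(n−1)/2³³). This is an illustrative calculation, not the project's own analysis.

```go
package main

import (
	"fmt"
	"math"
)

// collisionProb approximates the birthday-collision probability for n
// distinct values in a 32-bit hash space.
func collisionProb(n float64) float64 {
	return 1 - math.Exp(-n*(n-1)/math.Pow(2, 33))
}

func main() {
	fmt.Printf("%.6f\n", collisionProb(1000))   // ~1e3 tokens: near zero
	fmt.Printf("%.4f\n", collisionProb(100000)) // ~1e5 tokens: substantial
}
```

At roughly a thousand tokens per session the risk is on the order of 10⁻⁴, which is why the hardening is tracked rather than urgent.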

⚠️ Developer warning

Disable the proxy when using an AI assistant to work on the proxy source code itself. The proxy will anonymize its own source files, producing tokens that are written to disk and are unrecoverable after the session ends.

This release was developed with assistance from Claude (Anthropic)