This document describes the internal design of the AI Anonymizing Proxy: how requests flow through the system, how PII detection works, and the rationale behind key design choices.
```mermaid
flowchart TD
    subgraph Client["Client side"]
        APP[Application]
    end
    subgraph Proxy["AI Anonymizing Proxy (127.0.0.1:8080)"]
        direction TB
        PRXY[proxy.go\nrequest router]
        ANON[anonymizer.go\npack-based PII detection]
        MITM[mitm/\ncert.go · mitm.go]
        REG[(DomainRegistry)]
        MET[metrics.go]
    end
    subgraph Mgmt["Management API (127.0.0.1:8081)"]
        API[management.go\n/status /metrics /domains]
    end
    subgraph Backends
        AIAPI[AI API\nOpenAI · Anthropic · …]
        OTHER[Other HTTPS]
        OLL[Ollama\nlocal LLM]
    end
    APP -->|HTTP_PROXY| PRXY
    PRXY -->|AI domain| MITM
    MITM -->|plaintext body| ANON
    ANON -->|async cache miss| OLL
    ANON -->|anonymized body| AIAPI
    AIAPI -->|response| ANON
    ANON -->|de-anonymized| APP
    PRXY -->|other domain| OTHER
    API -->|read/write| REG
    PRXY -->|lookup| REG
    PRXY --> MET
    ANON --> MET
```
```mermaid
sequenceDiagram
    participant C as Client
    participant P as proxy.go
    participant CA as mitm/cert.go
    participant A as anonymizer.go
    participant API as AI API
    C->>P: CONNECT api.openai.com:443
    P->>P: DomainRegistry.Has(domain) → true
    P->>C: 200 Connection Established
    P->>CA: CertFor("api.openai.com")
    CA-->>P: leaf cert signed by proxy CA
    Note over C,P: TLS handshake — client uses proxy CA cert
    Note over P,API: Proxy opens separate real TLS to AI API
    loop each request over the tunnel
        C->>P: POST /v1/messages (plaintext to proxy)
        P->>P: isAuthRequest? → No
        P->>A: AnonymizeJSON(body, sessionID)
        A-->>P: anonymized body + token map stored
        P->>API: POST /v1/messages (anonymized, real TLS)
        API-->>P: response
        alt SSE / text/event-stream
            P->>A: StreamingDeanonymize(body, sessionID, domain)
            A-->>C: token replacements streamed on-the-fly
        else buffered response
            P->>A: DeanonymizeText(body, sessionID)
            A-->>P: restored text
            P-->>C: response with original values
        end
        P->>A: DeleteSession(sessionID)
    end
```
```mermaid
sequenceDiagram
    participant C as Client
    participant P as proxy.go
    participant D as ssrfSafeDialContext
    participant S as Destination server
    C->>P: CONNECT other-site.com:443
    P->>P: DomainRegistry.Has → false
    P->>P: isPrivateHost? → No
    P->>D: Dial tcp other-site.com:443
    D->>D: Resolve hostname → check IPs against private CIDRs
    D-->>P: net.Conn (or blocked if private IP)
    P->>C: 200 Connection Established
    Note over C,S: Raw bytes copied bidirectionally — no inspection
```
```mermaid
sequenceDiagram
    participant C as Client
    participant P as proxy.go
    participant A as anonymizer.go
    participant API as AI API
    C->>P: POST http://api.openai.com/v1/chat (plain HTTP)
    P->>P: isAuthRequest? → No
    P->>A: AnonymizeJSON(body, sessionID)
    A-->>P: anonymized body
    P->>API: POST (anonymized)
    API-->>P: response
    P->>A: DeanonymizeText(response, sessionID)
    A-->>P: restored response
    P-->>C: response
    P->>A: DeleteSession(sessionID)
```
```mermaid
flowchart TD
    IN([Request body]) --> PARSE{Valid JSON?}
    PARSE -->|Yes| WALK[Walk string leaves\nrecursively]
    PARSE -->|No| PLAIN[Treat as plain text]
    WALK --> RX
    PLAIN --> RX
    RX[Pack-based regex pass\npatterns loaded from enabled packs\nwith confidence scores] --> MATCH{Any match?}
    MATCH -->|No| NOCONF[effectiveConfidence = 0.0\ntext may still contain\nAI-detectable PII]
    MATCH -->|Yes| MINC[track minConfidence\nacross all matches]
    MINC --> THRESH{minConfidence\n≥ aiThreshold?}
    THRESH -->|Yes, regex is sufficient| OUT
    THRESH -->|No| AI{useAI enabled?}
    NOCONF --> AI
    AI -->|No| OUT
    AI -->|Yes| CACHE{Cache hit\nfor PII value?}
    CACHE -->|Hit| APPLY[Use cached\ntoken immediately]
    APPLY --> OUT
    CACHE -->|Miss| ASYNC[Dispatch background\nOllama goroutine\nvia inflight dedup map]
    ASYNC --> OUT
    OUT([Return anonymized text\ntoken map stored in session])
    ASYNC -.->|populates cache\nfor next request| CACHE
```
Patterns are organized into named packs that self-register via init() in
internal/anonymizer/packs/. The anonymizer loads patterns only from packs listed
in the enabledPacks configuration. Packs enabled by default: GLOBAL, DE, SECRETS.
A positional decay multiplier reduces confidence for patterns in later packs:
effectiveConfidence = baseConfidence × (1.0 - (position - 1) × packDecayRate)
Patterns with a Validate function (e.g. Luhn for credit cards, ISO 7064 for Steuer-ID)
reject regex matches that fail checksum validation, reducing false positives.
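The decay formula and checksum validation above can be sketched in a few lines of Go. This is an illustrative sketch, not the actual pack API: the function names `effectiveConfidence` and `luhnValid` and the `packDecayRate` parameter are assumptions; only the formula and the Luhn algorithm itself come from the text.

```go
package main

import "fmt"

// effectiveConfidence applies the positional decay multiplier described above.
// Identifier names are illustrative, not the real ones.
func effectiveConfidence(base float64, position int, packDecayRate float64) float64 {
	return base * (1.0 - float64(position-1)*packDecayRate)
}

// luhnValid is a standard Luhn checksum — the kind of Validate function a
// credit-card pattern attaches to reject regex matches with bad check digits.
func luhnValid(digits string) bool {
	sum, double := 0, false
	for i := len(digits) - 1; i >= 0; i-- {
		c := digits[i]
		if c < '0' || c > '9' {
			return false
		}
		d := int(c - '0')
		if double {
			d *= 2
			if d > 9 {
				d -= 9 // digit sum of the doubled value
			}
		}
		sum += d
		double = !double
	}
	return sum%10 == 0
}

func main() {
	fmt.Println(effectiveConfidence(0.85, 1, 0.05)) // first pack: no decay → 0.85
	fmt.Println(luhnValid("4111111111111111"))      // Visa test number → true
}
```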
Pattern confidence scores (base values, before positional decay):

| PII type | Pack | Example | Confidence | Validator |
|---|---|---|---|---|
| Email | GLOBAL | user@example.com | 0.95 | — |
| API key | GLOBAL | Bearer sk-abc… (≥ 20 chars) | 0.90 | — |
| Credit card | GLOBAL | 4111 1111 1111 1111 | 0.85 | Luhn |
| Steuer-ID | DE | 65929970489 | 0.70 | ISO 7064 |
| SVNR | DE | 12150385A123 | 0.80 | — |
| KFZ | DE | B AB 1234 | 0.75 | — |
| SSH key | SECRETS | -----BEGIN RSA PRIVATE KEY----- | 0.99 | — |
| JWT | SECRETS | eyJhbGci... | 0.95 | — |
| Bearer token | SECRETS | Bearer &lt;token&gt; | 0.92 | — |
| DB connection | SECRETS | postgresql://user:pass@host | 0.93 | — |
| AWS key | SECRETS | AKIAIOSFODNN7EXAMPLE | 0.97 | — |
| GitHub token | SECRETS | ghp_ABC... | 0.97 | — |
Tokens are formatted as [PII_<TYPE>_<16hex>] — e.g. [PII_EMAIL_c160f8cc4b2e1a3d] — where
<TYPE> is the uppercased PII type name and <16hex> is the first 16 hex digits of
md5(original). Maximum token length: 33 bytes ([PII_CREDITCARD_XXXXXXXXXXXXXXXX]).
The type label gives the LLM semantic context without revealing the original value; the system
instruction injected into every anonymized request prohibits the LLM from substituting
plausible-looking values in place of tokens. The hash is deterministic so the same value
always produces the same token within and across requests.
Each PII value (not the full request body) progresses through three states. The cache is keyed by the original value string, so a recurring email address or phone number gets a hit regardless of which message body it appears in. The self-transition on Inflight is the in-flight deduplication: a second request containing the same value while an Ollama query for it is still running reuses the running goroutine instead of spawning a new one.
```mermaid
stateDiagram-v2
    [*] --> Uncached : new PII value
    Uncached --> Inflight : cache miss — goroutine dispatched\ncurrent request uses fallback token immediately
    Inflight --> Inflight : duplicate request for same value\ninflight dedup — no second goroutine\ncurrent request uses fallback token immediately
    Inflight --> Cached : Ollama query succeeded\ndetections stored in cache
    Inflight --> Uncached : Ollama query failed\nor semaphore full (request dropped)\nnext request will retry dispatch
    Cached --> Cached : cache hit — AI detections\napplied to current request immediately
    note right of Cached
        S3-FIFO eviction when
        capacity (50 000 entries)
        is reached. Evicted entries
        are deleted from bbolt so
        disk usage stays bounded.
    end note
```
Each request gets a random sessionID. The token→original map is stored in
anonymizer.sessions[sessionID] during anonymization and deleted after the response is delivered.
For SSE (Content-Type: text/event-stream), StreamingDeanonymize wraps the response body in a
pipe-based reader. AI API providers deliver text content in different SSE formats, and a single
token like [PII_EMAIL_c160f8cc4b2e1a3d] frequently arrives split across multiple events.
The streaming system uses a provider-aware StreamingDeanonymizer interface to handle each
provider's SSE format. The domain parameter selects the appropriate implementation:
| Provider | Text field | Domains |
|---|---|---|
| Anthropic | `delta.text` / `delta.thinking` | api.anthropic.com |
| OpenAI | `choices[0].delta.content` | api.openai.com, api.mistral.ai, api.together.xyz, api.perplexity.ai, api.huggingface.co |
| Gemini | `candidates[0].content.parts[0].text` | generativelanguage.googleapis.com |
| Cohere | `delta.message.content.text` | api.cohere.ai |
| Replicate | Plain text in `data` | api.replicate.com |
| Passthrough | Raw replacement | Unknown domains |
The shared framework in internal/anonymizer/streaming.go handles SSE framing:
| Helper | Responsibility |
|---|---|
| `readLoop` | Top-level goroutine: reads chunks from the source and dispatches complete lines |
| `assembleLines` | Splits raw bytes on newlines, strips `\r`, dispatches to `processLine` |
| `processLine` | Classifies each SSE line (comment, non-data, data) and delegates data payloads to the provider |
| `safeCutPoint` | Calculates how many accumulated bytes can be flushed without splitting a partial token |
| `handleStreamEnd` | Flushes partial lines and calls `provider.Flush()` at EOF or on read error |
Provider-specific implementations live in separate files (streaming_anthropic.go,
streaming_openai.go, etc.) and handle JSON parsing, text accumulation, and re-serialization.
The 33-byte suffix guard (tokenSuffixLen) is retained in the accumulator — enough to cover
the longest possible token ([PII_CREDITCARD_XXXXXXXXXXXXXXXX] = 33 chars). Non-delta events (ping,
message_start, etc.) also pass through the replacer so tokens embedded in any part of the SSE
stream are deanonymized.
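The suffix-guard idea behind `safeCutPoint` can be sketched as follows. This is a simplified illustration of the cut-point logic, assuming a trailing window of `tokenSuffixLen` bytes; the real implementation may differ in how it scans for partial tokens.

```go
package main

import (
	"fmt"
	"strings"
)

const tokenSuffixLen = 33 // longest token: [PII_CREDITCARD_<16hex>] = 33 bytes

// safeCutPoint returns how many bytes of buf can be flushed downstream without
// risking a cut through a partially received [PII_…] token: if the trailing
// tokenSuffixLen-byte window contains an unclosed '[', hold everything from
// that '[' onward until the next chunk arrives.
func safeCutPoint(buf []byte) int {
	start := len(buf) - tokenSuffixLen
	if start < 0 {
		start = 0
	}
	if i := strings.LastIndexByte(string(buf[start:]), '['); i >= 0 {
		if !strings.ContainsRune(string(buf[start+i:]), ']') {
			return start + i // possible partial token: flush only up to '['
		}
	}
	return len(buf) // no partial token in the window: flush everything
}

func main() {
	fmt.Println(safeCutPoint([]byte("hello world")))              // 11 — flush all
	fmt.Println(safeCutPoint([]byte("mail me at [PII_EMAIL_c1"))) // 11 — partial token held back
}
```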
```mermaid
flowchart TD
    REQ([Dial request\nhostname:port]) --> ISIP{Literal IP\nin request?}
    ISIP -->|Yes, private| BLOCK([Block — log + error])
    ISIP -->|No| RESOLVE[Resolve hostname\nnet.DefaultResolver]
    RESOLVE --> CHECK{Any resolved IP\nin private CIDRs?}
    CHECK -->|Yes| BLOCK
    CHECK -->|No| DIAL[Dial first resolved IP\ndirectly]
    DIAL --> CONN([net.Conn])
```
Blocked CIDRs: 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16, 127.0.0.0/8,
169.254.0.0/16, ::1/128, fc00::/7, fe80::/10.
The check runs at dial time (not at request-parse time) to close the TOCTOU gap exploited by DNS rebinding, where a hostname resolves to a public IP during the check but switches to a private IP when the TCP connection is established.
```mermaid
flowchart LR
    START([Proxy startup]) --> LOAD{ca-cert.pem\nca-key.pem\nexist?}
    LOAD -->|Yes| PARSE[Parse CA cert + key]
    LOAD -->|No| GEN[GenerateCA\nRSA-4096, 10 yr validity]
    GEN --> PARSE
    PARSE --> CA[(mitm.CA\ncert + key + cache)]
    REQ([CONNECT host]) --> CCHECK{cache has\ncert for host?}
    CCHECK -->|Hit, not expired| TLSCFG
    CCHECK -->|Miss or expired| SIGN[GenerateKey RSA-2048\nSignCert 7 day validity]
    SIGN --> STORE[Store in cache\nmax 10 000 entries\nfull clear on overflow]
    STORE --> TLSCFG[tls.Config.GetCertificate]
    TLSCFG --> ALPN{ALPN negotiated?}
    ALPN -->|h2| H2[http2.Server.ServeConn]
    ALPN -->|http/1.1| H1[http.Server\nsingleConnListener]
```
```mermaid
flowchart TD
    START([Proxy startup]) --> FILE{ai-domains.json\nexists?}
    FILE -->|Yes| LOAD[Load persisted domains\ntakes precedence]
    FILE -->|No / corrupt| CFG[Load from\nproxy-config.json]
    LOAD --> REG[(DomainRegistry\nmap + RWMutex)]
    CFG --> REG
    REG -->|DomainRegistry.Has| PROXY[proxy: intercept or tunnel?]
    ADD[POST /domains/add] --> LOCK[Lock → mutate map → snapshot]
    RM[POST /domains/remove] --> LOCK
    LOCK --> ATOMIC[Write temp file\nos.Rename → ai-domains.json]
    ATOMIC --> REG
```
Writes use an atomic rename (write to a temp file, then os.Rename) so the persisted file is
never partially written. The DomainRegistry mutex is released before the write; Has calls
are never blocked by disk I/O.
| Package | Responsibility |
|---|---|
| `cmd/proxy` | Entry point: wires config, shared registry, metrics, both HTTP servers |
| `internal/config` | Layered config loading: defaults → proxy-config.json → env vars |
| `internal/anonymizer` | Pack-based PII detection, token replacement, session maps, streaming de-anon |
| `internal/anonymizer/packs` | Self-registering PII detection pattern packs (GLOBAL, DE, US, SECRETS, etc.) |
| `internal/proxy` | Request router: MITM tunnel, opaque tunnel, plain-HTTP forwarding, SSRF |
| `internal/mitm` | CA management, per-host leaf cert generation/caching, TLS handshake, ALPN |
| `internal/management` | Management HTTP API + persistent DomainRegistry |
| `internal/metrics` | Atomic request/error/token counters; latency stats; JSON snapshot |
| `internal/logger` | Structured, level-gated logger (debug/info/warn/error) → stderr |
All hot-path counters (RequestsTotal, TokensReplaced, etc.) are sync/atomic.Int64 — no
mutex in the request path. Latency accumulators use one sync.Mutex each, updated once per
request at the call site.
The piiTokens block in the /metrics snapshot includes observability for the low-confidence
detection path:
| Counter | Where incremented | What it signals |
|---|---|---|
| `cacheHits[<type>]` | `tokenForMatch` — cache hit | Cache is warm for this PII type |
| `cacheMisses[<type>]` | `tokenForMatch` — cache miss | Value not yet seen by Ollama |
| `cacheFallbacks` | `tokenForMatch` — cache miss | Fallback token used; increments with every miss |
| `ollamaDispatches` | `dispatchOllamaAsync` — before goroutine launch | Goroutine was spawned |
| `ollamaErrors` | `dispatchOllamaAsync` — semaphore full or HTTP error | Ollama unavailable or overloaded |
Per-type counters are pre-allocated for all known PII types (including pack-added types) at startup; zero-count types are omitted from the JSON output. Counter maps are written only during initialization, so concurrent reads in Snapshot() require no additional lock.
Cache effectiveness signal: cacheFallbacks / ollamaDispatches trending toward 0 after
warm-up means recurring values are now served from cache. A ratio near 1 after warm-up
indicates Ollama is unreachable, values are high-cardinality, or aiConfidenceThreshold is
routing too many matches through the low-confidence path.