Worker affinity by lkosewsk · Pull Request #6 · tailscale/tsheadroom

lkosewsk · 2026-06-11T17:55:42Z

Massively improve compress times by steering requests to workers that already have a message cache.

headroom's compress() keeps a per-process cache of already-compressed content (process-global pipeline + content-keyed cache, ~30min TTL), so within one worker a growing conversation does not recompress its unchanged prefix each turn. Measured on headroom 0.24.0: a repeated tool result costs ~45ms the first time a worker sees it and ~1ms thereafter. But the pool dispatched every request to any free worker, so a conversation's turns scattered across workers and kept hitting cold caches. Route by conversation instead. Each slot gets its own affinity channel (buffered depth 1) alongside a shared spillover channel; Compress sends a request to the slot chosen from its affinity key, queuing at most one job deep on the warm worker and spilling to any free worker only when that slot already has a job waiting. The key is aperture's session_id (always present in hook metadata) or, when absent, a stable hash of the opening messages. End-to-end with -pool-size 2: turn 1 worker_ms=45, turns 2-4 worker_ms=1, affinity_hits=4 spills=0. This is best-effort: under load, spills route to cold workers; bounded per-worker queueing (a turn waiting briefly for its warm worker) is the planned next step. -no-affinity disables routing. New metrics tsheadroom_affinity_{hits,spills}_total expose the split. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Luke Kosewski <lkosewsk@tailscale.com>

The affinity-hit dispatch path was a non-blocking buffered send with no pool-shutdown guard, unlike the shared-channel path. A job buffered into a slot's affinity channel just as the pool shut down was never drained (runSlot exits on ctx.Done; Shutdown only did cancel + wg.Wait), so its Compress caller could block on j.resp. Reachable only on the shutdown- timeout path during process exit, but it dropped a serve-or-error guarantee the previous single-channel dispatch had. dispatch now fails fast with "pool shutting down" before attempting the affinity send, and Shutdown drains any straggler buffered job (after wg.Wait, so there is no competing receiver) and answers it with an error. Add a regression test that a buffered affinity job on a non-cancelable context is answered on shutdown rather than orphaned. Also merge newPool's two identical slot loops into one and correct the dispatch doc comment's description of the buffered-send semantics. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Luke Kosewski <lkosewsk@tailscale.com>

lkosewsk and others added 2 commits June 11, 2026 16:59

lkosewsk merged commit 056f6a2 into main Jun 12, 2026
2 checks passed

lkosewsk deleted the lk/worker-affinity branch June 12, 2026 06:21

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Worker affinity#6

Worker affinity#6
lkosewsk merged 2 commits into
mainfrom
lk/worker-affinity

lkosewsk commented Jun 11, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

lkosewsk commented Jun 11, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant