
Worker affinity #6

Merged
lkosewsk merged 2 commits into main from lk/worker-affinity on Jun 12, 2026

@lkosewsk

Collaborator

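Massively improve compress times by steering each conversation's requests to the worker that already holds its compression cache (a repeated tool result costs ~45ms on a cold worker and ~1ms on a warm one).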

lkosewsk and others added 2 commits June 11, 2026 16:59
headroom's compress() keeps a per-process cache of already-compressed
content (process-global pipeline + content-keyed cache, ~30min TTL), so
within one worker a growing conversation does not recompress its
unchanged prefix each turn. Measured on headroom 0.24.0: a repeated tool
result costs ~45ms the first time a worker sees it and ~1ms thereafter.
But the pool dispatched every request to any free worker, so a
conversation's turns scattered across workers and kept hitting cold
caches.

Route by conversation instead. Each slot gets its own affinity channel
(buffered depth 1) alongside a shared spillover channel; Compress sends a
request to the slot chosen from its affinity key, queuing at most one job
deep on the warm worker and spilling to any free worker only when that
slot already has a job waiting. The key is aperture's session_id (always
present in hook metadata) or, for requests that carry none, a stable hash
of the opening messages. End-to-end with -pool-size 2: turn 1
worker_ms=45, turns 2-4 worker_ms=1, affinity_hits=4 spills=0.
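A minimal sketch of the routing described above, assuming a pool shaped roughly like the one in this PR. `affinityKey`, `slotFor`, the FNV-1a hash, the two-message prefix, the `session:`/`prefix:` tags, and the `noAffinity` field (standing in for `-no-affinity`) are hypothetical choices; the PR only says the fallback is "a stable hash of the opening messages". Workers reply with their slot index in place of a real compress() result.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sync/atomic"
)

// job is one compress request; workers reply with their slot index in place
// of a real compress() result.
type job struct {
	key  string
	resp chan int
}

// pool: one affinity channel per slot (buffered, depth 1) plus a shared
// spillover channel that any free worker may receive from.
type pool struct {
	affinity   []chan job
	shared     chan job
	noAffinity bool // hypothetical stand-in for the -no-affinity flag
	hits       atomic.Int64
	spills     atomic.Int64
}

func newPool(size int) *pool {
	p := &pool{shared: make(chan job)}
	for i := 0; i < size; i++ {
		p.affinity = append(p.affinity, make(chan job, 1))
	}
	return p
}

// affinityKey prefers the session ID and otherwise hashes the opening
// messages (the first two here, an arbitrary choice for illustration).
func affinityKey(sessionID string, messages []string) string {
	if sessionID != "" {
		return "session:" + sessionID
	}
	h := fnv.New64a()
	for i, m := range messages {
		if i == 2 {
			break
		}
		h.Write([]byte(m))
		h.Write([]byte{0}) // separator so {"ab","c"} and {"a","bc"} differ
	}
	return fmt.Sprintf("prefix:%x", h.Sum64())
}

// slotFor maps a key to a stable slot index.
func slotFor(key string, n int) int {
	h := fnv.New64a()
	h.Write([]byte(key))
	return int(h.Sum64() % uint64(n))
}

// dispatch queues j at most one deep on its warm slot; if that slot already
// has a job waiting, it spills to whichever worker is free.
func (p *pool) dispatch(j job) {
	if !p.noAffinity {
		select {
		case p.affinity[slotFor(j.key, len(p.affinity))] <- j:
			p.hits.Add(1)
			return
		default: // warm slot's one-job buffer is already full
		}
		p.spills.Add(1)
	}
	p.shared <- j // blocks until some worker is free
}

// runSlot serves its own affinity channel and the shared spillover channel.
func (p *pool) runSlot(id int) {
	for {
		select {
		case j := <-p.affinity[id]:
			j.resp <- id
		case j := <-p.shared:
			j.resp <- id
		}
	}
}

func main() {
	p := newPool(2)
	for i := range p.affinity {
		go p.runSlot(i)
	}
	key := affinityKey("sess-1", nil)
	for turn := 1; turn <= 4; turn++ {
		j := job{key: key, resp: make(chan int, 1)}
		p.dispatch(j)
		fmt.Printf("turn %d -> worker %d\n", turn, <-j.resp)
	}
	fmt.Println("hits:", p.hits.Load(), "spills:", p.spills.Load()) // hits: 4 spills: 0
}
```

Because the affinity send is non-blocking into a one-slot buffer, at most one job ever waits on the warm worker; any further concurrent turn for that slot goes to the shared channel and may land on a cold worker, which is the best-effort behavior the PR describes.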

This is best-effort: under load, spills route to cold workers; bounded
per-worker queueing (a turn waiting briefly for its warm worker) is the
planned next step. -no-affinity disables routing. New metrics
tsheadroom_affinity_{hits,spills}_total expose the split.
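The PR names the two counters but not how they are exported. Assuming a Prometheus-style text endpoint (suggested by the `_total` suffix, but not confirmed here), a stdlib-only sketch of the counters and the hit/spill split could look like this; `affinityMetrics`, `exposition`, and `hitRatio` are hypothetical names.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// affinityMetrics holds hypothetical counters behind the two metrics: dispatch
// would bump hits on an affinity send and spills on a fallback to the shared channel.
type affinityMetrics struct {
	hits   atomic.Int64
	spills atomic.Int64
}

// exposition renders both counters in the Prometheus text exposition format.
func (m *affinityMetrics) exposition() string {
	return fmt.Sprintf(
		"# TYPE tsheadroom_affinity_hits_total counter\n"+
			"tsheadroom_affinity_hits_total %d\n"+
			"# TYPE tsheadroom_affinity_spills_total counter\n"+
			"tsheadroom_affinity_spills_total %d\n",
		m.hits.Load(), m.spills.Load())
}

// hitRatio is the share of dispatches that reached the warm worker; a ratio
// that drops under load would suggest spills are sending turns to cold caches.
func (m *affinityMetrics) hitRatio() float64 {
	h, s := m.hits.Load(), m.spills.Load()
	if h+s == 0 {
		return 0
	}
	return float64(h) / float64(h+s)
}

func main() {
	var m affinityMetrics
	m.hits.Add(4) // the -pool-size 2 run above: 4 hits, 0 spills
	fmt.Print(m.exposition())
	fmt.Println("hit ratio:", m.hitRatio())
}
```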

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Luke Kosewski <lkosewsk@tailscale.com>

The affinity-hit dispatch path was a non-blocking buffered send with no
pool-shutdown guard, unlike the shared-channel path. A job buffered into
a slot's affinity channel just as the pool shut down was never drained
(runSlot exits on ctx.Done; Shutdown only did cancel + wg.Wait), so its
Compress caller could block on j.resp. This was reachable only on the
shutdown-timeout path during process exit, but it lost the serve-or-error
guarantee that the previous single-channel dispatch provided.

dispatch now fails fast with "pool shutting down" before attempting the
affinity send, and Shutdown drains any straggler buffered job (after
wg.Wait, so there is no competing receiver) and answers it with an error.
Add a regression test that a buffered affinity job on a non-cancelable
context is answered on shutdown rather than orphaned.

Also merge newPool's two identical slot loops into one and correct the
dispatch doc comment's description of the buffered-send semantics.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Luke Kosewski <lkosewsk@tailscale.com>
lkosewsk merged commit 056f6a2 into main on Jun 12, 2026
2 checks passed
lkosewsk deleted the lk/worker-affinity branch on June 12, 2026 at 06:21