Worker affinity#6
Merged
Merged
Conversation
headroom's compress() keeps a per-process cache of already-compressed
content (process-global pipeline + content-keyed cache, ~30min TTL), so
within one worker a growing conversation does not recompress its
unchanged prefix each turn. Measured on headroom 0.24.0: a repeated tool
result costs ~45ms the first time a worker sees it and ~1ms thereafter.
But the pool dispatched every request to any free worker, so a
conversation's turns scattered across workers and kept hitting cold
caches.
Route by conversation instead. Each slot gets its own affinity channel
(buffered depth 1) alongside a shared spillover channel; Compress sends a
request to the slot chosen from its affinity key, queuing at most one job
deep on the warm worker and spilling to any free worker only when that
slot already has a job waiting. The key is aperture's session_id (always
present in hook metadata) or, when absent, a stable hash of the opening
messages. End-to-end with -pool-size 2: turn 1 worker_ms=45, turns 2-4
worker_ms=1, affinity_hits=4 spills=0.
This is best-effort: under load, spills route to cold workers; bounded
per-worker queueing (a turn waiting briefly for its warm worker) is the
planned next step. -no-affinity disables routing. New metrics
tsheadroom_affinity_{hits,spills}_total expose the split.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Luke Kosewski <lkosewsk@tailscale.com>
The affinity-hit dispatch path was a non-blocking buffered send with no pool-shutdown guard, unlike the shared-channel path. A job buffered into a slot's affinity channel just as the pool shut down was never drained (runSlot exits on ctx.Done; Shutdown only did cancel + wg.Wait), so its Compress caller could block on j.resp. Reachable only on the shutdown- timeout path during process exit, but it dropped a serve-or-error guarantee the previous single-channel dispatch had. dispatch now fails fast with "pool shutting down" before attempting the affinity send, and Shutdown drains any straggler buffered job (after wg.Wait, so there is no competing receiver) and answers it with an error. Add a regression test that a buffered affinity job on a non-cancelable context is answered on shutdown rather than orphaned. Also merge newPool's two identical slot loops into one and correct the dispatch doc comment's description of the buffered-send semantics. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Luke Kosewski <lkosewsk@tailscale.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Massively improve compress times by steering requests to workers that already have a message cache.