CPU optimizations: try_recv drain, batch timestamps, zero-copy Bytes#18
Open
vyshah wants to merge 6 commits into nerdsane:main from
Conversation
The access_times map was written on every key access and cleaned up during eviction, but never read back for decision-making; no LRU eviction was implemented. This eliminates one AHashMap<String, VirtualTime> per shard (×16 shards), plus one String clone per key access.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
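For context, a minimal sketch of the shard layout this change implies; the names (Shard, data, expirations) are assumptions for illustration, not the actual definitions:

```rust
use std::collections::HashMap;

// Hypothetical shard layout after the change; names are illustrative.
struct Shard {
    data: HashMap<String, Vec<u8>>,
    expirations: HashMap<String, u64>,
    // Removed: access_times: AHashMap<String, VirtualTime>.
    // It was written on every key access (cloning the key String)
    // and cleaned up during eviction, but never read, since no LRU
    // eviction policy was ever implemented.
}

impl Shard {
    fn new() -> Self {
        Shard { data: HashMap::new(), expirations: HashMap::new() }
    }
}
```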
Instead of iterating expirations to collect expired keys into a Vec, then iterating again to remove them from both maps, use retain() to remove from expirations in one pass while collecting keys for data removal. Eliminates one full HashMap iteration per eviction cycle.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
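A minimal sketch of the single-pass shape described above, using plain std HashMaps; the names (data, expirations, now) and the u64 deadline representation are assumptions:

```rust
use std::collections::HashMap;

// Single-pass eviction: retain() removes expired entries from
// `expirations` while collecting their keys for removal from `data`,
// replacing the old collect-then-remove double iteration.
fn evict_expired(
    data: &mut HashMap<String, Vec<u8>>,
    expirations: &mut HashMap<String, u64>,
    now: u64,
) {
    let mut expired = Vec::new();
    expirations.retain(|key, &mut deadline| {
        let live = deadline > now;
        if !live {
            expired.push(key.clone());
        }
        live
    });
    for key in &expired {
        data.remove(key);
    }
}
```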
The buffer pool was hardcoded at 512 pre-allocated 8KB buffers (4MB) and 10k max connections. Add ConnectionPoolConfig to PerformanceConfig so these can be tuned via TOML config file. Lower the default buffer pool from 512 to 64 (512KB), which is more appropriate for most deployments while still allowing on-demand allocation beyond the pool.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
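A sketch of what the new config section might look like; the field names and TOML layout are assumptions based on the description, not the actual ConnectionPoolConfig definition:

```rust
// Hypothetical shape of the new connection-pool settings, e.g. in TOML:
//
//   [performance.connection_pool]
//   buffer_pool_size = 64      # pre-allocated buffers (default was 512)
//   buffer_size_bytes = 8192   # 8 KB per buffer
//   max_connections = 10000
//
#[derive(Debug, Clone)]
struct ConnectionPoolConfig {
    buffer_pool_size: usize,
    buffer_size_bytes: usize,
    max_connections: usize,
}

impl Default for ConnectionPoolConfig {
    fn default() -> Self {
        ConnectionPoolConfig {
            buffer_pool_size: 64,        // lowered from 512
            buffer_size_bytes: 8 * 1024, // 8 KB each
            max_connections: 10_000,
        }
    }
}
```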
After processing a message received via recv().await, drain all pending messages with try_recv() before yielding back to tokio. This is the FoundationDB actor loop pattern; it reduces unnecessary context switches when messages arrive faster than processing time, which happens frequently under pipeline batches. Measured: the CPU gap vs Redis 8.4 narrowed to +41% under 12k req/s load.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
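The drain pattern can be sketched with std's mpsc channel, whose recv/try_recv pair mirrors tokio's (rx.recv().await / rx.try_recv()); the string-message actor here is an illustrative stand-in:

```rust
use std::sync::mpsc;

// Drain loop: block for one message, then opportunistically drain
// everything already queued before the next blocking wait. A pipelined
// burst is consumed in one wakeup instead of one wakeup (and context
// switch) per message.
fn run_actor(rx: mpsc::Receiver<String>) -> Vec<String> {
    let mut processed = Vec::new();
    while let Ok(first) = rx.recv() {
        processed.push(first);
        while let Ok(next) = rx.try_recv() {
            processed.push(next);
        }
    }
    // recv() returns Err once all senders are dropped, ending the loop.
    processed
}
```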
Instead of calling Instant::now() twice per command (once for start, once for elapsed), capture a single timestamp at the top of each read iteration and reuse it for all commands in the pipeline batch. This amortizes the clock_gettime syscall cost across all commands in a batch; at batchLen=500, that's ~1000 syscalls saved per batch.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
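In sketch form, with illustrative names; the real loop processes RESP frames rather than strings:

```rust
use std::time::{Duration, Instant};

// One timestamp per read iteration, shared by every command in the
// pipeline batch, instead of two Instant::now() calls per command.
fn process_batch(commands: &[&str]) -> (usize, Duration) {
    let batch_ts = Instant::now(); // single clock read for the batch
    let mut handled = 0;
    for _cmd in commands {
        // Per-command execution would go here; each command reuses
        // `batch_ts` as its start time rather than taking a fresh one.
        handled += 1;
    }
    // One elapsed measurement amortized across the whole batch.
    (handled, batch_ts.elapsed())
}
```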
Replace Bytes::copy_from_slice() with split_to().freeze().slice() in all 4 hot-path methods: collect_get_keys, collect_set_pairs, try_fast_get, try_fast_set. This eliminates heap allocation per key by using reference-counted slices into the already-allocated read buffer. Under MGET with 500 keys, that's 500 fewer allocations per batch.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary
Three CPU optimizations identified via Datadog continuous profiling of redis-rust under comparison load against Redis 8.4 in staging.
- Drain loop: after processing a message received via recv().await, drain all pending messages with try_recv() before yielding back to tokio. FoundationDB actor loop pattern. Reduces context switches when messages arrive faster than processing time (common under pipeline batches).
- Batch timestamps: capture a single Instant::now() at the top of each read iteration and reuse it for all commands in the pipeline batch. Eliminates 2 clock_gettime syscalls per command.
- Zero-copy Bytes: replace Bytes::copy_from_slice() with split_to().freeze().slice() in all 4 hot-path methods (collect_get_keys, collect_set_pairs, try_fast_get, try_fast_set). Eliminates heap allocation per key by using reference-counted slices into the read buffer.

Measurements
Under identical comparison traffic from ephemera-probe (12k req/s, batchLen=500):
The CPU gap vs Redis 8.4 narrowed from +100% to +41% under load. The optimizations scale with request volume — at low traffic the fixed overhead dominates, but at higher throughput the per-request savings compound.
Test plan
🤖 Generated with Claude Code