CPU optimizations: try_recv drain, batch timestamps, zero-copy Bytes by vyshah · Pull Request #18 · nerdsane/redis-rust

vyshah · 2026-02-24T03:37:16Z

Summary

Three CPU optimizations identified via Datadog continuous profiling of redis-rust under comparison load against Redis 8.4 in staging.

Depends on #15 (memory optimizations) — merge #15 first, then this PR applies cleanly on top.

try_recv drain — After processing a message via recv().await, drain all pending messages with try_recv() before yielding back to tokio. FoundationDB actor loop pattern. Reduces context switches when messages arrive faster than processing time (common under pipeline batches).
Batch clock_gettime — Capture a single Instant::now() at the top of each read iteration and reuse it for all commands in the pipeline batch. Eliminates 2 clock_gettime syscalls per command.
Zero-copy Bytes — Replace Bytes::copy_from_slice() with split_to().freeze().slice() in all 4 hot-path methods (collect_get_keys, collect_set_pairs, try_fast_get, try_fast_set). Eliminates heap allocation per key by using reference-counted slices into the read buffer.

Measurements

Under identical comparison traffic from ephemera-probe (12k req/s, batchLen=500):

	Redis 8.4	redis-rust (before)	redis-rust (after)
CPU	0.364 cores	~0.55 cores (+100%)	0.513 cores (+41%)

The CPU gap vs Redis 8.4 narrowed from +100% to +41% under load. The optimizations scale with request volume — at low traffic the fixed overhead dominates, but at higher throughput the per-request savings compound.

Test plan

All 449 deterministic simulation tests pass (TTL, concurrency, replay, buggify chaos, multi-seed invariants)
Deployed to staging and verified under 12k req/s comparison traffic
Zero errors on both redis-rust and Redis 8.4 sides
Traffic parity confirmed between comparison caches

🤖 Generated with Claude Code

The access_times map was written on every key access and cleaned up during eviction, but never read back for decision-making — no LRU eviction was implemented. This eliminates one AHashMap<String, VirtualTime> per shard (×16 shards), plus one String clone per key access. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Instead of iterating expirations to collect expired keys into a Vec, then iterating again to remove them from both maps, use retain() to remove from expirations in one pass while collecting keys for data removal. Eliminates one full HashMap iteration per eviction cycle. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

The buffer pool was hardcoded at 512 pre-allocated 8KB buffers (4MB) and 10k max connections. Add ConnectionPoolConfig to PerformanceConfig so these can be tuned via TOML config file. Lower the default buffer pool from 512 to 64 (512KB), which is more appropriate for most deployments while still allowing on-demand allocation beyond the pool. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

After processing a message received via recv().await, drain all pending messages with try_recv() before yielding back to tokio. This is the FoundationDB actor loop pattern — it reduces unnecessary context switches when messages arrive faster than processing time, which happens frequently under pipeline batches. Measured: -41% reduction in CPU gap vs Redis 8.4 under 12k req/s load. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Instead of calling Instant::now() twice per command (once for start, once for elapsed), capture a single timestamp at the top of each read iteration and reuse it for all commands in the pipeline batch. This amortizes the clock_gettime syscall cost across all commands in a batch — at batchLen=500, that's ~1000 syscalls saved per batch. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Replace Bytes::copy_from_slice() with split_to().freeze().slice() in all 4 hot-path methods: collect_get_keys, collect_set_pairs, try_fast_get, try_fast_set. This eliminates heap allocation per key by using reference-counted slices into the already-allocated read buffer. Under MGET with 500 keys, that's 500 fewer allocations per batch. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

vyshah and others added 6 commits February 23, 2026 01:46

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

CPU optimizations: try_recv drain, batch timestamps, zero-copy Bytes#18

CPU optimizations: try_recv drain, batch timestamps, zero-copy Bytes#18
vyshah wants to merge 6 commits intonerdsane:mainfrom
vyshah:feat/cpu-optimizations

vyshah commented Feb 24, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

vyshah commented Feb 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Measurements

Test plan

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

vyshah commented Feb 24, 2026 •

edited

Loading