Skip to content

fix(kv-router): optimize clear recovery replay#9296

Merged
PeaBrane merged 1 commit intomainfrom
codex/fix-clear-recovery-tail
May 8, 2026
Merged

fix(kv-router): optimize clear recovery replay#9296
PeaBrane merged 1 commit intomainfrom
codex/fix-clear-recovery-tail

Conversation

@PeaBrane
Copy link
Copy Markdown
Contributor

@PeaBrane PeaBrane commented May 8, 2026

Fix router recovery after cleared batches and let worker buffer recovery responses start at the last clear barrier, avoiding replay of dead pre-clear events.

Summary by CodeRabbit

  • Bug Fixes

    • Improved handling of recovery barriers when querying KV events after clear operations; buffer queries now correctly skip entries prior to the most recent clear point.
  • Tests

    • Added test coverage for KV event recovery behavior following clear operations.
  • Documentation

    • Clarified KV event recovery semantics and how clear barriers affect event batch responses.

Signed-off-by: PeaBrane <yanrpei@gmail.com>
@PeaBrane PeaBrane requested a review from a team as a code owner May 8, 2026 00:28
@github-actions github-actions Bot added fix router Relates to routing, KV-aware routing, etc. labels May 8, 2026
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 8, 2026

Review Change Stack
No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 5e42d8f8-ecc5-462a-a14a-dd54a3e1281e

📥 Commits

Reviewing files that changed from the base of the PR and between c9b7845 and bf74e47.

📒 Files selected for processing (4)
  • lib/kv-router/src/indexer/local.rs
  • lib/kv-router/src/indexer/tests.rs
  • lib/kv-router/src/indexer/types.rs
  • lib/llm/src/kv_router/indexer/worker_query.rs

Walkthrough

KV indexer buffer queries now return recovery-equivalent event suffixes that account for Cleared events as worker-wide recovery barriers. The local indexer shifts response start to the most recent Cleared event within the queried range. Worker recovery logic preserves live tail state when the clear barrier event_id is older than recent live events.

Changes

KV Indexer Cleared Event Recovery Semantics

Layer / File(s) Summary
Recovery Contract Documentation
lib/kv-router/src/indexer/types.rs
WorkerKvQueryResponse::Events documentation clarified to state that events form a recovery-equivalent suffix from the circular buffer; Cleared events act as worker-wide recovery barriers; last_event_id derives from the same buffer snapshot as the recovery watermark.
Local Indexer Buffer Query Logic
lib/kv-router/src/indexer/local.rs
get_events_in_id_range documentation updated to describe recovery-equivalent suffix semantics. New private helper buffer_response_start_idx finds the last Cleared event at/after the computed start and shifts response start accordingly. classify_query now uses this helper instead of directly using start_idx for Events responses.
Worker Clear Barrier State Preservation
lib/llm/src/kv_router/indexer/worker_query.rs
apply_worker_clear_locked now conditionally preserves max_seen_live_id when it is strictly greater than the clear barrier's event_id, instead of unconditionally resetting it; this affects post-clear gap/live tail coalescing decisions.
Tests and Validation
lib/kv-router/src/indexer/tests.rs, lib/llm/src/kv_router/indexer/worker_query.rs
New test test_local_indexer_buffer_response_starts_at_last_clear verifies that queries after Cleared events return only the post-clear suffix. New test test_recovered_cleared_follows_coalesced_live_tail validates worker recovery behavior when buffered recovery includes a Cleared event followed by live tail coalescing.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

Check name Status Explanation Resolution
Description check ⚠️ Warning The description is incomplete. It lacks Overview and Details sections, file references for review focus, and related issue linkage as specified in the template. Expand the description to include Overview and Details sections, specify key files for review (local.rs, worker_query.rs), and add a 'Closes' or 'Relates to' reference with the GitHub issue number.
Docstring Coverage ⚠️ Warning Docstring coverage is 54.55% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title 'fix(kv-router): optimize clear recovery replay' directly summarizes the main change: optimizing recovery replay behavior after cleared batches by avoiding unnecessary replay of pre-clear events.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@PeaBrane PeaBrane enabled auto-merge (squash) May 8, 2026 02:43
@PeaBrane PeaBrane merged commit 6f0b7fa into main May 8, 2026
288 of 294 checks passed
@PeaBrane PeaBrane deleted the codex/fix-clear-recovery-tail branch May 8, 2026 03:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

fix router Relates to routing, KV-aware routing, etc. size/L

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants