feat: add R3/v1 router replay deserialization support #450

Open
SunnySoldier357 wants to merge 6 commits into main from
sandeep/router-replay-support
Conversation

@SunnySoldier357 (Collaborator) commented May 4, 2026

Summary

  • Adds r3_deserializer module that decompresses and unpacks R3/v1 binary router-replay payloads (base64-encoded, zstd-compressed) into per-token routing matrices. Supports ALL, SUFFIX, and BITMAP selector modes with uint8/uint16 dtypes.
  • Threads a new include_payloads parameter through FireworksTracingAdapter.get_evaluation_rows(), RemoteRolloutProcessor, DataLoaderConfig, and update_row_with_remote_trace() so callers can opt-in to fetching and extracting router replay data from traces.
  • When include_payloads=True, convert_trace_dict_to_evaluation_row automatically decompresses any payloads.router_replay.data blob and attaches routing_matrices and routing_metadata to execution_metadata.extra.
  • Adds zstandard>=0.19.0 as a dependency.

Test plan

  • Unit tests in tests/adapters/test_r3_deserializer.py covering:
    • Header parsing (valid, bad magic, too short, unsupported version)
    • Bitmap position reading (all set, none set, sparse, multi-byte)
    • Full decompress+parse for ALL, SUFFIX, and BITMAP selector modes
    • uint16 dtype support
    • Zero replayed tokens edge case
    • Round-trip test against the gateway's r3_serializer (skips if serializer not available)
    • Integration with convert_trace_dict_to_evaluation_row (with payload, without, empty data)

Made with Cursor


Note

Medium Risk
Adds opt-in fetching and zstd decompression/parsing of binary trace payloads, which can affect performance/memory and introduces a new dependency; failures are logged but could change returned execution_metadata.extra contents.

Overview
Adds opt-in support for pulling payloads.router_replay data from the Fireworks tracing gateway and attaching per-token routing matrices/metadata onto execution_metadata.extra when converting traces to EvaluationRow.

Introduces a new r3_deserializer that base64-decodes + zstd-decompresses the R3/v1 binary format (ALL/SUFFIX/BITMAP selectors) and threads a new include_payloads flag through FireworksTracingAdapter.get_evaluation_rows, RemoteRolloutProcessor, DataLoaderConfig, and update_row_with_remote_trace to request and propagate these payloads.

Adds zstandard as a dependency and comprehensive unit/integration tests for the deserializer and trace conversion behavior.

Reviewed by Cursor Bugbot for commit 6639c01.

@SunnySoldier357 SunnySoldier357 self-assigned this May 4, 2026

@chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e769ac1e1a


Four comment threads on eval_protocol/adapters/r3_deserializer.py (outdated)
@SunnySoldier357 SunnySoldier357 force-pushed the sandeep/router-replay-support branch from f10a6a9 to 6be8056 Compare May 6, 2026 01:52
@SunnySoldier357 SunnySoldier357 requested a review from benjibc May 6, 2026 17:44
# Build final model base URL with tracing metadata
final_model_base_url = model_base_url
if model_base_url and ("tracing.fireworks.ai" in model_base_url or model_base_url.startswith("http://localhost")):
if model_base_url and ("tracing.fireworks.ai" in model_base_url or model_base_url.startswith("http://localhost") or "litellm-gateway" in model_base_url):

Why do we need the check for tracing.fireworks.ai or litellm-gateway? Which one is it? Are there cases where it's one and not the other, and vice versa?

Comment on lines +168 to +175
if end > len(matrix_bytes):
logger.warning(
"R3 matrix data truncated at token %d (position %d): "
"expected %d bytes but only %d remaining",
idx, pos, matrix_elem_size, len(matrix_bytes) - start,
)
break
matrices[pos] = base64.b64encode(matrix_bytes[start:end]).decode("ascii")

Would it be helpful to still include the bytes that aren't a full matrix_elem_size?

# tokens share the same matrix length, so we can recover it from the
# matrix section total length divided by the replayed-token count.
if replayed_token_count > 0:
if matrix_byte_length % replayed_token_count != 0:

Earlier I asked if it would be helpful to include bytes at the end that weren't a full matrix_elem_size, but here I see that you already make sure it's a perfect multiple of replayed_token_count. So the check I commented on earlier could be removed.

# Per-token matrix byte size is implicit in the payload: all replayed
# tokens share the same matrix length, so we can recover it from the
# matrix section total length divided by the replayed-token count.
if replayed_token_count > 0:

If replayed_token_count is 0, then I think we should just early return and remove some of the checks further down.
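A minimal sketch combining the divisibility check from the quoted snippet with the early return suggested here; the function name is hypothetical:

```python
def derive_matrix_elem_size(matrix_byte_length: int, replayed_token_count: int) -> int:
    """Recover the implicit per-token matrix byte size.

    All replayed tokens share one matrix length, so it is the matrix
    section's total length divided by the replayed-token count.
    """
    # Early return for the zero-token edge case, so later checks can go.
    if replayed_token_count == 0:
        return 0
    # Surface malformed payloads early: the section length must divide evenly.
    if matrix_byte_length % replayed_token_count != 0:
        raise ValueError(
            "malformed R3 payload: matrix section length is not a multiple "
            "of the replayed-token count"
        )
    return matrix_byte_length // replayed_token_count
```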

Comment on lines +88 to +90
byte_idx = i >> 3
bit_idx = i & 7
if byte_idx < len(selector_bytes) and (selector_bytes[byte_idx] >> bit_idx) & 1:

Where does 7 and 3 come from?
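For context, the shifts in the snippet are bitwise forms of divmod by 8: `i >> 3` is `i // 8` (the byte index) and `i & 7` is `i % 8` (the bit within that byte). A sketch of the bitmap read, assuming selector_bytes packs one replay flag per token position, LSB-first, and a hypothetical function name:

```python
def read_set_positions(selector_bytes: bytes, total_token_count: int) -> list[int]:
    """Return the token positions whose bit is set in the bitmap selector."""
    positions = []
    for i in range(total_token_count):
        byte_idx = i >> 3  # i // 8: which byte holds token i's flag
        bit_idx = i & 7    # i % 8: which bit within that byte, LSB-first
        if byte_idx < len(selector_bytes) and (selector_bytes[byte_idx] >> bit_idx) & 1:
            positions.append(i)
    return positions
```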

SunnySoldier357 and others added 6 commits May 7, 2026 13:53
Mirrors the gateway-side r3_serializer change: the per-token matrix
shape (num_moe_layers, top_k) is no longer required and is no longer
written into the r3/v1 binary header.  Per-token matrix byte size is
recovered as matrix_byte_length / replayed_token_count.

- HEADER_FORMAT: "<4sBBBBIIHHIIQ" (36 bytes) -> "<4sBBBBIIIIQ" (32 bytes).
- Drop num_moe_layers/top_k from _parse_header() and the metadata dict
  returned by decompress_and_parse_r3().
- Compute matrix_elem_size from matrix_byte_length / replayed_token_count
  with a divisibility check that surfaces malformed payloads early.
- Update unit tests to use matrix_elem_size as the parameter and drop
  assertions on the removed header fields; round-trip test no longer
  passes num_moe_layers/top_k to RouterReplayData.

Co-authored-by: Cursor <cursoragent@cursor.com>
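The new 32-byte header can be parsed roughly as below. Only the "<4sBBBBIIIIQ" layout comes from the commit message; the individual field names and their ordering within each struct group are assumptions for illustration:

```python
import struct

HEADER_FORMAT = "<4sBBBBIIIIQ"  # little-endian, 32 bytes after the change
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)


def parse_header(data: bytes) -> dict:
    """Unpack the r3/v1 header; field names here are illustrative guesses."""
    if len(data) < HEADER_SIZE:
        raise ValueError("payload shorter than r3/v1 header")
    (magic, version, routing_dtype, selector_mode, flags,
     total_token_count, replayed_token_count, replay_start_token,
     selector_byte_length, matrix_byte_length) = struct.unpack(
        HEADER_FORMAT, data[:HEADER_SIZE]
    )
    return {
        "magic": magic,
        "version": version,
        "routing_dtype": routing_dtype,
        "selector_mode": selector_mode,
        "flags": flags,
        "total_token_count": total_token_count,
        "replayed_token_count": replayed_token_count,
        "replay_start_token": replay_start_token,
        "selector_byte_length": selector_byte_length,
        "matrix_byte_length": matrix_byte_length,
    }
```

Note there are no num_moe_layers/top_k fields: the per-token matrix size is derived as matrix_byte_length / replayed_token_count instead.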
ZstdCompressor.compress() (used by the gateway-side r3_serializer)
embeds the uncompressed size in the frame header, so passing
max_output_size=len(compressed)*20 was both unnecessary and incorrect:
highly compressible router-replay payloads (e.g. tokens routing to a
small subset of experts) routinely exceed a 20:1 ratio, and would have
failed deserialization with ZstdError.

Removing the cap lets the library auto-allocate from the embedded
content size.  Verified locally: a 64 KiB zero-filled matrix payload
compresses to ~35 bytes (>1800x ratio) and now deserializes cleanly.

Adds a regression test covering the high-compression case.

Co-authored-by: Cursor <cursoragent@cursor.com>
_RoutingDtype(int) and _SelectorMode(int) raise ValueError for any
value not in the enum, so the .get() fallback was unreachable: a
future routing_dtype=3 in the header would crash metadata
construction before str(int) could run.

Look up names by raw int instead — IntEnum keys hash-equal their int
values, so known modes resolve to their lowercase name and unknown
ones fall back to str(int) without ever constructing the enum.
Adds a regression test exercising routing_dtype=99.

Co-authored-by: Cursor <cursoragent@cursor.com>
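A sketch of the lookup pattern this commit describes, with hypothetical enum members: IntEnum members hash-equal their int values, so a dict keyed by members resolves raw header ints directly, and unknown values fall back to str(int) without ever constructing the enum (which would raise ValueError).

```python
from enum import IntEnum


class _RoutingDtype(IntEnum):
    UINT8 = 0
    UINT16 = 1


# IntEnum keys hash-equal their int values, so dict.get(raw_int) hits
# the member entry for known dtypes without constructing the enum.
_DTYPE_NAMES = {member: member.name.lower() for member in _RoutingDtype}


def dtype_name(raw: int) -> str:
    # Unknown values (e.g. a future routing_dtype=3) fall back to str(raw).
    return _DTYPE_NAMES.get(raw, str(raw))
```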
decompress_and_parse_r3 now derives matrix_elem_size from
matrix_byte_length / replayed_token_count, so the dtype's per-element
byte width is no longer referenced anywhere. Removing dead code.

Co-authored-by: Cursor <cursoragent@cursor.com>
@SunnySoldier357 SunnySoldier357 force-pushed the sandeep/router-replay-support branch from 6be8056 to 6639c01 Compare May 7, 2026 21:00

@cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.


elif selector_mode == _SelectorMode.SUFFIX:
replayed_positions = list(
range(replay_start_token, replay_start_token + replayed_token_count)
)

SUFFIX mode can IndexError on out-of-bounds position

Medium Severity

In SUFFIX mode, replayed_positions is computed as range(replay_start_token, replay_start_token + replayed_token_count), but there's no validation that these positions stay within total_token_count. If the header contains replay_start_token + replayed_token_count > total_token_count, matrices[pos] raises an IndexError since matrices has length total_token_count. The ALL and BITMAP modes are naturally bounded by total_token_count, but SUFFIX is not. The caller catches Exception, so this won't crash the application, but it's inconsistent with the defensive truncation handling done for matrix data.
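A minimal sketch of the defensive clamp Bugbot suggests, reusing the variable names from the quoted snippet (the standalone function is hypothetical):

```python
def suffix_positions(
    replay_start_token: int,
    replayed_token_count: int,
    total_token_count: int,
) -> list[int]:
    # Clamp the suffix range so no position exceeds total_token_count,
    # mirroring the defensive truncation handling used for matrix data.
    end = min(replay_start_token + replayed_token_count, total_token_count)
    return list(range(replay_start_token, end))
```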

Additional Locations (1)
