feat: add R3/v1 router replay deserialization support #450

Open
SunnySoldier357 wants to merge 6 commits into main from
sandeep/router-replay-support
Conversation

@SunnySoldier357 (Collaborator) commented May 4, 2026

Summary

  • Adds r3_deserializer module that decompresses and unpacks R3/v1 binary router-replay payloads (base64-encoded, zstd-compressed) into per-token routing matrices. Supports ALL, SUFFIX, and BITMAP selector modes with uint8/uint16 dtypes.
  • Threads a new include_payloads parameter through FireworksTracingAdapter.get_evaluation_rows(), RemoteRolloutProcessor, DataLoaderConfig, and update_row_with_remote_trace() so callers can opt-in to fetching and extracting router replay data from traces.
  • When include_payloads=True, convert_trace_dict_to_evaluation_row automatically decompresses any payloads.router_replay.data blob and attaches routing_matrices and routing_metadata to execution_metadata.extra.
  • Adds zstandard>=0.19.0 as a dependency.

Test plan

  • Unit tests in tests/adapters/test_r3_deserializer.py covering:
    • Header parsing (valid, bad magic, too short, unsupported version)
    • Bitmap position reading (all set, none set, sparse, multi-byte)
    • Full decompress+parse for ALL, SUFFIX, and BITMAP selector modes
    • uint16 dtype support
    • Zero replayed tokens edge case
    • Round-trip test against the gateway's r3_serializer (skips if serializer not available)
    • Integration with convert_trace_dict_to_evaluation_row (with payload, without, empty data)

Made with Cursor


Note

Medium Risk
Adds opt-in fetching and zstd decompression/parsing of binary trace payloads, which can affect performance/memory and introduces a new dependency; failures are logged but could change returned execution_metadata.extra contents.

Overview
Adds opt-in support for pulling payloads.router_replay data from the Fireworks tracing gateway and attaching per-token routing matrices/metadata onto execution_metadata.extra when converting traces to EvaluationRow.

Introduces a new r3_deserializer that base64-decodes + zstd-decompresses the R3/v1 binary format (ALL/SUFFIX/BITMAP selectors) and threads a new include_payloads flag through FireworksTracingAdapter.get_evaluation_rows, RemoteRolloutProcessor, DataLoaderConfig, and update_row_with_remote_trace to request and propagate these payloads.

Adds zstandard as a dependency and comprehensive unit/integration tests for the deserializer and trace conversion behavior.

Reviewed by Cursor Bugbot for commit 6639c01.

@SunnySoldier357 SunnySoldier357 self-assigned this May 4, 2026

@chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e769ac1e1a


Four comment threads on eval_protocol/adapters/r3_deserializer.py (outdated)
@SunnySoldier357 SunnySoldier357 force-pushed the sandeep/router-replay-support branch from f10a6a9 to 6be8056 Compare May 6, 2026 01:52
@SunnySoldier357 SunnySoldier357 requested a review from benjibc May 6, 2026 17:44
# Build final model base URL with tracing metadata
final_model_base_url = model_base_url
if model_base_url and ("tracing.fireworks.ai" in model_base_url or model_base_url.startswith("http://localhost")):
if model_base_url and ("tracing.fireworks.ai" in model_base_url or model_base_url.startswith("http://localhost") or "litellm-gateway" in model_base_url):

Why do we need the check for tracing.fireworks.ai or litellm-gateway? Which one is it? Are there cases where it's one and not the other, and vice versa?

Comment on lines +168 to +175
if end > len(matrix_bytes):
logger.warning(
"R3 matrix data truncated at token %d (position %d): "
"expected %d bytes but only %d remaining",
idx, pos, matrix_elem_size, len(matrix_bytes) - start,
)
break
matrices[pos] = base64.b64encode(matrix_bytes[start:end]).decode("ascii")

Would it be helpful to still include the bytes that aren't a full matrix_elem_size?

# tokens share the same matrix length, so we can recover it from the
# matrix section total length divided by the replayed-token count.
if replayed_token_count > 0:
if matrix_byte_length % replayed_token_count != 0:

Earlier I asked if it would be helpful to include bytes at the end that weren't a full matrix_elem_size, but here I see that you already make sure it's a perfect multiple of replayed_token_count. So the check I commented on earlier could be removed.

# Per-token matrix byte size is implicit in the payload: all replayed
# tokens share the same matrix length, so we can recover it from the
# matrix section total length divided by the replayed-token count.
if replayed_token_count > 0:

If replayed_token_count is 0, then I think we should just early return and remove some of the checks further down.
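A minimal sketch combining the divisibility check from the quoted snippet with the early return suggested here; the function name is hypothetical:

```python
def derive_matrix_elem_size(matrix_byte_length: int, replayed_token_count: int) -> int:
    """Recover the implicit per-token matrix byte size.

    All replayed tokens share one matrix length, so it is the matrix
    section's total length divided by the replayed-token count.
    """
    # Early return for the zero-token edge case, so later checks can go.
    if replayed_token_count == 0:
        return 0
    # Surface malformed payloads early: the section length must divide evenly.
    if matrix_byte_length % replayed_token_count != 0:
        raise ValueError(
            "malformed R3 payload: matrix section length is not a multiple "
            "of the replayed-token count"
        )
    return matrix_byte_length // replayed_token_count
```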

Comment on lines +88 to +90
byte_idx = i >> 3
bit_idx = i & 7
if byte_idx < len(selector_bytes) and (selector_bytes[byte_idx] >> bit_idx) & 1:

Where does 7 and 3 come from?
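For context, the shifts in the snippet are bitwise forms of divmod by 8: `i >> 3` is `i // 8` (the byte index) and `i & 7` is `i % 8` (the bit within that byte). A sketch of the bitmap read, assuming selector_bytes packs one replay flag per token position, LSB-first, and a hypothetical function name:

```python
def read_set_positions(selector_bytes: bytes, total_token_count: int) -> list[int]:
    """Return the token positions whose bit is set in the bitmap selector."""
    positions = []
    for i in range(total_token_count):
        byte_idx = i >> 3  # i // 8: which byte holds token i's flag
        bit_idx = i & 7    # i % 8: which bit within that byte, LSB-first
        if byte_idx < len(selector_bytes) and (selector_bytes[byte_idx] >> bit_idx) & 1:
            positions.append(i)
    return positions
```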

SunnySoldier357 and others added 6 commits May 7, 2026 13:53
Mirrors the gateway-side r3_serializer change: the per-token matrix
shape (num_moe_layers, top_k) is no longer required and is no longer
written into the r3/v1 binary header.  Per-token matrix byte size is
recovered as matrix_byte_length / replayed_token_count.

- HEADER_FORMAT: "<4sBBBBIIHHIIQ" (36 bytes) -> "<4sBBBBIIIIQ" (32 bytes).
- Drop num_moe_layers/top_k from _parse_header() and the metadata dict
  returned by decompress_and_parse_r3().
- Compute matrix_elem_size from matrix_byte_length / replayed_token_count
  with a divisibility check that surfaces malformed payloads early.
- Update unit tests to use matrix_elem_size as the parameter and drop
  assertions on the removed header fields; round-trip test no longer
  passes num_moe_layers/top_k to RouterReplayData.

Co-authored-by: Cursor <cursoragent@cursor.com>
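The new 32-byte header can be parsed roughly as below. Only the "<4sBBBBIIIIQ" layout comes from the commit message; the individual field names and their ordering within each struct group are assumptions for illustration:

```python
import struct

HEADER_FORMAT = "<4sBBBBIIIIQ"  # little-endian, 32 bytes after the change
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)


def parse_header(data: bytes) -> dict:
    """Unpack the r3/v1 header; field names here are illustrative guesses."""
    if len(data) < HEADER_SIZE:
        raise ValueError("payload shorter than r3/v1 header")
    (magic, version, routing_dtype, selector_mode, flags,
     total_token_count, replayed_token_count, replay_start_token,
     selector_byte_length, matrix_byte_length) = struct.unpack(
        HEADER_FORMAT, data[:HEADER_SIZE]
    )
    return {
        "magic": magic,
        "version": version,
        "routing_dtype": routing_dtype,
        "selector_mode": selector_mode,
        "flags": flags,
        "total_token_count": total_token_count,
        "replayed_token_count": replayed_token_count,
        "replay_start_token": replay_start_token,
        "selector_byte_length": selector_byte_length,
        "matrix_byte_length": matrix_byte_length,
    }
```

Note there are no num_moe_layers/top_k fields: the per-token matrix size is derived as matrix_byte_length / replayed_token_count instead.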
ZstdCompressor.compress() (used by the gateway-side r3_serializer)
embeds the uncompressed size in the frame header, so passing
max_output_size=len(compressed)*20 was both unnecessary and incorrect:
highly compressible router-replay payloads (e.g. tokens routing to a
small subset of experts) routinely exceed a 20:1 ratio, and would have
failed deserialization with ZstdError.

Removing the cap lets the library auto-allocate from the embedded
content size.  Verified locally: a 64 KiB zero-filled matrix payload
compresses to ~35 bytes (>1800x ratio) and now deserializes cleanly.

Adds a regression test covering the high-compression case.

Co-authored-by: Cursor <cursoragent@cursor.com>
_RoutingDtype(int) and _SelectorMode(int) raise ValueError for any
value not in the enum, so the .get() fallback was unreachable: a
future routing_dtype=3 in the header would crash metadata
construction before str(int) could run.

Look up names by raw int instead — IntEnum keys hash-equal their int
values, so known modes resolve to their lowercase name and unknown
ones fall back to str(int) without ever constructing the enum.
Adds a regression test exercising routing_dtype=99.

Co-authored-by: Cursor <cursoragent@cursor.com>
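A sketch of the lookup pattern this commit describes, with hypothetical enum members: IntEnum members hash-equal their int values, so a dict keyed by members resolves raw header ints directly, and unknown values fall back to str(int) without ever constructing the enum (which would raise ValueError).

```python
from enum import IntEnum


class _RoutingDtype(IntEnum):
    UINT8 = 0
    UINT16 = 1


# IntEnum keys hash-equal their int values, so dict.get(raw_int) hits
# the member entry for known dtypes without constructing the enum.
_DTYPE_NAMES = {member: member.name.lower() for member in _RoutingDtype}


def dtype_name(raw: int) -> str:
    # Unknown values (e.g. a future routing_dtype=3) fall back to str(raw).
    return _DTYPE_NAMES.get(raw, str(raw))
```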
decompress_and_parse_r3 now derives matrix_elem_size from
matrix_byte_length / replayed_token_count, so the dtype's per-element
byte width is no longer referenced anywhere. Removing dead code.

Co-authored-by: Cursor <cursoragent@cursor.com>
@SunnySoldier357 SunnySoldier357 force-pushed the sandeep/router-replay-support branch from 6be8056 to 6639c01 Compare May 7, 2026 21:00

@cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.


elif selector_mode == _SelectorMode.SUFFIX:
replayed_positions = list(
range(replay_start_token, replay_start_token + replayed_token_count)
)

SUFFIX mode can IndexError on out-of-bounds position

Medium Severity

In SUFFIX mode, replayed_positions is computed as range(replay_start_token, replay_start_token + replayed_token_count), but there's no validation that these positions stay within total_token_count. If the header contains replay_start_token + replayed_token_count > total_token_count, matrices[pos] raises an IndexError since matrices has length total_token_count. The ALL and BITMAP modes are naturally bounded by total_token_count, but SUFFIX is not. The caller catches Exception, so this won't crash the application, but it's inconsistent with the defensive truncation handling done for matrix data.
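A minimal sketch of the defensive clamp Bugbot suggests, reusing the variable names from the quoted snippet (the standalone function is hypothetical):

```python
def suffix_positions(
    replay_start_token: int,
    replayed_token_count: int,
    total_token_count: int,
) -> list[int]:
    # Clamp the suffix range so no position exceeds total_token_count,
    # mirroring the defensive truncation handling used for matrix data.
    end = min(replay_start_token + replayed_token_count, total_token_count)
    return list(range(replay_start_token, end))
```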

Additional Locations (1)
