feat: add R3/v1 router replay deserialization support #450
SunnySoldier357 wants to merge 6 commits into main
Conversation
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e769ac1e1a
Force-pushed f10a6a9 to 6be8056
```diff
  # Build final model base URL with tracing metadata
  final_model_base_url = model_base_url
- if model_base_url and ("tracing.fireworks.ai" in model_base_url or model_base_url.startswith("http://localhost")):
+ if model_base_url and ("tracing.fireworks.ai" in model_base_url or model_base_url.startswith("http://localhost") or "litellm-gateway" in model_base_url):
```
Why do we need the check for tracing.fireworks.ai or litellm-gateway? Which one is it? Are there cases where it's one and not the other, and vice versa?
```python
if end > len(matrix_bytes):
    logger.warning(
        "R3 matrix data truncated at token %d (position %d): "
        "expected %d bytes but only %d remaining",
        idx, pos, matrix_elem_size, len(matrix_bytes) - start,
    )
    break
matrices[pos] = base64.b64encode(matrix_bytes[start:end]).decode("ascii")
```
Would it be helpful to still include the bytes that aren't a full matrix_elem_size?
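The slicing loop under discussion can be exercised standalone. This is a hedged sketch, not the module's actual code: `slice_matrices` is a hypothetical name, and the truncation guard mirrors the hunk above (stop rather than emit a partial matrix).

```python
import base64

def slice_matrices(matrix_bytes: bytes, positions: list[int],
                   matrix_elem_size: int, total_token_count: int) -> list:
    # One slot per token; positions not replayed stay None.
    matrices = [None] * total_token_count
    for idx, pos in enumerate(positions):
        start = idx * matrix_elem_size
        end = start + matrix_elem_size
        if end > len(matrix_bytes):
            break  # truncated payload: skip the partial tail entirely
        matrices[pos] = base64.b64encode(matrix_bytes[start:end]).decode("ascii")
    return matrices
```

With a 3-byte buffer and a 2-byte element size, the second token is truncated and its slot stays `None`, which is the behavior the comment above is questioning.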
```python
# tokens share the same matrix length, so we can recover it from the
# matrix section total length divided by the replayed-token count.
if replayed_token_count > 0:
    if matrix_byte_length % replayed_token_count != 0:
```
Earlier I asked if it would be helpful to include bytes at the end that weren't a full matrix_elem_size, but here I see that you already make sure matrix_byte_length is a perfect multiple of replayed_token_count. So the check I commented on earlier could be removed.
```python
# Per-token matrix byte size is implicit in the payload: all replayed
# tokens share the same matrix length, so we can recover it from the
# matrix section total length divided by the replayed-token count.
if replayed_token_count > 0:
```
If replayed_token_count is 0, then I think we should just early return and remove some of the checks further down.
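The derivation plus the suggested early return can be sketched as a small helper. This is illustrative only: `derive_matrix_elem_size` is a hypothetical name, not something in the PR.

```python
def derive_matrix_elem_size(matrix_byte_length: int, replayed_token_count: int):
    # Early return for the zero-token case, as suggested above:
    # no replayed tokens means there is no per-token size to recover.
    if replayed_token_count == 0:
        return None
    # Divisibility check: a nonzero remainder means the payload is malformed,
    # so surface it early instead of silently truncating.
    if matrix_byte_length % replayed_token_count != 0:
        raise ValueError(
            "matrix section length is not a multiple of the replayed-token count"
        )
    return matrix_byte_length // replayed_token_count
```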
```python
byte_idx = i >> 3
bit_idx = i & 7
if byte_idx < len(selector_bytes) and (selector_bytes[byte_idx] >> bit_idx) & 1:
```
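The bit arithmetic in this hunk can be sketched as a standalone helper (the function name is illustrative): bit `i` of `selector_bytes`, stored LSB-first within each byte, marks whether token position `i` was replayed.

```python
def bitmap_positions(selector_bytes: bytes, total_token_count: int) -> list[int]:
    positions = []
    for i in range(total_token_count):
        byte_idx = i >> 3  # which byte holds bit i
        bit_idx = i & 7    # bit offset within that byte (LSB-first)
        if byte_idx < len(selector_bytes) and (selector_bytes[byte_idx] >> bit_idx) & 1:
            positions.append(i)
    return positions

# Tokens 0 and 3 set in the first byte (0b00001001), token 9 in the second
# (bit 1 of byte 1 maps to position 8 + 1).
print(bitmap_positions(bytes([0x09, 0x02]), 12))  # → [0, 3, 9]
```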
Mirrors the gateway-side r3_serializer change: the per-token matrix shape (num_moe_layers, top_k) is no longer required and is no longer written into the r3/v1 binary header. Per-token matrix byte size is recovered as matrix_byte_length / replayed_token_count.

- HEADER_FORMAT: "<4sBBBBIIHHIIQ" (36 bytes) -> "<4sBBBBIIIIQ" (32 bytes).
- Drop num_moe_layers/top_k from _parse_header() and the metadata dict returned by decompress_and_parse_r3().
- Compute matrix_elem_size from matrix_byte_length / replayed_token_count with a divisibility check that surfaces malformed payloads early.
- Update unit tests to use matrix_elem_size as the parameter and drop assertions on the removed header fields; the round-trip test no longer passes num_moe_layers/top_k to RouterReplayData.

Co-authored-by: Cursor <cursoragent@cursor.com>
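A hedged sketch of unpacking the new 32-byte header: the `"<4sBBBBIIIIQ"` format string comes from the commit message, but the field names (and the magic value in the example) are guesses, not the module's actual identifiers.

```python
import struct

HEADER_FORMAT = "<4sBBBBIIIIQ"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 32 bytes after the change

def parse_header(blob: bytes) -> dict:
    # Field names are assumptions based on the values the commit says
    # the header carries; only the struct layout is from the source.
    (magic, version, routing_dtype, selector_mode, flags,
     total_token_count, replayed_token_count, replay_start_token,
     matrix_byte_length, reserved) = struct.unpack_from(HEADER_FORMAT, blob, 0)
    return {
        "magic": magic,
        "version": version,
        "total_token_count": total_token_count,
        "replayed_token_count": replayed_token_count,
        "replay_start_token": replay_start_token,
        "matrix_byte_length": matrix_byte_length,
    }

# Round-trip a synthetic header (all values illustrative).
blob = struct.pack(HEADER_FORMAT, b"R3v1", 1, 1, 0, 0, 128, 16, 112, 256, 0)
parsed = parse_header(blob)
```

Note that `matrix_elem_size` is then `parsed["matrix_byte_length"] // parsed["replayed_token_count"]`, per the divisibility-check bullet above.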
ZstdCompressor.compress() (used by the gateway-side r3_serializer) embeds the uncompressed size in the frame header, so passing max_output_size=len(compressed)*20 was both unnecessary and incorrect: highly compressible router-replay payloads (e.g. tokens routing to a small subset of experts) routinely exceed a 20:1 ratio and would have failed deserialization with ZstdError. Removing the cap lets the library auto-allocate from the embedded content size.

Verified locally: a 64 KiB zero-filled matrix payload compresses to ~35 bytes (>1800x ratio) and now deserializes cleanly. Adds a regression test covering the high-compression case.

Co-authored-by: Cursor <cursoragent@cursor.com>
_RoutingDtype(int) and _SelectorMode(int) raise ValueError for any value not in the enum, so the .get() fallback was unreachable: a future routing_dtype=3 in the header would crash metadata construction before str(int) could run.

Look up names by raw int instead: IntEnum keys hash-equal their int values, so known modes resolve to their lowercase name and unknown ones fall back to str(int) without ever constructing the enum. Adds a regression test exercising routing_dtype=99.

Co-authored-by: Cursor <cursoragent@cursor.com>
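The dict-based lookup the commit describes can be sketched like this. The enum members and their values are illustrative (the source only names uint8/uint16), but the mechanism is exactly the one stated: IntEnum members hash-equal their int values, so a plain dict resolves raw header ints without constructing the enum.

```python
from enum import IntEnum

class _RoutingDtype(IntEnum):  # values here are assumptions, not the module's
    UINT8 = 1
    UINT16 = 2

# Keys are enum members, but lookups with raw ints hit them anyway,
# because IntEnum members compare and hash equal to their int values.
_DTYPE_NAMES = {member: member.name.lower() for member in _RoutingDtype}

def dtype_name(raw: int) -> str:
    # Unknown values (e.g. a future routing_dtype=99) fall back to str(int)
    # instead of raising ValueError the way _RoutingDtype(raw) would.
    return _DTYPE_NAMES.get(raw, str(raw))
```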
decompress_and_parse_r3 now derives matrix_elem_size from matrix_byte_length / replayed_token_count, so the dtype's per-element byte width is no longer referenced anywhere. Removing dead code.

Co-authored-by: Cursor <cursoragent@cursor.com>
Force-pushed 6be8056 to 6639c01
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit 6639c01.
```python
elif selector_mode == _SelectorMode.SUFFIX:
    replayed_positions = list(
        range(replay_start_token, replay_start_token + replayed_token_count)
    )
```
SUFFIX mode can IndexError on out-of-bounds position
Medium Severity
In SUFFIX mode, replayed_positions is computed as range(replay_start_token, replay_start_token + replayed_token_count), but there's no validation that these positions stay within total_token_count. If the header contains replay_start_token + replayed_token_count > total_token_count, matrices[pos] raises an IndexError since matrices has length total_token_count. The ALL and BITMAP modes are naturally bounded by total_token_count, but SUFFIX is not. The caller catches Exception, so this won't crash the application, but it's inconsistent with the defensive truncation handling done for matrix data.
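A minimal sketch of the bounded variant Bugbot is asking for, with hypothetical names: clamp the suffix range to total_token_count so indexing into matrices can never go out of bounds.

```python
def suffix_positions(replay_start_token: int, replayed_token_count: int,
                     total_token_count: int) -> list[int]:
    # Clamp the end of the range so a malformed header with
    # replay_start_token + replayed_token_count > total_token_count
    # yields a shorter (but in-bounds) list instead of an IndexError later.
    end = min(replay_start_token + replayed_token_count, total_token_count)
    return list(range(replay_start_token, end))

# Header claims 5 replayed tokens starting at 3, but only 6 tokens exist.
print(suffix_positions(3, 5, 6))  # → [3, 4, 5]
```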


Summary

- New `r3_deserializer` module that decompresses and unpacks R3/v1 binary router-replay payloads (base64-encoded, zstd-compressed) into per-token routing matrices. Supports ALL, SUFFIX, and BITMAP selector modes with uint8/uint16 dtypes.
- Threads an `include_payloads` parameter through `FireworksTracingAdapter.get_evaluation_rows()`, `RemoteRolloutProcessor`, `DataLoaderConfig`, and `update_row_with_remote_trace()` so callers can opt in to fetching and extracting router replay data from traces.
- With `include_payloads=True`, `convert_trace_dict_to_evaluation_row` automatically decompresses any `payloads.router_replay.data` blob and attaches `routing_matrices` and `routing_metadata` to `execution_metadata.extra`.
- Adds `zstandard>=0.19.0` as a dependency.

Test plan

- `tests/adapters/test_r3_deserializer.py` covering:
  - round-trip with `r3_serializer` (skips if the serializer is not available)
  - `convert_trace_dict_to_evaluation_row` (with payload, without, empty data)

Made with Cursor
Note

Medium Risk

Adds opt-in fetching and zstd decompression/parsing of binary trace payloads, which can affect performance/memory and introduces a new dependency; failures are logged but could change the returned `execution_metadata.extra` contents.

Overview

Adds opt-in support for pulling `payloads.router_replay` data from the Fireworks tracing gateway and attaching per-token routing matrices/metadata onto `execution_metadata.extra` when converting traces to `EvaluationRow`.

Introduces a new `r3_deserializer` that base64-decodes and zstd-decompresses the R3/v1 binary format (ALL/SUFFIX/BITMAP selectors) and threads a new `include_payloads` flag through `FireworksTracingAdapter.get_evaluation_rows`, `RemoteRolloutProcessor`, `DataLoaderConfig`, and `update_row_with_remote_trace` to request and propagate these payloads.

Adds `zstandard` as a dependency and comprehensive unit/integration tests for the deserializer and trace conversion behavior.

Reviewed by Cursor Bugbot for commit 6639c01.