SGLang may load models in float16 (e.g. MiniMax-M2.5) while training runs in bfloat16. Without an explicit cast, float16 bytes are stored and later reinterpreted as bfloat16, silently corrupting training data. Introduce HIDDEN_STATES_STORAGE_DTYPE as a single source of truth and cast hidden_states/last_hidden_states/target in EagleMooncakeStore.put() so that both the SGLang and vLLM paths are covered.
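The failure mode described above can be reproduced in pure Python, with no torch dependency: float16 and bfloat16 are both 16-bit formats, so the same raw bytes decode to very different values depending on which dtype the reader assumes. This is a minimal sketch of the corruption, not the store's actual code path:

```python
import struct

def f16_bits(x: float) -> int:
    """Encode a float as IEEE float16 and return its 16 raw bits."""
    return struct.unpack("<H", struct.pack("<e", x))[0]

def decode_bf16(bits: int) -> float:
    """Decode 16 bits as bfloat16 (bfloat16 is the top half of a float32)."""
    return struct.unpack("<f", struct.pack("<I", bits << 16))[0]

stored = f16_bits(1.0)           # producer wrote float16 bytes: 0x3c00
misread = decode_bf16(stored)    # consumer decoded the same bytes as bfloat16
print(hex(stored), misread)      # 0x3c00 0.0078125 -- silent corruption
```

A stored 1.0 comes back as 0.0078125: no exception is raised, which is why the bug corrupts training data silently rather than failing loudly.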
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: faa46ef742
Pull request overview
This PR standardizes the dtype used for Eagle/Mooncake hidden-state storage to address dtype mismatches (“minimax type mismatch”) by introducing a canonical storage dtype and applying it across Mooncake put/get metadata in inference engines.
Changes:
- Introduces `HIDDEN_STATES_STORAGE_DTYPE` (the canonical hidden-state storage dtype) and casts tensors to it in `EagleMooncakeStore.put()`.
- Updates `EagleMooncakeStore.get()` defaults to use the canonical dtype.
- Updates the SGL and vLLM engines' Mooncake metadata dtypes to reference the canonical dtype.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| `torchspec/transfer/mooncake/eagle_store.py` | Adds the canonical dtype constant; casts tensors before writing; uses the canonical dtype as the default for reads. |
| `torchspec/inference/engine/vllm_engine.py` | Uses the canonical dtype constant when reporting Mooncake tensor dtypes. |
| `torchspec/inference/engine/sgl_engine.py` | Uses the canonical dtype constant when reporting Mooncake tensor dtypes. |
The previous commit casts hidden states to bfloat16 inside EagleMooncakeStore.put(), but the vLLM worker extension and HF runner still reported the original pre-cast dtype in their metadata dicts. Since the training-side data fetcher trusts that metadata to decode Mooncake bytes, the mismatch would silently corrupt reads. Both emitters now report HIDDEN_STATES_STORAGE_DTYPE so metadata and stored bytes agree.
Make put() the single source of truth for both shapes and dtypes by
returning {"shapes": ..., "dtypes": ...} from the post-cast tensors.
Callers now use the store's return value instead of reading dtypes from
their own pre-cast local variables.
This eliminates the class of bugs where a producer emits metadata with
the wrong dtype because put() silently cast under the hood.
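The "metadata derived from post-cast tensors" pattern in this commit can be sketched in plain Python. Real tensors are replaced here with hypothetical `(shape, dtype)` stand-ins, and the Mooncake write is elided; only the invariant matters: `put()` casts first, then builds the `{"shapes": ..., "dtypes": ...}` dict from what it actually stored, so callers can never report a stale pre-cast dtype.

```python
# Single source of truth for the hidden-state storage dtype.
HIDDEN_STATES_STORAGE_DTYPE = "bfloat16"

def put(tensors: dict) -> dict:
    """Cast every tensor to the storage dtype, then derive metadata
    from the post-cast tensors so stored bytes and metadata agree."""
    cast = {name: (shape, HIDDEN_STATES_STORAGE_DTYPE)
            for name, (shape, _dtype) in tensors.items()}
    # ... write `cast` to the backing store here (elided) ...
    return {
        "shapes": {name: shape for name, (shape, _) in cast.items()},
        "dtypes": {name: dtype for name, (_, dtype) in cast.items()},
    }

# A float16 producer still yields bfloat16 metadata, because the
# metadata comes from the store's return value, not a local variable.
meta = put({"hidden_states": ((8, 4096), "float16")})
print(meta["dtypes"])  # {'hidden_states': 'bfloat16'}
```

Centralizing both the cast and the metadata in one function is what removes the bug class: a producer that consulted its own pre-cast locals could disagree with the bytes on disk, but a producer that echoes `put()`'s return value cannot.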
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.
Fix minimax type mismatch