SGLang may load models in float16 (e.g. MiniMax-M2.5) while training runs in bfloat16. Without an explicit cast, float16 bytes are stored and later reinterpreted as bfloat16, silently corrupting training data. Introduce HIDDEN_STATES_STORAGE_DTYPE as a single source of truth and cast hidden_states/last_hidden_states/target in EagleMooncakeStore.put() so that both the SGLang and vLLM paths are covered.
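The failure mode described above can be reproduced in pure Python, with no torch dependency: float16 and bfloat16 are both 16-bit formats, so the same raw bytes decode to very different values depending on which dtype the reader assumes. This is a minimal sketch of the corruption, not the store's actual code path:

```python
import struct

def f16_bits(x: float) -> int:
    """Encode a float as IEEE float16 and return its 16 raw bits."""
    return struct.unpack("<H", struct.pack("<e", x))[0]

def decode_bf16(bits: int) -> float:
    """Decode 16 bits as bfloat16 (bfloat16 is the top half of a float32)."""
    return struct.unpack("<f", struct.pack("<I", bits << 16))[0]

stored = f16_bits(1.0)           # producer wrote float16 bytes: 0x3c00
misread = decode_bf16(stored)    # consumer decoded the same bytes as bfloat16
print(hex(stored), misread)      # 0x3c00 0.0078125 -- silent corruption
```

A stored 1.0 comes back as 0.0078125: no exception is raised, which is why the bug corrupts training data silently rather than failing loudly.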
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: faa46ef742
Pull request overview
This PR standardizes the dtype used for Eagle/Mooncake hidden-state storage to address dtype mismatches (“minimax type mismatch”) by introducing a canonical storage dtype and applying it across Mooncake put/get metadata in inference engines.
Changes:
- Introduces `HIDDEN_STATES_STORAGE_DTYPE` (the canonical hidden-state storage dtype) and casts tensors to it in `EagleMooncakeStore.put()`.
- Updates `EagleMooncakeStore.get()` defaults to use the canonical dtype.
- Updates the SGL and vLLM engines' Mooncake metadata dtypes to reference the canonical dtype.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| `torchspec/transfer/mooncake/eagle_store.py` | Adds the canonical dtype constant; casts tensors before writing; uses the canonical dtype as the default for reads. |
| `torchspec/inference/engine/vllm_engine.py` | Uses the canonical dtype constant when reporting Mooncake tensor dtypes. |
| `torchspec/inference/engine/sgl_engine.py` | Uses the canonical dtype constant when reporting Mooncake tensor dtypes. |
The previous commit casts hidden states to bfloat16 inside EagleMooncakeStore.put(), but the vLLM worker extension and HF runner still reported the original pre-cast dtype in their metadata dicts. Since the training-side data fetcher trusts that metadata to decode Mooncake bytes, the mismatch would silently corrupt reads. Both emitters now report HIDDEN_STATES_STORAGE_DTYPE so metadata and stored bytes agree.
Make put() the single source of truth for both shapes and dtypes by
returning {"shapes": ..., "dtypes": ...} from the post-cast tensors.
Callers now use the store's return value instead of reading dtypes from
their own pre-cast local variables.
This eliminates the class of bugs where a producer emits metadata with
the wrong dtype because put() silently cast under the hood.
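The "metadata derived from post-cast tensors" pattern in this commit can be sketched in plain Python. Real tensors are replaced here with hypothetical `(shape, dtype)` stand-ins, and the Mooncake write is elided; only the invariant matters: `put()` casts first, then builds the `{"shapes": ..., "dtypes": ...}` dict from what it actually stored, so callers can never report a stale pre-cast dtype.

```python
# Single source of truth for the hidden-state storage dtype.
HIDDEN_STATES_STORAGE_DTYPE = "bfloat16"

def put(tensors: dict) -> dict:
    """Cast every tensor to the storage dtype, then derive metadata
    from the post-cast tensors so stored bytes and metadata agree."""
    cast = {name: (shape, HIDDEN_STATES_STORAGE_DTYPE)
            for name, (shape, _dtype) in tensors.items()}
    # ... write `cast` to the backing store here (elided) ...
    return {
        "shapes": {name: shape for name, (shape, _) in cast.items()},
        "dtypes": {name: dtype for name, (_, dtype) in cast.items()},
    }

# A float16 producer still yields bfloat16 metadata, because the
# metadata comes from the store's return value, not a local variable.
meta = put({"hidden_states": ((8, 4096), "float16")})
print(meta["dtypes"])  # {'hidden_states': 'bfloat16'}
```

Centralizing both the cast and the metadata in one function is what removes the bug class: a producer that consulted its own pre-cast locals could disagree with the bytes on disk, but a producer that echoes `put()`'s return value cannot.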
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.
Fix minimax type mismatch