[Feature] Add MiniMax support and Kimi parser updates (#45)
Conversation
Enable Minimax chat formatting, parsing, draft-model config, and parser-focused tests so M2.5 data and training flows work end to end. Include the related Kimi parser/template coverage updates and align the Kimi Eagle3 draft config with the intended KV-head setting, while keeping checkpoint export dtype control plus runtime env passthrough for FP8-compatible serving.
…ning Add dataset.shuffle_dataset (default True) so users can disable automatic dataset shuffling when the training data is intentionally ordered (e.g. curriculum learning, staged difficulty). The flag is threaded through both the offline preprocessing path and the online training-controller epoch reload.
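The described behavior can be sketched as a small helper that only shuffles when the flag is set. This is a hypothetical illustration of the threading described above, not the actual torchspec code; the function name and signature are assumptions:

```python
import random

def prepare_dataset(samples, shuffle_dataset=True, seed=None):
    """Optionally shuffle training samples.

    Hypothetical sketch mirroring the shuffle_dataset flag: when the flag is
    False, the intentional ordering (e.g. curriculum) is preserved verbatim.
    """
    out = list(samples)
    if shuffle_dataset:
        random.Random(seed).shuffle(out)  # deterministic when a seed is given
    return out
```

Both the offline preprocessing path and the per-epoch reload in the training controller would call through a helper like this, so one config flag controls both.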
Test-only concern: the HF model paths were only used by test_loss_mask_cross_validation to load tokenizers for validation. Keep ChatTemplate focused on chat format metadata.
Pull request overview
Adds end-to-end support for MiniMax-M2.5 chat formatting/parsing alongside Kimi-K2.5 parser/template updates, enabling Minimax/Kimi data flows to work through preprocessing, training, and conversion/serving utilities.
Changes:
- Introduce the `minimax-m2` chat template + `MiniMaxParser`, with comprehensive unit tests and multimodal/tool-call handling.
- Extend Kimi-K2.5 formatting to support `expand_media_tokens=False` passthrough behavior and add focused tests.
- Add dataset shuffle control (`shuffle_dataset`) and `--dtype` output casting support to the HF conversion tool (plus env passthrough for SGLang VLM cache sizing).
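If the flag is exposed via `DatasetConfig` as described, disabling shuffling might look like this in YAML (field names assumed from the PR summary, not verified against `train_config.py`):

```yaml
dataset:
  shuffle_dataset: false   # preserve intentional sample ordering (e.g. curriculum)
```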
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| torchspec/utils/env.py | Forward SGLANG_VLM_CACHE_SIZE_MB to Ray actors for serving/runtime configuration. |
| torchspec/data/template.py | Add reference_model metadata to templates and register the new minimax-m2 template. |
| torchspec/data/preprocessing.py | Make dataset shuffling conditional on shuffle_seed being set. |
| torchspec/data/parse.py | Add MiniMaxParser; update thinking detection; add media-token passthrough option to Kimi parser. |
| torchspec/controller/training_controller.py | Add optional deterministic shuffling toggle via shuffle_dataset and rename dataset prep helper. |
| torchspec/config/train_config.py | Add shuffle_dataset to DatasetConfig so it can be configured via YAML. |
| tools/convert_to_hf.py | Add --dtype to control output weight dtype during HF conversion. |
| tests/test_minimax_parser.py | New unit tests covering MiniMax formatting/parsing, tools, thinking, multimodal, truncation, passthrough. |
| tests/test_kimi_k25_parser.py | Add tests for expand_media_tokens=False; remove real-tokenizer integration tests. |
| configs/draft_models/minimax_m25_eagle3.json | Add draft-model config for MiniMax M2.5 Eagle3. |
| configs/draft_models/kimi_k25_eagle3.json | Update Kimi K2.5 Eagle3 KV-head setting. |
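The env passthrough row above (`torchspec/utils/env.py`) describes forwarding `SGLANG_VLM_CACHE_SIZE_MB` to Ray actors. A minimal sketch of that pattern, assuming a helper that builds a Ray `runtime_env` dict (the helper name and the set of forwarded variables are assumptions; only the env var name comes from the file table):

```python
import os

# Env var name taken from the PR file table; the forwarding helper is a sketch.
FORWARDED_ENV_VARS = ("SGLANG_VLM_CACHE_SIZE_MB",)

def ray_runtime_env():
    """Build a Ray runtime_env dict that forwards selected host env vars to actors."""
    env_vars = {k: os.environ[k] for k in FORWARDED_ENV_VARS if k in os.environ}
    return {"env_vars": env_vars}
```

The resulting dict would be passed as `runtime_env=...` when creating actors, so serving-side cache sizing set on the driver host reaches the workers.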
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 1c63185234
json.loads on string arguments crashed on malformed or plain-text payloads, aborting formatting for the whole job. Catch JSONDecodeError and preserve the raw string as a fallback.
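The suggested fix can be sketched as follows; the function name is hypothetical, but the pattern (catch `json.JSONDecodeError`, fall back to the raw string) matches the comment above:

```python
import json

def parse_tool_arguments(raw):
    """Parse tool-call arguments, preserving plain-text payloads on malformed JSON.

    Hypothetical sketch of the fallback described in the review comment, so a
    single bad payload no longer aborts formatting for the whole job.
    """
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return raw  # keep the raw string rather than crashing
```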
…-prune-vocab The prune-vocab path wrote raw_config from disk without updating torch_dtype, causing exported weights and config metadata to diverge when --dtype was specified. Update the torch_dtype field before writing the config so the metadata matches the exported weights.
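A minimal sketch of keeping the config in sync, assuming the conversion tool serializes a plain dict to `config.json` (the helper name and signature are assumptions; the actual `convert_to_hf.py` code may differ):

```python
import json
import os

def write_hf_config(raw_config, out_dir, dtype_name=None):
    """Write config.json, syncing torch_dtype with any --dtype override.

    Hypothetical sketch: when --dtype is given, overwrite the torch_dtype field
    read from disk so config metadata matches the exported weight dtype.
    """
    cfg = dict(raw_config)
    if dtype_name is not None:
        cfg["torch_dtype"] = dtype_name  # keep metadata consistent with the cast weights
    path = os.path.join(out_dir, "config.json")
    with open(path, "w") as f:
        json.dump(cfg, f, indent=2)
    return path
```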