feat: add Xiaomi MiMo speech support by xyuai · Pull Request #2560 · Hmbown/CodeWhale

xyuai · 2026-06-02T01:02:30Z

Summary

Add Xiaomi MiMo speech support and configuration wiring.
Register the speech tool through TUI/agent setup paths.
Update provider docs and example configuration.
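Registering one tool under two model-visible names can be sketched as a map whose entries share a single instance. This is a minimal sketch under assumptions. `ToolRegistry`, its `HashMap` storage and this trimmed `SpeechTool` are hypothetical stand-ins. Only the `with_speech_tools` name and the `speech`/`tts` aliases come from the review notes.

```rust
use std::collections::HashMap;
use std::path::PathBuf;
use std::sync::Arc;

/// Hypothetical stand-in for the PR's SpeechTool; only the configured output dir is modeled.
#[derive(Debug)]
pub struct SpeechTool {
    pub output_dir: Option<PathBuf>,
}

/// Hypothetical registry keyed by the tool name the model sees.
#[derive(Default)]
pub struct ToolRegistry {
    tools: HashMap<String, Arc<SpeechTool>>,
}

impl ToolRegistry {
    /// Registers one shared SpeechTool under both the `speech` and `tts` names.
    pub fn with_speech_tools(mut self, output_dir: Option<PathBuf>) -> Self {
        let tool = Arc::new(SpeechTool { output_dir });
        for name in ["speech", "tts"] {
            self.tools.insert(name.to_string(), Arc::clone(&tool));
        }
        self
    }

    /// Looks up a tool by its model-visible name.
    pub fn get(&self, name: &str) -> Option<&Arc<SpeechTool>> {
        self.tools.get(name)
    }
}
```

Sharing one `Arc` means both aliases see the same configured output directory. That is the property the later subagent fixes in this PR are concerned with.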

Validation

cargo fmt --check
cargo check -p codewhale-config -p codewhale-agent -p codewhale-cli -p codewhale-tui

Greptile Summary

This PR adds end-to-end Xiaomi MiMo TTS support: a synthesize_speech client method, a SpeechTool registered under both the speech and tts names, a codewhale speech CLI command, and the config and model-registry wiring needed to use them in both the interactive TUI and subagent contexts. The three issues raised in the previous review round have all been addressed and are covered by new tests:

- an empty user message was sent when no instruction was given;
- chat-only models appeared in the model-visible list;
- speech_output_dir was not forwarded to subagents.

crates/tui/src/client.rs – adds SpeechSynthesisRequest/Response and synthesize_speech, which POSTs to chat/completions with the text to speak in an assistant message; the optional style instruction is added as a user message only when it is non-empty.
crates/tui/src/tools/speech.rs – new model-visible SpeechTool with model inference (tts / voice-design / voice-clone), voice-clone data-URI encoding, network-policy checks, and workspace-bounded output-path resolution.
Config plumbing (SpeechConfig, [speech].output_dir, env-var overrides, EngineConfig::speech_output_dir, SubAgentRuntime::speech_output_dir) ensures the configured output directory is consistently inherited across the main engine and all subagent spawn paths.
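The request-body rule described for synthesize_speech can be sketched without a JSON library. This is a minimal sketch with some assumptions:

- `ChatMessage` is a hypothetical type standing in for the serialized chat/completions body.
- `build_speech_messages` is a hypothetical helper.
- Trimming whitespace and placing the user message first are not details confirmed by the PR.

```rust
/// Hypothetical stand-in for one chat/completions message.
#[derive(Debug, Clone, PartialEq)]
pub struct ChatMessage {
    pub role: &'static str,
    pub content: String,
}

/// Builds the message list for a speech request.
/// The style instruction is included only when it has non-whitespace content,
/// so the no-instruction path never sends `"content": ""`.
pub fn build_speech_messages(text: &str, instruction: Option<&str>) -> Vec<ChatMessage> {
    let mut messages = Vec::new();
    if let Some(style) = instruction.map(str::trim).filter(|s| !s.is_empty()) {
        messages.push(ChatMessage { role: "user", content: style.to_string() });
    }
    // The text to be spoken travels in an assistant message.
    messages.push(ChatMessage { role: "assistant", content: text.to_string() });
    messages
}
```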

Confidence Score: 5/5

Safe to merge; the three previously blocking issues are all fixed and covered by tests.

All three issues flagged in the prior review are resolved: the empty user-message bug is fixed in build_speech_synthesis_body, SUPPORTED_XIAOMI_MIMO_SPEECH_MODELS now contains only TTS model IDs (enforced by a new assertion test), and speech_output_dir is threaded through SubAgentRuntime and EngineConfig into every subagent speech-tool registration site. The remaining findings are minor validation and formatting gaps that don't affect correctness in normal usage.
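The inheritance rule can be sketched as a builder plus a `child_runtime()` that copies the directory. This is a minimal sketch. The `SubAgentRuntime` below is a hypothetical, trimmed-down stand-in, and its `depth` field is invented for illustration. Only the `speech_output_dir` field, the `with_speech_output_dir` builder and `child_runtime()` are named in the review.

```rust
use std::path::PathBuf;

/// Hypothetical, trimmed-down SubAgentRuntime; `depth` is illustrative only.
#[derive(Debug, Clone, Default)]
pub struct SubAgentRuntime {
    pub depth: u32,
    pub speech_output_dir: Option<PathBuf>,
}

impl SubAgentRuntime {
    /// Builder that records the configured speech output directory.
    pub fn with_speech_output_dir(mut self, dir: Option<PathBuf>) -> Self {
        self.speech_output_dir = dir;
        self
    }

    /// Child runtimes copy the parent's speech_output_dir so nested agents
    /// write audio to the same place as the top-level engine.
    pub fn child_runtime(&self) -> Self {
        Self {
            depth: self.depth + 1,
            speech_output_dir: self.speech_output_dir.clone(),
        }
    }
}
```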

The voice-resolution block in crates/tui/src/tools/speech.rs (and its mirror in crates/tui/src/main.rs) would benefit from a tighter guard when the resolved model is voiceclone but the supplied voice is a plain built-in ID rather than a data URI.
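One possible shape for that tighter guard is shown below. This is a minimal sketch with some assumptions:

- `SpeechModelKind` is a hypothetical enum produced by model inference.
- `validate_speech_voice` is a hypothetical helper.
- Treating a missing voice on a clone model as an error is not confirmed by the PR.

```rust
/// Hypothetical classification of the resolved MiMo speech model.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum SpeechModelKind {
    Tts,
    VoiceDesign,
    VoiceClone,
}

/// Rejects a voice-clone request whose voice is not an inline data URI,
/// instead of silently forwarding a built-in voice ID.
pub fn validate_speech_voice(kind: SpeechModelKind, voice: Option<&str>) -> Result<(), String> {
    match (kind, voice) {
        (SpeechModelKind::VoiceClone, Some(v)) if v.starts_with("data:") => Ok(()),
        (SpeechModelKind::VoiceClone, Some(v)) => Err(format!(
            "voice-clone models need reference audio as a data URI; got built-in voice `{v}`"
        )),
        // Assumption: cloning without any reference audio is also rejected.
        (SpeechModelKind::VoiceClone, None) => {
            Err("voice-clone models need reference audio as a data URI".to_string())
        }
        // TTS and voice-design models accept built-in voice IDs or no voice.
        _ => Ok(()),
    }
}
```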

Important Files Changed

Filename	Overview
crates/tui/src/tools/speech.rs	New speech tool file implementing SpeechTool (model-visible) with model inference, voice clone encoding, network policy checks, and output path resolution. SUPPORTED_XIAOMI_MIMO_SPEECH_MODELS now correctly lists only TTS models. Validation gap: voiceclone model + non-data-URI voice silently passes through instead of giving a clear error.
crates/tui/src/client.rs	Adds SpeechSynthesisRequest/Response structs and synthesize_speech method. Previously flagged empty-user-message bug is fixed via filter on instruction. parse_speech_audio_response handles both message.audio and top-level audio shapes.
crates/tui/src/main.rs	Adds SpeechArgs and run_speech function for the CLI speech/tts command. Same voiceclone-with-non-data-URI validation gap as in speech.rs. Config-based output_dir fallback chain is correct.
crates/tui/src/tools/registry.rs	Adds with_speech_tools builder method registering both 'speech' and 'tts' aliases. speech_output_dir is now correctly threaded through from SubAgentRuntime into with_full_agent_surface, resolving the previously flagged forwarding gap.
crates/tui/src/tools/subagent/mod.rs	Adds speech_output_dir field to SubAgentRuntime and with_speech_output_dir builder. child_runtime() propagates the value. Test confirms inheritance.
crates/tui/src/config.rs	Adds SpeechConfig struct, speech field in Config, speech_output_dir() resolver (env vars + toml), and canonical_xiaomi_mimo_model_id for TTS alias normalization. TTS model IDs added to model completion list.
crates/config/src/lib.rs	Adds TTS model constants and canonical_xiaomi_mimo_model_id. normalize_model_for_provider correctly applies TTS alias expansion before other normalization paths.
crates/agent/src/lib.rs	Registers four MiMo TTS ModelInfo entries with supports_tools=false, supports_reasoning=false. Aliases are consistent with config normalization.
config.example.toml	Documents new TTS model IDs and [speech] config section. Three new TTS model comment lines use '?' as separator instead of the '—' used by all adjacent entries.
crates/tui/src/core/engine.rs	Adds speech_output_dir field to EngineConfig and threads it into two SubAgentRuntime construction sites.
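The response handling noted for parse_speech_audio_response can be sketched on already-deserialized structs. The flow is to prefer message.audio, fall back to top-level audio, then base64-decode. This is a sketch under several assumptions:

- `SpeechResponseView`, `AudioPayload` and `extract_audio_bytes` are hypothetical.
- JSON deserialization is omitted.
- The hand-rolled decoder only keeps the example dependency-free; real code would more likely use an existing base64 crate.

```rust
/// Base64-encoded audio as carried in the response.
#[derive(Debug, Default)]
pub struct AudioPayload {
    pub data: String,
}

/// Hypothetical view of the two shapes: choices[0].message.audio or top-level audio.
#[derive(Debug, Default)]
pub struct SpeechResponseView {
    pub message_audio: Option<AudioPayload>,
    pub top_level_audio: Option<AudioPayload>,
}

/// Prefers message.audio, falls back to top-level audio, then base64-decodes.
pub fn extract_audio_bytes(resp: &SpeechResponseView) -> Result<Vec<u8>, String> {
    let payload = resp
        .message_audio
        .as_ref()
        .or(resp.top_level_audio.as_ref())
        .ok_or("response contained no audio")?;
    decode_base64(&payload.data)
}

/// Minimal standard-alphabet base64 decoder (padding optional, no whitespace).
fn decode_base64(input: &str) -> Result<Vec<u8>, String> {
    fn value(c: u8) -> Result<u32, String> {
        match c {
            b'A'..=b'Z' => Ok((c - b'A') as u32),
            b'a'..=b'z' => Ok((c - b'a' + 26) as u32),
            b'0'..=b'9' => Ok((c - b'0' + 52) as u32),
            b'+' => Ok(62),
            b'/' => Ok(63),
            _ => Err(format!("invalid base64 byte {c:#04x}")),
        }
    }
    let bytes = input.trim_end_matches('=').as_bytes();
    let mut out = Vec::with_capacity(bytes.len() * 3 / 4);
    let mut buffer: u32 = 0;
    let mut bits: u32 = 0;
    for &c in bytes {
        // Shift in 6 bits per character; emit a byte whenever 8 are available.
        buffer = (buffer << 6) | value(c)?;
        bits += 6;
        if bits >= 8 {
            bits -= 8;
            out.push((buffer >> bits) as u8);
            buffer &= (1u32 << bits) - 1;
        }
    }
    Ok(out)
}
```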

Sequence Diagram

sequenceDiagram
    participant User
    participant CLI as codewhale speech CLI
    participant Tool as SpeechTool (TUI)
    participant Client as DeepSeekClient
    participant API as Xiaomi MiMo API

    User->>CLI: codewhale speech "Hello" --model tts -o out.wav
    CLI->>CLI: infer_speech_model → mimo-v2.5-tts
    CLI->>CLI: "validate provider == xiaomi-mimo"
    CLI->>CLI: resolve output path
    CLI->>Client: synthesize_speech(model, text, instruction?, voice?)
    Client->>Client: wire_model_for_provider → canonical model ID
    Client->>Client: build_speech_synthesis_body
    Client->>API: POST /v1/chat/completions
    API-->>Client: JSON with audio.data base64
    Client->>Client: parse_speech_audio_response → decode base64
    Client-->>CLI: SpeechSynthesisResponse
    CLI->>CLI: fs::write(output_path, audio_bytes)
    CLI-->>User: Generated speech: out.wav (N bytes)

    Note over Tool,API: Agent/YOLO path: SpeechTool follows same flow
    Note over Tool,API: with network-policy check and workspace-bounded path resolution
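The "workspace-bounded path resolution" step in the diagram can be approximated lexically with std::path. This is a minimal sketch, assuming `workspace` is an already-absolute, normalized root and that any path resolving outside it should be rejected. `resolve_in_workspace` is hypothetical. Because the check is lexical, it does not account for symlinks.

```rust
use std::path::{Component, Path, PathBuf};

/// Resolves `requested` against `workspace` and rejects anything that would
/// escape it. Purely lexical: `..` is folded without touching the filesystem.
pub fn resolve_in_workspace(workspace: &Path, requested: &Path) -> Result<PathBuf, String> {
    // An absolute `requested` replaces the workspace root entirely.
    let joined = workspace.join(requested);
    let mut normalized = PathBuf::new();
    for component in joined.components() {
        match component {
            Component::ParentDir => {
                if !normalized.pop() {
                    return Err(format!("{} escapes the filesystem root", requested.display()));
                }
            }
            Component::CurDir => {}
            other => normalized.push(other.as_os_str()),
        }
    }
    // starts_with compares whole components, so /work/project2 does not match /work/project.
    if normalized.starts_with(workspace) {
        Ok(normalized)
    } else {
        Err(format!("{} is outside the workspace", requested.display()))
    }
}
```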

Comments Outside Diff (1)

crates/tui/src/tools/speech.rs, lines 1595-1617

Significant helper duplication between this file and crates/tui/src/main.rs. combine_speech_instructions, normalize_speech_format, default_speech_output_name, encode_voice_clone_data_uri, and describe_speech_voice are copied verbatim (or near-verbatim) into both files. Additionally, canonical_xiaomi_mimo_model_id is duplicated between crates/config/src/lib.rs and crates/tui/src/config.rs. Any future fix to one copy will likely miss the other. These should be extracted to a shared module or, for the config normalizer, re-exported from the single canonical location.
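Keeping the normalizer in one place and re-exporting it could look like this, with modules standing in for the two crates. This is a minimal sketch with some assumptions:

- The module names are hypothetical.
- The case-insensitive match is not confirmed by the PR.
- The only alias mapping shown (`tts` to `mimo-v2.5-tts`) is the one visible in the sequence diagram above. The other TTS aliases are omitted rather than guessed.

```rust
/// Stand-in for the single canonical location (crates/config/src/lib.rs in the PR).
pub mod mimo_models {
    /// The one mapping confirmed in this thread.
    pub const MIMO_TTS_MODEL: &str = "mimo-v2.5-tts";

    /// Normalizes a user-supplied model name; unknown names pass through trimmed.
    pub fn canonical_xiaomi_mimo_model_id(raw: &str) -> String {
        let trimmed = raw.trim();
        match trimmed.to_ascii_lowercase().as_str() {
            "tts" => MIMO_TTS_MODEL.to_string(),
            _ => trimmed.to_string(),
        }
    }
}

/// Stand-in for crates/tui/src/config.rs: a re-export instead of a second copy,
/// so a fix in `mimo_models` reaches both call sites.
pub mod tui_config {
    pub use super::mimo_models::canonical_xiaomi_mimo_model_id;
}
```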


Reviews (2): Last reviewed commit: "fix: harden Xiaomi MiMo speech flow"

gemini-code-assist · 2026-06-02T01:02:35Z

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

Hmbown · 2026-06-02T02:20:50Z

Thanks for adding the Xiaomi MiMo speech path. This is promising, but I would not harvest it into v0.8.50 yet because the current branch still has a few runtime correctness issues that would affect the default user flow.

Concrete next steps:

In DeepSeekClient::synthesize_speech, only include the user message when the instruction is non-empty. The documented codewhale speech "text" --model tts path should not send "content": "".
Keep SUPPORTED_XIAOMI_MIMO_SPEECH_MODELS to TTS-capable models only, so the tool does not advertise chat-only models that the TTS guard will reject.
Thread the configured [speech].output_dir / XIAOMI_MIMO_SPEECH_OUTPUT_DIR through the subagent tool registration path too; right now parent and subagent invocations can disagree.
Deduplicate the speech helper functions between crates/tui/src/main.rs and crates/tui/src/tools/speech.rs, and keep Xiaomi MiMo model normalization canonical in one module.
Add focused tests for the no-instruction request body, supported-model list, configured output dir in the tool path, and one CLI passthrough smoke.
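The output-dir request amounts to resolving one value with a fixed precedence and passing that value everywhere. This is a minimal sketch, assuming the environment variable overrides `[speech].output_dir` and that blank values count as unset. `resolve_speech_output_dir` and `speech_output_dir_from_env` are hypothetical names, not the PR's `speech_output_dir()` resolver.

```rust
use std::path::PathBuf;

/// Assumed precedence: XIAOMI_MIMO_SPEECH_OUTPUT_DIR (non-blank), then
/// [speech].output_dir (non-blank), otherwise unset.
pub fn resolve_speech_output_dir(env_value: Option<&str>, toml_value: Option<&str>) -> Option<PathBuf> {
    env_value
        .into_iter()
        .chain(toml_value)
        .map(str::trim)
        .find(|v| !v.is_empty())
        .map(PathBuf::from)
}

/// Reads the real environment variable and defers to the pure resolver above.
pub fn speech_output_dir_from_env(toml_value: Option<&str>) -> Option<PathBuf> {
    let env = std::env::var("XIAOMI_MIMO_SPEECH_OUTPUT_DIR").ok();
    resolve_speech_output_dir(env.as_deref(), toml_value)
}
```

Keeping the resolver pure (values in, path out) lets tests cover the precedence without mutating process environment variables.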

Once those are fixed, this looks like a good provider feature to revisit. I am keeping it out of the release harvest for now because provider features need to be boringly correct at the first documented invocation.

xyuai · 2026-06-02T03:03:56Z

Thanks for the detailed feedback. I pushed an update in 2c34fee2 that addresses the five items:

Omit the user message when instruction is empty.
Restrict supported Xiaomi MiMo speech models to TTS-capable models.
Thread speech_output_dir through the subagent tool registration/runtime path.
Deduplicate speech helpers between the CLI and tool code, keeping MiMo speech model normalization in one module.
Add focused tests for the no-instruction request body, supported-model list, configured output dir path, subagent inheritance, and CLI passthrough smoke.

Validation:

cargo fmt --check
cargo check -p codewhale-config -p codewhale-agent -p codewhale-cli -p codewhale-tui
focused cargo test -p codewhale-tui --bin codewhale-tui ... speech/subagent tests

xyuai · 2026-06-02T04:04:02Z

@Hmbown I pushed the requested fixes in 2c34fee2 and added focused tests. The GitHub Actions check appears to need maintainer approval to run when you have a moment.

Hmbown · 2026-06-02T04:29:35Z

Hey @xyuai — the Xiaomi MiMo speech support has been harvested into v0.8.50 (#2504)! The fix commit addressing the review feedback was solid — all 17 speech tests pass and the code is clean. Love seeing the full stack: model registry, CLI, tool registration, config, and tests all wired together. Really appreciate you pushing the fixes through. Thank you! 🐋🎤


feat: add Xiaomi MiMo speech support (036c341)

greptile-apps Bot reviewed Jun 2, 2026, with comment threads on crates/tui/src/client.rs (outdated), crates/tui/src/tools/speech.rs, and crates/tui/src/tools/registry.rs (outdated).

Hmbown mentioned this pull request on Jun 2, 2026 in [codex] v0.8.50 triage harvest #2504 (merged).

fix: harden Xiaomi MiMo speech flow (2c34fee)

Hmbown added a commit that referenced this pull request on Jun 2, 2026: docs(changelog): credit new harvests for v0.8.50 (#2514, #2519, #2503, …#2560) (e763b44)

xyuai wants to merge 2 commits into Hmbown:main from xyuai:feat/xiaomi-mimo-speech
