feat(voice): make ASR/control provider configurable via voice config #900
Merged
yanyihan-xiaomi merged 13 commits on Jun 17, 2026
Add voice.asr_model and voice.control_model to config schema, allowing users to point voice input at any configured provider instead of hardcoding xiaomi. Defaults remain xiaomi/mimo-v2.5-asr and xiaomi/mimo-v2.5 for backward compatibility.
…ages
Capture stderr from the recorder process and detect early non-zero exits to show actionable errors (e.g. "no microphone found") instead of a generic message. From PR XiaomiMiMo#704.
Prevent the reading loop catch from calling onError when aborted is already set (proc.exited handler already fired). Add tests for transcribeAudio and processVoiceControl with custom model parameter.
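The guard described in the two commits above could look roughly like the sketch below. It is not the PR's code: `createRecorderErrorGuard`, `onProcessExit`, and `onReadError` are hypothetical names, and using trimmed stderr as the user-facing message is an assumption.

```typescript
// Hypothetical helper (not from the PR): ensures onError fires at most once,
// whether the failure is seen first by the exit handler or by the read loop.
function createRecorderErrorGuard(onError: (message: string) => void) {
  let aborted = false
  return {
    // Called when the recorder process exits; a non-zero code is a failure.
    onProcessExit(code: number, stderr: string): void {
      if (aborted || code === 0) return
      aborted = true
      // Assumption: prefer the recorder's own stderr text when it has any.
      onError(stderr.trim() || `recorder exited with code ${code}`)
    },
    // Called from the catch block of the loop that reads recorder output.
    onReadError(err: unknown): void {
      if (aborted) return // the exit handler already reported this failure
      aborted = true
      onError(err instanceof Error ? err.message : String(err))
    },
  }
}
```

With this shape, an early exit followed by a broken-stream exception produces one error message rather than two.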
Move provider/model parsing logic into a testable exported function. Add tests covering defaults, custom providers, mixed config, and multi-slash model IDs.
- Model IDs without '/' now default to xiaomi provider instead of producing an empty model string
- Control provider no longer silently inherits ASR provider's baseUrl when they are different providers
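A minimal sketch of the parsing rules these commits describe, assuming the value is split on the first '/' only. The names `parseModelRef` and `resolveVoiceModels` are illustrative and are not taken from the PR.

```typescript
// Hypothetical names; the PR's exported function may be shaped differently.
interface VoiceModelRef {
  providerID: string
  modelID: string
}

const DEFAULT_PROVIDER = "xiaomi"

function parseModelRef(value: string | undefined, fallback: string): VoiceModelRef {
  const raw = value && value.trim() !== "" ? value.trim() : fallback
  const slash = raw.indexOf("/")
  // No '/' at all: treat the whole string as a model ID on the default provider.
  if (slash === -1) return { providerID: DEFAULT_PROVIDER, modelID: raw }
  // Split on the FIRST '/' only, so "openrouter/xiaomi/mimo-v2.5" keeps
  // "xiaomi/mimo-v2.5" as the model ID.
  return { providerID: raw.slice(0, slash), modelID: raw.slice(slash + 1) }
}

function resolveVoiceModels(voice?: { asr_model?: string; control_model?: string }) {
  return {
    asr: parseModelRef(voice?.asr_model, "xiaomi/mimo-v2.5-asr"),
    control: parseModelRef(voice?.control_model, "xiaomi/mimo-v2.5"),
  }
}
```

Because the ASR and control references are resolved separately, a mixed config such as a custom `control_model` with a default `asr_model` yields two different providers, each of which needs its own base URL.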
OpenRouter has mimo-v2.5 (voice control) but not mimo-v2.5-asr; update test model IDs accordingly.
…o off
- OpenRouter models use vendor prefix (xiaomi/mimo-v2.5), update test examples accordingly
- Default voice_send_command to false (user must opt in via /voice-send)
Log recording start/stop, transcription requests/results, voice control requests/results, and recorder stderr errors at appropriate levels (info for lifecycle, debug for requests, warn for failures).
…piKey
Provider.key is only set from auth store (/connect) or env vars. Config-only providers have apiKey in options. Support both paths so config-defined providers like openrouter work without /connect.
MiMo officially supports both api-key and Authorization Bearer headers. Use the standard Bearer format so OpenRouter and other OpenAI-compatible providers work without special-casing.
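The two commits above amount to one key lookup and one header shape. The sketch below assumes `key` takes precedence over `options.apiKey`; the source does not state the order. `ProviderLike`, `resolveApiKey`, and `authHeaders` are hypothetical names.

```typescript
// Hypothetical, trimmed-down provider shape for illustration only.
interface ProviderLike {
  key?: string // set from the auth store (/connect) or env vars
  options?: { apiKey?: string } // set by config-only providers
}

// Assumption: prefer the auth-store/env key, then fall back to the config key.
function resolveApiKey(provider: ProviderLike): string | undefined {
  return provider.key || provider.options?.apiKey || undefined
}

// Standard OpenAI-compatible header, used for every provider.
function authHeaders(apiKey: string): Record<string, string> {
  return {
    "Content-Type": "application/json",
    Authorization: `Bearer ${apiKey}`,
  }
}
```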
…providers
Both MiMo and OpenRouter accept the OpenAI-standard input_audio format
({data: base64, format: "wav"}) and Authorization Bearer header. Use
the standard format unconditionally for voice control, eliminating the
need for provider-specific format detection.
ASR (transcribeAudio) retains the MiMo data-URL format as it only runs
against xiaomi's endpoint.
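For reference, a voice control message in the OpenAI-standard input_audio shape might be built as in this sketch. The `{data, format}` object comes from the commit message; the helper names, the accompanying text part, and the hand-rolled base64 encoder (used only to keep the example free of runtime dependencies) are assumptions.

```typescript
const B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

// Dependency-free base64 encoder with '=' padding.
function toBase64(bytes: Uint8Array): string {
  let out = ""
  for (let i = 0; i < bytes.length; i += 3) {
    const b0 = bytes[i] ?? 0
    const b1 = bytes[i + 1] ?? 0
    const b2 = bytes[i + 2] ?? 0
    out += B64.charAt(b0 >> 2)
    out += B64.charAt(((b0 & 3) << 4) | (b1 >> 4))
    out += i + 1 < bytes.length ? B64.charAt(((b1 & 15) << 2) | (b2 >> 6)) : "="
    out += i + 2 < bytes.length ? B64.charAt(b2 & 63) : "="
  }
  return out
}

// Hypothetical helper: one user message carrying an instruction and the WAV audio.
function buildControlMessage(wav: Uint8Array, instruction: string) {
  return {
    role: "user" as const,
    content: [
      { type: "text" as const, text: instruction },
      {
        type: "input_audio" as const,
        input_audio: { data: toBase64(wav), format: "wav" as const },
      },
    ],
  }
}
```

The ASR path is deliberately left out here: per the commit message it keeps the MiMo data-URL format, whose exact schema is not shown in this PR text.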
- Log response body on API failures for easier debugging
- Add comment noting xiaomi ASR uses a proprietary data-URL format distinct from the standard OpenAI input_audio schema
- Read baseURL from model.api.url for built-in providers (openrouter) that don't have options.baseURL set explicitly
- Split auth errors: "provider not found" (no models) vs "not authenticated" (no apiKey)
- Add non-MiMo voice provider docs to README (en/zh) with OpenRouter and internal API examples
Summary
- Add voice.asr_model and voice.control_model to the config schema (provider/model format), allowing users to route voice features through any configured provider
- Defaults remain xiaomi/mimo-v2.5-asr and xiaomi/mimo-v2.5 for backward compatibility
- Default voice-send to off (users must opt in via /voice-send)
- Resolve the base URL from model.api.url for built-in providers, options.baseURL for custom providers, and auth metadata for xiaomi (token plan support). Only xiaomi has a hardcoded fallback URL; other providers error if no URL is found.
- Use the OpenAI-standard input_audio format ({data, format}) for voice control and Authorization: Bearer for all providers
- Prevent double onError fire in streaming error paths

Incorporates #704
This PR absorbs the voice-related changes from #704 (surface recorder errors / device detection), which can no longer be merged cleanly after a git history restructuring. The workspace dialog i18n portion of #704 is not included because it is unrelated to voice.
Config examples
See README for full examples. Quick summary:
- /connect to MiMo
- /connect to OpenRouter, add "voice": {"control_model": "openrouter/xiaomi/mimo-v2.5"}
- Define a provider with baseURL, apiKey, and models, then point voice.*_model at it

Test plan
- bun test test/cli/tui/voice.test.ts: 30 tests pass
- bun typecheck passes across all packages