feat(voice): make ASR/control provider configurable via voice config #900

Merged
yanyihan-xiaomi merged 13 commits into XiaomiMiMo:main from yanyihan-xiaomi:feat/voice-input-provider on Jun 17, 2026

Conversation

yanyihan-xiaomi (Contributor) commented Jun 17, 2026

Summary

  • Add voice.asr_model and voice.control_model to the config schema (provider/model format), allowing users to route voice features through any configured provider
  • Defaults remain xiaomi/mimo-v2.5-asr and xiaomi/mimo-v2.5 for backward compatibility
  • Primary use case: use mimo-v2.5 voice control mode via relay platforms like OpenRouter or self-hosted API gateways (ASR still uses the MiMo subscription by default)
  • Default voice-send to off (users must opt in via /voice-send)
  • Improve error messages by distinguishing "provider not found" (not connected / no models), "not authenticated" (no apiKey), and "no baseURL"
  • Resolve baseURL from model.api.url for built-in providers, options.baseURL for custom providers, and auth metadata for xiaomi (token plan support). Only xiaomi has a hardcoded fallback URL; other providers raise an error if no URL is found.
  • Use standard OpenAI audio format ({data, format}) for voice control and Authorization: Bearer for all providers
  • Surface actual recorder errors (stderr capture + early exit detection)
  • Guard against double onError fire in streaming error paths
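
The resolution and error rules in the summary can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `ProviderInfo`, `authBaseURL`, `resolveVoiceEndpoint`, the placeholder fallback URL, and the exact lookup order are all assumptions made for the example.

```typescript
// Hypothetical provider shape; the real types in the codebase may differ.
interface ProviderInfo {
  key?: string // assumed: set by /connect (auth store) or env vars
  options?: { baseURL?: string; apiKey?: string } // assumed: set in config
  models: Record<string, { api?: { url?: string } }>
  authBaseURL?: string // hypothetical stand-in for "auth metadata" (xiaomi token plan)
}

// Placeholder only; the real hardcoded xiaomi URL is not shown in this PR text.
const XIAOMI_FALLBACK_URL = "https://xiaomi-fallback.invalid/v1"

function resolveVoiceEndpoint(
  providers: Record<string, ProviderInfo | undefined>,
  providerID: string,
  modelID: string,
): { baseURL: string; apiKey: string } {
  const provider = providers[providerID]

  // "provider not found": not connected and no models configured.
  if (!provider || Object.keys(provider.models).length === 0) {
    throw new Error(`voice: provider not found: ${providerID}`)
  }

  // "not authenticated": the provider exists but no API key is available.
  const apiKey = provider.key ?? provider.options?.apiKey
  if (!apiKey) {
    throw new Error(`voice: not authenticated: ${providerID}`)
  }

  // Assumed order: explicit options.baseURL, then model.api.url,
  // then (xiaomi only) auth metadata and the hardcoded fallback.
  const baseURL =
    provider.options?.baseURL ??
    provider.models[modelID]?.api?.url ??
    (providerID === "xiaomi" ? provider.authBaseURL ?? XIAOMI_FALLBACK_URL : undefined)

  // "no baseURL": only xiaomi has a fallback, every other provider errors.
  if (!baseURL) {
    throw new Error(`voice: no baseURL for provider: ${providerID}`)
  }
  return { baseURL, apiKey }
}
```

Checking the three failure modes in this order keeps each message tied to one fix on the user's side: connect the provider, add a key, or set a URL.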

Incorporates #704

This PR absorbs the voice-related changes from #704 (surface recorder errors / device detection), which can no longer be merged cleanly due to a git history restructuring. The workspace dialog i18n portion of #704 is not included as it's unrelated to voice.

Config examples

See README for full examples. Quick summary:

  • MiMo (default): no config needed; just /connect to MiMo
  • OpenRouter: /connect to OpenRouter, add "voice": {"control_model": "openrouter/xiaomi/mimo-v2.5"}
  • Internal relay: configure provider with baseURL, apiKey, and models, then point voice.*_model at it
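
For illustration, the OpenRouter and internal relay cases might look like the objects below, written as TypeScript literals so the shape is explicit. Only `voice.asr_model`, `voice.control_model`, and the `openrouter/xiaomi/mimo-v2.5` value come from this PR; the top-level `provider` key, the `internal-relay` name, the URL, and the key value are assumptions for the example. The README remains the authoritative reference.

```typescript
// OpenRouter: connect via /connect, then route voice control through it.
// ASR is left unset, so it keeps the default xiaomi/mimo-v2.5-asr.
const openRouterConfig = {
  voice: {
    control_model: "openrouter/xiaomi/mimo-v2.5",
  },
}

// Internal relay: a config-only provider. All names and values below are
// hypothetical placeholders, not taken from the project.
const internalRelayConfig = {
  provider: {
    "internal-relay": {
      options: {
        baseURL: "https://relay.example.com/v1",
        apiKey: "YOUR_API_KEY",
      },
      models: {
        "mimo-v2.5": {},
        "mimo-v2.5-asr": {},
      },
    },
  },
  voice: {
    asr_model: "internal-relay/mimo-v2.5-asr",
    control_model: "internal-relay/mimo-v2.5",
  },
}
```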

Test plan

  • bun test test/cli/tui/voice.test.ts — 30 tests pass
  • bun typecheck passes across all packages
  • MiMo (xiaomi provider via /connect): ASR + voice control both work
  • OpenRouter (via /connect): voice control works, correct baseURL resolved from model.api.url
  • Internal API (custom provider with models + baseURL + apiKey in config): both ASR and voice control work
  • Error: "provider not found" when provider not connected and no models configured
  • Error: "not authenticated" when provider exists but no apiKey
  • Error: "no baseURL" when custom provider has no URL configured

yanyihan-xiaomi marked this pull request as draft on June 17, 2026 08:56
yanyihan-xiaomi force-pushed the feat/voice-input-provider branch 4 times, most recently from c6aaf66 to b24de65, on June 17, 2026 11:46
yanyihan-xiaomi self-assigned this on Jun 17, 2026
yanyihan-xiaomi marked this pull request as ready for review on June 17, 2026 12:00
yanyihan-xiaomi force-pushed the feat/voice-input-provider branch from b24de65 to 84b7c03 on June 17, 2026 12:10
Add voice.asr_model and voice.control_model to config schema, allowing
users to point voice input at any configured provider instead of
hardcoding xiaomi. Defaults remain xiaomi/mimo-v2.5-asr and
xiaomi/mimo-v2.5 for backward compatibility.

…ages

Capture stderr from the recorder process and detect early non-zero exits
to show actionable errors (e.g. "no microphone found") instead of a
generic message. From PR XiaomiMiMo#704.
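
As a rough sketch of the idea in this commit, the function below turns a recorder's exit code and captured stderr into a user-facing message. The name `recorderErrorMessage`, the 500 ms "early exit" threshold, and the message wording are assumptions for illustration; the actual recorder handling in the codebase is not shown here.

```typescript
// Assumed threshold: an exit this soon after start is treated as a startup failure.
const EARLY_EXIT_MS = 500

function recorderErrorMessage(
  exitCode: number | null,
  stderr: string,
  elapsedMs: number,
): string | undefined {
  // Exit code 0 is a clean stop. A null code is treated here as "killed by us
  // on stop", which is an assumption of this sketch.
  if (exitCode === 0 || exitCode === null) return undefined

  // Use the last non-empty stderr line as the most specific detail.
  const detail = stderr
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .pop()

  if (elapsedMs < EARLY_EXIT_MS) {
    return detail
      ? `Recorder failed to start: ${detail}`
      : `Recorder exited immediately with code ${exitCode}`
  }
  return detail ? `Recorder error: ${detail}` : `Recorder exited with code ${exitCode}`
}
```

A caller would accumulate the process's stderr into a string while recording, note the start time, and pass both to this function when the process exits.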
Prevent the reading loop catch from calling onError when aborted is
already set (proc.exited handler already fired). Add tests for
transcribeAudio and processVoiceControl with custom model parameter.
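
The double-fire guard described here can be illustrated with a small wrapper that lets only the first failure reach `onError`. This is a sketch under assumptions: `createErrorReporter` is a hypothetical helper, and the real code may simply check an `aborted` flag inline.

```typescript
// Wraps an onError callback so it fires at most once, however many
// error paths (exit handler, read-loop catch) report a failure.
function createErrorReporter(onError: (err: Error) => void) {
  let aborted = false
  return {
    // True once a failure has been reported.
    isAborted(): boolean {
      return aborted
    },
    // Report a failure; later calls are ignored.
    fail(err: Error): void {
      if (aborted) return
      aborted = true
      onError(err)
    },
  }
}

// Usage: both paths call reporter.fail, but the callback runs only once.
const seen: string[] = []
const reporter = createErrorReporter((err) => seen.push(err.message))
reporter.fail(new Error("recorder exited")) // e.g. from the exit handler
reporter.fail(new Error("stream read failed")) // e.g. from the read-loop catch
```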
Move provider/model parsing logic into a testable exported function.
Add tests covering defaults, custom providers, mixed config, and
multi-slash model IDs.

- Model IDs without '/' now default to xiaomi provider instead of
  producing an empty model string
- Control provider no longer silently inherits ASR provider's baseUrl
  when they are different providers
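
The parsing behavior described in these commits (defaults, multi-slash model IDs, and bare model IDs falling back to xiaomi) could look something like the sketch below. `parseVoiceModel` and `VoiceModelRef` are hypothetical names; the exported function in the PR may have a different signature.

```typescript
interface VoiceModelRef {
  providerID: string
  modelID: string
}

const DEFAULT_PROVIDER = "xiaomi"
const DEFAULT_ASR_MODEL = "xiaomi/mimo-v2.5-asr"
const DEFAULT_CONTROL_MODEL = "xiaomi/mimo-v2.5"

// Splits "provider/model" on the FIRST slash only, so model IDs that
// themselves contain slashes (e.g. "xiaomi/mimo-v2.5" on OpenRouter) survive.
function parseVoiceModel(value: string | undefined, fallback: string): VoiceModelRef {
  const raw = (value ?? "").trim() || fallback
  const slash = raw.indexOf("/")
  if (slash <= 0) {
    // No provider part: default to xiaomi rather than yielding an empty model.
    return { providerID: DEFAULT_PROVIDER, modelID: raw.slice(slash + 1) }
  }
  return { providerID: raw.slice(0, slash), modelID: raw.slice(slash + 1) }
}

const asr = parseVoiceModel(undefined, DEFAULT_ASR_MODEL)
const control = parseVoiceModel("openrouter/xiaomi/mimo-v2.5", DEFAULT_CONTROL_MODEL)
const bare = parseVoiceModel("mimo-v2.5", DEFAULT_CONTROL_MODEL)
```

Here `asr` resolves to provider `xiaomi` with model `mimo-v2.5-asr`, `control` to provider `openrouter` with model `xiaomi/mimo-v2.5`, and `bare` to provider `xiaomi` with model `mimo-v2.5`.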
OpenRouter has mimo-v2.5 (voice control) but not mimo-v2.5-asr;
update test model IDs accordingly.

…o off

- OpenRouter models use vendor prefix (xiaomi/mimo-v2.5), update test
  examples accordingly
- Default voice_send_command to false (user must opt in via /voice-send)

Log recording start/stop, transcription requests/results, voice control
requests/results, and recorder stderr errors at appropriate levels
(info for lifecycle, debug for requests, warn for failures).

…piKey

Provider.key is only set from auth store (/connect) or env vars.
Config-only providers have apiKey in options. Support both paths so
config-defined providers like openrouter work without /connect.

MiMo officially supports both api-key and Authorization Bearer headers.
Use the standard Bearer format so OpenRouter and other OpenAI-compatible
providers work without special-casing.

…providers

Both MiMo and OpenRouter accept the OpenAI-standard input_audio format
({data: base64, format: "wav"}) and Authorization Bearer header. Use
the standard format unconditionally for voice control, eliminating the
need for provider-specific format detection.

ASR (transcribeAudio) retains the MiMo data-URL format as it only runs
against xiaomi's endpoint.
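
To make the standard format concrete, here is a minimal sketch of building a voice control request with the OpenAI-style `input_audio` content part and a Bearer header. The helper name `buildVoiceControlRequest`, the `/chat/completions` path, and the accompanying text prompt are assumptions for the example and are not taken from this PR.

```typescript
interface VoiceControlRequest {
  url: string
  headers: Record<string, string>
  body: string
}

function buildVoiceControlRequest(
  baseURL: string,
  apiKey: string,
  modelID: string,
  wavBase64: string,
  prompt: string,
): VoiceControlRequest {
  return {
    // Assumed OpenAI-compatible endpoint path; trailing slashes are trimmed.
    url: `${baseURL.replace(/\/+$/, "")}/chat/completions`,
    headers: {
      "Content-Type": "application/json",
      // Same header for every provider, no provider-specific branch.
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify({
      model: modelID,
      messages: [
        {
          role: "user",
          content: [
            { type: "text", text: prompt },
            // Standard audio part: raw base64 plus a format tag, not a data URL.
            { type: "input_audio", input_audio: { data: wavBase64, format: "wav" } },
          ],
        },
      ],
    }),
  }
}
```

The result can be passed to `fetch(req.url, { method: "POST", headers: req.headers, body: req.body })`. Because the payload is the same for every provider, only `baseURL`, `apiKey`, and `modelID` vary between MiMo, OpenRouter, and a custom relay.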
- Log response body on API failures for easier debugging
- Add comment noting xiaomi ASR uses a proprietary data-URL format
  distinct from the standard OpenAI input_audio schema
- Read baseURL from model.api.url for built-in providers (openrouter)
  that don't have options.baseURL set explicitly
- Split auth errors: "provider not found" (no models) vs "not
  authenticated" (no apiKey)
- Add non-MiMo voice provider docs to README (en/zh) with OpenRouter
  and internal API examples
yanyihan-xiaomi force-pushed the feat/voice-input-provider branch from 84b7c03 to b7ffba9 on June 17, 2026 14:21
yanyihan-xiaomi merged commit 98c283a into XiaomiMiMo:main on Jun 17, 2026
2 of 3 checks passed
1 participant