feat: Add SenseVoice as alternative STT backend — 5x faster, multilingual, emotion detection #7562

@LauraGPT

Motivation

Omi currently relies on Whisper/Deepgram for speech-to-text. SenseVoice (8.3K+ stars) offers significant advantages for a wearable AI device:

  • 5x faster than Whisper large-v3 (non-autoregressive architecture, 234M params)
  • 50+ languages with automatic language detection
  • Built-in emotion detection (happy, sad, angry, neutral) — valuable for an AI companion
  • Audio event detection (laughter, applause, music, crying) — enriches memory context
  • Low latency — critical for real-time wearable experiences

Why this matters for Omi

  1. Battery life: Faster inference = less compute = longer battery on wearable hardware
  2. Richer context: Emotion + audio events give the AI companion deeper understanding of conversations
  3. Self-hosted option: Users concerned about privacy can run SenseVoice locally (works on CPU)
  4. Cost reduction: For cloud processing, faster inference means lower API costs

Integration

SenseVoice provides an OpenAI-compatible API server:

pip install funasr
funasr-server --device cuda

# Drop-in replacement for the Whisper API
import openai

# The OpenAI client requires an api_key; the local server is assumed to ignore its value
client = openai.OpenAI(base_url="http://localhost:8000/v1", api_key="unused")
with open("audio.wav", "rb") as audio_file:
    result = client.audio.transcriptions.create(
        model="iic/SenseVoiceSmall",
        file=audio_file,
    )
print(result.text)  # transcription with emotion tags

Or use the Python API directly:

from funasr import AutoModel
model = AutoModel(model="iic/SenseVoiceSmall")
result = model.generate(input="audio.wav")
# Returns: text + language + emotion + audio events
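
To make use of the language, emotion, and audio event information, the tags have to be separated from the transcript. Below is a minimal sketch of how that could look, assuming the model emits inline tags of the form `<|en|><|HAPPY|><|Laughter|>` ahead of the text. The tag spellings, the `RichTranscript` type, and the `parse_rich_text` helper are illustrative assumptions, not part of the funasr API, and should be checked against real model output.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Matches inline tags such as <|en|> or <|HAPPY|> (assumed tag format).
TAG_RE = re.compile(r"<\|([^|]*)\|>")

# Assumed tag vocabularies; extend after inspecting real model output.
LANGUAGES = {"zh", "en", "yue", "ja", "ko"}
EMOTIONS = {"HAPPY", "SAD", "ANGRY", "NEUTRAL"}
EVENTS = {"Laughter", "Applause", "BGM", "Cry"}


@dataclass
class RichTranscript:
    """Hypothetical container for a parsed transcription."""
    text: str
    language: Optional[str] = None
    emotion: Optional[str] = None
    events: List[str] = field(default_factory=list)


def parse_rich_text(raw: str) -> RichTranscript:
    """Split tagged output into plain text plus language, emotion, and events."""
    parsed = RichTranscript(text=TAG_RE.sub("", raw).strip())
    for tag in TAG_RE.findall(raw):
        if tag in LANGUAGES and parsed.language is None:
            parsed.language = tag
        elif tag in EMOTIONS and parsed.emotion is None:
            parsed.emotion = tag
        elif tag in EVENTS and tag not in parsed.events:
            parsed.events.append(tag)
        # Unknown tags are dropped from the text and otherwise ignored.
    return parsed
```

Keeping the parsed fields separate from the text would let Omi store emotion and audio events as memory metadata while passing clean text to downstream processing.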

Benchmarks

| Model | Params | Relative speed | CER (AISHELL-1) | Languages |
|---|---|---|---|---|
| Whisper-large-v3 | 1.5B | 1.0x (baseline) | 8.4% | 99 |
| SenseVoice-Small | 234M | 5x | 3.6% | 50+ |
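
For anyone who wants to reproduce the speed comparison on their own hardware, here is a rough sketch of how a real-time factor (processing time divided by audio duration) and a relative speed-up could be measured. The helper names `real_time_factor` and `speedup` are hypothetical, and the `transcribe` callable stands in for whichever backend is being timed.

```python
import time
from typing import Callable


def real_time_factor(transcribe: Callable[[], object], audio_seconds: float) -> float:
    """Return processing time divided by audio duration (lower is faster)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    start = time.perf_counter()
    transcribe()  # run the backend once on the clip being measured
    elapsed = time.perf_counter() - start
    return elapsed / audio_seconds


def speedup(rtf_baseline: float, rtf_candidate: float) -> float:
    """Return how many times faster the candidate is than the baseline."""
    if rtf_baseline <= 0 or rtf_candidate <= 0:
        raise ValueError("real-time factors must be positive")
    return rtf_baseline / rtf_candidate
```

A single run is noisy, so averaging several runs on the same clip after a warm-up call gives a fairer comparison.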

Happy to help with integration if there's interest.
