On-device speech SDK for Android and embedded Linux, powered by ONNX Runtime and speech-core.
Speech recognition (114 languages), text-to-speech, voice activity detection, and noise cancellation — all running locally. No cloud APIs, no data leaves the device.
Models · speech-swift (Apple counterpart) · speech-core (pipeline engine)
| Platform | API | Acceleration | Directory |
|---|---|---|---|
| Android | Kotlin (SpeechPipeline) | NNAPI (Snapdragon, Exynos, Tensor) | sdk/ |
| Embedded Linux | C (speech.h) | QNN (Hexagon DSP) | linux/ |

| Model | Task | INT8 Size | Languages |
|---|---|---|---|
| Parakeet TDT v3 | Speech recognition | 891 MB | 114 |
| Kokoro 82M | Text-to-speech | 330 MB | English |
| Silero VAD v5 | Voice activity detection | 2 MB | Any |
| DeepFilterNet3 | Noise cancellation | ~8 MB | Any |
Models are downloaded automatically on first launch (Android) or placed manually (Linux).
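
On Linux the models are placed by hand, so it can help to confirm the directory exists before creating a pipeline. The following is a minimal sketch using POSIX `stat`; the helper name `model_dir_exists` is hypothetical and not part of `speech.h`, and it checks only for the directory, not for any particular model files.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/stat.h>

/* Hypothetical helper (not part of speech.h): returns true when `path`
 * exists and is a directory, so a missing model directory is caught
 * before the pipeline is created. */
static bool model_dir_exists(const char *path) {
    assert(path != NULL);
    struct stat st;
    if (stat(path, &st) != 0)
        return false;            /* path missing or not accessible */
    return S_ISDIR(st.st_mode);  /* true only for directories */
}
```

A caller might run this on the value it intends to assign to `cfg.model_dir` and report a clear error if it returns false.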
```kotlin
dependencies {
    implementation("audio.soniqo:speech:0.0.1")
}
```

```kotlin
val modelDir = ModelManager.ensureModels(context)
val pipeline = SpeechPipeline(
    SpeechConfig(modelDir = modelDir, useNnapi = true)
)

// Assuming `events` is a Kotlin Flow, collect suspends: run it in its own
// coroutine so that start() below is reached.
pipeline.events.collect { event ->
    when (event) {
        is SpeechEvent.TranscriptionCompleted -> println(event.text)
        is SpeechEvent.ResponseDone -> pipeline.resumeListening()
        else -> {}
    }
}

pipeline.start()

// Feed 16 kHz mono float32 PCM from the microphone
pipeline.pushAudio(samples)
```

```bash
git clone --recursive https://github.com/soniqo/speech-android.git
cd speech-android
./setup.sh
./gradlew :app:assembleDebug
./gradlew :sdk:connectedDebugAndroidTest  # 18 e2e tests
```

Minimal C API for automotive and embedded platforms. See linux/README.md for full documentation.
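
The Kotlin example above is fed 16 kHz mono float32 PCM, while many capture APIs deliver signed 16-bit samples. As a rough sketch of the conversion in C: the helper name `pcm16_to_float` is hypothetical and not part of `speech.h`, and whether the C API expects the same float format is an assumption to check against linux/README.md.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: convert signed 16-bit PCM samples to floats in
 * the range [-1.0, 1.0) by dividing by 32768. */
static void pcm16_to_float(const int16_t *in, float *out, size_t n) {
    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < n; ++i)
        out[i] = (float)in[i] / 32768.0f;
}
```

Dividing by 32768 rather than 32767 keeps the most negative sample at exactly -1.0 and the scaling symmetric around zero.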
```c
#include <stdbool.h>
#include <stdio.h>
#include <speech.h>

// Invoked by the pipeline for every event; prints finished transcriptions.
void on_event(const speech_event_t* event, void* ctx) {
    if (event->type == SPEECH_EVENT_TRANSCRIPTION)
        printf("%s\n", event->text);
}

// Inside your setup code:
speech_config_t cfg = speech_config_default();
cfg.model_dir = "/opt/speech/models";
cfg.use_qnn = true;  // Hexagon DSP acceleration

speech_pipeline_t p = speech_create(cfg, on_event, NULL);
speech_start(p);
speech_push_audio(p, pcm_samples, 512);  // pcm_samples: buffer holding 512 audio samples
```

```bash
cd linux && ./setup_linux.sh
cmake -B build -DORT_DIR=../ort-linux
cmake --build build
./build/speech_demo --model-dir /path/to/models
```

```bash
source /opt/poky/environment-setup-aarch64-poky-linux
cmake -B build -DCMAKE_TOOLCHAIN_FILE=toolchain-aarch64.cmake -DORT_DIR=...
cmake --build build
```

```
Idle → Listening → Transcribing → Speaking → Idle
         ↑                         |
         └─── resumeListening() ───┘
```

Barge-in supported: speaking during TTS playback interrupts and starts a new transcription.
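
The state flow and barge-in rule above can be sketched as a small transition function. This is an illustration only, not speech-core's implementation: the state and event names are hypothetical, and it assumes that `resumeListening()` returns the pipeline to Listening and that barge-in is triggered by speech detected while Speaking.

```c
#include <assert.h>

typedef enum { ST_IDLE, ST_LISTENING, ST_TRANSCRIBING, ST_SPEAKING } state_t;

/* Hypothetical event names chosen for this sketch. */
typedef enum {
    EV_START,            /* pipeline started */
    EV_SPEECH_DETECTED,  /* user speech detected */
    EV_TRANSCRIPT_DONE,  /* transcription finished, response playback begins */
    EV_RESPONSE_DONE,    /* TTS playback finished */
    EV_RESUME_LISTENING  /* resumeListening() was called */
} event_t;

/* Returns the next state; events that do not apply leave the state unchanged. */
static state_t next_state(state_t s, event_t ev) {
    assert(s <= ST_SPEAKING);
    switch (s) {
    case ST_IDLE:
        if (ev == EV_START || ev == EV_RESUME_LISTENING) return ST_LISTENING;
        break;
    case ST_LISTENING:
        if (ev == EV_SPEECH_DETECTED) return ST_TRANSCRIBING;
        break;
    case ST_TRANSCRIBING:
        if (ev == EV_TRANSCRIPT_DONE) return ST_SPEAKING;
        break;
    case ST_SPEAKING:
        if (ev == EV_SPEECH_DETECTED) return ST_TRANSCRIBING; /* barge-in */
        if (ev == EV_RESUME_LISTENING) return ST_LISTENING;
        if (ev == EV_RESPONSE_DONE) return ST_IDLE;
        break;
    }
    return s;
}
```

Keeping the transitions in one pure function makes the barge-in path (Speaking to Transcribing) easy to test in isolation.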
```
┌──────────────────────────────────────────────┐
│  Android: SpeechPipeline (Kotlin/JNI)        │
│  Linux:   speech.h (C API)                   │
└──────────────────┬───────────────────────────┘
                   │
┌──────────────────┴───────────────────────────┐
│  speech-core (C++ submodule)                 │
│  Turn detection · Interruptions · Context    │
└──┬────────┬────────┬────────┬────────────────┘
   │        │        │        │  vtables
┌──┴──┐  ┌──┴──┐  ┌──┴──┐   ┌─┴────────┐
│ VAD │  │ STT │  │ TTS │   │ Enhancer │
│Sile-│  │Para-│  │Koko-│   │DeepFilter│
│ro   │  │keet │  │ro   │   │Net3      │
└──┬──┘  └──┬──┘  └──┬──┘   └─┬────────┘
   └────────┴────────┴────────┘
     ONNX Runtime (CPU / NNAPI / QNN)
```

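
The diagram shows speech-core reaching each stage through vtables. As a hedged illustration of that pattern in C, here is a table of function pointers for a VAD stage with a toy energy-threshold implementation standing in for Silero. All type and function names are hypothetical; the actual speech-core interfaces may look different.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical vtable for a VAD stage. */
typedef struct {
    /* Returns 1 if the frame contains speech, 0 otherwise. */
    int (*is_speech)(void *self, const float *samples, size_t n);
} vad_vtable_t;

/* A stage handle: the vtable plus the implementation's private state. */
typedef struct {
    const vad_vtable_t *vt;
    void *self;
} vad_t;

/* Toy implementation: mean-energy threshold (not the Silero model). */
typedef struct { float threshold; } energy_vad_t;

static int energy_vad_is_speech(void *self, const float *samples, size_t n) {
    const energy_vad_t *v = self;
    assert(v != NULL && samples != NULL && n > 0);
    float energy = 0.0f;
    for (size_t i = 0; i < n; ++i)
        energy += samples[i] * samples[i];
    return (energy / (float)n) > v->threshold;
}

static const vad_vtable_t ENERGY_VAD_VTABLE = { energy_vad_is_speech };

/* The caller sees only the vtable, so implementations can be swapped. */
static int vad_is_speech(const vad_t *vad, const float *samples, size_t n) {
    return vad->vt->is_speech(vad->self, samples, n);
}
```

With this shape, replacing the toy detector with a model-backed one only means supplying a different vtable and state pointer; the calling code stays the same.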
| Platform | Chipset | Acceleration |
|---|---|---|
| Android | Snapdragon 8 Gen 1+ | NNAPI → Hexagon NPU |
| Android | Samsung Exynos 2200+ | NNAPI → Samsung NPU |
| Android | Google Tensor G2+ | NNAPI → Google TPU |
| Automotive | SA8295P / SA8255P | QNN → Hexagon DSP |
| Any | CPU fallback | XNNPACK |
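
The table amounts to a simple preference order: use the platform's accelerator when it is available, otherwise fall back to the CPU. A minimal sketch of that decision follows; the enums and `pick_backend` are hypothetical, and in the SDK itself the choice is expressed through the `useNnapi` and `use_qnn` configuration flags shown earlier.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { PLATFORM_ANDROID, PLATFORM_LINUX } platform_t;
typedef enum { BACKEND_CPU, BACKEND_NNAPI, BACKEND_QNN } backend_t;

/* Hypothetical selection rule: NNAPI on Android, QNN on embedded Linux,
 * CPU when no accelerator is available. */
static backend_t pick_backend(platform_t platform, bool accel_available) {
    assert(platform == PLATFORM_ANDROID || platform == PLATFORM_LINUX);
    if (!accel_available)
        return BACKEND_CPU;
    return platform == PLATFORM_ANDROID ? BACKEND_NNAPI : BACKEND_QNN;
}
```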
| Repository | Platform |
|---|---|
| speech-swift | Apple (macOS, iOS) — MLX + CoreML |
| speech-core | Cross-platform C++ pipeline engine |
| speech-android | Android + embedded Linux — ONNX Runtime |

License: Apache 2.0