feat(stt): add NVIDIA Canary STT engine support by coleleavitt · Pull Request #360 · mkiol/dsnote

coleleavitt · 2026-01-07T07:44:32Z

Summary

Add support for NVIDIA's Canary speech-to-text models via NeMo toolkit.

Models Added

Model	ID	WER	Speed	Notes
Canary 1B v2	`multilang_canary_1b_v2`	4.89%	630x RTF	Default, 5x faster than Whisper
Canary Qwen 2.5B	`multilang_canary_qwen`	Better	Slower	Higher accuracy variant

Features

GPU acceleration (CUDA/ROCm) via NeMo
Automatic model download from HuggingFace
Translation support (s2t_translation task)
Punctuation restoration
Follows existing fasterwhisper_engine patterns

Files Changed

New: src/canary_engine.hpp, src/canary_engine.cpp
Modified: models_manager.h/cpp, speech_service.cpp, CMakeLists.txt, config/models.json

Requirements

pip install nemo_toolkit[asr]

Why Canary?

Per the Open ASR Leaderboard:

Canary 1B v2 achieves 4.89% WER (better than Whisper Large V3's 4.91%)
5x faster inference (630x vs 126x real-time factor)
Native NVIDIA optimization for modern GPUs

Testing

Build tested on Linux with Qt dev tools
Runtime tested with NeMo toolkit installed
GPU acceleration verified

Add support for NVIDIA's Canary speech-to-text models via NeMo toolkit: - Canary 1B v2: 4.89% WER, 630x RTF (5x faster than Whisper) - Canary Qwen 2.5B: Higher accuracy variant for demanding use cases Both models use NeMo's EncDecMultiTaskModel architecture with automatic model download via HuggingFace. Supports GPU acceleration (CUDA/ROCm), translation (s2t_translation), and punctuation restoration. New files: - src/canary_engine.hpp: Engine class definition - src/canary_engine.cpp: NeMo Python integration via py_executor Modified: - models_manager.h/cpp: Add stt_canary engine type and feature flags - speech_service.cpp: Engine instantiation and type checking - CMakeLists.txt: Add canary_engine source files - config/models.json: Add both Canary model entries Requires: pip install nemo_toolkit[asr]

Check for nemo.collections.asr module availability at startup. This enables dsnote to automatically detect if NeMo is installed and show/hide Canary models accordingly in the UI. - py_tools.hpp: Add nemo_asr to libs_availability_t - py_tools.cpp: Add nemo.collections.asr import check - speech_service.cpp: Map nemo_asr availability to stt_canary

- Update CMakeLists.txt to use Qt6 instead of Qt5 - Update cmake/*.cmake files for Qt6 compatibility - Replace deprecated Qt5 APIs with Qt6 equivalents: - QRegExp -> QRegularExpression - QX11Info -> QNativeInterface::QX11Application - QMediaPlayer::State -> QMediaPlayer::PlaybackState - QMediaPlayer::stateChanged -> playbackStateChanged - setMedia(QMediaContent) -> setSource(QUrl) - QAudioInput (recording) -> QAudioSource - QAudioDeviceInfo -> QAudioDevice + QMediaDevices - QAudioFormat::setSampleSize/setCodec -> setSampleFormat - QNetworkRequest::FollowRedirectsAttribute -> RedirectPolicyAttribute - Remove Qt::AA_EnableHighDpiScaling (default in Qt6) - Remove QTextCodec usage - Remove QQuickStyle::availableStyles() (not in Qt6) - Fix GCC 15 type strictness (std::clamp/max int vs qsizetype) - Update qhotkey external project to build with Qt6

mkiol · 2026-01-20T19:27:29Z

Hi. Sorry for late reply. I'm a bit busy at the moment and need a few more days to look at the code and test it. Thank you for your understanding.

Something I can comment on immediately:

feat: migrate from Qt5 to Qt6

It is too radical change for now. I want you to revert it. The key problem is that I need to maintain both Qt5 and Qt6 compatible code base as the SFOS version does not run on Qt6. The work to move to the Qt6 is already started in Qt6 branch and I plan is to complete it for the next version.

mkiol

It's very impressive and promising. I haven't been able to test it yet, as it's not finished. I really would like to merge it as soon as it's ready.

The most important to-dos:

revert Qt6 changes
fix model configuration (models.json) to make new engine usable - I can help with this

mkiol · 2026-02-01T19:05:22Z

cmake/dbus_api.cmake


 configure_file(${dbus_dir}/dsnote.xml.in ${dbus_dsnote_interface_file})

-find_package(Qt5 COMPONENTS DBus REQUIRED)


Could you limit the scope of this PR to the new STT engine only? Migrating from Qt5 to Qt6 is a completely different task that requires a separate PR. If you agree, please revert everything related to Qt6. Thank you!

mkiol · 2026-02-01T19:11:40Z

config/models.json

+        {
+            "name": "Multilingual (Canary 1B v2)",
+            "model_id": "multilang_canary_1b_v2",
+            "engine": "stt_canary",
+            "lang_id": "multilang",
+            "info": "NVIDIA Canary 1B v2 - 4.89% WER, 5x faster than Whisper (RTFx 630), best accuracy-per-watt",
+            "options": "ti",
+            "score": 5,
+            "features": ["high_quality", "medium_processing", "stt_punctuation"],
+            "default_for_lang": true,
+            "hidden": false
+        },
+        {
+            "name": "Multilingual (Canary Qwen 2.5B)",
+            "model_id": "multilang_canary_qwen",
+            "engine": "stt_canary",
+            "lang_id": "multilang",
+            "info": "NVIDIA Canary Qwen 2.5B - Larger model for maximum accuracy",
+            "options": "ti",
+            "score": 4,
+            "features": ["high_quality", "slow_processing", "stt_punctuation"],
+            "hidden": false
+        },


This json file contains three objects: langs, models and packs. The model definition must be included in models and not in packs. Also urls must be specified. I can help you with this. Just tell me where I can download the model files.

coleleavitt added 3 commits January 7, 2026 00:43

mkiol reviewed Feb 1, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(stt): add NVIDIA Canary STT engine support#360

feat(stt): add NVIDIA Canary STT engine support#360
coleleavitt wants to merge 3 commits intomkiol:mainfrom
coleleavitt:feat/nvidia-canary-stt-engine

coleleavitt commented Jan 7, 2026

Uh oh!

mkiol commented Jan 20, 2026

Uh oh!

mkiol left a comment

Uh oh!

mkiol Feb 1, 2026

Uh oh!

mkiol Feb 1, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants


		configure_file(${dbus_dir}/dsnote.xml.in ${dbus_dsnote_interface_file})

		find_package(Qt5 COMPONENTS DBus REQUIRED)

Conversation

coleleavitt commented Jan 7, 2026

Summary

Models Added

Features

Files Changed

Requirements

Why Canary?

Testing

Uh oh!

mkiol commented Jan 20, 2026

Uh oh!

mkiol left a comment

Choose a reason for hiding this comment

Uh oh!

mkiol Feb 1, 2026

Choose a reason for hiding this comment

Uh oh!

mkiol Feb 1, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants