Stub optional streaming test deps on Windows by tianmind-studio · Pull Request #7800 · BasedHardware/omi

tianmind-studio · 2026-06-10T13:00:30Z

Summary

Stub utils.stt.speaker_embedding in the Deepgram start-guard and streaming backoff unit tests so they do not require SciPy in a minimal backend test environment.
Stub the narrow utils.stt.vad surface needed by the streaming backoff death-reason tests so they can import GatedDeepgramSocket without installing onnxruntime.
Stub optional Deepgram/websockets/speaker-embedding imports in test_parakeet_diarization.py, including a local cosine distance helper, so the diarization tests stay focused on speaker clustering and fallback behavior.
Stub the narrow SciPy cdist and torch buffer/model-loading surfaces in test_parakeet_stream_session.py so Parakeet stream-session tests run in a lightweight Windows backend venv without native ML dependencies.
Keep production streaming, speaker embedding, VAD, Parakeet, and audio-session code unchanged.

Why

On a lightweight Windows backend venv, several streaming-related unit tests failed during collection, before reaching their assertions, because their imports pulled in optional native or transitive dependencies:

scipy through utils.stt.speaker_embedding and Parakeet stream-session code
onnxruntime through utils.stt.vad when the backoff tests import GatedDeepgramSocket
websockets/Deepgram client modules through utils.stt.streaming in Parakeet diarization tests
torch model-loading helpers before the stream-session tests can use their manual VAD mock

These tests exercise Deepgram connection retry/start/death-reason behavior and Parakeet diarization/session control flow, not hosted speaker embedding, ONNX VAD inference, SciPy distance kernels, or torch model loading. Local test stubs keep the unit tests focused and runnable in lightweight Windows environments.
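To make the collection-time failure concrete, here is a minimal sketch of the mechanism, not code from this PR. The module name `hypothetical_native_dep` and the helper `try_import` are made up for illustration; the only behavior relied on is that Python's import system consults `sys.modules` before searching for a module on disk.

```python
import importlib
import sys
from types import ModuleType

# Hypothetical stand-in for an optional native dependency that is not installed.
MISSING = 'hypothetical_native_dep'


def try_import(name):
    """Report whether importing `name` succeeds, mirroring what pytest hits at collection."""
    try:
        importlib.import_module(name)
        return 'imported'
    except ModuleNotFoundError as exc:
        return f'failed: {exc.name}'


# Without a stub, the import raises before any test body could run.
before = try_import(MISSING)

# Registering an empty placeholder module makes the same import succeed.
sys.modules.setdefault(MISSING, ModuleType(MISSING))
after = try_import(MISSING)

print(before)  # failed: hypothetical_native_dep
print(after)   # imported
```

The stub has to be registered before the module under test is first imported, which is why the registration lives in module-level code at the top of each test file.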

Testing

python -m pytest tests\unit\test_dg_start_guard.py -q -> 2 passed
python -m pytest tests\unit\test_streaming_deepgram_backoff.py -q -> 79 passed
python -m pytest tests\unit\test_dg_start_guard.py tests\unit\test_streaming_deepgram_backoff.py -q -> 81 passed
python -m pytest tests\unit\test_parakeet_diarization.py tests\unit\test_parakeet_stream_session.py -q -> 19 passed
python -m pytest tests\unit\test_dg_start_guard.py tests\unit\test_streaming_deepgram_backoff.py tests\unit\test_parakeet_diarization.py tests\unit\test_parakeet_stream_session.py -q -> 100 passed
python -m black --line-length 120 --skip-string-normalization tests\unit\test_dg_start_guard.py tests\unit\test_streaming_deepgram_backoff.py tests\unit\test_parakeet_diarization.py tests\unit\test_parakeet_stream_session.py --check
python -m py_compile tests\unit\test_dg_start_guard.py tests\unit\test_streaming_deepgram_backoff.py tests\unit\test_parakeet_diarization.py tests\unit\test_parakeet_stream_session.py
git diff --check -- backend/tests/unit/test_dg_start_guard.py backend/tests/unit/test_streaming_deepgram_backoff.py backend/tests/unit/test_parakeet_diarization.py backend/tests/unit/test_parakeet_stream_session.py

greptile-apps · 2026-06-10T13:04:32Z

Greptile Summary

This PR adds lightweight sys.modules stubs for utils.stt.speaker_embedding and utils.stt.vad so that two Deepgram unit test files can be collected and run on Windows without scipy or onnxruntime installed. Production code is untouched.

test_dg_start_guard.py gains a ModuleType stub for speaker_embedding covering the three names streaming.py imports (SPEAKER_MATCH_THRESHOLD, async_extract_embedding_from_bytes, compare_embeddings).
test_streaming_deepgram_backoff.py gains both the same speaker_embedding stub (via setdefault) and a vad stub covering all four names vad_gate.py imports (_get_ort_session, make_fresh_state, run_vad_window, VAD_WINDOW_SAMPLES), enabling GatedDeepgramSocket death-reason tests to import without onnxruntime.
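As a rough illustration of why the stub must cover every imported name, the sketch below builds a `utils.stt.vad` placeholder with the four names listed above and then attempts a from-import of a name that was not stubbed. The value `512` for `VAD_WINDOW_SAMPLES` and the name `not_stubbed` are placeholders chosen for this example, not values taken from the repository.

```python
import sys
from types import ModuleType
from unittest.mock import MagicMock

VAD_NAME = 'utils.stt.vad'

vad_stub = ModuleType(VAD_NAME)
vad_stub._get_ort_session = MagicMock(name='_get_ort_session')
vad_stub.make_fresh_state = MagicMock(name='make_fresh_state')
vad_stub.run_vad_window = MagicMock(name='run_vad_window')
vad_stub.VAD_WINDOW_SAMPLES = 512  # placeholder value, assumed for illustration only

# setdefault leaves any module that is already registered under this name in place.
sys.modules.setdefault(VAD_NAME, vad_stub)

# The same style of from-import that a consumer such as vad_gate.py would perform.
from utils.stt.vad import _get_ort_session, make_fresh_state, run_vad_window, VAD_WINDOW_SAMPLES

# A name missing from the stub still fails at import time, so the stub surface
# has to track what the importing module actually asks for.
try:
    from utils.stt.vad import not_stubbed  # hypothetical name, absent from the stub
    missing_name_error = None
except ImportError as exc:
    missing_name_error = exc
```

Keeping the stub to exactly the imported names keeps it narrow, at the cost of needing an update whenever the importing module starts using another symbol.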

Confidence Score: 4/5

Safe to merge: this is a test-only change with no production code modified; both stubs cover exactly the symbols imported at collection time, and the combined run passes 81 tests.

The stub attributes match what streaming.py and vad_gate.py actually import from the real modules. The minor inconsistency is that test_dg_start_guard.py force-assigns to sys.modules rather than using setdefault like its companion file, which could silently replace the registered stub object mid-session when files are collected in a specific order. This is harmless today because the stubs are functionally identical and streaming.py's from-import bindings are already resolved, but it is a latent maintenance trap if the stubs ever diverge.

backend/tests/unit/test_dg_start_guard.py: the hard sys.modules assignment on line 56 is the only point worth a second look.

Important Files Changed

Filename	Overview
backend/tests/unit/test_dg_start_guard.py	Adds a speaker_embedding stub (ModuleType with SPEAKER_MATCH_THRESHOLD, AsyncMock, MagicMock) to allow collection without scipy; uses a hard sys.modules assignment rather than setdefault, inconsistent with the companion test file's strategy.
backend/tests/unit/test_streaming_deepgram_backoff.py	Adds both speaker_embedding and vad stubs using setdefault; all four names imported by vad_gate.py (_get_ort_session, make_fresh_state, run_vad_window, VAD_WINDOW_SAMPLES) and the three names imported by streaming.py are correctly covered.

Sequence Diagram

sequenceDiagram
    participant pytest
    participant dg_start_guard as test_dg_start_guard.py
    participant backoff as test_streaming_deepgram_backoff.py
    participant sys_modules as sys.modules
    participant streaming as utils.stt.streaming
    participant vad_gate as utils.stt.vad_gate

    note over pytest: Collection phase

    pytest->>dg_start_guard: collect (module-level code)
    dg_start_guard->>sys_modules: setdefault(deepgram, database, ...) stubs
    dg_start_guard->>sys_modules: force-write speaker_embedding stub
    dg_start_guard->>streaming: import connect_to_deepgram
    streaming->>sys_modules: resolve speaker_embedding stub symbols

    pytest->>backoff: collect (module-level code)
    backoff->>sys_modules: setdefault(deepgram, database, ...) stubs
    backoff->>sys_modules: setdefault speaker_embedding stub (no-op if already set)
    backoff->>sys_modules: setdefault vad stub
    backoff->>streaming: import (already cached)
    backoff->>vad_gate: import GatedDeepgramSocket
    vad_gate->>sys_modules: resolve vad stub symbols

    note over pytest: Test execution phase
    pytest->>dg_start_guard: run 2 tests
    pytest->>backoff: run 79 tests

Reviews (1): Last reviewed commit: "test(backend): stub optional streaming d..."

greptile-apps · 2026-06-10T13:04:36Z

+_speaker_embedding = ModuleType('utils.stt.speaker_embedding')
+_speaker_embedding.SPEAKER_MATCH_THRESHOLD = 0.45
+_speaker_embedding.async_extract_embedding_from_bytes = AsyncMock(return_value=None)
+_speaker_embedding.compare_embeddings = MagicMock(return_value=0.0)
+sys.modules['utils.stt.speaker_embedding'] = _speaker_embedding


Hard overwrite vs setdefault inconsistency

test_streaming_deepgram_backoff.py uses sys.modules.setdefault(...) for this same stub, while this file uses a bare assignment. If pytest collects the backoff file first (installing its stub), this line then replaces sys.modules['utils.stt.speaker_embedding'] with a different ModuleType object. That is harmless in practice, because streaming.py's from ... import bindings are already resolved to the earlier stub's objects, but it can confuse future maintainers and any tool that inspects sys.modules after collection. Using setdefault here too would be consistent with the existing pattern in the companion file, and less surprising given the NOTE about avoiding pollution already present in this file.

tianmind-studio · 2026-06-10

Addressed in 1091d01d5 by switching this stub registration to sys.modules.setdefault(...).

I also applied the same pattern to the new Parakeet diarization test stub so the PR consistently avoids replacing an existing utils.stt.speaker_embedding module during collection.

Revalidated on the Windows backend venv:

python -m pytest tests\unit\test_dg_start_guard.py tests\unit\test_streaming_deepgram_backoff.py tests\unit\test_parakeet_diarization.py tests\unit\test_parakeet_stream_session.py -q -> 100 passed

python -m black --line-length 120 --skip-string-normalization tests\unit\test_dg_start_guard.py tests\unit\test_streaming_deepgram_backoff.py tests\unit\test_parakeet_diarization.py tests\unit\test_parakeet_stream_session.py --check

python -m py_compile tests\unit\test_dg_start_guard.py tests\unit\test_streaming_deepgram_backoff.py tests\unit\test_parakeet_diarization.py tests\unit\test_parakeet_stream_session.py
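For readers following the setdefault discussion in this thread, a small sketch of the difference may help. It is an illustration under assumptions, not the test files' code: the two stubs are deliberately given different thresholds (the `0.9` is hypothetical) to show what happens once stubs drift apart.

```python
import sys
from types import ModuleType

NAME = 'utils.stt.speaker_embedding'
sys.modules.pop(NAME, None)  # start from a clean slate for the demo

# First file collected: installs its stub, and a consumer binds a name from it.
first = ModuleType(NAME)
first.SPEAKER_MATCH_THRESHOLD = 0.45
sys.modules.setdefault(NAME, first)
from utils.stt.speaker_embedding import SPEAKER_MATCH_THRESHOLD as bound_early

# Second file collected: its stub has drifted (hypothetical value).
second = ModuleType(NAME)
second.SPEAKER_MATCH_THRESHOLD = 0.9

# setdefault keeps the entry that is already registered.
kept = sys.modules.setdefault(NAME, second)
assert kept is first

# A bare assignment swaps the registered object; earlier bindings do not change,
# but every import after this point sees the new stub.
sys.modules[NAME] = second
from utils.stt.speaker_embedding import SPEAKER_MATCH_THRESHOLD as bound_late

print(bound_early, bound_late)  # 0.45 0.9
```

With identical stubs the swap is invisible, which matches the review's assessment; the mismatch only appears once the two definitions diverge, and setdefault rules that case out by construction.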

tianmind-studio · 2026-06-10T15:17:34Z

Expanded this PR with the same Windows lightweight-test-environment fix pattern for the Parakeet tests:

test_parakeet_diarization.py now stubs optional Deepgram/websockets/speaker-embedding imports and uses a local cosine distance helper so it can exercise diarization clustering/fallback behavior without hosted embedding/native deps.
test_parakeet_stream_session.py now stubs the narrow SciPy cdist and torch buffer/model-loading surfaces needed by the test path, while still using the test's manual VAD mock.
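The PR's actual helper and cdist stub are not shown in this thread, so the following is only a sketch of what such a narrow surface could look like. The names `cosine_distance` and `cdist_stub`, the pure-Python list return type, and the choice to return 1.0 for zero vectors are all assumptions made for this example; the real stubs may well return NumPy arrays to match SciPy.

```python
import math
import sys
from types import ModuleType


def cosine_distance(a, b):
    """Cosine distance 1 - cos(a, b) for two equal-length sequences of numbers."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # assumption: treat a zero vector as maximally dissimilar
    return 1.0 - dot / (norm_a * norm_b)


def cdist_stub(xa, xb, metric='cosine'):
    """Pairwise distance matrix as nested lists; supports only the cosine metric."""
    if metric != 'cosine':
        raise NotImplementedError(f'stub only supports cosine, got {metric!r}')
    return [[cosine_distance(a, b) for b in xb] for a in xa]


# Register placeholder modules so `from scipy.spatial.distance import cdist` resolves.
scipy_stub = ModuleType('scipy')
spatial_stub = ModuleType('scipy.spatial')
distance_stub = ModuleType('scipy.spatial.distance')
distance_stub.cdist = cdist_stub
spatial_stub.distance = distance_stub
scipy_stub.spatial = spatial_stub
for name, module in (
    ('scipy', scipy_stub),
    ('scipy.spatial', spatial_stub),
    ('scipy.spatial.distance', distance_stub),
):
    sys.modules.setdefault(name, module)

print(cdist_stub([[1.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]]))  # [[0.0, 1.0]]
```

A stub this small is enough when the tests only need relative distances between a handful of embeddings; anything exercising SciPy's numerical behavior would still need the real library.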

Local validation on the Windows backend venv:

python -m pytest tests\unit\test_dg_start_guard.py tests\unit\test_streaming_deepgram_backoff.py tests\unit\test_parakeet_diarization.py tests\unit\test_parakeet_stream_session.py -q -> 100 passed
python -m black --line-length 120 --skip-string-normalization tests\unit\test_dg_start_guard.py tests\unit\test_streaming_deepgram_backoff.py tests\unit\test_parakeet_diarization.py tests\unit\test_parakeet_stream_session.py --check
python -m py_compile tests\unit\test_dg_start_guard.py tests\unit\test_streaming_deepgram_backoff.py tests\unit\test_parakeet_diarization.py tests\unit\test_parakeet_stream_session.py
git diff --check -- backend/tests/unit/test_dg_start_guard.py backend/tests/unit/test_streaming_deepgram_backoff.py backend/tests/unit/test_parakeet_diarization.py backend/tests/unit/test_parakeet_stream_session.py

test(backend): stub optional streaming deps on Windows

5fb8c9a

greptile-apps Bot reviewed Jun 10, 2026

test(backend): stub parakeet optional deps on Windows

92414a8

test(backend): avoid replacing streaming test stubs

1091d01

tianmind-studio wants to merge 3 commits into BasedHardware:main from tianmind-studio:codex/windows-dg-streaming-optional-stubs
