Stub optional streaming test deps on Windows #7800

Open
tianmind-studio wants to merge 3 commits into BasedHardware:main from tianmind-studio:codex/windows-dg-streaming-optional-stubs

@tianmind-studio tianmind-studio commented Jun 10, 2026

Contributor

Summary

  • Stub utils.stt.speaker_embedding in the Deepgram start-guard and streaming backoff unit tests so they do not require SciPy in a minimal backend test environment.
  • Stub the narrow utils.stt.vad surface needed by the streaming backoff death-reason tests so they can import GatedDeepgramSocket without installing onnxruntime.
  • Stub optional Deepgram/websockets/speaker-embedding imports in test_parakeet_diarization.py, including a local cosine distance helper, so the diarization tests stay focused on speaker clustering and fallback behavior.
  • Stub the narrow SciPy cdist and torch buffer/model-loading surfaces in test_parakeet_stream_session.py so Parakeet stream-session tests run in a lightweight Windows backend venv without native ML dependencies.
  • Keep production streaming, speaker embedding, VAD, Parakeet, and audio-session code unchanged.

Why

On this Windows backend venv, several streaming-related unit tests failed during collection before reaching their assertions because imports pulled in optional native/transitive dependencies:

  • scipy through utils.stt.speaker_embedding and Parakeet stream-session code
  • onnxruntime through utils.stt.vad when the backoff tests import GatedDeepgramSocket
  • websockets/Deepgram client modules through utils.stt.streaming in Parakeet diarization tests
  • torch model-loading helpers before the stream-session tests can use their manual VAD mock

These tests exercise Deepgram connection retry/start/death-reason behavior and Parakeet diarization/session control flow, not hosted speaker embedding, ONNX VAD inference, SciPy distance kernels, or torch model loading. Local test stubs keep the unit tests focused and runnable in lightweight Windows environments.
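The collection failure and the stub-based fix described above can be sketched as follows. This is a minimal illustration, not the code in this PR: `hypothetical_native_dep` and `import_consumer` are made-up names standing in for an optional native package and for production code that imports it at module load time.

```python
import importlib
import sys
from types import ModuleType
from unittest.mock import MagicMock

# Hypothetical name standing in for an optional native dependency
# that is not installed in the lightweight test environment.
MISSING = "hypothetical_native_dep"


def import_consumer():
    """Mimic production code whose import needs the optional dependency."""
    return importlib.import_module(MISSING)


# Without a stub, the import fails before any test assertion can run.
try:
    import_consumer()
    failed_without_stub = False
except ModuleNotFoundError:
    failed_without_stub = True

# Register a stub that exposes only the narrow surface the test path touches.
stub = ModuleType(MISSING)
stub.cdist = MagicMock(return_value=[[0.0]])  # assumed surface, for illustration
sys.modules.setdefault(MISSING, stub)

# The import system finds the stub in sys.modules and never looks on disk.
resolved = import_consumer()
```

The stub must be registered before the first import of the consuming module, which is why this kind of setup sits at module level at the top of a test file.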

Testing

  • python -m pytest tests\unit\test_dg_start_guard.py -q -> 2 passed
  • python -m pytest tests\unit\test_streaming_deepgram_backoff.py -q -> 79 passed
  • python -m pytest tests\unit\test_dg_start_guard.py tests\unit\test_streaming_deepgram_backoff.py -q -> 81 passed
  • python -m pytest tests\unit\test_parakeet_diarization.py tests\unit\test_parakeet_stream_session.py -q -> 19 passed
  • python -m pytest tests\unit\test_dg_start_guard.py tests\unit\test_streaming_deepgram_backoff.py tests\unit\test_parakeet_diarization.py tests\unit\test_parakeet_stream_session.py -q -> 100 passed
  • python -m black --line-length 120 --skip-string-normalization tests\unit\test_dg_start_guard.py tests\unit\test_streaming_deepgram_backoff.py tests\unit\test_parakeet_diarization.py tests\unit\test_parakeet_stream_session.py --check
  • python -m py_compile tests\unit\test_dg_start_guard.py tests\unit\test_streaming_deepgram_backoff.py tests\unit\test_parakeet_diarization.py tests\unit\test_parakeet_stream_session.py
  • git diff --check -- backend/tests/unit/test_dg_start_guard.py backend/tests/unit/test_streaming_deepgram_backoff.py backend/tests/unit/test_parakeet_diarization.py backend/tests/unit/test_parakeet_stream_session.py

greptile-apps Bot commented Jun 10, 2026

Contributor

Greptile Summary

This PR adds lightweight sys.modules stubs for utils.stt.speaker_embedding and utils.stt.vad so that two Deepgram unit test files can be collected and run on Windows without scipy or onnxruntime installed. Production code is untouched.

  • test_dg_start_guard.py gains a ModuleType stub for speaker_embedding covering the three names streaming.py imports (SPEAKER_MATCH_THRESHOLD, async_extract_embedding_from_bytes, compare_embeddings).
  • test_streaming_deepgram_backoff.py gains both the same speaker_embedding stub (via setdefault) and a vad stub covering all four names vad_gate.py imports (_get_ort_session, make_fresh_state, run_vad_window, VAD_WINDOW_SAMPLES), enabling GatedDeepgramSocket death-reason tests to import without onnxruntime.
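As a rough sketch of what a stub covering those four `vad` names could look like: the attribute names come from the summary above, while the return values, the `VAD_WINDOW_SAMPLES` value, and the `make_stub` helper are assumptions chosen for illustration and are not taken from the PR diff.

```python
import importlib
import sys
from types import ModuleType
from unittest.mock import MagicMock


def make_stub(name, **attrs):
    """Build a ModuleType carrying only the given attributes (hypothetical helper)."""
    module = ModuleType(name)
    for attr_name, value in attrs.items():
        setattr(module, attr_name, value)
    return module


vad_stub = make_stub(
    "utils.stt.vad",
    _get_ort_session=MagicMock(return_value=None),
    make_fresh_state=MagicMock(return_value={}),
    run_vad_window=MagicMock(return_value=0.0),
    VAD_WINDOW_SAMPLES=512,  # assumed value, not confirmed by the PR
)

# setdefault returns whichever module ends up registered under the name.
registered = sys.modules.setdefault("utils.stt.vad", vad_stub)

# Check that every name the consumer imports is present on the registered module.
REQUIRED = ("_get_ort_session", "make_fresh_state", "run_vad_window", "VAD_WINDOW_SAMPLES")
missing = [name for name in REQUIRED if not hasattr(registered, name)]
```

A check like `missing` is a cheap way to notice when the production module starts importing a name the stub does not provide.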

Confidence Score: 4/5

Safe to merge — test-only change with no production code modified; both stubs cover exactly the symbols imported at collection time and the combined run passes 81 tests.

The stub attributes match what streaming.py and vad_gate.py actually import from the real modules. The minor inconsistency is that test_dg_start_guard.py force-assigns to sys.modules rather than using setdefault like its companion file, which could silently replace the registered stub object mid-session when files are collected in a specific order. This is harmless today because the stubs are functionally identical and streaming.py's from-import bindings are already resolved, but it is a latent maintenance trap if the stubs ever diverge.

backend/tests/unit/test_dg_start_guard.py — the hard sys.modules assignment on line 56 is the only point worth a second look.

Important Files Changed

| Filename | Overview |
| --- | --- |
| backend/tests/unit/test_dg_start_guard.py | Adds a speaker_embedding stub (ModuleType with SPEAKER_MATCH_THRESHOLD, AsyncMock, MagicMock) to allow collection without scipy; uses a hard sys.modules assignment rather than setdefault, inconsistent with the companion test file's strategy. |
| backend/tests/unit/test_streaming_deepgram_backoff.py | Adds both speaker_embedding and vad stubs using setdefault; all four names imported by vad_gate.py (_get_ort_session, make_fresh_state, run_vad_window, VAD_WINDOW_SAMPLES) and the three names imported by streaming.py are correctly covered. |

Sequence Diagram

sequenceDiagram
    participant pytest
    participant dg_start_guard as test_dg_start_guard.py
    participant backoff as test_streaming_deepgram_backoff.py
    participant sys_modules as sys.modules
    participant streaming as utils.stt.streaming
    participant vad_gate as utils.stt.vad_gate

    note over pytest: Collection phase

    pytest->>dg_start_guard: collect (module-level code)
    dg_start_guard->>sys_modules: setdefault(deepgram, database, ...) stubs
    dg_start_guard->>sys_modules: force-write speaker_embedding stub
    dg_start_guard->>streaming: import connect_to_deepgram
    streaming->>sys_modules: resolve speaker_embedding stub symbols

    pytest->>backoff: collect (module-level code)
    backoff->>sys_modules: setdefault(deepgram, database, ...) stubs
    backoff->>sys_modules: setdefault speaker_embedding stub (no-op if already set)
    backoff->>sys_modules: setdefault vad stub
    backoff->>streaming: import (already cached)
    backoff->>vad_gate: import GatedDeepgramSocket
    vad_gate->>sys_modules: resolve vad stub symbols

    note over pytest: Test execution phase
    pytest->>dg_start_guard: run 2 tests
    pytest->>backoff: run 79 tests

Reviews (1): Last reviewed commit: "test(backend): stub optional streaming d..."

Comment on lines +52 to +56
_speaker_embedding = ModuleType('utils.stt.speaker_embedding')
_speaker_embedding.SPEAKER_MATCH_THRESHOLD = 0.45
_speaker_embedding.async_extract_embedding_from_bytes = AsyncMock(return_value=None)
_speaker_embedding.compare_embeddings = MagicMock(return_value=0.0)
sys.modules['utils.stt.speaker_embedding'] = _speaker_embedding

Contributor

P2 Hard overwrite vs setdefault inconsistency

test_streaming_deepgram_backoff.py uses sys.modules.setdefault(...) for this same stub while this file uses a bare assignment. If pytest collects the backoff file first (installing its stub), this line then replaces sys.modules['utils.stt.speaker_embedding'] with a different ModuleType object — a silent no-op in practice because streaming.py's from ... import bindings are already resolved to the earlier stub's objects, but it can confuse future maintainers and any tool that inspects sys.modules after collection. Using setdefault here too would be consistent with the existing pattern in the companion file and is also less surprising given the NOTE about avoiding pollution already present in this file.
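The difference between the two registration styles can be reproduced in isolation. The sketch below uses a hypothetical module name rather than `utils.stt.speaker_embedding`, and models the consumer's `from ... import` binding with a plain attribute lookup.

```python
import importlib
import sys
from types import ModuleType
from unittest.mock import MagicMock

NAME = "hypothetical_speaker_embedding"  # made-up module name for illustration

# First test file registers its stub.
first = ModuleType(NAME)
first.compare_embeddings = MagicMock(return_value=0.0)
sys.modules.setdefault(NAME, first)

# A consumer binds the function at import time, as `from NAME import compare_embeddings` would.
bound = importlib.import_module(NAME).compare_embeddings

# Second test file builds its own stub object.
second = ModuleType(NAME)
second.compare_embeddings = MagicMock(return_value=1.0)

# setdefault keeps the module that is already registered.
kept = sys.modules.setdefault(NAME, second)

# A bare assignment replaces the registered module object...
sys.modules[NAME] = second
replaced = sys.modules[NAME]

# ...but the consumer's earlier binding still points at the first stub's mock.
still_first = bound is first.compare_embeddings
```

After the bare assignment, `sys.modules` and the consumer disagree about which stub is live, which is the kind of divergence the comment above warns about.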

Contributor Author

Addressed in 1091d01d5 by switching this stub registration to sys.modules.setdefault(...).

I also applied the same pattern to the new Parakeet diarization test stub so the PR consistently avoids replacing an existing utils.stt.speaker_embedding module during collection.

Revalidated on the Windows backend venv:

  • python -m pytest tests\unit\test_dg_start_guard.py tests\unit\test_streaming_deepgram_backoff.py tests\unit\test_parakeet_diarization.py tests\unit\test_parakeet_stream_session.py -q -> 100 passed
  • python -m black --line-length 120 --skip-string-normalization tests\unit\test_dg_start_guard.py tests\unit\test_streaming_deepgram_backoff.py tests\unit\test_parakeet_diarization.py tests\unit\test_parakeet_stream_session.py --check
  • python -m py_compile tests\unit\test_dg_start_guard.py tests\unit\test_streaming_deepgram_backoff.py tests\unit\test_parakeet_diarization.py tests\unit\test_parakeet_stream_session.py

Contributor Author

Expanded this PR with the same Windows lightweight-test-environment fix pattern for the Parakeet tests:

  • test_parakeet_diarization.py now stubs optional Deepgram/websockets/speaker-embedding imports and uses a local cosine distance helper so it can exercise diarization clustering/fallback behavior without hosted embedding/native deps.
  • test_parakeet_stream_session.py now stubs the narrow SciPy cdist and torch buffer/model-loading surfaces needed by the test path, while still using the test's manual VAD mock.
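For illustration, a dependency-free cosine distance helper of the kind mentioned above might look like the following. This is a sketch under stated assumptions and not the helper from the PR: the function names are hypothetical, and returning 1.0 for a zero-length vector is a choice made here for simplicity.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity for two equal-length numeric sequences."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Assumption: treat a zero vector as maximally dissimilar instead of dividing by zero.
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def cdist_cosine(rows_a, rows_b):
    """Pairwise cosine distances as nested lists, a narrow stand-in for a cdist-style call."""
    return [[cosine_distance(a, b) for b in rows_b] for a in rows_a]
```

A helper like this keeps clustering tests deterministic without pulling in native numeric libraries; it is not meant to match a library implementation on edge cases such as zero vectors.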

Local validation on the Windows backend venv:

  • python -m pytest tests\unit\test_dg_start_guard.py tests\unit\test_streaming_deepgram_backoff.py tests\unit\test_parakeet_diarization.py tests\unit\test_parakeet_stream_session.py -q -> 100 passed
  • python -m black --line-length 120 --skip-string-normalization tests\unit\test_dg_start_guard.py tests\unit\test_streaming_deepgram_backoff.py tests\unit\test_parakeet_diarization.py tests\unit\test_parakeet_stream_session.py --check
  • python -m py_compile tests\unit\test_dg_start_guard.py tests\unit\test_streaming_deepgram_backoff.py tests\unit\test_parakeet_diarization.py tests\unit\test_parakeet_stream_session.py
  • git diff --check -- backend/tests/unit/test_dg_start_guard.py backend/tests/unit/test_streaming_deepgram_backoff.py backend/tests/unit/test_parakeet_diarization.py backend/tests/unit/test_parakeet_stream_session.py
