@yujonglee yujonglee commented Dec 5, 2025

refactor(owhisper-client): extract shared utilities for adapters

Summary

This PR reduces code duplication in the STT adapter implementations by extracting common patterns into shared utility modules:

  • adapter/audio.rs: Shared audio decoding utilities (decode_audio_to_linear16, decode_audio_to_bytes, mix_to_mono) that were previously duplicated across deepgram, argmax, and assemblyai batch adapters
  • adapter/http.rs: Shared HTTP error handling (ensure_success, parse_json_response, parse_provider_json) to standardize response validation
  • parsing.rs: Added TranscriptResponseBuilder and build_transcript_response helper for constructing StreamResponse objects
  • test_utils.rs: Added define_realtime_e2e_tests! macro to reduce boilerplate in E2E tests

The deepgram, argmax, and assemblyai batch adapters have been refactored to use these shared utilities.

Updates since last revision

  • Fixed audio resampling: The original duplicated code passed the source sample_rate to resample_audio() as the target rate, so the call was a no-op. The shared code now resamples to 16kHz (TARGET_SAMPLE_RATE = 16000), the standard rate expected by STT services.
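To make the no-op concrete, here is a minimal sketch of a linear-interpolation resampler. The function `resample_linear` is hypothetical and is not the `resample_audio` helper from the audio-utils crate, whose algorithm and signature are not shown in this PR; only the constant name `TARGET_SAMPLE_RATE` comes from the description above.

```rust
/// Target rate named in the PR description.
pub const TARGET_SAMPLE_RATE: u32 = 16_000;

/// Hypothetical linear-interpolation resampler (an assumption for illustration,
/// not the real `resample_audio`).
pub fn resample_linear(samples: &[f32], from_rate: u32, to_rate: u32) -> Vec<f32> {
    // Passing the source rate as the target rate changes nothing: this is the no-op.
    if from_rate == to_rate || samples.is_empty() {
        return samples.to_vec();
    }
    let out_len = (samples.len() as u64 * to_rate as u64 / from_rate as u64) as usize;
    let step = from_rate as f64 / to_rate as f64;
    let last = samples.len() - 1;
    (0..out_len)
        .map(|i| {
            // Position of output sample `i` on the input timeline.
            let pos = i as f64 * step;
            let idx = pos.floor() as usize;
            let frac = (pos - idx as f64) as f32;
            let a = samples[idx.min(last)];
            let b = samples[(idx + 1).min(last)];
            a + (b - a) * frac
        })
        .collect()
}
```

With one second of 48kHz input, `resample_linear(&src, 48_000, 48_000)` returns the input unchanged, while `resample_linear(&src, 48_000, TARGET_SAMPLE_RATE)` returns 16,000 samples.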

Review & Testing Checklist for Human

  • Verify 16kHz resampling is correct: The shared decode_audio_to_linear16 now resamples all audio to 16kHz. The original code was NOT resampling (passing source rate was a no-op). Confirm this behavioral change is intended and doesn't break STT providers that expect different sample rates.
  • Run E2E tests with real STT providers: Use infisical run --env=dev --projectId=87dad7b5-72a6-4791-9228-b3b86b169db1 --path="/stt" -- cargo test --ignored to verify deepgram, argmax, and assemblyai batch transcription still works with the 16kHz resampling
  • Verify stereo-to-mono mixing is unchanged: The mix_to_mono function was extracted from duplicated code - confirm the mixing logic produces identical results
  • Decide on unused utilities: TranscriptResponseBuilder, build_transcript_response, parse_json_response, and parse_provider_json are added but not yet used (compiler warnings confirm). Decide if these should be removed or kept for follow-up work

Notes

  • The new utilities in parsing.rs and some in http.rs are not yet adopted by adapters - they were added as infrastructure for future refactoring
  • JSON parse errors in http.rs are mapped to Error::AudioProcessing as a workaround to avoid adding a new error variant - this is semantically imprecise
  • The E2E test macro was added but existing tests weren't migrated to use it yet

Link to Devin run: https://app.devin.ai/sessions/127bbb6142c340ffba9fedd68f22ed9c
Requested by: yujonglee (@yujonglee)

- Add shared audio decoding utilities in adapter/audio.rs
- Add shared HTTP error handling utilities in adapter/http.rs
- Add TranscriptResponseBuilder and build_transcript_response helper in parsing.rs
- Add define_realtime_e2e_tests! macro in test_utils.rs
- Refactor deepgram/batch.rs to use shared audio and HTTP utilities
- Refactor argmax/batch.rs to use shared audio and HTTP utilities
- Refactor assemblyai/batch.rs to use shared audio and HTTP utilities

Co-Authored-By: yujonglee <yujonglee.dev@gmail.com>

@netlify
netlify bot commented Dec 5, 2025

Deploy Preview for hyprnote ready!

Name Link
🔨 Latest commit 23ccca4
🔍 Latest deploy log https://app.netlify.com/projects/hyprnote/deploys/69328e460628160008331159
😎 Deploy Preview https://deploy-preview-2130--hyprnote.netlify.app

@coderabbitai

coderabbitai bot commented Dec 5, 2025

Walkthrough

Shared HTTP response handling and audio decoding were extracted into new adapter/http.rs and adapter/audio.rs modules. Adapter batch implementations (argmax, assemblyai, deepgram) were updated to use these helpers. A transcript response builder and a realtime test-generation macro were added; adapter module now exposes audio and http.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| HTTP helpers (new): src/adapter/http.rs | Added ensure_success(Response) -> Result<Response, Error>, parse_json_response<T>(Response, provider) -> Result<T, Error>, and parse_provider_json<T>(raw, provider) -> Option<T> with tests and centralized status/body error handling. |
| Audio decoding (new): src/adapter/audio.rs | Added async decode_audio_to_linear16(PathBuf) -> Result<(Bytes, u32), Error> and decode_audio_to_bytes(PathBuf) -> Result<Bytes, Error>, plus mix_to_mono and tests; uses spawn_blocking, resampling to 16k, and i16 encoding. |
| Adapter batch refactors: src/adapter/argmax/batch.rs, src/adapter/assemblyai/batch.rs, src/adapter/deepgram/batch.rs | Replaced per-file HTTP status checks with ensure_success, removed local audio decode implementations and import decode_audio_to_*; JSON deserialization moved after success check. |
| Adapter module surface: src/adapter/mod.rs | Exposed new public modules: audio and http. |
| Transcript parsing API: src/adapter/parsing.rs | Added build_transcript_response(...) -> StreamResponse and TranscriptResponseBuilder fluent API; updated imports to include Alternatives, Channel, Metadata, StreamResponse, Word. |
| Test utilities: src/test_utils.rs | Added #[macro_export] define_realtime_e2e_tests! macro (two variants) to generate tokio-based realtime E2E test templates. |
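The "i16 encoding" step in the audio decoding row can be sketched as follows. This is an illustration under assumptions: `f32_to_linear16_le` is a hypothetical name, and the scaling factor and little-endian byte order are assumed rather than taken from the real `f32_to_i16_bytes` in crates/audio-utils.

```rust
/// Hypothetical f32 -> 16-bit PCM encoder (assumed scaling and byte order).
pub fn f32_to_linear16_le(samples: &[f32]) -> Vec<u8> {
    let mut out = Vec::with_capacity(samples.len() * 2);
    for &sample in samples {
        // Clamp so out-of-range input saturates instead of wrapping.
        let clamped = sample.clamp(-1.0, 1.0);
        let value = (clamped * i16::MAX as f32) as i16;
        // Two bytes per sample, least significant byte first (assumption).
        out.extend_from_slice(&value.to_le_bytes());
    }
    out
}
```

Each input sample produces exactly two output bytes, so one second of 16kHz mono audio encodes to 32,000 bytes.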

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

  • Review audio resampling/mixing/encoding and error mapping in src/adapter/audio.rs.
  • Verify ensure_success and parse_json_response handle edge cases (non-UTF8 bodies, large bodies) and are used consistently across batch adapters.
  • Confirm adapter refactors preserved prior control flow and error types in argmax, assemblyai, and deepgram batch modules.
  • Check TranscriptResponseBuilder output matches existing contract and macro syntax in test_utils.rs.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 22.58% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title 'refactor(owhisper-client): extract shared utilities for adapters' directly and concisely describes the main objective of the changeset: extracting duplicate code into shared utility modules across STT adapters. |
| Description check | ✅ Passed | The description is comprehensive and relates directly to the changeset, detailing the new utility modules (audio.rs, http.rs, parsing.rs, test_utils.rs), refactored adapters, and behavioral changes. |

📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ad5a95d and 23ccca4.

📒 Files selected for processing (1)
  • owhisper/owhisper-client/src/adapter/audio.rs (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • owhisper/owhisper-client/src/adapter/audio.rs
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Redirect rules - hyprnote
  • GitHub Check: fmt
  • GitHub Check: Header rules - hyprnote
  • GitHub Check: Pages changed - hyprnote
  • GitHub Check: Devin


@netlify
netlify bot commented Dec 5, 2025

Deploy Preview for hyprnote-storybook ready!

Name Link
🔨 Latest commit 23ccca4
🔍 Latest deploy log https://app.netlify.com/projects/hyprnote-storybook/deploys/69328e469164fd0008da37c3
😎 Deploy Preview https://deploy-preview-2130--hyprnote-storybook.netlify.app

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (4)
owhisper/owhisper-client/src/test_utils.rs (1)

11-78: Verify that the macro is intended for single invocation per test module.

Both arms of this macro generate test functions with identical names (test_build_single and test_build_dual). If the macro is invoked multiple times within the same module, it will cause duplicate definition errors. Please confirm this is the intended design—that each test module should invoke the macro only once for a specific adapter.

The macro implementation is correct and follows proper hygiene practices with $crate:: paths. The two forms appropriately handle default vs. custom parameters.
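One way to sidestep the duplicate-name concern, sketched under assumptions: have the macro take a module name and wrap the generated functions in that module. The macro `define_adapter_checks!` and its parameters are hypothetical and much smaller than define_realtime_e2e_tests!; in the real macro the generated functions would be tokio-based tests rather than plain functions.

```rust
// Hypothetical macro: each invocation gets its own module, so the fixed
// function names (build_single, build_dual) never collide.
macro_rules! define_adapter_checks {
    ($module:ident, $provider:expr) => {
        pub mod $module {
            pub fn build_single() -> String {
                format!("{}:single", $provider)
            }

            pub fn build_dual() -> String {
                format!("{}:dual", $provider)
            }
        }
    };
}

// Two invocations in the same scope compile because each lives in its own module.
define_adapter_checks!(deepgram_checks, "deepgram");
define_adapter_checks!(assemblyai_checks, "assemblyai");
```

The trade-off is one extra macro argument per call site in exchange for dropping the "one invocation per module" restriction.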

Optional: Consider reducing duplication between the two macro arms.

The two forms are nearly identical except for the params line (lines 26 vs 59, and 39 vs 72). Consider whether a single form with an optional params parameter might simplify maintenance, though the current explicit design may be clearer for users.
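A common way to reduce duplication between two macro arms is to have the short arm delegate to the full arm with a default argument. The sketch below shows only that pattern; `SketchParams` and `define_param_fn!` are hypothetical names, not types or macros from this PR.

```rust
/// Hypothetical stand-in for the params type passed to the real macro.
#[derive(Debug, Clone, PartialEq, Default)]
pub struct SketchParams {
    pub languages: Vec<String>,
}

macro_rules! define_param_fn {
    // Short form: forward to the full form with default params.
    ($name:ident) => {
        define_param_fn!($name, SketchParams::default());
    };
    // Full form: the only arm that contains the generated body.
    ($name:ident, $params:expr) => {
        pub fn $name() -> SketchParams {
            $params
        }
    };
}

define_param_fn!(default_params);
define_param_fn!(
    english_params,
    SketchParams { languages: vec!["en".to_string()] }
);
```

With this shape, a change to the generated body is made once, in the full arm.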

Optional: Add documentation for the macro.

Consider adding doc comments explaining:

  • When to use each form (default params vs custom params)
  • The purpose of each parameter (adapter type, provider name, env key, base URL)
  • Expected usage pattern (one invocation per test module)
owhisper/owhisper-client/src/adapter/parsing.rs (1)

80-172: Well-designed fluent builder API.

The TranscriptResponseBuilder provides a clean, ergonomic API with sensible defaults. The fallback to computed timing from words when not explicitly set is a good design choice.

One minor observation: there's no confidence setter on the builder, so it always defaults to 1.0. If this is intentional (confidence is always assumed to be 1.0 for these use cases), this is fine. Otherwise, consider adding a confidence method for completeness.
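A minimal sketch of what a confidence setter could look like on a fluent builder, including the fallback to timing computed from words. All types here (`WordSketch`, `TranscriptSketch`, `TranscriptSketchBuilder`) are hypothetical simplifications; the real TranscriptResponseBuilder produces a StreamResponse whose fields are not shown in this review.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct WordSketch {
    pub text: String,
    pub start: f64,
    pub end: f64,
}

#[derive(Debug, Clone, PartialEq)]
pub struct TranscriptSketch {
    pub transcript: String,
    pub start: f64,
    pub duration: f64,
    pub confidence: f64,
}

#[derive(Debug, Default)]
pub struct TranscriptSketchBuilder {
    words: Vec<WordSketch>,
    start: Option<f64>,
    confidence: Option<f64>,
}

impl TranscriptSketchBuilder {
    pub fn new() -> Self {
        Self::default()
    }

    pub fn words(mut self, words: Vec<WordSketch>) -> Self {
        self.words = words;
        self
    }

    pub fn start(mut self, start: f64) -> Self {
        self.start = Some(start);
        self
    }

    /// The setter the review asks about; values are clamped to [0, 1].
    pub fn confidence(mut self, confidence: f64) -> Self {
        self.confidence = Some(confidence.clamp(0.0, 1.0));
        self
    }

    pub fn build(self) -> TranscriptSketch {
        // Fall back to timing derived from the words when not set explicitly.
        let first = self.words.first().map(|w| w.start).unwrap_or(0.0);
        let last = self.words.last().map(|w| w.end).unwrap_or(first);
        let start = self.start.unwrap_or(first);
        let texts: Vec<&str> = self.words.iter().map(|w| w.text.as_str()).collect();
        TranscriptSketch {
            transcript: texts.join(" "),
            start,
            duration: (last - start).max(0.0),
            // Keep the existing default of 1.0 when no confidence is supplied.
            confidence: self.confidence.unwrap_or(1.0),
        }
    }
}
```

Callers that never call `confidence(...)` keep the current behavior, so adding the setter would not change existing output.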

owhisper/owhisper-client/src/adapter/http.rs (2)

26-31: Consider potential PII/sensitive data in logged bodies.

Both parse_json_response and parse_provider_json log the full response body/raw JSON on parse failure. If API responses could contain sensitive information (user data, API keys in error messages, etc.), this might inadvertently log PII.

Consider truncating the logged body or sanitizing it:

 tracing::warn!(
     error = ?e,
     %provider,
-    body = %text,
+    body = %text.chars().take(500).collect::<String>(),
     "stt_json_parse_failed"
 );

Also applies to: 44-49
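If both call sites should share the same truncation, the suggestion could be factored into a small helper. The sketch below is an assumption about how that might look; `truncate_for_log` and the "[truncated]" marker are hypothetical and not part of the PR.

```rust
/// Hypothetical helper: keep at most `max_chars` characters of a response body
/// for logging. Counting by `char` avoids cutting a multi-byte UTF-8 sequence.
pub fn truncate_for_log(body: &str, max_chars: usize) -> String {
    let mut out: String = body.chars().take(max_chars).collect();
    // Mark the cut so readers of the log know the body was longer.
    if body.chars().nth(max_chars).is_some() {
        out.push_str("...[truncated]");
    }
    out
}
```

Truncation limits how much data reaches the logs but does not sanitize it; sensitive values in the first `max_chars` characters would still be logged.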


32-35: Consider a more specific error variant for JSON parsing.

Using Error::AudioProcessing for JSON parse errors is semantically confusing since it's not actually an audio processing issue. If the Error enum supports it, consider a more descriptive variant like Error::JsonParsing or similar.
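A rough sketch of what a dedicated variant might look like. The enum below is a hypothetical stand-in (`SketchError`), since the crate's actual Error enum and its existing variants are not shown here; the variant name follows the reviewer's suggestion and the message format is an assumption.

```rust
use std::fmt;

/// Hypothetical stand-in for the crate's error type.
#[derive(Debug)]
pub enum SketchError {
    AudioProcessing(String),
    /// Dedicated variant so JSON failures are not reported as audio failures.
    JsonParsing { provider: String, message: String },
}

impl fmt::Display for SketchError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            SketchError::AudioProcessing(msg) => write!(f, "audio processing failed: {msg}"),
            SketchError::JsonParsing { provider, message } => {
                write!(f, "failed to parse {provider} JSON response: {message}")
            }
        }
    }
}

impl std::error::Error for SketchError {}

/// Convenience constructor for the JSON parsing case.
pub fn json_parse_error(provider: &str, message: impl Into<String>) -> SketchError {
    SketchError::JsonParsing {
        provider: provider.to_string(),
        message: message.into(),
    }
}
```

Carrying the provider name in the variant lets callers match on the failure kind without parsing the message string.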

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5f898bb and ad5a95d.

📒 Files selected for processing (8)
  • owhisper/owhisper-client/src/adapter/argmax/batch.rs (2 hunks)
  • owhisper/owhisper-client/src/adapter/assemblyai/batch.rs (4 hunks)
  • owhisper/owhisper-client/src/adapter/audio.rs (1 hunks)
  • owhisper/owhisper-client/src/adapter/deepgram/batch.rs (2 hunks)
  • owhisper/owhisper-client/src/adapter/http.rs (1 hunks)
  • owhisper/owhisper-client/src/adapter/mod.rs (1 hunks)
  • owhisper/owhisper-client/src/adapter/parsing.rs (2 hunks)
  • owhisper/owhisper-client/src/test_utils.rs (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (4)
owhisper/owhisper-client/src/adapter/deepgram/batch.rs (3)
owhisper/owhisper-client/src/adapter/audio.rs (1)
  • decode_audio_to_linear16 (7-31)
owhisper/owhisper-client/src/adapter/deepgram_compat/mod.rs (1)
  • build_batch_url (97-145)
owhisper/owhisper-client/src/adapter/http.rs (1)
  • ensure_success (6-14)
owhisper/owhisper-client/src/adapter/assemblyai/batch.rs (2)
owhisper/owhisper-client/src/adapter/audio.rs (1)
  • decode_audio_to_bytes (33-36)
owhisper/owhisper-client/src/adapter/http.rs (1)
  • ensure_success (6-14)
owhisper/owhisper-client/src/adapter/audio.rs (1)
crates/audio-utils/src/lib.rs (3)
  • f32_to_i16_bytes (66-76)
  • resample_audio (171-220)
  • source_from_path (129-135)
owhisper/owhisper-client/src/adapter/parsing.rs (3)
packages/store/src/schema-external.ts (1)
  • Word (171-171)
owhisper/owhisper-client/src/adapter/assemblyai/live.rs (2)
  • words (240-243)
  • words (249-252)
owhisper/owhisper-interface/src/batch.rs (1)
  • channel (85-89)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: Redirect rules - hyprnote-storybook
  • GitHub Check: Header rules - hyprnote-storybook
  • GitHub Check: Pages changed - hyprnote-storybook
  • GitHub Check: Redirect rules - hyprnote
  • GitHub Check: Header rules - hyprnote
  • GitHub Check: Pages changed - hyprnote
  • GitHub Check: Devin
  • GitHub Check: fmt
🔇 Additional comments (9)
owhisper/owhisper-client/src/adapter/parsing.rs (1)

48-78: LGTM - Clean helper function for transcript response construction.

The function correctly computes timing from words and constructs a well-formed StreamResponse. The hardcoded confidence: 1.0 is consistent with the builder implementation below.

owhisper/owhisper-client/src/adapter/mod.rs (1)

3-3: LGTM - New modules correctly exposed.

The new audio and http modules are appropriately declared as public, enabling their use across adapter implementations.

Also applies to: 8-8

owhisper/owhisper-client/src/adapter/deepgram/batch.rs (1)

62-63: LGTM - Clean refactor to centralized HTTP handling.

The replacement of manual status checking with ensure_success followed by response.json() simplifies the code while maintaining equivalent error handling. This is a good application of DRY principles.

owhisper/owhisper-client/src/adapter/assemblyai/batch.rs (2)

127-128: LGTM - Consistent use of ensure_success across all API calls.

The upload, transcript creation, and polling responses all now use the centralized ensure_success helper, providing uniform error handling throughout the transcription workflow.


176-177: Good integration within the polling loop.

The ensure_success call inside the polling closure maintains the same error semantics while simplifying the code. Non-2xx responses during polling will now be handled consistently.

owhisper/owhisper-client/src/adapter/argmax/batch.rs (1)

60-61: LGTM - Matches the pattern in other batch adapters.

Consistent refactoring to use ensure_success followed by response.json(), aligning with the Deepgram adapter implementation.

owhisper/owhisper-client/src/adapter/http.rs (1)

6-14: LGTM - Clean HTTP status validation.

The ensure_success function correctly checks for 2xx status codes and captures both status and body for error reporting.
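The check described above can be illustrated without an HTTP client. In this sketch `HttpResponseSketch` and `UnexpectedStatus` are hypothetical types; the real ensure_success takes a Response and returns the crate's Error, and reads the body asynchronously.

```rust
/// Hypothetical, already-buffered response used only for illustration.
#[derive(Debug, Clone, PartialEq)]
pub struct HttpResponseSketch {
    pub status: u16,
    pub body: String,
}

/// Hypothetical error carrying both pieces the review mentions.
#[derive(Debug, Clone, PartialEq)]
pub struct UnexpectedStatus {
    pub status: u16,
    pub body: String,
}

/// Pass 2xx responses through; turn everything else into an error that
/// keeps the status code and body for reporting.
pub fn ensure_success_sketch(
    response: HttpResponseSketch,
) -> Result<HttpResponseSketch, UnexpectedStatus> {
    if (200..300).contains(&response.status) {
        Ok(response)
    } else {
        Err(UnexpectedStatus {
            status: response.status,
            body: response.body,
        })
    }
}
```

Returning the response on success lets the caller chain JSON deserialization directly after the check, which matches the order described for the batch adapters.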

owhisper/owhisper-client/src/adapter/audio.rs (2)

38-53: LGTM - Correct mono mixing implementation.

The mix_to_mono function correctly handles edge cases:

  • Returns input unchanged for single-channel audio
  • Properly averages all channels per frame
  • Handles empty input gracefully

The use of frame.len() as f32 for division is safe since empty frames are skipped via continue.
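The behavior described in this comment can be sketched as follows. The signature of `mix_to_mono_sketch` is an assumption (interleaved f32 samples plus a channel count); the actual mix_to_mono in adapter/audio.rs may take different types.

```rust
/// Hypothetical mono mixdown over interleaved samples.
pub fn mix_to_mono_sketch(samples: Vec<f32>, channels: usize) -> Vec<f32> {
    // Single-channel (or unspecified) audio is returned unchanged.
    if channels <= 1 {
        return samples;
    }
    let mut mono = Vec::with_capacity(samples.len() / channels);
    for frame in samples.chunks(channels) {
        // Guard mirrors the reviewed code; it keeps the division below safe.
        if frame.is_empty() {
            continue;
        }
        let sum: f32 = frame.iter().sum();
        // Dividing by frame.len() also handles a trailing partial frame.
        mono.push(sum / frame.len() as f32);
    }
    mono
}
```

For stereo input `[1.0, 0.0, 0.5, 0.5]` this produces two mono samples, each the average of one left/right pair.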


59-76: Good test coverage for audio decoding.

Tests verify both decode_audio_to_linear16 and decode_audio_to_bytes produce non-empty output with valid sample rates. The mono mixing tests cover single-channel, stereo, and empty input cases.

Previously, resample_audio was called with the source sample rate,
which was a no-op. Now it properly resamples to 16kHz, which is the
standard sample rate expected by STT services.

Co-Authored-By: yujonglee <yujonglee.dev@gmail.com>