-
Notifications
You must be signed in to change notification settings - Fork 452
refactor(owhisper-client): extract shared utilities for adapters #2130
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
- Add shared audio decoding utilities in adapter/audio.rs - Add shared HTTP error handling utilities in adapter/http.rs - Add TranscriptResponseBuilder and build_transcript_response helper in parsing.rs - Add define_realtime_e2e_tests! macro in test_utils.rs - Refactor deepgram/batch.rs to use shared audio and HTTP utilities - Refactor argmax/batch.rs to use shared audio and HTTP utilities - Refactor assemblyai/batch.rs to use shared audio and HTTP utilities Co-Authored-By: yujonglee <yujonglee.dev@gmail.com>
🤖 Devin AI EngineerI'll be helping with this pull request! Here's what you should know: ✅ I will automatically:
Note: I can only respond to comments from users who have write access to this repository. ⚙️ Control Options:
|
✅ Deploy Preview for hyprnote ready!
To edit notification comments on pull requests, go to your Netlify project configuration. |
📝 WalkthroughWalkthroughShared HTTP response handling and audio decoding were extracted into new Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~22 minutes
Possibly related PRs
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
📜 Recent review detailsConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
✅ Deploy Preview for hyprnote-storybook ready!
To edit notification comments on pull requests, go to your Netlify project configuration. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (4)
owhisper/owhisper-client/src/test_utils.rs (1)
11-78: Verify that the macro is intended for single invocation per test module.Both arms of this macro generate test functions with identical names (
test_build_singleandtest_build_dual). If the macro is invoked multiple times within the same module, it will cause duplicate definition errors. Please confirm this is the intended design—that each test module should invoke the macro only once for a specific adapter.The macro implementation is correct and follows proper hygiene practices with
$crate::paths. The two forms appropriately handle default vs. custom parameters.Optional: Consider reducing duplication between the two macro arms.
The two forms are nearly identical except for the
paramsline (lines 26 vs 59, and 39 vs 72). Consider whether a single form with an optional params parameter might simplify maintenance, though the current explicit design may be clearer for users.Optional: Add documentation for the macro.
Consider adding doc comments explaining:
- When to use each form (default params vs custom params)
- The purpose of each parameter (adapter type, provider name, env key, base URL)
- Expected usage pattern (one invocation per test module)
owhisper/owhisper-client/src/adapter/parsing.rs (1)
80-172: Well-designed fluent builder API.The
TranscriptResponseBuilderprovides a clean, ergonomic API with sensible defaults. The fallback to computed timing from words when not explicitly set is a good design choice.One minor observation: there's no
confidencesetter on the builder, so it always defaults to1.0. If this is intentional (confidence is always assumed to be 1.0 for these use cases), this is fine. Otherwise, consider adding aconfidencemethod for completeness.owhisper/owhisper-client/src/adapter/http.rs (2)
26-31: Consider potential PII/sensitive data in logged bodies.Both
parse_json_responseandparse_provider_jsonlog the full response body/raw JSON on parse failure. If API responses could contain sensitive information (user data, API keys in error messages, etc.), this might inadvertently log PII.Consider truncating the logged body or sanitizing it:
tracing::warn!( error = ?e, %provider, - body = %text, + body = %text.chars().take(500).collect::<String>(), "stt_json_parse_failed" );Also applies to: 44-49
32-35: Consider a more specific error variant for JSON parsing.Using
Error::AudioProcessingfor JSON parse errors is semantically confusing since it's not actually an audio processing issue. If theErrorenum supports it, consider a more descriptive variant likeError::JsonParsingor similar.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
owhisper/owhisper-client/src/adapter/argmax/batch.rs(2 hunks)owhisper/owhisper-client/src/adapter/assemblyai/batch.rs(4 hunks)owhisper/owhisper-client/src/adapter/audio.rs(1 hunks)owhisper/owhisper-client/src/adapter/deepgram/batch.rs(2 hunks)owhisper/owhisper-client/src/adapter/http.rs(1 hunks)owhisper/owhisper-client/src/adapter/mod.rs(1 hunks)owhisper/owhisper-client/src/adapter/parsing.rs(2 hunks)owhisper/owhisper-client/src/test_utils.rs(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (4)
owhisper/owhisper-client/src/adapter/deepgram/batch.rs (3)
owhisper/owhisper-client/src/adapter/audio.rs (1)
decode_audio_to_linear16(7-31)owhisper/owhisper-client/src/adapter/deepgram_compat/mod.rs (1)
build_batch_url(97-145)owhisper/owhisper-client/src/adapter/http.rs (1)
ensure_success(6-14)
owhisper/owhisper-client/src/adapter/assemblyai/batch.rs (2)
owhisper/owhisper-client/src/adapter/audio.rs (1)
decode_audio_to_bytes(33-36)owhisper/owhisper-client/src/adapter/http.rs (1)
ensure_success(6-14)
owhisper/owhisper-client/src/adapter/audio.rs (1)
crates/audio-utils/src/lib.rs (3)
f32_to_i16_bytes(66-76)resample_audio(171-220)source_from_path(129-135)
owhisper/owhisper-client/src/adapter/parsing.rs (3)
packages/store/src/schema-external.ts (1)
Word(171-171)owhisper/owhisper-client/src/adapter/assemblyai/live.rs (2)
words(240-243)words(249-252)owhisper/owhisper-interface/src/batch.rs (1)
channel(85-89)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
- GitHub Check: Redirect rules - hyprnote-storybook
- GitHub Check: Header rules - hyprnote-storybook
- GitHub Check: Pages changed - hyprnote-storybook
- GitHub Check: Redirect rules - hyprnote
- GitHub Check: Header rules - hyprnote
- GitHub Check: Pages changed - hyprnote
- GitHub Check: Devin
- GitHub Check: fmt
🔇 Additional comments (9)
owhisper/owhisper-client/src/adapter/parsing.rs (1)
48-78: LGTM - Clean helper function for transcript response construction.The function correctly computes timing from words and constructs a well-formed
StreamResponse. The hardcodedconfidence: 1.0is consistent with the builder implementation below.owhisper/owhisper-client/src/adapter/mod.rs (1)
3-3: LGTM - New modules correctly exposed.The new
audioandhttpmodules are appropriately declared as public, enabling their use across adapter implementations.Also applies to: 8-8
owhisper/owhisper-client/src/adapter/deepgram/batch.rs (1)
62-63: LGTM - Clean refactor to centralized HTTP handling.The replacement of manual status checking with
ensure_successfollowed byresponse.json()simplifies the code while maintaining equivalent error handling. This is a good application of DRY principles.owhisper/owhisper-client/src/adapter/assemblyai/batch.rs (2)
127-128: LGTM - Consistent use of ensure_success across all API calls.The upload, transcript creation, and polling responses all now use the centralized
ensure_successhelper, providing uniform error handling throughout the transcription workflow.
176-177: Good integration within the polling loop.The
ensure_successcall inside the polling closure maintains the same error semantics while simplifying the code. Non-2xx responses during polling will now be handled consistently.owhisper/owhisper-client/src/adapter/argmax/batch.rs (1)
60-61: LGTM - Matches the pattern in other batch adapters.Consistent refactoring to use
ensure_successfollowed byresponse.json(), aligning with the Deepgram adapter implementation.owhisper/owhisper-client/src/adapter/http.rs (1)
6-14: LGTM - Clean HTTP status validation.The
ensure_successfunction correctly checks for 2xx status codes and captures both status and body for error reporting.owhisper/owhisper-client/src/adapter/audio.rs (2)
38-53: LGTM - Correct mono mixing implementation.The
mix_to_monofunction correctly handles edge cases:
- Returns input unchanged for single-channel audio
- Properly averages all channels per frame
- Handles empty input gracefully
The use of
frame.len() as f32for division is safe since empty frames are skipped viacontinue.
59-76: Good test coverage for audio decoding.Tests verify both
decode_audio_to_linear16anddecode_audio_to_bytesproduce non-empty output with valid sample rates. The mono mixing tests cover single-channel, stereo, and empty input cases.
Previously, resample_audio was called with the source sample rate, which was a no-op. Now it properly resamples to 16kHz, which is the standard sample rate expected by STT services. Co-Authored-By: yujonglee <yujonglee.dev@gmail.com>
refactor(owhisper-client): extract shared utilities for adapters
Summary
This PR reduces code duplication in the STT adapter implementations by extracting common patterns into shared utility modules:
adapter/audio.rs: Shared audio decoding utilities (decode_audio_to_linear16,decode_audio_to_bytes,mix_to_mono) that were previously duplicated across deepgram, argmax, and assemblyai batch adaptersadapter/http.rs: Shared HTTP error handling (ensure_success,parse_json_response,parse_provider_json) to standardize response validationparsing.rs: AddedTranscriptResponseBuilderandbuild_transcript_responsehelper for constructingStreamResponseobjectstest_utils.rs: Addeddefine_realtime_e2e_tests!macro to reduce boilerplate in E2E testsThe deepgram, argmax, and assemblyai batch adapters have been refactored to use these shared utilities.
Updates since last revision
sample_ratetoresample_audio(), which was a no-op. Now it properly resamples to 16kHz (TARGET_SAMPLE_RATE = 16000), which is the standard rate expected by STT services.Review & Testing Checklist for Human
decode_audio_to_linear16now resamples all audio to 16kHz. The original code was NOT resampling (passing source rate was a no-op). Confirm this behavioral change is intended and doesn't break STT providers that expect different sample rates.infisical run --env=dev --projectId=87dad7b5-72a6-4791-9228-b3b86b169db1 --path="/stt" -- cargo test --ignoredto verify deepgram, argmax, and assemblyai batch transcription still works with the 16kHz resamplingmix_to_monofunction was extracted from duplicated code - confirm the mixing logic produces identical resultsTranscriptResponseBuilder,build_transcript_response,parse_json_response, andparse_provider_jsonare added but not yet used (compiler warnings confirm). Decide if these should be removed or kept for follow-up workNotes
parsing.rsand some inhttp.rsare not yet adopted by adapters - they were added as infrastructure for future refactoringhttp.rsare mapped toError::AudioProcessingas a workaround to avoid adding a new error variant - this is semantically impreciseLink to Devin run: https://app.devin.ai/sessions/127bbb6142c340ffba9fedd68f22ed9c
Requested by: yujonglee (@yujonglee)