Add SRT2Voice workflow#673
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughAdds a new Qwen VoiceDesign TTS engine, a full SRT-driven dubbing backend (DB models, migrations, router and 24+ endpoints), frontend timeline/editor components and story-editor refactor, profile UI for designed voices, generation plumbing (temperature, model-cache checks), voice-prompt CPU caching, audio utilities, build/spec updates, and extensive documentation. ChangesVoiceDesign & Dubbing
Estimated code review effort 🎯 5 (Critical) | ⏱️ ~120 minutes Possibly related PRs
Suggested reviewers
Poem
sequenceDiagram
participant Client
participant API as Voicebox API (/dubbing)
participant DB
participant TTS as TTS Backend (qwen_voice_design / others)
participant STT as STT Backend
Client->>API: POST /import-srt (file)
API->>DB: create DubbingProject + DubbingSegments
Client->>API: POST /projects/{id}/segments/{seg}/generate
API->>TTS: ensure_model_cached_or_raise -> queue run_generation(engine, model_size)
API->>STT: (optional) release memory / request timestamps
TTS-->>API: generation complete (audio path)
API->>DB: update segment generation_id/status
Client->>API: GET /projects/{id}/export-audio -> returns WAV blob
✨ Finishing Touches🧪 Generate unit tests (beta)
|
There was a problem hiding this comment.
Actionable comments posted: 13
Note
Due to the large number of review comments, Critical, Major severity comments were prioritized as inline comments.
🟡 Minor comments (8)
SRT2VOICE.md-1151-1153 (1)
1151-1153:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winResolve contradictory NumPy version guidance.
Line 1152 documents validated runtime with
numpy 1.26.4, while Line 1995 states the safe version isnumpy 2.0.0. Keep one canonical version/range (or clearly separate “validated on machine X” vs “required for packaging”) to avoid build drift.Also applies to: 1994-1996
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@SRT2VOICE.md` around lines 1151 - 1153, The docs currently contradict on NumPy: one spot lists "numpy 1.26.4" and another lists "numpy 2.0.0"; pick a single canonical statement or explicitly separate "validated on X" vs "required for packaging". Update both occurrences that mention "numpy 1.26.4" and "numpy 2.0.0" to either the chosen canonical version/range or to a clear pair of statements (e.g., "Validated runtime: numpy 1.26.4 on GPU machine" and "Required for packaging: numpy >=2.0.0") and make the language consistent across the document so both places match the chosen approach.voicedesign.md-229-237 (1)
229-237:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winAlign NumPy pin with the other integration docs.
Line 233 pins
numpy==2.0.2, which conflicts with the NumPy guidance elsewhere in this PR docs. Please unify to one documented compatibility target (or explicitly scope by environment) so packaging instructions stay reproducible.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@voicedesign.md` around lines 229 - 237, The NumPy pin "numpy==2.0.2" in the voicedesign.md snippet conflicts with other PR docs; update the document so the NumPy version is consistent across all integration docs (either change "numpy==2.0.2" to the agreed common pin used elsewhere or explicitly scope the pin by environment, e.g., "Windows build: numpy==2.0.2" vs. other environments), and mention the compatibility constraint with Numba/qwen_tts next to the pin to keep packaging instructions reproducible.backend/voicebox-server.spec-85-85 (1)
85-85:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winHardcoded
console=Falsemay break log capture on macOS/Linux.The build_binary.py only adds
--noconsoleon Windows because "On macOS/Linux the sidecar needs stdout/stderr for Tauri to capture logs" (lines 87-88). If this spec file is used directly on non-Windows platforms, log capture would break.If this spec is Windows-specific, consider adding a comment to clarify. If it should work cross-platform, consider keeping
console=True(the dynamic build_binary.py will handle Windows-specific console hiding).Option 1: Add clarifying comment
exe = EXE( pyz, a.scripts, a.binaries, a.datas, [], - name='voicebox-server', + name='voicebox-server', # Windows-specific spec; use build_binary.py for cross-platform builds debug=False, bootloader_ignore_signals=False, strip=False, upx=True, upx_exclude=[], runtime_tmpdir=None, console=False,Option 2: Restore console=True for cross-platform compatibility
runtime_tmpdir=None, - console=False, + console=True, disable_windowed_traceback=False,🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@backend/voicebox-server.spec` at line 85, The spec hardcodes console=False which will disable stdout/stderr capture on macOS/Linux; either restore console=True in backend/voicebox-server.spec so cross-platform builds preserve logs (the build_binary.py already injects --noconsole for Windows), or if this spec is intended only for Windows add a clarifying comment above the console=False line noting it is Windows-specific and that build_binary.py handles console suppression on Windows via --noconsole; update the file accordingly to prevent breaking log capture on non-Windows platforms.app/src/components/Sidebar.tsx-25-25 (1)
25-25:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winLocalize the new tab label instead of hard-coding English.
Line 25 uses
label: 'Dubbing', which bypasses i18n while other tabs uselabelKey.Suggested fix
- { id: 'dubbing', path: '/dubbing', icon: Clapperboard, label: 'Dubbing' }, + { id: 'dubbing', path: '/dubbing', icon: Clapperboard, labelKey: 'nav.dubbing' },Add
nav.dubbingto locale resources.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@app/src/components/Sidebar.tsx` at line 25, The new sidebar entry currently hard-codes label: 'Dubbing' in the Sidebar component; change that to use the same i18n pattern as the other tabs (replace the literal label with labelKey: 'nav.dubbing' on the object with id 'dubbing') and add the corresponding "nav.dubbing" key/value to your locale resource files so translations are available for all supported languages.app/src/components/VoiceProfiles/ProfileForm.tsx-616-621 (1)
616-621:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winLocalize the new VoiceDesign copy instead of hardcoding strings.
These new user-facing strings bypass
t(...)and include a hardcoded language assumption (“French”), which breaks localization consistency and can mislead users when another language is selected.Also applies to: 656-657, 969-970, 1034-1054, 1228-1236, 1359-1369
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@app/src/components/VoiceProfiles/ProfileForm.tsx` around lines 616 - 621, The new user-facing strings in ProfileForm.tsx are hardcoded (e.g., the toast title/description "Voice description required" and "Describe the French voice...") and must be localized; replace these literal strings with calls to the translation function (t(...)) and use interpolation for the language name (e.g., t('voiceDesign.descriptionRequired') and t('voiceDesign.describeLanguage', { language: selectedLanguage })) so the message reflects the current locale instead of assuming “French”; apply the same change pattern to the other occurrences referenced (lines ~656, 969, 1034-1054, 1228-1236, 1359-1369) and reference the component state/prop that holds the chosen language (e.g., selectedLanguage or voice.language) when building the interpolated translation.backend/utils/audio.py-139-147 (1)
139-147:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winProbe non-Windows bundled FFmpeg paths too.
find_ffmpeg()only checks bundled*.exenames. On macOS/Linux packaged deployments with localffmpeg, this returnsNoneand disables file-based stretching unnecessarily.Suggested patch
def find_ffmpeg() -> str | None: """Find a local ffmpeg executable without downloading anything.""" env_path = os.environ.get("VOICEBOX_FFMPEG_PATH") candidates = [ Path(env_path) if env_path else None, + Path(sys.executable).resolve().parent / "ffmpeg", + Path(sys.executable).resolve().parent / "_internal" / "ffmpeg", Path(sys.executable).resolve().parent / "ffmpeg.exe", Path(sys.executable).resolve().parent / "_internal" / "ffmpeg.exe", Path("C:/ffmpeg/bin/ffmpeg.exe"), ]Also applies to: 153-156
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@backend/utils/audio.py` around lines 139 - 147, find_ffmpeg() currently only probes Windows-style bundled names (ffmpeg.exe) so packaged macOS/Linux installs are missed; update find_ffmpeg() to also check for non-.exe binaries and common Unix locations and PATH. Specifically, in function find_ffmpeg() add candidate Paths without the .exe suffix (e.g., sibling "ffmpeg" and "_internal/ffmpeg"), include standard locations like /usr/bin and /usr/local/bin, and fall back to shutil.which("ffmpeg") before returning None. Apply the same changes to the analogous probe that checks ffprobe (the block referenced around lines 153-156 / the find_ffprobe logic) so both ffmpeg and ffprobe are discovered on non-Windows systems.backend/routes/dubbing.py-726-739 (1)
726-739:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winDon't count stale resets as cancelled tasks.
cancelledis incremented before checking whethercancel_generation_job()actually found a live task. When it returnsNone, the response still reports an active task was cancelled even though this code only reset stale DB state.🐛 Proposed fix
- cancellation_state = cancel_generation_job(generation.id) - cancelled += 1 - if cancellation_state is not None: + cancellation_state = cancel_generation_job(generation.id) + if cancellation_state is not None: + cancelled += 1 task_manager.complete_generation(generation.id) @@ - cancellation_state = cancel_generation_job(full_narration.id) - cancelled += 1 - if cancellation_state is not None: + cancellation_state = cancel_generation_job(full_narration.id) + if cancellation_state is not None: + cancelled += 1 task_manager.complete_generation(full_narration.id)Also applies to: 748-760
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@backend/routes/dubbing.py` around lines 726 - 739, The code increments cancelled before verifying cancel_generation_job(generation.id) actually found a live task, so change the logic to only increment the cancelled counter when cancellation_state is not None (i.e., after a successful cancellation and task_manager.complete_generation(generation.id) call); if cancel_generation_job returns None, do not increment cancelled and still call history.update_generation_status(...) with the "Stale generation reset by user" error. Apply the same fix to the other similar block that uses cancel_generation_job, cancelled, task_manager.complete_generation, and history.update_generation_status so stale resets are not counted as cancelled tasks.backend/routes/dubbing.py-149-166 (1)
149-166:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winInclude
acceptablefits in the project summary.
fit_status="acceptable"is part of the segment contract, but this aggregate ignores it. A project whose segments are all acceptable can end up withsegment_count > 0while every fit bucket stays at zero.🐛 Proposed fix
- exact_count = sum(1 for segment in segments if segment.fit_status == "exact") + exact_count = sum( + 1 for segment in segments if segment.fit_status in {"exact", "acceptable"} + )🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@backend/routes/dubbing.py` around lines 149 - 166, The project summary in _serialize_project_list_item ignores segments with fit_status == "acceptable", so add an acceptable_count = sum(1 for segment in segments if segment.fit_status == "acceptable") and pass acceptable_count into the models.DubbingProjectListItemResponse construction (alongside exact_count, warning_count, failed_count, pending_count) so acceptable fits are reflected in the aggregated response.
🧹 Nitpick comments (5)
backend/build_binary.py (1)
51-55: 💤 Low valueConsider adding a log statement for build observability.
The quiet
-qflag suppresses pip output, but a log message before the command would help operators understand what's happening during longer builds.Suggested logging
def pin_numpy_for_numba(): """Keep qwen_tts/numba compatible after torch wheel swaps.""" import subprocess + logger.info("Pinning numpy==2.0.0 for numba compatibility...") subprocess.run([sys.executable, "-m", "pip", "install", "numpy==2.0.0", "-q"], check=True)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@backend/build_binary.py` around lines 51 - 55, Add an informational log message in pin_numpy_for_numba before the subprocess.run call so build logs show that numpy==2.0.0 is being installed for numba compatibility; within that function (pin_numpy_for_numba) obtain a logger (e.g., logging.getLogger(__name__) or use the module's existing logger) and call logger.info(...) with a short descriptive message immediately before the subprocess.run([sys.executable, "-m", "pip", "install", "numpy==2.0.0", "-q"], check=True) invocation.backend/services/cuda.py (1)
381-385: ⚡ Quick winUpdate the function docstring to match the new manual-update behavior.
The implementation now warns and exits, but the docstring still states auto-download behavior.
Suggested patch
async def check_and_update_cuda_binary(): - """Check if the CUDA binary is outdated and auto-download if so. + """Check if the CUDA binary is outdated and log manual-update guidance. Called on server startup. Checks both server version and CUDA libs - version. Downloads only what's needed. + version. In this fork, automatic CUDA downloads are disabled. """Also applies to: 405-409
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@backend/services/cuda.py` around lines 381 - 385, Update the docstring for the function currently titled "Check if the CUDA binary is outdated and auto-download if so." to reflect the new manual-update behavior: remove any claim that binaries are auto-downloaded and explicitly state that the function now warns and exits when CUDA/server versions are incompatible and that manual intervention is required; apply the same docstring wording change to the second similar docstring instance present later in the file (the one around the second occurrence of the same check).backend/database/migrations.py (1)
298-298: ⚡ Quick winRemove the unreachable duplicate cleanup block.
The
if False and ...guard makes this branch permanently dead, so the legacydefault_intentdrop logic now exists in two places. Keeping the second copy around is confusing and tiescolumnsto whichever dubbing-table branch ran last.♻️ Proposed cleanup
- if False and "default_intent" in columns: - if _supports_drop_column(engine): - with engine.connect() as conn: - conn.execute(text("ALTER TABLE mcp_client_bindings DROP COLUMN default_intent")) - conn.commit() - logger.info("Dropped legacy default_intent column from mcp_client_bindings") - else: - # ALTER TABLE … DROP COLUMN on SQLite requires 3.35+ (Mar - # 2021). Production PyInstaller builds bundle Python 3.12 - # which links to SQLite 3.40+; this branch only fires for - # dev environments running the backend directly against an - # old system SQLite (Ubuntu 20.04 = 3.31, Debian 11 = 3.34). - # Leaving the unused column in place is harmless — the ORM - # only maps declared columns, so a stray one does no work - # and gets no reads or writes. - logger.warning( - "SQLite %s too old to DROP COLUMN (need 3.35+); leaving unused default_intent column on mcp_client_bindings in place.", - sqlite3.sqlite_version, - )🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@backend/database/migrations.py` at line 298, Remove the dead duplicate cleanup block guarded by "if False and 'default_intent' in columns" so the legacy drop logic for default_intent only exists in the primary cleanup path; delete the unreachable if-statement and its body (the branch that checks for "default_intent" under the False guard) to avoid coupling the columns variable to whichever dubbing-table branch executed last and eliminate the redundant drop logic.backend/services/srt_parser.py (2)
9-9: 💤 Low valueConsider adding range validation for minutes and seconds.
The regex allows minutes and seconds to be 00-99, but valid time values require minutes < 60 and seconds < 60. While most SRT files will have valid ranges, malformed input like
00:99:99,999would pass regex validation but represent an invalid time.The current implementation is functional (calculations still work), but adding validation would improve error messages for malformed SRT files.
🔍 Optional enhancement: stricter timecode validation
-TIMECODE_RE = re.compile(r"^(?P<h>\d{2}):(?P<m>\d{2}):(?P<s>\d{2}),(?P<ms>\d{3})$") +TIMECODE_RE = re.compile(r"^(?P<h>\d{2}):(?P<m>[0-5]\d):(?P<s>[0-5]\d),(?P<ms>\d{3})$")🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@backend/services/srt_parser.py` at line 9, The TIMECODE_RE currently allows minutes and seconds 00-99; update the pattern (the TIMECODE_RE constant) to constrain minutes and seconds to 00-59 (e.g., use a range like [0-5]\d) so malformed timecodes like "00:99:99,999" are rejected; adjust any error handling around the regex match in srt_parser.py to raise/return a clear validation error when the match fails so callers get a descriptive message about invalid timecode ranges.
60-62: ⚡ Quick winPosition metadata is non-standard; consider if this edge case warrants handling.
SRT position metadata (e.g.,
X1:40 X2:600 Y1:20 Y2:50) is not part of the official SRT specification and lacks broad support across players. While technically the current code would fail if such metadata were present—becauseTIMECODE_REexpects the timecode to end after milliseconds ($anchor)—this is a non-standard edge case. If your use case requires handling non-standard SRT variants, the proposed fix to strip position data is reasonable. Otherwise, this is unlikely to occur in typical SRT files.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@backend/services/srt_parser.py` around lines 60 - 62, The parser currently assumes lines[1] contains only "start --> end" and will fail if non-standard SRT position metadata is present; update the logic around lines[1] / the variables start_tc and end_tc to strip any trailing position metadata before parsing (e.g., split each timecode on whitespace or apply a regex to capture only the HH:MM:SS,mmm portion), then pass the cleaned start_tc and end_tc into parse_timecode_to_ms; also ensure TIMECODE_RE usage (if present) matches only the timecode portion rather than the whole token so parse_timecode_to_ms receives valid input.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@app/src/components/AudioTimeline/AudioTrackEditor.tsx`:
- Around line 571-622: The icon-only transport and zoom Buttons (the Button with
onClick={onPlayPause}, the Button with onClick={onStop}, the Button with
onClick={handleZoomOut}, and the Button with onClick={handleZoomIn}) need
accessible names; add aria-label attributes (or a dynamic aria-label for the
play/pause button that toggles between "Play" and "Pause" based on isPlaying) to
each Button (e.g., aria-label="Play"/"Pause", aria-label="Stop",
aria-label="Zoom out", aria-label="Zoom in"), ensuring the Button component
supports aria-label and keeping existing title props if present.
- Around line 762-774: The playhead div with role="slider" is not
keyboard-focusable or operable; make it accessible by adding tabIndex={0} and
wiring an onKeyDown handler (e.g., onPlayheadKeyDown) that maps
ArrowLeft/ArrowRight (small seeks), PageUp/PageDown (larger seeks), Home/End
(start/end) to update currentTimeMs via the same state/update function used by
handlePlayheadMouseDown (or the component's seek dispatcher), clamp values
between 0 and totalDurationMs, and update aria-valuenow (and optionally
aria-valuetext) accordingly so msToPixels/currentTimeMs/totalDurationMs remain
the single source of truth for position.
In `@app/src/components/Generation/FloatingGenerateBox.tsx`:
- Around line 175-179: The early return after setting engine/modelSize for
designed voices (the block using selectedProfile?.voice_type === 'designed' and
form.setValue('engine', 'qwen_voice_design'); form.setValue('modelSize',
'1.7B')) prevents the downstream profile/state synchronization
(effects/personality sync) from running; remove the return and let execution
continue so the subsequent sync logic that reads selectedProfile and updates
effect/personality state runs. Also apply the same change to the analogous
branch later (the block around the other designed-profile handling between the
~193-219 area) so both designed-profile paths perform engine/model adjustments
but do not short-circuit the rest of the sync.
In `@backend/app.py`:
- Around line 244-253: The reset SQL in the db.execute call (the sa_text passed
to db.execute that assigns reset_result) omits generations with status 'failed',
so update the subquery WHERE clause to include 'failed' alongside 'generating'
and 'loading_model' (e.g., WHERE status IN
('generating','loading_model','failed') AND source = 'dubbing_segment') so stale
dubbing_segments linked to failed generations are reset; edit the SQL string
inside the reset_result = db.execute(...) call in app.py accordingly.
In `@backend/backends/__init__.py`:
- Around line 783-786: The finally block currently returns True which suppresses
any exceptions from backend.unload_model(); to fix this, remove the return from
the finally block in drop_tts_backend_for_engine and instead return True only on
successful unload (place return True inside the try after
backend.unload_model()), allow exceptions to propagate (or catch/log and
re-raise) so failures are not masked; reference backend.unload_model() and the
function drop_tts_backend_for_engine to locate and apply the change.
In `@backend/backends/pytorch_backend.py`:
- Around line 416-422: The pipeline call passes device=self.device which will
break on Windows DirectML backends; either remove the device argument (since the
model is already loaded on the correct device) or only pass device when it is
not a DirectML device. Update the code around the pipeline(...) creation (the
call to pipeline with model=self.model, tokenizer=self.processor.tokenizer,
feature_extractor=self.processor.feature_extractor) to omit device=self.device,
or add a small conditional that detects DirectML (e.g., check str(self.device)
for "dml" or self.device.type == "dml") and only include device=self.device when
that check is false.
In `@backend/models.py`:
- Around line 457-473: DubbingAutoFitRequest is missing the temperature field so
user-supplied temperature gets ignored; add a temperature attribute to
DubbingAutoFitRequest (matching the shape/validation used in
DubbingSegmentGenerateRequest) with the same default, type, and constraints so
incoming auto-fit requests preserve the caller's temperature setting (update the
class DubbingAutoFitRequest and ensure its Field parameters mirror those in
DubbingSegmentGenerateRequest).
In `@backend/routes/dubbing.py`:
- Around line 187-190: When decoding the uploaded SRT bytes (variable raw) the
code currently falls back from "utf-8-sig" straight to "cp1252" which will
silently produce garbage for UTF-16 files; update the decode fallback in the
try/except around raw.decode(...) so that on UnicodeDecodeError it next attempts
raw.decode("utf-16") before finally falling back to raw.decode("cp1252"),
keeping the same exception handling for UnicodeDecodeError and assigning the
result to content.
In `@backend/routes/generations.py`:
- Around line 166-167: Replace the hardcoded fallback gen.model_size or "1.7B"
before calling ensure_model_cached_or_raise by computing an engine-aware model
size like generate_speech does: if gen.model_size is present use it; else if
gen.engine is "tada" use "1B"; else if the engine is size-aware use "1.7B";
otherwise leave size empty/no-size. Then call ensure_model_cached_or_raise with
gen.engine (or "qwen" default) and that computed model_size. Apply the same
change for the other occurrence referenced (the retry/regenerate path).
In `@backend/services/generation.py`:
- Around line 163-170: The current finally path unconditionally calls
drop_tts_backend_for_engine(engine) when unload_after is True, which can tear
down a shared backend used by other concurrent generations; change this to a
safe unload strategy: do not directly call drop_tts_backend_for_engine(engine)
from the task finally block — instead use a backend lifecycle manager or
reference-counting API (e.g., increment/decrement active users) and only call
drop_tts_backend_for_engine when the manager confirms no other tasks are using
that engine, or acquire a per-engine lock and verify exclusivity before
unloading; update the code around unload_after and the call to
drop_tts_backend_for_engine(engine) to consult the manager/refcount or lock
before performing the actual unload.
In `@backend/utils/chunked_tts.py`:
- Around line 252-260: The call to backend.generate forwards temperature
unconditionally; to avoid TypeError for backends whose generate() doesn't accept
a temperature kwarg, inspect the backend.generate signature before adding
temperature to extra_kwargs (e.g., use inspect.signature(backend.generate) and
only include "temperature" when it's a parameter or check a capability flag like
backend.supports_temperature). Apply the same guard at both locations where
extra_kwargs is built (the block around the backend.generate call shown and the
similar block at 287-295) so temperature is only forwarded to backends that
declare support.
In `@tauri/src-tauri/src/main.rs`:
- Around line 243-257: The loop currently treats state.child presence as
readiness and can wait past a failed starter; change the completion checks so
success is only returned when check_health(SERVER_PORT) returns true, and add an
early exit that returns Err if the startup marker (state.starting) is
false/cleared while health is still down so waiters stop instead of timing out;
update the loop in the startup_guard.is_none() branch to remove reliance on
state.child.lock().unwrap().is_some() for readiness and to inspect
state.starting (the startup flag) before sleeping/continuing.
---
Minor comments:
In `@app/src/components/Sidebar.tsx`:
- Line 25: The new sidebar entry currently hard-codes label: 'Dubbing' in the
Sidebar component; change that to use the same i18n pattern as the other tabs
(replace the literal label with labelKey: 'nav.dubbing' on the object with id
'dubbing') and add the corresponding "nav.dubbing" key/value to your locale
resource files so translations are available for all supported languages.
In `@app/src/components/VoiceProfiles/ProfileForm.tsx`:
- Around line 616-621: The new user-facing strings in ProfileForm.tsx are
hardcoded (e.g., the toast title/description "Voice description required" and
"Describe the French voice...") and must be localized; replace these literal
strings with calls to the translation function (t(...)) and use interpolation
for the language name (e.g., t('voiceDesign.descriptionRequired') and
t('voiceDesign.describeLanguage', { language: selectedLanguage })) so the
message reflects the current locale instead of assuming “French”; apply the same
change pattern to the other occurrences referenced (lines ~656, 969, 1034-1054,
1228-1236, 1359-1369) and reference the component state/prop that holds the
chosen language (e.g., selectedLanguage or voice.language) when building the
interpolated translation.
In `@backend/routes/dubbing.py`:
- Around line 726-739: The code increments cancelled before verifying
cancel_generation_job(generation.id) actually found a live task, so change the
logic to only increment the cancelled counter when cancellation_state is not
None (i.e., after a successful cancellation and
task_manager.complete_generation(generation.id) call); if cancel_generation_job
returns None, do not increment cancelled and still call
history.update_generation_status(...) with the "Stale generation reset by user"
error. Apply the same fix to the other similar block that uses
cancel_generation_job, cancelled, task_manager.complete_generation, and
history.update_generation_status so stale resets are not counted as cancelled
tasks.
- Around line 149-166: The project summary in _serialize_project_list_item
ignores segments with fit_status == "acceptable", so add an acceptable_count =
sum(1 for segment in segments if segment.fit_status == "acceptable") and pass
acceptable_count into the models.DubbingProjectListItemResponse construction
(alongside exact_count, warning_count, failed_count, pending_count) so
acceptable fits are reflected in the aggregated response.
In `@backend/utils/audio.py`:
- Around line 139-147: find_ffmpeg() currently only probes Windows-style bundled
names (ffmpeg.exe) so packaged macOS/Linux installs are missed; update
find_ffmpeg() to also check for non-.exe binaries and common Unix locations and
PATH. Specifically, in function find_ffmpeg() add candidate Paths without the
.exe suffix (e.g., sibling "ffmpeg" and "_internal/ffmpeg"), include standard
locations like /usr/bin and /usr/local/bin, and fall back to
shutil.which("ffmpeg") before returning None. Apply the same changes to the
analogous probe that checks ffprobe (the block referenced around lines 153-156 /
the find_ffprobe logic) so both ffmpeg and ffprobe are discovered on non-Windows
systems.
In `@backend/voicebox-server.spec`:
- Line 85: The spec hardcodes console=False which will disable stdout/stderr
capture on macOS/Linux; either restore console=True in
backend/voicebox-server.spec so cross-platform builds preserve logs (the
build_binary.py already injects --noconsole for Windows), or if this spec is
intended only for Windows add a clarifying comment above the console=False line
noting it is Windows-specific and that build_binary.py handles console
suppression on Windows via --noconsole; update the file accordingly to prevent
breaking log capture on non-Windows platforms.
In `@SRT2VOICE.md`:
- Around line 1151-1153: The docs currently contradict on NumPy: one spot lists
"numpy 1.26.4" and another lists "numpy 2.0.0"; pick a single canonical
statement or explicitly separate "validated on X" vs "required for packaging".
Update both occurrences that mention "numpy 1.26.4" and "numpy 2.0.0" to either
the chosen canonical version/range or to a clear pair of statements (e.g.,
"Validated runtime: numpy 1.26.4 on GPU machine" and "Required for packaging:
numpy >=2.0.0") and make the language consistent across the document so both
places match the chosen approach.
In `@voicedesign.md`:
- Around line 229-237: The NumPy pin "numpy==2.0.2" in the voicedesign.md
snippet conflicts with other PR docs; update the document so the NumPy version
is consistent across all integration docs (either change "numpy==2.0.2" to the
agreed common pin used elsewhere or explicitly scope the pin by environment,
e.g., "Windows build: numpy==2.0.2" vs. other environments), and mention the
compatibility constraint with Numba/qwen_tts next to the pin to keep packaging
instructions reproducible.
---
Nitpick comments:
In `@backend/build_binary.py`:
- Around line 51-55: Add an informational log message in pin_numpy_for_numba
before the subprocess.run call so build logs show that numpy==2.0.0 is being
installed for numba compatibility; within that function (pin_numpy_for_numba)
obtain a logger (e.g., logging.getLogger(__name__) or use the module's existing
logger) and call logger.info(...) with a short descriptive message immediately
before the subprocess.run([sys.executable, "-m", "pip", "install",
"numpy==2.0.0", "-q"], check=True) invocation.
In `@backend/database/migrations.py`:
- Line 298: Remove the dead duplicate cleanup block guarded by "if False and
'default_intent' in columns" so the legacy drop logic for default_intent only
exists in the primary cleanup path; delete the unreachable if-statement and its
body (the branch that checks for "default_intent" under the False guard) to
avoid coupling the columns variable to whichever dubbing-table branch executed
last and eliminate the redundant drop logic.
In `@backend/services/cuda.py`:
- Around line 381-385: Update the docstring for the function currently titled
"Check if the CUDA binary is outdated and auto-download if so." to reflect the
new manual-update behavior: remove any claim that binaries are auto-downloaded
and explicitly state that the function now warns and exits when CUDA/server
versions are incompatible and that manual intervention is required; apply the
same docstring wording change to the second similar docstring instance present
later in the file (the one around the second occurrence of the same check).
In `@backend/services/srt_parser.py`:
- Line 9: The TIMECODE_RE currently allows minutes and seconds 00-99; update the
pattern (the TIMECODE_RE constant) to constrain minutes and seconds to 00-59
(e.g., use a range like [0-5]\d) so malformed timecodes like "00:99:99,999" are
rejected; adjust any error handling around the regex match in srt_parser.py to
raise/return a clear validation error when the match fails so callers get a
descriptive message about invalid timecode ranges.
- Around line 60-62: The parser currently assumes lines[1] contains only "start
--> end" and will fail if non-standard SRT position metadata is present; update
the logic around lines[1] / the variables start_tc and end_tc to strip any
trailing position metadata before parsing (e.g., split each timecode on
whitespace or apply a regex to capture only the HH:MM:SS,mmm portion), then pass
the cleaned start_tc and end_tc into parse_timecode_to_ms; also ensure
TIMECODE_RE usage (if present) matches only the timecode portion rather than the
whole token so parse_timecode_to_ms receives valid input.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 41f70470-aac5-45cd-bcb1-037134563413
📒 Files selected for processing (52)
.gitignoreSRT2VOICE.mdapp/src/components/AudioTimeline/AudioTrackEditor.tsxapp/src/components/AudioTimeline/ClipWaveform.tsxapp/src/components/AudioTimeline/TimelineScrollbar.tsxapp/src/components/DubbingTab/DubbingTab.tsxapp/src/components/Generation/EngineModelSelector.tsxapp/src/components/Generation/FloatingGenerateBox.tsxapp/src/components/ServerSettings/ModelManagement.tsxapp/src/components/Sidebar.tsxapp/src/components/StoriesTab/StoryTrackEditor.tsxapp/src/components/VoiceProfiles/ProfileForm.tsxapp/src/components/VoiceProfiles/ProfileList.tsxapp/src/lib/api/client.tsapp/src/lib/api/types.tsapp/src/lib/constants/languages.tsapp/src/lib/hooks/useGenerationForm.tsapp/src/router.tsxapp/src/stores/uiStore.tsbackend/app.pybackend/backends/__init__.pybackend/backends/hume_backend.pybackend/backends/kokoro_backend.pybackend/backends/luxtts_backend.pybackend/backends/mlx_backend.pybackend/backends/pytorch_backend.pybackend/backends/qwen_custom_voice_backend.pybackend/backends/qwen_voice_design_backend.pybackend/build_binary.pybackend/database/__init__.pybackend/database/migrations.pybackend/database/models.pybackend/models.pybackend/routes/__init__.pybackend/routes/audio.pybackend/routes/dubbing.pybackend/routes/generations.pybackend/routes/models.pybackend/services/cuda.pybackend/services/dubbing.pybackend/services/generation.pybackend/services/history.pybackend/services/profiles.pybackend/services/srt_parser.pybackend/utils/audio.pybackend/utils/cache.pybackend/utils/chunked_tts.pybackend/voicebox-server.specconformitycheck.mddenoiser.mdtauri/src-tauri/src/main.rsvoicedesign.md
| if unload_after: | ||
| voice_prompt = None | ||
| audio = None | ||
| try: | ||
| from ..backends import drop_tts_backend_for_engine | ||
|
|
||
| drop_tts_backend_for_engine(engine) | ||
| except Exception: |
There was a problem hiding this comment.
Avoid unloading shared engine state from a single task’s finally path.
drop_tts_backend_for_engine(engine) runs unconditionally when unload_after=True. If multiple generations share the same backend instance, one task can unload it while another is still generating, causing intermittent failures.
Also applies to: 172-179
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@backend/services/generation.py` around lines 163 - 170, The current finally
path unconditionally calls drop_tts_backend_for_engine(engine) when unload_after
is True, which can tear down a shared backend used by other concurrent
generations; change this to a safe unload strategy: do not directly call
drop_tts_backend_for_engine(engine) from the task finally block — instead use a
backend lifecycle manager or reference-counting API (e.g., increment/decrement
active users) and only call drop_tts_backend_for_engine when the manager
confirms no other tasks are using that engine, or acquire a per-engine lock and
verify exclusivity before unloading; update the code around unload_after and the
call to drop_tts_backend_for_engine(engine) to consult the manager/refcount or
lock before performing the actual unload.
| if startup_guard.is_none() { | ||
| let timeout = tokio::time::Duration::from_secs(120); | ||
| let start_time = tokio::time::Instant::now(); | ||
|
|
||
| loop { | ||
| if check_health(SERVER_PORT) || state.child.lock().unwrap().is_some() { | ||
| return Ok(format!("http://127.0.0.1:{}", SERVER_PORT)); | ||
| } | ||
|
|
||
| if start_time.elapsed() > timeout { | ||
| return Err("Server startup already in progress but did not complete in time".to_string()); | ||
| } | ||
|
|
||
| tokio::time::sleep(tokio::time::Duration::from_millis(250)).await; | ||
| } |
There was a problem hiding this comment.
Fix concurrent wait completion criteria to avoid false-ready and timeout stalls.
This wait branch can return success before the server is actually ready (state.child set), and can also keep waiting the full timeout after the starter already failed (starting reset, health still down). Success should be health-based, and waiters should stop once startup is no longer in progress but health is still failing.
Suggested patch
if startup_guard.is_none() {
let timeout = tokio::time::Duration::from_secs(120);
let start_time = tokio::time::Instant::now();
loop {
- if check_health(SERVER_PORT) || state.child.lock().unwrap().is_some() {
+ if check_health(SERVER_PORT) {
return Ok(format!("http://127.0.0.1:{}", SERVER_PORT));
}
+ if !*state.starting.lock().unwrap() {
+ return Err("Concurrent startup finished without a healthy server".to_string());
+ }
if start_time.elapsed() > timeout {
return Err("Server startup already in progress but did not complete in time".to_string());
}
tokio::time::sleep(tokio::time::Duration::from_millis(250)).await;
}
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if startup_guard.is_none() { | |
| let timeout = tokio::time::Duration::from_secs(120); | |
| let start_time = tokio::time::Instant::now(); | |
| loop { | |
| if check_health(SERVER_PORT) || state.child.lock().unwrap().is_some() { | |
| return Ok(format!("http://127.0.0.1:{}", SERVER_PORT)); | |
| } | |
| if start_time.elapsed() > timeout { | |
| return Err("Server startup already in progress but did not complete in time".to_string()); | |
| } | |
| tokio::time::sleep(tokio::time::Duration::from_millis(250)).await; | |
| } | |
| if startup_guard.is_none() { | |
| let timeout = tokio::time::Duration::from_secs(120); | |
| let start_time = tokio::time::Instant::now(); | |
| loop { | |
| if check_health(SERVER_PORT) { | |
| return Ok(format!("http://127.0.0.1:{}", SERVER_PORT)); | |
| } | |
| if !*state.starting.lock().unwrap() { | |
| return Err("Concurrent startup finished without a healthy server".to_string()); | |
| } | |
| if start_time.elapsed() > timeout { | |
| return Err("Server startup already in progress but did not complete in time".to_string()); | |
| } | |
| tokio::time::sleep(tokio::time::Duration::from_millis(250)).await; | |
| } | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tauri/src-tauri/src/main.rs` around lines 243 - 257, The loop currently
treats state.child presence as readiness and can wait past a failed starter;
change the completion checks so success is only returned when
check_health(SERVER_PORT) returns true, and add an early exit that returns Err
if the startup marker (state.starting) is false/cleared while health is still
down so waiters stop instead of timing out; update the loop in the
startup_guard.is_none() branch to remove reliance on
state.child.lock().unwrap().is_some() for readiness and to inspect
state.starting (the startup flag) before sleeping/continuing.
|
@coderabbitai review the last commit and last files on github I added woth the modifications. There is also a debug.md to follow what's been modified and what's not and why so you can auto complete |
|
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chainI'll fetch the latest commit details and read 🏁 Script executed: Length of output: 1078 🏁 Script executed: Length of output: 158706
Files changed
|
| Area | Status |
|---|---|
debug.md content |
✅ Complete and accurate |
ensureVoiceSelected hint text |
✅ Correct |
Indentation in ensureVoiceSelected |
|
| Language restriction to fr/en | ℹ️ Intentional, worth a TODO |
| Hidden group pace UI | ℹ️ Intentional, worth a comment |
| All debug.md deferred items | ✅ Properly documented with rationale |
The commit is clean. The only actionable item is the cosmetic indentation fix in ensureVoiceSelected.
|
adressing this now to integrate this change : |
|
@coderabbitai review : this has been adressed on the last commit (3/3) |
|
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chain🏁 Script executed: Length of output: 5748
All 10 Qwen languages are now properly available and engine-filtered. ✅ 🐇 ✨ If you found this review helpful, would you consider giving us a shout-out on X? Thank you for using CodeRabbit! |
|
@coderabbitai I have submitted a bug.md you've reviewed already and in the second, third and last fixed a lot things you mentionned. I now need a list of what's not been fixed. Don't rely on stack comments if you have not properly checked the files first |
|
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chainI'll check the actual files before giving you a definitive list. Let me fetch all my review comments and verify the current state of each flagged item. 🏁 Script executed: Length of output: 6028 Now let me check the current state of each flagged file. 🏁 Script executed: Length of output: 20135
❌ Not yet fixed (verified in current files)1.
|
| Issue | Status |
|---|---|
| Language selector limited to fr/en | ✅ Fixed (dynamic engine-filtered list) |
ensureVoiceSelected indentation |
Not yet confirmed (cosmetic) |
| Group pace UI hidden comment | Not yet confirmed (cosmetic) |
| AudioTrackEditor playhead keyboard | ⏸️ Deferred (pre-existing gap, user confirmed) |
| TimelineScrollbar keyboard | ⏸️ Deferred (pre-existing gap, user confirmed) |
generation.py shared-engine unload race |
⏸️ Heavy lift — per debug.md deferred list |
🐇 10 open, 3 deferred. Let me know which ones you want to tackle next.
@coderabbitai can give me the instructions for AI for each the 10 opened |
|
Tip For best results, initiate chat on the files or code changes.
Fix 1 —
|
|
@coderabbitai FloatingGenerateBox designed profile / effects : déjà corrigé. Il n’y a plus de return prématuré, donc la synchro effects/persona continue après sélection d’une voix designed. Startup stale dubbing generations : déjà corrigé. La requête inclut maintenant failed en plus de generating / loading_model. return True dans finally unload : déjà corrigé. drop_tts_backend_for_engine() ne masque plus les exceptions, log + raise. DubbingAutoFitRequest.temperature : déjà corrigé. Le champ temperature est présent. Import SRT UTF-16 : déjà corrigé. Le code tente utf-8-sig, puis utf-16, puis cp1252. Retry/regenerate model size engine-aware : déjà corrigé. Le code utilise _resolve_existing_generation_model_size(engine, gen.model_size) avant ensure_model_cached_or_raise(). chunked_tts temperature guard : déjà corrigé. Il inspecte la signature backend et ne passe temperature que si generate() l’accepte. Boucle startup Tauri : pas corrigé volontairement, à garder en dette technique. Bénéfice théorique : démarrage plus robuste avec /health strict. Risque : très sensible, peut recasser le lancement serveur/ghost server. On avait déjà décidé de ne pas toucher sans chantier dédié. pytorch_backend pipeline device : partiellement sécurisé. Le cas DirectML est explicitement évité. Le risque résiduel concerne surtout xpu/plateformes non testées. Bénéfice d’une correction : meilleure compatibilité future. Risque faible à moyen, mais non prioritaire Windows CUDA actuel. |
|
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chainLet me verify each fix directly in the files before confirming. 🏁 Script executed: Length of output: 8714 I need one more check on Fix 3 before giving the full picture. 🏁 Script executed: Length of output: 2981 🏁 Script executed: Length of output: 1129
Please push your local commits and I'll re-verify once they land on the branch. 🐇 |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (2)
backend/models.py (1)
477-480: 💤 Low valueRemove unnecessary
passstatement.
DubbingFullNarrationRequestinherits fromDubbingSegmentGenerateRequestand adds no new members, so thepassis redundant. Consider using an ellipsis or removing it.Suggested fix
class DubbingFullNarrationRequest(DubbingSegmentGenerateRequest): """Request model for beta whole-SRT narration generation.""" - - pass + ...Or simply:
class DubbingFullNarrationRequest(DubbingSegmentGenerateRequest): """Request model for beta whole-SRT narration generation."""🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@backend/models.py` around lines 477 - 480, The class DubbingFullNarrationRequest currently contains a redundant "pass"; since it only inherits from DubbingSegmentGenerateRequest and has a docstring, remove the unnecessary pass statement (or replace it with an ellipsis literal) so the class body is minimal—locate the class declaration for DubbingFullNarrationRequest and delete the pass line.backend/routes/generations.py (1)
86-87: 💤 Low valueChain exception context for better traceability.
The exception should be raised with
from eto preserve the original traceback.Suggested fix
except ValueError as e: - raise HTTPException(status_code=400, detail=str(e)) + raise HTTPException(status_code=400, detail=str(e)) from e🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@backend/routes/generations.py` around lines 86 - 87, The current exception handler in backend/routes/generations.py catches ValueError as e and re-raises HTTPException without chaining; update the raise to include exception chaining (raise HTTPException(status_code=400, detail=str(e)) from e) so the original traceback is preserved when re-raising from the ValueError caught in that except block.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@app/src/components/AudioTimeline/AudioTrackEditor.tsx`:
- Around line 689-706: The two Plus icon buttons in the AudioTrackEditor
component (the conditional top button where index === 0 and the bottom button
where index === tracks.length - 1) lack accessible names; update each <button>
to include an appropriate aria-label (e.g., "Add track above" for the top button
and "Add track below" for the bottom button) or a dynamically generated label
including the track index, while keeping the existing onClick handlers that call
setExtraTracks and using the same classNames and structure.
In `@backend/backends/__init__.py`:
- Around line 783-788: The reference to logger in drop_tts_backend_for_engine
will raise NameError because no logger is defined; fix by either adding a
module-level logger (import logging and set logger =
logging.getLogger(__name__)) near the top of the file so the existing
logger.exception call works, or mirror the pattern from unload_all_tts_backends
by performing a local import/lookup of the logger inside
drop_tts_backend_for_engine before calling logger.exception; ensure the symbol
logger is defined before use.
---
Nitpick comments:
In `@backend/models.py`:
- Around line 477-480: The class DubbingFullNarrationRequest currently contains
a redundant "pass"; since it only inherits from DubbingSegmentGenerateRequest
and has a docstring, remove the unnecessary pass statement (or replace it with
an ellipsis literal) so the class body is minimal—locate the class declaration
for DubbingFullNarrationRequest and delete the pass line.
In `@backend/routes/generations.py`:
- Around line 86-87: The current exception handler in
backend/routes/generations.py catches ValueError as e and re-raises
HTTPException without chaining; update the raise to include exception chaining
(raise HTTPException(status_code=400, detail=str(e)) from e) so the original
traceback is preserved when re-raising from the ValueError caught in that except
block.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 355d4bd4-0391-4237-964e-7c4bf2929ff3
📒 Files selected for processing (9)
app/src/components/AudioTimeline/AudioTrackEditor.tsxapp/src/components/DubbingTab/DubbingTab.tsxapp/src/components/Generation/FloatingGenerateBox.tsxbackend/app.pybackend/backends/__init__.pybackend/models.pybackend/routes/dubbing.pybackend/routes/generations.pybackend/utils/chunked_tts.py
🚧 Files skipped from review as they are similar to previous changes (2)
- backend/app.py
- app/src/components/Generation/FloatingGenerateBox.tsx
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
backend/utils/cache.py (1)
87-90:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winKeep corrupted-cache cleanup best-effort.
If
unlink()fails here,get_cached_voice_prompt()can raise instead of safely falling back toNone.Proposed fix
- except Exception: - # Cache file corrupted, delete it - cache_file.unlink() + except Exception: + # Cache file corrupted, delete it (best effort) + try: + cache_file.unlink(missing_ok=True) + except Exception as e: + logger.warning("Failed to delete corrupted cache file %s: %s", cache_file, e)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@backend/utils/cache.py` around lines 87 - 90, In get_cached_voice_prompt(), the except block that handles a corrupted cache calls cache_file.unlink() without guarding against errors; wrap the unlink call in its own try/except (or handle OSError) so failures to delete the cache are swallowed and do not propagate, ensuring the function still returns None on failure; specifically update the except handling around cache_file.unlink() in backend.utils.cache.get_cached_voice_prompt to perform best-effort deletion and ignore any exception from unlink.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@backend/backends/qwen_voice_design_backend.py`:
- Around line 34-63: load_model_async mutates shared state without
synchronization which allows concurrent calls to race; add an asyncio.Lock
(e.g., self._load_lock) initialized in __init__ and use "async with
self._load_lock" inside load_model_async to serialize the pre-checks,
unload_model calls and the to_thread(_load_model_sync) call, and similarly guard
unload_model and any other code paths that mutate self.model or
self._current_model_size (references: __init__, load_model_async, unload_model,
_load_model_sync, is_loaded, _current_model_size, model).
In `@backend/services/generation.py`:
- Around line 294-305: The current retry flow saves every retry to the same
filename (audio_path = .../f"{generation_id}.wav") so subsequent retries
overwrite the file and older version records created via
versions_mod.create_version(generation_id=..., label="retry",
audio_path=storage_path, ...) can point to replaced audio; change the save to
generate a unique filename per retry (e.g., include a UUID, timestamp, or
incremented retry_count in the filename like f"{generation_id}-{retry_id}.wav"),
update audio_path and storage_path accordingly before calling save_audio and
versions_mod.create_version, ensure the created version uses the unique
storage_path and that the function still returns that storage_path so each retry
preserves its own audio file and version record (update references to
audio_path, save_audio, and versions_mod.create_version).
---
Outside diff comments:
In `@backend/utils/cache.py`:
- Around line 87-90: In get_cached_voice_prompt(), the except block that handles
a corrupted cache calls cache_file.unlink() without guarding against errors;
wrap the unlink call in its own try/except (or handle OSError) so failures to
delete the cache are swallowed and do not propagate, ensuring the function still
returns None on failure; specifically update the except handling around
cache_file.unlink() in backend.utils.cache.get_cached_voice_prompt to perform
best-effort deletion and ignore any exception from unlink.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 7c1e2ebd-1533-4b69-a835-50b5d80c4576
📒 Files selected for processing (6)
backend/app.pybackend/backends/qwen_voice_design_backend.pybackend/database/migrations.pybackend/database/models.pybackend/services/generation.pybackend/utils/cache.py
🚧 Files skipped from review as they are similar to previous changes (3)
- backend/app.py
- backend/database/migrations.py
- backend/database/models.py
Summary
This PR adds a new SRT2Voice workflow to Voicebox.
SRT2Voice is a timeline-based SRT-to-speech module designed for narrated training/tutorial videos. It imports an SRT file, preserves subtitle blocks as visual timing references, generates a continuous narration WAV, allows manual and automatic cutting/alignment, and exports a mounted timeline WAV/package for external video editing.
The PR also includes Qwen VoiceDesign integration and several engine/model fixes needed to keep the new workflow consistent with the existing Voicebox architecture.
Main Changes
SRT2VOICE.mdvoicedesign.mddenoiser.mdconformitycheck.mdValidation
Tested locally on Windows 11.
Checks performed:
py_compilecargo checkNote: local full Tauri packaging stops at updater signing because
TAURI_SIGNING_PRIVATE_KEYis not available in the fork environment, which is expected.Notes
No user data, generated audio, local database, voice profiles, HuggingFace cache, CUDA runtime, or build artifacts are included in this PR.
The SRT2Voice workflow is intentionally implemented as a dedicated module while reusing existing Voicebox concepts where possible, especially project history, timeline editing, model management, and generation/export patterns.
Summary by CodeRabbit
New Features
Bug Fixes
Documentation