Skip to content

Add SRT2Voice workflow#673

Open
Trucabulles wants to merge 7 commits into
jamiepine:mainfrom
Trucabulles:feature/srt2voice
Open

Add SRT2Voice workflow#673
Trucabulles wants to merge 7 commits into
jamiepine:mainfrom
Trucabulles:feature/srt2voice

Conversation

@Trucabulles
Copy link
Copy Markdown

@Trucabulles Trucabulles commented May 17, 2026

Summary

This PR adds a new SRT2Voice workflow to Voicebox.

SRT2Voice is a timeline-based SRT-to-speech module designed for narrated training/tutorial videos. It imports an SRT file, preserves subtitle blocks as visual timing references, generates a continuous narration WAV, allows manual and automatic cutting/alignment, and exports a mounted timeline WAV/package for external video editing.

The PR also includes Qwen VoiceDesign integration and several engine/model fixes needed to keep the new workflow consistent with the existing Voicebox architecture.

Main Changes

  • Add new SRT2Voice module and route.
  • Add SRT parser and SRT2Voice backend service.
  • Add reusable audio timeline components.
  • Add continuous full-SRT narration generation.
  • Add automatic cut workflow using word alignment and acoustic boundary detection.
  • Add manual timeline editing support.
  • Add timeline WAV/package export support.
  • Add Qwen VoiceDesign backend support.
  • Add UI support for designed voices and compatible engine filtering.
  • Add project-level pace and temperature controls for Qwen-based engines.
  • Add model/generation fixes for existing Voicebox engines.
  • Add documentation:
    • SRT2VOICE.md
    • voicedesign.md
    • denoiser.md
    • conformitycheck.md

Validation

Tested locally on Windows 11.

Checks performed:

  • Python backend syntax check with py_compile
  • TypeScript typecheck
  • Vite app build
  • Tauri frontend build
  • Rust cargo check
  • Full Tauri build reached executable and bundle generation

Note: local full Tauri packaging stops at updater signing because TAURI_SIGNING_PRIVATE_KEY is not available in the fork environment, which is expected.

Notes

No user data, generated audio, local database, voice profiles, HuggingFace cache, CUDA runtime, or build artifacts are included in this PR.

The SRT2Voice workflow is intentionally implemented as a dedicated module while reusing existing Voicebox concepts where possible, especially project history, timeline editing, model management, and generation/export patterns.

Summary by CodeRabbit

  • New Features

    • Dubbing workflow: new Dubbing tab, SRT import, project/segment management, multi-track timeline editor (clips, waveform, trim/split/duplicate, per-clip volume, playhead, scrollbar) and audio/package export
    • Generation: per-segment & full-narration generation, auto-fit/auto-cut, tempo suggestions/apply
    • Voice Design: create designed-voice profiles from text prompts and new VoiceDesign engine option
  • Bug Fixes

    • Improved messaging/blocking when VoiceDesign models are unavailable; audio endpoints set no-store cache headers
  • Documentation

    • New detailed docs: SRT2VOICE, VoiceDesign, denoiser, conformity, debug and integration notes

Review Change Stack

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 17, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 737a9b4c-6db7-4780-9248-ed6d807692fb

📥 Commits

Reviewing files that changed from the base of the PR and between 3654b5b and b32b995.

📒 Files selected for processing (1)
  • backend/backends/qwen_voice_design_backend.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • backend/backends/qwen_voice_design_backend.py

📝 Walkthrough

Walkthrough

Adds a new Qwen VoiceDesign TTS engine, a full SRT-driven dubbing backend (DB models, migrations, router and 24+ endpoints), frontend timeline/editor components and story-editor refactor, profile UI for designed voices, generation plumbing (temperature, model-cache checks), voice-prompt CPU caching, audio utilities, build/spec updates, and extensive documentation.

Changes

VoiceDesign & Dubbing

Layer / File(s) Summary
Documentation & gitignore
SRT2VOICE.md, voicedesign.md, denoiser.md, conformitycheck.md, debug.md, .gitignore
Adds SRT2Voice/VoiceDesign/denoiser/conformity/debug docs and expands .gitignore for additional venv patterns.
Audio timeline UI
app/src/components/AudioTimeline/AudioTrackEditor.tsx, .../ClipWaveform.tsx, .../TimelineScrollbar.tsx
New AudioTrackEditor, ClipWaveform, and TimelineScrollbar providing timeline/clip interactions, waveform rendering, trimming, dragging, playhead, zoom and a synchronized scrollbar.
Story editor refactor
app/src/components/StoriesTab/StoryTrackEditor.tsx
Refactors StoryTrackEditor to map story items to AudioTrackClip[] and delegate UI/interaction to AudioTrackEditor; wires mutations for move/trim/split/regen.
Generation/profile UI
app/src/components/Generation/*, app/src/components/VoiceProfiles/*, app/src/components/FloatingGenerateBox.tsx, app/src/lib/hooks/useGenerationForm.ts, app/src/lib/constants/languages.ts, app/src/stores/uiStore.ts, app/src/components/Sidebar.tsx, app/src/router.tsx
Adds qwen_voice_design options, safe language selection, designed-profile UI (designPrompt), profile compatibility sync, and UI routing for dubbing.
Frontend API/types & client
app/src/lib/api/types.ts, app/src/lib/api/client.ts, app/src/router.tsx
Adds many dubbing TS request/response types and ApiClient methods for SRT import, project/segment CRUD, generate/auto-fit/full-narration/export/cancel, and modifies getAudioUrl to accept revision.
Database models & migrations
backend/database/models.py, backend/database/migrations.py, backend/database/__init__.py
Adds DubbingProject and DubbingSegment ORM models and migration logic to add dubbing-related columns; re-exports models.
SRT parser
backend/services/srt_parser.py
New strict SRT parsing utilities producing normalized ParsedSrtSegment instances with ms timings and validations.
Dubbing router & serialization
backend/routes/dubbing.py, backend/routes/__init__.py
New /dubbing FastAPI router with import/list/get/delete, per-segment generate/update/timing/manual-cut/auto-fit, project-level generate/post-process/auto-cut/tempo, export audio/package endpoints, cancel and memory release; serializers compute audio paths and timeline metadata.
Backend engine & model registry
backend/backends/qwen_voice_design_backend.py, backend/backends/__init__.py
Implements QwenVoiceDesign backend, registers qwen_voice_design engine and model configs, and integrates it into engine load/cache helpers.
Model cache & routes changes
backend/routes/models.py, backend/routes/generations.py
Implements HF cache inspection for TADA, ensures model-cached checks used in downloads and generation endpoints, and resolves engine-aware model_size defaults.
Generation orchestration & chunking
backend/services/generation.py, backend/utils/chunked_tts.py
Extends run_generation with temperature, use_voice_prompt_cache, unload_after; ensures models cached before load; chunked TTS forwards temperature when supported; retry persistence registers retry versions.
Voice-prompt cache & utilities
backend/utils/cache.py, backend/backends/pytorch_backend.py
Normalize/cpu-move cached prompts, SHA-256 cache keys, torch.load(map_location='cpu'), clear in-memory prompt cache helper, and device-moving helper for PyTorch backend.
STT/TTS robustness
backend/backends/kokoro_backend.py, backend/backends/luxtts_backend.py, backend/backends/mlx_backend.py, backend/backends/pytorch_backend.py, backend/backends/hume_backend.py
Adds padding/retry for kokoro/luxTTS, MLX STT timestamp stub, PyTorch STT word timestamps, Hume/TADA cache normalization and model-size helpers.
Audio utilities
backend/utils/audio.py
Adds in-memory librosa time-stretch, FFmpeg discovery, and FFmpeg-based file time-stretch helper with atomic replace and cleanup.
App startup & cleanup
backend/app.py
Startup stale cleanup detaches/reset dubbing_segments linked to stale generations before marking generations failed; logs adjusted.
Build & packaging
backend/build_binary.py, backend/voicebox-server.spec
Adds CUDA build validation, numpy pin helper, runtime flags, hidden import for VoiceDesign backend, metadata copies, and sets EXE console=False; writes CUDA manifest for CUDA builds.
Misc routes & headers
backend/routes/audio.py, backend/services/history.py
Adds Cache-Control: no-store to audio FileResponses and filters dubbing_segment-sourced generations from public history.

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs

Suggested reviewers

  • rhmod09-dev

Poem

"I’m a rabbit with a tiny mic,
I stitched SRTs and prompts alike,
Tracks snap, models hum, timelines blend,
Clips split, regenerations send,
Hop, export, sing — dubbing’s delight!"

sequenceDiagram
  participant Client
  participant API as Voicebox API (/dubbing)
  participant DB
  participant TTS as TTS Backend (qwen_voice_design / others)
  participant STT as STT Backend

  Client->>API: POST /import-srt (file)
  API->>DB: create DubbingProject + DubbingSegments
  Client->>API: POST /projects/{id}/segments/{seg}/generate
  API->>TTS: ensure_model_cached_or_raise -> queue run_generation(engine, model_size)
  API->>STT: (optional) release memory / request timestamps
  TTS-->>API: generation complete (audio path)
  API->>DB: update segment generation_id/status
  Client->>API: GET /projects/{id}/export-audio -> returns WAV blob
Loading
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 13

Note

Due to the large number of review comments, Critical, Major severity comments were prioritized as inline comments.

🟡 Minor comments (8)
SRT2VOICE.md-1151-1153 (1)

1151-1153: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Resolve contradictory NumPy version guidance.

Line 1152 documents validated runtime with numpy 1.26.4, while Line 1995 states the safe version is numpy 2.0.0. Keep one canonical version/range (or clearly separate “validated on machine X” vs “required for packaging”) to avoid build drift.

Also applies to: 1994-1996

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@SRT2VOICE.md` around lines 1151 - 1153, The docs currently contradict on
NumPy: one spot lists "numpy 1.26.4" and another lists "numpy 2.0.0"; pick a
single canonical statement or explicitly separate "validated on X" vs "required
for packaging". Update both occurrences that mention "numpy 1.26.4" and "numpy
2.0.0" to either the chosen canonical version/range or to a clear pair of
statements (e.g., "Validated runtime: numpy 1.26.4 on GPU machine" and "Required
for packaging: numpy >=2.0.0") and make the language consistent across the
document so both places match the chosen approach.
voicedesign.md-229-237 (1)

229-237: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Align NumPy pin with the other integration docs.

Line 233 pins numpy==2.0.2, which conflicts with the NumPy guidance elsewhere in this PR docs. Please unify to one documented compatibility target (or explicitly scope by environment) so packaging instructions stay reproducible.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@voicedesign.md` around lines 229 - 237, The NumPy pin "numpy==2.0.2" in the
voicedesign.md snippet conflicts with other PR docs; update the document so the
NumPy version is consistent across all integration docs (either change
"numpy==2.0.2" to the agreed common pin used elsewhere or explicitly scope the
pin by environment, e.g., "Windows build: numpy==2.0.2" vs. other environments),
and mention the compatibility constraint with Numba/qwen_tts next to the pin to
keep packaging instructions reproducible.
backend/voicebox-server.spec-85-85 (1)

85-85: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Hardcoded console=False may break log capture on macOS/Linux.

The build_binary.py only adds --noconsole on Windows because "On macOS/Linux the sidecar needs stdout/stderr for Tauri to capture logs" (lines 87-88). If this spec file is used directly on non-Windows platforms, log capture would break.

If this spec is Windows-specific, consider adding a comment to clarify. If it should work cross-platform, consider keeping console=True (the dynamic build_binary.py will handle Windows-specific console hiding).

Option 1: Add clarifying comment
 exe = EXE(
     pyz,
     a.scripts,
     a.binaries,
     a.datas,
     [],
-    name='voicebox-server',
+    name='voicebox-server',  # Windows-specific spec; use build_binary.py for cross-platform builds
     debug=False,
     bootloader_ignore_signals=False,
     strip=False,
     upx=True,
     upx_exclude=[],
     runtime_tmpdir=None,
     console=False,
Option 2: Restore console=True for cross-platform compatibility
     runtime_tmpdir=None,
-    console=False,
+    console=True,
     disable_windowed_traceback=False,
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/voicebox-server.spec` at line 85, The spec hardcodes console=False
which will disable stdout/stderr capture on macOS/Linux; either restore
console=True in backend/voicebox-server.spec so cross-platform builds preserve
logs (the build_binary.py already injects --noconsole for Windows), or if this
spec is intended only for Windows add a clarifying comment above the
console=False line noting it is Windows-specific and that build_binary.py
handles console suppression on Windows via --noconsole; update the file
accordingly to prevent breaking log capture on non-Windows platforms.
app/src/components/Sidebar.tsx-25-25 (1)

25-25: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Localize the new tab label instead of hard-coding English.

Line 25 uses label: 'Dubbing', which bypasses i18n while other tabs use labelKey.

Suggested fix
-  { id: 'dubbing', path: '/dubbing', icon: Clapperboard, label: 'Dubbing' },
+  { id: 'dubbing', path: '/dubbing', icon: Clapperboard, labelKey: 'nav.dubbing' },

Add nav.dubbing to locale resources.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@app/src/components/Sidebar.tsx` at line 25, The new sidebar entry currently
hard-codes label: 'Dubbing' in the Sidebar component; change that to use the
same i18n pattern as the other tabs (replace the literal label with labelKey:
'nav.dubbing' on the object with id 'dubbing') and add the corresponding
"nav.dubbing" key/value to your locale resource files so translations are
available for all supported languages.
app/src/components/VoiceProfiles/ProfileForm.tsx-616-621 (1)

616-621: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Localize the new VoiceDesign copy instead of hardcoding strings.

These new user-facing strings bypass t(...) and include a hardcoded language assumption (“French”), which breaks localization consistency and can mislead users when another language is selected.

Also applies to: 656-657, 969-970, 1034-1054, 1228-1236, 1359-1369

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@app/src/components/VoiceProfiles/ProfileForm.tsx` around lines 616 - 621, The
new user-facing strings in ProfileForm.tsx are hardcoded (e.g., the toast
title/description "Voice description required" and "Describe the French
voice...") and must be localized; replace these literal strings with calls to
the translation function (t(...)) and use interpolation for the language name
(e.g., t('voiceDesign.descriptionRequired') and
t('voiceDesign.describeLanguage', { language: selectedLanguage })) so the
message reflects the current locale instead of assuming “French”; apply the same
change pattern to the other occurrences referenced (lines ~656, 969, 1034-1054,
1228-1236, 1359-1369) and reference the component state/prop that holds the
chosen language (e.g., selectedLanguage or voice.language) when building the
interpolated translation.
backend/utils/audio.py-139-147 (1)

139-147: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Probe non-Windows bundled FFmpeg paths too.

find_ffmpeg() only checks bundled *.exe names. On macOS/Linux packaged deployments with local ffmpeg, this returns None and disables file-based stretching unnecessarily.

Suggested patch
 def find_ffmpeg() -> str | None:
     """Find a local ffmpeg executable without downloading anything."""
     env_path = os.environ.get("VOICEBOX_FFMPEG_PATH")
     candidates = [
         Path(env_path) if env_path else None,
+        Path(sys.executable).resolve().parent / "ffmpeg",
+        Path(sys.executable).resolve().parent / "_internal" / "ffmpeg",
         Path(sys.executable).resolve().parent / "ffmpeg.exe",
         Path(sys.executable).resolve().parent / "_internal" / "ffmpeg.exe",
         Path("C:/ffmpeg/bin/ffmpeg.exe"),
     ]

Also applies to: 153-156

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/utils/audio.py` around lines 139 - 147, find_ffmpeg() currently only
probes Windows-style bundled names (ffmpeg.exe) so packaged macOS/Linux installs
are missed; update find_ffmpeg() to also check for non-.exe binaries and common
Unix locations and PATH. Specifically, in function find_ffmpeg() add candidate
Paths without the .exe suffix (e.g., sibling "ffmpeg" and "_internal/ffmpeg"),
include standard locations like /usr/bin and /usr/local/bin, and fall back to
shutil.which("ffmpeg") before returning None. Apply the same changes to the
analogous probe that checks ffprobe (the block referenced around lines 153-156 /
the find_ffprobe logic) so both ffmpeg and ffprobe are discovered on non-Windows
systems.
backend/routes/dubbing.py-726-739 (1)

726-739: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Don't count stale resets as cancelled tasks.

cancelled is incremented before checking whether cancel_generation_job() actually found a live task. When it returns None, the response still reports an active task was cancelled even though this code only reset stale DB state.

🐛 Proposed fix
-        cancellation_state = cancel_generation_job(generation.id)
-        cancelled += 1
-        if cancellation_state is not None:
+        cancellation_state = cancel_generation_job(generation.id)
+        if cancellation_state is not None:
+            cancelled += 1
             task_manager.complete_generation(generation.id)
@@
-        cancellation_state = cancel_generation_job(full_narration.id)
-        cancelled += 1
-        if cancellation_state is not None:
+        cancellation_state = cancel_generation_job(full_narration.id)
+        if cancellation_state is not None:
+            cancelled += 1
             task_manager.complete_generation(full_narration.id)

Also applies to: 748-760

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/routes/dubbing.py` around lines 726 - 739, The code increments
cancelled before verifying cancel_generation_job(generation.id) actually found a
live task, so change the logic to only increment the cancelled counter when
cancellation_state is not None (i.e., after a successful cancellation and
task_manager.complete_generation(generation.id) call); if cancel_generation_job
returns None, do not increment cancelled and still call
history.update_generation_status(...) with the "Stale generation reset by user"
error. Apply the same fix to the other similar block that uses
cancel_generation_job, cancelled, task_manager.complete_generation, and
history.update_generation_status so stale resets are not counted as cancelled
tasks.
backend/routes/dubbing.py-149-166 (1)

149-166: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Include acceptable fits in the project summary.

fit_status="acceptable" is part of the segment contract, but this aggregate ignores it. A project whose segments are all acceptable can end up with segment_count > 0 while every fit bucket stays at zero.

🐛 Proposed fix
-    exact_count = sum(1 for segment in segments if segment.fit_status == "exact")
+    exact_count = sum(
+        1 for segment in segments if segment.fit_status in {"exact", "acceptable"}
+    )
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/routes/dubbing.py` around lines 149 - 166, The project summary in
_serialize_project_list_item ignores segments with fit_status == "acceptable",
so add an acceptable_count = sum(1 for segment in segments if segment.fit_status
== "acceptable") and pass acceptable_count into the
models.DubbingProjectListItemResponse construction (alongside exact_count,
warning_count, failed_count, pending_count) so acceptable fits are reflected in
the aggregated response.
🧹 Nitpick comments (5)
backend/build_binary.py (1)

51-55: 💤 Low value

Consider adding a log statement for build observability.

The quiet -q flag suppresses pip output, but a log message before the command would help operators understand what's happening during longer builds.

Suggested logging
 def pin_numpy_for_numba():
     """Keep qwen_tts/numba compatible after torch wheel swaps."""
     import subprocess
 
+    logger.info("Pinning numpy==2.0.0 for numba compatibility...")
     subprocess.run([sys.executable, "-m", "pip", "install", "numpy==2.0.0", "-q"], check=True)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/build_binary.py` around lines 51 - 55, Add an informational log
message in pin_numpy_for_numba before the subprocess.run call so build logs show
that numpy==2.0.0 is being installed for numba compatibility; within that
function (pin_numpy_for_numba) obtain a logger (e.g.,
logging.getLogger(__name__) or use the module's existing logger) and call
logger.info(...) with a short descriptive message immediately before the
subprocess.run([sys.executable, "-m", "pip", "install", "numpy==2.0.0", "-q"],
check=True) invocation.
backend/services/cuda.py (1)

381-385: ⚡ Quick win

Update the function docstring to match the new manual-update behavior.

The implementation now warns and exits, but the docstring still states auto-download behavior.

Suggested patch
 async def check_and_update_cuda_binary():
-    """Check if the CUDA binary is outdated and auto-download if so.
+    """Check if the CUDA binary is outdated and log manual-update guidance.

     Called on server startup. Checks both server version and CUDA libs
-    version. Downloads only what's needed.
+    version. In this fork, automatic CUDA downloads are disabled.
     """

Also applies to: 405-409

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/services/cuda.py` around lines 381 - 385, Update the docstring for
the function currently titled "Check if the CUDA binary is outdated and
auto-download if so." to reflect the new manual-update behavior: remove any
claim that binaries are auto-downloaded and explicitly state that the function
now warns and exits when CUDA/server versions are incompatible and that manual
intervention is required; apply the same docstring wording change to the second
similar docstring instance present later in the file (the one around the second
occurrence of the same check).
backend/database/migrations.py (1)

298-298: ⚡ Quick win

Remove the unreachable duplicate cleanup block.

The if False and ... guard makes this branch permanently dead, so the legacy default_intent drop logic now exists in two places. Keeping the second copy around is confusing and ties columns to whichever dubbing-table branch ran last.

♻️ Proposed cleanup
-    if False and "default_intent" in columns:
-        if _supports_drop_column(engine):
-            with engine.connect() as conn:
-                conn.execute(text("ALTER TABLE mcp_client_bindings DROP COLUMN default_intent"))
-                conn.commit()
-            logger.info("Dropped legacy default_intent column from mcp_client_bindings")
-        else:
-            # ALTER TABLE … DROP COLUMN on SQLite requires 3.35+ (Mar
-            # 2021). Production PyInstaller builds bundle Python 3.12
-            # which links to SQLite 3.40+; this branch only fires for
-            # dev environments running the backend directly against an
-            # old system SQLite (Ubuntu 20.04 = 3.31, Debian 11 = 3.34).
-            # Leaving the unused column in place is harmless — the ORM
-            # only maps declared columns, so a stray one does no work
-            # and gets no reads or writes.
-            logger.warning(
-                "SQLite %s too old to DROP COLUMN (need 3.35+); leaving unused default_intent column on mcp_client_bindings in place.",
-                sqlite3.sqlite_version,
-            )
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/database/migrations.py` at line 298, Remove the dead duplicate
cleanup block guarded by "if False and 'default_intent' in columns" so the
legacy drop logic for default_intent only exists in the primary cleanup path;
delete the unreachable if-statement and its body (the branch that checks for
"default_intent" under the False guard) to avoid coupling the columns variable
to whichever dubbing-table branch executed last and eliminate the redundant drop
logic.
backend/services/srt_parser.py (2)

9-9: 💤 Low value

Consider adding range validation for minutes and seconds.

The regex allows minutes and seconds to be 00-99, but valid time values require minutes < 60 and seconds < 60. While most SRT files will have valid ranges, malformed input like 00:99:99,999 would pass regex validation but represent an invalid time.

The current implementation is functional (calculations still work), but adding validation would improve error messages for malformed SRT files.

🔍 Optional enhancement: stricter timecode validation
-TIMECODE_RE = re.compile(r"^(?P<h>\d{2}):(?P<m>\d{2}):(?P<s>\d{2}),(?P<ms>\d{3})$")
+TIMECODE_RE = re.compile(r"^(?P<h>\d{2}):(?P<m>[0-5]\d):(?P<s>[0-5]\d),(?P<ms>\d{3})$")
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/services/srt_parser.py` at line 9, The TIMECODE_RE currently allows
minutes and seconds 00-99; update the pattern (the TIMECODE_RE constant) to
constrain minutes and seconds to 00-59 (e.g., use a range like [0-5]\d) so
malformed timecodes like "00:99:99,999" are rejected; adjust any error handling
around the regex match in srt_parser.py to raise/return a clear validation error
when the match fails so callers get a descriptive message about invalid timecode
ranges.

60-62: ⚡ Quick win

Position metadata is non-standard; consider if this edge case warrants handling.

SRT position metadata (e.g., X1:40 X2:600 Y1:20 Y2:50) is not part of the official SRT specification and lacks broad support across players. While technically the current code would fail if such metadata were present—because TIMECODE_RE expects the timecode to end after milliseconds ($ anchor)—this is a non-standard edge case. If your use case requires handling non-standard SRT variants, the proposed fix to strip position data is reasonable. Otherwise, this is unlikely to occur in typical SRT files.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/services/srt_parser.py` around lines 60 - 62, The parser currently
assumes lines[1] contains only "start --> end" and will fail if non-standard SRT
position metadata is present; update the logic around lines[1] / the variables
start_tc and end_tc to strip any trailing position metadata before parsing
(e.g., split each timecode on whitespace or apply a regex to capture only the
HH:MM:SS,mmm portion), then pass the cleaned start_tc and end_tc into
parse_timecode_to_ms; also ensure TIMECODE_RE usage (if present) matches only
the timecode portion rather than the whole token so parse_timecode_to_ms
receives valid input.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@app/src/components/AudioTimeline/AudioTrackEditor.tsx`:
- Around line 571-622: The icon-only transport and zoom Buttons (the Button with
onClick={onPlayPause}, the Button with onClick={onStop}, the Button with
onClick={handleZoomOut}, and the Button with onClick={handleZoomIn}) need
accessible names; add aria-label attributes (or a dynamic aria-label for the
play/pause button that toggles between "Play" and "Pause" based on isPlaying) to
each Button (e.g., aria-label="Play"/"Pause", aria-label="Stop",
aria-label="Zoom out", aria-label="Zoom in"), ensuring the Button component
supports aria-label and keeping existing title props if present.
- Around line 762-774: The playhead div with role="slider" is not
keyboard-focusable or operable; make it accessible by adding tabIndex={0} and
wiring an onKeyDown handler (e.g., onPlayheadKeyDown) that maps
ArrowLeft/ArrowRight (small seeks), PageUp/PageDown (larger seeks), Home/End
(start/end) to update currentTimeMs via the same state/update function used by
handlePlayheadMouseDown (or the component's seek dispatcher), clamp values
between 0 and totalDurationMs, and update aria-valuenow (and optionally
aria-valuetext) accordingly so msToPixels/currentTimeMs/totalDurationMs remain
the single source of truth for position.

In `@app/src/components/Generation/FloatingGenerateBox.tsx`:
- Around line 175-179: The early return after setting engine/modelSize for
designed voices (the block using selectedProfile?.voice_type === 'designed' and
form.setValue('engine', 'qwen_voice_design'); form.setValue('modelSize',
'1.7B')) prevents the downstream profile/state synchronization
(effects/personality sync) from running; remove the return and let execution
continue so the subsequent sync logic that reads selectedProfile and updates
effect/personality state runs. Also apply the same change to the analogous
branch later (the block around the other designed-profile handling between the
~193-219 area) so both designed-profile paths perform engine/model adjustments
but do not short-circuit the rest of the sync.

In `@backend/app.py`:
- Around line 244-253: The reset SQL in the db.execute call (the sa_text passed
to db.execute that assigns reset_result) omits generations with status 'failed',
so update the subquery WHERE clause to include 'failed' alongside 'generating'
and 'loading_model' (e.g., WHERE status IN
('generating','loading_model','failed') AND source = 'dubbing_segment') so stale
dubbing_segments linked to failed generations are reset; edit the SQL string
inside the reset_result = db.execute(...) call in app.py accordingly.

In `@backend/backends/__init__.py`:
- Around line 783-786: The finally block currently returns True which suppresses
any exceptions from backend.unload_model(); to fix this, remove the return from
the finally block in drop_tts_backend_for_engine and instead return True only on
successful unload (place return True inside the try after
backend.unload_model()), allow exceptions to propagate (or catch/log and
re-raise) so failures are not masked; reference backend.unload_model() and the
function drop_tts_backend_for_engine to locate and apply the change.

In `@backend/backends/pytorch_backend.py`:
- Around line 416-422: The pipeline call passes device=self.device which will
break on Windows DirectML backends; either remove the device argument (since the
model is already loaded on the correct device) or only pass device when it is
not a DirectML device. Update the code around the pipeline(...) creation (the
call to pipeline with model=self.model, tokenizer=self.processor.tokenizer,
feature_extractor=self.processor.feature_extractor) to omit device=self.device,
or add a small conditional that detects DirectML (e.g., check str(self.device)
for "dml" or self.device.type == "dml") and only include device=self.device when
that check is false.

In `@backend/models.py`:
- Around line 457-473: DubbingAutoFitRequest is missing the temperature field so
user-supplied temperature gets ignored; add a temperature attribute to
DubbingAutoFitRequest (matching the shape/validation used in
DubbingSegmentGenerateRequest) with the same default, type, and constraints so
incoming auto-fit requests preserve the caller's temperature setting (update the
class DubbingAutoFitRequest and ensure its Field parameters mirror those in
DubbingSegmentGenerateRequest).

In `@backend/routes/dubbing.py`:
- Around line 187-190: When decoding the uploaded SRT bytes (variable raw) the
code currently falls back from "utf-8-sig" straight to "cp1252" which will
silently produce garbage for UTF-16 files; update the decode fallback in the
try/except around raw.decode(...) so that on UnicodeDecodeError it next attempts
raw.decode("utf-16") before finally falling back to raw.decode("cp1252"),
keeping the same exception handling for UnicodeDecodeError and assigning the
result to content.

In `@backend/routes/generations.py`:
- Around line 166-167: Replace the hardcoded fallback gen.model_size or "1.7B"
before calling ensure_model_cached_or_raise by computing an engine-aware model
size like generate_speech does: if gen.model_size is present use it; else if
gen.engine is "tada" use "1B"; else if the engine is size-aware use "1.7B";
otherwise leave size empty/no-size. Then call ensure_model_cached_or_raise with
gen.engine (or "qwen" default) and that computed model_size. Apply the same
change for the other occurrence referenced (the retry/regenerate path).

In `@backend/services/generation.py`:
- Around line 163-170: The current finally path unconditionally calls
drop_tts_backend_for_engine(engine) when unload_after is True, which can tear
down a shared backend used by other concurrent generations; change this to a
safe unload strategy: do not directly call drop_tts_backend_for_engine(engine)
from the task finally block — instead use a backend lifecycle manager or
reference-counting API (e.g., increment/decrement active users) and only call
drop_tts_backend_for_engine when the manager confirms no other tasks are using
that engine, or acquire a per-engine lock and verify exclusivity before
unloading; update the code around unload_after and the call to
drop_tts_backend_for_engine(engine) to consult the manager/refcount or lock
before performing the actual unload.

In `@backend/utils/chunked_tts.py`:
- Around line 252-260: The call to backend.generate forwards temperature
unconditionally; to avoid TypeError for backends whose generate() doesn't accept
a temperature kwarg, inspect the backend.generate signature before adding
temperature to extra_kwargs (e.g., use inspect.signature(backend.generate) and
only include "temperature" when it's a parameter or check a capability flag like
backend.supports_temperature). Apply the same guard at both locations where
extra_kwargs is built (the block around the backend.generate call shown and the
similar block at 287-295) so temperature is only forwarded to backends that
declare support.

In `@tauri/src-tauri/src/main.rs`:
- Around line 243-257: The loop currently treats state.child presence as
readiness and can wait past a failed starter; change the completion checks so
success is only returned when check_health(SERVER_PORT) returns true, and add an
early exit that returns Err if the startup marker (state.starting) is
false/cleared while health is still down so waiters stop instead of timing out;
update the loop in the startup_guard.is_none() branch to remove reliance on
state.child.lock().unwrap().is_some() for readiness and to inspect
state.starting (the startup flag) before sleeping/continuing.

---

Minor comments:
In `@app/src/components/Sidebar.tsx`:
- Line 25: The new sidebar entry currently hard-codes label: 'Dubbing' in the
Sidebar component; change that to use the same i18n pattern as the other tabs
(replace the literal label with labelKey: 'nav.dubbing' on the object with id
'dubbing') and add the corresponding "nav.dubbing" key/value to your locale
resource files so translations are available for all supported languages.

In `@app/src/components/VoiceProfiles/ProfileForm.tsx`:
- Around line 616-621: The new user-facing strings in ProfileForm.tsx are
hardcoded (e.g., the toast title/description "Voice description required" and
"Describe the French voice...") and must be localized; replace these literal
strings with calls to the translation function (t(...)) and use interpolation
for the language name (e.g., t('voiceDesign.descriptionRequired') and
t('voiceDesign.describeLanguage', { language: selectedLanguage })) so the
message reflects the current locale instead of assuming “French”; apply the same
change pattern to the other occurrences referenced (lines ~656, 969, 1034-1054,
1228-1236, 1359-1369) and reference the component state/prop that holds the
chosen language (e.g., selectedLanguage or voice.language) when building the
interpolated translation.

In `@backend/routes/dubbing.py`:
- Around line 726-739: The code increments cancelled before verifying
cancel_generation_job(generation.id) actually found a live task, so change the
logic to only increment the cancelled counter when cancellation_state is not
None (i.e., after a successful cancellation and
task_manager.complete_generation(generation.id) call); if cancel_generation_job
returns None, do not increment cancelled and still call
history.update_generation_status(...) with the "Stale generation reset by user"
error. Apply the same fix to the other similar block that uses
cancel_generation_job, cancelled, task_manager.complete_generation, and
history.update_generation_status so stale resets are not counted as cancelled
tasks.
- Around line 149-166: The project summary in _serialize_project_list_item
ignores segments with fit_status == "acceptable", so add an acceptable_count =
sum(1 for segment in segments if segment.fit_status == "acceptable") and pass
acceptable_count into the models.DubbingProjectListItemResponse construction
(alongside exact_count, warning_count, failed_count, pending_count) so
acceptable fits are reflected in the aggregated response.

In `@backend/utils/audio.py`:
- Around line 139-147: find_ffmpeg() currently only probes Windows-style bundled
names (ffmpeg.exe) so packaged macOS/Linux installs are missed; update
find_ffmpeg() to also check for non-.exe binaries and common Unix locations and
PATH. Specifically, in function find_ffmpeg() add candidate Paths without the
.exe suffix (e.g., sibling "ffmpeg" and "_internal/ffmpeg"), include standard
locations like /usr/bin and /usr/local/bin, and fall back to
shutil.which("ffmpeg") before returning None. Apply the same changes to the
analogous probe that checks ffprobe (the block referenced around lines 153-156 /
the find_ffprobe logic) so both ffmpeg and ffprobe are discovered on non-Windows
systems.

In `@backend/voicebox-server.spec`:
- Line 85: The spec hardcodes console=False which will disable stdout/stderr
capture on macOS/Linux; either restore console=True in
backend/voicebox-server.spec so cross-platform builds preserve logs (the
build_binary.py already injects --noconsole for Windows), or if this spec is
intended only for Windows add a clarifying comment above the console=False line
noting it is Windows-specific and that build_binary.py handles console
suppression on Windows via --noconsole; update the file accordingly to prevent
breaking log capture on non-Windows platforms.

In `@SRT2VOICE.md`:
- Around line 1151-1153: The docs currently contradict on NumPy: one spot lists
"numpy 1.26.4" and another lists "numpy 2.0.0"; pick a single canonical
statement or explicitly separate "validated on X" vs "required for packaging".
Update both occurrences that mention "numpy 1.26.4" and "numpy 2.0.0" to either
the chosen canonical version/range or to a clear pair of statements (e.g.,
"Validated runtime: numpy 1.26.4 on GPU machine" and "Required for packaging:
numpy >=2.0.0") and make the language consistent across the document so both
places match the chosen approach.

In `@voicedesign.md`:
- Around line 229-237: The NumPy pin "numpy==2.0.2" in the voicedesign.md
snippet conflicts with other PR docs; update the document so the NumPy version
is consistent across all integration docs (either change "numpy==2.0.2" to the
agreed common pin used elsewhere or explicitly scope the pin by environment,
e.g., "Windows build: numpy==2.0.2" vs. other environments), and mention the
compatibility constraint with Numba/qwen_tts next to the pin to keep packaging
instructions reproducible.

---

Nitpick comments:
In `@backend/build_binary.py`:
- Around line 51-55: Add an informational log message in pin_numpy_for_numba
before the subprocess.run call so build logs show that numpy==2.0.0 is being
installed for numba compatibility; within that function (pin_numpy_for_numba)
obtain a logger (e.g., logging.getLogger(__name__) or use the module's existing
logger) and call logger.info(...) with a short descriptive message immediately
before the subprocess.run([sys.executable, "-m", "pip", "install",
"numpy==2.0.0", "-q"], check=True) invocation.

In `@backend/database/migrations.py`:
- Line 298: Remove the dead duplicate cleanup block guarded by "if False and
'default_intent' in columns" so the legacy drop logic for default_intent only
exists in the primary cleanup path; delete the unreachable if-statement and its
body (the branch that checks for "default_intent" under the False guard) to
avoid coupling the columns variable to whichever dubbing-table branch executed
last and eliminate the redundant drop logic.

In `@backend/services/cuda.py`:
- Around line 381-385: Update the docstring for the function currently titled
"Check if the CUDA binary is outdated and auto-download if so." to reflect the
new manual-update behavior: remove any claim that binaries are auto-downloaded
and explicitly state that the function now warns and exits when CUDA/server
versions are incompatible and that manual intervention is required; apply the
same docstring wording change to the second similar docstring instance present
later in the file (the one around the second occurrence of the same check).

In `@backend/services/srt_parser.py`:
- Line 9: The TIMECODE_RE currently allows minutes and seconds 00-99; update the
pattern (the TIMECODE_RE constant) to constrain minutes and seconds to 00-59
(e.g., use a range like [0-5]\d) so malformed timecodes like "00:99:99,999" are
rejected; adjust any error handling around the regex match in srt_parser.py to
raise/return a clear validation error when the match fails so callers get a
descriptive message about invalid timecode ranges.
- Around line 60-62: The parser currently assumes lines[1] contains only "start
--> end" and will fail if non-standard SRT position metadata is present; update
the logic around lines[1] / the variables start_tc and end_tc to strip any
trailing position metadata before parsing (e.g., split each timecode on
whitespace or apply a regex to capture only the HH:MM:SS,mmm portion), then pass
the cleaned start_tc and end_tc into parse_timecode_to_ms; also ensure
TIMECODE_RE usage (if present) matches only the timecode portion rather than the
whole token so parse_timecode_to_ms receives valid input.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 41f70470-aac5-45cd-bcb1-037134563413

📥 Commits

Reviewing files that changed from the base of the PR and between b35b909 and f144047.

📒 Files selected for processing (52)
  • .gitignore
  • SRT2VOICE.md
  • app/src/components/AudioTimeline/AudioTrackEditor.tsx
  • app/src/components/AudioTimeline/ClipWaveform.tsx
  • app/src/components/AudioTimeline/TimelineScrollbar.tsx
  • app/src/components/DubbingTab/DubbingTab.tsx
  • app/src/components/Generation/EngineModelSelector.tsx
  • app/src/components/Generation/FloatingGenerateBox.tsx
  • app/src/components/ServerSettings/ModelManagement.tsx
  • app/src/components/Sidebar.tsx
  • app/src/components/StoriesTab/StoryTrackEditor.tsx
  • app/src/components/VoiceProfiles/ProfileForm.tsx
  • app/src/components/VoiceProfiles/ProfileList.tsx
  • app/src/lib/api/client.ts
  • app/src/lib/api/types.ts
  • app/src/lib/constants/languages.ts
  • app/src/lib/hooks/useGenerationForm.ts
  • app/src/router.tsx
  • app/src/stores/uiStore.ts
  • backend/app.py
  • backend/backends/__init__.py
  • backend/backends/hume_backend.py
  • backend/backends/kokoro_backend.py
  • backend/backends/luxtts_backend.py
  • backend/backends/mlx_backend.py
  • backend/backends/pytorch_backend.py
  • backend/backends/qwen_custom_voice_backend.py
  • backend/backends/qwen_voice_design_backend.py
  • backend/build_binary.py
  • backend/database/__init__.py
  • backend/database/migrations.py
  • backend/database/models.py
  • backend/models.py
  • backend/routes/__init__.py
  • backend/routes/audio.py
  • backend/routes/dubbing.py
  • backend/routes/generations.py
  • backend/routes/models.py
  • backend/services/cuda.py
  • backend/services/dubbing.py
  • backend/services/generation.py
  • backend/services/history.py
  • backend/services/profiles.py
  • backend/services/srt_parser.py
  • backend/utils/audio.py
  • backend/utils/cache.py
  • backend/utils/chunked_tts.py
  • backend/voicebox-server.spec
  • conformitycheck.md
  • denoiser.md
  • tauri/src-tauri/src/main.rs
  • voicedesign.md

Comment thread app/src/components/AudioTimeline/AudioTrackEditor.tsx Outdated
Comment thread app/src/components/AudioTimeline/AudioTrackEditor.tsx
Comment thread app/src/components/AudioTimeline/TimelineScrollbar.tsx
Comment thread app/src/components/Generation/FloatingGenerateBox.tsx
Comment thread backend/app.py Outdated
Comment thread backend/routes/dubbing.py Outdated
Comment thread backend/routes/generations.py Outdated
Comment on lines +163 to +170
if unload_after:
voice_prompt = None
audio = None
try:
from ..backends import drop_tts_backend_for_engine

drop_tts_backend_for_engine(engine)
except Exception:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Avoid unloading shared engine state from a single task’s finally path.

drop_tts_backend_for_engine(engine) runs unconditionally when unload_after=True. If multiple generations share the same backend instance, one task can unload it while another is still generating, causing intermittent failures.

Also applies to: 172-179

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/services/generation.py` around lines 163 - 170, The current finally
path unconditionally calls drop_tts_backend_for_engine(engine) when unload_after
is True, which can tear down a shared backend used by other concurrent
generations; change this to a safe unload strategy: do not directly call
drop_tts_backend_for_engine(engine) from the task finally block — instead use a
backend lifecycle manager or reference-counting API (e.g., increment/decrement
active users) and only call drop_tts_backend_for_engine when the manager
confirms no other tasks are using that engine, or acquire a per-engine lock and
verify exclusivity before unloading; update the code around unload_after and the
call to drop_tts_backend_for_engine(engine) to consult the manager/refcount or
lock before performing the actual unload.

Comment thread backend/utils/chunked_tts.py Outdated
Comment on lines +243 to +257
if startup_guard.is_none() {
let timeout = tokio::time::Duration::from_secs(120);
let start_time = tokio::time::Instant::now();

loop {
if check_health(SERVER_PORT) || state.child.lock().unwrap().is_some() {
return Ok(format!("http://127.0.0.1:{}", SERVER_PORT));
}

if start_time.elapsed() > timeout {
return Err("Server startup already in progress but did not complete in time".to_string());
}

tokio::time::sleep(tokio::time::Duration::from_millis(250)).await;
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fix concurrent wait completion criteria to avoid false-ready and timeout stalls.

This wait branch can return success before the server is actually ready (state.child set), and can also keep waiting the full timeout after the starter already failed (starting reset, health still down). Success should be health-based, and waiters should stop once startup is no longer in progress but health is still failing.

Suggested patch
     if startup_guard.is_none() {
         let timeout = tokio::time::Duration::from_secs(120);
         let start_time = tokio::time::Instant::now();

         loop {
-            if check_health(SERVER_PORT) || state.child.lock().unwrap().is_some() {
+            if check_health(SERVER_PORT) {
                 return Ok(format!("http://127.0.0.1:{}", SERVER_PORT));
             }
+            if !*state.starting.lock().unwrap() {
+                return Err("Concurrent startup finished without a healthy server".to_string());
+            }

             if start_time.elapsed() > timeout {
                 return Err("Server startup already in progress but did not complete in time".to_string());
             }

             tokio::time::sleep(tokio::time::Duration::from_millis(250)).await;
         }
     }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if startup_guard.is_none() {
let timeout = tokio::time::Duration::from_secs(120);
let start_time = tokio::time::Instant::now();
loop {
if check_health(SERVER_PORT) || state.child.lock().unwrap().is_some() {
return Ok(format!("http://127.0.0.1:{}", SERVER_PORT));
}
if start_time.elapsed() > timeout {
return Err("Server startup already in progress but did not complete in time".to_string());
}
tokio::time::sleep(tokio::time::Duration::from_millis(250)).await;
}
if startup_guard.is_none() {
let timeout = tokio::time::Duration::from_secs(120);
let start_time = tokio::time::Instant::now();
loop {
if check_health(SERVER_PORT) {
return Ok(format!("http://127.0.0.1:{}", SERVER_PORT));
}
if !*state.starting.lock().unwrap() {
return Err("Concurrent startup finished without a healthy server".to_string());
}
if start_time.elapsed() > timeout {
return Err("Server startup already in progress but did not complete in time".to_string());
}
tokio::time::sleep(tokio::time::Duration::from_millis(250)).await;
}
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tauri/src-tauri/src/main.rs` around lines 243 - 257, The loop currently
treats state.child presence as readiness and can wait past a failed starter;
change the completion checks so success is only returned when
check_health(SERVER_PORT) returns true, and add an early exit that returns Err
if the startup marker (state.starting) is false/cleared while health is still
down so waiters stop instead of timing out; update the loop in the
startup_guard.is_none() branch to remove reliance on
state.child.lock().unwrap().is_some() for readiness and to inspect
state.starting (the startup flag) before sleeping/continuing.

@Trucabulles
Copy link
Copy Markdown
Author

@coderabbitai review the last commit and last files on github I added woth the modifications. There is also a debug.md to follow what's been modified and what's not and why so you can auto complete

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 18, 2026

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

I'll fetch the latest commit details and read debug.md to guide the review.


🏁 Script executed:

#!/bin/bash
# Get the last commit on the PR branch
gh pr view 673 --repo jamiepine/voicebox --json headRefName,headRefOid,commits | jq '{headRefName, headRefOid, lastCommit: .commits[-1]}'

# List files changed in the last commit
LAST_COMMIT=$(gh pr view 673 --repo jamiepine/voicebox --json commits | jq -r '.commits[-1].oid')
echo "Last commit SHA: $LAST_COMMIT"
git log --name-status -1 $LAST_COMMIT 2>/dev/null || echo "Git log not available, using gh api"
gh api repos/jamiepine/voicebox/commits/$LAST_COMMIT --jq '.files[] | {filename: .filename, status: .status, additions: .additions, deletions: .deletions}'

Length of output: 1078


🏁 Script executed:

#!/bin/bash
cat debug.md

echo "---DUBBINGTAB---"
cat app/src/components/DubbingTab/DubbingTab.tsx

Length of output: 158706


@Trucabulles Here is the review of the last commit (d4fda1b"chore: document debug follow-up and clarify SRT2Voice hint").


Files changed

debug.md (added, 600 lines)

This is a thorough and well-structured stabilization log. All 25 integrated fixes are correctly listed with their CodeRabbit label, action taken, and validation status. The 4 deferred items each have clear rationale and a recommended re-evaluation point. No issues with the document itself.

One minor note on the Watch List → CUDA / VRAM policy section:

"no hidden server restart should be used as the normal unload strategy"

This is consistent with AUTO_RESTART_SERVER_FOR_VRAM_RELEASE = false in DubbingTab.tsx, so the code and the doc are aligned. ✅


app/src/components/DubbingTab/DubbingTab.tsx (+1 / -1)

Change: The ensureVoiceSelected toast description was updated to:

"Select a Qwen cloned, CustomVoice, or VoiceDesign profile before generating."

This is correct — qwen_voice_design profiles are now first-class in the engine/profile compatibility matrix (isProfileCompatibleWithSrt2VoiceEngine), so the hint should mention them.

Minor: cosmetic indentation in ensureVoiceSelected

const ensureVoiceSelected = () => {
    if (selectedProfileId) return true;
        toast({    // ← indented 8 spaces, looks like it's inside the if body
          title: 'Voice required',
          ...
        });
    return false;
};

The function works correctly because of the early return true, but the toast(...) call is visually indented as if it is inside the if block when it is not. Consider aligning it back to 4 spaces for readability:

const ensureVoiceSelected = () => {
  if (selectedProfileId) return true;
  toast({
    title: 'Voice required',
    description: 'Select a Qwen cloned, CustomVoice, or VoiceDesign profile before generating.',
    variant: 'destructive',
  });
  return false;
};

Observations on the full DubbingTab.tsx (current state)

A few non-blocking notes for the watch list:

  1. Language selector is limited to fr/en — The <Select> only exposes French and English. Qwen supports 10 languages and the backend maps them all. This is clearly a deliberate scope reduction for SRT2Voice v1, but it is worth a TODO comment so it does not get forgotten when the module is opened to other languages.

  2. Group pace UI is className="hidden" — The slider/save/reset controls for phrase-group pacing are rendered but hidden. This is intentional (the feature is wired but not exposed). A {/* TODO: expose group pace controls */} comment would make the intent explicit during code review.

  3. AUTO_RESTART_SERVER_FOR_VRAM_RELEASE = false — Consistent with the debug.md VRAM policy. The constant being at the top of the file makes it easy to re-enable when the lifecycle/GPU-refcount issue from the deferred section is resolved. ✅

  4. isPlausibleGenerationElapsed guard — The elapsedMs <= Math.max(30 * 60 * 1000, durationMs * 80) upper-bound is generous (80× duration). This was documented as a guard against stale pre-sidecar values — noted, intentional.


Summary

Area Status
debug.md content ✅ Complete and accurate
ensureVoiceSelected hint text ✅ Correct
Indentation in ensureVoiceSelected ⚠️ Cosmetic — minor cleanup recommended
Language restriction to fr/en ℹ️ Intentional, worth a TODO
Hidden group pace UI ℹ️ Intentional, worth a comment
All debug.md deferred items ✅ Properly documented with rationale

The commit is clean. The only actionable item is the cosmetic indentation fix in ensureVoiceSelected.

@Trucabulles
Copy link
Copy Markdown
Author

adressing this now to integrate this change :
Language selector is limited to fr/en — The only exposes French and English. Qwen supports 10 languages and the backend maps them all. This is clearly a deliberate scope reduction for SRT2Voice v1, but it is worth a TODO comment so it does not get forgotten when the module is opened to other languages.

@Trucabulles
Copy link
Copy Markdown
Author

@coderabbitai review : this has been adressed on the last commit (3/3)
Language selector is limited to fr/en — The only exposes French and English. Qwen supports 10 languages and the backend maps them all. This is clearly a deliberate scope reduction for SRT2Voice v1, but it is worth a TODO comment so it does not get forgotten when the module is opened to other languages.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 18, 2026

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Get the last commit SHA
LAST_COMMIT=$(gh pr view 673 --repo jamiepine/voicebox --json commits | jq -r '.commits[-1].oid')
echo "Last commit SHA: $LAST_COMMIT"

# Check the diff for DubbingTab.tsx in the last commit
gh api repos/jamiepine/voicebox/commits/$LAST_COMMIT --jq '.files[] | select(.filename | contains("DubbingTab")) | {filename, patch}'

Length of output: 5748


@Trucabulles This was addressed more thoroughly than just a TODO — the last commit fully implements dynamic language support for all engines. Here's what was done:

  • language state type changed from 'fr' | 'en' to the full LanguageCode union.
  • availableLanguageOptions is now computed via useMemo using getLanguageOptionsForEngine(selectedEngine), with the tada 1B edge case correctly mapped to the LuxTTS language set.
  • The <Select> now dynamically renders all available languages for the active engine via .map() instead of hardcoded French/English items.
  • The language reset effect was simplified to availableLanguageOptions-based validation.
  • The SRT import restore logic uses the new isLanguageCode() guard instead of the old === 'en' || === 'fr' check.

All 10 Qwen languages are now properly available and engine-filtered. ✅

🐇 ✨


If you found this review helpful, would you consider giving us a shout-out on X?

Thank you for using CodeRabbit!

@Trucabulles
Copy link
Copy Markdown
Author

@coderabbitai I have submitted a bug.md you've reviewed already and in the second, third and last fixed a lot things you mentionned. I now need a list of what's not been fixed. Don't rely on stack comments if you have not properly checked the files first

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 18, 2026

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

I'll check the actual files before giving you a definitive list. Let me fetch all my review comments and verify the current state of each flagged item.


🏁 Script executed:

#!/bin/bash
# Get all CodeRabbit review comments on PR `#673`
gh api repos/jamiepine/voicebox/pulls/673/comments --paginate \
  --jq '.[] | select(.user.login == "coderabbitai[bot]") | {id: .id, path: .path, line: .line, body: .body[0:300]}'

Length of output: 6028


Now let me check the current state of each flagged file.


🏁 Script executed:

#!/bin/bash
# Check AudioTrackEditor.tsx around the icon-only buttons issue (lines 550-640)
echo "=== AudioTrackEditor.tsx transport buttons ==="
sed -n '550,640p' app/src/components/AudioTimeline/AudioTrackEditor.tsx

echo ""
echo "=== FloatingGenerateBox.tsx around L179 ==="
sed -n '155,210p' app/src/components/Generation/FloatingGenerateBox.tsx

echo ""
echo "=== backend/app.py around L251 ==="
sed -n '235,275p' backend/app.py

echo ""
echo "=== backend/models.py around L473 - DubbingAutoFitRequest ==="
grep -n "DubbingAutoFitRequest" backend/models.py
grep -n -A 15 "class DubbingAutoFitRequest" backend/models.py

echo ""
echo "=== backend/routes/dubbing.py around L190 - UTF encoding ==="
sed -n '175,215p' backend/routes/dubbing.py

echo ""
echo "=== backend/routes/generations.py around L160-185 - retry/regenerate ==="
sed -n '155,200p' backend/routes/generations.py

echo ""
echo "=== backend/services/generation.py around L165-190 ==="
sed -n '150,200p' backend/services/generation.py

echo ""
echo "=== backend/utils/chunked_tts.py around L255-270 ==="
sed -n '1,50p' backend/utils/chunked_tts.py
grep -n "temperature" backend/utils/chunked_tts.py

echo ""
echo "=== tauri/src-tauri/src/main.rs around L245-275 ==="
sed -n '240,285p' tauri/src-tauri/src/main.rs

echo ""
echo "=== backend/backends/__init__.py around L780-800 ==="
sed -n '775,810p' backend/backends/__init__.py

echo ""
echo "=== backend/backends/pytorch_backend.py around L415-435 ==="
sed -n '410,440p' backend/backends/pytorch_backend.py

Length of output: 20135


@Trucabulles — I've checked each flagged file directly. Here is the precise status of every open issue:


❌ Not yet fixed (verified in current files)

1. app/src/components/AudioTimeline/AudioTrackEditor.tsx — Icon-only buttons without accessible names

The Play/Pause, Stop, Zoom Out, and Zoom In buttons still have no aria-label. Example:

<Button type="button" variant="ghost" size="icon" className="h-7 w-7" onClick={onPlayPause}>
  {isPlaying ? <Pause className="h-4 w-4" /> : <Play className="h-4 w-4" />}
</Button>

Each needs e.g. aria-label={isPlaying ? "Pause" : "Play"}, aria-label="Stop", aria-label="Zoom out", aria-label="Zoom in".


2. app/src/components/Generation/FloatingGenerateBox.tsx L~199 — Early return skips effects chain sync for designed profiles

if (selectedProfile?.voice_type === 'designed') {
  form.setValue('engine', 'qwen_voice_design');
  form.setValue('modelSize', '1.7B');
  return;   // ← skips the effects/preset sync block below
}

Switching to a designed profile leaves whatever effects chain was previously active in the form state.


3. backend/app.py L~261 — Stale failed dubbing generations not detached at startup

The reset subquery only includes 'generating' and 'loading_model':

"WHERE status IN ('generating', 'loading_model') AND source = 'dubbing_segment'"

Segments already linked to a failed dubbing generation remain attached and stay non-regenerable after a crash/restart.


4. backend/backends/__init__.py L~786 — return True inside finally silently swallows unload_model() exceptions

try:
    backend.unload_model()
finally:
    return True   # ← swallows any exception from unload_model()

The return in the finally block discards any exception raised during unload. Should be moved outside finally.


5. backend/models.pyDubbingAutoFitRequest missing temperature field

Current model:

class DubbingAutoFitRequest(BaseModel):
    profile_id: str
    language: str = ...
    engine: Optional[str] = ...
    model_size: Optional[str] = ...
    instruct: Optional[str] = ...
    style_prompt: Optional[str] = ...
    max_attempts: int = ...
    # temperature is absent

DubbingSegmentGenerateRequest has temperature; DubbingAutoFitRequest does not, so auto-fit silently ignores it.


6. backend/routes/dubbing.py L~193 — No UTF-16 fallback before cp1252 in SRT import

try:
    content = raw.decode("utf-8-sig")
except UnicodeDecodeError:
    content = raw.decode("cp1252")   # UTF-16 BOM decodes into garbage here instead of raising

A UTF-16 SRT (common Windows export) will silently import as corrupted text. Should try utf-16 before cp1252.


7. backend/routes/generations.py L~171 — Flat "1.7B" default in retry/regenerate, not engine-aware

await ensure_model_cached_or_raise(gen.engine or "qwen", gen.model_size or "1.7B")
...
model_size=gen.model_size or "1.7B",

tada should default to "1B", engines without model sizes should use "default". Same engine-aware branching from generate_speech is not applied here.


8. backend/utils/chunked_tts.pytemperature forwarded without backend capability guard

extra_kwargs = {"temperature": temperature} if temperature is not None else {}

Backends such as Kokoro, LuxTTS, and Hume do not declare a temperature kwarg in generate(). Passing it will raise TypeError at runtime when temperature is set for those engines.


9. tauri/src-tauri/src/main.rs L~253 — Concurrent wait can return before server is ready, or stall on starter failure

if check_health(SERVER_PORT) || state.child.lock().unwrap().is_some() {
    return Ok(...);
}

state.child.is_some() becomes true when the sidecar process is spawned, before it accepts connections. If the sidecar crashes immediately, the loop stalls for the full 120-second timeout with no failure detection.


10. backend/backends/pytorch_backend.py L~421 — device=self.device string may not be accepted by Transformers pipeline

pipe = pipeline(..., device=self.device)

When self.device is "xpu" or "directml", Transformers pipeline may not support those string values, causing a runtime error on those platforms.


✅ Fixed / Deferred

Issue Status
Language selector limited to fr/en ✅ Fixed (dynamic engine-filtered list)
ensureVoiceSelected indentation Not yet confirmed (cosmetic)
Group pace UI hidden comment Not yet confirmed (cosmetic)
AudioTrackEditor playhead keyboard ⏸️ Deferred (pre-existing gap, user confirmed)
TimelineScrollbar keyboard ⏸️ Deferred (pre-existing gap, user confirmed)
generation.py shared-engine unload race ⏸️ Heavy lift — per debug.md deferred list

🐇 10 open, 3 deferred. Let me know which ones you want to tackle next.

@Trucabulles
Copy link
Copy Markdown
Author

Tip

For best results, initiate chat on the files or code changes.
🧩 Analysis chain

@Trucabulles — I've checked each flagged file directly. Here is the precise status of every open issue:

❌ Not yet fixed (verified in current files)

1. app/src/components/AudioTimeline/AudioTrackEditor.tsx — Icon-only buttons without accessible names

The Play/Pause, Stop, Zoom Out, and Zoom In buttons still have no aria-label. Example:

<Button type="button" variant="ghost" size="icon" className="h-7 w-7" onClick={onPlayPause}>
  {isPlaying ? <Pause className="h-4 w-4" /> : <Play className="h-4 w-4" />}
</Button>

Each needs e.g. aria-label={isPlaying ? "Pause" : "Play"}, aria-label="Stop", aria-label="Zoom out", aria-label="Zoom in".

2. app/src/components/Generation/FloatingGenerateBox.tsx L~199 — Early return skips effects chain sync for designed profiles

if (selectedProfile?.voice_type === 'designed') {
  form.setValue('engine', 'qwen_voice_design');
  form.setValue('modelSize', '1.7B');
  return;   // ← skips the effects/preset sync block below
}

Switching to a designed profile leaves whatever effects chain was previously active in the form state.

3. backend/app.py L~261 — Stale failed dubbing generations not detached at startup

The reset subquery only includes 'generating' and 'loading_model':

"WHERE status IN ('generating', 'loading_model') AND source = 'dubbing_segment'"

Segments already linked to a failed dubbing generation remain attached and stay non-regenerable after a crash/restart.

4. backend/backends/__init__.py L~786 — return True inside finally silently swallows unload_model() exceptions

try:
    backend.unload_model()
finally:
    return True   # ← swallows any exception from unload_model()

The return in the finally block discards any exception raised during unload. Should be moved outside finally.

5. backend/models.pyDubbingAutoFitRequest missing temperature field

Current model:

class DubbingAutoFitRequest(BaseModel):
    profile_id: str
    language: str = ...
    engine: Optional[str] = ...
    model_size: Optional[str] = ...
    instruct: Optional[str] = ...
    style_prompt: Optional[str] = ...
    max_attempts: int = ...
    # temperature is absent

DubbingSegmentGenerateRequest has temperature; DubbingAutoFitRequest does not, so auto-fit silently ignores it.

6. backend/routes/dubbing.py L~193 — No UTF-16 fallback before cp1252 in SRT import

try:
    content = raw.decode("utf-8-sig")
except UnicodeDecodeError:
    content = raw.decode("cp1252")   # UTF-16 BOM decodes into garbage here instead of raising

A UTF-16 SRT (common Windows export) will silently import as corrupted text. Should try utf-16 before cp1252.

7. backend/routes/generations.py L~171 — Flat "1.7B" default in retry/regenerate, not engine-aware

await ensure_model_cached_or_raise(gen.engine or "qwen", gen.model_size or "1.7B")
...
model_size=gen.model_size or "1.7B",

tada should default to "1B", engines without model sizes should use "default". Same engine-aware branching from generate_speech is not applied here.

8. backend/utils/chunked_tts.pytemperature forwarded without backend capability guard

extra_kwargs = {"temperature": temperature} if temperature is not None else {}

Backends such as Kokoro, LuxTTS, and Hume do not declare a temperature kwarg in generate(). Passing it will raise TypeError at runtime when temperature is set for those engines.

9. tauri/src-tauri/src/main.rs L~253 — Concurrent wait can return before server is ready, or stall on starter failure

if check_health(SERVER_PORT) || state.child.lock().unwrap().is_some() {
    return Ok(...);
}

state.child.is_some() becomes true when the sidecar process is spawned, before it accepts connections. If the sidecar crashes immediately, the loop stalls for the full 120-second timeout with no failure detection.

10. backend/backends/pytorch_backend.py L~421 — device=self.device string may not be accepted by Transformers pipeline

pipe = pipeline(..., device=self.device)

When self.device is "xpu" or "directml", Transformers pipeline may not support those string values, causing a runtime error on those platforms.

✅ Fixed / Deferred

Issue Status
Language selector limited to fr/en ✅ Fixed (dynamic engine-filtered list)
ensureVoiceSelected indentation Not yet confirmed (cosmetic)
Group pace UI hidden comment Not yet confirmed (cosmetic)
AudioTrackEditor playhead keyboard ⏸️ Deferred (pre-existing gap, user confirmed)
TimelineScrollbar keyboard ⏸️ Deferred (pre-existing gap, user confirmed)
generation.py shared-engine unload race ⏸️ Heavy lift — per debug.md deferred list

🐇 10 open, 3 deferred. Let me know which ones you want to tackle next.

@coderabbitai can give me the instructions for AI for each the 10 opened

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 18, 2026

Tip

For best results, initiate chat on the files or code changes.

@Trucabulles Here are AI-ready fix instructions for each of the 10 open issues. Each block is self-contained and can be pasted directly into Cursor, Copilot Chat, or any AI coding assistant.


Fix 1 — AudioTrackEditor.tsx — Add aria-label to icon-only buttons

File: app/src/components/AudioTimeline/AudioTrackEditor.tsx

Find all icon-only <Button> elements in the transport controls section (Play/Pause, Stop) and in the
zoom controls section (Zoom Out, Zoom In). Each currently renders only an icon child with no
accessible label.

Add aria-label attributes as follows:
- Play/Pause button: aria-label={isPlaying ? "Pause" : "Play"}
- Stop button: aria-label="Stop"
- Zoom Out button: aria-label="Zoom out"
- Zoom In button: aria-label="Zoom in"

Also add matching aria-pressed={isPlaying} on the Play/Pause button.

Do not change any logic, layout, or className. Only add the missing aria-label (and aria-pressed)
attributes to the identified <Button> elements.

Fix 2 — FloatingGenerateBox.tsx — Don't skip effects chain sync for designed profiles

File: app/src/components/Generation/FloatingGenerateBox.tsx

Locate the useEffect (or watch block) that syncs the selected profile to the form engine/modelSize.
It contains an early-return branch for designed profiles:

  if (selectedProfile?.voice_type === 'designed') {
    form.setValue('engine', 'qwen_voice_design');
    form.setValue('modelSize', '1.7B');
    return;   // ← this skips the effects-chain reset below
  }

The early return causes whatever effects chain was active before profile selection to remain in the
form state when switching to a designed profile.

Fix: After setting engine and modelSize for the designed branch, also reset the effects chain to its
default/empty value (the same reset that the non-designed path performs further below). Do NOT use
`return` if the code below it performs a necessary reset. Either remove the early return and fall
through, or duplicate the effects chain reset inside the designed branch before returning.

Fix 3 — backend/app.py — Also detach stale failed dubbing generations at startup

File: backend/app.py
Location: _run_startup function, the stale dubbing_segments cleanup block (~line 261)

Current subquery filters only status IN ('generating', 'loading_model'):
  WHERE status IN ('generating', 'loading_model') AND source = 'dubbing_segment'

Problem: dubbing_segments linked to a generation that was already marked 'failed' (e.g. from a
previous crash) remain attached and cannot be regenerated.

Fix: Expand the status filter to also include 'failed':
  WHERE status IN ('generating', 'loading_model', 'failed') AND source = 'dubbing_segment'

This ensures that on startup, any dubbing_segment whose generation_id points to a failed generation
is also detached (generation_id set to NULL, status reset to 'pending').

Fix 4 — backend/backends/__init__.pyreturn True inside finally swallows exceptions

File: backend/backends/__init__.py
Location: drop_tts_backend_for_engine function (~line 786)

Current code:
  try:
      backend.unload_model()
  finally:
      return True   # ← return inside finally discards any exception raised by unload_model()

A return statement inside a finally block silently suppresses any exception that was propagating,
including unexpected errors from unload_model().

Fix: Move the `return True` outside the try/finally block so that exceptions from unload_model()
are not swallowed:

  try:
      backend.unload_model()
  except Exception as e:
      logger.warning(f"drop_tts_backend_for_engine: unload_model raised: {e}")
  del _tts_backends[engine]   # or equivalent cleanup that was in finally
  return True

Adjust the del/cleanup step accordingly. The goal is that unload failures are logged but do not
prevent the backend entry from being removed, while also not silently discarding the exception.

Fix 5 — backend/models.py — Add temperature field to DubbingAutoFitRequest

File: backend/models.py
Location: class DubbingAutoFitRequest(BaseModel)

The sibling class DubbingSegmentGenerateRequest already has:
  temperature: Optional[float] = None

DubbingAutoFitRequest is missing this field, so auto-fit always generates with no temperature
control even when the user has configured one at the project or segment level.

Fix: Add the following optional field to DubbingAutoFitRequest, in the same position as it appears
in DubbingSegmentGenerateRequest:
  temperature: Optional[float] = None

Then, in backend/routes/dubbing.py, find the auto_fit_segment and auto_fit_project endpoints and
pass data.temperature through to the underlying generation call (the same way data.instruct is
passed).

Fix 6 — backend/routes/dubbing.py — Add UTF-16 fallback in SRT import decoding

File: backend/routes/dubbing.py
Location: import_srt endpoint, raw bytes decoding block (~line 193)

Current code:
  try:
      content = raw.decode("utf-8-sig")
  except UnicodeDecodeError:
      content = raw.decode("cp1252")

Problem: UTF-16 encoded SRT files (common from Windows Subtitle Edit / Microsoft tools) start with
a UTF-16 BOM. raw.decode("utf-8-sig") raises UnicodeDecodeError, but raw.decode("cp1252") then
silently decodes the UTF-16 bytes as Latin-1, producing garbage text with no error.

Fix: Add a UTF-16 attempt between utf-8-sig and cp1252:
  try:
      content = raw.decode("utf-8-sig")
  except UnicodeDecodeError:
      try:
          content = raw.decode("utf-16")
      except UnicodeDecodeError:
          content = raw.decode("cp1252", errors="replace")

This correctly handles UTF-8 (with/without BOM), UTF-16 LE/BE (with BOM), and falls back to
cp1252 for legacy Windows-1252 files.

Fix 7 — backend/routes/generations.py — Engine-aware model_size default in retry/regenerate

File: backend/routes/generations.py
Location: /generate/{generation_id}/retry and /generate/{generation_id}/regenerate endpoints
(~line 171 and nearby regenerate equivalent)

Current code uses a flat "1.7B" fallback:
  await ensure_model_cached_or_raise(gen.engine or "qwen", gen.model_size or "1.7B")
  model_size = gen.model_size or "1.7B"

The generate_speech endpoint already uses engine-aware defaulting. Apply the same logic here.
Extract or inline a helper that computes the correct default model_size from the engine:

  def _default_model_size(engine: str) -> str:
      if engine == "tada":
          return "1B"
      if engine_has_model_sizes(engine):
          return "1.7B"
      return "default"

Then replace both occurrences (retry and regenerate) of `gen.model_size or "1.7B"` with:
  gen.model_size or _default_model_size(gen.engine or "qwen")

Also apply the same replacement in the stream endpoint if it has the same flat "1.7B" fallback.

Fix 8 — backend/utils/chunked_tts.py — Guard temperature kwarg by backend capability

File: backend/utils/chunked_tts.py
Location: generate_chunked function, the extra_kwargs construction

Current code unconditionally forwards temperature to all backends:
  extra_kwargs = {"temperature": temperature} if temperature is not None else {}
  audio, sr = backend.generate(text, voice_prompt, **extra_kwargs)

Backends like KokoroTTSBackend, LuxTTSBackend, and HumeTTSBackend do not accept a `temperature`
keyword argument in their generate() signature. Passing it raises TypeError at runtime.

Fix: Use Python's inspect module (or a capabilities set) to check whether the backend's generate()
method accepts a `temperature` parameter before including it in kwargs:

  import inspect

  def _backend_accepts_temperature(backend) -> bool:
      sig = inspect.signature(backend.generate)
      return "temperature" in sig.parameters

Then:
  extra_kwargs = {}
  if temperature is not None and _backend_accepts_temperature(backend):
      extra_kwargs["temperature"] = temperature

Apply this guard in both the single-shot path and the chunked loop path within generate_chunked.

Fix 9 — tauri/src-tauri/src/main.rs — Concurrent startup wait: detect crash and use health-only readiness check

File: tauri/src-tauri/src/main.rs
Location: start_server function, the concurrent-startup wait loop (~line 253)

Current exit condition:
  if check_health(SERVER_PORT) || state.child.lock().unwrap().is_some() {
      return Ok(server_url);
  }

Two problems:
1. state.child.is_some() is true as soon as the sidecar process is spawned, before it accepts
   connections. This can return a URL before the server is ready.
2. If the sidecar crashes immediately after spawn, the loop runs for the full 120-second timeout
   with no early failure detection.

Fix:
- Remove the `state.child.lock().unwrap().is_some()` branch from the ready check. Only
  `check_health(SERVER_PORT)` should be the "server is ready" condition.
- Add a crash-detection check: if state.child is Some but the process has already exited (use
  Child::try_wait() or an equivalent), break the wait loop early and return an error.

Pseudocode:
  loop {
      if check_health(SERVER_PORT) { return Ok(server_url); }
      // detect crash: if child is set but process exited, return error
      if let Some(child) = state.child.lock().unwrap().as_mut() {
          if let Ok(Some(_)) = child.try_wait() {
              return Err("Server process exited unexpectedly".into());
          }
      }
      if elapsed >= timeout { return Err("Server startup timed out".into()); }
      sleep(250ms);
  }

Fix 10 — backend/backends/pytorch_backend.py — Guard device for XPU/DirectML in STT pipeline

File: backend/backends/pytorch_backend.py
Location: PyTorchSTTBackend.transcribe_word_timestamps (~line 421)

Current code:
  pipe = pipeline(..., device=self.device)

When self.device is "xpu" or "directml", Hugging Face Transformers pipeline does not accept
those strings and raises at runtime.

Fix: Map non-standard device strings to a safe fallback before passing to pipeline:

  def _safe_pipeline_device(device: str) -> str | int:
      """Map device string to a value accepted by HF pipeline."""
      if device.startswith("cuda"):
          return device          # "cuda" or "cuda:0" etc — fine
      if device == "cpu":
          return "cpu"           # fine
      # xpu, directml, mps-variants not reliably supported by pipeline → fall back to cpu
      return "cpu"

Then:
  pipe = pipeline(..., device=_safe_pipeline_device(self.device))

Optionally, if MPS ("mps") is known to work for the installed torch/transformers version, add it
to the safe list. The conservative default is to fall back to CPU for any non-cuda/non-cpu device.

🐇 All 10 instructions are self-contained and ready to paste. Let me know if you want me to batch any of these into a single commit plan with diffs.

@Trucabulles
Copy link
Copy Markdown
Author

@coderabbitai
AudioTrackEditor aria-labels : déjà corrigé. Play/Pause, Stop, Zoom in/out ont bien des aria-label.

FloatingGenerateBox designed profile / effects : déjà corrigé. Il n’y a plus de return prématuré, donc la synchro effects/persona continue après sélection d’une voix designed.

Startup stale dubbing generations : déjà corrigé. La requête inclut maintenant failed en plus de generating / loading_model.

return True dans finally unload : déjà corrigé. drop_tts_backend_for_engine() ne masque plus les exceptions, log + raise.

DubbingAutoFitRequest.temperature : déjà corrigé. Le champ temperature est présent.

Import SRT UTF-16 : déjà corrigé. Le code tente utf-8-sig, puis utf-16, puis cp1252.

Retry/regenerate model size engine-aware : déjà corrigé. Le code utilise _resolve_existing_generation_model_size(engine, gen.model_size) avant ensure_model_cached_or_raise().

chunked_tts temperature guard : déjà corrigé. Il inspecte la signature backend et ne passe temperature que si generate() l’accepte.

Boucle startup Tauri : pas corrigé volontairement, à garder en dette technique. Bénéfice théorique : démarrage plus robuste avec /health strict. Risque : très sensible, peut recasser le lancement serveur/ghost server. On avait déjà décidé de ne pas toucher sans chantier dédié.

pytorch_backend pipeline device : partiellement sécurisé. Le cas DirectML est explicitement évité. Le risque résiduel concerne surtout xpu/plateformes non testées. Bénéfice d’une correction : meilleure compatibilité future. Risque faible à moyen, mais non prioritaire Windows CUDA actuel.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 18, 2026

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

Let me verify each fix directly in the files before confirming.


🏁 Script executed:

#!/bin/bash
echo "=== Fix 1: aria-labels in AudioTrackEditor.tsx ==="
grep -n "aria-label" app/src/components/AudioTimeline/AudioTrackEditor.tsx

echo ""
echo "=== Fix 2: FloatingGenerateBox designed profile early return ==="
grep -n -A 10 "voice_type.*designed" app/src/components/Generation/FloatingGenerateBox.tsx

echo ""
echo "=== Fix 3: startup stale dubbing generations (failed included) ==="
grep -n -B2 -A5 "loading_model\|generating\|failed" backend/app.py | grep -A5 "dubbing"

echo ""
echo "=== Fix 4: drop_tts_backend_for_engine finally/return ==="
grep -n -A 20 "def drop_tts_backend_for_engine" backend/backends/__init__.py

echo ""
echo "=== Fix 5: DubbingAutoFitRequest temperature ==="
grep -n -A 15 "class DubbingAutoFitRequest" backend/models.py

echo ""
echo "=== Fix 6: UTF-16 in SRT import ==="
grep -n -B2 -A8 "utf-8-sig\|utf-16\|cp1252" backend/routes/dubbing.py

echo ""
echo "=== Fix 7: _resolve_existing_generation_model_size in generations.py ==="
grep -n "_resolve_existing_generation_model_size\|model_size.*1.7B\|model_size.*1B" backend/routes/generations.py | head -30

echo ""
echo "=== Fix 8: temperature guard in chunked_tts.py ==="
grep -n -B2 -A8 "temperature" backend/utils/chunked_tts.py

echo ""
echo "=== Fix 9 (deferred): Tauri startup loop ==="
sed -n '245,275p' tauri/src-tauri/src/main.rs

echo ""
echo "=== Fix 10 (partial): pytorch_backend pipeline device ==="
grep -n -B3 -A5 "pipeline\|device.*self.device\|directml\|xpu" backend/backends/pytorch_backend.py | grep -A5 "pipeline\|directml"

Length of output: 8714


I need one more check on Fix 3 before giving the full picture.


🏁 Script executed:

#!/bin/bash
echo "=== Fix 3: Full dubbing stale reset block in app.py ==="
grep -n -B5 -A40 "stale.*dubbing\|dubbing.*stale\|dubbing_segment.*NULL\|reset.*dubbing" backend/app.py | head -80

echo ""
echo "=== Fix 3 alternative: search for 'failed' near dubbing_segment ==="
grep -n -B3 -A3 "failed" backend/app.py

Length of output: 2981


🏁 Script executed:

#!/bin/bash
echo "=== app.py lines 237-262 full context ==="
sed -n '237,262p' backend/app.py

Length of output: 1129


@Trucabulles — I've verified every file directly. The current branch (feature/srt2voice) does not yet contain any of the 8 claimed fixes. The changes may exist locally on your machine but have not been pushed to GitHub. Here is the precise state per file:

# Issue Verified status
1 AudioTrackEditor.tsx — Play/Pause, Stop, Zoom In/Out aria-label ❌ Not in branch — only trim/playhead/volume labels present
2 FloatingGenerateBox.tsx — early return for designed profiles ❌ Not in branch — return; still at Line 178
3 backend/app.py'failed' added to dubbing startup reset ❌ Not in branch — WHERE still only ('generating', 'loading_model')
4 backend/backends/__init__.pyreturn True inside finally ❌ Not in branch — still finally: return True at Line 785
5 backend/models.pytemperature in DubbingAutoFitRequest ❌ Not in branch — field absent from class
6 backend/routes/dubbing.py — UTF-16 fallback in SRT import ❌ Not in branch — still utf-8-sigcp1252 with no utf-16 attempt
7 backend/routes/generations.py — engine-aware model_size in retry/regenerate ❌ Not in branch — still flat "1.7B" fallback on Lines 166/190/214/238
8 backend/utils/chunked_tts.pyinspect guard for temperature kwarg ❌ Not in branch — unconditional extra_kwargs still used
9 tauri/src-tauri/src/main.rs — startup wait loop ⏸️ Deferred intentionally — confirmed
10 backend/backends/pytorch_backend.py — pipeline device guard ⚠️ Partial — directml excluded at device-selection level, but self.device still passed raw to pipeline()

Please push your local commits and I'll re-verify once they land on the branch. 🐇

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (2)
backend/models.py (1)

477-480: 💤 Low value

Remove unnecessary pass statement.

DubbingFullNarrationRequest inherits from DubbingSegmentGenerateRequest and adds no new members, so the pass is redundant. Consider using an ellipsis or removing it.

Suggested fix
 class DubbingFullNarrationRequest(DubbingSegmentGenerateRequest):
     """Request model for beta whole-SRT narration generation."""
-
-    pass
+    ...

Or simply:

 class DubbingFullNarrationRequest(DubbingSegmentGenerateRequest):
     """Request model for beta whole-SRT narration generation."""
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/models.py` around lines 477 - 480, The class
DubbingFullNarrationRequest currently contains a redundant "pass"; since it only
inherits from DubbingSegmentGenerateRequest and has a docstring, remove the
unnecessary pass statement (or replace it with an ellipsis literal) so the class
body is minimal—locate the class declaration for DubbingFullNarrationRequest and
delete the pass line.
backend/routes/generations.py (1)

86-87: 💤 Low value

Chain exception context for better traceability.

The exception should be raised with from e to preserve the original traceback.

Suggested fix
     except ValueError as e:
-        raise HTTPException(status_code=400, detail=str(e))
+        raise HTTPException(status_code=400, detail=str(e)) from e
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/routes/generations.py` around lines 86 - 87, The current exception
handler in backend/routes/generations.py catches ValueError as e and re-raises
HTTPException without chaining; update the raise to include exception chaining
(raise HTTPException(status_code=400, detail=str(e)) from e) so the original
traceback is preserved when re-raising from the ValueError caught in that except
block.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@app/src/components/AudioTimeline/AudioTrackEditor.tsx`:
- Around line 689-706: The two Plus icon buttons in the AudioTrackEditor
component (the conditional top button where index === 0 and the bottom button
where index === tracks.length - 1) lack accessible names; update each <button>
to include an appropriate aria-label (e.g., "Add track above" for the top button
and "Add track below" for the bottom button) or a dynamically generated label
including the track index, while keeping the existing onClick handlers that call
setExtraTracks and using the same classNames and structure.

In `@backend/backends/__init__.py`:
- Around line 783-788: The reference to logger in drop_tts_backend_for_engine
will raise NameError because no logger is defined; fix by either adding a
module-level logger (import logging and set logger =
logging.getLogger(__name__)) near the top of the file so the existing
logger.exception call works, or mirror the pattern from unload_all_tts_backends
by performing a local import/lookup of the logger inside
drop_tts_backend_for_engine before calling logger.exception; ensure the symbol
logger is defined before use.

---

Nitpick comments:
In `@backend/models.py`:
- Around line 477-480: The class DubbingFullNarrationRequest currently contains
a redundant "pass"; since it only inherits from DubbingSegmentGenerateRequest
and has a docstring, remove the unnecessary pass statement (or replace it with
an ellipsis literal) so the class body is minimal—locate the class declaration
for DubbingFullNarrationRequest and delete the pass line.

In `@backend/routes/generations.py`:
- Around line 86-87: The current exception handler in
backend/routes/generations.py catches ValueError as e and re-raises
HTTPException without chaining; update the raise to include exception chaining
(raise HTTPException(status_code=400, detail=str(e)) from e) so the original
traceback is preserved when re-raising from the ValueError caught in that except
block.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 355d4bd4-0391-4237-964e-7c4bf2929ff3

📥 Commits

Reviewing files that changed from the base of the PR and between d4fda1b and 638307b.

📒 Files selected for processing (9)
  • app/src/components/AudioTimeline/AudioTrackEditor.tsx
  • app/src/components/DubbingTab/DubbingTab.tsx
  • app/src/components/Generation/FloatingGenerateBox.tsx
  • backend/app.py
  • backend/backends/__init__.py
  • backend/models.py
  • backend/routes/dubbing.py
  • backend/routes/generations.py
  • backend/utils/chunked_tts.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • backend/app.py
  • app/src/components/Generation/FloatingGenerateBox.tsx

Comment thread app/src/components/AudioTimeline/AudioTrackEditor.tsx
Comment thread backend/backends/__init__.py
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
backend/utils/cache.py (1)

87-90: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Keep corrupted-cache cleanup best-effort.

If unlink() fails here, get_cached_voice_prompt() can raise instead of safely falling back to None.

Proposed fix
-        except Exception:
-            # Cache file corrupted, delete it
-            cache_file.unlink()
+        except Exception:
+            # Cache file corrupted, delete it (best effort)
+            try:
+                cache_file.unlink(missing_ok=True)
+            except Exception as e:
+                logger.warning("Failed to delete corrupted cache file %s: %s", cache_file, e)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/utils/cache.py` around lines 87 - 90, In get_cached_voice_prompt(),
the except block that handles a corrupted cache calls cache_file.unlink()
without guarding against errors; wrap the unlink call in its own try/except (or
handle OSError) so failures to delete the cache are swallowed and do not
propagate, ensuring the function still returns None on failure; specifically
update the except handling around cache_file.unlink() in
backend.utils.cache.get_cached_voice_prompt to perform best-effort deletion and
ignore any exception from unlink.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@backend/backends/qwen_voice_design_backend.py`:
- Around line 34-63: load_model_async mutates shared state without
synchronization which allows concurrent calls to race; add an asyncio.Lock
(e.g., self._load_lock) initialized in __init__ and use "async with
self._load_lock" inside load_model_async to serialize the pre-checks,
unload_model calls and the to_thread(_load_model_sync) call, and similarly guard
unload_model and any other code paths that mutate self.model or
self._current_model_size (references: __init__, load_model_async, unload_model,
_load_model_sync, is_loaded, _current_model_size, model).

In `@backend/services/generation.py`:
- Around line 294-305: The current retry flow saves every retry to the same
filename (audio_path = .../f"{generation_id}.wav") so subsequent retries
overwrite the file and older version records created via
versions_mod.create_version(generation_id=..., label="retry",
audio_path=storage_path, ...) can point to replaced audio; change the save to
generate a unique filename per retry (e.g., include a UUID, timestamp, or
incremented retry_count in the filename like f"{generation_id}-{retry_id}.wav"),
update audio_path and storage_path accordingly before calling save_audio and
versions_mod.create_version, ensure the created version uses the unique
storage_path and that the function still returns that storage_path so each retry
preserves its own audio file and version record (update references to
audio_path, save_audio, and versions_mod.create_version).

---

Outside diff comments:
In `@backend/utils/cache.py`:
- Around line 87-90: In get_cached_voice_prompt(), the except block that handles
a corrupted cache calls cache_file.unlink() without guarding against errors;
wrap the unlink call in its own try/except (or handle OSError) so failures to
delete the cache are swallowed and do not propagate, ensuring the function still
returns None on failure; specifically update the except handling around
cache_file.unlink() in backend.utils.cache.get_cached_voice_prompt to perform
best-effort deletion and ignore any exception from unlink.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 7c1e2ebd-1533-4b69-a835-50b5d80c4576

📥 Commits

Reviewing files that changed from the base of the PR and between 638307b and 3654b5b.

📒 Files selected for processing (6)
  • backend/app.py
  • backend/backends/qwen_voice_design_backend.py
  • backend/database/migrations.py
  • backend/database/models.py
  • backend/services/generation.py
  • backend/utils/cache.py
🚧 Files skipped from review as they are similar to previous changes (3)
  • backend/app.py
  • backend/database/migrations.py
  • backend/database/models.py

Comment thread backend/backends/qwen_voice_design_backend.py
Comment thread backend/services/generation.py
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant