Wire transcription end to end: enable whisper + model download + commands + MCP get_transcript #183

Merged
appergb merged 1 commit into main from feat/whisper-wiring
Jul 2, 2026

@appergb appergb commented Jul 2, 2026

Owner

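Root fix for an engine that was built but unreachable in the shipped app: whisper-backend is now enabled in src-tauri (still optional at the crate level, with tests passing in both states: 304/305; whisper.cpp is a one-time ~42s compile, and the CI runners already have every dependency); ggml-base multilingual model download (SHA-1 verified, reusing the existing download machinery); 4 Tauri commands (status / download + progress / transcribe + cache / read); MCP get_transcript matches upstream 1:1 (word-midpoint assignment, span clamping, speed floor, 10000-word paging, linked video yields to its audio partner), with 20 edge-case tests for the pure mapping + 10 dispatcher tests. Gates: workspace 1447 + web 330, all green. Note: starting with this PR, CI compiles whisper.cpp, so a slower first run is expected.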
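For readers unfamiliar with the verified-download pattern mentioned above, here is a minimal sketch of the "write to a temporary file, verify, then rename into place" step. It is not the code in this PR: `install_verified`, the `.part` suffix, and the caller-supplied `verify` closure (standing in for the SHA-1 check) are all assumptions made for illustration, and a real implementation would stream the download and hash incrementally rather than hold the bytes in memory.

```rust
use std::fs;
use std::io::{self, Write};
use std::path::Path;

// Hypothetical helper: install `bytes` at `dest` only if `verify` accepts
// what was actually written to disk. `verify` stands in for a checksum
// comparison (the PR uses SHA-1; no hash is implemented in this sketch).
pub fn install_verified(
    bytes: &[u8],
    dest: &Path,
    verify: impl Fn(&[u8]) -> bool,
) -> io::Result<()> {
    // Assumed naming: same path with a `.part` extension.
    let tmp = dest.with_extension("part");
    {
        let mut file = fs::File::create(&tmp)?;
        file.write_all(bytes)?;
        file.sync_all()?; // flush to disk before verifying
    }

    // Verify the on-disk copy, not the in-memory buffer.
    let written = fs::read(&tmp)?;
    if !verify(&written) {
        fs::remove_file(&tmp)?;
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            "checksum mismatch",
        ));
    }

    // Rename within the same directory, so readers of `dest` see either
    // the old file (or nothing) or the complete verified file.
    fs::rename(&tmp, dest)
}
```

The point of the ordering is that `dest` never holds a partial or unverified model, so a status check can treat "file exists" as "model usable".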

… MCP get_transcript

The transcription engine (whisper.cpp backend, cache, locale, search) was fully
built and tested but compile-gated off with no consumer: the shipped app could
never transcribe. Wire it:

- Features: whisper-backend stays optional at opentake-media (crate tests pass
  with and without: 304/305) and is ON for src-tauri. Build cost measured: ~42s
  one-time whisper.cpp C++ compile, cmake+libclang only (preinstalled on GitHub
  ubuntu runners), CPU build, no CUDA.
- Model management: transcribe/model.rs mirrors search/model_download.rs's
  download->verify->atomic-rename machinery for a single ggml file. Default
  ggml-base MULTILINGUAL (~142MB, HuggingFace ggerganov/whisper.cpp, SHA-1
  verified) — upstream uses Apple's multilingual SpeechTranscriber with OS
  auto-install (Transcription.swift:119-147), so multilingual base is the
  faithful equivalent.
- Commands (src-tauri/src/transcribe.rs, camelCase DTOs + serde tests):
  transcribe_model_status / download_transcribe_model (async, transcribe://
  progress events like export) / transcribe_media (blocking on worker thread,
  TranscriptCache-backed so re-runs are instant) / transcript_get (cache-only).
- MCP get_transcript (upstream ToolExecutor+Timeline.swift:548-651, 1:1):
  post-edit timeline transcript in PROJECT frames — per-clip word midpoint
  assignment within [visStart,visEnd), span clamping, round() to_timeline with
  speed floor 0.0001, clips sorted by startFrame, 10000-word cap with
  nextStartFrame paging, skipped[] report, linked-video-drops-for-audio-partner
  eligibility (EditorViewModel+Captions.swift:52-90). Pure port in
  transcribe/timeline.rs with 20 edge-case tests (trim/speed/straddlers/seam
  midpoint/paging); bridge method transcribe_sources on MediaBridge; 10
  dispatcher tests via FakeBridge.
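To make the get_transcript mapping rules concrete, here is a rough sketch of midpoint assignment, span clamping, the speed floor, and word-capped paging. It is not the code in transcribe/timeline.rs: every type, field, and function name below is hypothetical, source times are assumed to be in seconds, and the frame formula is one plausible reading of "round() to_timeline", not a confirmed port of the upstream Swift.

```rust
// Hypothetical shapes; names are illustrative only.
#[derive(Debug, Clone, PartialEq)]
pub struct SourceWord { pub text: String, pub start: f64, pub end: f64 }

#[derive(Debug, Clone)]
pub struct Clip {
    pub start_frame: i64, // timeline frame of the clip's first visible frame
    pub vis_start: f64,   // visible source range start (seconds, inclusive)
    pub vis_end: f64,     // visible source range end (seconds, exclusive)
    pub speed: f64,       // playback speed multiplier
}

#[derive(Debug, Clone, PartialEq)]
pub struct TimelineWord { pub text: String, pub start_frame: i64, pub end_frame: i64 }

pub struct Page { pub words: Vec<TimelineWord>, pub next_start_frame: Option<i64> }

const SPEED_FLOOR: f64 = 0.0001;

// Source seconds -> project frame. The floor keeps a zero or negative
// speed from dividing by zero.
pub fn to_timeline(clip: &Clip, source_secs: f64, fps: f64) -> i64 {
    let speed = clip.speed.max(SPEED_FLOOR);
    clip.start_frame + ((source_secs - clip.vis_start) / speed * fps).round() as i64
}

// A word belongs to the clip when its midpoint lies in [vis_start, vis_end).
// Words straddling a trim edge have their span clamped to the visible range.
pub fn map_clip_words(clip: &Clip, words: &[SourceWord], fps: f64) -> Vec<TimelineWord> {
    words
        .iter()
        .filter_map(|w| {
            let mid = (w.start + w.end) / 2.0;
            if mid < clip.vis_start || mid >= clip.vis_end {
                return None;
            }
            let start = w.start.max(clip.vis_start);
            let end = w.end.min(clip.vis_end);
            Some(TimelineWord {
                text: w.text.clone(),
                start_frame: to_timeline(clip, start, fps),
                end_frame: to_timeline(clip, end, fps),
            })
        })
        .collect()
}

// Return at most `cap` words starting at or after `start_frame`, plus the
// start frame of the first omitted word so the caller can request the next page.
pub fn page_words(all: &[TimelineWord], start_frame: i64, cap: usize) -> Page {
    let mut iter = all.iter().filter(|w| w.start_frame >= start_frame);
    let words: Vec<TimelineWord> = iter.by_ref().take(cap).cloned().collect();
    let next_start_frame = iter.next().map(|w| w.start_frame);
    Page { words, next_start_frame }
}
```

With the half-open range, a word whose midpoint falls exactly on a seam lands in the later clip only, so it is never reported twice. One caveat of this simplified paging: if several words share the frame at the page boundary, the next page can repeat the ones already returned for that frame; the sketch does not handle that case.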

Deferred (next item): Captions generation tab + add_captions; inspect_media /
search_media stubs; frontend bindings for the 4 new commands.

Gates: fmt/clippy -D warnings clean; cargo test --workspace 1447; pnpm build
clean; pnpm test 330.
@appergb appergb merged commit 748698f into main Jul 2, 2026
2 checks passed
@appergb appergb deleted the feat/whisper-wiring branch July 2, 2026 10:28