Transcription end-to-end wiring: whisper enablement + model download + commands + MCP get_transcript #183
Merged
… MCP get_transcript

The transcription engine (whisper.cpp backend, cache, locale, search) was fully built and tested but compile-gated off with no consumer: the shipped app could never transcribe. Wire it:

- Features: whisper-backend stays optional at opentake-media (crate tests pass with and without: 304/305) and is ON for src-tauri. Build cost measured: ~42s one-time whisper.cpp C++ compile, cmake+libclang only (preinstalled on GitHub ubuntu runners), CPU build, no CUDA.
- Model management: transcribe/model.rs mirrors search/model_download.rs's download->verify->atomic-rename machinery for a single ggml file. Default is ggml-base MULTILINGUAL (~142MB, HuggingFace ggerganov/whisper.cpp, SHA-1 verified). Upstream uses Apple's multilingual SpeechTranscriber with OS auto-install (Transcription.swift:119-147), so multilingual base is the faithful equivalent.
- Commands (src-tauri/src/transcribe.rs, camelCase DTOs + serde tests): transcribe_model_status / download_transcribe_model (async, transcribe:// progress events like export) / transcribe_media (blocking on worker thread, TranscriptCache-backed so re-runs are instant) / transcript_get (cache-only).
- MCP get_transcript (upstream ToolExecutor+Timeline.swift:548-651, 1:1): post-edit timeline transcript in PROJECT frames. Covers per-clip word midpoint assignment within [visStart,visEnd), span clamping, round() to_timeline with speed floor 0.0001, clips sorted by startFrame, 10000-word cap with nextStartFrame paging, skipped[] report, and linked-video-drops-for-audio-partner eligibility (EditorViewModel+Captions.swift:52-90). Pure port in transcribe/timeline.rs with 20 edge-case tests (trim/speed/straddlers/seam midpoint/paging); bridge method transcribe_sources on MediaBridge; 10 dispatcher tests via FakeBridge.

Deferred (next item): Captions generation tab + add_captions; inspect_media / search_media stubs; frontend bindings for the 4 new commands.
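For readers unfamiliar with the download->verify->atomic-rename pattern named in the Model management item, here is a minimal std-only sketch. It is not the code in transcribe/model.rs: `install_verified`, the `.part` suffix, and the caller-supplied `verify` closure are all hypothetical, and the closure stands in for the SHA-1 check because the Rust standard library has no SHA-1. A real implementation for a ~142MB file would presumably stream and hash incrementally rather than hold the bytes in memory.

```rust
use std::fs;
use std::io::{self, Write};
use std::path::Path;

/// Write `bytes` to a temporary sibling of `dest`, verify what landed on
/// disk, and only then rename it into place. `verify` is a stand-in for a
/// checksum comparison (assumption: the caller supplies it).
fn install_verified(
    bytes: &[u8],
    dest: &Path,
    verify: impl Fn(&[u8]) -> bool,
) -> io::Result<()> {
    // Hypothetical temp name: same path with a `.part` extension.
    let tmp = dest.with_extension("part");
    {
        let mut file = fs::File::create(&tmp)?;
        file.write_all(bytes)?;
        file.sync_all()?; // flush to disk before verifying
    }

    // Verify the bytes that were actually written, not the in-memory copy.
    let written = fs::read(&tmp)?;
    if !verify(&written) {
        fs::remove_file(&tmp)?;
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            "checksum mismatch",
        ));
    }

    // The final path appears only after verification has passed.
    fs::rename(&tmp, dest)
}
```

The point of the rename step is that a reader of `dest` sees either no file or a fully verified one, never a partial download.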
Gates: fmt/clippy -D warnings clean; cargo test --workspace 1447; pnpm build clean; pnpm test 330.
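The per-clip mapping in the get_transcript item (midpoint assignment within [visStart,visEnd), span clamping, round() with a 0.0001 speed floor) can be illustrated with a small sketch. This is not the port in transcribe/timeline.rs: the `Word`, `Clip`, and `Mapped` types, their field names, and the assumption that timeline length equals source length divided by speed are all hypothetical, chosen only to make the three rules concrete.

```rust
/// A transcribed word in source-media frames (hypothetical shape).
#[derive(Debug, Clone, PartialEq)]
struct Word {
    text: String,
    start: f64,
    end: f64,
}

/// A timeline clip: where it starts on the timeline, which source range is
/// visible, and its playback speed (hypothetical shape).
struct Clip {
    start_frame: i64,
    vis_start: f64,
    vis_end: f64,
    speed: f64,
}

/// A word placed on the timeline, in project frames.
#[derive(Debug, Clone, PartialEq)]
struct Mapped {
    text: String,
    start_frame: i64,
    end_frame: i64,
}

const SPEED_FLOOR: f64 = 0.0001;

/// Convert a source frame to a timeline frame. The floor keeps the division
/// finite when speed is zero or negative.
fn to_timeline(clip: &Clip, source_frame: f64) -> i64 {
    let speed = clip.speed.max(SPEED_FLOOR);
    clip.start_frame + ((source_frame - clip.vis_start) / speed).round() as i64
}

/// Keep words whose midpoint lies in the half-open visible range, clamp
/// their spans to that range, and convert to timeline frames.
fn map_words(clip: &Clip, words: &[Word]) -> Vec<Mapped> {
    words
        .iter()
        .filter_map(|w| {
            let mid = (w.start + w.end) / 2.0;
            if mid < clip.vis_start || mid >= clip.vis_end {
                return None; // belongs to another clip, or was trimmed away
            }
            let start = w.start.max(clip.vis_start);
            let end = w.end.min(clip.vis_end);
            Some(Mapped {
                text: w.text.clone(),
                start_frame: to_timeline(clip, start),
                end_frame: to_timeline(clip, end),
            })
        })
        .collect()
}
```

Because the range is half-open, a word whose midpoint sits exactly on a seam is assigned to the clip that starts there, not the one that ends there, so no word is counted twice.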
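Root-cause fix for an engine that was built but unreachable in the shipped app: whisper-backend is enabled in src-tauri (still optional at the crate level, tested in both states at 304/305; whisper.cpp is a one-time ~42s compile, and the CI runner already has the dependencies); ggml-base multilingual model download (SHA-1 verified, reusing the existing download machinery); 4 Tauri commands (status / download + progress / transcribe + cache / read); MCP get_transcript 1:1 with upstream (word-midpoint assignment, span clamping, speed floor, 10000-word paging, linked video yielding to its audio partner), with 20 edge-case tests for the pure mapping plus 10 dispatcher tests. Gates: workspace 1447 + web 330, all green. Note: starting with this PR, CI compiles whisper.cpp, so a slower first run is expected.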
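As an illustration of the 10000-word paging mentioned above, the following sketch shows one way a word cap with a next-start-frame cursor could work. The `TimelineWord` and `Page` types and the `page_words` function are hypothetical and not taken from the PR; the sketch also assumes the input is already sorted by start frame and that start frames are distinct at page boundaries (two words sharing a boundary frame would be repeated on the next page).

```rust
/// A word already mapped to project frames (hypothetical shape).
#[derive(Debug, Clone, PartialEq)]
struct TimelineWord {
    text: String,
    start_frame: i64,
    end_frame: i64,
}

/// One page of results plus the cursor for the next request.
#[derive(Debug, PartialEq)]
struct Page {
    words: Vec<TimelineWord>,
    /// `None` when the transcript is exhausted.
    next_start_frame: Option<i64>,
}

/// Cap taken from the PR description.
const WORD_CAP: usize = 10_000;

/// Return at most `cap` words starting at or after `start_frame`, and the
/// start frame of the first word that did not fit, if any.
fn page_words(sorted: &[TimelineWord], start_frame: i64, cap: usize) -> Page {
    let mut remaining = sorted.iter().filter(|w| w.start_frame >= start_frame);
    // `by_ref` lets us take the page and then peek at what is left.
    let words: Vec<TimelineWord> = remaining.by_ref().take(cap).cloned().collect();
    let next_start_frame = remaining.next().map(|w| w.start_frame);
    Page { words, next_start_frame }
}
```

A caller would request the first page with `page_words(&words, 0, WORD_CAP)` and keep passing the returned `next_start_frame` back in until it is `None`.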