AI-driven English closed captions generator, optimized for RTX 5080 hardware. This project provides tools for transcribing audio from video files into English subtitles, with a focus on anime content where matching English dubs are often unavailable.
This repository contains experimental scripts for generating closed captions using AI transcription models. The primary working implementation is transcribe_faster_whisper.sh, a wrapper script that orchestrates the underlying Python components for transcription.
An earlier attempt using WhisperX was aborted due to compatibility and performance issues.
The main pipeline (transcribe_faster_whisper.sh) includes the following capabilities:
- API Context Integration: Leverages external APIs to gather contextual information about the video content, improving transcription accuracy.
- Sign/Song Merging: Automatically detects and merges signs, songs, and other non-dialogue elements into the subtitle track.
- Existing Subtitle Detection and Merging: Identifies pre-existing closed caption tracks in the video file. If found, AI generation of new tracks is skipped, and existing tracks are merged with any detected signs/songs tracks.
Note: This project is highly experimental and requires significant manual intervention. It is not a fully automated solution.
- Ensure you have the necessary dependencies installed (see
requirements_fw.txtfor Python packages). - Prepare your video file and gather context:
- Know the expected list of existing subtitle tracks.
- Manually input the proper name of the video, TV show, or movie to fetch relevant context.
- Run the primary script:
Customize prompts and context as needed for optimal results.
./transcribe_faster_whisper.sh [video_file] [options]
- Experimental Status: This is an ongoing, incomplete project. Results vary greatly depending on the video file.
- Manual Knowledge Required: Success depends on your familiarity with the target video file, including its subtitle tracks and content details.
- Context Prompting: Requires manual input of accurate video names and contextual information to improve transcription quality.
- Anime Focus: Primarily tested and used for anime content, which is notorious for lacking synchronized English subtitles for dubs.
- Hardware Specific: Optimized for RTX 5080; performance may vary on other systems.
Use at your own risk, and expect to iterate on results manually.
- Python 3.x
- Dependencies listed in
requirements_fw.txt - RTX 5080 GPU (recommended)
- FFmpeg or similar for video processing
This is a personal project, but feel free to fork and experiment. Pull requests for improvements are welcome, though the project is not actively maintained.
MIT License - Intended for personal, non-commercial use. See LICENSE file for details.