🎬 Supernan AI Dubbing Pipeline

A high-fidelity, zero-cost Python pipeline that converts English training videos into professional-grade Hindi-dubbed versions with voice cloning and crystal-clear audio.

Built for: Supernan AI Intern Challenge
Output: A 20-second dubbed clip with lip-sync and enhanced voice clarity.

💎 Premium Quality Features

Unlike standard dubbing scripts, this pipeline includes a 4-Pillar Quality Enhancement Suite:

🎙️ Crystal Clear Voice Cloning: Applies adaptive denoising (afftdn) and high-pass filtering to the original reference audio for cleaner voice extraction.
🗣️ Anti-Fumble Smart Splitting: Uses a conjunction-aware text splitter (handling और, क्योंकि, लेकिन, etc.) to prevent XTTS from fumbling on long Hindi sentences.
✨ Ultimate Clarity Booster: Professional FFmpeg audio chain (Equalizer, Treble Boost, Compressor, and Loudnorm) for a "studio" feel.
🔄 Natural Precision Sync: Caps the audio speed-up at 1.15x so the speech still sounds natural. If the dubbed audio is still longer than the video after that, Smart Video Padding extends the video by freezing frames instead of applying a "chipmunk" speed-up.
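
The conjunction-aware splitting idea can be sketched as follows. This is a minimal illustration, not the repository's actual splitter: the function name `split_hindi_text`, the 120-character limit, and the restriction to the three conjunctions named above are all assumptions.

```python
import re

# Only the three conjunctions named in this README; the real list may be longer.
CONJUNCTIONS = ("और", "क्योंकि", "लेकिन")

# Split after sentence-ending punctuation (danda, ?, !, .).
_SENTENCE_END = re.compile(r"(?<=[।?!.])\s+")
# Split on the whitespace that precedes a conjunction, keeping the conjunction
# at the start of the following clause.
_BEFORE_CONJUNCTION = re.compile(r"\s+(?=(?:" + "|".join(CONJUNCTIONS) + r")\s)")


def split_hindi_text(text, max_chars=120):
    """Return chunks of at most max_chars where clause boundaries allow it.

    max_chars=120 is a hypothetical limit chosen for illustration.
    A single clause longer than max_chars is kept whole.
    """
    chunks = []
    for sentence in _SENTENCE_END.split(text.strip()):
        if not sentence:
            continue
        if len(sentence) <= max_chars:
            chunks.append(sentence)
            continue
        # Long sentence: regroup its clauses greedily under the limit.
        current = ""
        for clause in _BEFORE_CONJUNCTION.split(sentence):
            candidate = f"{current} {clause}".strip()
            if current and len(candidate) > max_chars:
                chunks.append(current)
                current = clause
            else:
                current = candidate
        if current:
            chunks.append(current)
    return chunks
```

Each chunk would then be synthesized separately and the resulting audio concatenated. Breaking before the conjunction rather than after it keeps each chunk a readable clause.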

🛠️ Pipeline Architecture

graph TD
    A[Input Video] --> B[FFmpeg Clip Extract]
    B --> C[FFmpeg Audio Extract]
    C --> D[Whisper Transcription - audio to English Text]
    D --> E[IndicTrans2 Translation - English Text to Hindi Text]
    E --> F[Coqui XTTS v2 - Hindi Voice Clone]
    F --> G[FFmpeg Atempo Filter - Speed Adjust Audio to match duration]
    G --> H[VideoReTalking - Lip Sync]
    H --> I[GFPGAN - Face Enhancement]
    I --> J[Final Output - 20 sec Video]
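
The speed-adjust step, combined with the 1.15x cap described under Natural Precision Sync, could be planned roughly as below. This is a sketch under assumptions: the names `SyncPlan`, `plan_sync`, and `ffmpeg_filters` are hypothetical, leaving shorter audio untouched is an assumption, and using FFmpeg's `tpad` filter to hold the last frame is one possible way to freeze frames, not necessarily what `utils.py` does.

```python
from typing import NamedTuple, Optional, Tuple

MAX_SPEEDUP = 1.15  # cap stated in this README


class SyncPlan(NamedTuple):
    atempo: float         # audio speed factor to apply
    pad_video_sec: float  # seconds of frozen video to append


def plan_sync(audio_sec: float, video_sec: float,
              max_speedup: float = MAX_SPEEDUP) -> SyncPlan:
    """Decide how much to speed up the audio and how much to pad the video."""
    if audio_sec <= 0 or video_sec <= 0:
        raise ValueError("durations must be positive")
    ratio = audio_sec / video_sec
    if ratio <= 1.0:
        # Assumption: audio that already fits is left at its original speed.
        return SyncPlan(1.0, 0.0)
    tempo = min(ratio, max_speedup)
    adjusted_audio = audio_sec / tempo
    pad = round(max(0.0, adjusted_audio - video_sec), 3)
    return SyncPlan(tempo, pad)


def ffmpeg_filters(plan: SyncPlan) -> Tuple[str, Optional[str]]:
    """Build the audio filter and, if padding is needed, the video filter."""
    audio = f"atempo={plan.atempo:.4f}"
    video = None
    if plan.pad_video_sec > 0:
        # tpad with stop_mode=clone repeats the final frame for the given time.
        video = f"tpad=stop_mode=clone:stop_duration={plan.pad_video_sec:.3f}"
    return audio, video
```

For example, 25 seconds of dubbed audio against a 20-second clip hits the cap: the audio is sped up by 1.15x and the remaining overrun is covered by video padding.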

🚀 Setup & Usage

☁️ Option 1 — Google Colab (Free, Recommended for Testing)

Open supernan_dubbing.ipynb in Colab for free GPU access.

Click Runtime → Change runtime type → T4 GPU
Run all cells top to bottom
Download the final output from the workspace/ folder

Tip: The Colab free tier typically gives around 4 hours of T4 GPU time, but the limit varies and is not guaranteed. Save your output before the session expires.

💻 Option 2 — Local Machine Setup

Prerequisites

Python 3.9+
CUDA-capable GPU (NVIDIA, 8 GB+ VRAM recommended)
FFmpeg installed (brew install ffmpeg on Mac / sudo apt install ffmpeg on Debian or Ubuntu)

Steps

# 1. Clone the repository
git clone https://github.com/Vikash9546/Supernan.git
cd Supernan

# 2. Create and activate virtual environment
python3 -m venv venv
source venv/bin/activate          # Mac/Linux
# venv\Scripts\activate           # Windows

# 3. Install all dependencies
pip install -r requirements.txt

# 4. Download model weights (Whisper, XTTS, VideoReTalking, GFPGAN)
chmod +x setup.sh
./setup.sh

# 5. Run the pipeline
python dub_video.py --input supernan_training.mp4

Environment Variables

Create a .env file in the project root:

WHISPER_MODEL=medium          # tiny / base / small / medium / large
XTTS_LANGUAGE=hi              # Target language code
OUTPUT_DIR=workspace/output
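
How the scripts read these values is not shown here, so the following is only a dependency-free sketch of one way to do it. The helper names `parse_env_text` and `load_settings` are hypothetical, and the defaults simply mirror the example values above.

```python
import os

# Defaults mirror the example .env above.
DEFAULTS = {
    "WHISPER_MODEL": "medium",
    "XTTS_LANGUAGE": "hi",
    "OUTPUT_DIR": "workspace/output",
}


def parse_env_text(text):
    """Parse KEY=VALUE lines, ignoring blank lines and '#' comments.

    Naive on purpose: a '#' inside a value is treated as a comment start.
    """
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        values[key.strip()] = value.strip()
    return values


def load_settings(env_text=""):
    """Merge defaults, .env contents, and real environment variables."""
    settings = dict(DEFAULTS)
    settings.update(parse_env_text(env_text))
    # Variables exported in the shell take priority over the .env file.
    for key in DEFAULTS:
        if key in os.environ:
            settings[key] = os.environ[key]
    return settings
```

A caller would typically pass the file contents, for example `load_settings(open(".env").read())`. A library such as python-dotenv handles quoting and edge cases more robustly than this sketch.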

⚡ Option 3 — Paid GPU Deployment (Production Scale)

For high-throughput or production use, deploy on dedicated GPU workers. Below is the recommended stack:

Recommended GPU Workers & Their Roles

Service	GPU	Handles
XTTS Voice Clone	A10G / L4	Coqui XTTS v2 inference
VideoReTalking Lip Sync	A10G / L4	Lip-sync video generation
Task Queue	CPU	Redis + Celery message queue
Object Storage	—	AWS S3 (input/output video storage)
Autoscaling	Spot A10G / L4	GPU worker autoscaling

Platforms

Modal Labs (Easiest — Serverless GPU)

pip install modal
modal token new

# Deploy XTTS worker
modal deploy modal_xtts_worker.py

# Deploy VideoReTalking worker
modal deploy modal_lipsync_worker.py

Modal offers A10G and L4 GPUs with per-second billing and requires no server provisioning or manual scaling configuration.

RunPod (Most Flexible)

Go to runpod.io → Deploy → Select A10G or L4 pod
Use the runpod/pytorch:2.1.0-py3.10-cuda11.8.0 template
SSH into the pod and run:

git clone https://github.com/Vikash9546/Supernan.git
cd Supernan && bash setup.sh
python dub_video.py --input supernan_training.mp4

AWS EC2 Spot GPU (Cheapest at Scale)

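# Launch a g5.xlarge spot instance (A10G, ~$0.50/hr; spot prices vary by region and over time)
# The AMI ID below is a placeholder: replace it with a GPU-ready AMI available in your region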
aws ec2 run-instances \
  --image-id ami-0abcdef1234567890 \
  --instance-type g5.xlarge \
  --instance-market-options MarketType=spot \
  --key-name your-key

# SSH in and run setup
ssh -i your-key.pem ubuntu@<instance-ip>
git clone https://github.com/Vikash9546/Supernan.git
cd Supernan && bash setup.sh
python dub_video.py --input supernan_training.mp4

Message Queue Setup (Redis + Celery)

For async multi-video processing with a task queue:

# Install Redis
sudo apt install redis-server
sudo systemctl start redis

# Install Celery
pip install celery redis

# Start Celery worker (on GPU machine)
celery -A dub_worker worker --loglevel=info --concurrency=1
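
The contents of the `dub_worker` module are not shown here, so the following is only a sketch of what a task body could look like. The function names are hypothetical, and combining `--input` with `--s3` in a single call is an assumption. Celery itself is left out of the runnable code so the sketch works without it; the comment at the end shows how the function might be registered.

```python
import subprocess
import sys


def build_dub_command(input_path, use_s3=False):
    """Build the dub_video.py command line for one video."""
    cmd = [sys.executable, "dub_video.py", "--input", input_path]
    if use_s3:
        cmd.append("--s3")  # assumption: --s3 can be combined with --input
    return cmd


def run_dub_job(input_path, use_s3=False):
    """Run the pipeline for one video and report how it ended."""
    result = subprocess.run(
        build_dub_command(input_path, use_s3),
        capture_output=True,
        text=True,
    )
    return {
        "input": input_path,
        "returncode": result.returncode,
        # Keep only the end of stderr so the queue result stays small.
        "stderr_tail": result.stderr[-2000:],
    }


# With Celery installed, dub_worker.py could register the function like this:
#   from celery import Celery
#   app = Celery("dub_worker", broker="redis://localhost:6379/0")
#   dub_task = app.task(run_dub_job)
```

Running the pipeline as a subprocess keeps each job's GPU memory isolated from the worker process, which fits the `--concurrency=1` setting above: one video per GPU at a time.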

Object Storage (AWS S3)

pip install boto3

# Configure AWS credentials
aws configure
# Enter: Access Key, Secret Key, Region (e.g. ap-south-1)

Set in .env:

AWS_BUCKET_NAME=supernan-dubbing-bucket
AWS_REGION=ap-south-1

Input videos are fetched from S3 and outputs are uploaded back automatically when using dub_video.py --s3.

Spot GPU Autoscaling (A10G / L4)

Use the autoscaling features of Modal, RunPod, or AWS to spin up GPU workers on demand:

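Modal: Set @stub.function(gpu="A10G", concurrency_limit=5) to cap the function at 5 containers; it scales to zero when idle. Newer Modal releases use modal.App in place of Stub, so the decorator becomes @app.function.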
RunPod: Create a Serverless Endpoint with min 0 / max N workers on L4 pods
AWS: Use an Auto Scaling Group with g5.xlarge spot instances triggered by an SQS queue depth alarm
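
All three options follow the same rule: size the worker pool from the queue depth, within a minimum and a maximum. A minimal sketch of that rule is below. The function name and parameters are hypothetical, and the default maximum of 5 simply mirrors the Modal example above.

```python
import math


def desired_workers(queue_depth, jobs_per_worker=1, min_workers=0, max_workers=5):
    """Return how many GPU workers to run for the current queue depth."""
    if queue_depth < 0:
        raise ValueError("queue_depth must be non-negative")
    if jobs_per_worker < 1:
        raise ValueError("jobs_per_worker must be at least 1")
    needed = math.ceil(queue_depth / jobs_per_worker)
    # Clamp to the configured range; min_workers=0 means scale to zero when idle.
    return max(min_workers, min(max_workers, needed))
```

With the defaults, an empty queue yields 0 workers and a backlog of 12 videos is capped at 5. Managed platforms apply this kind of logic for you; the sketch only makes the min/max behaviour explicit.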

📂 Project Structure

supernan_dubbing.ipynb: Main interactive pipeline (GitHub-optimized).
dub_video.py: Orchestrator script for high-scale processing.
utils.py: Smart audio manipulation and sync-checking utilities.
setup.sh: Automated environment and model weights downloader.
workspace/: Temporary storage for intermediate stages (Denoised Ref, Raw TTS, Clean TTS).
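
The intermediate stages listed for workspace/ correspond to the FFmpeg filters named under Premium Quality Features. The sketch below shows how such commands could be assembled. The filter names are real FFmpeg filters, but every parameter value, the function names, and the file names are assumptions for illustration, not the settings used in this repository.

```python
def denoise_reference_cmd(src, dst):
    """Command to clean the reference audio before voice cloning."""
    # highpass removes low-frequency rumble; afftdn is FFT-based denoising.
    filters = "highpass=f=80,afftdn=nf=-25"
    return ["ffmpeg", "-y", "-i", src, "-af", filters, dst]


def clarity_cmd(src, dst):
    """Command to polish raw TTS audio into the cleaned version."""
    filters = ",".join([
        "equalizer=f=3000:t=q:w=1:g=3",         # presence boost around 3 kHz
        "treble=g=3",                           # high-shelf treble boost
        "acompressor=threshold=0.125:ratio=3",  # even out loudness peaks
        "loudnorm=I=-16:TP=-1.5:LRA=11",        # EBU R128 loudness normalization
    ])
    return ["ffmpeg", "-y", "-i", src, "-af", filters, dst]


if __name__ == "__main__":
    # Hypothetical file names inside workspace/.
    print(" ".join(denoise_reference_cmd("workspace/ref.wav", "workspace/ref_denoised.wav")))
    print(" ".join(clarity_cmd("workspace/raw_tts.wav", "workspace/clean_tts.wav")))
```

Each list can be passed directly to `subprocess.run`. Placing loudnorm last means the final loudness target is applied after equalization and compression have changed the signal level.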
