Echo Charlie

Echo Charlie is a multimodal voice conversion pipeline that turns a muted video into a fully voiced clip. The system retrieves reference audio from an indexed library, cleans up the lip-reading transcript, and then generates new speech that matches the target speaker. Echo Charlie won the Boson AI Hackathon hosted in October 2025.

🎬 Demo

👉 View the EchoCharlie Demo on GitHub Pages

Highlights

  • Retrieval-augmented voice cloning using EchoCharlie.echo_db.EchoDB
  • Visual speech recognition (lip reading) via EchoCharlie.echo_vsr
  • Transcript clean-up with the Boson-hosted Qwen large language model
  • Speech generation with the Boson Higgs audio model and reference audio
  • Extensible building blocks for notebooks, Streamlit demos, and batch jobs

Pipeline Overview

  1. Frame embedding – EchoCharlie.echo_frame.GetFrame samples frames from the target video, runs DeepFace embeddings, and extracts audio if needed.
  2. Reference retrieval – EchoDB stores embeddings in ChromaDB alongside an SQLite catalog of audio files. The closest reference clip is retrieved.
  3. Lip reading – EchoCharlie.echo_vsr.VSRInferencePipeline runs the pre-trained LRS3 model to decode the video into raw text.
  4. Transcript correction – EchoCharlie.echo_qwen.QwenModel fixes transcription errors and adds punctuation through a Boson API call.
  5. Voice synthesis – EchoCharlie.echo_higgs.HiggsModel combines the cleaned transcript and reference audio to generate a new speech track.
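
The sketch below shows how these stages map onto the package. It is illustrative only: aside from the class names above, the method names and signatures (embed, query, run, correct, generate, and the constructor arguments) are hypothetical assumptions; the real wiring lives in the orchestration class in EchoCharlie/echo_charlie.py.

from EchoCharlie.echo_frame import GetFrame
from EchoCharlie.echo_db import EchoDB
from EchoCharlie.echo_vsr import VSRInferencePipeline
from EchoCharlie.echo_qwen import QwenModel
from EchoCharlie.echo_higgs import HiggsModel

video = "data/videos/obama_1_one_word_error.mp4"
boson_key = "bai-REPLACE-ME"

# 1. Sample frames and compute a DeepFace embedding (embed() is hypothetical)
embedding = GetFrame(video).embed()

# 2. Retrieve the closest reference clip from ChromaDB/SQLite (query() is hypothetical)
reference_audio = EchoDB().query(embedding)

# 3. Decode the muted video into raw text with the pre-trained LRS3 model (run() is hypothetical)
raw_text = VSRInferencePipeline(video).run()

# 4. Fix transcription errors and punctuation via the Boson Qwen model (correct() is hypothetical)
clean_text = QwenModel(boson_key).correct(raw_text)

# 5. Generate speech in the reference voice (generate() is hypothetical)
HiggsModel(boson_key).generate(clean_text, reference_audio, "data/generated_audio/out.wav")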

Repository Layout

  • EchoCharlie/ – Core Python package
    • echo_charlie.py – High-level orchestration class
    • echo_db.py – Reference media indexing and retrieval
    • echo_frame.py – Frame sampling, embeddings, and audio extraction
    • echo_vsr.py – Visual speech recognition inference wrapper
    • echo_qwen.py – Boson Qwen-based transcript cleaner
    • echo_higgs.py – Boson Higgs speech generation helper
    • streamlit.py – Prototype UI (uses hard-coded demo paths)
  • data/ – Sample videos, transcripts, embeddings, and generated audio
  • models/ – Pretrained VSR weights (models/LRS3_V_WER19.1/…)
  • nbs/ – Jupyter notebooks for experiments and tests
  • config.json – Example configuration (contains placeholder API key)
  • main.py – Minimal entry point stub

Requirements

  • Python 3.11
  • FFmpeg (for video/audio I/O used by MoviePy and OpenCV)
  • The default TensorFlow packages assume macOS on Apple Silicon; for other platforms, adjust the dependencies in pyproject.toml.
  • Boson API access token with permission to call the Higgs and Qwen models

Install FFmpeg first, e.g. brew install ffmpeg on macOS.

Installation

# Clone the repository
git clone <your fork>
cd echo-charlie

# (Recommended) create a virtual environment
python3.11 -m venv .venv
source .venv/bin/activate

# Install dependencies
pip install -U pip
pip install -e .

This project also ships a uv.lock file. If you use uv, you can run:

uv sync

The DeepFace backend downloads model weights the first time it runs. Allow network access on first execution.

Configuration

  1. Create a .env or JSON file containing your Boson API key. Do not commit real credentials.
  2. Pass the key into EchoCharlie when you instantiate it, or export BOSON_API_KEY and load it in your script.

Example config.json:

{
  "api_key": "bai-REPLACE-ME"
}
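
A minimal way to load the key without hard-coding it (the fallback order here is a suggestion; only the api_key field is part of the documented config):

import json
import os

# Prefer the environment variable; fall back to the example config.json
boson_key = os.environ.get("BOSON_API_KEY")
if boson_key is None:
    with open("config.json") as f:
        boson_key = json.load(f)["api_key"]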

Quickstart

from EchoCharlie import EchoCharlie

video_path = "data/videos/obama_1_one_word_error.mp4"
transcript_json = "data/transcripts/transcript.json"
output_audio = "data/generated_audio/obama_fixed.wav"
boson_key = "bai-REPLACE-ME"

references = [
    "data/videos/trump_ref.mp4",
    "data/videos/trudeau_ref.mp4",
    "data/videos/macron_ref.mp4",
    "data/videos/obama_ref.mp4",
]

# Build the pipeline; reference clips are indexed into the databases on first use
charlie = EchoCharlie(
    video_path=video_path,
    transcripts=transcript_json,
    qwen_api_key=boson_key,
    higgs_api_key=boson_key,
    n_frames=1,   # frames sampled for the DeepFace embedding
    emb_dim=128,  # must match the DeepFace model's embedding dimension (128 for Facenet)
)

# Run the full pipeline: embed, retrieve, lip-read, correct, synthesize
video, audio = charlie.forward(out_path=output_audio, references=references)
print(f"Generated speech saved to {audio}")

The reference videos will be added to the vector and audio databases on the first run. Subsequent runs reuse the stored embeddings and metadata.

The generated audio can then be passed to 👉 Wav2Lip by calling its inference.py script, producing the corrected and enhanced video.
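
For example, from a Wav2Lip checkout (the checkpoint and output paths below are illustrative):

python inference.py \
    --checkpoint_path checkpoints/wav2lip_gan.pth \
    --face data/videos/obama_1_one_word_error.mp4 \
    --audio data/generated_audio/obama_fixed.wav \
    --outfile results/obama_fixed.mp4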

Managing the Reference Database

  • Use EchoCharlie.echo_db.EchoDB.push_video to ingest additional videos (see the sketch below).
  • Call EchoDB.clear_db() during development to wipe both ChromaDB and the SQLite catalog.
  • Generated audio files are stored in the directory you pass to forward.
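
A minimal sketch of these calls; push_video's exact signature is not documented here, so the single path argument is an assumption, and any EchoDB constructor arguments are omitted:

from EchoCharlie.echo_db import EchoDB

db = EchoDB()
db.push_video("data/videos/new_reference.mp4")  # path argument is an assumption
db.clear_db()  # wipes both ChromaDB and the SQLite catalog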

Run python -m EchoCharlie.echo_db during development to explore helper methods, or inspect EchoCharlie/streamlit.py for a prototype UI. If you use the Streamlit demo, replace the hard-coded paths with your own media folders.

Notebooks

The nbs/tests/*.ipynb notebooks provide reproducible checks for ChromaDB integration, import validation, and database behavior. Launch them with uv run jupyter lab or your preferred notebook environment.

Troubleshooting

  • Visual speech model errors – Ensure the LRS3 model weights are present in models/LRS3_V_WER19.1/. Download them manually if necessary.
  • Audio extraction failures – Confirm FFmpeg is installed and discoverable.
  • DeepFace embedding shape mismatch – Make sure the emb_dim passed to EchoCharlie matches the embedding dimension of the selected DeepFace model (defaults to 128 for Facenet); a quick check is sketched after this list.
  • Boson API failures – Verify your key has access to Qwen3-32B and higgs-audio-generation models and that network egress is allowed.
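
A quick way to confirm a model's embedding dimension with DeepFace (the image path is a placeholder):

from deepface import DeepFace

# represent() returns one dict per detected face; "embedding" holds the vector
result = DeepFace.represent(img_path="face.jpg", model_name="Facenet")
print(len(result[0]["embedding"]))  # 128 for Facenet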

Contributors

Pooja Ravi

Ansh Sharma

Itay Kozlov

Aneri Gandhi

Vishnou Vinayagame

License

MIT © Itay Kozlov

This project is licensed under the MIT License – see the LICENSE file for details.
