Request: publish bundled ltx-av-step-1751000_vocoder_24K checkpoint (or equivalent audio-to-video bundle)

Hi Lightricks team — first, thank you for the excellent open-source work on LTX-2 and for keeping the componentized weights publicly accessible under CC-BY-NC-4.0. The repo layout (`audio_vae/`, `vocoder/`, `transformer/`, `text_encoder/`, `scheduler/`) has been genuinely useful for downstream research.

## Context

We're building [VintageVoice](https://huggingface.co/AutomatedJanitor/vintage-voice), an open-source TTS fine-tune of F5-TTS on 164 hours of public-domain pre-1955 audio — a preservation project for historical English broadcast speech patterns (transatlantic cadence, newsreel delivery, Edison cylinder voices, etc.). Project writeup: https://github.com/Scottcjn/vintage-voice

We're trying to wire LTX-2 audio-to-video onto our TTS output for lip-synced demo videos, the same pipeline that `multimodalart/ltx2-audio-to-video` demonstrates as a HuggingFace Space. That Space works beautifully but runs on remote compute. We'd like a fully **local** pipeline on our V100 32GB (Elyan Labs compute cluster), both for reproducibility and to avoid being gated by the Space's free-tier quota.

## The specific ask

The official ComfyUI audio-to-video template (`video_ltx2_3_ia2v.json`, shipped in `comfyui_workflow_templates_media_video`) references a single-file bundled checkpoint that isn't in the public `Lightricks/LTX-2` repo:

```
ltx-av-step-1751000_vocoder_24K.safetensors
```

It's referenced by three nodes in the API-format `extra.prompt` block of the template:

- `CheckpointLoaderSimple` (`ckpt_name`)
- `LTXVGemmaCLIPModelLoader` (`ltxv_path`)
- `LTXVAudioVAELoader` (`ckpt_name`)
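For anyone who wants to confirm those references on their own copy of the template, here is a minimal sketch of how we located them. It assumes the template parses as JSON and that `extra.prompt` maps node ids to objects with `class_type` and `inputs` keys (our reading of the API format). The helper name `find_checkpoint_refs` and the inline `sample` data are ours, for illustration only, and are not part of ComfyUI or the template.

```python
from typing import Any

CKPT = "ltx-av-step-1751000_vocoder_24K.safetensors"


def find_checkpoint_refs(
    template: dict[str, Any], filename: str = CKPT
) -> list[tuple[str, str, str]]:
    """Return (node_id, class_type, input_name) for each input equal to filename.

    Assumes the API-format prompt lives at template["extra"]["prompt"].
    """
    prompt = template.get("extra", {}).get("prompt", {})
    refs = []
    for node_id, node in prompt.items():
        if not isinstance(node, dict):
            continue  # skip anything that is not a node object
        for input_name, value in node.get("inputs", {}).items():
            if value == filename:
                refs.append((node_id, node.get("class_type", "?"), input_name))
    return sorted(refs)


# Hypothetical stand-in for the real template; node ids are made up.
sample = {
    "extra": {
        "prompt": {
            "1": {"class_type": "CheckpointLoaderSimple", "inputs": {"ckpt_name": CKPT}},
            "2": {"class_type": "LTXVGemmaCLIPModelLoader", "inputs": {"ltxv_path": CKPT}},
            "3": {"class_type": "LTXVAudioVAELoader", "inputs": {"ckpt_name": CKPT}},
            "4": {"class_type": "SaveVideo", "inputs": {"filename_prefix": "out"}},
        }
    }
}

for node_id, class_type, input_name in find_checkpoint_refs(sample):
    print(f"{node_id}: {class_type}.{input_name}")
```

To run it against the real file, load `video_ltx2_3_ia2v.json` with `json.load` from wherever the templates package is installed locally and pass the resulting dict in place of `sample`.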

Would it be possible to publish this bundled checkpoint (or an equivalent audio-enabled LTX-2/LTX-2.3 bundle) alongside the existing component weights? Even a separate repo such as `Lightricks/LTX-AV` under the same CC-BY-NC-4.0 license would be perfect, and labeling it a "research preview" is fine.

**Alternative**: if the intended path for the ia2v template is to load the *existing* componentized weights (`audio_vae/`, `vocoder/`, `transformer/`) instead of the bundled file, guidance on how to wire them into the template would be equally valuable — and we'd be glad to contribute a community example workflow back to the LTX-2 repo if that helps.

## Why this matters

The ia2v template is currently the clearest documentation of the audio-conditioned LTX-2.3 pipeline available, and researchers who don't have a paid HF Spaces tier are effectively locked out of replicating it locally. Enabling local reproduction helps the broader audio-driven video-diffusion research community.

No commercial use — this is an open research project, and we'll cite appropriately in any writeup.

Thanks for considering!

— Scott Boudreaux, Elyan Labs (https://elyanlabs.ai)
