
Request: publish bundled ltx-av-step-1751000_vocoder_24K checkpoint (or equivalent audio-to-video bundle) #200

@Scottcjn

Hi Lightricks team — first, thank you for the excellent open-source work on LTX-2 and for keeping the componentized weights publicly accessible under CC-BY-NC-4.0. The repo layout (audio_vae/, vocoder/, transformer/, text_encoder/, scheduler/) has been genuinely useful for downstream research.

Context

We're building VintageVoice, an open-source TTS fine-tune of F5-TTS on 164 hours of public-domain pre-1955 audio — a preservation project for historical English broadcast speech patterns (transatlantic cadence, newsreel delivery, Edison cylinder voices, etc.). Project writeup: https://github.com/Scottcjn/vintage-voice

We're trying to wire LTX-2 audio-to-video onto our TTS output to produce lip-synced demo videos, the same pipeline that multimodalart/ltx2-audio-to-video demonstrates as a Hugging Face Space. That Space works beautifully but runs on remote compute. We'd like a fully local pipeline on our V100 32GB (Elyan Labs compute cluster), both for reproducibility and to avoid being gated by the Space's free-tier quota.

The specific ask

The official ComfyUI audio-to-video template (video_ltx2_3_ia2v.json, shipped in comfyui_workflow_templates_media_video) references a single-file bundled checkpoint that isn't in the public Lightricks/LTX-2 repo:

ltx-av-step-1751000_vocoder_24K.safetensors

It's referenced by three nodes in the API-format extra.prompt block of the template:

  • CheckpointLoaderSimple (ckpt_name)
  • LTXVGemmaCLIPModelLoader (ltxv_path)
  • LTXVAudioVAELoader (ckpt_name)
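
For anyone who wants to confirm this against their own copy of the template, here is a rough sketch of how the references can be located. It assumes the extra.prompt block follows the usual ComfyUI API format (a mapping of node id to an object with class_type and inputs). The node ids and the extra "SomeOtherNode" entry in the sample data are made up for illustration, and find_references is a hypothetical helper, not part of ComfyUI.

```python
BUNDLED = "ltx-av-step-1751000_vocoder_24K.safetensors"


def find_references(prompt, filename):
    """Return (node_id, class_type, input_name) for every input equal to filename."""
    hits = []
    for node_id, node in prompt.items():
        for input_name, value in node.get("inputs", {}).items():
            # Links to other nodes are lists, so only plain string inputs can match.
            if value == filename:
                hits.append((node_id, node.get("class_type"), input_name))
    return hits


# Hypothetical stand-in for the template's extra.prompt block (node ids are invented).
sample_prompt = {
    "1": {"class_type": "CheckpointLoaderSimple", "inputs": {"ckpt_name": BUNDLED}},
    "2": {"class_type": "LTXVGemmaCLIPModelLoader", "inputs": {"ltxv_path": BUNDLED}},
    "3": {"class_type": "LTXVAudioVAELoader", "inputs": {"ckpt_name": BUNDLED}},
    "4": {"class_type": "SomeOtherNode", "inputs": {"source": ["1", 0]}},
}

for hit in find_references(sample_prompt, BUNDLED):
    print(hit)
```

With the real template, sample_prompt would be replaced by the extra.prompt object loaded from video_ltx2_3_ia2v.json via json.load.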

Would it be possible to publish this bundled checkpoint (or an equivalent audio-enabled LTX-2/LTX-2.3 bundle) alongside the existing component weights? Even a separate repo such as Lightricks/LTX-AV under the same CC-BY-NC-4.0 license would be perfect, and a "research preview" release would be fine.

Alternative: if the intended path for the ia2v template is to load the existing componentized weights (audio_vae/, vocoder/, transformer/) instead of the bundled file, guidance on how to wire them into the template would be equally valuable — and we'd be glad to contribute a community example workflow back to the LTX-2 repo if that helps.
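
To make the gap concrete, here is a minimal sketch that summarizes a repository file listing: which of the component folders are present, and whether the bundled file appears anywhere. The summarize helper and the file names in the example listing are hypothetical. In practice the listing could come from something like huggingface_hub.list_repo_files("Lightricks/LTX-2"), but that call is left out so the snippet runs offline.

```python
from pathlib import PurePosixPath

# Component folders as named in the repo layout described above.
COMPONENTS = ("audio_vae", "vocoder", "transformer", "text_encoder", "scheduler")
BUNDLED = "ltx-av-step-1751000_vocoder_24K.safetensors"


def summarize(files):
    """Report which component folders and whether the bundled file appear in files."""
    top_level = {PurePosixPath(f).parts[0] for f in files if f}
    return {
        "components_present": [c for c in COMPONENTS if c in top_level],
        "components_missing": [c for c in COMPONENTS if c not in top_level],
        "bundled_present": any(PurePosixPath(f).name == BUNDLED for f in files),
    }


# Hypothetical listing; the file names inside each folder are invented.
example_listing = [
    "audio_vae/weights.safetensors",
    "vocoder/weights.safetensors",
    "transformer/weights.safetensors",
    "README.md",
]

print(summarize(example_listing))
```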

Why this matters

The ia2v template is currently the clearest documentation of the audio-conditioned LTX-2.3 pipeline available, and researchers who don't have a paid HF Spaces tier are effectively locked out of replicating it locally. Enabling local reproduction helps the broader audio-driven video-diffusion research community.

No commercial use — this is an open research project, and we'll cite appropriately in any writeup.

Thanks for considering!

— Scott Boudreaux, Elyan Labs (https://elyanlabs.ai)
