Helios distilled dev #1104
Code Review
This pull request introduces native integration for the Helios-Distilled model, supporting both Text-to-Video (T2V) and Image-to-Video (I2V) generation. It adds the necessary model components, including the transformer, text encoder, VAE, scheduler, and runner, along with configurations and shell scripts. The review identified a potential device mismatch in the VAE decoding process when CPU offloading is enabled, as well as incorrect --model_cls arguments in the provided run scripts that would cause validation failures.
    latents_mean = self.latents_mean.to(device=latents.device, dtype=latents.dtype)
    latents_std = self.latents_std.to(device=latents.device, dtype=latents.dtype)
    current_latents = latents.to(self.model.device, dtype=self.model.dtype) / latents_std + latents_mean
In decode, latents_mean and latents_std are moved to latents.device and latents.dtype, but current_latents is built on self.model.device with self.model.dtype. If latents is on the CPU (e.g., due to CPU offloading) while self.model is on the GPU, the arithmetic mixes devices and raises a runtime error (RuntimeError: Expected all tensors to be on the same device...). Move latents_mean and latents_std to self.model.device and self.model.dtype instead, matching the implementation in prepare_image_latents.
    - latents_mean = self.latents_mean.to(device=latents.device, dtype=latents.dtype)
    - latents_std = self.latents_std.to(device=latents.device, dtype=latents.dtype)
    - current_latents = latents.to(self.model.device, dtype=self.model.dtype) / latents_std + latents_mean
    + latents_mean = self.latents_mean.to(device=self.model.device, dtype=self.model.dtype)
    + latents_std = self.latents_std.to(device=self.model.device, dtype=self.model.dtype)
    + current_latents = latents.to(self.model.device, dtype=self.model.dtype) / latents_std + latents_mean
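The suggested fix can be illustrated in isolation. The following is a minimal sketch, not the PR's actual `decode` method: `denormalize_latents`, the tensor shapes, and the sample values are hypothetical, and the arithmetic simply mirrors the `latents / latents_std + latents_mean` expression quoted in the review. The point is that the statistics follow the model's device and dtype rather than those of the incoming latents.

```python
import torch


def denormalize_latents(latents, latents_mean, latents_std, device, dtype):
    """Hypothetical helper: move every operand to the model's device/dtype first."""
    mean = latents_mean.to(device=device, dtype=dtype)
    std = latents_std.to(device=device, dtype=dtype)
    # All three operands now share one device and dtype, so the arithmetic
    # cannot mix CPU and GPU tensors.
    return latents.to(device=device, dtype=dtype) / std + mean


# Stand-ins for self.model.device / self.model.dtype (assumed values).
model_device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model_dtype = torch.bfloat16

# Latents left on the CPU in float32, as could happen with CPU offloading.
latents = torch.ones(1, 4, 2, 8, 8, dtype=torch.float32)
latents_mean = torch.zeros(1, 4, 1, 1, 1)
latents_std = torch.full((1, 4, 1, 1, 1), 2.0)

out = denormalize_latents(latents, latents_mean, latents_std, model_device, model_dtype)
print(out.device.type, out.dtype)
```

On a CUDA machine, the original pattern (statistics moved to `latents.device`, latents moved to the model device) would combine a GPU tensor with CPU tensors, which is the mismatch the review describes.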
    source ${lightx2v_path}/scripts/base/base.sh

    python -m lightx2v.infer \
    --model_cls helios \
The script specifies --model_cls helios, but "helios" is not in the SUPPORTED_MODEL_CLASSES list in lightx2v/infer.py. Running this script will result in an invalid --model_cls error. It should be updated to --model_cls helios_distilled.
    - --model_cls helios \
    + --model_cls helios_distilled \
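To make the failure mode concrete, here is a minimal sketch of how a CLI could reject an unsupported `--model_cls` value. It assumes validation through `argparse` `choices`; the actual check in lightx2v/infer.py may be implemented differently, and the one-element `SUPPORTED_MODEL_CLASSES` list below is a hypothetical stand-in for the real list.

```python
import argparse

# Hypothetical contents; the real list lives in lightx2v/infer.py.
SUPPORTED_MODEL_CLASSES = ["helios_distilled"]


def build_parser():
    parser = argparse.ArgumentParser(prog="infer-sketch")
    # argparse rejects any value outside `choices` with an "invalid choice" error.
    parser.add_argument("--model_cls", required=True, choices=SUPPORTED_MODEL_CLASSES)
    return parser


def is_accepted(model_cls):
    """Return True if the parser accepts this --model_cls value."""
    try:
        build_parser().parse_args(["--model_cls", model_cls])
        return True
    except SystemExit:
        # argparse prints the error to stderr and exits with status 2.
        return False


print(is_accepted("helios_distilled"))
print(is_accepted("helios"))
```

Under this assumption, a run script that passes `--model_cls helios` fails before any model is loaded, which is why the scripts need the `helios_distilled` value.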
    source ${lightx2v_path}/scripts/base/base.sh

    python -m lightx2v.infer \
    --model_cls helios \
The script specifies --model_cls helios, but "helios" is not in the SUPPORTED_MODEL_CLASSES list in lightx2v/infer.py. Running this script will result in an invalid --model_cls error. It should be updated to --model_cls helios_distilled.
    - --model_cls helios \
    + --model_cls helios_distilled \
Summary
This PR adds native Helios-Distilled integration to LightX2V following the existing runner / network / scheduler / text encoder / vae layering, instead of wrapping the upstream HeliosPyramidPipeline as a black-box pipeline bridge.

The public entry point is narrowed to model_cls=helios_distilled only. This avoids advertising generic Helios/Base support that is not actually implemented.

Main Changes

- lightx2v/models/.../helios/:
  - HeliosModel and HeliosTransformer3DModel
  - HeliosDistilledScheduler and vendored HeliosDMDScheduler
  - HeliosTextEncoder (UMT5 path)
  - HeliosVAE
  - HeliosRunner for native t2v / i2v
- model_cls=helios_distilled
- set_config.py
- t2v / i2v

Validation

- python -m py_compile passed for the changed runtime files
- --model_cls helios is now rejected explicitly
- --model_cls helios_distilled runs successfully for local i2v
  - assets/inputs/imgs/girl.png
  - the girl is dancing
  - 97 frames, 640x384, 24 fps

Known Limitations

- Helios-Distilled only; it does not support Helios base checkpoints.
- t2v / i2v CLI and pipeline integration. It does not add Gradio-side Helios model assembly.