Issues encountered during I2V RCM training #25

@ariyali

Description

Thank you very much for providing this framework!

We attempted RCM-based I2V distillation with Wan2.1-Fun-V1.1-1.3B-Control-Camera as the teacher model, where the only additional control signals are y and clip_feature. However, the distilled student model's inference results contain random blue patches of varying extent.

During training, we set the warmup steps to 1,000. The blue artifacts were already appearing in intermediate validation results at around step 1,500, and they showed no significant improvement even after training for 10,000 steps. The blue patches are also more pronounced when inference is run with the reg parameter.

We also experimented with the sigma_max parameter during inference. With sigma_max = 80, the blue patches are relatively small, covering approximately 30% of the video area on average. With sigma_max = 5000, most of the video turns solid blue.

We would greatly appreciate your insights on potential causes of this issue and whether any special adjustments or optimizations are needed during training.

For reference, we have verified that the teacher model (Wan2.1-Fun-V1.1-1.3B-Control-Camera) performs normal I2V inference under the same conditions. With sigma_max = 5000, it generates coherent I2V results without camera-control signals. With sigma_max = 80, however, the teacher output becomes highly random and fails to preserve the first-frame information or temporal consistency.
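To make the two sigma_max settings easier to compare, here is a rough sketch of how much clean signal the starting sample would carry at each value. It assumes an EDM-style noising convention, x_sigma = x0 + sigma * eps with unit-variance data, rescaled to unit variance. That convention is our assumption and is not taken from the framework's code, and `terminal_signal_weight` is a hypothetical helper written only for this illustration.

```python
import math


def terminal_signal_weight(sigma_max: float) -> float:
    """Coefficient on the clean sample at the starting noise level.

    Assumption (not from the framework): x_sigma = x0 + sigma * eps with
    unit-variance data. After rescaling x_sigma to unit variance, the
    clean sample is weighted by 1 / sqrt(1 + sigma^2).
    """
    return 1.0 / math.sqrt(1.0 + sigma_max ** 2)


# Compare the two inference settings mentioned above.
for sigma_max in (80.0, 5000.0):
    weight = terminal_signal_weight(sigma_max)
    print(f"sigma_max={sigma_max:g}: clean-signal weight ~ {weight:.2e}")
```

Under this assumption, starting from pure noise at sigma_max = 80 leaves a noticeably larger mismatch with the training-time distribution than sigma_max = 5000, which might be related to the teacher's behavior at 80, though we have not confirmed that this is the cause of the blue patches in the student.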

Thank you again for your help!
