@BuffMcBigHuge BuffMcBigHuge commented Jan 6, 2026

  1. VACE architecture unification

    • Migrated KreaRealtimeVideoPipeline from lazy loading to the unified VACEEnabledPipeline mixin
    • Enhanced mixin with vace_layers support, FP8 quantization, and text encoder CPU offloading
    • Unified VACE handling across all pipelines
  2. Krea V2V prompt reset bug fix

    • Fixed prompt transition issue in V2V/VACE mode
    • Implemented mode-specific temporal interpolation defaults (0 for video mode, 4 for text mode)
    • Frontend now dynamically adjusts transition steps based on input mode (see the sketch after this list)
  3. Code quality improvements

    • Refactored KV cache recomputation code (moved helper to class level)
    • Removed optional optimization code for simplicity
    • Resolved merge conflicts cleanly
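
For reference, a minimal sketch of the mode-specific transition-step defaults described in item 2; InputMode and default_transition_steps are illustrative names, not the pipeline's actual API:

```python
from enum import Enum


class InputMode(Enum):
    TEXT = "text"
    VIDEO = "video"  # covers V2V / VACE input


def default_transition_steps(mode: InputMode) -> int:
    """Default temporal-interpolation steps for a prompt change:
    0 in video/V2V/VACE mode, 4 in text mode."""
    return 0 if mode is InputMode.VIDEO else 4


assert default_transition_steps(InputMode.VIDEO) == 0
assert default_transition_steps(InputMode.TEXT) == 4
```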

This PR is a derivative of #297.

Note: This branch does not support Krea + VACE on 32 GB of VRAM.

@BuffMcBigHuge BuffMcBigHuge marked this pull request as ready for review January 6, 2026 22:20
Collaborator

@ryanontheinside ryanontheinside left a comment

Some comments and questions; this does not include any consideration of the cache recomputation interactions with VACE that we've all been discussing offline.

kv_cache["k"][:, :local_end_index] = roped_key
kv_cache["v"][:, :local_end_index] = v
# Only update kv_cache if it exists (VACE forward passes kv_cache=None)
if kv_cache is not None:
Collaborator

We already skip this with is_tf = False anyway, but this check is separately unreachable since kv_cache is always None here.
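
For context, a simplified sketch of the pattern under discussion, not the actual block forward; only the guard and the two cache writes come from the snippet above, everything else is illustrative:

```python
import torch
import torch.nn.functional as F


def attention_forward(q, k, v, kv_cache=None, local_end_index=None):
    """Sketch: kv_cache is optional because the VACE forward path calls in with
    kv_cache=None, so the guarded cache update below is skipped on that path."""
    roped_key = k  # rotary position embedding omitted in this sketch

    # Only update kv_cache if it exists (VACE forward passes kv_cache=None)
    if kv_cache is not None:
        kv_cache["k"][:, :local_end_index] = roped_key
        kv_cache["v"][:, :local_end_index] = v

    return F.scaled_dot_product_attention(q, roped_key, v)
```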

# Initialize optional LoRA adapters on the underlying model BEFORE quantization.
# Load text encoder before VACE initialization (may be offloaded to CPU)
start = time.time()
text_encoder = WanTextEncoderWrapper(
Collaborator

I wonder if we should be more discriminating with the CPU offloading of the text encoder or exclude it entirely. I think you had mentioned that the Mixin approach already precludes 5090s with VACE+Krea, in which case we should probably not offload the text encoder at all. However, if this does enable 5090+VACE+Krea, that will surely only be for development purposes, so maybe we do this with a flag and document it. As of now, the text encoder is always offloaded for Krea even when there is sufficient VRAM.

Collaborator Author

Great point. I added it anyway, but now I'm realizing that it adds quite a bit of latency when changing prompts. I think supporting 32 GB is a stretch, and it's wiser to skip offloading altogether.
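
A rough sketch of the flag-gated placement being discussed; offload_to_cpu is a hypothetical flag, not an existing option:

```python
import torch


def place_text_encoder(text_encoder: torch.nn.Module,
                       device: torch.device,
                       offload_to_cpu: bool = False) -> torch.nn.Module:
    """Only offload when explicitly requested (e.g. a documented development flag
    for VRAM-constrained setups); otherwise keep the encoder on the target device
    to avoid the prompt-change latency mentioned above."""
    target = torch.device("cpu") if offload_to_cpu else device
    return text_encoder.to(device=target)
```

With the default of False, the encoder stays resident and prompt changes do not pay a CPU-to-GPU round trip.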

new_block.block_id = saved_block_id

# Move new block to target device/dtype
new_block = new_block.to(device=orig_device, dtype=orig_dtype)
Collaborator

IIUC, I think the memory optimization order is reversed. Currently, new_block is moved to the GPU before orig_block is moved to CPU, causing both blocks to be on the GPU simultaneously.

Collaborator Author

@BuffMcBigHuge BuffMcBigHuge Jan 7, 2026

Hmm, I made the change but didn't see any VRAM saving.
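
A sketch of the swap order the comment suggests, with hypothetical helper names; freeing the old block first is what keeps only one block resident on the GPU at a time:

```python
import gc
import torch


def swap_block(orig_block: torch.nn.Module,
               new_block: torch.nn.Module,
               orig_device: torch.device,
               orig_dtype: torch.dtype) -> torch.nn.Module:
    """Move the old block off the GPU and release its memory before the new block
    is materialized there, so the two blocks never coexist on the device."""
    orig_block.to("cpu")
    gc.collect()
    torch.cuda.empty_cache()  # return the freed segments to the CUDA allocator
    return new_block.to(device=orig_device, dtype=orig_dtype)
```

Whether this shows up as a measurable saving depends on the caching allocator actually releasing the old block's segments before the new block is allocated, which may be why no difference was observed here.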

)
print(f"Loaded text encoder in {time.time() - start:3f}s")
# Move text encoder to target device but use dtype of weights
text_encoder = text_encoder.to(device=device)
Collaborator

If device is a CUDA device, this allocates on the GPU, but _init_vace in mixin.py immediately moves it to CPU.
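
A quick way to see why the transient hop matters, using a stand-in linear layer (assumes a CUDA device is available); the peak-memory counter still reflects the GPU allocation even though the module ends up on CPU:

```python
import torch

torch.cuda.reset_peak_memory_stats()

encoder = torch.nn.Linear(4096, 4096)  # stand-in for the text encoder
encoder = encoder.to("cuda")           # transient GPU allocation happens here...
encoder = encoder.to("cpu")            # ...even though it is offloaded right after

print(f"peak allocated: {torch.cuda.max_memory_allocated() / 2**20:.1f} MiB")
```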

# Use assign=True to preserve original tensor dtype (important for FP8 weights)
missing_keys, unexpected_keys = actual_model.load_state_dict(
-    vace_state_dict, strict=False
+    vace_state_dict, strict=False, assign=True
Collaborator

Does assign=True interact correctly with the .to(device, dtype) calls in mixin.py (lines 157-158)? Those move VACE components to GPU before this load happens; I'm wondering if assign=True might replace them with CPU tensors from the checkpoint, making that earlier allocation unnecessary.

Collaborator Author

I think you're right; I suppose this makes the GPU/CPU assignment duplicated. This is now fixed, great find.
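
A sketch of the ordering this implies, with a hypothetical helper; the key point is that assign=True swaps in the checkpoint tensors themselves (keeping their dtype and, typically, their CPU device), so any placement should happen after the load:

```python
import torch


def load_vace_weights(actual_model: torch.nn.Module,
                      vace_state_dict: dict,
                      device: torch.device) -> torch.nn.Module:
    """With assign=True, load_state_dict replaces the module's parameters with the
    checkpoint tensors (preserving e.g. FP8 dtype), so an earlier .to(device, dtype)
    on those parameters would simply be thrown away."""
    missing_keys, unexpected_keys = actual_model.load_state_dict(
        vace_state_dict, strict=False, assign=True
    )
    # Place the model once, after the weights have been swapped in.
    return actual_model.to(device=device)
```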
