Krea VACE 14b Pipeline - Mixin #311
base: main
Conversation
ryanontheinside left a comment
Some comments and questions - this does not include any consideration around the cache recomp interactions with VACE that we've all been discussing offline.
| kv_cache["k"][:, :local_end_index] = roped_key | ||
| kv_cache["v"][:, :local_end_index] = v | ||
| # Only update kv_cache if it exists (VACE forward passes kv_cache=None) | ||
| if kv_cache is not None: |
We skip this with `is_tf = False` anyway, but this check is separately unreachable, since `kv_cache` is always None here.
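For reference, a minimal sketch of the guarded write being discussed, assuming `kv_cache` is a dict of preallocated tensors on the streaming path and `None` on the VACE forward pass; the function name and signature are illustrative, not the pipeline's actual API:

```python
def attn_cache_write_sketch(kv_cache, roped_key, v, local_end_index):
    """Sketch only: the surrounding attention math is omitted."""
    # Only update kv_cache if it exists (VACE forward passes kv_cache=None)
    if kv_cache is not None:
        kv_cache["k"][:, :local_end_index] = roped_key
        kv_cache["v"][:, :local_end_index] = v
```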
```python
# Initialize optional LoRA adapters on the underlying model BEFORE quantization.
# Load text encoder before VACE initialization (may be offloaded to CPU)
start = time.time()
text_encoder = WanTextEncoderWrapper(
```
I wonder if we should be more discriminating with the CPU offloading of the text encoder, or exclude it entirely. I think you had mentioned that the Mixin approach already precludes 5090s with VACE + Krea, in which case we should probably not offload the text encoder at all. However, if this does enable 5090 + VACE + Krea, that will surely only be for development purposes, so maybe we gate it behind a flag and document it. As of now, the text encoder is always offloaded for Krea, even when there is sufficient VRAM.
Great point. I added it anyway, but now I'm realizing that it adds quite a bit of latency when changing prompts. I think supporting 32 GB is a stretch, and it's wiser to skip offloading altogether.
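For illustration, a hedged sketch of the flag-gated alternative floated above: keep the text encoder resident on the GPU by default and offload only when explicitly requested for low-VRAM development. The `offload_text_encoder` flag and `build_text_encoder` factory are hypothetical names, not the pipeline's actual API:

```python
import torch

def load_text_encoder_sketch(build_text_encoder, device, offload_text_encoder=False):
    """Sketch only: offloading trades VRAM for extra latency on every prompt
    change, so it is opt-in here rather than the default."""
    text_encoder = build_text_encoder()
    if offload_text_encoder:
        # Low-VRAM/development path: park weights on CPU between encodes.
        return text_encoder.to(device="cpu")
    # Default path: keep the encoder on the target device to avoid
    # re-transfer latency when the prompt changes.
    return text_encoder.to(device=torch.device(device))
```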
```python
new_block.block_id = saved_block_id

# Move new block to target device/dtype
new_block = new_block.to(device=orig_device, dtype=orig_dtype)
```
IIUC, I think the memory optimization order is reversed: currently, `new_block` is moved to the GPU before `orig_block` is moved to CPU, so both blocks are on the GPU simultaneously.
Hmm I made the change but didn't see any VRAM saving.
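For illustration, a hedged sketch of the ordering the review describes: offload the original block before materializing its replacement, so the two are never resident on the GPU at the same time. Whether this changes the observed peak depends on the allocator and how usage is measured; `orig_block`/`new_block` mirror the diff above, the rest is assumed:

```python
import torch

def swap_block_sketch(orig_block, new_block, orig_device, orig_dtype):
    # Free the original block's GPU memory first.
    orig_block.to(device="cpu")
    # Optionally hand the freed pages back to the driver so tools like
    # nvidia-smi reflect the drop (the caching allocator otherwise keeps them).
    torch.cuda.empty_cache()
    # Only then move the replacement to the target device/dtype.
    new_block = new_block.to(device=orig_device, dtype=orig_dtype)
    return new_block
```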
```python
)
print(f"Loaded text encoder in {time.time() - start:3f}s")
# Move text encoder to target device but use dtype of weights
text_encoder = text_encoder.to(device=device)
```
If `device` is a CUDA device, this allocates on the GPU, but `_init_vace` in mixin.py immediately moves it to CPU.
```diff
  # Use assign=True to preserve original tensor dtype (important for FP8 weights)
  missing_keys, unexpected_keys = actual_model.load_state_dict(
-     vace_state_dict, strict=False
+     vace_state_dict, strict=False, assign=True
```
Does `assign=True` interact correctly with the `.to(device, dtype)` calls in mixin.py (lines 157-158)? Those move VACE components to the GPU before this load happens - wondering if `assign=True` might replace them with CPU tensors from the checkpoint, making that earlier allocation unnecessary.
I think you're right; I suppose this makes the GPU/CPU assignment duplicated. This is now fixed, great find.
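A hedged sketch of the resulting order of operations: with `assign=True`, `load_state_dict` adopts the checkpoint tensors themselves, preserving their FP8 dtype but also their CPU placement, so the device move belongs after the load rather than before it. The wrapper function is illustrative; only the `load_state_dict` call mirrors the diff:

```python
def load_vace_weights_sketch(actual_model, vace_state_dict, device):
    # assign=True swaps the module's parameters for the checkpoint tensors,
    # keeping their original dtype (important for FP8 weights) and device.
    missing_keys, unexpected_keys = actual_model.load_state_dict(
        vace_state_dict, strict=False, assign=True
    )
    # Any earlier .to(device) on these submodules is overwritten by the
    # assignment, so move to the target device only now; dtype is left
    # untouched to keep the FP8 storage.
    actual_model.to(device=device)
    return missing_keys, unexpected_keys
```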
VACE architecture unification
- `KreaRealtimeVideoPipeline` moved from lazy loading to the unified `VACEEnabledPipeline` mixin
- `vace_layers` support, FP8 quantization, and text encoder CPU offloading

Krea V2V prompt reset bug fix

Code quality improvements
This PR is a derivative of #297.
Note: this branch does not support Krea + VACE on 32 GB VRAM.