Environment
- Hardware: 8x Ascend NPU 910B (64G each)
- Settings: sp_size=8, tile=384
Script
export TORCHDYNAMO_DISABLE=1
export PYTORCH_TRITON_DISABLE=1
export ALBUMENTATIONS_DISABLE_VERSION_CHECK=1
export PYTORCH_NPU_ALLOC_CONF=expandable_segments:True # for NPU
export HCCL_CONNECT_TIMEOUT=600
torchrun --standalone --nproc_per_node 8 \
scripts/inference_magicdrive.py \
configs/magicdrive/test/17-16x848x1600_stdit3_CogVAE_boxTDS_wCT_xCE_wSST_map0_fsp8_cfg2.0.py \
--cfg-options model.from_pretrained=./ckpts/MagicDriveDiT-stage3-40k-ft/ema.pt
Issue Description
During the VAE decode process, the script suddenly terminates without any specific error message or stack trace. It exits directly, making it impossible to debug :(
I traced the issue starting from the sp_vae() function in inference_magicdrive.py
and narrowed it down to the AutoencoderKLCogVideoX.tiled_decode() method in vae_cogvideox.py, specifically the line: tile = self.decoder(tile).
The program crashes at this point without logging any details.
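One way to get more information out of a silent exit like this is Python's standard faulthandler module together with a per-rank log that is flushed before each decoder call. The following is only a minimal sketch, not part of the repository: the helper names enable_crash_dumps and log_step and the crash_logs directory are hypothetical, and it assumes torchrun has set the RANK environment variable. Note that faulthandler can only report fatal signals such as SIGSEGV or SIGABRT; if the process is killed with SIGKILL (for example by the host out-of-memory killer), nothing will be written, and the last flushed log line is then the only hint.

```python
import faulthandler
import os


def enable_crash_dumps(log_dir="crash_logs"):
    """Open a per-rank log file and dump Python tracebacks into it on fatal signals.

    Hypothetical helper for illustration; call it once at the start of the script.
    """
    rank = os.environ.get("RANK", "0")  # torchrun sets RANK for each worker
    os.makedirs(log_dir, exist_ok=True)
    path = os.path.join(log_dir, f"rank{rank}.log")
    log_file = open(path, "w")  # must stay open for as long as faulthandler is enabled
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file, path


def log_step(log_file, message):
    """Write one line and force it to disk so it survives an abrupt exit."""
    log_file.write(message + "\n")
    log_file.flush()
    os.fsync(log_file.fileno())
```

In tiled_decode(), a call such as log_step(log_file, f"decoding tile with shape {tuple(tile.shape)}") placed right before tile = self.decoder(tile) would show which rank and which tile was being processed when the process died.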
Has anyone encountered a similar issue, or does anyone have ideas about what might be causing this sudden termination? Any suggestions for debugging or potential fixes would be appreciated!