Hi, I’m currently unable to find any up-to-date documentation or guidance on properly quantizing Qwen3.5 (MoE/Dense) to NVFP4.
This process was working around the time #897 was merged, but it no longer works for me. I've tried both `--qformat nvfp4_mlp_only` and `--qformat nvfp4_experts_only`, and neither seems to apply the expected quantization: the exported weights are still roughly the same size as the BF16 checkpoint.
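For context on why the size looks wrong, here is a rough back-of-the-envelope sketch of the size ratio I would expect. It assumes NVFP4 stores 4-bit values with one 8-bit scale per 16-element block, and it ignores per-tensor scales and metadata. The function name and the `quantized_fraction` value are hypothetical and purely illustrative; I have not measured the actual share of MLP or expert parameters in the model.

```python
def expected_size_ratio(quantized_fraction: float,
                        block_size: int = 16,
                        value_bits: int = 4,
                        scale_bits: int = 8,
                        baseline_bits: int = 16) -> float:
    """Estimate exported size relative to a BF16 checkpoint.

    quantized_fraction: share of parameters that end up in NVFP4
    (hypothetical input; depends on the model and the qformat used).
    """
    if not 0.0 <= quantized_fraction <= 1.0:
        raise ValueError("quantized_fraction must be between 0 and 1")
    # Effective bits per quantized weight: 4-bit value plus amortized block scale.
    quant_bits = value_bits + scale_bits / block_size
    # Remaining weights are assumed to stay in BF16 (16 bits each).
    avg_bits = (quantized_fraction * quant_bits
                + (1.0 - quantized_fraction) * baseline_bits)
    return avg_bits / baseline_bits


if __name__ == "__main__":
    # Illustrative only: assume 90% of parameters are in quantized layers.
    print(f"{expected_size_ratio(0.9):.2f}")
```

Under these assumptions, any configuration that quantizes most of the parameters should shrink the export well below the BF16 size, which is not what I am seeing.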
I’d really appreciate any guidance or pointers on what might have changed or what I might be missing. Thanks in advance!
@Edwardf0t1 @cjluo-nv