
What’s the Correct Way to Quantize Qwen3.5 (MoE/Dense) to NVFP4? #1255

@seindum

Hi, I’m currently unable to find any up-to-date documentation or guidance on how to properly quantize Qwen3.5 (MoE/Dense) to NVFP4.

This worked around the time #897 was merged, but I’m now running into issues. I’ve tried both `--qformat nvfp4_mlp_only` and `--qformat nvfp4_experts_only`, but neither seems to apply the expected quantization: the exported weights are still roughly the same size as the BF16 checkpoint.
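One quick way to confirm whether quantization was applied at all is to total the exported tensor bytes by dtype. Below is a minimal sketch, assuming the export is a directory of `.safetensors` shards (an assumption, since the export layout isn't stated above). It reads only each file's JSON header, so it doesn't load any weights. For reference, NVFP4 stores 4-bit values plus one FP8 (E4M3) scale per 16-element block, roughly 4.5 bits per weight versus 16 for BF16, so quantized layers should shrink to about 28% of their BF16 size. The function names are hypothetical helpers, not part of any ModelOpt API.

```python
import json
import struct
from collections import defaultdict
from pathlib import Path

# NVFP4: 4-bit values plus one 8-bit (E4M3) scale per 16-element block.
# The single per-tensor FP32 scale is negligible and ignored here.
NVFP4_BITS_PER_WEIGHT = 4 + 8 / 16  # 4.5 bits
BF16_BITS_PER_WEIGHT = 16


def nvfp4_size_ratio():
    """Approximate size of an NVFP4-quantized weight relative to BF16."""
    return NVFP4_BITS_PER_WEIGHT / BF16_BITS_PER_WEIGHT


def dtype_bytes(shard_path):
    """Return {dtype: total_bytes} for one .safetensors file, reading only its header."""
    with open(shard_path, "rb") as f:
        # A safetensors file starts with an 8-byte little-endian header length,
        # followed by a JSON header describing every tensor.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    totals = defaultdict(int)
    for name, info in header.items():
        if name == "__metadata__":  # optional free-form metadata, not a tensor
            continue
        start, end = info["data_offsets"]
        totals[info["dtype"]] += end - start
    return dict(totals)


def summarize_checkpoint(directory):
    """Aggregate dtype byte counts over every *.safetensors shard in a directory."""
    totals = defaultdict(int)
    for shard in sorted(Path(directory).glob("*.safetensors")):
        for dtype, nbytes in dtype_bytes(shard).items():
            totals[dtype] += nbytes
    return dict(totals)


if __name__ == "__main__":
    import sys

    if len(sys.argv) > 1:
        summary = summarize_checkpoint(sys.argv[1])
        # Largest dtypes first, so it is obvious if BF16 still dominates.
        for dtype, nbytes in sorted(summary.items(), key=lambda kv: -kv[1]):
            print(f"{dtype:10s} {nbytes / 2**30:8.2f} GiB")
        print(f"expected NVFP4/BF16 ratio for quantized layers: {nvfp4_size_ratio():.3f}")
```

Run it against the export directory, e.g. `python check_dtypes.py /path/to/export` (script name hypothetical). Packed FP4 weights would most likely appear under a non-BF16 dtype such as `U8` (again an assumption about the export layout). If BF16 still accounts for nearly all of the bytes, that suggests the selected `--qformat` didn't match the model's MLP/expert layers.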

I’d really appreciate any guidance or pointers on what might have changed or what I might be missing. Thanks in advance!

@Edwardf0t1 @cjluo-nv

Labels: question (help is needed)