
UPSTREAM PR #21095: convert: Add compressed-tensors NVFP4 conversion#1317

Open
loci-dev wants to merge 1 commit into main from
loci/pr-21095-nvfp4-hf-comptens

Conversation

@loci-dev

Note

Source pull request: ggml-org/llama.cpp#21095

This update extends the convert_hf_to_gguf script to support converting Hugging Face NVFP4 models quantized with compressed-tensors. Previously, only ModelOpt-quantized models were supported, and other NVFP4 models raised an error.

The script detects the tensor names and values used by compressed-tensors (e.g., weight_global_scale instead of weight_scale_2 for the per-tensor scale) and renames them to their ModelOpt equivalents, so the rest of the conversion path is unchanged. This keeps the change small. The weights themselves need no adaptation; the only other difference is that the two formats store the global scales as reciprocals of each other.
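The rename-and-invert approach described above can be sketched as follows. This is a minimal illustration, not the actual convert_hf_to_gguf implementation: the NAME_MAP contents and the direction of the reciprocal are assumptions based on the description (only the weight_global_scale/weight_scale_2 pair is confirmed by the source).

```python
# Hypothetical sketch: map compressed-tensors NVFP4 tensor names onto their
# ModelOpt equivalents so the existing ModelOpt conversion path can be reused.

# compressed-tensors suffix -> ModelOpt suffix (only the first pair is
# confirmed by the PR description; treat the mapping as illustrative).
NAME_MAP = {
    "weight_global_scale": "weight_scale_2",
}

def to_modelopt(name: str, value: float) -> tuple[str, float]:
    """Rename a compressed-tensors scale tensor and adjust its convention."""
    for ct_suffix, mo_suffix in NAME_MAP.items():
        if name.endswith(ct_suffix):
            new_name = name[: -len(ct_suffix)] + mo_suffix
            # The two formats store the global scale as reciprocals of each
            # other, so invert the value (direction assumed here).
            return new_name, 1.0 / value
    # Weights and other tensors pass through unchanged.
    return name, value
```

With this shim applied while iterating over the checkpoint's tensors, the downstream conversion code only ever sees ModelOpt-style names and scale conventions.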

@loci-review

loci-review bot commented Mar 30, 2026

No meaningful performance changes were detected across 123908 analyzed functions in the following binaries: build.bin.libllama.so, build.bin.llama-tts, build.bin.llama-cvector-generator, build.bin.llama-bench, build.bin.libmtmd.so, build.bin.libggml.so, build.bin.libggml-cpu.so, build.bin.libggml-base.so, build.bin.llama-gemma3-cli, build.bin.llama-gguf-split, build.bin.llama-llava-cli, build.bin.llama-minicpmv-cli, build.bin.llama-quantize, build.bin.llama-qwen2vl-cli, build.bin.llama-tokenize.

🔎 Full breakdown: Loci Inspector
💬 Questions? Tag @loci-dev

@loci-dev loci-dev force-pushed the main branch 10 times, most recently from fd3ce9d to 1770118 Compare April 6, 2026 02:18
@loci-dev loci-dev force-pushed the main branch 8 times, most recently from 385b1fc to 06d9e10 Compare April 13, 2026 02:18
@loci-dev loci-dev force-pushed the main branch 8 times, most recently from 7638ab4 to f1b46d5 Compare April 20, 2026 02:19
