Description
Hello, first of all, I would like to express my gratitude for your tremendous contributions to the field of LLMs, which have driven significant advances in the TensorRT-LLM framework. I am a computer science student using TensorRT-LLM for the first time, and I have a question about how to convert FP16 or FP8 model files to NVFP4 model files.
First, here is what I have read and set up so far. I learned that the conversion can be done with NVIDIA Model Optimizer (https://github.com/NVIDIA/Model-Optimizer) or with the TensorRT-LLM framework (container images at https://catalog.ngc.nvidia.com/orgs/nvidia/teams/tensorrt-llm/containers/release/tags?version=1.2.0rc2.post1). For the basic environment, I have already created a container with the following command:
```bash
docker run -it --name tensor-llm-alanchen --ipc host --gpus all \
    --ulimit memlock=-1 --ulimit stack=67108864 -p 8000:8000 \
    -v /data2:/data2 -v /data3:/data3 -v /data4:/data4 \
    nvcr.io/nvidia/tensorrt-llm/release:1.2.0rc4
```
Could you please tell me the specific steps to convert FP16 or FP8 model files to NVFP4 model files? Alternatively, is there an easy-to-follow tutorial I could refer to?
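To make my question more concrete, here is my current guess at the workflow, based on the quantization example shipped in the TensorRT-LLM repository. This is only a sketch under my own assumptions: the script path `examples/quantization/quantize.py`, the `--qformat nvfp4` value, and the model and output paths are things I have assumed, and the exact flags may differ between versions.

```bash
# Inside the release container: quantize a Hugging Face FP16/BF16 checkpoint
# to NVFP4 with the quantization example bundled with TensorRT-LLM.
# NOTE: the script path, flag names, and the model/output paths below are
# my assumptions; please correct me if they are wrong for 1.2.0rc4.
python examples/quantization/quantize.py \
    --model_dir /data2/models/Llama-3.1-8B-Instruct \
    --dtype bfloat16 \
    --qformat nvfp4 \
    --output_dir /data2/models/Llama-3.1-8B-Instruct-nvfp4 \
    --calib_size 512
```

My understanding is that this runs post-training calibration on a small dataset and writes an NVFP4 checkpoint to `--output_dir`. Is something like this the intended approach, and does the same path also cover starting from an FP8 checkpoint, or does that case require Model Optimizer directly?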