How to convert FP16 or FP8 model files to NVFP4 model files #770

Description

@Alan-D-Chen

Hello! First of all, I would like to express my gratitude for your tremendous contributions to the field of LLMs, which have driven significant advances in the TensorRT-LLM framework. I am a computer science student using the TensorRT-LLM framework for the first time, and I have a question about how to convert FP16 or FP8 model files to NVFP4 model files.
First, here is what I have already done and read. For the basic environment, I created a container with the command below. (I learned that the conversion can be accomplished either with https://github.com/NVIDIA/Model-Optimizer or with the TensorRT-LLM framework; I pulled the image from https://catalog.ngc.nvidia.com/orgs/nvidia/teams/tensorrt-llm/containers/release/tags?version=1.2.0rc2.post1.)

docker run -it --name tensor-llm-alanchen --ipc host --gpus all --ulimit memlock=-1 --ulimit stack=67108864 -p 8000:8000 -v /data2:/data2 -v /data3:/data3 -v /data4:/data4 nvcr.io/nvidia/tensorrt-llm/release:1.2.0rc4
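After starting the container, I attach to it to run the conversion tooling. My understanding (please correct me if this is wrong) is that the release image already bundles TensorRT Model Optimizer as the nvidia-modelopt Python package, and that the PTQ example scripts can be fetched from the Model Optimizer repository:

# Attach to the running container (name from the docker run command above).
docker exec -it tensor-llm-alanchen bash

# Inside the container: upgrade Model Optimizer if needed and fetch the
# example scripts. The package name and repository URL are my best
# understanding and may need checking against the official documentation.
pip install -U "nvidia-modelopt[all]"
git clone https://github.com/NVIDIA/TensorRT-Model-Optimizer.git
cd TensorRT-Model-Optimizer/examples/llm_ptq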

Could you please tell me the specific steps to convert FP16 or FP8 model files to NVFP4 model files? Alternatively, are there any easy-to-read tutorials that I can refer to?
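For reference, here is my current guess, based on the llm_ptq example in the Model Optimizer repository. I believe the conversion is a post-training quantization run along these lines (the script name and flag names are what I found for recent releases and may differ between versions, so this is only a sketch):

# Run from TensorRT-Model-Optimizer/examples/llm_ptq. <hf_model_dir> is the
# original FP16 (or FP8) Hugging Face checkpoint; the script calibrates on a
# small dataset and writes an NVFP4-quantized checkpoint to --export_path.
python hf_ptq.py \
    --pyt_ckpt_path <hf_model_dir> \
    --qformat nvfp4 \
    --export_path <nvfp4_output_dir> \
    --calib_size 512

I also saw that Model Optimizer appears to expose a Python API for the same thing (modelopt.torch.quantization.quantize with an NVFP4 configuration), and that the exported checkpoint should be loadable directly by TensorRT-LLM, e.g. with trtllm-serve, provided the GPU supports NVFP4 (Blackwell generation). Is this understanding correct, and are these the intended steps?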

Labels: question