Description
Hello, first of all, I would like to express my gratitude for your tremendous contributions to the field of LLMs, which have driven significant advances in the TensorRT-LLM framework. I am a computer science student using TensorRT-LLM for the first time, and I have a question about how to convert FP16 or FP8 model files to NVFP4 model files.
First, here is what I have read and set up so far. I learned that the conversion can be done with NVIDIA Model Optimizer (https://github.com/NVIDIA/Model-Optimizer) or with the TensorRT-LLM framework (container images at https://catalog.ngc.nvidia.com/orgs/nvidia/teams/tensorrt-llm/containers/release/tags?version=1.2.0rc2.post1). For the basic environment, I have already created a container with the following command:
```bash
docker run -it --name tensor-llm-alanchen --ipc host --gpus all \
    --ulimit memlock=-1 --ulimit stack=67108864 -p 8000:8000 \
    -v /data2:/data2 -v /data3:/data3 -v /data4:/data4 \
    nvcr.io/nvidia/tensorrt-llm/release:1.2.0rc4
```
Could you please tell me the specific steps to convert FP16 or FP8 model files to NVFP4 model files? Alternatively, is there an easy-to-follow tutorial I could refer to?
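To make my question more concrete, here is my current guess at the workflow, based on the quantization example shipped in the TensorRT-LLM repository. This is only a sketch under my own assumptions: the script path `examples/quantization/quantize.py`, the `--qformat nvfp4` value, and the model and output paths are things I have assumed, and the exact flags may differ between versions.

```bash
# Inside the release container: quantize a Hugging Face FP16/BF16 checkpoint
# to NVFP4 with the quantization example bundled with TensorRT-LLM.
# NOTE: the script path, flag names, and the model/output paths below are
# my assumptions; please correct me if they are wrong for 1.2.0rc4.
python examples/quantization/quantize.py \
    --model_dir /data2/models/Llama-3.1-8B-Instruct \
    --dtype bfloat16 \
    --qformat nvfp4 \
    --output_dir /data2/models/Llama-3.1-8B-Instruct-nvfp4 \
    --calib_size 512
```

My understanding is that this runs post-training calibration on a small dataset and writes an NVFP4 checkpoint to `--output_dir`. Is something like this the intended approach, and does the same path also cover starting from an FP8 checkpoint, or does that case require Model Optimizer directly?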