Description
System Info
- GPU: NVIDIA GeForce RTX 5090 Blackwell (SM 120)
- CPU: Intel i9-14900K Raptor Lake-S Refresh (32) @ 5.700GHz
- RAM: TeamGroup Delta RGB 64GB DDR5-7600 Dual Channel
- OS: Ubuntu 24.04.3 LTS x86_64
- Kernel: 6.14.0-37-generic
- Docker Image: nvcr.io/nvidia/tensorrt-llm/release:1.3.0rc4
- Python Version: 3.12.3
- CUDA Version: 13.1.80
Who can help?
@2ez4bz
@yuanjingx87
@karljang
@greg-kwasniewski1
@Wanli-Jiang
Information
- The official example scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
Reproduction
- Start the official 1.3.0rc4 container
CUSTOM_SCRATCH="/custom_scratch/trt-llm"
docker run -it --name trtllm-debug \
-u $(id -u):$(id -g) \
-v "$CUSTOM_SCRATCH":/mnt/scratch \
-v /NVME2T/models:/models \
-v /WorkSpaces/TensorRT-LLM:/workspace \
--env "TLLM_DEBUG_MODE=1" \
--env "CCACHE_DIR=/mnt/scratch/ccache" \
--env "CCACHE_CPP2=1" \
--env "CCACHE_MAXSIZE=40G" \
--env "CCACHE_COMPILERCHECK=content" \
--env "CCACHE_SLOPPINESS=time_macros,include_file_mtime,file_macro,system_headers,pch_defines" \
--env "CCACHE_COMPRESS=true" \
--env "CCACHE_DIRECT=true" \
--env "CCACHE_NOHASHDIR=true" \
--env "TMPDIR=/mnt/scratch/pip_build" \
--env "PYTHONPYCACHEPREFIX=/mnt/scratch/pycache" \
--env "JOBLIB_TEMP_FOLDER=/mnt/scratch/joblib" \
--env "TRTLLM_NVCC_FLAGS=$TRTLLM_NVCC_FLAGS" \
--env "MAX_JOBS=1" \
--env "NVCC_THREADS=8" \
--env "CMAKE_GENERATOR=Ninja" \
--env "NINJAFLAGS=-j 1 -l 8 -k 1" \
--env "NINJA_STATUS=[%f/%t%p|%w] " \
--ipc=host --ulimit memlock=-1 --ulimit stack=67108864 \
--gpus all --shm-size=16g \
nvcr.io/nvidia/tensorrt-llm/release:1.3.0rc4 \
bash -c "
audit() { python3 /workspace/1.3.0rc4_audit.py \"\$@\"; }; \
export -f audit; \
[ -f /workspace/after_run.sh ] && . /workspace/after_run.sh; \
exec bash"
#!/bin/bash
# after_run.sh
shopt -s expand_aliases
du -sh "/models/Qwen3-VL-8B-Instruct" "/models/Qwen3-VL-8B-NVFP4-Unified"
ls -lh /models/Qwen3-VL-8B-NVFP4-Unified
file /models/Qwen3-VL-8B-NVFP4-Unified/*.safetensors && \
python3 -c "from safetensors import safe_open; import sys,os; p='/models/Qwen3-VL-8B-NVFP4-Unified'; print(sorted(os.listdir(p))[:10])"
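Before attempting conversion, it can help to confirm which model_type the checkpoint's config.json actually declares, since that is the string the converter validates. A minimal sketch (it writes a throwaway config.json as a stand-in for /models/Qwen3-VL-8B-NVFP4-Unified/config.json, which is assumed to report qwen3_vl):

```python
import json
import os
import tempfile

# Stand-in for the real checkpoint directory; the actual repro would read
# /models/Qwen3-VL-8B-NVFP4-Unified/config.json directly.
ckpt_dir = tempfile.mkdtemp()
with open(os.path.join(ckpt_dir, "config.json"), "w") as f:
    json.dump({"model_type": "qwen3_vl"}, f)

# This is the field the convert script's type check is driven by.
with open(os.path.join(ckpt_dir, "config.json")) as f:
    model_type = json.load(f)["model_type"]
print(model_type)
```

If this prints qwen3_vl, the checkpoint itself is consistent with the error below, and the rejection comes from the converter's allow-list rather than a malformed config.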
export TLLM_DEBUG_MODE=1
# Try to convert from HF checkpoint to TRT-LLM format
cd /app/tensorrt_llm/examples/models/core/qwen/
python3 convert_checkpoint.py \
--model_dir /models/Qwen3-VL-8B-NVFP4-Unified \
--output_dir /models/Qwen3-VL-Converted \
--dtype float16
Expected behavior
The script should recognize qwen3_vl as a valid model type (consistent with the release notes claiming Qwen3-VL support) and convert the Hugging Face checkpoint into TensorRT-LLM format, populating /models/Qwen3-VL-Converted with the converted weights and config.json without raising an AssertionError.
Actual behavior
AssertionError: Unsupported Qwen type: qwen3_vl, only ('qwen', 'qwen2', 'qwen2_moe', 'qwen2_llava_onevision', 'qwen2_vl', 'qwen2_audio', 'qwen3', 'qwen3_moe') are acceptable.
Additional notes
The release notes for v1.3.0rc4 explicitly state:
"Add EPD disagg support for Qwen3 VL MoE (#10962)"
However, converting a standard Qwen3-VL-8B checkpoint fails because the Python front end rejects the model type qwen3_vl. It appears the valid_types list in config.py was not updated to match the C++ back end's capabilities.
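The guard that raises the error presumably looks something like the sketch below, reconstructed from the error message alone (check_qwen_type and its call sites are illustrative, not the actual config.py code). It shows why qwen3 passes while qwen3_vl is rejected:

```python
# Allow-list copied verbatim from the AssertionError text above.
valid_types = ('qwen', 'qwen2', 'qwen2_moe', 'qwen2_llava_onevision',
               'qwen2_vl', 'qwen2_audio', 'qwen3', 'qwen3_moe')

def check_qwen_type(qwen_type: str) -> None:
    # Hypothetical stand-in for the validation in config.py.
    assert qwen_type in valid_types, (
        f"Unsupported Qwen type: {qwen_type}, "
        f"only {valid_types} are acceptable.")

check_qwen_type('qwen3')  # accepted: 'qwen3' is in the allow-list

msg = None
try:
    check_qwen_type('qwen3_vl')  # rejected: 'qwen3_vl' is missing
except AssertionError as e:
    msg = str(e)
    print(msg)
```

Adding 'qwen3_vl' (and presumably 'qwen3_vl_moe') to that tuple would be the obvious front-end fix, assuming the back end genuinely supports the architecture as the release notes suggest.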