
[Bug] v1.3.0rc4 claims Qwen3-VL support but convert_checkpoint.py throws AssertionError #11569

@CristyNel

Description


System Info

  • GPU: NVIDIA GeForce RTX 5090 Blackwell (SM 120)
  • CPU: Intel i9-14900K Raptor Lake-S Refresh (32) @ 5.700GHz
  • RAM: TeamGroup Delta RGB 64GB DDR5-7600 7600MHz Dual Channel
  • OS: Ubuntu 24.04.3 LTS x86_64
  • Kernel: 6.14.0-37-generic
  • Docker Image: nvcr.io/nvidia/tensorrt-llm/release:1.3.0rc4
  • Python Version: 3.12.3
  • CUDA Version: 13.1.80

Who can help?

@2ez4bz
@yuanjingx87
@karljang
@greg-kwasniewski1
@Wanli-Jiang

Information

  • The official example scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)

Reproduction

  1. Start the official 1.3.0rc4 container
CUSTOM_SCRATCH="/custom_scratch/trt-llm"
docker run -it --name trtllm-debug \
    -u $(id -u):$(id -g) \
    -v "$CUSTOM_SCRATCH":/mnt/scratch \
    -v /NVME2T/models:/models \
    -v /WorkSpaces/TensorRT-LLM:/workspace \
    --env "TLLM_DEBUG_MODE=1" \
    --env "CCACHE_DIR=/mnt/scratch/ccache" \
    --env "CCACHE_CPP2=1" \
    --env "CCACHE_MAXSIZE=40G" \
    --env "CCACHE_COMPILERCHECK=content" \
    --env "CCACHE_SLOPPINESS=time_macros,include_file_mtime,file_macro,system_headers,pch_defines" \
    --env "CCACHE_COMPRESS=true" \
    --env "CCACHE_DIRECT=true" \
    --env "CCACHE_NOHASHDIR=true" \
    --env "TMPDIR=/mnt/scratch/pip_build" \
    --env "PYTHONPYCACHEPREFIX=/mnt/scratch/pycache" \
    --env "JOBLIB_TEMP_FOLDER=/mnt/scratch/joblib" \
    --env "TRTLLM_NVCC_FLAGS=$TRTLLM_NVCC_FLAGS" \
    --env "MAX_JOBS=1" \
    --env "NVCC_THREADS=8" \
    --env "CMAKE_GENERATOR=Ninja" \
    --env "NINJAFLAGS=-j 1 -l 8 -k 1" \
    --env "NINJA_STATUS=[%f/%t%p|%w] " \
    --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 \
    --gpus all --shm-size=16g \
    nvcr.io/nvidia/tensorrt-llm/release:1.3.0rc4 \
    bash -c "\
        audit() { python3 /workspace/1.3.0rc4_audit.py \"\$@\"; }; \
        export -f audit; \
        [ -f /workspace/after_run.sh ] && . /workspace/after_run.sh; \
        exec bash"

#!/bin/bash
# after_run.sh
shopt -s expand_aliases

du -sh "/models/Qwen3-VL-8B-Instruct" "/models/Qwen3-VL-8B-NVFP4-Unified"
ls -lh /models/Qwen3-VL-8B-NVFP4-Unified
file /models/Qwen3-VL-8B-NVFP4-Unified/*.safetensors && \
python3 -c "import os; p='/models/Qwen3-VL-8B-NVFP4-Unified'; print(sorted(os.listdir(p))[:10])"
export TLLM_DEBUG_MODE=1

# Try to convert from HF checkpoint to TRT-LLM format

cd /app/tensorrt_llm/examples/models/core/qwen/
python3 convert_checkpoint.py \
  --model_dir /models/Qwen3-VL-8B-NVFP4-Unified \
  --output_dir /models/Qwen3-VL-Converted \
  --dtype float16
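
Before the conversion step, it can help to confirm what model type the checkpoint actually declares, since that is the value the front end validates. A minimal sketch that reads the model_type field from a Hugging Face config.json; because the /models path above only exists inside the container, it demonstrates against a mock checkpoint directory:

```python
import json
import tempfile
from pathlib import Path

def read_model_type(ckpt_dir: Path) -> str:
    """Return the model_type declared in a Hugging Face config.json."""
    with open(ckpt_dir / "config.json") as f:
        return json.load(f)["model_type"]

# Demonstrate against a mock checkpoint directory; in the container you
# would pass Path("/models/Qwen3-VL-8B-NVFP4-Unified") instead.
with tempfile.TemporaryDirectory() as d:
    (Path(d) / "config.json").write_text(json.dumps({"model_type": "qwen3_vl"}))
    print(read_model_type(Path(d)))  # qwen3_vl
```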

Expected behavior

The script should recognize qwen3_vl as a valid model type (consistent with the release notes claiming Qwen3-VL support) and convert the Hugging Face checkpoint into TensorRT-LLM format, populating /models/Qwen3-VL-Converted with the converted weights and config.json, without raising an AssertionError.

Actual behavior

AssertionError: Unsupported Qwen type: qwen3_vl, only ('qwen', 'qwen2', 'qwen2_moe', 'qwen2_llava_onevision', 'qwen2_vl', 'qwen2_audio', 'qwen3', 'qwen3_moe') are acceptable.

Additional notes

The release notes for v1.3.0rc4 explicitly state:

"Add EPD disagg support for Qwen3 VL MoE (#10962)"

However, converting a standard Qwen3-VL-8B checkpoint fails because the Python front end explicitly rejects the model type qwen3_vl. It appears the valid_types list in config.py was not updated to match the C++ back-end capabilities.
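
The failing check is easy to reproduce in isolation. The sketch below mirrors the validation implied by the traceback, with the accepted-types tuple copied verbatim from the error message; the function and variable names here are assumptions for illustration, not the actual names used in config.py:

```python
# Accepted Qwen variants, copied from the AssertionError message above.
VALID_QWEN_TYPES = ('qwen', 'qwen2', 'qwen2_moe', 'qwen2_llava_onevision',
                    'qwen2_vl', 'qwen2_audio', 'qwen3', 'qwen3_moe')

def check_qwen_type(model_type: str) -> None:
    # Mirrors the front-end validation that convert_checkpoint.py applies.
    assert model_type in VALID_QWEN_TYPES, (
        f"Unsupported Qwen type: {model_type}, "
        f"only {VALID_QWEN_TYPES} are acceptable.")

check_qwen_type("qwen3")  # accepted
try:
    check_qwen_type("qwen3_vl")  # rejected: reproduces the reported error
except AssertionError as err:
    print(err)
```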

Metadata


    Labels

  • Model customization<NV>: Adding support for new model architectures or variants
  • bug: Something isn't working
