
[None][feat] EXAONE-4.5 Support #12873

Open
yechank-nvidia wants to merge 12 commits into NVIDIA:main from yechank-nvidia:exaone_4_5

Conversation

@yechank-nvidia (Collaborator) commented Apr 9, 2026

This PR adds Day-0 support for LG AI's new VLM, EXAONE-4.5.

It includes both text-only and multimodal support.

**ISL 1000, OSL 1000, IMG (512,512), BF16, H200**

| Metric | C8 | C16 | C32 | C64 | C128 |
| --- | --- | --- | --- | --- | --- |
| Output Token Throughput Per User (tokens/sec/user) | 130.85 | 116.83 | 96.95 | 82.09 | 62.76 |
| Output Token Throughput (tokens/sec) | 988.6 | 1739.74 | 2838.79 | 4560.93 | 6427.07 |

**ISL 1000, OSL 1000, IMG (512,512), BF16, B200**

| Metric | C8 | C16 | C32 | C64 | C128 |
| --- | --- | --- | --- | --- | --- |
| Output Token Throughput Per User (tokens/sec/user) | 194.82 | 176.8 | 148.64 | 123.08 | 89.24 |
| Output Token Throughput (tokens/sec) | 1451.54 | 2580.11 | 4211.01 | 6443.06 | 8644.52 |

Prerequisite

```bash
pip install git+https://github.com/nuxlear/transformers.git@add-exaone4_5
```

Sample command

```bash
# Server run
trtllm-serve LGAI-EXAONE/EXAONE-4.5-33B --tp_size 2 --port 8000 --reasoning_parser qwen3

# Quickstart
python3 examples/llm-api/quickstart_multimodal.py --model_dir LGAI-EXAONE/EXAONE-4.5-33B
```

Summary by CodeRabbit

Release Notes

  • New Features

    • Added support for EXAONE-4.5 multimodal vision-language model
  • Documentation

    • Expanded EXAONE model documentation with EXAONE-4.5 setup and running instructions, including multimodal feature support matrix
  • Updates

    • Updated Transformers library dependency to version 5.3.0

Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
@coderabbitai coderabbitai bot (Contributor) commented Apr 9, 2026

📝 Walkthrough

The PR adds EXAONE-4.5 multimodal VLM support with new model implementations and weight mappers. Vision models are refactored to derive dtype/device directly from tensors instead of transformers utilities. Qwen vision models undergo significant RoPE computation and positional embedding pipeline updates. Model loader APIs are updated for kosmos-2, and test infrastructure gains skip mechanisms for conditional test execution.

Changes

- **EXAONE-4.5 Model Support** (`examples/models/core/exaone/README.md`, `tensorrt_llm/_torch/models/__init__.py`, `tensorrt_llm/_torch/models/modeling_exaone4_5.py`, `tensorrt_llm/_torch/models/checkpoints/hf/exaone4_5_weight_mapper.py`, `tests/unittest/_torch/modeling/test_modeling_exaone4_5.py`): New multimodal VLM implementation with config, input processor, vision model, weight mapper, and comprehensive test coverage. README updated with EXAONE-4.5 documentation and support matrix.
- **Vision Model dtype/device Refactoring** (`tensorrt_llm/_torch/models/modeling_clip.py`, `tensorrt_llm/_torch/models/modeling_siglip.py`, `tensorrt_llm/_torch/visual_gen/models/wan/transformer_wan.py`): Replaced dependency on `transformers.modeling_utils` helpers with direct tensor access from embedding weights, adding explicit return type annotations.
- **Qwen Vision Model RoPE & Attention Updates** (`tensorrt_llm/_torch/models/modeling_qwen2vl.py`): Major refactor: added a vocab-size helper, replaced fixed Q/K/V splitting with asymmetric GQA support, introduced a FlashInfer RoPE fast path, refactored vision attention to use position_ids/embeddings parameters, and updated positional rope computation with windowed caching.
- **Qwen3VL Positional Embedding Pipeline** (`tensorrt_llm/_torch/models/modeling_qwen3vl.py`): Replaced the HF vision rotary embedding with TensorRT-LLM RotaryEmbedding, added a native bilinear interpolation helper, introduced device/dtype properties, and refactored rope computation with cached position ID generation.
- **Model Loader API Updates** (`tensorrt_llm/models/gpt/convert.py`, `tensorrt_llm/tools/multimodal_builder.py`): Updated kosmos-2 loading to use `AutoModelForImageTextToText` instead of the deprecated `AutoModelForVision2Seq`.
- **Server & Utility Model-Type Resolution** (`tensorrt_llm/serve/chat_utils.py`, `tensorrt_llm/serve/openai_server.py`): Changed the model-type source from an instance attribute to the class attribute (`type(model_config).model_type`) for multimodal registry and placeholder decisions.
- **Configuration & Dependency Updates** (`requirements.txt`, `tensorrt_llm/_torch/models/modeling_exaone_moe.py`, `tensorrt_llm/_torch/models/modeling_llama.py`): Updated transformers from 4.57.3 to 5.3.0, added `exist_ok=True` to ExaoneMoE config registration, and updated the `load_sharded_checkpoint` import path.
- **Test Framework Enhancements** (`tests/unittest/_torch/modeling/test_modeling_multimodal.py`): Added `skip_test` and `skip_test_reason` properties to the base multimodal test class, enabling conditional test skipping.
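Per the last item, a property-based skip mechanism is typically a pair of overridable properties checked in `setUp`. A minimal stand-alone sketch (the class and property names here are assumptions for illustration, not the PR's actual code):

```python
import unittest


class MultimodalModelTestBase(unittest.TestCase):
    """Hypothetical base class with conditional test skipping."""

    @property
    def skip_test(self) -> bool:
        # Subclasses override this, e.g. to check that local
        # model weights actually exist on disk.
        return False

    @property
    def skip_test_reason(self) -> str:
        return "precondition for this model test is not met"

    def setUp(self):
        if self.skip_test:
            self.skipTest(self.skip_test_reason)


class _AlwaysSkipped(MultimodalModelTestBase):
    @property
    def skip_test(self) -> bool:
        return True

    def test_dummy(self):
        self.fail("should never run")


result = unittest.TextTestRunner(verbosity=0).run(
    unittest.defaultTestLoader.loadTestsFromTestCase(_AlwaysSkipped))
```

Because `skipTest` is raised in `setUp`, the test is recorded as skipped rather than failed, which keeps CI green when optional model weights are absent.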

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client as Client
    participant InputProc as Input Processor
    participant VisionEnc as Vision Encoder
    participant TextEmbed as Text Embedding
    participant LLM as Language Model
    participant Output

    Client->>InputProc: Text + Images
    InputProc->>InputProc: Preprocess text & images
    InputProc->>InputProc: Fuse multimodal placeholders
    InputProc->>VisionEnc: pixel_values + grid_thw
    VisionEnc->>VisionEnc: Compute windowed RoPE (cos, sin)
    VisionEnc->>VisionEnc: Apply vision attention with position_ids
    VisionEnc-->>InputProc: Vision embeddings
    InputProc->>TextEmbed: Fused input_ids + multimodal_data
    TextEmbed->>TextEmbed: Embed text tokens
    TextEmbed->>LLM: Fused embeddings (text + vision)
    LLM->>LLM: Causal language modeling
    LLM-->>Output: Logits / Tokens
```
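The fusion step in the diagram — scattering vision-encoder rows into the placeholder positions of the text embedding sequence — can be illustrated with a tiny pure-Python sketch (token id, sequence, and dimensions are invented for illustration):

```python
IMG_PLACEHOLDER = 9  # hypothetical image-placeholder token id

input_ids = [1, 5, IMG_PLACEHOLDER, IMG_PLACEHOLDER, 7]
text_embeds = [[0.0] * 4 for _ in input_ids]   # stand-in text embeddings
vision_embeds = [[1.0] * 4, [1.0] * 4]         # stand-in vision encoder output

fused = [row[:] for row in text_embeds]
vision_rows = iter(vision_embeds)
for i, tok in enumerate(input_ids):
    if tok == IMG_PLACEHOLDER:
        # scatter the next vision row into this placeholder slot
        fused[i] = next(vision_rows)

print([row[0] for row in fused])  # [0.0, 0.0, 1.0, 1.0, 0.0]
```

The real model does the same thing with tensors: placeholder token positions are overwritten by vision embeddings before the fused sequence enters the language model.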

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 19.70%, below the required 80.00% threshold. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Description check | ❓ Inconclusive | The PR description lacks the structure and detail required by the repository template. | Provide a structured description with sections for Description (explaining what and why) and Test Coverage (listing relevant tests), and complete the PR Checklist before submission. |

✅ Passed checks (1 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly identifies the main change (adding support for EXAONE-4.5, a new VLM) and follows the required format with ticket ID, type, and concise summary. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


@coderabbitai coderabbitai bot (Contributor) left a comment

Actionable comments posted: 7

🧹 Nitpick comments (2)
tensorrt_llm/_torch/models/checkpoints/hf/exaone4_5_weight_mapper.py (1)

19-19: Missing return type annotation.

The preprocess_weights method is missing a return type hint. Per coding guidelines, functions should be annotated with type hints.

✏️ Proposed fix

```diff
-    def preprocess_weights(self, weights: dict):
+    def preprocess_weights(self, weights: dict) -> dict:
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tensorrt_llm/_torch/models/checkpoints/hf/exaone4_5_weight_mapper.py` at line
19, The method preprocess_weights(self, weights: dict) is missing a return type
annotation; update its signature to include an explicit return type such as ->
Dict[str, Any] (or Mapping[str, Any] if preferred) and ensure you import the
corresponding typing symbols (Dict and Any or Mapping) at the top of the module
so the signature reads e.g. def preprocess_weights(self, weights: Dict[str,
Any]) -> Dict[str, Any]: while keeping the existing behavior in the
preprocess_weights implementation.
tests/unittest/_torch/modeling/test_modeling_exaone4_5.py (1)

183-186: Hardcoded local path for test weights.

The test config contains a hardcoded developer-specific path (/code/yechan-models/exaone45_beta_2026-03-19_bf16). While the skip_test property handles missing paths gracefully, consider using an environment variable (e.g., EXAONE_4_5_MODEL_PATH) for configurability:

♻️ Suggested improvement

```diff
+import os
+
+_EXAONE_4_5_DEFAULT_PATH = "/code/yechan-models/exaone45_beta_2026-03-19_bf16"
+
 EXAONE_4_5_TEST_CONFIG = {
     # ... other config ...
-    "_name_or_path": str(
-        os.path.join("/code/yechan-models", "exaone45_beta_2026-03-19_bf16")
-    ),  # str(os.path.join(llm_models_root(), "Qwen2.5-VL-7B-Instruct"))
+    "_name_or_path": os.environ.get("EXAONE_4_5_MODEL_PATH", _EXAONE_4_5_DEFAULT_PATH),
 }
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unittest/_torch/modeling/test_modeling_exaone4_5.py` around lines 183 -
186, Replace the hardcoded developer path in the test config for "_name_or_path"
with an environment-configurable value: read
os.environ.get("EXAONE_4_5_MODEL_PATH") and fall back to the existing
os.path.join("/code/yechan-models", "exaone45_beta_2026-03-19_bf16") if the env
var is not set; keep the str(...) cast and preserve the existing skip_test
behavior that already handles missing paths. Update the assignment where
"_name_or_path" is set in
tests/unittest/_torch/modeling/test_modeling_exaone4_5.py to use this env var
fallback.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: a4fc1689-50d7-46db-9958-52340c2e4a19

📥 Commits

Reviewing files that changed from the base of the PR and between b4a4ce0 and d1c8f67.

📒 Files selected for processing (18)
  • examples/models/core/exaone/README.md
  • requirements.txt
  • tensorrt_llm/_torch/models/__init__.py
  • tensorrt_llm/_torch/models/checkpoints/hf/exaone4_5_weight_mapper.py
  • tensorrt_llm/_torch/models/modeling_clip.py
  • tensorrt_llm/_torch/models/modeling_exaone4_5.py
  • tensorrt_llm/_torch/models/modeling_exaone_moe.py
  • tensorrt_llm/_torch/models/modeling_llama.py
  • tensorrt_llm/_torch/models/modeling_qwen2vl.py
  • tensorrt_llm/_torch/models/modeling_qwen3vl.py
  • tensorrt_llm/_torch/models/modeling_siglip.py
  • tensorrt_llm/_torch/visual_gen/models/wan/transformer_wan.py
  • tensorrt_llm/models/gpt/convert.py
  • tensorrt_llm/serve/chat_utils.py
  • tensorrt_llm/serve/openai_server.py
  • tensorrt_llm/tools/multimodal_builder.py
  • tests/unittest/_torch/modeling/test_modeling_exaone4_5.py
  • tests/unittest/_torch/modeling/test_modeling_multimodal.py

Comment on lines +102 to +106
```bash
export HF_MODEL_DIR=hf_models/exaone4_5
git clone https://huggingface.co/LGAI-EXAONE/<TODO: FILL> $HF_MODEL_DIR
```


⚠️ Potential issue | 🟡 Minor

Unresolved TODO placeholders in documentation.

The EXAONE-4.5 section contains placeholders that need to be filled:

  • Line 104: <TODO: FILL> in the git clone URL
  • Lines 143-145: TODO: FILL for expected output

Please update these with the actual HuggingFace model repository name and expected output before merging.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/models/core/exaone/README.md` around lines 102 - 106, Replace the
unresolved TODOs in the EXAONE-4.5 README: set the actual HuggingFace repo name
in the git clone command (replace `<TODO: FILL>` in the git clone URL that
follows HF_MODEL_DIR) and update the example's expected output section (the
lines labeled `TODO: FILL` around the expected output) with the real output
produced by the model so the README contains the correct repository path and
sample result.

Comment on lines +175 to +189
```python
multimodal_params = kwargs.get("multimodal_params", [])
mm_embeds = []

if len(multimodal_params) > 0:
    if not _is_disagg():
        mm_embeds = get_multimodal_embeddings(
            encoder_forward_fn=self.mm_encoder.forward,
            multimodal_params=multimodal_params,
        )
    else:
        raise NotImplementedError(
            "Exaone4.5-VL does not support disaggregated inference yet. "
            "Unset TLLM_MULTIMODAL_DISAGGREGATED or set it to '0'."
        )
mm_embeds = find_input_mm_embeds(mm_embeds, multimodal_params)
```

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed (repository: NVIDIA/TensorRT-LLM, output length: 1333):

```bash
#!/bin/bash
# Check if Exaone4_5InputProcessor populates multimodal_params for text-only prompts
rg -n "multimodal_data" tensorrt_llm/_torch/models/modeling_exaone4_5.py -A 3 -B 3
```

🏁 Script executed (output length: 193):

```bash
# First, let's see the full Exaone4_5InputProcessor implementation to understand multimodal_params creation
fd "modeling_exaone4_5.py" | xargs wc -l
```

🏁 Script executed (output length: 2460):

```bash
# Get the complete InputProcessor class and how it creates multimodal_params
cat -n tensorrt_llm/_torch/models/modeling_exaone4_5.py | sed -n '60,115p'
```

🏁 Script executed (output length: 1268):

```bash
# Check how forward method at lines 175-189 uses multimodal_params
cat -n tensorrt_llm/_torch/models/modeling_exaone4_5.py | sed -n '170,195p'
```

🏁 Script executed (output length: 1115):

```bash
# Look for the get_multimodal_embeddings function to understand what it does with multimodal_params
rg -n "def get_multimodal_embeddings" tensorrt_llm/_torch/ -A 10
```

🏁 Script executed (output length: 16857):

```bash
# Compare with Qwen2VL's _get_requests_with_mm_data filtering mentioned in the review
rg -n "_get_requests_with_mm_data|get_multimodal_embeddings" tensorrt_llm/_torch/models/ -B 2 -A 5
```

🏁 Script executed (output length: 1907):

```bash
# Check Exaone4_5ForCausalLM class definition and inheritance
cat -n tensorrt_llm/_torch/models/modeling_exaone4_5.py | sed -n '130,170p'
```

🏁 Script executed (output length: 45):

```bash
# Check if Exaone4_5ForCausalLM or its parent has _get_requests_with_mm_data
rg -n "_get_requests_with_mm_data" tensorrt_llm/_torch/models/modeling_exaone4_5.py
```

🏁 Script executed (output length: 221):

```bash
# Check Qwen2VLModelBase to see if it has _get_requests_with_mm_data
rg -n "class Qwen2VLModelBase|def _get_requests_with_mm_data" tensorrt_llm/_torch/models/modeling_qwen2vl.py -A 1
```

🏁 Script executed (output length: 45):

```bash
# Check the full class declaration for Exaone4_5ForCausalLM
rg -n "class Exaone4_5ForCausalLM" tensorrt_llm/_torch/models/modeling_exaone4_5.py -A 5
```

🏁 Script executed (output length: 1140):

```bash
# Get the full _get_requests_with_mm_data method from Qwen2VLModelBase
cat -n tensorrt_llm/_torch/models/modeling_qwen2vl.py | sed -n '1163,1180p'
```

🏁 Script executed (output length: 334):

```bash
# Search for the Exaone4_5ForCausalLM class definition in the file
rg -n "class Exaone4_5" tensorrt_llm/_torch/models/modeling_exaone4_5.py
```

🏁 Script executed (output length: 922):

```bash
# Get full context around line 120 to see the class definition
cat -n tensorrt_llm/_torch/models/modeling_exaone4_5.py | sed -n '115,135p'
```

🏁 Script executed (output length: 3413):

```bash
# Verify that the forward method at lines 166-189 is in Exaone4_5_VLModel
cat -n tensorrt_llm/_torch/models/modeling_exaone4_5.py | sed -n '124,200p'
```


Add filtering of multimodal_params to match Qwen2VLModelBase behavior.

Exaone4_5_VLModel inherits from Qwen2VLModelBase but doesn't use the parent's _get_requests_with_mm_data() method to filter multimodal_params. The parent implementation and Qwen3VLModel both filter to only process entries with actual multimodal data (image/video), because Qwen-VL models include mrope_config entries even for text-only prompts. Without filtering, empty entries are unnecessarily passed to the encoder. Consider calling self._get_requests_with_mm_data(multimodal_params) before processing, similar to the parent class.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tensorrt_llm/_torch/models/modeling_exaone4_5.py` around lines 175 - 189,
Exaone4_5_VLModel is processing all multimodal_params including text-only
entries; call the parent filter to only keep requests with actual multimodal
data before calling the encoder. Modify the block handling multimodal_params to
first call self._get_requests_with_mm_data(multimodal_params) (or assign its
return to a local filtered list) and use that filtered list when deciding to
call get_multimodal_embeddings, then pass the filtered mm_embeds into
find_input_mm_embeds; ensure you still raise NotImplementedError for
disaggregated mode in the same place.
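The suggested filtering amounts to keeping only the entries that actually carry image or video payloads, dropping text-only requests whose multimodal slot holds only auxiliary data such as an mrope config. A minimal stand-alone sketch (real multimodal params are objects with attributes; plain dicts and the key names below are stand-ins for illustration):

```python
def get_requests_with_mm_data(multimodal_params):
    """Keep only params whose multimodal_data holds image/video payloads,
    dropping text-only entries that carry e.g. only an mrope config."""
    return [
        p for p in multimodal_params
        if p.get("multimodal_data")
        and any(k in p["multimodal_data"] for k in ("image", "video"))
    ]


params = [
    {"multimodal_data": {"image": ["<pixels>"]}},
    {"multimodal_data": {"mrope_config": {}}},  # text-only request
    {"multimodal_data": None},
]
filtered = get_requests_with_mm_data(params)
print(len(filtered))  # 1
```

With this filter applied before the encoder call, text-only entries never reach `get_multimodal_embeddings`, matching the parent-class behavior described above.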

Comment on lines +235 to +237
```python
def load_weights(self, weights, weight_mapper: BaseWeightMapper):
    assert isinstance(weight_mapper, Exaone4_5HfWeightMapper)
    weights = weight_mapper.preprocess_weights(weights)
```

⚠️ Potential issue | 🟡 Minor

Avoid assert for runtime type validation in production code.

Using assert for type checking can be bypassed when Python is run with -O (optimizations). Use an explicit check with raise TypeError instead.

🛡️ Proposed fix

```diff
     def load_weights(self, weights, weight_mapper: BaseWeightMapper):
-        assert isinstance(weight_mapper, Exaone4_5HfWeightMapper)
+        if not isinstance(weight_mapper, Exaone4_5HfWeightMapper):
+            raise TypeError(
+                f"Expected Exaone4_5HfWeightMapper, got {type(weight_mapper).__name__}"
+            )
         weights = weight_mapper.preprocess_weights(weights)
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tensorrt_llm/_torch/models/modeling_exaone4_5.py` around lines 235 - 237,
Replace the runtime type check in load_weights to avoid using assert: explicitly
verify that weight_mapper is an instance of Exaone4_5HfWeightMapper and if not
raise a TypeError with a clear message (e.g., indicating expected
Exaone4_5HfWeightMapper but got type(weight_mapper)); then proceed to call
weight_mapper.preprocess_weights(weights). Ensure you reference the load_weights
method and the Exaone4_5HfWeightMapper/BaseWeightMapper types when making the
change.

Comment on lines +914 to +916
```python
def prepare_attn_metadata(self, batch_size: int, seq_lens: List[int],
                          attn_metadata: AttentionMetadata):
    batch_size = len(seq_lens)
```

⚠️ Potential issue | 🟡 Minor

Unused batch_size parameter is immediately shadowed.

Same issue as in modeling_qwen3vl.py: the batch_size parameter is passed but immediately overwritten. This should be consistent with the fix applied to qwen3vl.

🧹 Proposed fix

```diff
-    def prepare_attn_metadata(self, batch_size: int, seq_lens: List[int],
+    def prepare_attn_metadata(self, seq_lens: List[int],
                               attn_metadata: AttentionMetadata):
         batch_size = len(seq_lens)
```

Update call sites at lines 976-980:

```diff
-        full_attn_metadata = self.prepare_attn_metadata(len(grid_rows),
-                                                        seq_lens,
+        full_attn_metadata = self.prepare_attn_metadata(seq_lens,
                                                         self.full_attn_metadata)
-        window_attn_metadata = self.prepare_attn_metadata(
-            len(grid_rows), window_seq_lens, self.window_attn_metadata)
+        window_attn_metadata = self.prepare_attn_metadata(window_seq_lens,
+                                                          self.window_attn_metadata)
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tensorrt_llm/_torch/models/modeling_qwen2vl.py` around lines 914 - 916, The
prepare_attn_metadata method in modeling_qwen2vl.py shadows the incoming
batch_size parameter by reassigning it to len(seq_lens); remove that
reassignment so the passed batch_size is used, mirroring the fix from
modeling_qwen3vl.py; then ensure any callers that currently rely on the old
behavior (e.g., the call sites referenced around lines 976-980) are updated to
pass the correct batch_size value (or compute len(seq_lens) before calling) so
prepare_attn_metadata(batch_size: int, seq_lens: List[int], attn_metadata:
AttentionMetadata) uses its batch_size argument as intended.

Comment on lines +768 to 771
```python
def prepare_attn_metadata(
    self, batch_size: int, seq_lens: List[int], attn_metadata: AttentionMetadata
):
    batch_size = len(seq_lens)
```

⚠️ Potential issue | 🟡 Minor

Unused batch_size parameter is immediately shadowed.

The batch_size parameter is passed to prepare_attn_metadata but immediately overwritten by batch_size = len(seq_lens) on line 771. Either remove the parameter or remove the reassignment.

🧹 Proposed fix (remove parameter)

```diff
     def prepare_attn_metadata(
-        self, batch_size: int, seq_lens: List[int], attn_metadata: AttentionMetadata
+        self, seq_lens: List[int], attn_metadata: AttentionMetadata
     ):
         batch_size = len(seq_lens)
```

Then update the call site at line 789:

```diff
-        attn_metadata = self.prepare_attn_metadata(len(grid_thw), seq_lens, self.attn_metadata)
+        attn_metadata = self.prepare_attn_metadata(seq_lens, self.attn_metadata)
```
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

```diff
 def prepare_attn_metadata(
-    self, batch_size: int, seq_lens: List[int], attn_metadata: AttentionMetadata
+    self, seq_lens: List[int], attn_metadata: AttentionMetadata
 ):
     batch_size = len(seq_lens)
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tensorrt_llm/_torch/models/modeling_qwen3vl.py` around lines 768 - 771, The
prepare_attn_metadata function currently shadows its batch_size parameter with
batch_size = len(seq_lens); remove the parameter from prepare_attn_metadata's
signature and update every call site that passes batch_size (e.g., where
prepare_attn_metadata(...) is invoked) to stop supplying that argument, or
alternatively keep the parameter and delete the reassignment so the passed
batch_size is used; modify the function signature and all references
consistently (function name: prepare_attn_metadata, local variable: seq_lens) so
there is no shadowing or dead parameter.

Comment on lines +289 to +290
```python
mm_data_tracker = MultimodalDataTracker(
    type(model_config).model_type, multimodal_server_config)
```

⚠️ Potential issue | 🟡 Minor

Pipeline failure: formatting enforcement.

The CI indicates a pre-commit formatting failure on this file. Please run yapf or the pre-commit hooks to fix line wrapping around the MultimodalDataTracker initialization and function calls.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tensorrt_llm/serve/chat_utils.py` around lines 289 - 290, The line
initializing MultimodalDataTracker is misformatted causing pre-commit/yapf
failures; reformat the statement that constructs
MultimodalDataTracker(type(model_config).model_type, multimodal_server_config)
to satisfy the project's formatter (wrap arguments across lines or adjust
spacing consistent with other calls), then run the project's pre-commit hooks or
`yapf` to enforce line wrapping for this and any adjacent function calls; ensure
you update the call sites referencing MultimodalDataTracker, model_config, and
multimodal_server_config so they conform to the repository's line-length and
formatting rules.

Comment on lines 305 to +307
```diff
-    model_type = model_config.model_type
     registry_format = MULTIMODAL_PLACEHOLDER_REGISTRY.get_content_format(
-        model_type)
+        type(model_config).model_type)
```

⚠️ Potential issue | 🟡 Minor

Unused variable model_type after refactor.

Line 305 assigns model_type = model_config.model_type, but subsequent code (lines 307, 336, 339, 343) now uses type(model_config).model_type instead. This leaves model_type as dead code.

🧹 Proposed fix to remove unused variable

```diff
-    model_type = model_config.model_type
     registry_format = MULTIMODAL_PLACEHOLDER_REGISTRY.get_content_format(
         type(model_config).model_type)
```
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
model_type = model_config.model_type
registry_format = MULTIMODAL_PLACEHOLDER_REGISTRY.get_content_format(
model_type)
type(model_config).model_type)
registry_format = MULTIMODAL_PLACEHOLDER_REGISTRY.get_content_format(
type(model_config).model_type)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tensorrt_llm/serve/chat_utils.py` around lines 305 - 307, Remove the dead
local assignment "model_type = model_config.model_type" since subsequent calls
(e.g.,
MULTIMODAL_PLACEHOLDER_REGISTRY.get_content_format(type(model_config).model_type))
use type(model_config).model_type instead; delete the unused "model_type"
variable or alternatively replace other uses to reference the local variable
consistently, ensuring references around model_config and
MULTIMODAL_PLACEHOLDER_REGISTRY.get_content_format remain correct.
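The distinction the refactor relies on, `model_config.model_type` versus `type(model_config).model_type`, matters because instance lookup sees per-instance attributes that can shadow the class-level value, while going through `type()` always reads the class attribute. A toy illustration (the class here is a hypothetical stand-in, not the real config object):

```python
class ModelConfig:
    model_type = "exaone4_5"  # class-level attribute


cfg = ModelConfig()
cfg.model_type = "overridden-per-instance"  # instance attribute shadows it

print(cfg.model_type)        # overridden-per-instance
print(type(cfg).model_type)  # exaone4_5
```

Reading through `type(cfg)` bypasses any per-instance shadowing, which is why the registry lookups consistently use that form.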

@yechank-nvidia (Collaborator, Author)

/bot run

@tensorrt-cicd (Collaborator)

PR_Github #42598 [ run ] triggered by Bot. Commit: d1c8f67 Link to invocation

@tensorrt-cicd (Collaborator)

PR_Github #42598 [ run ] completed with state FAILURE. Commit: d1c8f67
/LLM/main/L0_MergeRequest_PR pipeline #33324 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation
