
[None][feat] EXAONE-4.5 Support #12873

Open
yechank-nvidia wants to merge 12 commits into NVIDIA:main from yechank-nvidia:exaone_4_5

Conversation

@yechank-nvidia (Collaborator) commented Apr 9, 2026

This PR adds Day-0 support for LG AI's new VLM, EXAONE-4.5.

It includes both text-only and multimodal support.

**ISL 1000, OSL 1000, IMG (512,512), BF16, H200**

| Metric | C8 | C16 | C32 | C64 | C128 |
| --- | --- | --- | --- | --- | --- |
| Output Token Throughput Per User (tokens/sec/user) | 130.85 | 116.83 | 96.95 | 82.09 | 62.76 |
| Output Token Throughput (tokens/sec) | 988.6 | 1739.74 | 2838.79 | 4560.93 | 6427.07 |

**ISL 1000, OSL 1000, IMG (512,512), BF16, B200**

| Metric | C8 | C16 | C32 | C64 | C128 |
| --- | --- | --- | --- | --- | --- |
| Output Token Throughput Per User (tokens/sec/user) | 194.82 | 176.8 | 148.64 | 123.08 | 89.24 |
| Output Token Throughput (tokens/sec) | 1451.54 | 2580.11 | 4211.01 | 6443.06 | 8644.52 |

Prerequisite

```bash
pip install git+https://github.com/nuxlear/transformers.git@add-exaone4_5
```

Sample command

```bash
# Server run
trtllm-serve LGAI-EXAONE/EXAONE-4.5-33B --tp_size 2 --port 8000 --reasoning_parser qwen3

# Quickstart
python3 examples/llm-api/quickstart_multimodal.py --model_dir LGAI-EXAONE/EXAONE-4.5-33B
```

Summary by CodeRabbit

Release Notes

  • New Features

    • Added support for EXAONE-4.5 multimodal vision-language model
  • Documentation

    • Expanded EXAONE model documentation with EXAONE-4.5 setup and running instructions, including multimodal feature support matrix
  • Updates

    • Updated Transformers library dependency to version 5.3.0

Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
@coderabbitai coderabbitai bot (Contributor) commented Apr 9, 2026

📝 Walkthrough

The PR adds EXAONE-4.5 multimodal VLM support with new model implementations and weight mappers. Vision models are refactored to derive dtype/device directly from tensors instead of transformers utilities. Qwen vision models undergo significant RoPE computation and positional embedding pipeline updates. Model loader APIs are updated for kosmos-2, and test infrastructure gains skip mechanisms for conditional test execution.

Changes

- **EXAONE-4.5 Model Support** (`examples/models/core/exaone/README.md`, `tensorrt_llm/_torch/models/__init__.py`, `tensorrt_llm/_torch/models/modeling_exaone4_5.py`, `tensorrt_llm/_torch/models/checkpoints/hf/exaone4_5_weight_mapper.py`, `tests/unittest/_torch/modeling/test_modeling_exaone4_5.py`): New multimodal VLM implementation with config, input processor, vision model, weight mapper, and comprehensive test coverage. README updated with EXAONE-4.5 documentation and support matrix.
- **Vision Model dtype/device Refactoring** (`tensorrt_llm/_torch/models/modeling_clip.py`, `tensorrt_llm/_torch/models/modeling_siglip.py`, `tensorrt_llm/_torch/visual_gen/models/wan/transformer_wan.py`): Replaced dependency on `transformers.modeling_utils` helpers with direct tensor access from embedding weights, adding explicit return type annotations.
- **Qwen Vision Model RoPE & Attention Updates** (`tensorrt_llm/_torch/models/modeling_qwen2vl.py`): Major refactor: added a vocab-size helper, replaced fixed Q/K/V splitting with asymmetric GQA support, introduced a FlashInfer RoPE fast path, refactored vision attention to use position_ids/embeddings parameters, and updated positional rope computation with windowed caching.
- **Qwen3VL Positional Embedding Pipeline** (`tensorrt_llm/_torch/models/modeling_qwen3vl.py`): Replaced the HF vision rotary embedding with TensorRT-LLM RotaryEmbedding, added a native bilinear interpolation helper, introduced device/dtype properties, and refactored rope computation with cached position ID generation.
- **Model Loader API Updates** (`tensorrt_llm/models/gpt/convert.py`, `tensorrt_llm/tools/multimodal_builder.py`): Updated kosmos-2 loading to use `AutoModelForImageTextToText` instead of the deprecated `AutoModelForVision2Seq`.
- **Server & Utility Model-Type Resolution** (`tensorrt_llm/serve/chat_utils.py`, `tensorrt_llm/serve/openai_server.py`): Changed the model-type source from an instance attribute to the class attribute (`type(model_config).model_type`) for multimodal registry and placeholder decisions.
- **Configuration & Dependency Updates** (`requirements.txt`, `tensorrt_llm/_torch/models/modeling_exaone_moe.py`, `tensorrt_llm/_torch/models/modeling_llama.py`): Updated transformers from 4.57.3 to 5.3.0, added `exist_ok=True` to ExaoneMoE config registration, and updated the `load_sharded_checkpoint` import path.
- **Test Framework Enhancements** (`tests/unittest/_torch/modeling/test_modeling_multimodal.py`): Added `skip_test` and `skip_test_reason` properties to the base multimodal test class, enabling conditional test skipping.
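Per the last item, a property-based skip mechanism is typically a pair of overridable properties checked in `setUp`. A minimal stand-alone sketch (the class and property names here are assumptions for illustration, not the PR's actual code):

```python
import unittest


class MultimodalModelTestBase(unittest.TestCase):
    """Hypothetical base class with conditional test skipping."""

    @property
    def skip_test(self) -> bool:
        # Subclasses override this, e.g. to check that local
        # model weights actually exist on disk.
        return False

    @property
    def skip_test_reason(self) -> str:
        return "precondition for this model test is not met"

    def setUp(self):
        if self.skip_test:
            self.skipTest(self.skip_test_reason)


class _AlwaysSkipped(MultimodalModelTestBase):
    @property
    def skip_test(self) -> bool:
        return True

    def test_dummy(self):
        self.fail("should never run")


result = unittest.TextTestRunner(verbosity=0).run(
    unittest.defaultTestLoader.loadTestsFromTestCase(_AlwaysSkipped))
```

Because `skipTest` is raised in `setUp`, the test is recorded as skipped rather than failed, which keeps CI green when optional model weights are absent.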

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client as Client
    participant InputProc as Input Processor
    participant VisionEnc as Vision Encoder
    participant TextEmbed as Text Embedding
    participant LLM as Language Model
    participant Output

    Client->>InputProc: Text + Images
    InputProc->>InputProc: Preprocess text & images
    InputProc->>InputProc: Fuse multimodal placeholders
    InputProc->>VisionEnc: pixel_values + grid_thw
    VisionEnc->>VisionEnc: Compute windowed RoPE (cos, sin)
    VisionEnc->>VisionEnc: Apply vision attention with position_ids
    VisionEnc-->>InputProc: Vision embeddings
    InputProc->>TextEmbed: Fused input_ids + multimodal_data
    TextEmbed->>TextEmbed: Embed text tokens
    TextEmbed->>LLM: Fused embeddings (text + vision)
    LLM->>LLM: Causal language modeling
    LLM-->>Output: Logits / Tokens
```
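The fusion step in the diagram — scattering vision-encoder rows into the placeholder positions of the text embedding sequence — can be illustrated with a tiny pure-Python sketch (token id, sequence, and dimensions are invented for illustration):

```python
IMG_PLACEHOLDER = 9  # hypothetical image-placeholder token id

input_ids = [1, 5, IMG_PLACEHOLDER, IMG_PLACEHOLDER, 7]
text_embeds = [[0.0] * 4 for _ in input_ids]   # stand-in text embeddings
vision_embeds = [[1.0] * 4, [1.0] * 4]         # stand-in vision encoder output

fused = [row[:] for row in text_embeds]
vision_rows = iter(vision_embeds)
for i, tok in enumerate(input_ids):
    if tok == IMG_PLACEHOLDER:
        # scatter the next vision row into this placeholder slot
        fused[i] = next(vision_rows)

print([row[0] for row in fused])  # [0.0, 0.0, 1.0, 1.0, 0.0]
```

The real model does the same thing with tensors: placeholder token positions are overwritten by vision embeddings before the fused sequence enters the language model.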

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 19.70%, below the required 80.00% threshold. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Description check | ❓ Inconclusive | The PR description lacks the structure and detail required by the repository template. | Provide a structured description with sections for Description (explaining what and why) and Test Coverage (listing relevant tests), and complete the PR Checklist before submission. |

✅ Passed checks (1 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly identifies the main change (adding support for EXAONE-4.5, a new VLM) and follows the required format with ticket ID, type, and concise summary. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


@coderabbitai coderabbitai bot (Contributor) left a comment

Actionable comments posted: 7

🧹 Nitpick comments (2)
tensorrt_llm/_torch/models/checkpoints/hf/exaone4_5_weight_mapper.py (1)

19-19: Missing return type annotation.

The preprocess_weights method is missing a return type hint. Per coding guidelines, functions should be annotated with type hints.

✏️ Proposed fix

```diff
-    def preprocess_weights(self, weights: dict):
+    def preprocess_weights(self, weights: dict) -> dict:
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tensorrt_llm/_torch/models/checkpoints/hf/exaone4_5_weight_mapper.py` at line
19, The method preprocess_weights(self, weights: dict) is missing a return type
annotation; update its signature to include an explicit return type such as ->
Dict[str, Any] (or Mapping[str, Any] if preferred) and ensure you import the
corresponding typing symbols (Dict and Any or Mapping) at the top of the module
so the signature reads e.g. def preprocess_weights(self, weights: Dict[str,
Any]) -> Dict[str, Any]: while keeping the existing behavior in the
preprocess_weights implementation.
tests/unittest/_torch/modeling/test_modeling_exaone4_5.py (1)

183-186: Hardcoded local path for test weights.

The test config contains a hardcoded developer-specific path (/code/yechan-models/exaone45_beta_2026-03-19_bf16). While the skip_test property handles missing paths gracefully, consider using an environment variable (e.g., EXAONE_4_5_MODEL_PATH) for configurability:

♻️ Suggested improvement

```diff
+import os
+
+_EXAONE_4_5_DEFAULT_PATH = "/code/yechan-models/exaone45_beta_2026-03-19_bf16"
+
 EXAONE_4_5_TEST_CONFIG = {
     # ... other config ...
-    "_name_or_path": str(
-        os.path.join("/code/yechan-models", "exaone45_beta_2026-03-19_bf16")
-    ),  # str(os.path.join(llm_models_root(), "Qwen2.5-VL-7B-Instruct"))
+    "_name_or_path": os.environ.get("EXAONE_4_5_MODEL_PATH", _EXAONE_4_5_DEFAULT_PATH),
 }
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unittest/_torch/modeling/test_modeling_exaone4_5.py` around lines 183 -
186, Replace the hardcoded developer path in the test config for "_name_or_path"
with an environment-configurable value: read
os.environ.get("EXAONE_4_5_MODEL_PATH") and fall back to the existing
os.path.join("/code/yechan-models", "exaone45_beta_2026-03-19_bf16") if the env
var is not set; keep the str(...) cast and preserve the existing skip_test
behavior that already handles missing paths. Update the assignment where
"_name_or_path" is set in
tests/unittest/_torch/modeling/test_modeling_exaone4_5.py to use this env var
fallback.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: a4fc1689-50d7-46db-9958-52340c2e4a19

📥 Commits

Reviewing files that changed from the base of the PR and between b4a4ce0 and d1c8f67.

📒 Files selected for processing (18)
  • examples/models/core/exaone/README.md
  • requirements.txt
  • tensorrt_llm/_torch/models/__init__.py
  • tensorrt_llm/_torch/models/checkpoints/hf/exaone4_5_weight_mapper.py
  • tensorrt_llm/_torch/models/modeling_clip.py
  • tensorrt_llm/_torch/models/modeling_exaone4_5.py
  • tensorrt_llm/_torch/models/modeling_exaone_moe.py
  • tensorrt_llm/_torch/models/modeling_llama.py
  • tensorrt_llm/_torch/models/modeling_qwen2vl.py
  • tensorrt_llm/_torch/models/modeling_qwen3vl.py
  • tensorrt_llm/_torch/models/modeling_siglip.py
  • tensorrt_llm/_torch/visual_gen/models/wan/transformer_wan.py
  • tensorrt_llm/models/gpt/convert.py
  • tensorrt_llm/serve/chat_utils.py
  • tensorrt_llm/serve/openai_server.py
  • tensorrt_llm/tools/multimodal_builder.py
  • tests/unittest/_torch/modeling/test_modeling_exaone4_5.py
  • tests/unittest/_torch/modeling/test_modeling_multimodal.py

Comment on lines +102 to +106
```bash
export HF_MODEL_DIR=hf_models/exaone4_5
git clone https://huggingface.co/LGAI-EXAONE/<TODO: FILL> $HF_MODEL_DIR
```


⚠️ Potential issue | 🟡 Minor

Unresolved TODO placeholders in documentation.

The EXAONE-4.5 section contains placeholders that need to be filled:

  • Line 104: <TODO: FILL> in the git clone URL
  • Lines 143-145: TODO: FILL for expected output

Please update these with the actual HuggingFace model repository name and expected output before merging.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/models/core/exaone/README.md` around lines 102 - 106, Replace the
unresolved TODOs in the EXAONE-4.5 README: set the actual HuggingFace repo name
in the git clone command (replace `<TODO: FILL>` in the git clone URL that
follows HF_MODEL_DIR) and update the example's expected output section (the
lines labeled `TODO: FILL` around the expected output) with the real output
produced by the model so the README contains the correct repository path and
sample result.

Comment on lines +175 to +189
```python
multimodal_params = kwargs.get("multimodal_params", [])
mm_embeds = []

if len(multimodal_params) > 0:
    if not _is_disagg():
        mm_embeds = get_multimodal_embeddings(
            encoder_forward_fn=self.mm_encoder.forward,
            multimodal_params=multimodal_params,
        )
    else:
        raise NotImplementedError(
            "Exaone4.5-VL does not support disaggregated inference yet. "
            "Unset TLLM_MULTIMODAL_DISAGGREGATED or set it to '0'."
        )
mm_embeds = find_input_mm_embeds(mm_embeds, multimodal_params)
```

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed (repository: NVIDIA/TensorRT-LLM, output length: 1333):

```bash
#!/bin/bash
# Check if Exaone4_5InputProcessor populates multimodal_params for text-only prompts
rg -n "multimodal_data" tensorrt_llm/_torch/models/modeling_exaone4_5.py -A 3 -B 3
```

🏁 Script executed (output length: 193):

```bash
# First, let's see the full Exaone4_5InputProcessor implementation to understand multimodal_params creation
fd "modeling_exaone4_5.py" | xargs wc -l
```

🏁 Script executed (output length: 2460):

```bash
# Get the complete InputProcessor class and how it creates multimodal_params
cat -n tensorrt_llm/_torch/models/modeling_exaone4_5.py | sed -n '60,115p'
```

🏁 Script executed (output length: 1268):

```bash
# Check how forward method at lines 175-189 uses multimodal_params
cat -n tensorrt_llm/_torch/models/modeling_exaone4_5.py | sed -n '170,195p'
```

🏁 Script executed (output length: 1115):

```bash
# Look for the get_multimodal_embeddings function to understand what it does with multimodal_params
rg -n "def get_multimodal_embeddings" tensorrt_llm/_torch/ -A 10
```

🏁 Script executed (output length: 16857):

```bash
# Compare with Qwen2VL's _get_requests_with_mm_data filtering mentioned in the review
rg -n "_get_requests_with_mm_data|get_multimodal_embeddings" tensorrt_llm/_torch/models/ -B 2 -A 5
```

🏁 Script executed (output length: 1907):

```bash
# Check Exaone4_5ForCausalLM class definition and inheritance
cat -n tensorrt_llm/_torch/models/modeling_exaone4_5.py | sed -n '130,170p'
```

🏁 Script executed (output length: 45):

```bash
# Check if Exaone4_5ForCausalLM or its parent has _get_requests_with_mm_data
rg -n "_get_requests_with_mm_data" tensorrt_llm/_torch/models/modeling_exaone4_5.py
```

🏁 Script executed (output length: 221):

```bash
# Check Qwen2VLModelBase to see if it has _get_requests_with_mm_data
rg -n "class Qwen2VLModelBase|def _get_requests_with_mm_data" tensorrt_llm/_torch/models/modeling_qwen2vl.py -A 1
```

🏁 Script executed (output length: 45):

```bash
# Check the full class declaration for Exaone4_5ForCausalLM
rg -n "class Exaone4_5ForCausalLM" tensorrt_llm/_torch/models/modeling_exaone4_5.py -A 5
```

🏁 Script executed (output length: 1140):

```bash
# Get the full _get_requests_with_mm_data method from Qwen2VLModelBase
cat -n tensorrt_llm/_torch/models/modeling_qwen2vl.py | sed -n '1163,1180p'
```

🏁 Script executed (output length: 334):

```bash
# Search for the Exaone4_5ForCausalLM class definition in the file
rg -n "class Exaone4_5" tensorrt_llm/_torch/models/modeling_exaone4_5.py
```

🏁 Script executed (output length: 922):

```bash
# Get full context around line 120 to see the class definition
cat -n tensorrt_llm/_torch/models/modeling_exaone4_5.py | sed -n '115,135p'
```

🏁 Script executed (output length: 3413):

```bash
# Verify that the forward method at lines 166-189 is in Exaone4_5_VLModel
cat -n tensorrt_llm/_torch/models/modeling_exaone4_5.py | sed -n '124,200p'
```


Add filtering of multimodal_params to match Qwen2VLModelBase behavior.

Exaone4_5_VLModel inherits from Qwen2VLModelBase but doesn't use the parent's _get_requests_with_mm_data() method to filter multimodal_params. The parent implementation and Qwen3VLModel both filter to only process entries with actual multimodal data (image/video), because Qwen-VL models include mrope_config entries even for text-only prompts. Without filtering, empty entries are unnecessarily passed to the encoder. Consider calling self._get_requests_with_mm_data(multimodal_params) before processing, similar to the parent class.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tensorrt_llm/_torch/models/modeling_exaone4_5.py` around lines 175 - 189,
Exaone4_5_VLModel is processing all multimodal_params including text-only
entries; call the parent filter to only keep requests with actual multimodal
data before calling the encoder. Modify the block handling multimodal_params to
first call self._get_requests_with_mm_data(multimodal_params) (or assign its
return to a local filtered list) and use that filtered list when deciding to
call get_multimodal_embeddings, then pass the filtered mm_embeds into
find_input_mm_embeds; ensure you still raise NotImplementedError for
disaggregated mode in the same place.
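The suggested filtering amounts to keeping only the entries that actually carry image or video payloads, dropping text-only requests whose multimodal slot holds only auxiliary data such as an mrope config. A minimal stand-alone sketch (real multimodal params are objects with attributes; plain dicts and the key names below are stand-ins for illustration):

```python
def get_requests_with_mm_data(multimodal_params):
    """Keep only params whose multimodal_data holds image/video payloads,
    dropping text-only entries that carry e.g. only an mrope config."""
    return [
        p for p in multimodal_params
        if p.get("multimodal_data")
        and any(k in p["multimodal_data"] for k in ("image", "video"))
    ]


params = [
    {"multimodal_data": {"image": ["<pixels>"]}},
    {"multimodal_data": {"mrope_config": {}}},  # text-only request
    {"multimodal_data": None},
]
filtered = get_requests_with_mm_data(params)
print(len(filtered))  # 1
```

With this filter applied before the encoder call, text-only entries never reach `get_multimodal_embeddings`, matching the parent-class behavior described above.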

Comment on lines +235 to +237
```python
def load_weights(self, weights, weight_mapper: BaseWeightMapper):
    assert isinstance(weight_mapper, Exaone4_5HfWeightMapper)
    weights = weight_mapper.preprocess_weights(weights)
```

⚠️ Potential issue | 🟡 Minor

Avoid assert for runtime type validation in production code.

Using assert for type checking can be bypassed when Python is run with -O (optimizations). Use an explicit check with raise TypeError instead.

🛡️ Proposed fix

```diff
     def load_weights(self, weights, weight_mapper: BaseWeightMapper):
-        assert isinstance(weight_mapper, Exaone4_5HfWeightMapper)
+        if not isinstance(weight_mapper, Exaone4_5HfWeightMapper):
+            raise TypeError(
+                f"Expected Exaone4_5HfWeightMapper, got {type(weight_mapper).__name__}"
+            )
         weights = weight_mapper.preprocess_weights(weights)
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tensorrt_llm/_torch/models/modeling_exaone4_5.py` around lines 235 - 237,
Replace the runtime type check in load_weights to avoid using assert: explicitly
verify that weight_mapper is an instance of Exaone4_5HfWeightMapper and if not
raise a TypeError with a clear message (e.g., indicating expected
Exaone4_5HfWeightMapper but got type(weight_mapper)); then proceed to call
weight_mapper.preprocess_weights(weights). Ensure you reference the load_weights
method and the Exaone4_5HfWeightMapper/BaseWeightMapper types when making the
change.

Comment on lines +914 to +916
```python
def prepare_attn_metadata(self, batch_size: int, seq_lens: List[int],
                          attn_metadata: AttentionMetadata):
    batch_size = len(seq_lens)
```

⚠️ Potential issue | 🟡 Minor

Unused batch_size parameter is immediately shadowed.

Same issue as in modeling_qwen3vl.py: the batch_size parameter is passed but immediately overwritten. This should be consistent with the fix applied to qwen3vl.

🧹 Proposed fix

```diff
-    def prepare_attn_metadata(self, batch_size: int, seq_lens: List[int],
+    def prepare_attn_metadata(self, seq_lens: List[int],
                               attn_metadata: AttentionMetadata):
         batch_size = len(seq_lens)
```

Update call sites at lines 976-980:

```diff
-        full_attn_metadata = self.prepare_attn_metadata(len(grid_rows),
-                                                        seq_lens,
+        full_attn_metadata = self.prepare_attn_metadata(seq_lens,
                                                         self.full_attn_metadata)
-        window_attn_metadata = self.prepare_attn_metadata(
-            len(grid_rows), window_seq_lens, self.window_attn_metadata)
+        window_attn_metadata = self.prepare_attn_metadata(window_seq_lens,
+                                                          self.window_attn_metadata)
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tensorrt_llm/_torch/models/modeling_qwen2vl.py` around lines 914 - 916, The
prepare_attn_metadata method in modeling_qwen2vl.py shadows the incoming
batch_size parameter by reassigning it to len(seq_lens); remove that
reassignment so the passed batch_size is used, mirroring the fix from
modeling_qwen3vl.py; then ensure any callers that currently rely on the old
behavior (e.g., the call sites referenced around lines 976-980) are updated to
pass the correct batch_size value (or compute len(seq_lens) before calling) so
prepare_attn_metadata(batch_size: int, seq_lens: List[int], attn_metadata:
AttentionMetadata) uses its batch_size argument as intended.

Comment on lines +768 to 771
```python
def prepare_attn_metadata(
    self, batch_size: int, seq_lens: List[int], attn_metadata: AttentionMetadata
):
    batch_size = len(seq_lens)
```

⚠️ Potential issue | 🟡 Minor

Unused batch_size parameter is immediately shadowed.

The batch_size parameter is passed to prepare_attn_metadata but immediately overwritten by batch_size = len(seq_lens) on line 771. Either remove the parameter or remove the reassignment.

🧹 Proposed fix (remove parameter)

```diff
     def prepare_attn_metadata(
-        self, batch_size: int, seq_lens: List[int], attn_metadata: AttentionMetadata
+        self, seq_lens: List[int], attn_metadata: AttentionMetadata
     ):
         batch_size = len(seq_lens)
```

Then update the call site at line 789:

```diff
-        attn_metadata = self.prepare_attn_metadata(len(grid_thw), seq_lens, self.attn_metadata)
+        attn_metadata = self.prepare_attn_metadata(seq_lens, self.attn_metadata)
```
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

```diff
 def prepare_attn_metadata(
-    self, batch_size: int, seq_lens: List[int], attn_metadata: AttentionMetadata
+    self, seq_lens: List[int], attn_metadata: AttentionMetadata
 ):
     batch_size = len(seq_lens)
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tensorrt_llm/_torch/models/modeling_qwen3vl.py` around lines 768 - 771, The
prepare_attn_metadata function currently shadows its batch_size parameter with
batch_size = len(seq_lens); remove the parameter from prepare_attn_metadata's
signature and update every call site that passes batch_size (e.g., where
prepare_attn_metadata(...) is invoked) to stop supplying that argument, or
alternatively keep the parameter and delete the reassignment so the passed
batch_size is used; modify the function signature and all references
consistently (function name: prepare_attn_metadata, local variable: seq_lens) so
there is no shadowing or dead parameter.

Comment on lines +289 to +290
```python
mm_data_tracker = MultimodalDataTracker(
    type(model_config).model_type, multimodal_server_config)
```

⚠️ Potential issue | 🟡 Minor

Pipeline failure: formatting enforcement.

The CI indicates a pre-commit formatting failure on this file. Please run yapf or the pre-commit hooks to fix line wrapping around the MultimodalDataTracker initialization and function calls.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tensorrt_llm/serve/chat_utils.py` around lines 289 - 290, The line
initializing MultimodalDataTracker is misformatted causing pre-commit/yapf
failures; reformat the statement that constructs
MultimodalDataTracker(type(model_config).model_type, multimodal_server_config)
to satisfy the project's formatter (wrap arguments across lines or adjust
spacing consistent with other calls), then run the project's pre-commit hooks or
`yapf` to enforce line wrapping for this and any adjacent function calls; ensure
you update the call sites referencing MultimodalDataTracker, model_config, and
multimodal_server_config so they conform to the repository's line-length and
formatting rules.

Comment on lines 305 to +307
```diff
-    model_type = model_config.model_type
     registry_format = MULTIMODAL_PLACEHOLDER_REGISTRY.get_content_format(
-        model_type)
+        type(model_config).model_type)
```

⚠️ Potential issue | 🟡 Minor

Unused variable model_type after refactor.

Line 305 assigns model_type = model_config.model_type, but subsequent code (lines 307, 336, 339, 343) now uses type(model_config).model_type instead. This leaves model_type as dead code.

🧹 Proposed fix to remove unused variable

```diff
-    model_type = model_config.model_type
     registry_format = MULTIMODAL_PLACEHOLDER_REGISTRY.get_content_format(
         type(model_config).model_type)
```
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
model_type = model_config.model_type
registry_format = MULTIMODAL_PLACEHOLDER_REGISTRY.get_content_format(
model_type)
type(model_config).model_type)
registry_format = MULTIMODAL_PLACEHOLDER_REGISTRY.get_content_format(
type(model_config).model_type)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tensorrt_llm/serve/chat_utils.py` around lines 305 - 307, Remove the dead
local assignment "model_type = model_config.model_type" since subsequent calls
(e.g.,
MULTIMODAL_PLACEHOLDER_REGISTRY.get_content_format(type(model_config).model_type))
use type(model_config).model_type instead; delete the unused "model_type"
variable or alternatively replace other uses to reference the local variable
consistently, ensuring references around model_config and
MULTIMODAL_PLACEHOLDER_REGISTRY.get_content_format remain correct.
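The distinction the refactor relies on, `model_config.model_type` versus `type(model_config).model_type`, matters because instance lookup sees per-instance attributes that can shadow the class-level value, while going through `type()` always reads the class attribute. A toy illustration (the class here is a hypothetical stand-in, not the real config object):

```python
class ModelConfig:
    model_type = "exaone4_5"  # class-level attribute


cfg = ModelConfig()
cfg.model_type = "overridden-per-instance"  # instance attribute shadows it

print(cfg.model_type)        # overridden-per-instance
print(type(cfg).model_type)  # exaone4_5
```

Reading through `type(cfg)` bypasses any per-instance shadowing, which is why the registry lookups consistently use that form.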

@yechank-nvidia (Collaborator, Author)

/bot run

@tensorrt-cicd (Collaborator)

PR_Github #42598 [ run ] triggered by Bot. Commit: d1c8f67 Link to invocation

@tensorrt-cicd (Collaborator)

PR_Github #42598 [ run ] completed with state FAILURE. Commit: d1c8f67
/LLM/main/L0_MergeRequest_PR pipeline #33324 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation
