Fix mRoPE position ID crash on Qwen2-VL prompt truncation #482
Mr-Neutr0n wants to merge 1 commit into microsoft:main
Conversation
When training Qwen2.5-VL with agent-lightning + verl, prompt truncation changes the token count, but `image_grid_thw` is computed from the original (untruncated) `image_urls`. This causes `get_rope_index` to fail with a shape mismatch, because it finds fewer image tokens in the truncated `input_ids` than entries in `image_grid_thw`. The fix: after prompt truncation, count the remaining image regions in the truncated token sequence and slice `image_urls` to match before computing `image_grid_thw`, ensuring consistency between the token content and the mRoPE spatial metadata. Fixes microsoft#441
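A minimal sketch of the counting step described above. The token ID constants are illustrative placeholders; in the real code they come from the Qwen2-VL processor/tokenizer config (`vision_start_token_id`, `image_token_id`), the same IDs `get_rope_index` scans for:

```python
from typing import List

# Illustrative placeholder IDs; the real values come from the Qwen2-VL
# tokenizer config, as used by get_rope_index.
VISION_START_TOKEN_ID = 151652
IMAGE_TOKEN_ID = 151655

def count_images_in_tokens(token_ids: List[int]) -> int:
    """Count image regions that survive truncation: a region is counted only
    when a vision-start token is immediately followed by an image-pad token."""
    count = 0
    for i, tok in enumerate(token_ids):
        if (tok == VISION_START_TOKEN_ID
                and i + 1 < len(token_ids)
                and token_ids[i + 1] == IMAGE_TOKEN_ID):
            count += 1
    return count
```

`image_urls` can then be sliced to `image_urls[:count]` so that `image_grid_thw` is built from exactly the images still referenced by tokens.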
force-pushed from bdd1c8d to ca0be5a
Friendly bump! Let me know if there's anything I should update or improve to help move this forward.
```python
    raise ValueError(f"Relative path '{path}' requires 'image_base_dir' to be set.")
return os.path.join(self.image_base_dir, path)
```

```python
def _count_images_in_tokens(self, token_ids: List[int]) -> int:
```
I see the PR tried to match how images are counted for image_grid_thw w.r.t. get_rope_index. However, the current mechanism will still fail in a corner case where an image is truncated in the middle: the count will still increment by 1, and get_rope_index will fail at the same place.
IMO, we should simply leverage the existing is_dropped_list, put dummy pos_ids, and skip _compute_mrope_position_ids for those samples. They should be treated the same way as text prompts that exceed the length limit. So what we should do is:
```python
if self._use_mrope and is_dropped_list[i]:
    # Don't call get_rope_index — it would crash on truncated images.
    # is_drop_mask will remove this sample in the trainer.
    position_ids_list.append(torch.zeros(4, seq_len, dtype=torch.long, device=device))
else:
    pos_ids = self._compute_mrope_position_ids(...)
    position_ids_list.append(pos_ids)
There is no harm in putting the current code in place, but it's not a fix for all cases. Thoughts @Mr-Neutr0n?
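A torch-free sketch of that suggestion, assuming a per-sample dropped flag as in the comment above (`build_position_ids` and `compute_mrope_position_ids` are hypothetical stand-ins; the real code builds torch tensors and calls `get_rope_index`):

```python
def compute_mrope_position_ids(sample, seq_len):
    # Stand-in for the real get_rope_index-based mRoPE computation.
    return [list(range(seq_len)) for _ in range(4)]

def build_position_ids(samples, seq_len):
    """Drop-mask strategy: truncated multimodal samples get all-zero dummy
    position ids and skip the mRoPE computation entirely. The trainer later
    removes them via is_drop_mask, so the dummy values never affect training."""
    position_ids_list = []
    for sample in samples:
        if sample["use_mrope"] and sample["is_dropped"]:
            # Dummy (4, seq_len) position ids, standing in for torch.zeros(...).
            position_ids_list.append([[0] * seq_len for _ in range(4)])
        else:
            position_ids_list.append(compute_mrope_position_ids(sample, seq_len))
    return position_ids_list
```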
Summary
Fixes #441
When training Qwen2.5-VL with agent-lightning + verl, the model crashes in `get_rope_index` with a shape mismatch: the call fails because the `llm_positions` length differs from the attention-mask true count.

Root cause: In `get_train_data_batch`, prompt truncation (`prompt_ids[:max_prompt_length]`) changes the token count, potentially removing image placeholder tokens. However, `image_grid_thw` is computed from the original (untruncated) `image_urls` list. When `get_rope_index` processes the truncated sequence, it finds fewer `<|vision_start|><|image_pad|>` regions than `image_grid_thw` entries, causing the position ID length to diverge from the attention mask count.

Fix: After prompt truncation, count the remaining image regions in the truncated token sequence using the same `vision_start_token_id` + `image_token_id` pattern that `get_rope_index` uses, and slice `image_urls` to match before computing `image_grid_thw`.

- Added a `_count_images_in_tokens()` helper method to detect image regions in token sequences
- Sliced `image_urls` to stay consistent with truncated prompts

Test plan

- A prompt that exceeds `max_prompt_length` and contains images no longer crashes in `get_rope_index`
- A prompt within `max_prompt_length` is unaffected (no truncation, all images retained)
- Text-only training is unaffected (`_use_mrope` is `False`)
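The fix can be sketched end to end as follows. Argument names and token IDs here are placeholders rather than the repo's exact identifiers, and (as the review comment notes) this pattern count can still mismatch when an image region is cut in the middle of its pad tokens:

```python
def truncate_with_consistent_images(prompt_ids, image_urls, max_prompt_length,
                                    vision_start_id, image_pad_id):
    """Truncate the prompt, then slice image_urls so image_grid_thw is later
    computed from exactly the images still referenced by surviving tokens."""
    truncated = prompt_ids[:max_prompt_length]
    # Same vision_start + image_pad pattern that get_rope_index scans for.
    remaining = sum(
        1
        for i, tok in enumerate(truncated)
        if tok == vision_start_id
        and i + 1 < len(truncated)
        and truncated[i + 1] == image_pad_id
    )
    return truncated, image_urls[:remaining]
```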