Skip to content

fix: vision image token insertion#671

Open
YashasviChaurasia wants to merge 2 commits intofoundation-model-stack:mainfrom
YashasviChaurasia:fix/vision-image-token-insertion
Open

fix: vision image token insertion#671
YashasviChaurasia wants to merge 2 commits intofoundation-model-stack:mainfrom
YashasviChaurasia:fix/vision-image-token-insertion

Conversation

@YashasviChaurasia
Copy link
Contributor

Description of the change

This PR includes two critical fixes:

  1. Vision model image token bug - Fixes "Image features and image tokens do not match" error
  2. Transformers v5 API compatibility - Restores compatibility with transformers v4.55+
  3. Test suite fixes - Resolves test failures in CI/CD

Problem 1: Vision Model Training Failure

Error:

ValueError: Image features and image tokens do not match: tokens: 0, features 18432

Root Cause:
The apply_tokenizer_chat_template handler wasn't correctly extracting conversation messages from OpenAI format datasets when conversation_column_name was not explicitly set. This resulted in formatted text without <image> tokens, causing vision model training to fail.

Fix

  • Adds auto-detection for common conversation column names ('messages', 'conversation', 'chat', 'turns')
  • Adds validation to ensure image tokens are present when images exist in the dataset
  • Enhances error messages with actionable guidance

Problem 2: Transformers v5 API Breaking Change

Error:

  AttributeError at line 610: labels = input_ids.clone()
  KeyError: 'clone'

Root Cause:
In transformers v4.55+, apply_chat_template() with return_tensors='pt' changed behavior:

  • Old API (v4.x): Returns {"input_ids": tensor} (dict)
  • New API (v4.55+): Returns tensor directly OR BatchEncoding object (in tox environment)

The code was doing result["input_ids"] which fails when:

  1. result is a tensor (causes IndexError)
  2. result is a BatchEncoding without .clone() method (causes AttributeError)

Solution:
Added robust handling for all three return types in tokenize_and_apply_chat_template_with_masking:

# Handle both old API (dict/BatchEncoding) and new API (tensor)
if hasattr(result, "input_ids"):
    input_ids = result.input_ids  # BatchEncoding or dict-like
elif isinstance(result, dict):
    input_ids = result["input_ids"]  # Plain dict
else:
    input_ids = result  # Direct tensor

Problem 3: Test Suite Failures

3a. test_empty_data

Error:

StopIteration (expected DatasetGenerationError or ValueError)

Root Cause:
Datasets library in transformers v5 raises StopIteration when processing empty JSON files.

Solution:
Added StopIteration to expected exceptions in the test.

3b. test_run_chat_style_ft_using_custom_split_name

Error:

 NotImplementedError: "histogram_mps" not implemented for 'Int'

Root Cause:
MoE models use histogram operations that are incompatible with Apple Silicon MPS backend.

Solution:
Skip test on MPS-only systems using @pytest.mark.skipif.

Related issue number

How to verify the PR

Was the PR tested

  • I have added >=1 unit test(s) for every new method I have added.
  • I have ensured all unit tests pass

…lity

Signed-off-by: yashasvi <yashasvi@ibm.com>
@github-actions
Copy link

Thanks for making a pull request! 😃
One of the maintainers will review and advise on the next steps.

@github-actions github-actions bot added the fix label Mar 18, 2026
Signed-off-by: yashasvi <yashasvi@ibm.com>
@YashasviChaurasia YashasviChaurasia force-pushed the fix/vision-image-token-insertion branch from 1c7bd0d to b2344fd Compare March 18, 2026 07:04
@dushyantbehl
Copy link
Collaborator

/build


@pytest.mark.skipif(
torch.backends.mps.is_available() and not torch.cuda.is_available(),
reason="MoE models have histogram incompatibility with MPS backend",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are we adding this here? this test was running fine without anything right? is it not running on mac now?
if so what model are we using which is MoE? can we choose another?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ahh I did miss this, this was for my local testing tho.. the test was failing on my mac locally..
Shouldn't be a problem for github actions test coverage btw

@github-actions
Copy link

Build succeeded for b2344fd (NVCR image)

View run

@dushyantbehl
Copy link
Collaborator

/build

@github-actions
Copy link

Build failed for b2344fd (NVCR image)

View run

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants