From 91c4d4b4ddcd48f4af263731bbfa951c7f0aaba8 Mon Sep 17 00:00:00 2001
From: Alex Bozarth <ajbozart@us.ibm.com>
Date: Wed, 10 Jun 2026 16:54:44 -0500
Subject: [PATCH] fix(hf): pass return_dict=True to apply_chat_template

apply_chat_template(return_tensors="pt") returns a 2-D torch.Tensor,
not a dict. The standard-generation call site at
huggingface.py:1119-1120 then indexes it with string keys
("input_ids", "attention_mask"). On torch >= 2.9 that is interpreted
as fancy indexing with the codepoints of the string and raises
IndexError on macOS (CPU and MPS).

Adding return_dict=True makes apply_chat_template return a
BatchEncoding, so dict access works as the surrounding code already
expects. The downstream isinstance(input_ids, torch.Tensor) ternaries
in processing()/post_processing() were already coded to handle both
the dict producer and the bare-Tensor producer from the merged-cache
path, so they continue to work for both shapes.

The bug has been latent since #418 (transformers 5 bump). PR CI did
not catch it because the integration test is gated by @require_gpu.
The GPU nightly did not catch it either, because the Linux CUDA wheel
of torch 2.11.0 doesn't trigger the deprecated indexing path that
fails locally on macOS.

Assisted-by: Claude Code
Signed-off-by: Alex Bozarth <ajbozart@us.ibm.com>
---
 mellea/backends/huggingface.py | 1 +
 1 file changed, 1 insertion(+)
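
Note for reviewers (not part of the commit message): the two return
shapes described above can be sketched without torch or transformers.
This is only an illustration under stated assumptions:
fake_apply_chat_template and split_inputs are hypothetical stand-ins,
not mellea or transformers APIs, and plain lists stand in for tensors.
With lists the string lookup fails with TypeError; the IndexError
reported above is specific to torch tensors.

```python
from collections.abc import Mapping


def fake_apply_chat_template(token_ids, return_dict=False):
    """Hypothetical stand-in for the tokenizer call (not the real API).

    Returns a bare 2-D batch by default, or a mapping with
    "input_ids" and "attention_mask" when return_dict=True.
    """
    batch = [list(token_ids)]  # one row per conversation
    if return_dict:
        mask = [[1] * len(token_ids)]
        return {"input_ids": batch, "attention_mask": mask}
    return batch


def split_inputs(encoded):
    """Accept either producer shape, as the downstream code is said to."""
    if isinstance(encoded, Mapping):
        return encoded["input_ids"], encoded["attention_mask"]
    # Bare batch: there is no attention mask to hand back.
    return encoded, None


# The mapping shape supports the string-key access the call site uses.
ids, mask = split_inputs(fake_apply_chat_template([5, 6, 7], return_dict=True))

# The bare shape does not: string keys are not valid indices for it.
try:
    fake_apply_chat_template([5, 6, 7])["input_ids"]
except TypeError as exc:
    print(f"bare batch rejects string keys: {exc}")
```

The one-line change below corresponds to flipping return_dict in this
sketch, so the producer matches what the call site already assumes.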

diff --git a/mellea/backends/huggingface.py b/mellea/backends/huggingface.py
index a4899b3a9..1b4e8f5c2 100644
--- a/mellea/backends/huggingface.py
+++ b/mellea/backends/huggingface.py
@@ -1056,6 +1056,7 @@ async def _generate_from_context_standard(
                 tools=convert_tools_to_json(tools),  # type: ignore
                 add_generation_prompt=True,  # If we change this, must modify huggingface granite guardian.
                 return_tensors="pt",
+                return_dict=True,
                 **self._filter_for_chat_template(model_options),
             ).to(self._device)  # type: ignore