From 91c4d4b4ddcd48f4af263731bbfa951c7f0aaba8 Mon Sep 17 00:00:00 2001
From: Alex Bozarth
Date: Wed, 10 Jun 2026 16:54:44 -0500
Subject: [PATCH] fix(hf): pass return_dict=True to apply_chat_template

apply_chat_template(return_tensors="pt") returns a 2-D torch.Tensor,
not a dict. The standard-generation call site at
huggingface.py:1119-1120 then indexes it with string keys
("input_ids", "attention_mask"), which on torch >= 2.9 is interpreted
as fancy indexing with the codepoints of the string and raises
IndexError on macOS (CPU and MPS).

Adding return_dict=True makes apply_chat_template return a
BatchEncoding, so dict access works as the surrounding code already
expects. The downstream isinstance(input_ids, torch.Tensor) ternaries
in processing()/post_processing() were already coded to handle both
the dict producer and the bare-Tensor producer from the merged-cache
path, so they continue to work for both shapes.

Latent since #418 (transformers 5 bump). Not caught by PR CI because
the integration test is gated by @require_gpu, and not caught by the
GPU nightly because the Linux CUDA wheel of torch 2.11.0 doesn't
trigger the deprecated indexing path that fails locally on macOS.

Assisted-by: Claude Code
Signed-off-by: Alex Bozarth
---
 mellea/backends/huggingface.py | 1 +
 1 file changed, 1 insertion(+)

diff --git a/mellea/backends/huggingface.py b/mellea/backends/huggingface.py
index a4899b3a9..1b4e8f5c2 100644
--- a/mellea/backends/huggingface.py
+++ b/mellea/backends/huggingface.py
@@ -1056,6 +1056,7 @@ async def _generate_from_context_standard(
                 tools=convert_tools_to_json(tools),  # type: ignore
                 add_generation_prompt=True,  # If we change this, must modify huggingface granite guardian.
                 return_tensors="pt",
+                return_dict=True,
                 **self._filter_for_chat_template(model_options),
             ).to(self._device)  # type: ignore
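For reviewers checking the "handle both shapes" claim in the commit message, here is a minimal, dependency-free sketch of that pattern. It is not the code in mellea/backends/huggingface.py: the helper name `split_chat_inputs` is hypothetical, and it tests for `collections.abc.Mapping` instead of `torch.Tensor` so it runs without torch installed. A transformers `BatchEncoding` passes the Mapping check because it subclasses `collections.UserDict`; a nested list stands in for the bare 2-D tensor.

```python
from collections.abc import Mapping
from typing import Any, Optional, Tuple


def split_chat_inputs(encoded: Any) -> Tuple[Any, Optional[Any]]:
    """Return (input_ids, attention_mask) from either producer shape.

    Hypothetical helper, for illustration only. It accepts a dict-like
    encoding (what apply_chat_template returns with return_dict=True) or a
    bare ids tensor (what, per the commit message, it returns with only
    return_tensors="pt").
    """
    if isinstance(encoded, Mapping):
        # Dict-like producer: BatchEncoding is a UserDict, hence a Mapping.
        return encoded["input_ids"], encoded.get("attention_mask")
    # Bare-tensor producer: there is no attention mask to unpack, and
    # string-key indexing is never attempted on this shape.
    return encoded, None


if __name__ == "__main__":
    as_dict = {"input_ids": [[1, 2, 3]], "attention_mask": [[1, 1, 1]]}
    bare = [[1, 2, 3]]  # stand-in for a 2-D torch.Tensor
    print(split_chat_inputs(as_dict))
    print(split_chat_inputs(bare))
```

Branching on the mapping shape, rather than the tensor shape, keeps the sketch free of a torch import; an `isinstance(..., torch.Tensor)` test like the one the commit message describes expresses the same branch from the other side.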