Add chat-template hooks to LMEvalORTGenAIEvaluator #2462
Open
ykhrustalev wants to merge 4 commits into
Conversation
lm-eval's `simple_evaluate(..., apply_chat_template=True)` requires the
underlying LM class to implement `tokenizer_name` and `apply_chat_template`.
The HFLM backend has both; the ORT GenAI backend does not, so any attempt
to evaluate a chat-tuned ONNX model with chat-formatted prompts raises
`NotImplementedError: To use this model with chat templates, please
implement the 'tokenizer_name' property.`
This adds the two members with the minimum surface area:
- `tokenizer_name` returns the model path (for lm-eval's chat-aware
result caching), matching the HFLM convention of slash-replacement.
- `apply_chat_template` defers to the model's HF tokenizer via
`AutoTokenizer.apply_chat_template`, mirroring HFLM's
implementation.
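
A minimal sketch of that delegation, mirroring HFLM; the keyword arguments are HFLM's usual shape and an assumption about this PR's exact signature:

```python
def apply_chat_template(self, chat_history, add_generation_prompt=True) -> str:
    # Render the conversation to a prompt string; lm-eval handles tokenization
    # itself, hence tokenize=False. self._hf_tokenizer is the AutoTokenizer
    # loaded elsewhere in the evaluator (see the diff excerpt further down).
    return self._hf_tokenizer.apply_chat_template(
        chat_history, tokenize=False, add_generation_prompt=add_generation_prompt
    )
```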
The HF tokenizer is loaded once at `__init__` purely for chat-template
rendering; token-level encode/decode still goes through `og.Tokenizer`
and the runtime, so there is no change to generation behavior or any
existing code path.
Verified end-to-end on LFM2.5-350M (int4, k_quant_mixed) with MBPP:
without the chat-template hooks the eval raised at task start; with them,
plus `num_fewshot=0` and a chat-friendly stop list, pass@1 went from
0/500 to 67/500 (13.4%). The original 0/500 was a prompt-format
artifact (an instruct model given completion-style few-shot prompts), not a
conversion regression.
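
For reference, a hedged sketch of the verified call shape. `lm_eval.simple_evaluate` accepts an LM instance directly; the evaluator's constructor arguments and the model path are illustrative assumptions, and the chat-friendly stop list mentioned above is omitted:

```python
import lm_eval

# Hypothetical wiring; the real evaluator is constructed by Olive's
# evaluation flow, and its constructor arguments may differ.
lm = LMEvalORTGenAIEvaluator(pretrained="models/lfm2.5-350m-int4")
results = lm_eval.simple_evaluate(
    model=lm,
    tasks=["mbpp"],
    num_fewshot=0,             # chat-formatted prompts; skip completion-style few-shot
    apply_chat_template=True,  # works once tokenizer_name/apply_chat_template exist
)
```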
Contributor
Pull request overview
This PR adds lm-eval “chat template” integration hooks to LMEvalORTGenAIEvaluator so ORT GenAI–backed models can be evaluated via lm_eval.simple_evaluate(..., apply_chat_template=True) (matching the capability available in the HuggingFace backend).
Changes:
- Add an HF tokenizer instance to `LMEvalORTGenAIEvaluator` for rendering chat templates.
- Implement `tokenizer_name` for lm-eval chat-template-aware caching.
- Implement `apply_chat_template(...)` by delegating to the HF tokenizer.
Comment on lines +501 to +504
```python
# HF tokenizer kept solely to render `apply_chat_template`; generation
# still uses og.Tokenizer above.
self._pretrained = str(pretrained)
self._hf_tokenizer = AutoTokenizer.from_pretrained(self._pretrained)
```

```python
@property
def tokenizer_name(self) -> str:
    """Identifier used by lm-eval for chat-template-aware caching."""
    return self._pretrained.replace("/", "__")
```
… key, tests
- Lazy-load the HF tokenizer on the first `apply_chat_template` call rather
  than at `__init__`. Callers that never enable chat templating no longer
  need HF tokenizer files (`tokenizer_config.json` etc.) in the model
  directory; eager loading would have regressed those workflows (see the
  sketch after this list).
- `tokenizer_name` now replaces both POSIX and Windows path separators with
  `__` so the lm-eval cache identifier is stable across platforms. The
  previous implementation only handled forward slashes, leaving backslashes
  in the key on Windows because `str(Path(...))` preserves the native
  separator.
- Add unit tests for both behaviours (condensed into the pytest sketch
  after the test summary below):
  - `tokenizer_name` parametrised over POSIX, relative, and Windows-style
    paths to lock in the normalisation contract.
  - `apply_chat_template` verified to (a) not load the HF tokenizer at
    construction, (b) load once on first call, and (c) reuse the cached
    tokenizer on subsequent calls. `AutoTokenizer` is patched so the
    tests run without any HF tokenizer files on disk.
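
A sketch of the combined shape of these two changes. Attribute names follow the diff above; presenting them as a standalone helper class is an illustration, not the PR's actual class layout:

```python
from transformers import AutoTokenizer


class _ChatTemplateHooks:  # illustrative container for the two members
    def __init__(self, pretrained):
        self._pretrained = str(pretrained)
        self._hf_tokenizer = None  # deferred: no HF tokenizer files needed unless used

    @property
    def tokenizer_name(self) -> str:
        # Normalize both separators so the lm-eval cache key matches across platforms.
        return self._pretrained.replace("\\", "__").replace("/", "__")

    def apply_chat_template(self, chat_history, add_generation_prompt=True) -> str:
        if self._hf_tokenizer is None:  # first call: load once, reuse afterwards
            self._hf_tokenizer = AutoTokenizer.from_pretrained(self._pretrained)
        return self._hf_tokenizer.apply_chat_template(
            chat_history, tokenize=False, add_generation_prompt=add_generation_prompt
        )
```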
All four new tests pass; `test_olive_evaluator.py` as a whole stays green
(85 passed). `lintrunner` reports no new warnings on the changed files.
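
A condensed sketch of those tests, exercising the helper class sketched above; the real tests target the evaluator in `test_olive_evaluator.py`, where the patch target would be Olive's module rather than `__name__`:

```python
from unittest.mock import patch

import pytest


@pytest.mark.parametrize(
    ("pretrained", "expected"),
    [
        ("/models/lfm/int4", "__models__lfm__int4"),  # POSIX absolute path
        ("models/lfm", "models__lfm"),                # relative path
        ("C:\\models\\lfm", "C:__models__lfm"),       # Windows-style path
    ],
)
def test_tokenizer_name_normalizes_separators(pretrained, expected):
    assert _ChatTemplateHooks(pretrained).tokenizer_name == expected


def test_apply_chat_template_lazy_loads_once():
    # Patch AutoTokenizer where the class under test looks it up.
    with patch(f"{__name__}.AutoTokenizer") as auto_tok:
        auto_tok.from_pretrained.return_value.apply_chat_template.return_value = "rendered"
        hooks = _ChatTemplateHooks("models/lfm")
        auto_tok.from_pretrained.assert_not_called()   # (a) nothing loaded at construction
        assert hooks.apply_chat_template([{"role": "user", "content": "hi"}]) == "rendered"
        hooks.apply_chat_template([{"role": "user", "content": "again"}])
        auto_tok.from_pretrained.assert_called_once()  # (b) + (c) one load, then reuse
```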
Describe your changes
Implement `tokenizer_name` and `apply_chat_template` on `LMEvalORTGenAIEvaluator` so the backend supports `lm_eval.simple_evaluate(apply_chat_template=True)`. Without these, lm-eval raises `NotImplementedError` at task setup for any chat-formatted task.

Parity with the HuggingFace backend in `lm_eval/models/huggingface.py`. The HF tokenizer is loaded lazily on the first `apply_chat_template` call, so model directories without HF tokenizer files still work for non-chat evaluation. Generation continues to go through `og.Tokenizer`.

Checklist before requesting a review
- Ran `lintrunner -a`.
- Tested `apply_chat_template=True` in lm-eval for ortgenai-backed evaluators.

(Optional) Issue link
N/A