adjust building RH image #3883
Conversation
Fix the convert_tokenizer configuration, use only non-restricted test models, and pin optimum-intel to a fixed version.
Pull request overview
This PR updates the build and test configuration for RedHat images to fix tokenizer handling and switch to non-restricted test models. The changes address issue #3844 by fixing the convert_tokenizer configuration, updating to a specific version of optimum-intel, and replacing restricted HuggingFace models with publicly accessible alternatives.
Key changes:
- Switched from `facebook/opt-125m` to `HuggingFaceTB/SmolLM2-360M-Instruct` as the primary LLM test model
- Changed `meta-llama/Llama-3.1-8B-Instruct` to `unsloth/Llama-3.1-8B-Instruct` for tokenizer testing (see the conversion sketch after this list)
- Fixed the openvino_tokenizers installation process in the RedHat Dockerfile to build from source and install the Python bindings correctly
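The convert_tokenizer fix concerns how the tokenizer and detokenizer models are produced for these test models. A minimal sketch of that conversion flow, assuming the openvino_tokenizers Python API, with the output directory and file names chosen purely for illustration (the model ID is the one this PR switches to):

```python
# Minimal sketch: converting a Hugging Face tokenizer to OpenVINO tokenizer/detokenizer models.
# The model ID comes from this PR; the output directory and file names are illustrative.
from pathlib import Path

import openvino as ov
from openvino_tokenizers import convert_tokenizer
from transformers import AutoTokenizer

model_id = "unsloth/Llama-3.1-8B-Instruct"
output_dir = Path("llama3_tokenizer")
output_dir.mkdir(parents=True, exist_ok=True)

# Load the Hugging Face tokenizer and convert it into OpenVINO tokenizer/detokenizer models.
hf_tokenizer = AutoTokenizer.from_pretrained(model_id)
ov_tokenizer, ov_detokenizer = convert_tokenizer(hf_tokenizer, with_detokenizer=True)

# Save both models so the serving runtime can load them from disk.
ov.save_model(ov_tokenizer, str(output_dir / "openvino_tokenizer.xml"))
ov.save_model(ov_detokenizer, str(output_dir / "openvino_detokenizer.xml"))
```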
Reviewed changes
Copilot reviewed 14 out of 15 changed files in this pull request and generated 3 comments.
Summary per file:
| File | Description |
|---|---|
| windows_prepare_llm_models.bat | Added new LLM_MODEL variable and updated LLAMA3_MODEL to use non-restricted model sources |
| src/test/pull_gguf_hf_model_test.cpp | Increased timeout and removed modelscope.cn test case |
| src/test/llm/output_parsers/llama3_output_parser_test.cpp | Updated tokenizer paths to use unsloth Llama model instead of meta-llama |
| src/test/llm/lm_legacy_regular.pbtxt | Changed models_path to use new SmolLM2 model |
| src/test/llm/lm_cb_regular.pbtxt | Changed models_path to use new SmolLM2 model |
| src/test/llm/llmnode_test.cpp | Updated test model paths and adjusted token limits for new model's max length (8192) |
| run_unit_tests.sh | Added proxy configuration support for bazel tests |
| prepare_llm_models.sh | Updated model variables, pinned the optimum-intel version, and added facebook/opt-125m as a separate download |
| docs/deploying_server_baremetal.md | Updated release version references from v2025.4 to v2025.4.1 |
| demos/python_demos/requirements.txt | Pinned optimum-intel to specific commit and updated transformers version constraint |
| demos/common/export_models/requirements.txt | Updated the optimum-intel commit and bumped openvino package versions to 2025.4.1 |
| create_package.sh | Fixed convert_tokenizer script generation and updated version metadata |
| ci/build_test_OnCommit.groovy | Enabled tests for RedHat release image build (RUN_TESTS=1) |
| Dockerfile.redhat | Refactored openvino_tokenizers installation to always build from source with proper Python bindings |
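Several of the files above (prepare_llm_models.sh and the export requirements) revolve around exporting the new non-restricted model through optimum-intel. A minimal sketch of that export step, assuming the optimum-intel Python API and an illustrative output path, might look like:

```python
# Minimal sketch: exporting the non-restricted test model to OpenVINO IR via optimum-intel.
# The model ID is the one introduced by this PR; the output path is illustrative.
from optimum.intel import OVModelForCausalLM
from transformers import AutoTokenizer

model_id = "HuggingFaceTB/SmolLM2-360M-Instruct"
output_dir = "models/SmolLM2-360M-Instruct"

# export=True converts the original checkpoint to OpenVINO IR while loading it.
model = OVModelForCausalLM.from_pretrained(model_id, export=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Persist the converted model and tokenizer so the server can serve them from disk.
model.save_pretrained(output_dir)
tokenizer.save_pretrained(output_dir)
```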
// creating prompt that will be tokenized to 8194 tokens when model max length is 8192.
for (int i = 0; i < 8192 - 29; i++) {
Copilot AI commented on Jan 7, 2026:
The comment states '8194 tokens', but the loop creates 8163 tokens (8192 - 29). The comment also implies the prompt should exceed the model max length of 8192, which would require the 8163 loop tokens plus the chat template tokens. Clarify whether the total including chat template tokens equals 8194.
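One way to sanity-check that arithmetic, assuming the transformers chat-template API and treating the prompt construction below as an approximation of the test code rather than its exact contents:

```python
# Minimal sketch: counting tokens produced once the chat template is applied.
# The model ID and the way the prompt is built are illustrative approximations
# of the test setup, not the exact test code.
from transformers import AutoTokenizer

model_id = "HuggingFaceTB/SmolLM2-360M-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Mirror the test loop: repeat a roughly one-token word (8192 - 29) times.
prompt = " ".join(["hello"] * (8192 - 29))
messages = [{"role": "user", "content": prompt}]

# apply_chat_template adds template tokens on top of the raw prompt tokens.
token_ids = tokenizer.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True
)
# If the in-code comment is right, this should land above the 8192 max length.
print(len(token_ids))
```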
This PR will likely need the same adjustments that were done for #3890.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
🛠 Summary
Addresses #3844
- Fix convert_tokenizer configuration
- Use only non-restricted test models
- Pin optimum-intel to a fixed version
🧪 Checklist