Add default 'auto' MODEL_IMPL_TYPE that resolves based on architecture #1255
Conversation
|
@kyuyeunk Please review. |
kyuyeunk
left a comment
Wouldn't it be possible to move 'auto' into the match/case as well?
|
It is possible to move it into the match-case, but then the code would be duplicated, including the get_vllm_model and get_flax_model calls and the fallback check. I think resolving 'auto' first and then using the same code is cleaner. |
|
I don't see any updates. Can you verify that the changes were pushed? |
kyuyeunk
left a comment
LGTM. Thank you for working on it!
|
Seems like CI is failing. Have you rebased the branch onto HEAD? |
|
I believe so, let me look into it. |
|
Looks like tests/layers/vllm/test_awq.py is failing on the main branch as well. |
|
Seems like it's due to an upstream change. Let me create a quick fix for this. |
|
Please wait until this is merged: #1284 |
|
The PR has been merged. Please update the branch and try again. |
- Add 'auto' as default value for MODEL_IMPL_TYPE env var
- For GptOssForCausalLM, 'auto' resolves to 'vllm' for better performance
- For all other architectures, 'auto' resolves to 'flax_nnx'
- Add _VLLM_REQUIRED_ARCHITECTURES frozenset in model_loader.py
- Use match/case pattern in get_model() for implementation selection
- Add tests for 'auto' resolution behavior

Signed-off-by: Xing Liu <xingliu14@gmail.com>
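Taken together with the earlier review discussion about resolving 'auto' before the match/case, the change could look roughly like the minimal sketch below. This is not the actual model_loader.py code: resolve_impl_type and the placeholder get_vllm_model / get_flax_model stubs are hypothetical stand-ins for the real loaders, and only MODEL_IMPL_TYPE, _VLLM_REQUIRED_ARCHITECTURES, get_model, and GptOssForCausalLM are names taken from this PR.

```python
import os

# Architectures that should default to the vLLM implementation (per this PR).
_VLLM_REQUIRED_ARCHITECTURES = frozenset({"GptOssForCausalLM"})


def get_vllm_model(architecture: str):
    # Placeholder for the real vLLM model loader.
    return f"vllm model for {architecture}"


def get_flax_model(architecture: str):
    # Placeholder for the real flax_nnx model loader.
    return f"flax_nnx model for {architecture}"


def resolve_impl_type(impl_type: str, architecture: str) -> str:
    """Resolve 'auto' to a concrete implementation before dispatching."""
    if impl_type != "auto":
        return impl_type
    return "vllm" if architecture in _VLLM_REQUIRED_ARCHITECTURES else "flax_nnx"


def get_model(architecture: str):
    # MODEL_IMPL_TYPE defaults to 'auto', which is resolved up front so the
    # match/case below never needs a separate 'auto' branch.
    impl_type = resolve_impl_type(
        os.getenv("MODEL_IMPL_TYPE", "auto"), architecture)
    match impl_type:
        case "vllm":
            return get_vllm_model(architecture)
        case "flax_nnx":
            return get_flax_model(architecture)
        case _:
            raise ValueError(f"Unsupported MODEL_IMPL_TYPE: {impl_type}")
```

Resolving first keeps a single call site for each loader, which addresses the duplication concern raised in the review above.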
|
https://github.com/vllm-project/tpu-inference/actions/runs/20125295806/job/57753569146?pr=1255 Please fix the pre-commit failure. |
|
Thank you so much for making this feature! |
Description
- 'auto' as default value for MODEL_IMPL_TYPE env var
- For GptOssForCausalLM, 'auto' resolves to 'vllm' for better performance
- For all other architectures, 'auto' resolves to 'flax_nnx' for better performance

Tests
pytest
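The commit message mentions tests for the 'auto' resolution behavior. A hedged sketch of what such pytest cases might look like follows, reusing the hypothetical resolve_impl_type helper from the earlier sketch; the real tests in this PR may target different functions, and 'LlamaForCausalLM' is just an example of a non-GPT-OSS architecture.

```python
import pytest

_VLLM_REQUIRED_ARCHITECTURES = frozenset({"GptOssForCausalLM"})


def resolve_impl_type(impl_type: str, architecture: str) -> str:
    # Same hypothetical helper as in the sketch above.
    if impl_type != "auto":
        return impl_type
    return "vllm" if architecture in _VLLM_REQUIRED_ARCHITECTURES else "flax_nnx"


@pytest.mark.parametrize(
    "architecture, expected",
    [
        ("GptOssForCausalLM", "vllm"),     # vLLM-required architecture
        ("LlamaForCausalLM", "flax_nnx"),  # any other architecture
    ],
)
def test_auto_resolves_by_architecture(architecture, expected):
    assert resolve_impl_type("auto", architecture) == expected


def test_explicit_value_bypasses_auto_resolution():
    # An explicitly set MODEL_IMPL_TYPE should be returned unchanged.
    assert resolve_impl_type("flax_nnx", "GptOssForCausalLM") == "flax_nnx"
```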
Checklist
Before submitting this PR, please make sure: