
Conversation

@xingliu14 (Collaborator) commented on Dec 5, 2025:

Description

  • Add 'auto' as the default value for the MODEL_IMPL_TYPE env var
  • For GptOssForCausalLM, 'auto' resolves to 'vllm' for better performance
  • For all other architectures, 'auto' resolves to 'flax_nnx' for better performance (the resolution rule is sketched below)
  • Add tests for the 'auto' resolution behavior
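
A minimal sketch of the resolution rule described above. The function name resolve_model_impl_type is illustrative; MODEL_IMPL_TYPE, GptOssForCausalLM, 'vllm', and 'flax_nnx' come from this PR, and _VLLM_PREFERRED_ARCHITECTURES appears in the pre-commit diff later in this thread:

```python
import os

# Architectures that prefer the vLLM PyTorch backend when MODEL_IMPL_TYPE
# is 'auto' (per this PR, only GptOssForCausalLM for now).
_VLLM_PREFERRED_ARCHITECTURES: frozenset[str] = frozenset(
    {"GptOssForCausalLM"})


def resolve_model_impl_type(architecture: str) -> str:
    """Map the 'auto' default to a concrete implementation type."""
    impl_type = os.environ.get("MODEL_IMPL_TYPE", "auto").lower()
    if impl_type != "auto":
        return impl_type
    if architecture in _VLLM_PREFERRED_ARCHITECTURES:
        return "vllm"
    return "flax_nnx"
```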

Tests

pytest tests/test_envs.py tests/models/common/test_model_loader.py

Checklist

Before submitting this PR, please make sure:

  • I have performed a self-review of my code.
  • I have added necessary comments in my code, particularly in hard-to-understand areas.
  • I have made or will make corresponding changes to any relevant documentation.

@xingliu14 (Collaborator, Author):

@kyuyeunk Please review.

@kyuyeunk (Collaborator) left a comment:

Wouldn't it be possible to move 'auto' into the match/case as well?

@xingliu14 (Collaborator, Author):

It is possible to move it into the match/case, but then the code would be duplicated, including get_vllm_model, get_flax_model, and the fallback check. I think resolving first and then using the same code path is cleaner.
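
For illustration, a sketch of the "resolve first, then dispatch once" shape described here. get_vllm_model and get_flax_model are the loader names from the comment; the surrounding structure is assumed, and resolve_model_impl_type is from the sketch above:

```python
def get_model(architecture: str):
    # Resolving 'auto' up front keeps the match/case flat: each loader
    # and the fallback check appear exactly once.
    impl_type = resolve_model_impl_type(architecture)
    match impl_type:
        case "vllm":
            return get_vllm_model(architecture)
        case "flax_nnx":
            return get_flax_model(architecture)
        case _:
            raise ValueError(f"Unknown MODEL_IMPL_TYPE: {impl_type!r}")
```

Handling 'auto' as an extra case arm instead would repeat both loader calls inside that arm.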

@kyuyeunk added the `ready` label (ONLY add when PR is ready to merge/full CI is needed) on Dec 9, 2025
@kyuyeunk (Collaborator):

I don't see any update. Can you verify that the changes were pushed?

@kyuyeunk (Collaborator) left a comment:

LGTM. Thank you for working on it!

@kyuyeunk (Collaborator):

Seems like CI is failing. Have you rebased the branch onto HEAD?

@xingliu14 (Collaborator, Author):

I believe so; let me look into it.

@xingliu14 (Collaborator, Author):

Looks like tests/layers/vllm/test_awq.py is failing on the main branch as well.

@kyuyeunk (Collaborator):

Seems like it's due to an upstream change. Let me create a quick fix for this.

@kyuyeunk (Collaborator):

Please wait until this is merged: #1284

@kyuyeunk (Collaborator):

The PR has been merged. Please update the branch and try again.

- Add 'auto' as default value for MODEL_IMPL_TYPE env var
- For GptOssForCausalLM, 'auto' resolves to 'vllm' for better performance
- For all other architectures, 'auto' resolves to 'flax_nnx'
- Add _VLLM_PREFERRED_ARCHITECTURES frozenset in model_loader.py
- Use match/case pattern in get_model() for implementation selection
- Add tests for 'auto' resolution behavior

Signed-off-by: Xing Liu <xingliu14@gmail.com>
@kyuyeunk (Collaborator):

https://github.com/vllm-project/tpu-inference/actions/runs/20125295806/job/57753569146?pr=1255

pre-commit hook(s) made changes.
If you are seeing this message in CI, reproduce locally with: `pre-commit run --all-files`.
To run `pre-commit` as part of git workflow, use `pre-commit install`.
All changes made by hooks:
diff --git a/tpu_inference/models/common/model_loader.py b/tpu_inference/models/common/model_loader.py
index f260e5f..fb035bd 100644
--- a/tpu_inference/models/common/model_loader.py
+++ b/tpu_inference/models/common/model_loader.py
@@ -27,7 +27,8 @@ _MODEL_REGISTRY = {}
 # Architectures that prefer "vllm" implementation type when MODEL_IMPL_TYPE is "auto".
 # These architectures are listed here because they have better performance with the
 # vLLM PyTorch backend compared to the flax_nnx JAX backend for now.
-_VLLM_PREFERRED_ARCHITECTURES: frozenset[str] = frozenset({"GptOssForCausalLM"})
+_VLLM_PREFERRED_ARCHITECTURES: frozenset[str] = frozenset(
+    {"GptOssForCausalLM"})
 
 
 class UnsupportedArchitectureError(ValueError):
Error: Process completed with exit code 1.

Please fix the pre-commit failure.

Signed-off-by: Xing Liu <xingliu14@gmail.com>
@kyuyeunk kyuyeunk merged commit 9919cfb into vllm-project:main Dec 11, 2025
40 checks passed
@kyuyeunk (Collaborator):

Thank you so much for making this feature!

@xingliu14 deleted the env_var branch on December 11, 2025 at 16:38