[feature] sglang support #955
Merged
Conversation
wheresmyhair (Collaborator) commented on Nov 23, 2025
- Supports the sglang inference backend
- Bumps the lmflow version to 1.1.0
research4pan (Contributor) requested changes on Nov 24, 2025
A unit test is needed.
Main feature
- Update `datasets` to 3.6.0
- `setup.py`: add `sglang` as an optional dependency
- `--inference_engine` is for choosing `vllm`
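A minimal sketch of what the new engine selector could look like as a dataclass argument. The field name `inference_engine` comes from the PR description, but the class name, default value, and allowed choices here are assumptions for illustration, not copied from `src/lmflow/args.py`:

```python
from dataclasses import dataclass, field


@dataclass
class InferencerArguments:
    """Sketch of the argument container; not the actual LMFlow class."""

    # Selects the inference backend; choices and default are assumptions.
    inference_engine: str = field(
        default="vllm",
        metadata={"help": "Inference backend, e.g. 'vllm' or 'sglang'."},
    )

    def __post_init__(self):
        allowed = {"vllm", "sglang"}
        if self.inference_engine not in allowed:
            raise ValueError(f"inference_engine must be one of {allowed}")
```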
Details
- `src/lmflow/args.py`: `inference_tensor_parallel_size` and `inference_gpu_memory_utilization` are the latest arguments for both `vllm` and `sglang`; `vllm_tensor_parallel_size` and `vllm_gpu_memory_utilization` will soon be deprecated (but are still usable in this version, with higher priority)
- `src/lmflow/models/hf_decoder_model.py`:
  - Add `@deprecated_args` to translate deprecated args to the latest args automatically (improves backward compatibility; also in `src/lmflow/utils/deprecated.py`)
  - Add the corresponding logic for sglang
  - line 539: `bos_token` -> `""` for `sglang`, following sglang's logic
- Add
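The deprecated-argument translation described above could work roughly as follows. This is a hedged sketch, not the actual implementation in `src/lmflow/utils/deprecated.py`; the keyword-mapping interface and the example function `start_engine` are assumptions. It does preserve the behavior stated in the PR: the deprecated name still works and takes higher priority when both names are supplied:

```python
import functools
import warnings


def deprecated_args(**alias_map):
    """Map deprecated keyword args to their replacements.

    alias_map: {deprecated_name: new_name}. Sketch only; the real
    decorator lives in src/lmflow/utils/deprecated.py.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for old, new in alias_map.items():
                if old in kwargs:
                    warnings.warn(
                        f"'{old}' is deprecated; use '{new}' instead.",
                        DeprecationWarning,
                        stacklevel=2,
                    )
                    # Deprecated arg wins over the new one ("higher priority").
                    kwargs[new] = kwargs.pop(old)
            return func(*args, **kwargs)
        return wrapper
    return decorator


@deprecated_args(vllm_tensor_parallel_size="inference_tensor_parallel_size")
def start_engine(inference_tensor_parallel_size=1):
    # Hypothetical consumer of the translated argument.
    return inference_tensor_parallel_size
```

Calling `start_engine(vllm_tensor_parallel_size=4)` emits a `DeprecationWarning` and forwards the value to `inference_tensor_parallel_size`.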
Question
- `src/lmflow/models/hf_decoder_model.py`, line 539: `bos_token` -> `""`. This may shift the prompt distribution; please confirm its correctness.
- `src/lmflow/models/hf_model_mixin.py`, lines 555-559: can be removed if not necessary.
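The line-539 behavior under discussion can be sketched as a backend-conditional prompt builder. This is an assumption based on the review note, not the actual code: the idea would be that the BOS string is dropped for `sglang` because that backend handles tokenization itself, which is exactly why the reviewer asks whether the prompt distribution changes:

```python
def build_prompt(text: str, bos_token: str, backend: str) -> str:
    """Hypothetical illustration of the bos_token -> "" behavior.

    For sglang the BOS string is replaced with "" (per the PR); other
    backends keep the tokenizer's BOS token prepended.
    """
    bos = "" if backend == "sglang" else bos_token
    return f"{bos}{text}"
```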
Suggestions
- Add instructions to `README.md` on how to use the sglang inferencer
- Add unit tests for vllm and SGLang inference:
  - compare against the original vllm and SGLang with a small example
  - with a small example, compare against Hugging Face generation and check the inference probabilities
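The suggested consistency test could be shaped like the following sketch. Everything here is hypothetical scaffolding, not LMFlow API: `generate_with` is a stand-in for deterministically (greedy) decoding a tiny prompt with each backend, and the hard-coded outputs only exist so the sketch is self-contained:

```python
import unittest


class TestBackendConsistency(unittest.TestCase):
    """Sketch of the suggested test: backends should agree on a tiny example."""

    PROMPT = "The capital of France is"

    def generate_with(self, engine: str) -> str:
        # Placeholder: a real test would run greedy decoding on PROMPT
        # with the chosen backend (hf / vllm / sglang) and the same model.
        return {"hf": " Paris", "vllm": " Paris", "sglang": " Paris"}[engine]

    def test_backends_match_hf(self):
        # Hugging Face generation is the reference output.
        hf_out = self.generate_with("hf")
        self.assertEqual(self.generate_with("vllm"), hf_out)
        self.assertEqual(self.generate_with("sglang"), hf_out)
```

A real version would additionally compare per-token log-probabilities within a tolerance, per the reviewer's "check inference probability" suggestion.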
research4pan (Contributor) approved these changes on Nov 25, 2025
Unit tests added. LGTM
