Align chat completions endpoint with vLLM #4063
Conversation
Signed-off-by: Keshav Santhanam <ksanthanam@nvidia.com>
.../core/inference/text_generation_server/dynamic_text_gen_server/endpoints/chat_completions.py
Signed-off-by: Keshav Santhanam <ksanthanam@nvidia.com>
  return self._tokenizer.text_to_ids(text)

- def detokenize(self, ids: List[int]) -> str:
+ def detokenize(self, ids: List[int], skip_special_tokens: bool = True) -> str:
The TikToken tokenizer defaults this to False, the HuggingFace tokenizer to None, and this one to True. Let's default all three to False and set it to True in the parts of the code where you need it, WDYT?
Thanks for the suggestion! I looked into this but I think changing the default values isn't safe without also updating all the backend ids_to_text implementations to accept skip_special_tokens, which is perhaps too invasive for this PR. I can add a TODO to revisit this in a follow-up PR if that works?
Actually I ended up making the default None here as well, so at least that aligns with the Hugging Face tokenizer now.
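The None-default approach described above could look roughly like the following sketch. This is my illustration, not the PR's actual code: `TokenizerAdapter` is a hypothetical wrapper name, and the only assumption taken from the thread is that the flag is forwarded to `ids_to_text` as `remove_special_tokens` only when explicitly set.

```python
from typing import List, Optional


class TokenizerAdapter:
    """Hypothetical wrapper illustrating the None-default approach."""

    def __init__(self, tokenizer):
        self._tokenizer = tokenizer

    def detokenize(self, ids: List[int], skip_special_tokens: Optional[bool] = None) -> str:
        # Only forward the flag when it is explicitly set, so backends
        # whose ids_to_text does not take the parameter keep their
        # existing default behavior.
        if skip_special_tokens is None:
            return self._tokenizer.ids_to_text(ids)
        return self._tokenizer.ids_to_text(ids, remove_special_tokens=skip_special_tokens)
```

With this default, callers that never touch `skip_special_tokens` exercise exactly the old code path, which is what makes the change backward-compatible.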
Can we also rename this argument to remove_special_tokens?
So I think Hugging Face actually uses skip_special_tokens in their API: https://huggingface.co/docs/transformers/main_classes/tokenizer#transformers.PythonBackend.decode.skip_special_tokens
But we only use remove_special_tokens in the Megatron tokenizer library, so perhaps the better change is to rename remove_special_tokens to skip_special_tokens?
/claude review

Do we need to add any tests for this PR?
  """

- return self._tokenizer.ids_to_text(ids)
+ return self._tokenizer.ids_to_text(ids, remove_special_tokens=skip_special_tokens)
Will this break with some tokenizers like NullTokenizer?
Yes, thanks for catching this. There was code in other places that gated this argument on whether the function actually has the parameter, which I've now refactored into the accepts_parameter function. The code should now be safe for all tokenizers (the remove_special_tokens argument is simply ignored if the parameter does not exist), and I added a unit test to verify this as well.
I also changed the default value for the skip_special_tokens argument to None so that we only pass it if it's explicitly set.
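The gating helper mentioned above might look roughly like this. The name accepts_parameter comes from the comment, but the body here is my sketch built on Python's standard inspect module; the PR's actual implementation may differ.

```python
import inspect


def accepts_parameter(func, name: str) -> bool:
    """Return True if `func` can accept a keyword argument called `name`."""
    try:
        params = inspect.signature(func).parameters
    except (TypeError, ValueError):
        # Some builtins expose no signature; be conservative and say no.
        return False
    if name in params:
        return True
    # A **kwargs catch-all also accepts the keyword.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())
```

A caller would then pass remove_special_tokens only when `accepts_parameter(tokenizer.ids_to_text, "remove_special_tokens")` is true, so tokenizers like NullTokenizer that lack the parameter are left untouched.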
Signed-off-by: Keshav Santhanam <ksanthanam@nvidia.com>
Added a unit test
Signed-off-by: Keshav Santhanam <ksanthanam@nvidia.com>
/ok to test 47795d5
Signed-off-by: Keshav Santhanam <ksanthanam@nvidia.com>
/ok to test dae3dcb
/ok to test 9df444b
🔄 Merge queue validation started! You can track the progress here: https://github.com/NVIDIA/Megatron-LM/actions/runs/23884093133
🔄 Merge queue validation started! You can track the progress here: https://github.com/NVIDIA/Megatron-LM/actions/runs/23886597078
What does this PR do?
Aligns chat completions endpoint with vLLM.
Contribution process
Pre-checks
Code review
Feel free to message or comment @mcore-oncall to help accelerate your merge into main. The less complex your PR is, the faster it will be approved and merged!
All PRs start as draft. If you open a non-draft PR, it will be automatically converted to draft.
Step 1: Mark PR as "Ready for Review"
.github/CODEOWNERS. Final Review might get declined if these requirements are not fulfilled.
Step 2: Final Review
For PRs that change megatron/core, once all expert reviewers have approved, the Final Review label is applied automatically and final reviewers are assigned. For PRs outside megatron/core, this step is skipped.
Step 3: Approved
Once all required reviewers have approved, the Approved label is applied automatically.
Merge
Any member of mcore-engineers will be able to merge your PR.
For MRs into `dev` branch
The proposed review process for the `dev` branch is under active discussion. MRs are mergeable after one approval by either eharper@nvidia.com or zijiey@nvidia.com.