
Align chat completions endpoint with vLLM#4063

Merged
asolergi-nv merged 12 commits into NVIDIA:main from santhnm2:chat_completions_fixes
Apr 2, 2026

Conversation

@santhnm2 (Contributor)

What does this PR do?

Aligns the chat completions endpoint with vLLM.

Contribution process

Pre-checks

  • I have added relevant unit tests
  • I have added relevant functional tests
  • I have added proper typing to my code (see Typing guidelines)
  • I have added relevant documentation
  • I have run the autoformatter.sh on my PR

Code review

Feel free to message or comment @mcore-oncall to help accelerate your merge into main. The less complex your PR is, the faster it will be approved and merged!

All PRs start as draft. If you open a non-draft PR, it will be automatically converted to draft.

Step 1: Mark PR as "Ready for Review"

  1. When your PR is ready, click Ready for Review.
  2. An oncall reviewer is auto-assigned and expert reviewers are notified based on your changes.
    • Some PRs may jump straight to step 2. This is determined by .github/CODEOWNERS.

⚠️ Only mark as ready once merge-conflicts are resolved and the CI is passing.
Final Review might get declined if these requirements are not fulfilled.

Step 2: Final Review

For PRs that change megatron/core, once all expert reviewers have approved, the Final Review label is applied automatically and final reviewers are assigned.

For PRs outside megatron/core, this step is skipped.

Step 3: Approved

Once all required reviewers have approved, the Approved label is applied automatically.

Merge

Any member of mcore-engineers will be able to merge your PR.

For MRs into the `dev` branch

The proposed review process for the `dev` branch is under active discussion.

MRs are mergeable after one approval by either eharper@nvidia.com or zijiey@nvidia.com.

Signed-off-by: Keshav Santhanam <ksanthanam@nvidia.com>
copy-pr-bot bot commented Mar 30, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@santhnm2 santhnm2 marked this pull request as ready for review March 31, 2026 19:50
@santhnm2 santhnm2 requested review from a team as code owners March 31, 2026 19:50
@svcnvidia-nemo-ci svcnvidia-nemo-ci requested a review from a team March 31, 2026 19:50
  return self._tokenizer.text_to_ids(text)

- def detokenize(self, ids: List[int]) -> str:
+ def detokenize(self, ids: List[int], skip_special_tokens: bool = True) -> str:
Contributor
The TikToken tokenizer defaults to False, the Hugging Face tokenizer to None, and this one to True. Let's default all three to False and set it to True only in the parts of the code that require it, WDYT?

Contributor Author

Thanks for the suggestion! I looked into this but I think changing the default values isn't safe without also updating all the backend ids_to_text implementations to accept skip_special_tokens, which is perhaps too invasive for this PR. I can add a TODO to revisit this in a follow-up PR if that works?

Contributor Author

Actually I ended up making the default None here as well, so at least that aligns with the Hugging Face tokenizer now.
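As a sketch of what that change could look like (the wrapper class here is illustrative, not the actual Megatron class):

```python
from typing import List, Optional


class TokenizerWrapper:
    """Illustrative stand-in for the Megatron tokenizer wrapper."""

    def __init__(self, tokenizer):
        self._tokenizer = tokenizer

    def detokenize(self, ids: List[int], skip_special_tokens: Optional[bool] = None) -> str:
        # Defaulting to None matches the Hugging Face tokenizer: the backend's
        # own default applies unless the caller sets the flag explicitly.
        if skip_special_tokens is None:
            return self._tokenizer.ids_to_text(ids)
        return self._tokenizer.ids_to_text(ids, remove_special_tokens=skip_special_tokens)
```

With a None default, existing callers that never pass the flag see no behavior change at all.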

  return self._tokenizer.text_to_ids(text)

- def detokenize(self, ids: List[int]) -> str:
+ def detokenize(self, ids: List[int], skip_special_tokens: bool = True) -> str:
Contributor

Can we also rename this argument to remove_special_tokens?

Contributor Author

So I think Hugging Face actually uses skip_special_tokens in their API: https://huggingface.co/docs/transformers/main_classes/tokenizer#transformers.PythonBackend.decode.skip_special_tokens
But we're using remove_special_tokens just in the Megatron tokenizer library. Perhaps the better change is to update remove_special_tokens to skip_special_tokens?

@asolergi-nv (Contributor)

/claude review

@ericharper (Contributor)

Do we need to add any tests for this PR?

  """

- return self._tokenizer.ids_to_text(ids)
+ return self._tokenizer.ids_to_text(ids, remove_special_tokens=skip_special_tokens)
@ericharper (Contributor) commented Apr 1, 2026

Will this break with some tokenizers like NullTokenizer?

Contributor Author

Yes, thanks for catching this. There was code in other places that gated this argument on whether the function actually has the parameter; I've now refactored that into the accepts_parameter function. The code should now be safe for all tokenizers (the remove_special_tokens argument is simply ignored if the parameter does not exist), and I added a unit test to verify this.
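A minimal sketch of the gating helper described here (the real accepts_parameter may differ in signature and location):

```python
import inspect
from typing import Callable


def accepts_parameter(func: Callable, name: str) -> bool:
    """Check whether `func` declares a parameter called `name`.

    This lets callers forward remove_special_tokens only to backends whose
    ids_to_text actually takes it, so tokenizers such as NullTokenizer that
    lack the parameter are called without it instead of raising TypeError.
    """
    try:
        return name in inspect.signature(func).parameters
    except (TypeError, ValueError):
        # Some builtins and C extensions do not expose an inspectable signature.
        return False
```

The call site would then pass the keyword argument only when this check returns True.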

Contributor Author

I also changed the default value for the skip_special_tokens argument to None so that we only pass it if it's explicitly set.

@santhnm2 (Contributor Author)

santhnm2 commented Apr 1, 2026

Do we need to add any tests for this PR?

Added a unit test test_hf_detokenize_skip_special_tokens to tests/unit_tests/tokenizers/test_tokenizer.py

Signed-off-by: Keshav Santhanam <ksanthanam@nvidia.com>
Signed-off-by: Keshav Santhanam <ksanthanam@nvidia.com>
@santhnm2 (Contributor Author)

santhnm2 commented Apr 1, 2026

/ok to test 47795d5

@santhnm2 (Contributor Author)

santhnm2 commented Apr 1, 2026

/ok to test dae3dcb

@svcnvidia-nemo-ci svcnvidia-nemo-ci added the Final Review label Apr 1, 2026
@santhnm2 (Contributor Author)

santhnm2 commented Apr 1, 2026

/ok to test 9df444b

@svcnvidia-nemo-ci svcnvidia-nemo-ci added the Approved label and removed the Final Review label Apr 1, 2026
@asolergi-nv asolergi-nv added this pull request to the merge queue Apr 2, 2026
@svcnvidia-nemo-ci

🔄 Merge queue validation started!

You can track the progress here: https://github.com/NVIDIA/Megatron-LM/actions/runs/23884093133

@svcnvidia-nemo-ci

🔄 Merge queue validation started!

You can track the progress here: https://github.com/NVIDIA/Megatron-LM/actions/runs/23886597078

Merged via the queue into NVIDIA:main with commit cb3bb41 Apr 2, 2026
130 of 133 checks passed

Labels: Approved (All necessary approvals have been made), complexity: medium, Run functional tests

6 participants