
Align chat completions endpoint with vLLM#4063

Merged
asolergi-nv merged 12 commits into NVIDIA:main from santhnm2:chat_completions_fixes
Apr 2, 2026

Conversation

@santhnm2 (Contributor)

What does this PR do?

Aligns the chat completions endpoint with vLLM.

Contribution process

Pre-checks

  • I have added relevant unit tests
  • I have added relevant functional tests
  • I have added proper typing to my code (see Typing guidelines)
  • I have added relevant documentation
  • I have run the autoformatter.sh on my PR

Code review

Feel free to message or comment @mcore-oncall to help accelerate your merge into main. The less complex your PR is, the faster it will be approved and merged!

All PRs start as draft. If you open a non-draft PR, it will be automatically converted to draft.

Step 1: Mark PR as "Ready for Review"

  1. When your PR is ready, click Ready for Review.
  2. An oncall reviewer is auto-assigned and expert reviewers are notified based on your changes.
    • Some PRs may jump straight to step 2. This is determined by .github/CODEOWNERS.

⚠️ Only mark as ready once merge-conflicts are resolved and the CI is passing.
Final Review might get declined if these requirements are not fulfilled.

Step 2: Final Review

For PRs that change megatron/core, once all expert reviewers have approved, the Final Review label is applied automatically and final reviewers are assigned.

For PRs outside megatron/core, this step is skipped.

Step 3: Approved

Once all required reviewers have approved, the Approved label is applied automatically.

Merge

Any member of mcore-engineers will be able to merge your PR.

For MRs into the `dev` branch

The proposed review process for the `dev` branch is under active discussion.

MRs are mergeable after one approval by either eharper@nvidia.com or zijiey@nvidia.com.

Signed-off-by: Keshav Santhanam <ksanthanam@nvidia.com>
copy-pr-bot bot commented Mar 30, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@santhnm2 santhnm2 marked this pull request as ready for review March 31, 2026 19:50
@santhnm2 santhnm2 requested review from a team as code owners March 31, 2026 19:50
@svcnvidia-nemo-ci svcnvidia-nemo-ci requested a review from a team March 31, 2026 19:50
  return self._tokenizer.text_to_ids(text)

- def detokenize(self, ids: List[int]) -> str:
+ def detokenize(self, ids: List[int], skip_special_tokens: bool = True) -> str:
Contributor
The TikToken tokenizer defaults to False, the Hugging Face tokenizer to None, and this one to True. Let's default all three to False and set it to True only in the parts of the code that require it, WDYT?

Contributor Author

Thanks for the suggestion! I looked into this but I think changing the default values isn't safe without also updating all the backend ids_to_text implementations to accept skip_special_tokens, which is perhaps too invasive for this PR. I can add a TODO to revisit this in a follow-up PR if that works?

Contributor Author

Actually I ended up making the default None here as well, so at least that aligns with the Hugging Face tokenizer now.
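As a sketch of what that change could look like (the wrapper class here is illustrative, not the actual Megatron class):

```python
from typing import List, Optional


class TokenizerWrapper:
    """Illustrative stand-in for the Megatron tokenizer wrapper."""

    def __init__(self, tokenizer):
        self._tokenizer = tokenizer

    def detokenize(self, ids: List[int], skip_special_tokens: Optional[bool] = None) -> str:
        # Defaulting to None matches the Hugging Face tokenizer: the backend's
        # own default applies unless the caller sets the flag explicitly.
        if skip_special_tokens is None:
            return self._tokenizer.ids_to_text(ids)
        return self._tokenizer.ids_to_text(ids, remove_special_tokens=skip_special_tokens)
```

With a None default, existing callers that never pass the flag see no behavior change at all.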

  return self._tokenizer.text_to_ids(text)

- def detokenize(self, ids: List[int]) -> str:
+ def detokenize(self, ids: List[int], skip_special_tokens: bool = True) -> str:
Contributor

Can we also rename this argument to remove_special_tokens?

Contributor Author

So I think Hugging Face actually uses skip_special_tokens in their API: https://huggingface.co/docs/transformers/main_classes/tokenizer#transformers.PythonBackend.decode.skip_special_tokens
But we're using remove_special_tokens just in the Megatron tokenizer library. Perhaps the better change is to update remove_special_tokens to skip_special_tokens?

@asolergi-nv (Contributor)

/claude review

@ericharper (Contributor)

Do we need to add any tests for this PR?

  """

- return self._tokenizer.ids_to_text(ids)
+ return self._tokenizer.ids_to_text(ids, remove_special_tokens=skip_special_tokens)
@ericharper (Contributor) commented Apr 1, 2026

Will this break with some tokenizers like NullTokenizer?

Contributor Author

Yes, thanks for catching this. There was code in other places that gated this argument on whether the function actually has the parameter; I've now refactored that into the accepts_parameter function. The code should now be safe for all tokenizers (the remove_special_tokens argument is simply ignored if the parameter does not exist), and I added a unit test to verify this.
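A minimal sketch of the gating helper described here (the real accepts_parameter may differ in signature and location):

```python
import inspect
from typing import Callable


def accepts_parameter(func: Callable, name: str) -> bool:
    """Check whether `func` declares a parameter called `name`.

    This lets callers forward remove_special_tokens only to backends whose
    ids_to_text actually takes it, so tokenizers such as NullTokenizer that
    lack the parameter are called without it instead of raising TypeError.
    """
    try:
        return name in inspect.signature(func).parameters
    except (TypeError, ValueError):
        # Some builtins and C extensions do not expose an inspectable signature.
        return False
```

The call site would then pass the keyword argument only when this check returns True.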

Contributor Author

I also changed the default value for the skip_special_tokens argument to None so that we only pass it if it's explicitly set.

@santhnm2 (Contributor Author)

santhnm2 commented Apr 1, 2026

Do we need to add any tests for this PR?

Added a unit test test_hf_detokenize_skip_special_tokens to tests/unit_tests/tokenizers/test_tokenizer.py

Signed-off-by: Keshav Santhanam <ksanthanam@nvidia.com>
Signed-off-by: Keshav Santhanam <ksanthanam@nvidia.com>
@santhnm2 (Contributor Author)

santhnm2 commented Apr 1, 2026

/ok to test 47795d5

@santhnm2 (Contributor Author)

santhnm2 commented Apr 1, 2026

/ok to test dae3dcb

@svcnvidia-nemo-ci svcnvidia-nemo-ci added the Final Review label Apr 1, 2026
@santhnm2 (Contributor Author)

santhnm2 commented Apr 1, 2026

/ok to test 9df444b

@svcnvidia-nemo-ci svcnvidia-nemo-ci added the Approved label and removed the Final Review label Apr 1, 2026
@asolergi-nv asolergi-nv added this pull request to the merge queue Apr 2, 2026
@svcnvidia-nemo-ci

🔄 Merge queue validation started!

You can track the progress here: https://github.com/NVIDIA/Megatron-LM/actions/runs/23884093133

@svcnvidia-nemo-ci

🔄 Merge queue validation started!

You can track the progress here: https://github.com/NVIDIA/Megatron-LM/actions/runs/23886597078

Merged via the queue into NVIDIA:main with commit cb3bb41 Apr 2, 2026
130 of 133 checks passed

Labels: Approved (All necessary approvals have been made), complexity: medium, Run functional tests

6 participants