Summary
The streamed token decode path should treat completion usage metadata as optional on final or error chunks.
Problem
The streamed decode handler assumes final chunks always contain usage metadata fields such as:
prompt_tokens
completion_tokens
cached_tokens
Under heavy generation load, some final or error chunks can omit those fields.
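The failing pattern can be sketched as follows. This is a minimal illustration, not the SGLang code: the helper `build_completion_usage_strict` and the flat dict layout of `chunk` are hypothetical. Only the field names come from this issue. Indexing each field directly means any final or error chunk that lacks usage metadata raises `KeyError`.

```python
from typing import Any, Dict


def build_completion_usage_strict(chunk: Dict[str, Any]) -> Dict[str, int]:
    # Hypothetical helper: assumes every final chunk carries all usage fields.
    # Direct indexing raises KeyError as soon as one field is missing.
    return {
        "prompt_tokens": chunk["prompt_tokens"],
        "completion_tokens": chunk["completion_tokens"],
        "cached_tokens": chunk["cached_tokens"],
    }


# Hypothetical final chunk that does carry usage metadata.
full_chunk = {"finished": True, "prompt_tokens": 12, "completion_tokens": 34, "cached_tokens": 0}
print(build_completion_usage_strict(full_chunk))

# Hypothetical error chunk emitted under load without usage metadata.
error_chunk = {"finished": True, "error": "aborted"}
try:
    build_completion_usage_strict(error_chunk)
except KeyError as exc:
    print(f"KeyError: {exc}")
```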
Impact
The worker raises a KeyError while building completion usage metadata.
The exception then propagates as a backend error and surfaces as an HTTP 500 response on the frontend.
Desired behavior
The decode handler should emit completion_usage only when the required usage fields are present, instead of treating them as mandatory.
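One way to express the desired behavior is shown below. This is a minimal sketch and not the actual SGLang fix. The functions `build_completion_usage` and `handle_final_chunk`, the flat `chunk` dict, and the response shape are all hypothetical. The sketch also assumes that all three fields listed above are the required ones.

```python
from typing import Any, Dict, Optional

# Field names from the issue; treating all three as required is an assumption.
USAGE_FIELDS = ("prompt_tokens", "completion_tokens", "cached_tokens")


def build_completion_usage(chunk: Dict[str, Any]) -> Optional[Dict[str, int]]:
    """Return usage metadata if every required field is present, else None."""
    # Missing fields mean "no usage to report", not an error.
    if not all(field in chunk for field in USAGE_FIELDS):
        return None
    return {field: chunk[field] for field in USAGE_FIELDS}


def handle_final_chunk(chunk: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical decode-handler step that builds the outgoing payload."""
    response: Dict[str, Any] = {"finished": True}
    usage = build_completion_usage(chunk)
    # Attach completion_usage only when it could be built from the chunk.
    if usage is not None:
        response["completion_usage"] = usage
    return response


print(handle_final_chunk({"prompt_tokens": 5, "completion_tokens": 7, "cached_tokens": 2}))
print(handle_final_chunk({"error": "aborted"}))
```

The helper returns `None` instead of zero-filled counts so the handler never reports usage numbers that the backend did not actually provide.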
Scope
This is a source-tree issue in the SGLang streamed token decode path.
Validation target
Final or error chunks that omit usage metadata should still be processed without worker exceptions and without frontend 500 responses caused by missing usage keys.