Streamed final chunks should not require usage metadata fields #5

@mvillmow

Description

Summary

The streamed token decode path should treat completion usage metadata as optional on final or error chunks.

Problem

The streamed decode handler assumes final chunks always contain usage metadata fields such as:

  • prompt_tokens
  • completion_tokens
  • cached_tokens

Under heavy generation load, some final or error chunks can omit those fields.

Impact

The worker raises a KeyError while building the completion usage metadata.
The exception propagates as a backend error and reaches clients as an HTTP 500 response from the frontend.
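A minimal sketch of the failure mode, assuming each streamed chunk is a plain dict whose usage counters live under a `meta_info` key. The chunk shape and `build_usage_strict` are hypothetical illustrations, not SGLang's actual code. Indexing the counters directly raises KeyError as soon as a final chunk arrives without them:

```python
from typing import Any

# Hypothetical chunk shape for this sketch: {"text": ..., "finish_reason": ..., "meta_info": {...}}
USAGE_KEYS = ("prompt_tokens", "completion_tokens", "cached_tokens")


def build_usage_strict(chunk: dict[str, Any]) -> dict[str, int]:
    """Hypothetical builder that assumes every usage counter is present (the fragile behavior)."""
    meta = chunk["meta_info"]
    # Direct indexing: any missing counter raises KeyError.
    return {key: meta[key] for key in USAGE_KEYS}


# A final chunk that carries no usage counters, as described above.
final_chunk_without_usage = {"text": "", "finish_reason": "abort", "meta_info": {}}

try:
    build_usage_strict(final_chunk_without_usage)
except KeyError as exc:
    print(f"KeyError: {exc}")  # prints "KeyError: 'prompt_tokens'"
```

In the real worker nothing catches this exception at the point of failure, which is how it escalates into the backend error and frontend 500 described above.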

Desired behavior

The decode handler should emit completion_usage only when the required usage fields are present, instead of treating them as mandatory.
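One possible shape for the tolerant path, sketched under stated assumptions: chunks are plain dicts with counters under a hypothetical `meta_info` key. The sketch also assumes that `prompt_tokens` and `completion_tokens` are the required fields and that `cached_tokens` is optional and included only when present. `build_completion_usage` and `to_response` are hypothetical names, not SGLang functions:

```python
from typing import Any, Optional

# Assumption for this sketch: these two counters are required to report usage.
REQUIRED_USAGE_KEYS = ("prompt_tokens", "completion_tokens")


def build_completion_usage(chunk: dict[str, Any]) -> Optional[dict[str, int]]:
    """Return a usage dict, or None when the chunk lacks the required counters."""
    meta = chunk.get("meta_info") or {}  # tolerate a missing or None meta_info
    if not all(key in meta for key in REQUIRED_USAGE_KEYS):
        return None
    usage = {key: meta[key] for key in REQUIRED_USAGE_KEYS}
    # Assumption: cached_tokens is optional and passed through only when reported.
    if "cached_tokens" in meta:
        usage["cached_tokens"] = meta["cached_tokens"]
    return usage


def to_response(chunk: dict[str, Any]) -> dict[str, Any]:
    """Build an outgoing chunk, attaching completion_usage only when it is available."""
    response: dict[str, Any] = {
        "text": chunk.get("text", ""),
        "finish_reason": chunk.get("finish_reason"),
    }
    usage = build_completion_usage(chunk)
    if usage is not None:
        response["completion_usage"] = usage
    return response
```

With this shape, a final chunk such as `{"finish_reason": "abort", "meta_info": {}}` yields a response without a `completion_usage` key instead of raising. Returning `None` rather than zero-filled counters avoids reporting usage that was never measured.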

Scope

This is a source-tree issue in the SGLang streamed token decode path.

Validation target

Final or error chunks that omit usage metadata should still be processed without worker exceptions and without frontend 500 responses caused by missing usage keys.
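The validation target could be expressed as pytest-style tests along these lines. This is a sketch only: `decode_stream` is a hypothetical stand-in for the streamed decode handler, and the chunk shape and the required/optional split of the counters are the same assumptions as above, not SGLang's real interfaces:

```python
from typing import Any, Iterable, Iterator, Optional

REQUIRED_USAGE_KEYS = ("prompt_tokens", "completion_tokens")  # assumption


def usage_or_none(meta: dict[str, Any]) -> Optional[dict[str, int]]:
    """Report usage only when the required counters are present (hypothetical rule)."""
    if not all(key in meta for key in REQUIRED_USAGE_KEYS):
        return None
    usage = {key: meta[key] for key in REQUIRED_USAGE_KEYS}
    if "cached_tokens" in meta:
        usage["cached_tokens"] = meta["cached_tokens"]
    return usage


def decode_stream(chunks: Iterable[dict[str, Any]]) -> Iterator[dict[str, Any]]:
    """Hypothetical stand-in for the streamed decode handler."""
    for chunk in chunks:
        out: dict[str, Any] = {"text": chunk.get("text", "")}
        usage = usage_or_none(chunk.get("meta_info") or {})
        if usage is not None:
            out["completion_usage"] = usage
        yield out


def test_final_chunk_without_usage_is_processed() -> None:
    chunks = [
        {"text": "Hello", "meta_info": {}},
        {"text": "", "finish_reason": "abort", "meta_info": {}},  # final chunk, no usage
    ]
    outputs = list(decode_stream(chunks))  # must not raise KeyError
    assert len(outputs) == 2
    assert all("completion_usage" not in out for out in outputs)


def test_final_chunk_with_usage_reports_it() -> None:
    meta = {"prompt_tokens": 5, "completion_tokens": 7, "cached_tokens": 2}
    outputs = list(decode_stream([{"text": "", "finish_reason": "stop", "meta_info": meta}]))
    assert outputs[-1]["completion_usage"] == meta


if __name__ == "__main__":
    test_final_chunk_without_usage_is_processed()
    test_final_chunk_with_usage_reports_it()
```

An end-to-end check would additionally confirm that the frontend returns no 500 response for such a stream; that requires a running server and is outside this sketch.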

Metadata

Assignees: No one assigned
Labels: No labels
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests