Regression: first streaming token duplicated in /v1/chat/completions #9298

@lowne-inf

Description

Affected versions: any build after 773489e / #9244

Symptom: In streaming mode, the first content token for non-reasoning models is sent to the client twice. Reasoning models, non-streaming mode, and /v1/completions are unaffected. The duplication is visible in backend traces: chat_deltas.content contains the first token twice, while response is correct.

Root cause: 773489e switched PredictStream to TASK_RESPONSE_TYPE_OAI_CHAT, which causes server_task_result_cmpl_partial::to_json_oaicompat_chat() to return a JSON array. For the first token (n_decoded == 1) that array has two elements: a role-init chunk {role:"assistant", content:null} followed by the actual content chunk {content:"<first token>"}.

In backend/cpp/llama-cpp/grpc-server.cpp, the loop that processes this array calls attach_chat_deltas(reply, first_result.get()) for every element, passing the same raw_result pointer each time. Since oaicompat_msg_diffs contains the first token's diff, both the role-init reply and the content reply get ChatDelta.Content = "<first token>" stamped on them. The Go side receives both replies over gRPC and accumulates both into allChatDeltas, so the streaming callback emits the first token's content twice to the SSE client.

Fix: In the array iteration loops in grpc-server.cpp (PredictStream, lines ~1721 and ~1747), skip attach_chat_deltas for role-init elements, which can be detected by the presence of "role" in choices[0].delta:

for (const auto & res : res_json) {
    auto reply = build_reply_from_json(res, result.get());
    // Skip role-init elements (delta has "role" key, no actual content/reasoning diffs)
    bool is_role_init = res.contains("choices") && !res["choices"].empty() &&
                        res["choices"][0].value("delta", json::object()).contains("role");
    if (!is_role_init) {
        attach_chat_deltas(reply, result.get());
    }
    writer->Write(reply);
}
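
To make the mechanism easier to check in isolation, here is a minimal stdlib-only model of the loop. It is a sketch, not the actual grpc-server.cpp code: Chunk, Reply, first_token_chunks, build_replies and accumulate are hypothetical stand-ins for the JSON array elements, the gRPC reply, and the accumulation on the receiving side, and the skip_role_init flag toggles the proposed guard.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for the real JSON chunk and gRPC reply types.
struct Chunk {
    bool has_role;        // true for the role-init element
    std::string content;  // empty for the role-init element
};

struct Reply {
    std::string message;             // payload built from the chunk itself
    std::string chat_delta_content;  // what attach_chat_deltas would stamp on
};

// The two-element array described above for the first token (n_decoded == 1).
std::vector<Chunk> first_token_chunks(const std::string & token) {
    assert(!token.empty());
    return {Chunk{true, ""}, Chunk{false, token}};
}

// Builds one reply per chunk. result_diff models the diff held by the shared
// result object; skip_role_init enables the proposed guard.
std::vector<Reply> build_replies(const std::vector<Chunk> & chunks,
                                 const std::string & result_diff,
                                 bool skip_role_init) {
    std::vector<Reply> replies;
    for (const auto & chunk : chunks) {
        Reply reply{chunk.content, ""};
        if (!(skip_role_init && chunk.has_role)) {
            reply.chat_delta_content = result_diff;  // attach_chat_deltas analogue
        }
        replies.push_back(reply);
    }
    return replies;
}

// Models the receiving side concatenating every delta it is sent.
std::string accumulate(const std::vector<Reply> & replies) {
    std::string out;
    for (const auto & reply : replies) {
        out += reply.chat_delta_content;
    }
    return out;
}
```

In this model, accumulate(build_replies(first_token_chunks("Hello"), "Hello", false)) returns "HelloHello", while passing true for skip_role_init returns "Hello". Both replies are still produced in either case, so the role-init chunk continues to reach the client.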
