Title: stream=True only returns output from the base model; LoRA adapter is ignored. #777

@HARISHSENTHIL

Description

System Info

When using LoRAX with a LoRA adapter via the /v1/chat/completions endpoint, the adapter works as expected when "stream": false.

However, when I set "stream": true, the response clearly comes from the base model only, and the specified adapter (adapter_name) appears to be ignored.

Information

  • Docker
  • The CLI directly

Tasks

  • An officially supported command
  • My own modifications

Reproduction

Works ("stream": false):

{
  "model": "Mistral-7B-Instruct-v0.1",
  "adapter_name": "Medical-Insights-QA",
  "stream": false,
  "messages": [
    {"role": "user", "content": "What are symptoms of cancer?"}
  ]
}

Broken ("stream": true only uses the base model):

{
  "model": "Mistral-7B-Instruct-v0.1",
  "adapter_name": "Medical-Insights-QA",
  "stream": true,
  "messages": [
    {"role": "user", "content": "What are symptoms of cancer?"}
  ]
}
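To make reproduction easier, here is a minimal Python sketch that builds both request bodies from the report and confirms they are identical except for the "stream" flag, so any difference in adapter behavior must come from the server's streaming code path. The base URL and the use of the `requests` package are my assumptions, not part of the report; the payload fields are taken verbatim from the examples above.

```python
import json

BASE_URL = "http://localhost:8080"  # assumed local LoRAX deployment


def build_payload(stream: bool) -> dict:
    """Build the /v1/chat/completions request body from the report.

    The two bodies are identical except for the "stream" flag.
    """
    return {
        "model": "Mistral-7B-Instruct-v0.1",
        "adapter_name": "Medical-Insights-QA",
        "stream": stream,
        "messages": [
            {"role": "user", "content": "What are symptoms of cancer?"}
        ],
    }


non_streaming = build_payload(False)
streaming = build_payload(True)

# Sanity check: the only key whose value differs is "stream".
differing_keys = {k for k in non_streaming if non_streaming[k] != streaming[k]}
print(differing_keys)  # {'stream'}

# To reproduce against a running server (needs the `requests` package):
# import requests
# r = requests.post(f"{BASE_URL}/v1/chat/completions", json=streaming, stream=True)
# for line in r.iter_lines():
#     print(line.decode())
```

Comparing the streamed chunks against the non-streaming response for the same prompt should make the adapter mismatch obvious.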

Expected behavior

When using the /v1/chat/completions endpoint with "stream": true, I expect the model to generate streamed responses using the specified LoRA adapter (adapter_name), just as it does when "stream": false.
The adapter should influence generation in both streaming and non-streaming modes, producing consistent behavior and outputs aligned with the fine-tuned model.
