System Info
When using Lorax with a LoRA adapter via the /v1/chat/completions endpoint, the adapter works as expected when "stream": false.
However, when "stream": true is set, the response clearly comes from the base model only; the adapter specified via adapter_name appears to be ignored.
Reproduction
Works:
```json
{
  "model": "Mistral-7B-Instruct-v0.1",
  "adapter_name": "Medical-Insights-QA",
  "stream": false,
  "messages": [
    {"role": "user", "content": "What are symptoms of cancer?"}
  ]
}
```
Broken (stream: true only uses base model):
```json
{
  "model": "Mistral-7B-Instruct-v0.1",
  "adapter_name": "Medical-Instruct-QA",
  "stream": true,
  "messages": [
    {"role": "user", "content": "What are symptoms of cancer?"}
  ]
}
```
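For convenience, the two requests above can be driven from a small Python script. This is a minimal repro sketch, not part of the report's original setup: the server URL (`http://localhost:8080`) is an assumption and should be adjusted to your LoRAX deployment. The two payloads are identical apart from the `stream` flag, which is what makes the divergent outputs a bug.

```python
"""Repro sketch: send the same /v1/chat/completions request with
stream off and on, and compare the outputs. Assumes a locally
running server; adjust LORAX_URL for your deployment."""
import json
import urllib.request

LORAX_URL = "http://localhost:8080/v1/chat/completions"  # assumed endpoint

def build_payload(stream: bool) -> dict:
    # Identical request apart from the `stream` flag, so the adapter
    # should influence generation in both cases.
    return {
        "model": "Mistral-7B-Instruct-v0.1",
        "adapter_name": "Medical-Insights-QA",
        "stream": stream,
        "messages": [
            {"role": "user", "content": "What are symptoms of cancer?"}
        ],
    }

def post(payload: dict) -> bytes:
    """POST the JSON payload and return the raw response body."""
    req = urllib.request.Request(
        LORAX_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()

if __name__ == "__main__":
    # Non-streaming: adapter applied as expected.
    print(post(build_payload(False)).decode("utf-8"))
    # Streaming: SSE chunks come back, but the text reads like
    # base-model output, i.e. the adapter is apparently ignored.
    print(post(build_payload(True)).decode("utf-8"))
```

Comparing the decoded streamed text against the non-streaming response makes the discrepancy easy to see side by side.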
Expected behavior
When using the /v1/chat/completions endpoint with "stream": true, I expect the streamed response to be generated with the specified LoRA adapter (adapter_name), just as it is when "stream": false.
The adapter should influence generation in both streaming and non-streaming modes, so that the two modes behave consistently and produce outputs aligned with the fine-tuned model.