Eval bug: SYCL: Qwen3.5 spitting garbage on the second prompt #21589

@WizardlyBump17

Description

Name and Version

root@cb8f619a68ce:/app# ./llama-cli --version
load_backend: loaded SYCL backend from /app/libggml-sycl.so
load_backend: loaded CPU backend from /app/libggml-cpu-haswell.so
version: 8699 (4eb19514d)
built with IntelLLVM 2025.3.2 for Linux x86_64

Operating systems

Linux

GGML backends

SYCL

Hardware

Ryzen 7 5700X3D + B580

Models

Any Qwen3.5 quantization other than F16 or BF16

Problem description & steps to reproduce

Send two prompts in the same session; the first is answered normally, but the second is answered with garbage tokens.

First Bad Commit

No response

Relevant log output

Logs
root@cb8f619a68ce:/app# ./llama-cli --model /models/Qwen3.5-35B-A3B-Q4_K_M.gguf
load_backend: loaded SYCL backend from /app/libggml-sycl.so
load_backend: loaded CPU backend from /app/libggml-cpu-haswell.so

Loading model...  


▄▄ ▄▄
██ ██
██ ██  ▀▀█▄ ███▄███▄  ▀▀█▄    ▄████ ████▄ ████▄
██ ██ ▄█▀██ ██ ██ ██ ▄█▀██    ██    ██ ██ ██ ██
██ ██ ▀█▄██ ██ ██ ██ ▀█▄██ ██ ▀████ ████▀ ████▀
                                    ██    ██
                                    ▀▀    ▀▀

build      : b8699-4eb19514d
model      : Qwen3.5-35B-A3B-Q4_K_M.gguf
modalities : text

available commands:
  /exit or Ctrl+C     stop or exit
  /regen              regenerate the last response
  /clear              clear the chat history
  /read <file>        add a text file
  /glob <pattern>     add text files using globbing pattern


> hi

[Start thinking]
Thinking Process:

1.  **Analyze the Request:**
    *   Input: "hi"
    *   Intent: Greeting.
    *   Context: Start of a conversation.
    *   Tone: Friendly, helpful, casual.

2.  **Determine the appropriate response:**
    *   Acknowledge the greeting.
    *   Offer assistance.
    *   Keep it concise but warm.

3.  **Draft potential responses:**
    *   "Hello! How can I help you today?"
    *   "Hi there! What's on your mind?"
    *   "Hello! Nice to meet you. How are you doing?"
    *   "Hi! 👋 Need any help with something specific?"

4.  **Select the best option:** A combination of friendly greeting and offer of help is standard and effective. "Hello! How can I help you today?" is clear and professional yet friendly. Adding an emoji can make it warmer.

5.  **Final Polish:** "Hello! 👋 How can I help you today?" or "Hi there! How's it going? What can I do for you?" Let's go with a simple, welcoming response.

6.  **Final Output Generation:** (Matches the selected draft)
    "Hello! 👋 How can I help you today?" or similar.

    *Self-Correction during thought process:* Keep it natural. Don't overthink a simple "hi".

    *Final Decision:* "Hello! 👋 How can I help you today?" or "Hi there! How's it going?" Let's use a friendly greeting + offer help.

    "Hello! 👋 How can I assist you today?" (Professional yet friendly)
    "Hi! How's it going? Anything I can help you with?" (More casual)

    Let's go with: "Hello! 👋 How can I help you today?" (Balanced)

    *Wait, checking system instructions:* I am an AI assistant. I should be helpful.

    *Revised Plan:* Just say hi back and ask how to help.

    *Final Output:* "Hello! 👋 How can I help you today?" or "Hi there! What can I do for you?"

    Let's pick: "Hello! 👋 How can I help you today?" (Simple and effective).

    *Wait, one more consideration:* Sometimes users just want to chat. So "How are you?" is also good.
    "Hi there! How's it going? Is there something specific you'd like to talk about or work on?"

    Let's keep it open-ended but inviting.

    "Hello! 👋 How's it going? Is there anything I can help you with today?"

    Actually, brevity is often better for a "hi".
    "Hi there! 👋 How can I help you today?"

    Okay, ready to output.cw

[End thinking]

Hello! 👋 How's it going? Is there anything I can help you with today?

[ Prompt: 48.2 t/s | Generation: 28.1 t/s ]

> hi

[Start thinking]
-s55=G,G9+125H89-*C"2@'8+=H53CAG%&4H8E<>6'<<.%F."4-<"-B)G1E8/'8+%D*@/2!F7>*C1DD2,(4=)CF;*E;D<CA6AHGA;F(HH<+&$&0GF+4H@'(C0>!%G55:,$E2GB2$'6=9,:,#"/E3G':+$+BBE+5C7B24.4E+-:6$>/E)F,:D63)F(19>AGB(4:88;C/&$5H$68C+50&56:<:!(":<D2A*<0H1&<G4.1E*0HF!4@A9/(BFG1G>!/!7/A#'-*CB,7:60;H"72+;6';#$$(#C#1:4E-*5'#&E)+E!72B6*,$5.G.6$*-H#F!2=:E.E09=!;#$>ABA,2:2G'HE;</)@=-#=*&@C'5DHHE:!5&/+&+))6$F(@-G,/H7/1;%E>$EB(7>#0+D=-C;@<<E,B=7-8#"G2+9DG($<!<*0.<''@&$'5B76"B-9*>8H@CH0C1+H6>7)E9;<A6&<;8(83(!%@15B,8-FEB87=3CG,#4+8>$$&:7H+91&6!F2%DF8>9+8D-88)-)>*!.*6-$!D#4HC4>&>DH7FA:9:&-186-,9&%C&;='/4E>E!7.C3"5%5<'5CEC=*;"!H"'+6%%/<3/>,>2*7='5<*E+,!G@AD4@5H"B2<C,AGBH9G"(#%1E,$6*

[ Prompt: 51.1 t/s | Generation: 29.1 t/s ]

> 

Exiting...
llama_memory_breakdown_print: | memory breakdown [MiB]                     | total   free     self   model   context   compute    unaccounted |
llama_memory_breakdown_print: |   - SYCL0 (Intel(R) Arc(TM) B580 Graphics) | 12216 = 1023 + (10922 = 10282 +     142 +     497) +         270 |
llama_memory_breakdown_print: |   - Host                                   |                 19830 = 19814 +       0 +      16                |
root@cb8f619a68ce:/app# 
