root@cb8f619a68ce:/app# ./llama-cli --model /models/Qwen3.5-35B-A3B-Q4_K_M.gguf
load_backend: loaded SYCL backend from /app/libggml-sycl.so
load_backend: loaded CPU backend from /app/libggml-cpu-haswell.so
Loading model...
▄▄ ▄▄
██ ██
██ ██ ▀▀█▄ ███▄███▄ ▀▀█▄ ▄████ ████▄ ████▄
██ ██ ▄█▀██ ██ ██ ██ ▄█▀██ ██ ██ ██ ██ ██
██ ██ ▀█▄██ ██ ██ ██ ▀█▄██ ██ ▀████ ████▀ ████▀
██ ██
▀▀ ▀▀
build : b8699-4eb19514d
model : Qwen3.5-35B-A3B-Q4_K_M.gguf
modalities : text
available commands:
/exit or Ctrl+C stop or exit
/regen regenerate the last response
/clear clear the chat history
/read <file> add a text file
/glob <pattern> add text files using globbing pattern
> hi
[Start thinking]
Thinking Process:
1. **Analyze the Request:**
* Input: "hi"
* Intent: Greeting.
* Context: Start of a conversation.
* Tone: Friendly, helpful, casual.
2. **Determine the appropriate response:**
* Acknowledge the greeting.
* Offer assistance.
* Keep it concise but warm.
3. **Draft potential responses:**
* "Hello! How can I help you today?"
* "Hi there! What's on your mind?"
* "Hello! Nice to meet you. How are you doing?"
* "Hi! 👋 Need any help with something specific?"
4. **Select the best option:** A combination of friendly greeting and offer of help is standard and effective. "Hello! How can I help you today?" is clear and professional yet friendly. Adding an emoji can make it warmer.
5. **Final Polish:** "Hello! 👋 How can I help you today?" or "Hi there! How's it going? What can I do for you?" Let's go with a simple, welcoming response.
6. **Final Output Generation:** (Matches the selected draft)
"Hello! 👋 How can I help you today?" or similar.
*Self-Correction during thought process:* Keep it natural. Don't overthink a simple "hi".
*Final Decision:* "Hello! 👋 How can I help you today?" or "Hi there! How's it going?" Let's use a friendly greeting + offer help.
"Hello! 👋 How can I assist you today?" (Professional yet friendly)
"Hi! How's it going? Anything I can help you with?" (More casual)
Let's go with: "Hello! 👋 How can I help you today?" (Balanced)
*Wait, checking system instructions:* I am an AI assistant. I should be helpful.
*Revised Plan:* Just say hi back and ask how to help.
*Final Output:* "Hello! 👋 How can I help you today?" or "Hi there! What can I do for you?"
Let's pick: "Hello! 👋 How can I help you today?" (Simple and effective).
*Wait, one more consideration:* Sometimes users just want to chat. So "How are you?" is also good.
"Hi there! How's it going? Is there something specific you'd like to talk about or work on?"
Let's keep it open-ended but inviting.
"Hello! 👋 How's it going? Is there anything I can help you with today?"
Actually, brevity is often better for a "hi".
"Hi there! 👋 How can I help you today?"
Okay, ready to output.cw
[End thinking]
Hello! 👋 How's it going? Is there anything I can help you with today?
[ Prompt: 48.2 t/s | Generation: 28.1 t/s ]
> hi
[Start thinking]
-s55=G,G9+125H89-*C"2@'8+=H53CAG%&4H8E<>6'<<.%F."4-<"-B)G1E8/'8+%D*@/2!F7>*C1DD2,(4=)CF;*E;D<CA6AHGA;F(HH<+&$&0GF+4H@'(C0>!%G55:,$E2GB2$'6=9,:,#"/E3G':+$+BBE+5C7B24.4E+-:6$>/E)F,:D63)F(19>AGB(4:88;C/&$5H$68C+50&56:<:!(":<D2A*<0H1&<G4.1E*0HF!4@A9/(BFG1G>!/!7/A#'-*CB,7:60;H"72+;6';#$$(#C#1:4E-*5'#&E)+E!72B6*,$5.G.6$*-H#F!2=:E.E09=!;#$>ABA,2:2G'HE;</)@=-#=*&@C'5DHHE:!5&/+&+))6$F(@-G,/H7/1;%E>$EB(7>#0+D=-C;@<<E,B=7-8#"G2+9DG($<!<*0.<''@&$'5B76"B-9*>8H@CH0C1+H6>7)E9;<A6&<;8(83(!%@15B,8-FEB87=3CG,#4+8>$$&:7H+91&6!F2%DF8>9+8D-88)-)>*!.*6-$!D#4HC4>&>DH7FA:9:&-186-,9&%C&;='/4E>E!7.C3"5%5<'5CEC=*;"!H"'+6%%/<3/>,>2*7='5<*E+,!G@AD4@5H"B2<C,AGBH9G"(#%1E,$6*
[ Prompt: 51.1 t/s | Generation: 29.1 t/s ]
>
Exiting...
llama_memory_breakdown_print: | memory breakdown [MiB] | total free self model context compute unaccounted |
llama_memory_breakdown_print: | - SYCL0 (Intel(R) Arc(TM) B580 Graphics) | 12216 = 1023 + (10922 = 10282 + 142 + 497) + 270 |
llama_memory_breakdown_print: | - Host | 19830 = 19814 + 0 + 16 |
root@cb8f619a68ce:/app#
Name and Version
llama-cli, build b8699-4eb19514d (as printed in the startup banner)
Operating systems
Linux
GGML backends
SYCL
Hardware
AMD Ryzen 7 5700X3D (CPU) + Intel Arc B580 (GPU)
Models
Any Qwen3.5 model that is not F16 or BF16 (for example, Qwen3.5-35B-A3B-Q4_K_M.gguf)
Problem description & steps to reproduce
Start llama-cli with an affected model and send two prompts in a row (for example, "hi" twice). The first prompt gets a normal response; the second is answered with garbage.
First Bad Commit
No response
Relevant log output
See the terminal session at the top of this report.