This is something llama.cpp actually supports. I set the GGML_VK_VISIBLE_DEVICES environment variable like this on the Linux command line:
GGML_VK_VISIBLE_DEVICES=0 ./llama-server -hf ggml-org/Qwen3-1.7B-GGUF -c 9000 --port 8081 -cram 0
You can replace the 0 with the index of the device you want to use. This forces llama.cpp onto that device; otherwise, if you also have an NVIDIA GPU, it will default to that one.
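If you are not sure which index belongs to which GPU, you can enumerate them first. As far as I remember, recent builds of llama-server have a --list-devices flag, and vulkaninfo shows the same information from the Vulkan side:
./llama-server --list-devices
vulkaninfo --summary
The index shown there is what goes into GGML_VK_VISIBLE_DEVICES.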
I compiled llama.cpp with the Vulkan backend, so I am running it on a GCN5 AMD iGPU: the Radeon RX Vega 7 in a Ryzen 5 5600H. The iGPU may be slower than the CPU for short question-style prompts, but it frees up the CPU, and on prompts of over 1000 tokens I find its speed comparable to the CPU's, in the 5 to 11 tok/s range, since both are limited by DDR4 memory bandwidth.
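If anyone wants to reproduce the comparison, llama-bench (built alongside llama-server) times prompt processing and generation separately. Something like the following is what I have in mind; the model path is just a placeholder, and -ngl 0 is only a rough CPU baseline, since a CPU-only build is the cleaner comparison:
GGML_VK_VISIBLE_DEVICES=0 ./llama-bench -m Qwen3-1.7B-Q4_K_M.gguf -p 1024 -n 128
./llama-bench -m Qwen3-1.7B-Q4_K_M.gguf -p 1024 -n 128 -ngl 0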
I use Vulkan because it seems to be impossible to use OpenCL or ROCm on these integrated GPUs: ROCm compilation has no supported LLVM target for them.
I am also running Arch Linux. Following the instructions on the build page is not strictly necessary: I just installed vulkan-headers and vulkan-devel, and as long as vulkaninfo works, you do not need to source any shell scripts.
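For reference, the whole setup on Arch boils down to roughly this (package names from the Arch repos, the cmake flag from the llama.cpp Vulkan build docs; if cmake complains about a missing glslc, the shaderc package should provide it):
sudo pacman -S vulkan-headers vulkan-devel
vulkaninfo --summary
cmake -B build -DGGML_VULKAN=ON
cmake --build build --config Release -j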
On a related note, I was able to extract more performance out of the dedicated GPU by running several slots (users) on llama-server at the same time. As long as all of their prompts fit within the context size set for llama-server, it is effectively free extra performance. I disabled the KV cache since every prompt is self-contained and does not rely on previous chats; that eliminated cache hits and improved performance. Context size is limited by the amount of RAM.
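A minimal sketch of what I mean, assuming 4 parallel slots (-np is the parallel-slots flag) and keeping -cram 0 from the command above, which as far as I understand disables the server-side prompt cache. Note that the 9000-token context is shared, so each slot gets roughly a quarter of it:
GGML_VK_VISIBLE_DEVICES=0 ./llama-server -hf ggml-org/Qwen3-1.7B-GGUF -c 9000 -np 4 -cram 0 --port 8081
Concurrent requests then land on different slots, for example two curl calls to the OpenAI-compatible endpoint in parallel:
curl http://localhost:8081/v1/completions -H 'Content-Type: application/json' -d '{"prompt": "Hello", "max_tokens": 32}' &
curl http://localhost:8081/v1/completions -H 'Content-Type: application/json' -d '{"prompt": "Hi there", "max_tokens": 32}' &
wait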