
fix(openvino): define PartialShape bounds for tensors #21637

Draft
thedanhoffman wants to merge 2 commits into ggml-org:master from thedanhoffman:ov_partial_shape

Conversation

@thedanhoffman
Contributor

Overview

Bound PartialShape dimensions to be within some reasonable range. This is sufficient to fix an OpenCL allocation issue observed by running #20938 with a GPU.

Even in the case with no functional problems, these shape bounds help inform the OpenVINO graph compiler.

Additional information

Discussed offline with @cavusmustafa. The underlying issue will probably be root-caused to an OpenVINO upstream issue, but if following "best practices" on the API-user side resolves it, I don't see much of a problem with this workaround.

export GGML_OPENVINO_STATEFUL_EXECUTION=1

/home/dhoffman/llama_openvino/llama.cpp/build/bin/llama-cli --verbose -m /home/dhoffman/models/Qwen3-4B-Instruct-2507-Q4_0.gguf -ngl 99 -np 1 -c 8192 -ctk f16 -ctv f16 -ctkd f16 -ctvd f16 --flash-attn on --jinja --reasoning-format deepseek --simple-io --single-turn -n 16 -p hello --no-warmup -c 16

Requirements

  • I have read and agree with the contributing guidelines
  • AI usage disclosure: All code was written by me personally, but I did throw an AI at the debug process. Again, I work at Intel and have discussed this with the owners in person, so the changes are not unreasonable

if (m_is_static) {
    input_shape = ov::PartialShape{1, 1, 1, m_compute_params.output_len};
} else {
    input_shape = ov::PartialShape{1, 1, 1, ov::Dimension(1, m_compute_params.output_len)};
}
Contributor

The changes look good to me, but why do we need to remove the one-liners for this?

For example, can we still use something like this?

input_shape = ov::PartialShape{1, 1, 1,
    m_is_static ? m_compute_params.output_len : ov::Dimension(1, m_compute_params.output_len)};

Contributor Author


Yeah, I can change it back to how it was.

if (!m_is_static) {
    // do not fix ctx size to make llama-bench work across test params
    input_shape[2] = dim_span_ctx;  // previously: input_shape[2] = -1;
}
Contributor


llama-bench failed on my side for larger context sizes and for all stateful executions.
Tested with Llama 3.2 1B Q4_0.

stateless: -d 512,1024 fails
stateful: all ctx sizes fail

Contributor Author


Investigating this now.

Contributor Author


The cache wasn't being properly invalidated, and there's no easy way (AFAICT) to get the maximum possible batch size.

@github-actions github-actions bot added the ggml (changes relating to the ggml tensor library for machine learning) and OpenVINO labels Apr 10, 2026
@thedanhoffman
Contributor Author

The cache wasn't being invalidated for cases where the batch size changes. Adjusted ComputeParams to accept the maximum observed prefill size and invalidate the cache when that maximum increases.

I'm hoping that for most "real" workloads there will be some sort of system prompt that exceeds whatever batch size is configured, so there shouldn't be a performance hit in this case (except maybe post-warmup, but that's a one-time cost, so not a big deal).

@thedanhoffman thedanhoffman marked this pull request as draft April 10, 2026 05:19
@thedanhoffman
Contributor Author

Also went ahead and moved this to draft, since how we go about invalidating the cache might be a broader architectural question.


Labels

ggml (changes relating to the ggml tensor library for machine learning), OpenVINO


2 participants