fix(openvino): define PartialShape bounds for tensors #21637
thedanhoffman wants to merge 2 commits into ggml-org:master
if (m_is_static) {
    input_shape = ov::PartialShape{1, 1, 1, m_compute_params.output_len};
} else {
    input_shape = ov::PartialShape{1, 1, 1, ov::Dimension(1, m_compute_params.output_len)};
The changes look good to me, but why do we need to remove the one-liners for this?
For example, could we still use something like this?
input_shape = ov::PartialShape{1, 1, 1, m_is_static ?
m_compute_params.output_len : ov::Dimension(1, m_compute_params.output_len)};
yea i can change it back to how it was
if (!m_is_static) {
    // do not fix ctx size to make llama-bench work across test params
-   input_shape[2] = -1;
+   input_shape[2] = dim_span_ctx;
llama-bench failed on my side for larger context sizes and for all stateful executions.
tested with llama3.2 1B q4_0
stateless: -d 512,1024 fails
stateful: all ctx sizes fail
investigating this now
cache wasn't being properly invalidated and there's no easy way (AFAICT) to get the max possible batch size
The cache wasn't being invalidated for cases where the batch size changes; adjusted. I'm hoping that for most "real" workloads there will be some sort of system prompt that exceeds whatever batch size is configured, so there shouldn't be a performance hit in this case (except maybe post-warmup, but that's a one-time cost, so not a big deal).
also went ahead and moved this to draft since how we go about invalidating the cache might be a broader architectural question
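For readers following the cache discussion, here is a minimal sketch of one possible invalidation policy: recompile only when a request exceeds the upper bound the cached model was built with. It is not the code in this PR. `CompiledEntry`, `BoundedModelCache`, and the grow-only policy are all hypothetical, and the integer `build_id` merely stands in for a compiled model handle.

```cpp
#include <cstdint>

// Hypothetical record of a compiled model; build_id stands in for the
// real compiled artifact, which this sketch does not model.
struct CompiledEntry {
    std::int64_t max_batch;  // upper bound baked into the compiled shape
    int build_id;
};

// Hypothetical cache: keeps one entry and rebuilds it only when a request
// does not fit inside the bound the entry was compiled with.
class BoundedModelCache {
public:
    int get(std::int64_t batch) {
        if (!has_entry_ || batch > entry_.max_batch) {
            // Invalidate: the cached bound is too small for this batch.
            entry_ = CompiledEntry{batch, ++builds_};
            has_entry_ = true;
        }
        return entry_.build_id;
    }

    // Number of (re)compilations performed so far.
    int builds() const { return builds_; }

private:
    CompiledEntry entry_{0, 0};
    bool has_entry_ = false;
    int builds_ = 0;
};
```

Under this assumed policy, a warmup call with a small batch followed by a long prompt costs one extra build, after which smaller batches reuse the same entry.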
Overview
Bound PartialShape dimensions to be within some reasonable range. This is sufficient to fix an OpenCL allocation issue observed by running #20938 with a GPU.
Even in the case with no functional problems, these shape bounds help inform the OpenVINO graph compiler.
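To make the idea of a bounded dimension concrete, the sketch below models it without depending on OpenVINO. `BoundedDim`, `BoundedShape`, `shape_accepts`, and `make_output_shape` are hypothetical stand-ins, not OpenVINO types or functions; they only mirror the static versus dynamic choice shown in the diff, where the last axis is either pinned to `output_len` or allowed to range over `[1, output_len]`.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a dimension with an interval; this is not
// the OpenVINO type, just a model of the same idea.
struct BoundedDim {
    std::int64_t min;
    std::int64_t max;  // a negative value means "no upper bound"

    // Exact (static) dimension.
    static BoundedDim exact(std::int64_t v) { return {v, v}; }
    // Fully dynamic dimension, analogous to writing -1.
    static BoundedDim dynamic() { return {0, -1}; }

    bool accepts(std::int64_t v) const {
        return v >= min && (max < 0 || v <= max);
    }
};

using BoundedShape = std::vector<BoundedDim>;

// True when every concrete extent falls inside the matching interval.
bool shape_accepts(const BoundedShape& shape,
                   const std::vector<std::int64_t>& concrete) {
    if (shape.size() != concrete.size()) return false;
    for (std::size_t i = 0; i < shape.size(); ++i) {
        if (!shape[i].accepts(concrete[i])) return false;
    }
    return true;
}

// Mirrors the branch in the diff: static pins the last axis, dynamic
// lets it range over [1, output_len].
BoundedShape make_output_shape(bool is_static, std::int64_t output_len) {
    BoundedDim last = is_static ? BoundedDim::exact(output_len)
                                : BoundedDim{1, output_len};
    return {BoundedDim::exact(1), BoundedDim::exact(1),
            BoundedDim::exact(1), last};
}
```

With a finite upper bound, a consumer of the shape can size buffers for the worst case up front, which is presumably the kind of information the overview says helps the graph compiler.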
Additional information
Discussed offline with @cavusmustafa. The underlying issue will probably be root-caused to an OpenVINO upstream issue, but if following "best practices" on the API user's end solves this issue, then I don't see much of a problem with enabling this as a workaround.
Requirements