server : improve cache reuse diagnostics for SWA and hybrid models by 1oridevs · Pull Request #21693 · ggml-org/llama.cpp

1oridevs · 2026-04-09T19:57:27Z

Overview

This PR improves diagnostics when prompt cache reuse falls back to full prompt re-processing.

It adds logging around the SWA and hybrid/recurrent memory cases so that it is easier to understand why cache reuse fails. This change does not modify any cache reuse logic; it only improves observability.
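For illustration only, here is a minimal sketch of what such a diagnostic could look like. It is not the code from this PR and does not use the real llama.cpp server API: the `fallback_info` struct, its fields, and `describe_fallback` are hypothetical names chosen for this example, and the message format is an assumption. The helper only formats a message from values that are already known, so it cannot change cache reuse behavior.

```cpp
#include <string>

// Hypothetical snapshot of the values worth logging when cache reuse
// falls back to full prompt re-processing. None of these names come
// from llama.cpp; they exist only for this sketch.
struct fallback_info {
    int  n_past;    // number of prompt tokens matched against the cache
    int  pos_min;   // smallest position still held in the cache
    int  n_swa;     // sliding-window size; 0 when the model does not use SWA
    bool recurrent; // true when the model has hybrid/recurrent memory
};

// Build a human-readable diagnostic line. This only reports state and
// names a likely cause; it makes no decision about cache reuse.
std::string describe_fallback(const fallback_info & f) {
    std::string cause;
    if (f.n_swa > 0 && f.recurrent) {
        cause = "SWA and hybrid/recurrent memory";
    } else if (f.n_swa > 0) {
        cause = "SWA";
    } else if (f.recurrent) {
        cause = "hybrid/recurrent memory";
    } else {
        cause = "unknown";
    }

    return "forcing full prompt re-processing (n_past = " + std::to_string(f.n_past) +
           ", pos_min = " + std::to_string(f.pos_min) +
           ", n_swa = " + std::to_string(f.n_swa) +
           ", likely cause: " + cause + ")";
}
```

A caller would pass the returned string to whatever logging facility the server already uses at the point where the fallback is detected.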

Additional information

The change is limited to tools/server/server-context.cpp and only affects logging behavior.

Requirements

I have read and agree with the contributing guidelines
AI usage disclosure: YES — I used AI for brainstorming and wording help, but I wrote and verified the changes myself and take full responsibility for this PR.

ggml-gh-bot · 2026-04-09T20:02:19Z

Hi @1oridevs, thanks for your contribution!

Per our contribution guidelines, the automated PR checker found the following issue(s) that need your attention:

AI-generated content: This project does not accept PRs, descriptions or commit messages that are fully or predominantly AI-generated. If you have used AI to assist you in writing code, please make sure to disclose that explicitly.

Please note that maintainers reserve the right to make final decisions on PRs. If you believe there is a mistake, please comment below.

1oridevs · 2026-04-09T20:08:30Z

Thanks for the note. I updated the PR description with an explicit AI usage disclosure.

I used AI for brainstorming and wording assistance only. I made the code changes myself, reviewed the diff, built the project locally, and take full responsibility for this submission.

If any further clarification is needed, I’m happy to provide it.

server : improve cache reuse diagnostics for SWA and hybrid models

5838a9f

1oridevs requested a review from a team as a code owner April 9, 2026 19:57

github-actions bot added the examples and server labels Apr 9, 2026

1oridevs closed this Apr 10, 2026

1oridevs wants to merge 1 commit into ggml-org:master from 1oridevs:fix/cache-reuse-diagnostics