llama-mmap: hint THP on mmap'd weights (Linux) by Marxist-Leninist · Pull Request #19 · PrismML-Eng/llama.cpp

Marxist-Leninist · 2026-04-08T13:18:51Z

Issue madvise(MADV_HUGEPAGE) on the read-only file mapping used for model weights on Linux. For a 1 GB model this drops the potential page count from ~262K 4KB pages to ~512 2MB pages, reducing TLB pressure and (more importantly) reducing the number of re-faults when pages get evicted under memory pressure.
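The page counts quoted above follow from plain division. A minimal sketch of that arithmetic, assuming binary units (1 GB = 2^30 bytes); the helper name `pages_for` is illustrative and not part of the patch:

```cpp
#include <cstddef>

// Number of pages needed to cover `bytes` at a given page size (rounds up).
// Hypothetical helper for illustration only.
constexpr std::size_t pages_for(std::size_t bytes, std::size_t page_size) {
    return (bytes + page_size - 1) / page_size;
}

constexpr std::size_t KiB = 1024;
constexpr std::size_t MiB = 1024 * KiB;
constexpr std::size_t GiB = 1024 * MiB;

// 1 GiB of weights: 262144 base pages versus 512 huge pages.
static_assert(pages_for(1 * GiB, 4 * KiB) == 262144, "4 KiB page count");
static_assert(pages_for(1 * GiB, 2 * MiB) == 512,    "2 MiB page count");
```

The 512x reduction is an upper bound: it only applies to the parts of the mapping the kernel actually backs with huge pages.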

Linux-only, guarded by defined(MADV_HUGEPAGE) and __linux__. Skipped when numa is set. No-op where THP is disabled.
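For readers who have not opened the diff, the guard described above could look roughly like the following. This is a sketch under stated assumptions, not the code from the patch: the function name `hint_transparent_hugepages` and the `numa` parameter are hypothetical stand-ins for whatever llama-mmap actually uses.

```cpp
#include <cstddef>

#if defined(__linux__)
#include <sys/mman.h>
#endif

// Hypothetical helper (name and signature are assumptions, not from the patch).
// Asks the kernel to consider transparent huge pages for [addr, addr + len).
// Returns true only if the hint was issued and accepted.
static bool hint_transparent_hugepages(void * addr, std::size_t len, bool numa) {
    if (numa || addr == nullptr || len == 0) {
        return false; // skipped when NUMA is set, or nothing to hint
    }
#if defined(__linux__) && defined(MADV_HUGEPAGE)
    // madvise() is advisory: a failure (e.g. EINVAL when THP is not compiled
    // into the kernel) is not fatal, the mapping keeps working with base pages.
    return madvise(addr, len, MADV_HUGEPAGE) == 0;
#else
    (void) addr;
    (void) len;
    return false; // compiles to a no-op on other platforms
#endif
}
```

`addr` is expected to be page-aligned, which holds for a pointer returned by mmap. Even when the call succeeds, whether a file-backed mapping ends up on huge pages depends on the kernel's configuration.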

Bench on a Skylake-SP VM, Bonsai-8B Q1_0, -fa on -ctk q8_0 -ctv q8_0 -t 12 -ub 128: neutral (~9.5 t/s tg128 both before and after) because the VM isn't memory-constrained. The change is intended for systems where the mapping does get evicted under pressure (constrained laptops, containers).

I also tried the same hint on ggml_aligned_malloc for the KV/activation buffers. That showed a ~5% regression with no visible AnonHugePages, so I dropped that half.
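One way to check for visible AnonHugePages is to sum that field across /proc/self/smaps. The sketch below is an assumption about how such a check could be done, not the method used for the measurement above; the helper name `sum_smaps_field_kb` is hypothetical.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: sums the kB values of every line that starts with
// `field` (for example "AnonHugePages:") in smaps-formatted text.
static std::size_t sum_smaps_field_kb(std::istream & in, const std::string & field) {
    std::size_t total = 0;
    std::string line;
    while (std::getline(in, line)) {
        if (line.compare(0, field.size(), field) != 0) {
            continue; // not the field we are looking for
        }
        // The remainder looks like "      2048 kB"; read the leading number.
        std::istringstream rest(line.substr(field.size()));
        std::size_t kb = 0;
        if (rest >> kb) {
            total += kb;
        }
    }
    return total;
}
```

On Linux it could be fed with `std::ifstream f("/proc/self/smaps");` from inside the process after the buffers are touched. A total of 0 kB means no anonymous memory in the process is currently backed by huge pages.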

Marxist-Leninist · 2026-04-08T13:29:52Z

Heads-up: the failing labeler check here is a pre-existing issue in the fork's workflow config, not related to this patch.

The PrismML-Eng labeler workflow (.github/workflows/labeler.yml) hard-codes repository: "ggml-org/llama.cpp" on its checkout step and reads the labeler config from there. Upstream ggml-org/llama.cpp/.github/labeler.yml still uses the v5 composition syntax:

server/webui:
    - changed-files:
        - all:
            - any-glob-to-any-file:
                - tools/server/webui/**

actions/labeler@v6 removed the all: / any: composition keys, so it errors with Unknown config options were under "changed-files": all on every PR — not just this one. The fix is either:

Bump actions/labeler back to v5 in the workflow, or
Flatten the server/webui entry in upstream's labeler.yml (drop the all: wrapper since it only contains one child):

server/webui:
    - changed-files:
        - any-glob-to-any-file:
            - tools/server/webui/**

Happy to open either fix as a separate PR if you'd like.

Issue madvise(MADV_HUGEPAGE) on the read-only file mapping used for model weights on Linux. For a 1 GB model this drops the potential page count from ~262K 4KB pages to ~512 2MB pages, reducing TLB pressure and (more importantly) reducing the number of re-faults when pages get evicted under memory pressure.

No-op on kernels where THP is disabled. In 'madvise' mode (the common modern default for desktop distros), THP is opt-in and requires the caller to ask. Guarded by defined(MADV_HUGEPAGE) so it compiles cleanly on non-Linux.

Benchmark on a Skylake-SP VM, Bonsai-8B Q1_0, -fa on -ctk q8_0 -ctv q8_0 -t 12 -ub 128: neutral on this machine (~9.5 t/s tg128 both before and after) because the VM isn't memory-constrained. The change is intended for systems where the mapping does get evicted and re-faulted under pressure.
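To see which THP mode a machine is in, the kernel exposes /sys/kernel/mm/transparent_hugepage/enabled, whose contents look like `always [madvise] never` with the active mode in brackets. A small sketch of reading that value; the helper name `active_thp_mode` is hypothetical and not part of the patch:

```cpp
#include <string>

// Hypothetical helper: returns the bracketed token from the contents of
// /sys/kernel/mm/transparent_hugepage/enabled, or "" if none is found.
static std::string active_thp_mode(const std::string & contents) {
    const auto open = contents.find('[');
    if (open == std::string::npos) {
        return "";
    }
    const auto close = contents.find(']', open + 1);
    if (close == std::string::npos) {
        return "";
    }
    return contents.substr(open + 1, close - open - 1);
}
```

With `never` active the hint has no effect, with `madvise` it is what opts the mapping in, and with `always` the kernel already considers huge pages without being asked.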

pl752 added 2 commits April 7, 2026 11:46

Implemented optimized q1_0 dot for x86 and generic

195593b

Removed redundant helper definition

e29cd48

Marxist-Leninist force-pushed the feat/madv-hugepage branch from d74dd9b to a4ce593 Compare April 8, 2026 14:00

Marxist-Leninist force-pushed the feat/madv-hugepage branch from a4ce593 to 036a707 Compare April 8, 2026 14:01

Marxist-Leninist wants to merge 3 commits into PrismML-Eng:master from Marxist-Leninist:feat/madv-hugepage
