llama-mmap: hint THP on mmap'd weights (Linux) #19

Open
Marxist-Leninist wants to merge 3 commits into PrismML-Eng:master from Marxist-Leninist:feat/madv-hugepage

@Marxist-Leninist Marxist-Leninist commented Apr 8, 2026

Issue madvise(MADV_HUGEPAGE) on the read-only file mapping used for model weights on Linux. For a 1 GB model this reduces the potential page count from ~262K 4 KB pages to ~512 2 MB pages, which lowers TLB pressure and, more importantly, reduces the number of re-faults when pages are evicted under memory pressure.
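The page counts quoted above follow from a ceiling division, assuming "1 GB" means 2^30 bytes and the page sizes are 4 KiB and 2 MiB. A small sketch (the helper name pages_needed is illustrative and not part of the patch):

```cpp
#include <cstdint>

// Number of pages needed to cover `bytes`, rounding up to whole pages.
constexpr std::uint64_t pages_needed(std::uint64_t bytes, std::uint64_t page_size) {
    return (bytes + page_size - 1) / page_size;
}

constexpr std::uint64_t KiB = 1024;
constexpr std::uint64_t MiB = 1024 * KiB;
constexpr std::uint64_t GiB = 1024 * MiB;

// 1 GiB of weights mapped with base pages vs. 2 MiB huge pages.
static_assert(pages_needed(GiB, 4 * KiB) == 262144, "~262K base pages");
static_assert(pages_needed(GiB, 2 * MiB) == 512, "512 huge pages");
```

Each huge page that is actually used stands in for 512 base pages, which is where the reduction in TLB entries and fault count would come from.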

Linux-only, guarded by defined(MADV_HUGEPAGE) and __linux__. Skipped when numa is set. No-op where THP is disabled.
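The guards described above could look roughly like the following sketch. It is not the code from the patch: the helper name hint_transparent_hugepages, the ThpHint enum, and the numa parameter are assumptions made for illustration. Only madvise and MADV_HUGEPAGE are real interfaces.

```cpp
#include <cstddef>
#if defined(__linux__)
#include <sys/mman.h>
#endif

// Hypothetical result type for this sketch.
enum class ThpHint { Skipped, Applied, Failed };

// Hypothetical helper: ask the kernel to back [addr, addr + len) with
// transparent huge pages. The request is advisory; the kernel may ignore it.
inline ThpHint hint_transparent_hugepages(void * addr, std::size_t len, bool numa) {
    // Mirror the description: skip when NUMA handling is active, and
    // do nothing for an empty or missing mapping.
    if (addr == nullptr || len == 0 || numa) {
        return ThpHint::Skipped;
    }
#if defined(__linux__) && defined(MADV_HUGEPAGE)
    // madvise returns 0 on success and -1 on error (for example when the
    // kernel was built without THP support).
    return madvise(addr, len, MADV_HUGEPAGE) == 0 ? ThpHint::Applied
                                                  : ThpHint::Failed;
#else
    // Platforms without MADV_HUGEPAGE compile to a no-op.
    return ThpHint::Skipped;
#endif
}
```

A caller would pass the address and length returned by mmap for the weights file. Because the hint is best-effort, a Failed result can reasonably be logged and otherwise ignored.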

Benchmark on a Skylake-SP VM (Bonsai-8B Q1_0, -fa on -ctk q8_0 -ctv q8_0 -t 12 -ub 128): neutral, ~9.5 t/s tg128 both before and after, because the VM isn't memory-constrained. The change is intended for systems where the mapping does get evicted under pressure (constrained laptops, containers).

I also tried the same hint on ggml_aligned_malloc for the KV/activation buffers. That showed a ~5% regression with no visible AnonHugePages, so I dropped that half.
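One way to check for AnonHugePages is to read the corresponding line from /proc/meminfo (or a process's smaps file) before and after the allocation. The sketch below assumes the usual "Key: value kB" line format. The helper name parse_kb_field is hypothetical and returns only the first matching line.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: find the first "<key>: <n> kB" line in
// /proc/meminfo- or smaps-style text and return <n>, or -1 if absent.
inline long long parse_kb_field(const std::string & text, const std::string & key) {
    const std::string prefix = key + ":";
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line)) {
        // Only consider lines that start with "<key>:".
        if (line.compare(0, prefix.size(), prefix) != 0) {
            continue;
        }
        std::istringstream rest(line.substr(prefix.size()));
        long long kb = 0;
        if (rest >> kb) {
            return kb;
        }
    }
    return -1;
}
```

Feeding it the contents of /proc/meminfo with the key "AnonHugePages" gives the system-wide figure in kB. A value that stays at 0 after the buffers are touched suggests the hint did not produce huge pages.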

@Marxist-Leninist
Author

Heads-up: the failing labeler check here is a pre-existing issue in the fork's workflow config, not related to this patch.

The PrismML-Eng labeler workflow (.github/workflows/labeler.yml) hard-codes repository: "ggml-org/llama.cpp" on its checkout step and reads the labeler config from there. Upstream ggml-org/llama.cpp/.github/labeler.yml still uses the v5 composition syntax:

server/webui:
    - changed-files:
        - all:
            - any-glob-to-any-file:
                - tools/server/webui/**

actions/labeler@v6 removed the all: / any: composition keys, so it fails with the error 'Unknown config options were under "changed-files": all' on every PR, not just this one. The fix is either:

  1. Bump actions/labeler back to v5 in the workflow, or
  2. Flatten the server/webui entry in upstream's labeler.yml (drop the all: wrapper since it only contains one child):
server/webui:
    - changed-files:
        - any-glob-to-any-file:
            - tools/server/webui/**

Happy to open either fix as a separate PR if you'd like.

Issue madvise(MADV_HUGEPAGE) on the read-only file mapping used for
model weights on Linux. For a 1 GB model this drops the potential
page count from ~262K 4KB pages to ~512 2MB pages, reducing TLB
pressure and (more importantly) reducing the number of re-faults
when pages get evicted under memory pressure.

No-op on kernels where THP is disabled. In 'madvise' mode (the
common modern default for desktop distros), THP is opt-in and
requires the caller to ask. Guarded by defined(MADV_HUGEPAGE) so it
compiles cleanly on non-Linux.

Benchmark on a Skylake-SP VM, Bonsai-8B Q1_0, -fa on -ctk q8_0
-ctv q8_0 -t 12 -ub 128: neutral on this machine (~9.5 t/s tg128
both before and after) because the VM isn't memory-constrained.
The change is intended for systems where the mapping does get
evicted and re-faulted under pressure.
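
The active THP mode mentioned in the commit message can be read from /sys/kernel/mm/transparent_hugepage/enabled, whose contents look like "always [madvise] never" with the active mode in brackets. A minimal parsing sketch follows; the function names are hypothetical and reading the file is left to the caller.

```cpp
#include <string>

// Hypothetical helper: return the bracketed token from text such as
// "always [madvise] never", or an empty string if none is found.
inline std::string parse_thp_mode(const std::string & contents) {
    const std::size_t open = contents.find('[');
    if (open == std::string::npos) {
        return "";
    }
    const std::size_t close = contents.find(']', open + 1);
    if (close == std::string::npos) {
        return "";
    }
    return contents.substr(open + 1, close - open - 1);
}

// In "always" and "madvise" modes an explicit MADV_HUGEPAGE request is
// honored; in "never" mode it has no effect.
inline bool thp_mode_honors_madvise(const std::string & mode) {
    return mode == "always" || mode == "madvise";
}
```

This check is necessary but not sufficient. Whether the kernel actually backs a read-only file mapping with huge pages depends on further kernel configuration that this sketch does not inspect.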