Skip to content

Per-entry seed in XNNPACK weights cache for XNNPACK-upgrade invalidation#20170

Open
doggeral wants to merge 2 commits into
pytorch:mainfrom
doggeral:export-D108082431
Open

Per-entry seed in XNNPACK weights cache for XNNPACK-upgrade invalidation#20170
doggeral wants to merge 2 commits into
pytorch:mainfrom
doggeral:export-D108082431

Conversation

@doggeral

@doggeral doggeral commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

Summary:
XNNPACK exposes xnn_weights_cache_look_up_key.seed — a per-ukernel value that XNNPACK guarantees is consistent across runs of the same ukernel and changes whenever a ukernel implementation changes. Store this seed per cache entry so a stale cached packing produced by an old XNNPACK ukernel is rejected after upgrade, instead of being handed back to a newer ukernel that expects a different layout.

Changes:

  • PackedDataMeta gains uint32_t seed{0}.
  • look_up rejects (returns SIZE_MAX) when a name hit has a stored seed that doesn't match cache_key->seed. This forces look_up_or_insert to re-pack with the current ukernel and avoids the slow memcmp path catching it later.
  • look_up_or_insert records cache_key->seed on insert.
  • On-disk index entry layout extended to [name_len:u32][name][file_offset:u64][data_size:u64][seed:u32] (was 16 bytes after the name, now 20).
  • load_packed_cache reads the per-entry seed and bumps the trailing bytes bound check accordingly.
  • kCacheVersion bumped 1 → 2 so existing v1 files (which carry no seed) are rejected at load instead of being loaded with seed=0 and mismatching every fresh look_up.

Cleanup of orphaned in-memory and on-disk entries left by an invalidated look-up is a follow-up — this diff only adds the detection.

Differential Revision: D108082431

doggeral added 2 commits June 9, 2026 16:42
Summary:
Add cross-load reuse + multi-PTE safety to the file-backed packed weight cache (D106673663). The first PTE in a session calls `save_packed_index()` to append a trailer; subsequent process launches mmap the file and pre-populate `name_to_packed_data_metadata_` so `look_up()` hits for every saved weight and `xnn_create_runtime` skips packing entirely.

## Cache file format

```
[packed data regions]                          (written by reserve_space)
[index entries]                                (written by save_packed_index)
  each: name_len(4B) | name(N) | file_offset(8B) | data_size(8B)
[footer: 20 bytes]
  index_start(8B) | entry_count(4B) | magic "XPWC"(4B) | version(4B)
```

## Lifecycle invariants

- `cache_loaded_` gate: `load_packed_cache()` runs at most once per process per path. Subsequent PTE inits for the same path reopen the write fd without re-reading the trailer.
- `from_load` flag: persistent entries (loaded from trailer or promoted on save) skip `delete_packed_data` cleanup. This keeps the mmap region and metadata alive across PTE unload/reload, so the next init hits the cache instead of repacking. Without this, every PTE destroy/recreate cycle appended a fresh copy to the file (~450 MB per cycle).
- No-op save short-circuit: `save_packed_index` returns early when no new `reserve_space` happened since the last save, avoiding the mtime churn that previously made the cache file look modified on every model load.

## Multi-PTE behavior

- Multiple PTEs (or methods that don't share weights) in the same model load share one cache file. Each PTE's `reserve_space` extends the file; `finalize_for_runtime` msyncs only newly added regions; `save_packed_index` writes one trailer covering all PTEs at the end of the load.
- Sibling PTEs that opt out of the mmap path (caller passes empty `packed_cache_path`) early-return from `initialize_for_runtime` and fall through to heap allocation, without touching the singleton's PLLM state.
- Cross-model coexistence relies on caller-side discipline: only models that opt in set a non-empty cache path. Setting different non-empty paths concurrently is not supported by this singleton design.

## Caller change

`XNNPACKBackend::init` always calls `set_packed_cache_path` (with empty string for non-opted-in PTEs). This keeps the singleton path in sync with the current PTE instead of inheriting a sibling's path.

## Test Plan

```
buck2 test fbcode//executorch/backends/xnnpack/test:test_xnn_weights_cache  # 5 pass
buck2 build fbsource//xplat/executorch/backends/xnnpack:xnnpack_backendApple
buck2 build fbsource//xplat/executorch/backends/xnnpack:xnnpack_backend
buck2 build fbcode//executorch/backends/xnnpack:xnnpack_backend
```

On device (iOS Stella build, PLLM + Llama3 runner):
- Cold start: load `(1184 entries)` from cache, `reserve_mmap=0` for cached weights
- Cache file size stable at ~593 MB across PLLM unload/reload cycles
- `app_peak ~700 MB` (vs ~2.5 GB pre-fix)
- `compressed ~100 MB` (vs ~1.7 GB pre-fix)

Differential Revision: D106717093
Summary:
XNNPACK exposes `xnn_weights_cache_look_up_key.seed` — a per-ukernel value that XNNPACK guarantees is consistent across runs of the same ukernel and changes whenever a ukernel implementation changes. Store this seed per cache entry so a stale cached packing produced by an old XNNPACK ukernel is rejected after upgrade, instead of being handed back to a newer ukernel that expects a different layout.

Changes:
- `PackedDataMeta` gains `uint32_t seed{0}`.
- `look_up` rejects (returns `SIZE_MAX`) when a name hit has a stored seed that doesn't match `cache_key->seed`. This forces `look_up_or_insert` to re-pack with the current ukernel and avoids the slow `memcmp` path catching it later.
- `look_up_or_insert` records `cache_key->seed` on insert.
- On-disk index entry layout extended to `[name_len:u32][name][file_offset:u64][data_size:u64][seed:u32]` (was 16 bytes after the name, now 20).
- `load_packed_cache` reads the per-entry seed and bumps the trailing bytes bound check accordingly.
- `kCacheVersion` bumped 1 → 2 so existing v1 files (which carry no seed) are rejected at load instead of being loaded with `seed=0` and mismatching every fresh `look_up`.

Cleanup of orphaned in-memory and on-disk entries left by an invalidated look-up is a follow-up — this diff only adds the detection.

Differential Revision: D108082431
@doggeral doggeral requested a review from digantdesai as a code owner June 9, 2026 23:42
@pytorch-bot

pytorch-bot Bot commented Jun 9, 2026

Copy link
Copy Markdown

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/20170

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 1 Unrelated Failure

As of commit 7739ca2 with merge base 26b4be8 (image):

NEW FAILURE - The following job has failed:

  • Lint / lintrunner (gh)
    >>> Lint for backends/xnnpack/test/runtime/test_xnn_weights_cache.cpp:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla Bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Jun 9, 2026
@meta-codesync

meta-codesync Bot commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

@doggeral has exported this pull request. If you are a Meta employee, you can view the originating Diff in D108082431.

@github-actions

github-actions Bot commented Jun 9, 2026

Copy link
Copy Markdown

This PR needs a release notes: label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. meta-exported

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant