
docs(orchestrator): canonical cache overlay strategy and self-hosted lessons #567

Closed
frostebite wants to merge 2 commits into main from docs/canonical-cache-and-self-hosted-lessons


@frostebite
Member

@frostebite frostebite commented May 7, 2026

Summary

Documents the new opt-in localCacheMode: 'canonical-overlay' value introduced in game-ci/orchestrator#22, plus four operational lessons that benefit any long-lived self-hosted runner.

Adds two new sections to docs/03-github-orchestrator/07-advanced-topics/01-caching.mdx:

Canonical Cache + Overlay (Advanced)

  • When to use it -- multiple self-hosted runners on the same physical host with multi-GB Library folders
  • Architecture -- canonical store + per-runner hardlink overlay, atomic-rename publish, SHA-versioned with latest pointer
  • Hardlink safety contract -- write-temp-then-rename invariant; per-subdirectory classifier table targeting Unity Library structure (PackageCache=junction, ScriptAssemblies/Artifacts/etc.=hardlink, PackageManager+DAG companions=copy, editor session state=skip)
  • Configuration example -- end-to-end YAML covering all six inputs (the extended localCacheMode value plus the five new inputs)
  • Sub-second hydration via cacheMaterialize: prepared
  • Failure modes table -- orphan staging cleanup, prepared-overlay cancel-safety, missing-canonical recovery, sentinel-mismatch discard
  • Cross-platform behaviour table -- NTFS / ReFS / ext4 / xfs / btrfs / zfs / APFS / Docker / cross-volume, with graceful fallback policy
  • Six stated limitations -- single-host topology, NTFS/ext4/xfs needed for full benefit, junction safety on PowerShell, 1023-link NTFS limit, modify-in-place propagation, multi-GB scale tested only on NTFS
  • Performance characteristics table -- eager 15-30 s, with junctions 2-8 s, prepared overlay < 1 s
  • Future strategies in the taxonomy -- reflink-overlay, bind-mount, nfs-passthrough, runner-affinity
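The hardlink safety contract hinges on one filesystem property: a hardlinked overlay file shares its inode with the canonical copy. A minimal Python sketch (standard library only; the file names are hypothetical, not the orchestrator's layout) shows why writers must follow write-temp-then-rename, and why modify-in-place propagation is listed among the limitations:

```python
import os
import tempfile


def write_in_place(path: str, data: bytes) -> None:
    # Truncates and rewrites the existing inode, so every hardlink to it changes too.
    with open(path, "wb") as f:
        f.write(data)


def write_temp_then_rename(path: str, data: bytes) -> None:
    # Writes a brand-new file beside the target, then swaps the name onto it.
    # Other hardlinks (e.g. the canonical copy) keep pointing at the old inode.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".", prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, path)  # atomic rename on POSIX within one filesystem
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise


def demo(root: str) -> tuple[bytes, bytes]:
    canonical = os.path.join(root, "canonical-Game.dll")  # hypothetical file names
    overlay = os.path.join(root, "overlay-Game.dll")
    with open(canonical, "wb") as f:
        f.write(b"v1")

    os.link(canonical, overlay)  # overlay entry shares the canonical inode
    write_temp_then_rename(overlay, b"v2")
    with open(canonical, "rb") as f:
        after_safe = f.read()  # canonical still holds b"v1"

    os.unlink(overlay)
    os.link(canonical, overlay)
    write_in_place(overlay, b"v3")
    with open(canonical, "rb") as f:
        after_unsafe = f.read()  # canonical now holds b"v3": the shared store changed
    return after_safe, after_unsafe


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(demo(d))
```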

Self-Hosted Operational Lessons

Four lessons that apply to any long-lived self-hosted runner, covering symptoms that ephemeral runners never see:

  • PackageCache subtree corruption from cross-runner restore -- detect-and-quarantine pattern for orphan-meta directories
  • PowerShell forward-reference bug -- calling a function before its definition fails without stopping the script in non-strict mode, so a downstream error surfaces as the apparent failure
  • GitHub Actions runner _runner_file_commands race -- concurrent core.setOutput from fan-out steps races on the shared runner-temp file; serialise output-emitting steps
  • Path-filter on the workflow entrypoint -- paths-ignore for docs/**, *.md, etc. to skip Unity CI on docs-only pushes
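The same "serialise output-emitting steps" advice applies inside a single script that fans out work. A minimal Python sketch, assuming the documented `$GITHUB_OUTPUT` file-command format (`name=value`, or a `name<<DELIMITER` block for multi-line values); `build_target` and the output names are hypothetical. Workers return values instead of writing them, and one writer appends complete records after every worker has finished:

```python
import os
import tempfile
import uuid
from concurrent.futures import ThreadPoolExecutor


def format_output(name: str, value: str) -> str:
    # One complete file-command record; multi-line values need a unique delimiter.
    if "\n" in value:
        delimiter = f"ghadelimiter_{uuid.uuid4()}"
        return f"{name}<<{delimiter}\n{value}\n{delimiter}\n"
    return f"{name}={value}\n"


def build_target(target: str) -> tuple[str, str]:
    # Hypothetical fan-out work: it returns its output instead of writing it.
    return f"{target}_status", f"built {target}"


def run_fan_out(targets: list[str], output_file: str) -> None:
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(build_target, targets))  # order matches `targets`
    # Single writer: records are appended one after another, never interleaved.
    with open(output_file, "a", encoding="utf-8") as f:
        for name, value in results:
            f.write(format_output(name, value))


if __name__ == "__main__":
    default = os.path.join(tempfile.gettempdir(), "github_output.txt")
    run_fan_out(["StandaloneWindows64", "StandaloneLinux64"], os.environ.get("GITHUB_OUTPUT", default))
```

Collecting results first and writing once trades a little latency for the guarantee that no two writers ever touch the shared file at the same time.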

Inputs Reference

Extended with the inputs from game-ci/orchestrator#22 (one extended enum plus five new inputs):

  • localCacheMode -- now accepts canonical-overlay in addition to existing values
  • canonicalCacheRoot -- path for the canonical store
  • canonicalCacheClassifier -- JSON describing per-subdirectory hardlink/junction/copy/skip strategy
  • canonicalCacheVersionRetention -- how many SHA versions to keep per cache key
  • cacheMaterialize -- eager (live) or prepared (atomic-rename pre-built overlay)
  • cacheSentinelCanary -- defense-in-depth corruption check
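`canonicalCacheClassifier` maps Library subdirectories to a strategy. A minimal Python sketch of how such a map could drive overlay materialization; the JSON shape, the `*` fallback key and the `LastSceneManagerSetup.txt` entry are illustrative assumptions, not the orchestrator's schema, and the `junction` branch uses a directory symlink as a portable stand-in for an NTFS junction:

```python
import json
import os
import shutil
import tempfile

# Hypothetical classifier shape (top-level Library entry -> strategy), mirroring
# the per-subdirectory table in the docs; the real JSON schema may differ.
EXAMPLE_CLASSIFIER = json.dumps({
    "PackageCache": "junction",
    "ScriptAssemblies": "hardlink",
    "Artifacts": "hardlink",
    "PackageManager": "copy",
    "LastSceneManagerSetup.txt": "skip",  # illustrative editor-session file
    "*": "hardlink",
})
STRATEGIES = {"hardlink", "junction", "copy", "skip"}


def load_classifier(raw: str) -> dict:
    table = json.loads(raw)
    unknown = sorted(v for v in table.values() if v not in STRATEGIES)
    if unknown:
        raise ValueError(f"unknown strategies: {unknown}")
    return table


def hardlink_tree(src: str, dst: str) -> None:
    # Recreate the directories and hardlink every file (same volume required).
    for dirpath, _dirs, files in os.walk(src):
        target = os.path.join(dst, os.path.relpath(dirpath, src))
        os.makedirs(target, exist_ok=True)
        for name in files:
            os.link(os.path.join(dirpath, name), os.path.join(target, name))


def materialize(canonical: str, overlay: str, classifier: dict) -> dict:
    os.makedirs(overlay, exist_ok=True)
    applied = {}
    for entry in sorted(os.listdir(canonical)):
        src, dst = os.path.join(canonical, entry), os.path.join(overlay, entry)
        strategy = classifier.get(entry, classifier.get("*", "copy"))
        if strategy == "skip":
            pass
        elif os.path.isfile(src):
            if strategy == "hardlink":
                os.link(src, dst)
            else:
                shutil.copy2(src, dst)  # junctions only apply to directories
                strategy = "copy"
        elif strategy == "junction":
            # Stand-in: a directory symlink; a real NTFS junction needs mklink /J or Win32.
            os.symlink(src, dst, target_is_directory=True)
        elif strategy == "hardlink":
            hardlink_tree(src, dst)
        else:
            shutil.copytree(src, dst)  # private copy: writes never reach the canonical
        applied[entry] = strategy
    return applied


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        lib = os.path.join(d, "canonical")
        os.makedirs(os.path.join(lib, "ScriptAssemblies"))
        open(os.path.join(lib, "ScriptAssemblies", "Game.dll"), "wb").close()
        print(materialize(lib, os.path.join(d, "runner-1"), load_classifier(EXAMPLE_CLASSIFIER)))
```

Because `os.link` raises when source and destination sit on different volumes, a real implementation needs the graceful fallback the cross-platform table describes; this sketch simply raises.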

Test plan

Pairs with

game-ci/orchestrator#22 -- the opt-in code that this documents. Both PRs are designed to land together.

Summary by CodeRabbit

Documentation

  • Introduced new canonical-overlay caching mode for multi-runner self-hosted environments, enabling efficient cache sharing through atomic content-addressed storage and per-runner overlay materialization
  • Expanded configuration reference with new inputs controlling canonical cache roots, version retention, overlay materialization, and sentinel canary behavior
  • Added detailed guidance on hardlink safety contracts, cross-platform behavior, performance expectations, and failure recovery mechanisms

…lessons

Adds two new sections to the orchestrator caching guide.

"Canonical Cache + Overlay (Advanced)" -- documents the new opt-in
`localCacheMode: canonical-overlay` value introduced in
game-ci/orchestrator. Covers:

  * When to use it (multiple self-hosted runners on the same physical
    host with multi-GB Library folders)
  * Architecture (canonical store + per-runner hardlink overlay,
    atomic-rename publish, SHA-versioned with `latest` pointer)
  * Hardlink safety contract (write-temp-then-rename invariant; per-
    subdirectory classifier table targeting Unity Library structure)
  * Configuration example
  * Sub-second hydration via `cacheMaterialize: prepared`
  * Failure modes table (orphan staging cleanup, prepared-overlay
    cancel-safety, missing-canonical recovery, sentinel mismatch
    discard)
  * Cross-platform behaviour table (NTFS/ReFS/ext4/xfs/btrfs/zfs/APFS/
    Docker/cross-volume) with graceful fallback policy
  * Six stated limitations (single-host topology, NTFS/ext4/xfs needed
    for full benefit, junction safety on PowerShell, 1023-link NTFS
    limit, modify-in-place propagation, multi-GB scale tested only
    on NTFS)
  * Performance characteristics table (eager 15-30 s, with junctions
    2-8 s, prepared overlay < 1 s)
  * Pointer to future strategies in the taxonomy (reflink-overlay,
    bind-mount, nfs-passthrough, runner-affinity)
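The atomic-rename publish and SHA-versioned `latest` pointer follow one pattern: build a version where readers cannot see it, then make it visible with a single rename. A minimal Python sketch, assuming a hypothetical `<root>/<key>/<sha>/` layout with a `latest` text file and a `.staging-` prefix (the orchestrator's real layout, version retention and sentinel checks are not shown):

```python
import os
import shutil
import tempfile
from typing import Optional

STAGING_PREFIX = ".staging-"  # hypothetical naming for in-progress publishes


def publish(root: str, key: str, sha: str, source_dir: str) -> str:
    key_dir = os.path.join(root, key)
    os.makedirs(key_dir, exist_ok=True)
    final = os.path.join(key_dir, sha)
    if not os.path.isdir(final):
        # Stage inside key_dir so the final rename never crosses a volume boundary.
        staging = tempfile.mkdtemp(dir=key_dir, prefix=STAGING_PREFIX)
        try:
            payload = os.path.join(staging, "payload")
            shutil.copytree(source_dir, payload)
            try:
                os.rename(payload, final)  # readers see the whole version or none of it
            except OSError:
                if not os.path.isdir(final):
                    raise  # a real failure, not a lost race with another publisher
        finally:
            shutil.rmtree(staging, ignore_errors=True)
    # Move the `latest` pointer with the same write-temp-then-rename pattern.
    fd, tmp = tempfile.mkstemp(dir=key_dir, prefix=".latest-")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write(sha)
    os.replace(tmp, os.path.join(key_dir, "latest"))
    return final


def resolve_latest(root: str, key: str) -> Optional[str]:
    key_dir = os.path.join(root, key)
    try:
        with open(os.path.join(key_dir, "latest"), encoding="utf-8") as f:
            sha = f.read().strip()
    except FileNotFoundError:
        return None  # no canonical yet: the caller falls back to a normal restore
    version = os.path.join(key_dir, sha)
    return version if os.path.isdir(version) else None


def cleanup_orphan_staging(key_dir: str) -> list:
    # Staging dirs left by cancelled runs were never renamed into place, so no reader
    # depends on them. Assumes no other publisher for this key is running right now.
    removed = []
    for entry in sorted(os.listdir(key_dir)):
        if entry.startswith(STAGING_PREFIX):
            shutil.rmtree(os.path.join(key_dir, entry), ignore_errors=True)
            removed.append(entry)
    return removed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        src = os.path.join(d, "Library")
        os.makedirs(src)
        publish(os.path.join(d, "canonical"), "unity-lib", "abc123", src)
        print(resolve_latest(os.path.join(d, "canonical"), "unity-lib"))
```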

"Self-Hosted Operational Lessons" -- four lessons applicable to any
long-lived self-hosted runner that ephemeral runners never encounter:

  * `PackageCache` subtree corruption from cross-runner restore --
    detect-and-quarantine pattern for orphan-meta directories
  * PowerShell forward-reference bug -- calling a function before its
    definition fails without stopping the script in non-strict mode,
    so a downstream error surfaces as the apparent failure
  * GitHub Actions runner `_runner_file_commands` race -- concurrent
    `core.setOutput` from fan-out steps races on the shared runner-temp
    file; serialise output-emitting steps
  * Path-filter on the workflow entrypoint -- `paths-ignore` for
    `docs/**`, `*.md`, etc. to skip Unity CI on docs-only pushes
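GitHub's workflow filter syntax treats `*` as "any characters except `/`" and `**` as "any characters", which is why the CodeRabbit review on this PR flags `*.md` as root-only. A minimal Python sketch that emulates just those wildcards (it ignores `?`, `+`, `[]` and `!` negation) together with the paths-ignore rule that a push is skipped only when every changed path matches an ignore pattern:

```python
import re
from typing import Iterable


def filter_pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    # Emulates only `*`, `**` and `**/` from GitHub's filter-pattern syntax.
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            parts.append("(?:.*/)?")  # zero or more leading directories
            i += 3
        elif pattern.startswith("**", i):
            parts.append(".*")  # any characters, including "/"
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")  # any characters except "/"
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts) + r"\Z")


def workflow_skipped(changed_paths: Iterable[str], paths_ignore: Iterable[str]) -> bool:
    # With paths-ignore, the workflow is skipped only when every changed path matches.
    regexes = [filter_pattern_to_regex(p) for p in paths_ignore]
    return all(any(r.match(path) for r in regexes) for path in changed_paths)


if __name__ == "__main__":
    print(workflow_skipped(["blog/post.md"], ["docs/**", "*.md", "*.txt"]))
```

Under this model, `docs/**`, `*.md`, `*.txt` still run Unity CI for a push that only touches `blog/post.md`, while `**/*.md` covers both root and nested Markdown.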

Inputs Reference table extended with the six orchestrator-PR inputs
(the `localCacheMode` enum extension plus the new `canonicalCacheRoot`,
`canonicalCacheClassifier`, `canonicalCacheVersionRetention`,
`cacheMaterialize` and `cacheSentinelCanary`).

Pairs with `game-ci/orchestrator` PR (feat/canonical-cache-overlay
branch).
@coderabbitai

coderabbitai Bot commented May 7, 2026

Walkthrough

This documentation update introduces a new canonical-overlay caching mode for self-hosted multi-runner environments. It covers atomic publishing to a canonical content-addressed store, per-runner hardlink/junction overlays, configuration options, cross-platform fallback behavior, failure modes, performance characteristics, and operational guidance.

Changes

Canonical Cache + Overlay Feature Documentation

| Layer | File(s) | Summary |
|---|---|---|
| Architecture & Core Design | docs/03-github-orchestrator/07-advanced-topics/01-caching.mdx | Introduces canonical-overlay strategy with canonical store, atomic staging-rename publish, per-runner hardlink/junction overlays, hardlink safety contract, and Unity-specific subdirectory classifier. |
| Configuration & Failure Handling | docs/03-github-orchestrator/07-advanced-topics/01-caching.mdx | Documents canonical root, version retention, cacheMaterialize: prepared behavior, steady-state fast hydration, SHA mismatch fallback, and failure mode table for cancellation, missing overlays, canonical absence, and sentinel mismatch scenarios. |
| Cross-Platform Behavior & Limitations | docs/03-github-orchestrator/07-advanced-topics/01-caching.mdx | Describes unsupported platform fallback, warning emission, inode-link limitations, and cross-platform compatibility constraints. |
| Performance & Future Strategies | docs/03-github-orchestrator/07-advanced-topics/01-caching.mdx | Specifies performance characteristics per cache configuration and lists future strategy taxonomy (reflink-overlay, bind-mount, nfs-passthrough) as not yet implemented. |
| Operational Guidance | docs/03-github-orchestrator/07-advanced-topics/01-caching.mdx | Adds lessons on package cache corruption, PowerShell forward-reference masking, GitHub Actions _runner_file_commands race, and workflow paths-ignore strategy for docs-only commits. |
| Inputs Reference Update | docs/03-github-orchestrator/07-advanced-topics/01-caching.mdx | Updates Inputs Reference table with canonical-overlay option and new inputs: canonicalCacheRoot, canonicalCacheClassifier, canonicalCacheVersionRetention, cacheMaterialize, cacheSentinelCanary. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Poem

🐰 A cache that's both swift and wise,
Through hardlinks new, the stories rise,
Canonical paths and overlays bright,
Multi-runner magic done just right! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Description check | ⚠️ Warning | The description is comprehensive and well-structured, covering all major changes with detailed sections, but it does not follow the repository's template structure with the required 'Changes' section header and checklist. | Restructure the description to follow the template: add a 'Changes' section listing key updates concisely, and include the completed checklist items (guide, readme, tests) at the bottom. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly and specifically summarizes the main documentation additions: canonical cache overlay strategy and self-hosted operational lessons, matching the core content of the changeset. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 Microsoft Presidio Analyzer (2.2.362)
docs/03-github-orchestrator/07-advanced-topics/01-caching.mdx

Microsoft Presidio Analyzer failed to scan this file



@github-actions

github-actions Bot commented May 7, 2026

Visit the preview URL for this PR (updated for commit 8801768):

https://game-ci-5559f--pr567-docs-canonical-cache-sgnlzgy7.web.app

(expires Thu, 14 May 2026 16:05:12 GMT)

🔥 via Firebase Hosting GitHub Action 🌎

Sign: 1f0574f15f83e11bfc148eae8646486a6d0e078b

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@frostebite frostebite requested a review from webbertakken May 7, 2026 16:04

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
docs/03-github-orchestrator/07-advanced-topics/01-caching.mdx (1)

530-538: ⚡ Quick win

Use **/*.md and **/*.txt in paths-ignore to match markdown and text files across all directories.

The patterns '*.md' and '*.txt' match only files in the repository root; they do not match nested files like blog/post.md. If the intent is to skip CI for docs-only commits across the entire repo, replace these with '**/*.md' and '**/*.txt' to catch all markdown and text files regardless of nesting depth.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 2d1d4998-eb45-4b9d-8ccb-25c2173c76ba

📥 Commits

Reviewing files that changed from the base of the PR and between ab8eb11 and 8801768.

📒 Files selected for processing (1)
  • docs/03-github-orchestrator/07-advanced-topics/01-caching.mdx

| `skipCache` | Skip cache restore entirely |
| `useCompressionStrategy` | Use LZ4 compression for cache archives |
| `localCacheMode` | One of `tar`, `move-directory`, `copy-directory`, `canonical-overlay` |
| `canonicalCacheRoot` | Path for the canonical store (when `localCacheMode: canonical-overlay`); falls back to `<localCacheRoot>/canonical` |
⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

canonicalCacheRoot description references localCacheRoot, but localCacheRoot is not defined in this table.

Please add a localCacheRoot row (or link to its definition) so the fallback path is self-contained and unambiguous here.

@frostebite
Member Author

Superseded by #569 — the combined branch merges this work in along with the other three docs branches from the May 2026 wave. Closing in favour of the combined PR.

@frostebite frostebite closed this May 7, 2026
auto-merge was automatically disabled May 7, 2026 16:37

Pull request was closed
