feat: @actions/cache optional path validation during restore #2414

Open: jasongin wants to merge 9 commits into main from jasongin/cache-path-validation


jasongin commented May 20, 2026

Contributor

Optional path validation during cache restore

New restoreCache() option

A new opt-in option, DownloadOptions.pathValidation, makes @actions/cache stream a downloaded cache archive through an in-process validator before handing it to system tar for extraction. Each entry's path (and, for symlinks/hardlinks, its target) must resolve under one of the declared cache paths; anything that escapes (../..., absolute, UNC, drive-relative, NUL bytes, unsupported entry types) is caught by the validator.

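```typescript
import * as cache from '@actions/cache'

// paths, key and restoreKeys are the same arguments as any other restoreCache() call
await cache.restoreCache(paths, key, restoreKeys, {
  pathValidation: 'error' // 'off' (default) | 'warn' | 'error'
})
```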
  • 'off' — legacy behavior, validation is skipped.
  • 'warn' — validate; on violations, log via core.warning / core.debug and extract anyway (legacy extraction, no allow-list).
  • 'error' — validate; on violations throw CacheIntegrityError (code: 'PATH_VIOLATION', with violations attached). No files are extracted. On a clean archive, extraction is additionally restricted to exactly the members the validator approved (see below).
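The mode semantics above can be summarized as a small decision function. This is a minimal sketch, not the @actions/cache implementation: planExtraction, Listing, ExtractionPlan and IntegrityError are hypothetical names, and IntegrityError only stands in for the real CacheIntegrityError.

```typescript
// Hypothetical sketch of the pathValidation modes; not the @actions/cache API.
export type PathValidationMode = 'off' | 'warn' | 'error'

export interface Violation {
  code: string // e.g. 'OUTSIDE_ROOTS'
  entry: string // archive member name that failed
}

export interface Listing {
  violations: Violation[]
  approvedNames: string[]
}

// Stand-in for CacheIntegrityError with code 'PATH_VIOLATION'.
export class IntegrityError extends Error {
  readonly code = 'PATH_VIOLATION'
  constructor(readonly violations: Violation[]) {
    super(`${violations.length} archive entries failed path validation`)
    this.name = 'IntegrityError'
  }
}

export interface ExtractionPlan {
  validated: boolean // whether the listing pre-pass ran
  allowList: string[] | null // null means "extract every member" (legacy)
  warnings: string[]
}

export function planExtraction(mode: PathValidationMode, list: () => Listing): ExtractionPlan {
  if (mode === 'off') {
    // Legacy behavior: no pre-pass at all.
    return { validated: false, allowList: null, warnings: [] }
  }
  const { violations, approvedNames } = list()
  if (violations.length === 0) {
    // Only 'error' mode narrows extraction to the approved members.
    return { validated: true, allowList: mode === 'error' ? approvedNames : null, warnings: [] }
  }
  if (mode === 'warn') {
    // Report, then extract everything as before (no allow-list).
    return { validated: true, allowList: null, warnings: violations.map(v => `${v.code}: ${v.entry}`) }
  }
  // 'error' with violations: throw before anything is extracted.
  throw new IntegrityError(violations)
}
```

Passing the listing as a callback lets the 'off' case skip the pre-pass entirely, which is how the legacy path avoids the extra decompression.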

Architecture: node-tar for listing, system tar for extraction — made sound with an approved-member allow-list

This is the biggest design call in the PR, and the area most changed in the latest revision.

Listing/validation uses node-tar v7's Parser in-process.

  • Pure JS, deterministic across platforms, exposes structured entry metadata. No subprocess, no parsing tar -tv text output that varies by GNU/BSD/Windows.
  • We're inspecting untrusted attacker-controlled bytes; doing it in JS avoids any shell-quoting/argv concern.
  • node-tar surfaces TAR_BAD_ARCHIVE / TAR_ENTRY_INVALID / TAR_ENTRY_ERROR / TAR_ABORT as discrete codes we promote to parse errors.
  • Gzip is handled natively; for zstd we spawn the system zstd binary for the same --long=30 window behavior the extract path uses.

Extraction stays on system tar.

  • The tar -P (--absolute-names) option must be used when extracting cache archives, because cache paths frequently live outside $GITHUB_WORKSPACE (~/.npm, ~/.cargo, ~/.gradle/caches, etc.). cacheUtils.resolvePaths rewrites every path relative to the workspace, so anything outside becomes ../../..., which GNU tar refuses without -P, and node-tar's extractor refuses by default — swapping it in wholesale would break essentially every common dependency cache.
  • The existing extract path is heavily tuned for cross-platform quirks (BSD/GNU detection, --force-local, --delay-directory-restore, zstdmt / zstd -T0, the BSD-tar-on-Windows-with-zstd workaround). Re-implementing all of that on node-tar's writer is a much larger, riskier change.
  • Zero behavior change for users who don't opt in.
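The need for -P follows from simple path arithmetic. The sketch below assumes resolvePaths computes member names with something like path.relative(workspace, file), and it uses illustrative hosted-runner directories; neither value is read from a real environment.

```typescript
import * as path from 'node:path'

// Illustrative hosted-runner locations (assumed values, not read from the environment).
const workspace = '/home/runner/work/repo/repo' // $GITHUB_WORKSPACE
const npmCache = '/home/runner/.npm' // ~/.npm

// Archive member name for the npm cache after rewriting relative to the workspace.
export const member = path.posix.relative(workspace, npmCache)

// Names like this start with '../', which (per the description above) GNU tar
// refuses without -P and node-tar's extractor refuses by default.
export const escapesWorkspace = member === '..' || member.startsWith('../')

console.log(member) // ../../../.npm
```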

Closing the listing↔extraction gap (new). The original design relied on an informal argument that "anything the validator approves, system tar also accepts." That argument is weak against path-channel parser differentials: a crafted entry (notably via PAX extended headers) can make node-tar compute a safe-looking path while system tar places the bytes somewhere else entirely. Rather than move validation onto system tar — which would require GNU tar everywhere and break BSD-tar / Windows tar.exe / many self-hosted runners — this revision keeps node-tar for listing and instead constrains the extractor to the validator's approved set:

  • In 'error' mode with a clean archive, listAndValidate now returns the exact member names node-tar derived (approvedNames) alongside the violations.
  • Those names are written to a temporary NUL-separated file (mode 0o600, in os.tmpdir(), randomized name) and passed to system tar via --null --no-recursion -T <file>.
    • --null MUST precede -T on GNU tar.
    • --no-recursion prevents an approved directory from implicitly pulling in unapproved children.
    • On GNU tar we additionally pass --no-wildcards --no-wildcards-match-slash --anchored as defense in depth; bsdtar lacks these, but glob metacharacters are already rejected during validation, so its fnmatch()-based -T matching degrades to exact-name matching.
  • The temp file is unlinked in a finally.
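A minimal sketch of the allow-list plumbing described above, assuming a caller that already knows whether system tar is GNU tar. writeAllowListFile, allowListArgs and withAllowList are hypothetical helper names, and the cache-allowlist- file prefix is made up for illustration.

```typescript
import { randomBytes } from 'node:crypto'
import * as fs from 'node:fs'
import * as os from 'node:os'
import * as path from 'node:path'

// Hypothetical helpers; the PR's own names and file naming may differ.
export function writeAllowListFile(approvedNames: string[]): string {
  // Randomized name in os.tmpdir(); flag 'wx' refuses to reuse an existing file.
  const file = path.join(os.tmpdir(), `cache-allowlist-${randomBytes(8).toString('hex')}`)
  // Every name is NUL-terminated, so spaces and other odd characters cannot split entries.
  const body = approvedNames.map(name => `${name}\0`).join('')
  fs.writeFileSync(file, body, { mode: 0o600, flag: 'wx' })
  return file
}

export function allowListArgs(listFile: string, isGnuTar: boolean): string[] {
  const args: string[] = []
  if (isGnuTar) {
    // GNU-only defense in depth: match names literally and from the start.
    args.push('--no-wildcards', '--no-wildcards-match-slash', '--anchored')
  }
  // Everything precedes -T: --null must come first on GNU tar, and
  // --no-recursion keeps an approved directory from pulling in its children.
  args.push('--null', '--no-recursion', '-T', listFile)
  return args
}

export async function withAllowList<T>(
  approvedNames: string[],
  isGnuTar: boolean,
  extract: (extraArgs: string[]) => Promise<T>
): Promise<T> {
  const file = writeAllowListFile(approvedNames)
  try {
    // Await inside try so the list file outlives the tar process.
    return await extract(allowListArgs(file, isGnuTar))
  } finally {
    fs.unlinkSync(file)
  }
}
```

Awaiting the extraction inside the try block matters: unlinking in finally without awaiting could delete the list before tar reads it.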

The effect: a member that system tar would extract to a different (escaped) path than node-tar computed simply isn't on the list and is never extracted. The parser differential becomes a no-op instead of a trusted assumption. 'warn' mode deliberately does not use the allow-list — on any violation it falls back to extracting everything, matching legacy behavior.

Performance tradeoff. Validation is a full pre-pass over the archive: decompress + parse headers (entry bodies are drained, never written to disk), then system tar decompresses a second time to extract. Roughly a 2× decompress cost on the archive stream, no extra disk I/O at the destination. The extra pass — and the allow-list — are skipped entirely when validation is 'off'.

Parser-differential & PAX header defense (new)

Beyond the allow-list, this revision hardens the validator itself against attacks that exploit the gap between node-tar's parsing and that of real extractors:

  • Length-correct PAX re-parser (pax-reparse.ts). node-tar v7 parses PAX extended-header bodies with a naive split('\n') (its own source notes "XXX Values with \n in them will fail this"). A PAX record is actually length-prefixed ("<len> <key>=<value>\n"), and GNU tar / libarchive consume exactly <len> bytes, so a value containing an embedded "\n<len> path=<other>\n" desynchronizes node-tar from every real extractor. The re-parser re-reads each captured PAX body byte-accurately and cross-checks path / linkpath against node-tar's view. Disagreement → PAX_DESYNC; a body that node-tar treated as PAX but the re-parser can't fully account for → PAX_PARSE_FAIL.
  • Fail-closed PAX key allow-list. Any PAX key outside a known-good set (POSIX.1-2017 §3.4.3 plus the SCHILY. / GNU. / LIBARCHIVE. / POSIX. namespaces) → PAX_UNKNOWN_KEY, so a future placement-affecting PAX extension can't silently slip past.
  • Unsupported / silently-ignored entries. Entry types node-tar would silently drop are now surfaced as UNSUPPORTED_TYPE via an ignoredEntry listener, closing a class of "the validator never saw it, but tar might" bypasses.
  • Unsafe characters & glob metacharacters. Entry paths containing \n / \0 → UNSAFE_CHAR; paths containing * ? [ ] → GLOB_METACHAR. The glob check is also what lets the bsdtar -T allow-list degrade safely to exact-name matching.
  • Meta-body flood guard. Captured PAX/global metadata bodies are bounded (maxMetaEntrySize of 1 MiB, MAX_PENDING_META cap) so a malicious archive can't exhaust memory via the validation pass.
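The length-prefixed record format and the cross-check can be sketched as follows. This is not pax-reparse.ts: parsePaxBody, checkPaxBody and paxRecord are hypothetical, naiveLast is a deliberately simplified line-splitting reader rather than node-tar's exact code, and the key allow-list only approximates the set described above.

```typescript
import { Buffer } from 'node:buffer'

// Hypothetical sketch; names are illustrative, not the PR's code.
export interface PaxRecord { key: string; value: string }
export type PaxCode = 'PAX_PARSE_FAIL' | 'PAX_UNKNOWN_KEY' | 'PAX_DESYNC'

// Parse "<len> <key>=<value>\n" records, consuming exactly <len> bytes each.
export function parsePaxBody(body: Buffer): PaxRecord[] {
  const records: PaxRecord[] = []
  let pos = 0
  while (pos < body.length) {
    const space = body.indexOf(0x20, pos)
    const lenText = space === -1 ? '' : body.toString('latin1', pos, space)
    if (!/^[0-9]+$/.test(lenText)) throw new Error(`bad length field at offset ${pos}`)
    const end = pos + Number(lenText)
    // The record must extend past its header, fit in the body and end in '\n'.
    if (end <= space + 1 || end > body.length || body[end - 1] !== 0x0a) {
      throw new Error(`malformed record at offset ${pos}`)
    }
    const kv = body.toString('utf8', space + 1, end - 1)
    const eq = kv.indexOf('=')
    if (eq <= 0) throw new Error(`missing key at offset ${pos}`)
    records.push({ key: kv.slice(0, eq), value: kv.slice(eq + 1) })
    pos = end
  }
  return records
}

// Deliberately naive reader: split on '\n' and let the last matching line win.
function naiveLast(body: Buffer, key: string): string | undefined {
  let found: string | undefined
  for (const line of body.toString('utf8').split('\n')) {
    const kv = line.slice(line.indexOf(' ') + 1)
    if (kv.startsWith(`${key}=`)) found = kv.slice(key.length + 1)
  }
  return found
}

// Approximate allow-list: POSIX pax keywords plus vendor namespaces.
const KNOWN_KEYS = new Set([
  'atime', 'charset', 'comment', 'ctime', 'gid', 'gname', 'hdrcharset',
  'linkpath', 'mtime', 'path', 'size', 'uid', 'uname'
])
const KNOWN_PREFIXES = ['realtime.', 'security.', 'SCHILY.', 'GNU.', 'LIBARCHIVE.', 'POSIX.']

export function checkPaxBody(body: Buffer): PaxCode | null {
  let records: PaxRecord[]
  try {
    records = parsePaxBody(body)
  } catch {
    return 'PAX_PARSE_FAIL'
  }
  for (const { key } of records) {
    if (!KNOWN_KEYS.has(key) && !KNOWN_PREFIXES.some(p => key.startsWith(p))) return 'PAX_UNKNOWN_KEY'
  }
  for (const key of ['path', 'linkpath']) {
    const exact = records.filter(r => r.key === key).pop()?.value
    if (exact !== naiveLast(body, key)) return 'PAX_DESYNC'
  }
  return null
}

// Helper for building records: the length prefix counts its own digits.
export function paxRecord(key: string, value: string): string {
  const payloadLen = Buffer.byteLength(` ${key}=${value}\n`)
  let len = payloadLen + 1
  while (String(len).length + payloadLen !== len) len = String(len).length + payloadLen
  return `${len} ${key}=${value}\n`
}
```

In a smuggled body such as paxRecord('path', '../evil') + paxRecord('comment', 'x\n20 path=safe/file'), the length-correct parse yields path ../evil, while a reader that splits on newlines and lets the last path line win sees safe/file.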

Other design notes

  • String enum, not boolean. Three discrete modes map 1:1 onto behavior and leave room for future modes (e.g. 'quarantine').
  • Allowed roots derived from paths. deriveAllowedRoots mirrors cacheUtils.resolvePaths: tilde / env-var expansion, glob-prefix stripping, etc.
  • prepareAllowedRoots() pre-bakes per-root normalize/lowercase/trailing-sep forms once per archive, dropping the per-entry hot loop from O(N · R) to O(N + R).
  • Zstd lifecycle hardening: 64 KiB stderr cap, .catch(() => undefined) on the pipeline / exit promises to suppress late rejections, try/finally SIGTERM if the child is still running on early exit.
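Root derivation and pre-baking can be sketched along these lines. deriveRoot, prepareRoot and isUnderRoots are hypothetical; the expansion rules (leading ~, $VAR and ${VAR}, truncation at the first glob segment) are assumptions modeled on the description above, and only POSIX separators are handled.

```typescript
import * as path from 'node:path'

// Hypothetical approximations of deriveAllowedRoots / prepareAllowedRoots.
// POSIX separators only; the PR also handles Windows paths.
const GLOB_CHARS = /[*?[\]{}]/

export function deriveRoot(
  pattern: string,
  env: Record<string, string | undefined>,
  home: string,
  workspace: string
): string {
  // Expand a leading '~' and $VAR / ${VAR}; unknown variables are left as-is.
  let p = pattern.replace(/^~(?=\/|$)/, () => home)
  p = p.replace(/\$\{(\w+)\}|\$(\w+)/g, (match: string, braced?: string, bare?: string) =>
    env[braced ?? bare ?? ''] ?? match
  )
  // Keep only the segments before the first one that contains a glob character.
  const segments = p.split('/')
  const firstGlob = segments.findIndex(s => GLOB_CHARS.test(s))
  const prefix = firstGlob === -1 ? p : segments.slice(0, firstGlob).join('/')
  return path.posix.resolve(workspace, prefix || '.')
}

export interface PreparedRoot {
  exact: string // normalized root, no trailing separator
  prefix: string // same root with exactly one trailing separator
}

// Called once per archive, so per-entry checks reuse the normalized forms.
export function prepareRoot(root: string, caseInsensitive: boolean): PreparedRoot {
  let exact = path.posix.normalize(root).replace(/\/+$/, '') || '/'
  if (caseInsensitive) exact = exact.toLowerCase()
  return { exact, prefix: exact === '/' ? '/' : `${exact}/` }
}

export function isUnderRoots(resolved: string, roots: PreparedRoot[], caseInsensitive: boolean): boolean {
  let p = path.posix.normalize(resolved)
  if (caseInsensitive) p = p.toLowerCase()
  // The trailing separator in `prefix` stops /home/u/.npm from matching /home/u/.npmrc.
  return roots.some(r => p === r.exact || p.startsWith(r.prefix))
}
```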

Violation codes

ABSOLUTE_PATH, UNC_PATH, NUL_BYTE, OUTSIDE_ROOTS, LINK_OUTSIDE_ROOTS, UNSUPPORTED_TYPE, PAX_DESYNC, PAX_PARSE_FAIL, PAX_UNKNOWN_KEY, UNSAFE_CHAR, GLOB_METACHAR.
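A per-entry classifier mapping onto these codes might look like the following. It is a sketch under assumptions: classifyEntry and EntryInfo are hypothetical, type names follow node-tar's ('File', 'SymbolicLink', ...), POSIX semantics are used throughout, and the order in which checks run (for example NUL_BYTE before UNSAFE_CHAR) is not taken from the PR. The PAX_* codes come from the PAX checks rather than per-entry path checks and are omitted here.

```typescript
import * as path from 'node:path'

// Hypothetical per-entry classifier; check order and supported types are assumptions.
export type ViolationCode =
  | 'ABSOLUTE_PATH' | 'UNC_PATH' | 'NUL_BYTE' | 'OUTSIDE_ROOTS'
  | 'LINK_OUTSIDE_ROOTS' | 'UNSUPPORTED_TYPE' | 'UNSAFE_CHAR' | 'GLOB_METACHAR'

export interface EntryInfo {
  path: string
  type: string // node-tar style type names, e.g. 'File', 'SymbolicLink'
  linkpath?: string
}

const SUPPORTED_TYPES = new Set(['File', 'Directory', 'SymbolicLink', 'Link'])

function inside(root: string, target: string): boolean {
  const rel = path.posix.relative(root, target)
  return rel === '' || (rel !== '..' && !rel.startsWith('../') && !path.posix.isAbsolute(rel))
}

function lexicalCheck(name: string): ViolationCode | null {
  if (name.includes('\0')) return 'NUL_BYTE'
  if (name.includes('\n')) return 'UNSAFE_CHAR'
  if (/[*?[\]]/.test(name)) return 'GLOB_METACHAR'
  if (/^[\\/]{2}/.test(name)) return 'UNC_PATH'
  // Rooted paths plus Windows drive-absolute (C:\x) and drive-relative (C:x) forms.
  if (/^[\\/]/.test(name) || /^[A-Za-z]:/.test(name)) return 'ABSOLUTE_PATH'
  return null
}

export function classifyEntry(entry: EntryInfo, workspace: string, roots: string[]): ViolationCode | null {
  if (!SUPPORTED_TYPES.has(entry.type)) return 'UNSUPPORTED_TYPE'
  const lexical = lexicalCheck(entry.path)
  if (lexical) return lexical
  // Member names are relative to the workspace, as resolvePaths produces them.
  const dest = path.posix.resolve(workspace, entry.path)
  if (!roots.some(root => inside(root, dest))) return 'OUTSIDE_ROOTS'
  if (entry.type === 'SymbolicLink' || entry.type === 'Link') {
    const target = entry.linkpath ?? ''
    if (target.includes('\0')) return 'NUL_BYTE'
    // Symlink targets resolve from the link's own directory; hardlink targets
    // name another member, so they resolve from the extraction directory.
    const base = entry.type === 'SymbolicLink' ? path.posix.dirname(dest) : workspace
    if (!roots.some(root => inside(root, path.posix.resolve(base, target)))) return 'LINK_OUTSIDE_ROOTS'
  }
  return null
}
```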

Known limitations

  • TOCTOU between validation and extraction. The validator never touches the filesystem. A pre-existing symlink in the workspace that the archive then writes "through", or a concurrent process creating one between passes, won't be caught. (The allow-list constrains which members extract, not whether the destination tree is mutated under us.)
  • Case-sensitivity heuristic (caseInsensitiveFs()) is a platform-level guess. On case-sensitive APFS or per-dir-case-sensitive NTFS it loosens (fewer violations) rather than tightens.
  • The -T defense-in-depth flags (--no-wildcards etc.) are GNU-only; on BSD/Windows tar the allow-list relies on the validator's prior rejection of glob metacharacters to keep -T matching exact.

Tests

All path-validation suites pass: 176 passed, 3 skipped (the zstd suites use describe.skip when no zstd binary is on PATH). New/updated coverage in this revision:

  • __tests__/pax-reparse.test.ts — unit tests for the length-correct PAX re-parser and cross-check (PAX_DESYNC / PAX_PARSE_FAIL / PAX_UNKNOWN_KEY), 100% line coverage of pax-reparse.ts.
  • __tests__/tarPathValidationAttacks.test.ts — raw-tar attack builders covering the F1–F5 parser-differential bypass classes, UNSAFE_CHAR / GLOB_METACHAR / clean-PAX / meta-flood cases, plus end-to-end extraction tests against the real system tar.
  • __tests__/tarPathValidation.test.ts — extended for the 'error'-mode allow-list: asserts the --null --no-recursion -T args, the NUL-separated file contents, and temp-file cleanup.
  • __tests__/listAndValidate.test.ts — updated for the new {violations, approvedNames} return shape.

See path-validation-test-plan.md for the full test-coverage description.

Out of scope

  • Replacing system tar for extraction (see above).
  • Entry-count / total-bytes caps (cache backend enforces archive size at upload).
  • Save-path validation (restore-only here).

actions/cache update

actions/cache#1761 is a related PR that uses this path validation when restoring from cache. The path validation mode will be a new input to the action; the default will initially be 'warn', and we can then consider changing it to 'error' in a major version update.

jasongin requested a review from a team as a code owner May 20, 2026 19:33
Copilot AI review requested due to automatic review settings May 20, 2026 19:33
Comment thread packages/cache/src/internal/pathValidation.ts Dismissed
Comment thread packages/cache/src/internal/pathValidation.ts Fixed

Copilot AI left a comment

Contributor


Pull request overview

This PR adds an opt-in integrity gate to @actions/cache restore that validates tar archive entry paths (and link targets) against the user-declared cache paths prior to extraction, surfacing violations as warnings or as a hard CacheIntegrityError depending on mode.

Changes:

  • Introduces DownloadOptions.pathValidation ('off' | 'warn' | 'error') and plumbs it through restoreCache*() to extractTar().
  • Adds an in-process archive listing/validation pass (tar Parser + custom path/link validation) and a new CacheIntegrityError for parse/violation failures.
  • Updates package metadata (version bump, tar dependency, Node engine) and adds comprehensive unit/integration tests + test plan doc.
| File | Description |
| --- | --- |
| `packages/cache/src/options.ts` | Adds pathValidation option with defaults and validation in getDownloadOptions(). |
| `packages/cache/src/internal/tar.ts` | Runs optional pre-extraction validation and reports/throws based on mode. |
| `packages/cache/src/internal/pathValidation.ts` | Implements allowed-root derivation, entry/link validation, and violation formatting. |
| `packages/cache/src/internal/listAndValidate.ts` | Streams archives through tar Parser (plus zstd decompression) to collect violations. |
| `packages/cache/src/internal/cacheIntegrityError.ts` | Defines CacheIntegrityError + error codes for integrity failures. |
| `packages/cache/src/cache.ts` | Exposes CacheIntegrityError publicly and forwards pathValidation/paths into extraction. |
| `packages/cache/package.json` | Bumps version, adds tar dep, adds engines.node, adjusts exports. |
| `packages/cache/package-lock.json` | Updates lockfile for new version and dependency graph. |
| `packages/cache/docs/path-validation-test-plan.md` | Documents test strategy/coverage for the new feature. |
| `packages/cache/__tests__/tarPathValidation.test.ts` | Adds integration tests asserting extract behavior across modes. |
| `packages/cache/__tests__/restoreCache.test.ts` | Updates assertions for new extractTar argument plumbing (v1). |
| `packages/cache/__tests__/restoreCacheV2.test.ts` | Updates assertions for new extractTar argument plumbing (v2). |
| `packages/cache/__tests__/pathValidation.test.ts` | Adds extensive unit tests for root derivation + entry/link validation. |
| `packages/cache/__tests__/options.test.ts` | Validates new default/override behavior for pathValidation. |
| `packages/cache/__tests__/listAndValidate.test.ts` | Adds real-archive tests for gzip/zstd parsing + validation behavior. |

Copilot's findings


  • Files not reviewed (1): packages/cache/package-lock.json (language not supported)
  • Files reviewed: 14/15 changed files
  • Comments generated: 3

Comment thread packages/cache/src/internal/pathValidation.ts Outdated
Comment thread packages/cache/src/internal/pathValidation.ts
Comment thread packages/cache/docs/path-validation-test-plan.md Outdated
```typescript
 * - 'error': surfaced as `CacheIntegrityError` with `code: 'PARSE_ERROR'`
 *   and extraction does not run.
 *
 * @default 'off'
```

Contributor


This feels like a security bug fix.

I'm wondering if we should default to error. We could major-version the package if we're worried about a breaking change.

jasongin May 20, 2026

Contributor Author


There are consumers of this package other than actions/cache (primarily actions/setup-*), so I didn't want to risk impacting them automatically. Even with a major version update, they might not realize the implications of taking the new major version. So I think it's safer to require consumers to opt in explicitly. At most, 'warn' could become the default with a major version update.

The linked PR actions/cache#1761 defaults to warn and I proposed updating the default to error in a major version update of the action (not the toolkit package). We should consider similar updates to actions/setup-* repos.

Contributor


> Even with a major version update, they might not realize the implications of taking the new major version

Breaking changes are what semver communicates.

We should have safe defaults. Is there a legitimate scenario where an archive entry should escape the declared cache paths?

Contributor Author


If this were a simple validation, I might agree. But given the complexity of the implementation, I think there is some risk of edge cases with certain platforms or archive contents. So I'm proposing to leave the validation disabled by default (opt-in) until we have more data proving it is solid in many real-world scenarios.

Comment thread packages/cache/src/internal/pathValidation.ts Fixed
Comment thread packages/cache/src/internal/pathValidation.ts Fixed
jasongin requested review from a team as code owners May 20, 2026 21:44
Comment thread packages/cache/src/internal/pathValidation.ts Fixed
Comment thread packages/cache/src/internal/pathValidation.ts Dismissed
@jasongin

Contributor Author

I am considering switching to path validation using system tar, which would avoid some kinds of attacks that could bypass node-tar's path validation. However, there's a compatibility challenge: BSD tar (macOS without gtar, Windows tar.exe, some self-hosted runners) lacks sufficient options to list files with quoted/escaped paths and full link targets. We could consider requiring GNU tar; it is already installed on hosted runners, but requiring it would be a compatibility problem for some self-hosted runners.

jasongin and others added 8 commits June 17, 2026 14:39
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
- Added a new test suite for parser-differential bypass detection in tar archives.
- Introduced length-correct PAX extended-header re-parsing to catch discrepancies between node-tar and system tar.
- Enhanced the listAndValidate function to return approved names alongside violations for better extraction control.
- Implemented checks for unsafe characters and glob metacharacters in entry paths.
- Updated the tar extraction logic to utilize an allow-list for approved entries when path validation is in error mode.
- Added utility functions for writing temporary allow-list files for system tar extraction.
jasongin force-pushed the jasongin/cache-path-validation branch from dae5257 to 6e8379e June 19, 2026 19:19
@jasongin

Contributor Author

I pushed a major update that implements a revised approach to path validation. The main changes are the PAX header defense and the use of an allow-list, generated by node-tar, that is passed to system tar. Both are explained in the updated PR description.
