feat: @actions/cache optional path validation during restore#2414
feat: @actions/cache optional path validation during restore#2414jasongin wants to merge 9 commits into
@actions/cache optional path validation during restore#2414Conversation
There was a problem hiding this comment.
Pull request overview
This PR adds an opt-in integrity gate to @actions/cache restore that validates tar archive entry paths (and link targets) against the user-declared cache paths prior to extraction, surfacing violations as warnings or as a hard CacheIntegrityError depending on mode.
Changes:
- Introduces
DownloadOptions.pathValidation('off' | 'warn' | 'error') and plumbs it throughrestoreCache*()toextractTar(). - Adds an in-process archive listing/validation pass (
tarParser + custom path/link validation) and a newCacheIntegrityErrorfor parse/violation failures. - Updates package metadata (version bump,
tardependency, Node engine) and adds comprehensive unit/integration tests + test plan doc.
Show a summary per file
| File | Description |
|---|---|
| packages/cache/src/options.ts | Adds pathValidation option with defaults and validation in getDownloadOptions(). |
| packages/cache/src/internal/tar.ts | Runs optional pre-extraction validation and reports/throws based on mode. |
| packages/cache/src/internal/pathValidation.ts | Implements allowed-root derivation, entry/link validation, and violation formatting. |
| packages/cache/src/internal/listAndValidate.ts | Streams archives through tar Parser (plus zstd decompression) to collect violations. |
| packages/cache/src/internal/cacheIntegrityError.ts | Defines CacheIntegrityError + error codes for integrity failures. |
| packages/cache/src/cache.ts | Exposes CacheIntegrityError publicly and forwards pathValidation/paths into extraction. |
| packages/cache/package.json | Bumps version, adds tar dep, adds engines.node, adjusts exports. |
| packages/cache/package-lock.json | Updates lockfile for new version and dependency graph. |
| packages/cache/docs/path-validation-test-plan.md | Documents test strategy/coverage for the new feature. |
| packages/cache/tests/tarPathValidation.test.ts | Adds integration tests asserting extract behavior across modes. |
| packages/cache/tests/restoreCache.test.ts | Updates assertions for new extractTar argument plumbing (v1). |
| packages/cache/tests/restoreCacheV2.test.ts | Updates assertions for new extractTar argument plumbing (v2). |
| packages/cache/tests/pathValidation.test.ts | Adds extensive unit tests for root derivation + entry/link validation. |
| packages/cache/tests/options.test.ts | Validates new default/override behavior for pathValidation. |
| packages/cache/tests/listAndValidate.test.ts | Adds real-archive tests for gzip/zstd parsing + validation behavior. |
Copilot's findings
Tip
Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Files not reviewed (1)
- packages/cache/package-lock.json: Language not supported
- Files reviewed: 14/15 changed files
- Comments generated: 3
| * - 'error': surfaced as `CacheIntegrityError` with `code: 'PARSE_ERROR'` | ||
| * and extraction does not run. | ||
| * | ||
| * @default 'off' |
There was a problem hiding this comment.
This feels like a security bug fix.
I'm wondering if we should default to error. We could major-version the package if we're worried about a breaking change.
There was a problem hiding this comment.
There are consumers of this package other than actions/cache (primarily actions/setup-*), so I didn't want to risk impacting them automatically. Even with a major version update, they might not realize the implications of taking the new major version. So I think it's safer to require consumers to opt in explicitly. Or at best 'warn' could be the default with a major version update.
The linked PR actions/cache#1761 defaults to warn and I proposed updating the default to error in a major version update of the action (not the toolkit package). We should consider similar updates to actions/setup-* repos.
There was a problem hiding this comment.
Even with a major version update, they might not realize the implications of taking the new major version
Breaking changes is what semver communicates.
We should have safe defaults. Is there a legitimate scenario where an archive entry should escape the declared cache paths?
There was a problem hiding this comment.
If this was a simple validation then I might agree. But given the complexity of the implementation, I think there is some risk of edge cases with certain platforms or archive contents. So I'm proposing to leave the validation disabled by default (opt-in) until we have more data proving it is solid in many real-world scenarios.
|
I am considering switching to path validation using system tar, which would avoid some kinds of attacks that could bypass node-tar's path validation. However there's a compatibility challenge: BSD tar (macOS without gtar, Windows tar.exe, some self-hosted runners) lacks sufficient options to list files with quoted/escaped paths and full link targets. We could consider requiring GNU tar - it is already installed on hosted runners but would be a compatibility problem for some self-hosted runners. |
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
- Added a new test suite for parser-differential bypass detection in tar archives. - Introduced length-correct PAX extended-header re-parsing to catch discrepancies between node-tar and system tar. - Enhanced the listAndValidate function to return approved names alongside violations for better extraction control. - Implemented checks for unsafe characters and glob metacharacters in entry paths. - Updated the tar extraction logic to utilize an allow-list for approved entries when path validation is in error mode. - Added utility functions for writing temporary allow-list files for system tar extraction.
dae5257 to
6e8379e
Compare
|
I pushed a major update that implements a revised approach to path validation. The main changes are the PAX header defense and use of an allow-list generated by node-tar passed to system-tar. They are explained in the updated PR description. |
Optional path validation during cache restore
New
restoreCache()optionA new opt-in
DownloadOptions.pathValidationon@actions/cachestreams a downloaded cache archive through an in-process validator before handing it to systemtarfor extraction. Each entry's path (and, for symlinks/hardlinks, its target) must resolve under one of the declared cachepaths— anything that escapes (../..., absolute, UNC, drive-relative, NUL bytes, unsupported entry types) is caught by the validator.'off'— legacy behavior, validation is skipped.'warn'— validate; on violations, log viacore.warning/core.debugand extract anyway (legacy extraction, no allow-list).'error'— validate; on violations throwCacheIntegrityError(code: 'PATH_VIOLATION', withviolationsattached). No files are extracted. On a clean archive, extraction is additionally restricted to exactly the members the validator approved (see below).Architecture: node-tar for listing, system
tarfor extraction — made sound with an approved-member allow-listThis is the biggest design call in the PR, and the area most changed in the latest revision.
Listing/validation uses node-tar v7's
Parserin-process.tar -tvtext output that varies by GNU/BSD/Windows.TAR_BAD_ARCHIVE/TAR_ENTRY_INVALID/TAR_ENTRY_ERROR/TAR_ABORTas discrete codes we promote to parse errors.zstdbinary for the same--long=30window behavior the extract path uses.Extraction stays on system
tar.tar -P(--absolute-names) option must be used when extracting cache archives, because cache paths frequently live outside$GITHUB_WORKSPACE(~/.npm,~/.cargo,~/.gradle/caches, etc.).cacheUtils.resolvePathsrewrites every path relative to the workspace, so anything outside becomes../../..., which GNU tar refuses without-P, and node-tar's extractor refuses by default — swapping it in wholesale would break essentially every common dependency cache.--force-local,--delay-directory-restore,zstdmt/zstd -T0, the BSD-tar-on-Windows-with-zstd workaround). Re-implementing all of that on node-tar's writer is a much larger, riskier change.Closing the listing↔extraction gap (new). The original design relied on an informal argument that "anything the validator approves, system tar also accepts." That argument is weak against path-channel parser differentials: a crafted entry (notably via PAX extended headers) can make node-tar compute a safe-looking path while system
tarplaces the bytes somewhere else entirely. Rather than move validation onto systemtar— which would require GNU tar everywhere and break BSD-tar / Windowstar.exe/ many self-hosted runners — this revision keeps node-tar for listing and instead constrains the extractor to the validator's approved set:'error'mode with a clean archive,listAndValidatenow returns the exact member names node-tar derived (approvedNames) alongside the violations.0o600, inos.tmpdir(), randomized name) and passed to systemtarvia--null --no-recursion -T <file>.--nullMUST precede-Ton GNU tar.--no-recursionprevents an approved directory from implicitly pulling in unapproved children.--no-wildcards --no-wildcards-match-slash --anchoredas defense in depth; bsdtar lacks these, but glob metacharacters are already rejected during validation, so itsfnmatch()-based-Tmatching degrades to exact-name matching.finally.The effect: a member that system
tarwould extract to a different (escaped) path than node-tar computed simply isn't on the list and is never extracted. The parser differential becomes a no-op instead of a trusted assumption.'warn'mode deliberately does not use the allow-list — on any violation it falls back to extracting everything, matching legacy behavior.Performance tradeoff. Validation is a full pre-pass over the archive: decompress + parse headers (entry bodies are drained, never written to disk), then system tar decompresses a second time to extract. Roughly a 2× decompress cost on the archive stream, no extra disk I/O at the destination. The extra pass — and the allow-list — are skipped entirely when validation is
'off'.Parser-differential & PAX header defense (new)
Beyond the allow-list, this revision hardens the validator itself against attacks that exploit the gap between node-tar's parsing and that of real extractors:
split('\n')(its own source notesXXX Values with \n in them will fail this). A PAX record is actually length-prefixed ("<len> <key>=<value>\n"), and GNU tar / libarchive consume exactly<len>bytes — so a value containing an embedded"\n<len> path=<other>\n"desynchronizes node-tar from every real extractor. The re-parser re-reads each captured PAX body byte-accurately and cross-checkspath/linkpathagainst node-tar's view. Disagreement →PAX_DESYNC; a body node-tar treated as PAX that the re-parser can't fully account for →PAX_PARSE_FAIL.SCHILY./GNU./LIBARCHIVE./POSIX.namespaces) →PAX_UNKNOWN_KEY, so a future placement-affecting PAX extension can't silently slip past.UNSUPPORTED_TYPEvia anignoredEntrylistener, closing a class of "the validator never saw it, but tar might" bypasses.\n/\0→UNSAFE_CHAR; paths containing* ? [ ]→GLOB_METACHAR. The glob check is also what lets the bsdtar-Tallow-list degrade safely to exact-name matching.maxMetaEntrySizeof 1 MiB,MAX_PENDING_METAcap) so a malicious archive can't exhaust memory via the validation pass.Other design notes
'quarantine').paths.deriveAllowedRootsmirrorscacheUtils.resolvePaths: tilde / env-var expansion, glob-prefix stripping, etc.prepareAllowedRoots()pre-bakes per-root normalize/lowercase/trailing-sep forms once per archive, dropping the per-entry hot loop from O(N · R) to O(N + R)..catch(() => undefined)on the pipeline / exit promises to suppress late rejections,try/finallySIGTERMif the child is still running on early exit.Violation codes
ABSOLUTE_PATH,UNC_PATH,NUL_BYTE,OUTSIDE_ROOTS,LINK_OUTSIDE_ROOTS,UNSUPPORTED_TYPE,PAX_DESYNC,PAX_PARSE_FAIL,PAX_UNKNOWN_KEY,UNSAFE_CHAR,GLOB_METACHAR.Known limitations
caseInsensitiveFs()) is a platform-level guess. On case-sensitive APFS or per-dir-case-sensitive NTFS it loosens (fewer violations) rather than tightens.-Tdefense-in-depth flags (--no-wildcardsetc.) are GNU-only; on BSD/Windows tar the allow-list relies on the validator's prior rejection of glob metacharacters to keep-Tmatching exact.Tests
All path-validation suites pass: 176 passed, 3 skipped (zstd suites
describe.skipwhen nozstdbinary is onPATH). New/updated coverage in this revision:__tests__/pax-reparse.test.ts— unit tests for the length-correct PAX re-parser and cross-check (PAX_DESYNC/PAX_PARSE_FAIL/PAX_UNKNOWN_KEY), 100% line coverage of pax-reparse.ts.__tests__/tarPathValidationAttacks.test.ts— raw-tar attack builders covering the F1–F5 parser-differential bypass classes,UNSAFE_CHAR/GLOB_METACHAR/ clean-PAX / meta-flood cases, plus end-to-end extraction tests against the real systemtar.__tests__/tarPathValidation.test.ts— extended for the'error'-mode allow-list: asserts the--null --no-recursion -Targs, the NUL-separated file contents, and temp-file cleanup.__tests__/listAndValidate.test.ts— updated for the new{violations, approvedNames}return shape.See path-validation-test-plan.md for the full test-coverage description.
Out of scope
tarfor extraction (see above).actions/cacheupdateactions/cache#1761 is a related PR to actually use this path validation when restoring from cache. The path validation mode will be a new input to the action; initially the default will be
'warn', then we can consider changing the default to'error'in a major version update.