fix(parse): prevent Zip Slip path traversal in _extract_zip (CWE-22) #89
+222
−2
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Description
CodeRepositoryParser._extract_zip()callszipfile.extractall()without validating member paths. An attacker can craft a malicious zip containing entries like../../evil.txt,/etc/passwd,C:\evil.txt,..\..\evil.txt,\server\share\evil.txt, or symlink entries to write files outside the target directory during extraction (Zip Slip, CWE-22).This PR replaces
extractall()with per-member extraction and implements 5-layer defense-in-depth:info.is_dir()+stat.S_ISDIR) — directories are created implicitly when extracting their childrenstat.S_ISLNK) — prevents symlink-based escapes to external pathsValueErrorif raw path contains..components, is absolute, or has a drive letterzipfile._extract_member, then verify withPath.is_relative_to()ValueErrorif it escapesRelated Issue
No existing issue. Vulnerability discovered via code audit.
Type of Change
Changes Made
zip_ref.extractall(target_dir)with per-member safe extraction and 5-layer path validation (code.py+50/-2)import statandfrom pathlib import PurePosixPathtests/misc/test_extract_zip.py(11 regression tests)tests/README.mdwith missing entries for the misc/ test directoryTesting
11 test cases covering the following scenarios:
../traversal../../evil.txt/etc/passwdfoo/../../evil.txtC:\evil.txt..\..\evil.txt\server\share\evil.txtexternal_attrset toS_IFLNK..normalization./..(contains.., normalizes to empty)..rejected before normalization)mydir/All attack vectors verified as successfully blocked on Windows + Python 3.13.
Checklist
Additional Notes
Behavioral impact: No change for legitimate zip files.
extractall()is replaced with equivalent per-memberextract()calls (internally,extractallis just a loop overextract). The only new behavior is raisingValueErroron malicious entries.Performance impact: Negligible. All added checks are in-memory string operations (
split,replace,is_relative_to) with no additional I/O.Fail-fast design decision: The implementation raises
ValueErrorimmediately upon detecting a malicious entry, rather than skipping it and continuing. A zip containing malicious entries is considered untrustworthy as a whole — continuing to extract remaining members carries risk. Fail-fast ensures the caller is notified and can handle the exception.Python compatibility: The project requires Python 3.9+, and
zipfile's built-in../sanitization was only added in Python 3.12+. On Python 3.9–3.11,extractall()performs no path traversal protection at all, making this fix necessary. Even on Python 3.12+, this fix provides independent defense covering attack vectors thatzipfiledoes not handle (symlink entries, absolute paths, drive letter paths).