Scan compressed pickle artifacts #345
Self-review notes:
decompress = COMPRESSED_PICKLE_SUFFIXES.get(model.get_source().suffix)
if decompress is not None:
    try:
        stream = io.BytesIO(decompress(stream.read()))
Self-review: decompression is intentionally placed before _list_globals() so the existing unsafe-global detection remains unchanged. This keeps the patch small, but very large compressed artifacts could justify a streaming decompression follow-up.
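The streaming follow-up mentioned above could look roughly like the sketch below. It wraps the raw stream in a decompressing file object instead of reading everything into memory, and caps how many decompressed bytes are accepted. The names `STREAMING_OPENERS`, `open_decompressed`, and `read_bounded` are hypothetical, and the suffix set is an assumption, since the actual contents of `COMPRESSED_PICKLE_SUFFIXES` are not shown in this PR.

```python
import bz2
import gzip
import io
import lzma
from typing import BinaryIO, Callable, Dict

# Hypothetical mapping: suffix -> function wrapping a raw stream in a
# lazily decompressing file object. Not the PR's COMPRESSED_PICKLE_SUFFIXES.
STREAMING_OPENERS: Dict[str, Callable[[BinaryIO], BinaryIO]] = {
    ".gz": lambda raw: gzip.GzipFile(fileobj=raw, mode="rb"),
    ".bz2": lambda raw: bz2.BZ2File(raw, mode="rb"),
    ".xz": lambda raw: lzma.LZMAFile(raw, mode="rb"),
}


def open_decompressed(raw: BinaryIO, suffix: str) -> BinaryIO:
    """Return a decompressing view of `raw`, or `raw` itself for unknown suffixes."""
    opener = STREAMING_OPENERS.get(suffix)
    return raw if opener is None else opener(raw)


def read_bounded(stream: BinaryIO, limit: int) -> bytes:
    """Read at most `limit` decompressed bytes; reject anything larger."""
    data = stream.read(limit + 1)
    if len(data) > limit:
        raise ValueError(f"decompressed payload exceeds {limit} bytes")
    return data


# Usage: decompress lazily, then hand a bounded buffer to the scanner.
raw_stream = io.BytesIO(gzip.compress(b"example pickle bytes"))
payload = read_bounded(open_decompressed(raw_stream, ".gz"), limit=1024)
```

The size cap is one possible guard against decompression bombs; whether the scanner wants a hard limit or true incremental opcode scanning is a design choice left open here.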
"""Disassemble a Pickle stream and report issues"""
issues: List[Issue] = []
stream = model.get_stream(offset)
decompress = COMPRESSED_PICKLE_SUFFIXES.get(model.get_source().suffix)
Self-review: this uses the outer compound extension to decide whether to decompress before pickle opcode scanning. That directly covers valid joblib.dump(..., compress=...) artifacts such as .joblib.gz; the tradeoff is that compression policy stays extension-based rather than magic-byte based.
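To make the tradeoff concrete, the sketch below shows what `Path.suffix` returns for a compound extension and what a magic-byte check might look like as an alternative. `MAGIC_PREFIXES` and `sniff_compression` are hypothetical names for illustration and are not part of this PR; the byte prefixes are the standard gzip, bzip2, and xz file signatures.

```python
import bz2
import gzip
import lzma
from pathlib import Path
from typing import Optional

# Path.suffix only returns the last extension of a compound name.
artifact = Path("model.joblib.gz")
outer_suffix = artifact.suffix      # ".gz"
all_suffixes = artifact.suffixes    # [".joblib", ".gz"]

# Hypothetical magic-byte table (standard file signatures).
MAGIC_PREFIXES = {
    b"\x1f\x8b": "gzip",
    b"BZh": "bz2",
    b"\xfd7zXZ\x00": "xz",
}


def sniff_compression(head: bytes) -> Optional[str]:
    """Guess the compression format from the first bytes of a file."""
    for prefix, name in MAGIC_PREFIXES.items():
        if head.startswith(prefix):
            return name
    return None


# A renamed file (e.g. gzip data saved as "model.pkl") is still detected.
detected = sniff_compression(gzip.compress(b"payload")[:6])
```

An extension-based policy is simpler and matches how the formats are routed today, while sniffing would also catch compressed artifacts saved under a misleading name.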
        ),
    ),
}
results = compressed_joblib.scan(Path(f"{file_path}/data/malicious16.joblib.gz"))
Self-review: this regression asserts the formerly skipped .joblib.gz artifact now goes through the pickle scanner and reports the embedded posix.system payload as CRITICAL, matching the reported scanner-bypass path.
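A standalone illustration of the bypass this regression covers, using only the standard library: a pickle that references `os.system` is gzip-compressed, then decompressed and disassembled without ever being loaded. `list_globals` here is a simplified, hypothetical stand-in for the scanner's `_list_globals()`; it only handles the `GLOBAL` opcode (protocol 2 and below), not `STACK_GLOBAL`.

```python
import gzip
import os
import pickle
import pickletools
from typing import List, Tuple


class _Payload:
    """Test object whose pickle references os.system. It is never unpickled."""

    def __reduce__(self):
        return (os.system, ("echo harmless",))


def list_globals(data: bytes) -> List[Tuple[str, str]]:
    """Collect (module, name) pairs from GLOBAL opcodes without executing anything."""
    found = []
    for opcode, arg, _pos in pickletools.genops(data):
        if opcode.name == "GLOBAL":
            module, name = arg.split(" ", 1)
            found.append((module, name))
    return found


# Protocol 2 emits a GLOBAL opcode; fix_imports=False keeps module names unmapped.
compressed = gzip.compress(pickle.dumps(_Payload(), protocol=2, fix_imports=False))

# Scanning the compressed bytes directly would not see the opcode stream,
# so decompress first and then disassemble.
globals_found = list_globals(gzip.decompress(compressed))
```

On Linux the reported module is `posix`, which is why the scanner result names `posix.system`; on other platforms `os.system.__module__` differs.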
Summary
- .joblib.gz, .pkl.xz, and .dill.bz2
- Path.suffix
- .joblib.gz scanning

Tests
- uv run --with-editable . --with pytest --with dill --with requests --with aiohttp --with torch --with tf-keras pytest tests/test_modelscan.py::test_scan_file_path tests/test_modelscan.py::test_scan_numpy
- uv run --with black black --check modelscan/middlewares/format_via_extension.py modelscan/modelscan.py modelscan/settings.py modelscan/tools/picklescanner.py tests/test_modelscan.py
- .joblib.gz and object-array .npy samples are reported as CRITICAL with this branch