fix: stop suppressing mixed-case pickle modules by mldangelo · Pull Request #698 · promptfoo/modelaudit

mldangelo · 2026-03-13T22:13:56Z

Summary

- replace the lowercase-only pickle module plausibility check with a valid-identifier check so mixed-case modules like `PIL` and attacker-chosen names like `EvilPkg` are no longer auto-suppressed
- add scanner-level regressions for GLOBAL, STACK_GLOBAL, memoized STACK_GLOBAL, and later-stream mixed-case REDUCE targets
- keep malformed names like `foo..bar` or whitespace-containing labels on the implausible path instead of widening to junk data strings
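
The identifier-based check described in the summary can be sketched roughly as follows. This is a minimal illustration, not the scanner's actual code: the function name `is_plausible_module_name` is hypothetical, and the real `_is_plausible_python_module` may apply further rules.

```python
def is_plausible_module_name(name: str) -> bool:
    """Return True when every dotted segment is a valid Python identifier.

    Hypothetical helper for illustration only; it is not the function
    defined in modelaudit/scanners/pickle_scanner.py.
    """
    if not name:
        return False
    # "foo..bar" splits into ["foo", "", "bar"]; the empty segment fails
    # str.isidentifier(), so malformed dotted names stay implausible.
    return all(segment.isidentifier() for segment in name.split("."))
```

Under this sketch `PIL`, `EvilPkg`, and `os.path` are plausible, while `foo..bar` and names containing whitespace are not. An uppercase data label such as `PEDRA_2020` is also a valid identifier, so a check of this shape accepts it when used alone.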

Testing

uv run ruff format modelaudit/ tests/
uv run ruff check --fix modelaudit/ tests/
uv run ruff check modelaudit/ tests/
uv run ruff format --check modelaudit/ tests/
uv run mypy modelaudit/
uv run pytest tests/scanners/test_pickle_scanner.py -q -k "plausible or mixed_case or import_only or PIL or later_stream"
uv run pytest -n auto -m "not slow and not integration" --maxfail=1

QA

- on untouched main, scanning /tmp/modelaudit-plausibility-qa.vd7LZf/mixed_case_reduce.pkl (`EvilPkg.thing` via REDUCE) was clean
- on this branch, the same payload reports "Found REDUCE opcode with non-allowlisted global: EvilPkg.thing"
- scanning /tmp/modelaudit-plausibility-qa.vd7LZf/malformed_reduce.pkl (`foo..bar`) stays clean on this branch, preserving the malformed-name guardrail
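
A payload of the kind used in this QA pass could be built and inspected along these lines. This is a sketch under assumptions: the helper names are hypothetical, the byte layout follows the GLOBAL helper quoted later in the review with `EMPTY_TUPLE` and `REDUCE` appended, and `pickletools.genops` is used so the bytes are only disassembled, never unpickled.

```python
import pickletools


def craft_global_reduce_pickle(module: str, func: str) -> bytes:
    """Build a protocol-2 pickle: GLOBAL module.func, EMPTY_TUPLE, REDUCE, STOP.

    Hypothetical helper; the PR's own QA files were not published.
    """
    return b"\x80\x02" + b"c" + f"{module}\n{func}\n".encode() + b")" + b"R" + b"."


def list_opcodes(payload: bytes) -> list[tuple[str, object]]:
    """Disassemble without loading, so nothing is imported or executed."""
    return [(op.name, arg) for op, arg, _pos in pickletools.genops(payload)]


payload = craft_global_reduce_pickle("EvilPkg", "thing")
opcodes = list_opcodes(payload)
# GLOBAL reports its argument as "module name" joined by a single space.
```

Swapping in `foo..bar` as the module gives the malformed counterpart from the last bullet.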

Scope note

mixed-case dangerous builtins such as Builtins.eval were already detected before this change
broader import-only coverage for non-dangerous mixed-case unknown modules remains separate work from the import-only findings task

Summary by CodeRabbit

Bug Fixes
- Improved module-name validation in pickle scanning to accept valid mixed-case Python module identifiers (e.g., PIL), preventing incorrect bypasses of import/reduce checks.
Behavior Changes
- Streaming scan result semantics adjusted so operational error flags align with operation success, keeping critical findings recorded without marking the operation as failed.
Tests
- Expanded test coverage for many pickle opcode scenarios and streaming scan exit-code/behavior cases.

coderabbitai · 2026-03-13T22:14:15Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Walkthrough

Replaces per-segment lowercase/regex checks for pickle module-name plausibility with per-segment Python identifier validation (allowing mixed-case names like PIL); updates CHANGELOG security bullet; adjusts streaming scan has_errors derivation to use scan_result.success; expands tests for pickle payloads, streaming semantics, and CLI streaming exit behavior.
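
The streaming change mentioned in the walkthrough (deriving `has_errors` from `success`) can be pictured with the sketch below. The class and function names are hypothetical stand-ins, not the types in `modelaudit/core.py` or `modelaudit/models.py`; only the rule "operational error flag follows `success`" is taken from the walkthrough.

```python
from dataclasses import dataclass


@dataclass
class FileScanResult:
    """Hypothetical per-file result; field names are assumptions."""
    success: bool          # did the scan operation itself complete?
    has_errors: bool       # may be set by security findings as well
    failed_checks: int = 0


@dataclass
class StreamAggregate:
    """Hypothetical running totals for a streaming scan."""
    has_errors: bool = False
    failed_checks: int = 0


def merge_result(aggregate: StreamAggregate, scan_result: FileScanResult) -> None:
    # Operational errors follow operation success, not the presence of findings.
    if not scan_result.success:
        aggregate.has_errors = True
    # Findings are still recorded even when the operation succeeded.
    aggregate.failed_checks += scan_result.failed_checks
```

With this rule a file that scans successfully but yields a critical finding increments `failed_checks` without marking the whole operation as failed.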

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Changelog `CHANGELOG.md` | Replaced ONNX external_data CVE bullet with a new security note: mixed-case valid pickle module names are no longer treated as implausible, so import/REDUCE checks won't bypass on names like `PIL` or attacker-chosen `EvilPkg`. |
| Pickle scanner `modelaudit/scanners/pickle_scanner.py` | Reworked import-plausibility logic: removed regex/lowercasing and `re` usage; each dotted segment now validated with `str.isidentifier()` and a short case-sensitive allowlist introduced. `_is_plausible_import_only_module` updated to reuse the new check. |
| Streaming/core aggregation `modelaudit/core.py`, `modelaudit/models.py` | Streaming aggregate logic altered: `has_errors` is now derived from `scan_result.success` (set when `success` is False) rather than `scan_result.has_errors`, aligning operational error flag with scan success. |
| Tests — pickle scanner `tests/scanners/test_pickle_scanner.py` | Imported `_is_plausible_python_module`; added helpers to craft protocol-4 payloads and expanded tests covering mixed-case modules, GLOBAL/STACK_GLOBAL/REDUCE (including memoized variants), import-only cases, and malformed payloads. |
| Tests — streaming & CLI `tests/test_streaming_scan.py`, `tests/test_cli.py` | Added tests asserting streaming semantics: critical findings don't set operational errors (success remains True) but record failed checks; added CLI streaming test for local directory exit-code and file cleanup. |
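
For the STACK_GLOBAL and memoized STACK_GLOBAL cases listed in the pickle-scanner test row, a protocol-4 payload and a simplified target resolver might look like the sketch below. Both function names are hypothetical, and the resolver is an assumption about how such targets can be recovered: it tracks only string pushes, `MEMOIZE`, `BINGET`, and `POP`, which is far less than a real scanner needs.

```python
import pickletools

_STRING_OPCODES = frozenset({"SHORT_BINUNICODE", "BINUNICODE", "BINUNICODE8", "UNICODE"})


def craft_stack_global_reduce(module: str, func: str, memoized: bool = False) -> bytes:
    """Build a protocol-4 pickle that calls module.func() via STACK_GLOBAL + REDUCE."""
    def short_unicode(text: str) -> bytes:
        data = text.encode("utf-8")
        return b"\x8c" + bytes([len(data)]) + data  # SHORT_BINUNICODE, 1-byte length

    push_module = short_unicode(module)
    if memoized:
        # Store the module string in memo slot 0, drop it, then fetch it back.
        push_module += b"\x94" + b"0" + b"h\x00"  # MEMOIZE, POP, BINGET 0
    return b"\x80\x04" + push_module + short_unicode(func) + b"\x93" + b")" + b"R" + b"."


def resolve_stack_globals(payload: bytes) -> list[str]:
    """Recover 'module.func' targets of STACK_GLOBAL without unpickling."""
    stack: list[object] = []
    memo: dict[int, object] = {}
    targets: list[str] = []
    for opcode, arg, _pos in pickletools.genops(payload):
        if opcode.name in _STRING_OPCODES:
            stack.append(arg)
        elif opcode.name == "MEMOIZE" and stack:
            memo[len(memo)] = stack[-1]
        elif opcode.name in {"BINGET", "LONG_BINGET"}:
            stack.append(memo.get(arg))
        elif opcode.name == "POP" and stack:
            stack.pop()
        elif opcode.name == "STACK_GLOBAL" and len(stack) >= 2:
            func, module = stack.pop(), stack.pop()
            if isinstance(module, str) and isinstance(func, str):
                targets.append(f"{module}.{func}")
            stack.append(None)  # placeholder for the resolved callable
    return targets
```

Both the plain and the memoized payload resolve to the same `EvilPkg.thing` target, which is the property the regression tests rely on.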

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Poem

🐰 I nibble bytes beneath the moon,
Mixed-case names now hum a tune.
PIL and friends hop in pursuit,
Identifiers keep paths astute.
A tiny hop — the scanner's root. 🥕

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 64.52% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and concisely describes the primary change: stopping the suppression of mixed-case pickle modules. It is directly related to the main objective of replacing the lowercase-only validation with a valid-identifier check. |

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/scanners/test_pickle_scanner.py`:
- Around line 721-731: The test test_pil_global_reduce_is_not_suppressed should
not depend on CheckStatus.FAILED; instead assert that a REDUCE Opcode Safety
Check mentions "PIL.Image" (regardless of status) and still assert that no
message contains "implausible module name 'PIL'". Update the first assertion to:
assert any("PIL.Image" in c.message for c in reduce_checks) and keep the second
assertion as-is to ensure PIL is not classified implausible; reference the
test_pil_global_reduce_is_not_suppressed function and the "REDUCE Opcode Safety
Check" checks when making the change.
- Around line 751-758: The test
test_mixed_case_import_only_payload_still_flags_import currently only checks
that a mixed-case dangerous global ("Builtins.eval") is flagged, which doesn't
exercise the new mixed-case plausibility logic; update the test to add a second
payload using _craft_global_import_only_pickle that represents the benign
mixed-case counterpart (e.g., a name that should not be considered the dangerous
global under the new rules) and assert that no suspicious import issue is
present for that payload, so the test covers both the flagged and non-flagged
mixed-case paths and prevents regressions in the mixed-case rules.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: b0a6fa3b-73b8-4cc1-a9c3-272601d67dcc

📥 Commits

Reviewing files that changed from the base of the PR and between 698859d and c311b25.

📒 Files selected for processing (3)

CHANGELOG.md
modelaudit/scanners/pickle_scanner.py
tests/scanners/test_pickle_scanner.py

tests/scanners/test_pickle_scanner.py

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/scanners/test_pickle_scanner.py`:
- Around line 864-875: The test
test_mixed_case_unknown_import_only_remains_non_failing currently only asserts
there is no FAILED "Global Module Reference Check" for "EvilPkg.thing"; tighten
it to also assert there are no WARNING or CRITICAL statuses for any check whose
name == "Global Module Reference Check" and
check.details.get("import_reference") == "EvilPkg.thing". Update the assertion
over result.checks (or add a new one) to ensure no check for that
import_reference has status in {CheckStatus.FAILED, CheckStatus.WARNING,
CheckStatus.CRITICAL}, referencing the result variable and
check.name/check.status/check.details keys used in the test.
- Around line 753-781: The two helpers _craft_global_import_only_pickle and
_craft_global_only_pickle produce identical GLOBAL-only payloads; consolidate by
having one delegate to the other to avoid duplication and future drift (e.g.,
implement _craft_global_import_only_pickle to simply return
_craft_global_only_pickle(module, func) or vice versa), keep the public
signatures and docstring intact, and ensure any callers still receive bytes as
before.
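
The tightened assertion requested in the first comment could take a shape like the sketch below. `CheckStatus` and `Check` here are stand-in types defined only for this example; the status names mirror those used in the comment and may not match the project's real enum.

```python
from dataclasses import dataclass, field
from enum import Enum


class CheckStatus(Enum):
    """Stand-in enum; member names follow the review comment, not the codebase."""
    PASSED = "passed"
    FAILED = "failed"
    WARNING = "warning"
    CRITICAL = "critical"


@dataclass
class Check:
    """Stand-in for a scan check record."""
    name: str
    status: CheckStatus
    details: dict[str, str] = field(default_factory=dict)


NON_PASSING = frozenset({CheckStatus.FAILED, CheckStatus.WARNING, CheckStatus.CRITICAL})


def non_passing_reference_checks(checks: list[Check], import_reference: str) -> list[Check]:
    """Return Global Module Reference checks for a reference that did not pass."""
    return [
        c for c in checks
        if c.name == "Global Module Reference Check"
        and c.details.get("import_reference") == import_reference
        and c.status in NON_PASSING
    ]
```

A test would then assert that `non_passing_reference_checks(result.checks, "EvilPkg.thing")` is empty, which covers every non-passing status in one place.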

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: c4b9f6b2-4053-4dfd-a6a2-5078630bf651

📥 Commits

Reviewing files that changed from the base of the PR and between c311b25 and 549be18.

📒 Files selected for processing (3)

CHANGELOG.md
modelaudit/scanners/pickle_scanner.py
tests/scanners/test_pickle_scanner.py

tests/scanners/test_pickle_scanner.py

coderabbitai

♻️ Duplicate comments (2)

tests/scanners/test_pickle_scanner.py (2)

783-787: 🧹 Nitpick | 🔵 Trivial

Consolidate duplicate GLOBAL-only payload helpers.

_craft_global_import_only_pickle and _craft_global_only_pickle currently emit identical bytes. Delegate one to the other to avoid future drift.

♻️ Proposed refactor

 `@staticmethod`
 def _craft_global_import_only_pickle(module: str, func: str) -> bytes:
     """Craft minimal pickle that only imports a GLOBAL and stops."""
-
-    return b"\x80\x02" + b"c" + f"{module}\n{func}\n".encode() + b"."
+    return TestPickleScannerBlocklistHardening._craft_global_only_pickle(module, func)

Also applies to: 789-792

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@tests/scanners/test_pickle_scanner.py` around lines 783 - 787, The two helper
functions _craft_global_import_only_pickle and _craft_global_only_pickle emit
identical bytes; refactor by delegating one to the other to avoid duplication
and future drift: pick one implementation (e.g., keep _craft_global_only_pickle
as the canonical builder) and change the other (e.g.,
_craft_global_import_only_pickle) to call and return that function with the same
parameters (module, func), ensuring both names remain available for tests that
reference either helper.

975-985: ⚠️ Potential issue | 🟡 Minor

Decouple the PIL plausibility regression from current allowlist status.

This test is about “not implausible anymore”, but it currently hard-requires CheckStatus.FAILED. If PIL.Image is later allowlisted, this regression would fail for the wrong reason.

🧪 Suggested assertion adjustment

         reduce_checks = [c for c in result.checks if c.name == "REDUCE Opcode Safety Check"]
-        assert any(c.status == CheckStatus.FAILED and "PIL.Image" in c.message for c in reduce_checks), (
-            f"Expected failed REDUCE check for PIL.Image, got: {[c.message for c in reduce_checks]}"
+        assert any("PIL.Image" in c.message for c in reduce_checks), (
+            f"Expected REDUCE analysis to resolve PIL.Image, got: {[c.message for c in reduce_checks]}"
         )
         assert not any("implausible module name 'PIL'" in c.message for c in reduce_checks), (
             "PIL should no longer be classified as an implausible module"
         )

Based on learnings: Preserve or strengthen security detections; test both benign and malicious samples when adding scanner/feature changes.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@tests/scanners/test_pickle_scanner.py` around lines 975 - 985, Update the
test_pil_global_reduce_is_not_suppressed to stop hard-requiring
CheckStatus.FAILED for the REDUCE Opcode Safety Check: locate the reduce_checks
filtered by name "REDUCE Opcode Safety Check" and assert that at least one
check.message references "PIL.Image" (regardless of check.status), and
separately assert that none of the reduce_checks contain the "implausible module
name 'PIL'" text; if desired, also keep an optional assertion that if any
check.status == CheckStatus.FAILED then its message contains "PIL.Image" to
preserve the original failure expectation without coupling to allowlist changes.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 3c09abff-f86f-4458-aca5-8934906eba7e

📥 Commits

Reviewing files that changed from the base of the PR and between 09ddc2f and 7c699d9.

📒 Files selected for processing (1)

tests/scanners/test_pickle_scanner.py

coderabbitai

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@modelaudit/scanners/pickle_scanner.py`:
- Line 1906: Add a type annotation to the new constant
_CASE_SENSITIVE_IMPORT_SEGMENTS so its type is explicit (e.g.
_CASE_SENSITIVE_IMPORT_SEGMENTS: FrozenSet[str] = frozenset({"PIL", "Cython"}));
import FrozenSet from typing if not already present or use built-in
frozenset[str] for Python 3.9+ to satisfy the coding guideline requiring type
hints.

In `@tests/scanners/test_pickle_scanner.py`:
- Around line 988-1004: Update the two tests
test_mixed_case_stack_global_reduce_is_not_suppressed and
test_mixed_case_memoized_stack_global_reduce_is_not_suppressed to also assert
that the REDUCE failure message does not include the implausible-module text;
specifically, after locating reduce_checks and asserting a failed REDUCE for
"EvilPkg.thing", add an assertion that no reduce_check.message contains
"implausible module name 'EvilPkg'" (i.e., ensure the implausible-module warning
is absent while the REDUCE failure is present).

In `@tests/test_cli.py`:
- Around line 1423-1438: Update the
test_exit_code_security_issues_streaming_local_directory function to add type
hints (tmp_path: Path and return -> None) and assert a concrete malicious
finding in the CLI output instead of relying solely on exit code; specifically,
after invoking cli(["scan", "--stream", str(streamed_dir)]) check result.output
(or result.stdout) contains a known indicator such as the malicious check name
or exact issue message present in malicious_system_call.pkl (reference the test
name test_exit_code_security_issues_streaming_local_directory and the sample
file malicious_system_call.pkl) and keep the existing exit_code and streamed_dir
deletion assertion.

In `@tests/test_streaming_scan.py`:
- Around line 209-238: The test
test_scan_model_streaming_critical_findings_do_not_set_operational_errors should
assert the exact malicious finding instead of using the broad failed_checks >=
1; update it to check the ScanResult payload returned from scan_model_streaming
(e.g., inspect result.failed_checks/details or result.checks) to assert the
check name "Dangerous Pickle Check", severity IssueSeverity.CRITICAL, message
"Detected malicious payload", and location matching temp_test_files[0], and keep
the original success/has_errors assertions; also add a type annotation to the
test signature (e.g., def
test_scan_model_streaming_critical_findings_do_not_set_operational_errors(temp_test_files:
Sequence[Path]) -> None:) so the test is properly typed.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 866f94f8-744b-4ca2-a327-d5e959f9bad6

📥 Commits

Reviewing files that changed from the base of the PR and between 7c699d9 and 22b88aa.

📒 Files selected for processing (6)

modelaudit/core.py
modelaudit/models.py
modelaudit/scanners/pickle_scanner.py
tests/scanners/test_pickle_scanner.py
tests/test_cli.py
tests/test_streaming_scan.py

modelaudit/scanners/pickle_scanner.py

tests/scanners/test_pickle_scanner.py

tests/test_cli.py

tests/test_streaming_scan.py

fix: stop suppressing mixed-case pickle modules

c311b25

coderabbitai bot reviewed Mar 13, 2026

tests/scanners/test_pickle_scanner.py (resolved)

tests/scanners/test_pickle_scanner.py (resolved)

merge: bring main into mixed-case pickle plausibility branch

549be18

coderabbitai bot reviewed Mar 14, 2026

tests/scanners/test_pickle_scanner.py (resolved)

tests/scanners/test_pickle_scanner.py (outdated, resolved)

mldangelo added 3 commits March 14, 2026 18:26

merge: refresh mixed-case pickle plausibility branch with main

09ddc2f

Merge remote-tracking branch 'origin/main' into feat/pickle-module-pl…

9f8934a

…ausibility # Conflicts: # CHANGELOG.md # tests/scanners/test_pickle_scanner.py

Merge remote-tracking branch 'origin/feat/pickle-module-plausibility'…

7c699d9

… into feat/pickle-module-plausibility # Conflicts: # CHANGELOG.md # tests/scanners/test_pickle_scanner.py

coderabbitai bot reviewed Mar 14, 2026

mldangelo added 2 commits March 15, 2026 09:33

Merge remote-tracking branch 'origin/main' into feat/pickle-module-pl…

b364a2b

…ausibility

fix: restore pickle plausibility filtering

22b88aa

Keep mixed-case module names like PIL analyzable without reintroducing uppercase data-label false positives such as PEDRA_2020. Also preserve streaming security exit codes by separating operational errors from security findings.

coderabbitai bot reviewed Mar 15, 2026

modelaudit/scanners/pickle_scanner.py (outdated, resolved)

tests/scanners/test_pickle_scanner.py (resolved)

tests/test_cli.py (outdated, resolved)

tests/test_streaming_scan.py (outdated, resolved)

mldangelo added 9 commits March 15, 2026 11:37

Merge remote-tracking branch 'origin/main' into feat/pickle-module-pl…

35959e7

…ausibility # Conflicts: # modelaudit/core.py # tests/test_cli.py # tests/test_streaming_scan.py

test: fix merge fallout in streaming tests

4dccb4a

Merge remote-tracking branch 'origin/main' into feat/pickle-module-pl…

a47bc94

…ausibility

Merge remote-tracking branch 'origin/main' into feat/pickle-module-pl…

37f0fbe

…ausibility

Merge remote-tracking branch 'origin/main' into feat/pickle-module-pl…

58cb026

…ausibility

test: strengthen mixed-case pickle plausibility regressions

729df0d

test: make streaming CLI pickle assertion platform-safe

a7efd55

Merge branch 'main' into review-pr-698

01ba356

# Conflicts: # CHANGELOG.md

Merge remote-tracking branch 'origin/main' into audit-pr698-mainmerge

da444f3

mldangelo wants to merge 16 commits into main from feat/pickle-module-plausibility
