
fix: accept valid ExecuTorch FlatBuffers binaries #715

Open
yash2998chhabria wants to merge 6 commits into main from fix/executorch-binary-validation


@yash2998chhabria (Contributor) commented Mar 16, 2026

Summary

  • recognize valid ExecuTorch FlatBuffers binaries in .pte files instead of treating every non-ZIP file as an invalid archive
  • teach shared file-type detection to identify the ET12 ExecuTorch binary signature so valid public models do not emit S901 mismatch noise
  • add regression coverage for scanner handling and file-type validation, and add a changelog entry for the false-positive reduction
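The signature check in the summary can be sketched as follows. This sketch assumes the standard FlatBuffers layout, where bytes 0 to 3 hold a little-endian offset to the root table and bytes 4 to 7 hold the optional file identifier, together with the `ET12` identifier this PR cites. The names `has_executorch_signature` and `classify_pte_prefix` are hypothetical. They are not the helpers the PR adds.

```python
ZIP_MAGIC = b"PK\x03\x04"
EXECUTORCH_FILE_IDENTIFIER = b"ET12"  # identifier cited in this PR


def has_executorch_signature(prefix: bytes) -> bool:
    """Return True if bytes 4..7 hold the ExecuTorch FlatBuffers file identifier."""
    return len(prefix) >= 8 and prefix[4:8] == EXECUTORCH_FILE_IDENTIFIER


def classify_pte_prefix(prefix: bytes) -> str:
    """Classify the first bytes of a .pte file as 'zip', 'executorch', or 'unknown'."""
    if prefix.startswith(ZIP_MAGIC):
        return "zip"
    if has_executorch_signature(prefix):
        return "executorch"
    return "unknown"


# Bytes 0..3: little-endian root offset (arbitrary here); bytes 4..7: identifier.
print(classify_pte_prefix((8).to_bytes(4, "little") + b"ET12"))  # executorch
print(classify_pte_prefix(b"PK\x03\x04" + bytes(4)))  # zip
print(classify_pte_prefix(bytes(8)))  # unknown
```

Before this change, anything that fell into the last branch was reported as an invalid archive. The point of the fix is to give FlatBuffers programs their own branch.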

Validation

  • /Users/yashchhabria/projects/modelauditing/modelaudit/.venv/bin/ruff format modelaudit/ tests/
  • /Users/yashchhabria/projects/modelauditing/modelaudit/.venv/bin/ruff check --fix modelaudit/ tests/
  • /Users/yashchhabria/projects/modelauditing/modelaudit/.venv/bin/mypy modelaudit/
  • /Users/yashchhabria/projects/modelauditing/modelaudit/.venv/bin/pytest -n auto -m "not slow and not integration" --maxfail=1
  • 10-model Hugging Face rerun in the worktree: executorch flagged 10/10 before the fix and 0/10 after the fix

Summary by CodeRabbit

  • Bug Fixes

    • Eliminated false positives for valid ExecuTorch FlatBuffers binaries by recognizing ExecuTorch binaries and short-circuiting archive checks.
    • Improved file-type validation for .pte model files to accept genuine ExecuTorch binaries alongside archive formats.
  • Tests

    • Added tests covering ExecuTorch binary detection, acceptance, and rejection scenarios.
    • Enabled ExecuTorch scanner tests to run in the test matrix.
  • Documentation

    • Added a changelog entry noting the fix.

Recognize valid ExecuTorch FlatBuffers programs in .pte files,
prevent file-type validation noise for those binaries, and add
regression coverage for scanner and detection helpers.

Co-Authored-By: Codex <noreply@openai.com>
@coderabbitai bot (Contributor) commented Mar 16, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.


Walkthrough

Detects ExecuTorch FlatBuffers binaries by inspecting header bytes and adds an early-accept path in the ExecuTorch scanner. File-type detection and validation now recognize ExecuTorch binaries; tests and CHANGELOG updated accordingly.
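The early-accept path can be sketched like this. The sketch assumes an 8-byte header read and uses a hypothetical `SketchScanResult` in place of the scanner's real result object. The check name "ExecuTorch Binary Format Validation" is taken from this PR. The name of the fallback archive check is invented for illustration.

```python
import tempfile
import zipfile
from dataclasses import dataclass, field
from pathlib import Path

EXECUTORCH_FILE_IDENTIFIER = b"ET12"


@dataclass
class SketchScanResult:
    """Hypothetical stand-in for the scanner's result object."""

    checks: list[tuple[str, bool]] = field(default_factory=list)
    bytes_scanned: int = 0
    finished: bool = False


def scan_pte_sketch(path: Path) -> SketchScanResult:
    result = SketchScanResult()
    with path.open("rb") as fh:
        header = fh.read(8)  # root offset (4 bytes) + file identifier (4 bytes)

    if len(header) >= 8 and header[4:8] == EXECUTORCH_FILE_IDENTIFIER:
        # Early-accept path: a valid FlatBuffers program never reaches the ZIP checks.
        result.checks.append(("ExecuTorch Binary Format Validation", True))
        result.bytes_scanned = path.stat().st_size
        result.finished = True
        return result

    # Fallback: the archive path that previously handled every .pte file.
    result.checks.append(("Archive validation (hypothetical name)", zipfile.is_zipfile(path)))
    result.finished = True
    return result


with tempfile.TemporaryDirectory() as tmp:
    binary = Path(tmp) / "model.pte"
    binary.write_bytes((8).to_bytes(4, "little") + b"ET12" + bytes(8))
    accepted = scan_pte_sketch(binary)

    garbage = Path(tmp) / "garbage.pte"
    garbage.write_bytes(bytes(16))
    rejected = scan_pte_sketch(garbage)

print(accepted.checks, accepted.bytes_scanned)
print(rejected.checks)
```

Returning before the archive checks means a genuine FlatBuffers program never hits the failure path that used to produce the invalid-archive finding.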

Changes

| Cohort | File(s) | Summary |
| --- | --- | --- |
| Changelog | CHANGELOG.md | Added an Unreleased "Fixed" entry documenting elimination of false positives for ExecuTorch FlatBuffers binaries and .pte file-type validation. |
| ExecuTorch scanner | modelaudit/scanners/executorch_scanner.py | Imported _is_valid_executorch_binary; scanner now reads an 8-byte header, short-circuits to accept valid ExecuTorch binaries, records a passing "ExecuTorch Binary Format Validation", sets bytes_scanned to file size, finishes result, and returns early for valid binaries. |
| File detection utilities | modelaudit/utils/file/detection.py | Added _is_executorch_binary_signature and _is_valid_executorch_binary; detect_file_format_from_magic returns "executorch" for valid binaries; validate_file_type treats ExecuTorch binaries as valid even if not ZIP. |
| Tests & config | tests/conftest.py, tests/scanners/test_executorch_scanner.py, tests/utils/file/test_filetype.py | Enabled scanner tests in conftest; added helper to create minimal ExecuTorch binaries and tests for accepting versioned/valid headers and rejecting invalid signatures; extended filetype tests for detection and validation of ExecuTorch binaries. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 I nibble bytes, four to eight I peep,
ET12 winks — the scanner's calm and deep.
No zip to chase, no false alarm in sight,
I hop, I mark, the binary sleeps tight.
Hooray — the pte is properly right! 🥕

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 31.25%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'fix: accept valid ExecuTorch FlatBuffers binaries' directly and accurately summarizes the main change: modifying the scanner to recognize and accept valid ExecuTorch FlatBuffers binaries instead of treating them as invalid archives. |


@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@CHANGELOG.md`:
- Around line 84-87: Remove the duplicate "### Fixed" heading and merge its
following content ("eliminate false positives for valid ExecuTorch FlatBuffers
binaries and file-type validation on public `.pte` models") into the existing
"### Fixed" section above; specifically, delete the second "### Fixed" header
and append its bullet/text to the first "### Fixed" block so the changelog
follows the Keep a Changelog format and avoids the MD024 duplicate-heading lint
warning.

In `@modelaudit/scanners/executorch_scanner.py`:
- Around line 40-44: The function _is_executorch_binary currently checks only
header[4:6] against b"ET" which contradicts the comment ("bytes 4..7") and is
looser than the 4-byte signature used elsewhere; update _is_executorch_binary to
check header length >= 8 and compare header[4:8] to the full 4-byte signature
used in detection.py (refer to _is_executorch_binary_signature) — also update
the inline comment to say "bytes 4..7" or "bytes 4-7" to reflect the 4-byte
check.

In `@modelaudit/utils/file/detection.py`:
- Around line 192-195: The detection helpers are inconsistent:
_is_executorch_binary_signature(prefix) checks for a 4-byte identifier b"ET12"
while ExecuTorchScanner._is_executorch_binary(header) only checks header[4:6] ==
b"ET"; align them by using the same 4-byte check everywhere. Update
ExecuTorchScanner._is_executorch_binary to check header[4:8] == b"ET12" (or
conversely change _is_executorch_binary_signature to match the 2-byte check if
you prefer the looser rule) so both _is_executorch_binary_signature and
ExecuTorchScanner._is_executorch_binary use the identical byte-range and value
for detection.

In `@tests/scanners/test_executorch_scanner.py`:
- Around line 60-68: The test function
test_executorch_scanner_accepts_binary_program_header is missing a return type
hint; update its definition to include the standard test annotation "-> None"
(i.e., change the def signature for
test_executorch_scanner_accepts_binary_program_header to include the return type
hint) so it follows the project coding guidelines for tests.
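The inconsistency flagged in the two ExecuTorch comments above can be shown directly. A 2-byte rule accepts any identifier that merely starts with `ET`, so the scanner and the shared detector can disagree about the same file. The sketch below uses hypothetical names, not the PR's code. It routes the scanner through one shared 4-byte predicate, which is the alignment both comments request.

```python
EXECUTORCH_FILE_IDENTIFIER = b"ET12"


def loose_check(header: bytes) -> bool:
    # The 2-byte rule the review describes in the scanner: only header[4:6] == b"ET".
    return len(header) >= 6 and header[4:6] == b"ET"


def strict_check(header: bytes) -> bool:
    # The shared 4-byte rule: header[4:8] must equal the full identifier.
    return len(header) >= 8 and header[4:8] == EXECUTORCH_FILE_IDENTIFIER


class SketchScanner:
    """Hypothetical scanner that delegates to the shared predicate."""

    @staticmethod
    def is_executorch_binary(header: bytes) -> bool:
        return strict_check(header)


root_offset = (8).to_bytes(4, "little")
for ident in (b"ET12", b"ET99", b"ETxx"):
    header = root_offset + ident
    print(ident, loose_check(header), SketchScanner.is_executorch_binary(header))
# b'ET12' True True
# b'ET99' True False
# b'ETxx' True False
```

Once both call sites go through one function, a future change to the identifier only has to be made in one place.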

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 35ef307b-6091-42f6-b321-ac4cf2bdf625

📥 Commits

Reviewing files that changed from the base of the PR and between d9fe283 and da0125e.

📒 Files selected for processing (6)
  • CHANGELOG.md
  • modelaudit/scanners/executorch_scanner.py
  • modelaudit/utils/file/detection.py
  • tests/conftest.py
  • tests/scanners/test_executorch_scanner.py
  • tests/utils/file/test_filetype.py

Align ExecuTorch binary signature checks with the shared detector,
remove the duplicate changelog heading, and add the missing test
return annotation.

Co-Authored-By: Codex <noreply@openai.com>
@coderabbitai bot (Contributor) left a comment

♻️ Duplicate comments (1)
tests/scanners/test_executorch_scanner.py (1)

Line 60: 🧹 Nitpick | 🔵 Trivial

Add type hint for tmp_path parameter.

Per coding guidelines, test parameters should have type hints: tmp_path: Path.

Proposed fix
-def test_executorch_scanner_accepts_binary_program_header(tmp_path) -> None:
+def test_executorch_scanner_accepts_binary_program_header(tmp_path: Path) -> None:

As per coding guidelines: "Use type hints -> None on all test methods and tmp_path: Path / monkeypatch: pytest.MonkeyPatch on test parameters"
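Here is a sketch of a test module that follows that guideline. pytest injects `tmp_path` as a `pathlib.Path`, so the annotation also requires `from pathlib import Path`. The helper and test names are hypothetical and do not match the ones in `tests/scanners/test_executorch_scanner.py`. The stub file contains only a header and is not a loadable ExecuTorch program.

```python
import struct
from pathlib import Path

EXECUTORCH_FILE_IDENTIFIER = b"ET12"


def write_minimal_executorch_header(path: Path) -> Path:
    """Write a header-only stub: 4-byte little-endian root offset, then the identifier.

    Hypothetical helper; the stub is not a loadable ExecuTorch program.
    """
    path.write_bytes(struct.pack("<I", 8) + EXECUTORCH_FILE_IDENTIFIER + bytes(8))
    return path


def test_minimal_header_carries_identifier(tmp_path: Path) -> None:
    pte = write_minimal_executorch_header(tmp_path / "model.pte")
    assert pte.read_bytes()[4:8] == EXECUTORCH_FILE_IDENTIFIER
```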

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/scanners/test_executorch_scanner.py` at line 60: add a type hint for
the test parameter by changing the signature of
test_executorch_scanner_accepts_binary_program_header to accept tmp_path: Path
while preserving the existing -> None return annotation. Also ensure Path is
imported (from pathlib import Path) at the top of the test module if it isn't
already.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 956755eb-cc0c-464a-9114-557c55ecd183

📥 Commits

Reviewing files that changed from the base of the PR and between da0125e and 6c5ba7b.

📒 Files selected for processing (3)
  • CHANGELOG.md
  • modelaudit/scanners/executorch_scanner.py
  • tests/scanners/test_executorch_scanner.py

@yash2998chhabria (Contributor, Author)

Testing Metadata Update

  • Timestamp: 2026-03-16T12:11:49.545724-07:00
  • Validation method: one external worktree per changed scanner, public Hugging Face models relevant to each scanner, target 10 models per scanner where available
  • Parallelism: 5 workers, chosen after resource monitoring on this machine (14 CPU, 24 GiB RAM, tight disk headroom)
  • Broad changed-scanner sweep status: 39/39 scanners completed
  • Pending broad-sweep scanner: none
  • Note: the broad table below is the baseline run on main before PR-specific fixes; PR-focused reruns are listed separately

Branch Validation

  • ruff format modelaudit/ tests/
  • ruff check --fix modelaudit/ tests/
  • mypy modelaudit/
  • pytest -n auto -m "not slow and not integration" --maxfail=1
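
Broad Hugging Face Sweep (baseline on main)

| Scanner | Discovered | Flagged | Clean |
| --- | ---: | ---: | ---: |
| catboost | 10 | 0 | 10 |
| cntk | 0 | 0 | 0 |
| compressed | 0 | 0 | 0 |
| coreml | 10 | 0 | 10 |
| executorch | 10 | 10 | 0 |
| flax_msgpack | 10 | 0 | 10 |
| gguf | 10 | 0 | 10 |
| jax_checkpoint | 2 | 0 | 2 |
| joblib | 10 | 3 | 7 |
| keras_h5 | 10 | 1 | 9 |
| keras_zip | 10 | 7 | 3 |
| lightgbm | 10 | 0 | 10 |
| llamafile | 5 | 0 | 5 |
| manifest | 10 | 0 | 10 |
| mxnet | 0 | 0 | 0 |
| numpy | 10 | 1 | 9 |
| oci_layer | 0 | 0 | 0 |
| onnx | 10 | 0 | 10 |
| openvino | 10 | 0 | 10 |
| paddle | 3 | 0 | 3 |
| pickle | 10 | 10 | 0 |
| pmml | 0 | 0 | 0 |
| pytorch_binary | 10 | 0 | 10 |
| pytorch_zip | 10 | 4 | 6 |
| r_serialized | 1 | 1 | 0 |
| rknn | 10 | 1 | 9 |
| safetensors | 10 | 0 | 10 |
| skops | 4 | 0 | 4 |
| tar | 1 | 1 | 0 |
| tensorrt | 10 | 10 | 0 |
| text | 10 | 0 | 10 |
| tf_metagraph | 7 | 3 | 4 |
| tf_savedmodel | 10 | 9 | 1 |
| tflite | 10 | 1 | 9 |
| torch7 | 0 | 0 | 0 |
| torchserve_mar | 0 | 0 | 0 |
| weight_distribution | 10 | 0 | 10 |
| xgboost | 10 | 0 | 10 |
| zip | 8 | 0 | 8 |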

Focused Rerun For This PR

  • Scanner: executorch
  • Baseline on main: discovered=10, flagged=10, clean=0
  • Post-fix rerun in PR worktree: discovered=10, flagged=0, clean=10
  • Regression outcome: public ExecuTorch .pte binaries that previously failed as invalid archives now scan cleanly with no residual file-type mismatch noise

@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@modelaudit/utils/file/detection.py`:
- Around line 1011-1013: The comment above the ExecuTorch format check is out of
date. Update the comment that precedes the conditional checking ext_format ==
"executorch" (the block that returns header_format == "zip" or
_is_valid_executorch_binary(path)) to state that ExecuTorch files may be either
ZIP archives or valid ExecuTorch FlatBuffers binaries, not only ZIPs. Locate the
ext_format/header_format check and adjust the single-line comment to reflect
both accepted formats, referencing the function _is_valid_executorch_binary and
the variables ext_format, header_format, and path.
- Around line 475-476: The code calls _is_valid_executorch_binary(file_path) for
every file which triggers extra I/O; instead, first perform the cheap in-memory
signature/magic-byte check (the existing lightweight check used elsewhere in
this module—e.g. the function/constant that verifies the ExecuTorch magic bytes)
and only if that quick check passes call _is_valid_executorch_binary(file_path);
update the current if block that returns "executorch" to gate the expensive
validation behind the in-memory signature check, avoiding open/stat/seek on
files that don't match the signature.
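Both suggestions in this review can be sketched together. The sketch assumes a cheap in-memory prefix check and a costlier path-based validator that opens and stats the file. Every name here is a hypothetical stand-in for `_is_executorch_binary_signature`, `_is_valid_executorch_binary`, `detect_file_format_from_magic`, and `validate_file_type`. The root-offset bound is an assumed example of what "valid" could mean, not the PR's actual rule.

```python
from pathlib import Path

EXECUTORCH_FILE_IDENTIFIER = b"ET12"
ZIP_MAGIC = b"PK\x03\x04"
validator_calls = {"count": 0}  # tracks how often the path-based validator runs


def has_executorch_signature(prefix: bytes) -> bool:
    # Cheap, in-memory check on bytes already read from the file header.
    return len(prefix) >= 8 and prefix[4:8] == EXECUTORCH_FILE_IDENTIFIER


def is_valid_executorch_binary_sketch(path: Path) -> bool:
    # Costlier check that touches the filesystem (open + stat).
    validator_calls["count"] += 1
    with path.open("rb") as fh:
        prefix = fh.read(8)
    if not has_executorch_signature(prefix):
        return False
    # Assumed sanity check: the little-endian root offset must point inside the file.
    root_offset = int.from_bytes(prefix[:4], "little")
    return 8 <= root_offset < path.stat().st_size


def detect_format_sketch(prefix: bytes, path: Path) -> str:
    if prefix.startswith(ZIP_MAGIC):
        return "zip"
    # Gate the filesystem-touching validator behind the in-memory signature check.
    if has_executorch_signature(prefix) and is_valid_executorch_binary_sketch(path):
        return "executorch"
    return "unknown"


def validate_pte_sketch(path: Path, header_format: str) -> bool:
    ext_format = "executorch" if path.suffix.lower() == ".pte" else "other"
    if ext_format == "executorch":
        # ExecuTorch files may be either ZIP archives or valid ExecuTorch FlatBuffers binaries.
        return header_format == "zip" or is_valid_executorch_binary_sketch(path)
    return True


# A non-matching prefix short-circuits, so this path is never opened (it need not exist).
print(detect_format_sketch(bytes(8), Path("does-not-exist.pte")))  # unknown
print(validator_calls["count"])  # 0
```

Because `and` short-circuits, files without the signature never reach the open/stat call. That removes the extra I/O the review flagged, and the detection result is unchanged.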

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: d5a46e68-0236-4681-bf6d-138cd46f3ce8

📥 Commits

Reviewing files that changed from the base of the PR and between 74a8bd6 and 73cf4a5.

📒 Files selected for processing (3)
  • CHANGELOG.md
  • modelaudit/utils/file/detection.py
  • tests/utils/file/test_filetype.py
