Skip to content

fix(detector): detect input reduction fake output#253

Open
prasannakotyal wants to merge 6 commits intoSinatrasC:mainfrom
prasannakotyal:kg-blue-input-reduction-red-252
Open

fix(detector): detect input reduction fake output#253
prasannakotyal wants to merge 6 commits intoSinatrasC:mainfrom
prasannakotyal:kg-blue-input-reduction-red-252

Conversation

@prasannakotyal
Copy link
Copy Markdown

Summary

  • add INPUT_REDUCTION_OUTPUT for fake outputs that directly return torch.any/all(input)
  • scope it to entrypoint bodies that contain only imports/docstrings, one optional local alias, and the reduction return
  • register the pattern as a hard auto-filter rule under the fake-output family

Target

KernelGuard-Red-Submission: 252

Validation

  • UV_CACHE_DIR=/tmp/uvcache uv run python -m py_compile kernelguard.py
  • direct return torch.any(d) sample: classification=hacked, should_filter=true, pattern INPUT_REDUCTION_OUTPUT
  • UV_CACHE_DIR=/tmp/uvcache uv run python ../../kernelguard_bypasses/eval_blue_patch.py kernelguard.py clean fixtures remain should_filter=False
  • Nonlocal input-float detector passed official eval on PR fix(detector): detect nonlocal input float passthrough #252 with TP 20/20, FP 20/20, surgicalness 1.0

@prasannakotyal prasannakotyal temporarily deployed to kernelguard-api-control-plane May 5, 2026 14:42 — with GitHub Actions Inactive
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 5, 2026

KernelGuard Blue Evaluation

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant