Claude/charming hamilton 2n l pc by hyperpolymath · Pull Request #310 · hyperpolymath/hypatia

hyperpolymath · 2026-05-23T23:26:19Z

No description provided.

Recipe matcher rejected every scorecard-source finding (~310 ecosystem-wide), routing them to :control "no safe fix available" advisories.

Root cause: `lib/recipe_matcher.ex` filtered candidate recipes with `"*" in langs or language in langs`. Two failure modes:

1. 12 recipes declared `languages: ["any"]`: never matched, since `"any"` is not a sentinel the filter recognises and no repo has `"any"` as its primary language.
2. 8 scorecard / workflow-file recipes declared `languages: ["yaml"]`: never matched, since yaml is a workflow-file type, not any repo's primary language.

So `recipe-pin-dependencies`, `recipe-fix-workflow-permissions`, etc. were unreachable for SC013/SC018 findings, the exact rule families dominating the daily remediation sweep.

Fix:

- `langs_match?/2` private helper accepts `"*"` and `"any"` as synonymous language-agnostic sentinels.
- `effective_language_for/2` remaps the lookup language to `"yaml"` for patterns whose `source` is `"scorecard"` or whose `category` names a known workflow-file rule family (DependencyPinning, TokenPermissions, DangerousWorkflow, etc.). The repo's primary language is irrelevant for workflow-file findings.
- Applied to `best_recipe/2`, `category_match_recipe/2`, and `fuzzy_match_recipe/2`.

Tests pin all three invariants.

All 22 scorecard recipe `fix_script` references already exist on disk in `scripts/fix-scripts/`. The bug was purely in matcher reachability, not missing fix implementations.

Closes the dispatcher half of the "no security stuff being sorted" symptom. Remaining M7 work (PAT for cross-repo dispatch, push fixes to remotes) still needs operator action, but the manifests will now carry populated fix_script fields for scorecard findings.
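For readers who want the matcher change in concrete form, here is a minimal sketch of the two helpers described above. Only the names `langs_match?/2` and `effective_language_for/2`, the two sentinels, and the three listed categories come from the commit message. The module name, the argument order, the string-keyed pattern map, and the public visibility (the real helpers are private) are assumptions made for illustration, and the real category list is longer.

```elixir
defmodule RecipeMatcherSketch do
  # Hypothetical stand-in for lib/recipe_matcher.ex; not the actual implementation.

  # Both sentinels mean "this recipe applies to any language".
  @agnostic ["*", "any"]

  # Only the categories named in the commit message; the real list has more ("etc.").
  @workflow_categories ["DependencyPinning", "TokenPermissions", "DangerousWorkflow"]

  # True when the recipe is language-agnostic or lists the lookup language.
  def langs_match?(langs, language) do
    Enum.any?(langs, &(&1 in @agnostic)) or language in langs
  end

  # Workflow-file findings are matched as "yaml" regardless of the repo's primary language.
  def effective_language_for(pattern, repo_language) do
    if pattern["source"] == "scorecard" or pattern["category"] in @workflow_categories do
      "yaml"
    else
      repo_language
    end
  end
end
```

Under these assumptions, a scorecard finding in a Rust repo is looked up as `"yaml"`, so a recipe declaring `languages: ["yaml"]` becomes reachable, and a recipe declaring `languages: ["any"]` matches every repo.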

The baseline had drifted into pure historical risk: 71 accepted findings (31 critical, 40 high) generated before the #278 stale-escript fix and the wave of code_safety/security_errors cleanups landed.

A fresh scan against the current tree finds 35 findings, all medium-or-lower:

- 32 low (code_safety hot-path expects, ncl_docker_not_podman, workflow_audit missing-workflow, structural_drift, etc.)
- 3 medium (git_state transient + structural_drift)
- 0 critical, 0 high

Most old baseline entries are either:

- fixed in code (e.g. the believe_me at src/abi/RuleEngine.idr is now inline-suppressed with a documented `-- hypatia: allow` directive),
- migrated/refactored (e.g. lib/direct_github_pr.ex no longer exists),
- or were covered by the new total-Python-ban / scanner-soundness wave.

Net effect: every gate threshold of "fail on critical|high above baseline" now starts from an empty critical/high ledger. Net-new critical or high findings will stand out, which is what the baseline is supposed to enable.

Generated with the canonical Elixir escript pipeline against this tree (no rule changes, just a snapshot refresh). Severity threshold "low" so the snapshot reflects the full advisory surface, not just gates.
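The "fail on critical|high above baseline" gate can be pictured with the sketch below. It is illustrative only: the module and function names are hypothetical, and identifying a finding by its `type` and `file` pair is an assumption, since the commit message does not say how baseline entries are keyed.

```elixir
defmodule BaselineGateSketch do
  # Hypothetical gate; only the "critical|high above baseline" rule comes from the text.
  @gated ["critical", "high"]

  # Assumption: a finding is identified by its {type, file} pair.
  defp key(finding), do: {finding["type"], finding["file"]}

  # Findings that are gated by severity and absent from the accepted baseline.
  def net_new_gated(findings, baseline) do
    known = MapSet.new(baseline, &key/1)

    Enum.filter(findings, fn f ->
      f["severity"] in @gated and not MapSet.member?(known, key(f))
    end)
  end

  # :pass when nothing gated is new, otherwise the offending findings.
  def gate(findings, baseline) do
    case net_new_gated(findings, baseline) do
      [] -> :pass
      new -> {:fail, new}
    end
  end
end
```

With an empty critical/high ledger, any critical or high finding is net-new by construction, which is the behaviour the refresh is meant to restore.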

The HYPATIA_DISPATCH_PAT was provisioned with read access to secret-scanning alerts, code-scanning alerts, and Dependabot alerts. Only Dependabot was actually being consumed (lib/rules/dependabot_alerts.ex, DA001-DA004); the other two alert surfaces were granted but unused.

Adds two new rule modules mirroring the DependabotAlerts shape:

lib/rules/secret_scanning_alerts.ex (SSA001-SSA004)

  SSA001 -- Open leaked-secret alerts (always :critical; staleness surfaced in the reason for triage prioritisation).
  SSA002 -- Repo-level meta-finding when any open alert exists.
  SSA003 -- Stale open alerts past the 7-day rotation threshold.
  SSA004 -- Resolved alerts with no documented resolution vocabulary (anything outside revoked/used_in_tests/pattern_deleted/pattern_edited).

lib/rules/code_scanning_alerts.ex (CSA001-CSA004)

  CSA001 -- Open code-scanning alerts (CodeQL + third-party SARIF including Hypatia's own `hypatia` category). Severity mapped from `security_severity_level`/`severity` onto the canonical four-bucket scale.
  CSA002 -- Severity summary (any critical, ≥5 high, or ≥10 total).
  CSA003 -- Stale open alerts (3/7/30/90 days by severity bucket).
  CSA004 -- Dismissed without documented reason.

Wires both into `Hypatia.CLI`:

- registered in `@all_rule_modules` so the default scan includes them,
- scan blocks emit normalised findings alongside the rest,
- `format_module_name/1` gives them display names,
- usage strings updated to list the new --rules tokens.

Workflow comment in `.github/workflows/hypatia-scan.yml` updated to note that the existing `security-events: write` grant now covers all three alert APIs, not just Dependabot. No new permissions needed.

Tests pin token-absent behaviour and the non-GitHub-remote error path for each module's helpers.
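As a rough illustration of the CSA003 and SSA004 checks, the sketch below encodes the staleness thresholds and the resolution vocabulary. The four resolution values and the 3/7/30/90 figures come from the commit message. Pairing them with critical/high/medium/low in that order, treating "stale" as strictly older than the threshold, and all module and function names are assumptions.

```elixir
defmodule AlertTriageSketch do
  # Assumed pairing of the 3/7/30/90-day thresholds with the four severity buckets.
  @stale_after_days %{critical: 3, high: 7, medium: 30, low: 90}

  # Resolution vocabulary listed for SSA004.
  @documented_resolutions ~w(revoked used_in_tests pattern_deleted pattern_edited)

  # True when the alert has been open for more whole days than its bucket allows.
  def stale?(severity, created_at, now) do
    age_days = div(DateTime.diff(now, created_at, :second), 86_400)
    age_days > Map.fetch!(@stale_after_days, severity)
  end

  # False for nil or any resolution outside the documented vocabulary.
  def documented_resolution?(resolution), do: resolution in @documented_resolutions
end
```

For example, an alert opened four days before the scan would count as stale in the critical bucket but not in the high bucket under this mapping.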

PR #278 documented that the deployed escript had been silently dropping the Elixir/Erlang/Coq/Lean/Agda/Zig/F*/Ada code_safety pattern families for days because the binary was stale relative to the rule sources. "No findings" looks identical whether the code is clean or the rule is broken. That ambiguity is the soundness gap.

Closes it with the simplest possible mechanism: for every rule the scanner is supposed to detect, keep a known-bad sample on disk, and assert in CI that the rule fires on its sample at the expected severity. A rule that goes silent (regex drift, file pruning, packaging regression, module rename) breaks the build instead of silently weakening the estate's security posture.

Layout:

  test/soundness/
    manifest.json                    -- rule -> fixture -> severity
    fixtures/code_safety/
      believe_me.idr                 -- Idris2
      sorry.lean                     -- Lean
      admitted.v                     -- Coq
      unsafe_coerce.hs               -- Haskell
      obj_magic_ocaml.ml             -- OCaml
      getexn_on_external.res         -- ReScript
      unwrap_without_check.rs        -- Rust
      transmute.rs                   -- Rust unsafe
      elixir_system_shell.ex         -- THE PR#278 false-negative
      elixir_os_cmd.ex               -- Elixir os.cmd
      elixir_code_eval.ex            -- Elixir Code.eval
      shell_download_then_run.sh     -- curl|bash
      agda_postulate.agda            -- Agda
      zig_ptr_cast.zig               -- Zig
    README.adoc                      -- how to add a fixture
  test/soundness_test.exs            -- runner, @moduletag :soundness

Manifest entries cover all the language families PR #278 specifically called out as having been silently dropped. The runner is data-driven: adding a rule means dropping a fixture + a manifest entry, no test code change.

Hand-run against the current tree: 14/14 fixtures fire at the expected severity. The soundness gate is operational.

Out of scope (next iteration):

- End-to-end escript-build soundness (build the escript, run it against the fixture corpus -- exact PR #278 reproduction). The in-process test catches rule-definition regressions, but a packaging regression that strips a module would still slip through.
- Fixtures for non-code_safety families (workflow_audit, cicd_rules, structural_drift, scorecard, dependabot_alerts, ...).
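The data-driven idea behind the runner can be sketched as follows. This is not the contents of `test/soundness_test.exs`: the scanner here is a toy stand-in with two made-up regex rules and assumed severities, and the manifest is an in-memory list rather than `manifest.json`. It only shows the shape of the check, where each manifest entry must produce its rule at its expected severity.

```elixir
defmodule SoundnessSketch do
  # Toy rule table standing in for the real scanner; patterns and severities are assumptions.
  defp rules do
    %{
      "believe_me" => {~r/\bbelieve_me\b/, :critical},
      "sorry" => {~r/\bsorry\b/, :high}
    }
  end

  # Returns {rule, severity} for every rule whose pattern appears in the source text.
  def scan(source) do
    for {rule, {regex, severity}} <- rules(), Regex.match?(regex, source), do: {rule, severity}
  end

  # Returns the manifest entries whose rule did NOT fire at the expected severity.
  # An empty list means every fixture is still detected.
  def check(manifest) do
    Enum.reject(manifest, fn %{rule: rule, source: source, severity: severity} ->
      {rule, severity} in scan(source)
    end)
  end
end
```

A rule that goes silent, or fires at a different severity, shows up as a non-empty result, which a test can turn into a build failure.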

The OutcomeTracker.verify_fix/3 re-scan mechanism existed but its result was discarded on the success path: clean re-scans produced no marker, unclean re-scans were re-recorded as :false_positive without preserving the "this was verification, not an organic failure" distinction.

The outcomes log had no way to answer "what fraction of this recipe's 'successes' were actually verified clean by post-fix re-scan?" That's the closed-loop metric this commit adds.

lib/outcome_tracker.ex

  record_outcome/4,5
    Optional `metadata` map merges into the record (under the canonical fields so a caller can't overwrite recipe_id/repo/file/outcome/timestamp/bot by accident).

  record_and_verify/5
    Now persists the verification verdict on every branch:
      verified      -> success record with "verification" = "verified"
      still_present -> success record with "verification" = "still_present" PLUS a follow-up :false_positive record (caused_by = "post_fix_rescan")
      scan_failed   -> success record with "verification" = "scan_failed"
      verify: false -> outcome record with "verification" = "unverified"
    The distinction between "scan_failed" and "unverified" matters: a recipe is not penalised for being run in environments without panic-attack.

  verification_rate/2
    For a recipe_id, returns counts {verified, still_present, scan_failed, unverified} and a rate = verified / (verified + still_present). scan_failed and unverified records are excluded from the denominator so a low-verification-attempt environment doesn't artificially deflate the rate. Returns :insufficient_data below min_attempts.

  recipe_health/1
    Aggregates across every recipe with recorded outcomes. Returns a list of maps with dispatches / successes / failures / FPs / success_rate / verification breakdown / status, sorted so the most actionable rows (quarantine_candidate, degraded) surface first. Configurable thresholds.

lib/mix/tasks/hypatia.recipe_health.ex

  mix hypatia.recipe_health [--format json] [--only-actionable]
  Prints the report in a human-readable table or JSON.

test/recipe_health_test.exs

  Pins the rate calculation (verified/still_present ratio, scan_failed + unverified excluded), the insufficient_data threshold, and the healthy/degraded/quarantine_candidate status mapping.

Hand-run against the current outcomes log: 4 recipes found, all flagged :insufficient_data because the historical log was written before the verification marker existed. From the next `record_and_verify`-enabled dispatch onwards, recipes will accumulate verification data and migrate to :healthy / :degraded / :quarantine_candidate based on real evidence.
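The rate calculation described for `verification_rate/2` can be sketched like this. The formula and the four verdict strings come from the commit message. Everything else is an assumption: the sketch takes a list of already-loaded records rather than a recipe_id, the default `min_attempts` of 5 is invented, and "attempts" is taken to mean verified plus still_present records.

```elixir
defmodule VerificationRateSketch do
  # records: maps carrying a "verification" verdict, as written by record_and_verify/5.
  # min_attempts default is hypothetical; the real threshold is configurable.
  def verification_rate(records, min_attempts \\ 5) do
    counts = Enum.frequencies_by(records, & &1["verification"])
    verified = Map.get(counts, "verified", 0)
    still_present = Map.get(counts, "still_present", 0)

    # scan_failed and unverified records never enter the denominator.
    attempts = verified + still_present

    if attempts < min_attempts do
      :insufficient_data
    else
      %{
        verified: verified,
        still_present: still_present,
        scan_failed: Map.get(counts, "scan_failed", 0),
        unverified: Map.get(counts, "unverified", 0),
        rate: verified / attempts
      }
    end
  end
end
```

Because the denominator ignores scan_failed and unverified records, a recipe that mostly runs where re-scanning is unavailable reports :insufficient_data instead of a misleadingly low rate.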



github-actions · 2026-05-23T23:27:12Z

🔍 Hypatia Security Scan

Findings: 2 issues detected

Severity	Count
🔴 Critical	0
🟠 High	1
🟡 Medium	1

View findings

[
  {
    "reason": "Js.Dict deprecated -- use Dict (2 occurrences)",
    "type": "deprecated_api",
    "file": "/home/runner/work/hypatia/hypatia/test/soundness/fixtures/code_safety/getexn_on_external.res",
    "action": "module_replace",
    "rule_module": "migration_rules",
    "severity": "high"
  },
  {
    "reason": "Repository has 2 non-main remote branch(es). Policy: single main branch only.",
    "type": "GS007",
    "file": ".",
    "action": "delete_remote_branches",
    "rule_module": "git_state",
    "severity": "medium"
  }
]

Powered by Hypatia Neurosymbolic CI/CD Intelligence

claude and others added 6 commits May 23, 2026 18:57

Merge branch 'main' into claude/charming-hamilton-2nLPc

fa807c8

github-advanced-security AI found potential problems May 23, 2026

View reviewed changes

Comment thread test/soundness/fixtures/code_safety/getexn_on_external.res

@@ -0,0 +1,5 @@

// SPDX-License-Identifier: MPL-2.0

hyperpolymath merged commit f13d853 into main May 23, 2026
28 of 30 checks passed

hyperpolymath deleted the claude/charming-hamilton-2nLPc branch May 23, 2026 23:27

hyperpolymath merged 6 commits into main from claude/charming-hamilton-2nLPc

3 participants
