Claude/charming hamilton 2n l pc #310

hyperpolymath merged 6 commits into main from claude/charming-hamilton-2nLPc on May 23, 2026.

@hyperpolymath
Owner

No description provided.

claude and others added 6 commits May 23, 2026 18:57
Recipe matcher rejected every scorecard-source finding (~310 ecosystem-
wide), routing them to :control "no safe fix available" advisories.

Root cause: `lib/recipe_matcher.ex` filtered candidate recipes with
`"*" in langs or language in langs`. Two failure modes:

  1. 12 recipes declared `languages: ["any"]` — never matched, since
     `"any"` is not a sentinel the filter recognises and no repo has
     `"any"` as its primary language.
  2. 8 scorecard / workflow-file recipes declared `languages: ["yaml"]`
     — never matched, since yaml is a workflow-file type, not any
     repo's primary language. So `recipe-pin-dependencies`,
     `recipe-fix-workflow-permissions`, etc. were unreachable for SC013/
     SC018 findings — the exact rule families dominating the daily
     remediation sweep.

Fix:

  - `langs_match?/2` private helper accepts `"*"` and `"any"` as
    synonymous language-agnostic sentinels.
  - `effective_language_for/2` remaps the lookup language to `"yaml"`
    for patterns whose `source` is `"scorecard"` or whose `category`
    names a known workflow-file rule family (DependencyPinning,
    TokenPermissions, DangerousWorkflow, etc.). The repo's primary
    language is irrelevant for workflow-file findings.
  - Applied to `best_recipe/2`, `category_match_recipe/2`, and
    `fuzzy_match_recipe/2`.
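
The two helpers described above could look roughly like the sketch below. It is illustrative only: the real `lib/recipe_matcher.ex` is not reproduced in this PR text, so the module name, the argument order, the string-keyed maps, the `candidates/3` wrapper, and the short category list (the commit says "etc.") are all assumptions. The helpers are public here only so the sketch can be called directly; the commit describes `langs_match?/2` as private.

```elixir
defmodule RecipeMatcherSketch do
  # Hypothetical stand-in for the matcher; not the project's actual code.

  # Both sentinels mean "this recipe applies to any language".
  @agnostic ["*", "any"]

  # Assumed subset of the workflow-file rule families named in the commit.
  @workflow_categories ["DependencyPinning", "TokenPermissions", "DangerousWorkflow"]

  # True when the recipe is language-agnostic or lists the lookup language.
  def langs_match?(langs, language) do
    Enum.any?(langs, &(&1 in @agnostic)) or language in langs
  end

  # Workflow-file findings are looked up as "yaml" regardless of the
  # repo's primary language; everything else keeps the repo language.
  def effective_language_for(pattern, repo_language) do
    if pattern["source"] == "scorecard" or pattern["category"] in @workflow_categories do
      "yaml"
    else
      repo_language
    end
  end

  # Hypothetical wrapper showing how the two helpers combine into a filter.
  def candidates(recipes, pattern, repo_language) do
    language = effective_language_for(pattern, repo_language)

    Enum.filter(recipes, fn recipe ->
      langs_match?(Map.get(recipe, "languages", []), language)
    end)
  end
end
```

Under these assumptions, a scorecard finding in an Elixir repo would match recipes declaring `["yaml"]` or `["any"]`, and skip a recipe declaring only `["rust"]`.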

Tests pin all three invariants. All 22 scorecard recipe `fix_script`
references already exist on disk in `scripts/fix-scripts/` — the bug
was purely in matcher reachability, not missing fix implementations.

Closes the dispatcher half of the "no security stuff being sorted"
symptom. Remaining M7 work (PAT for cross-repo dispatch, push fixes
to remotes) still needs operator action, but the manifests will now
carry populated fix_script fields for scorecard findings.

The baseline had drifted into pure historical risk: 71 accepted findings
(31 critical, 40 high) generated before the #278 stale-escript fix and
the wave of code_safety/security_errors cleanups landed.

A fresh scan against the current tree finds 35 findings, all
medium-or-lower:
  - 32 low (code_safety hot-path expects, ncl_docker_not_podman,
    workflow_audit missing-workflow, structural_drift, etc.)
  - 3 medium (git_state transient + structural_drift)
  - 0 critical, 0 high

Most old baseline entries are either:
  - fixed in code (e.g. the believe_me at src/abi/RuleEngine.idr is now
    inline-suppressed with a documented `-- hypatia: allow` directive),
  - migrated/refactored (e.g. lib/direct_github_pr.ex no longer exists),
  - or covered by the new total-Python-ban / scanner-soundness wave.

Net effect: every gate threshold of "fail on critical|high above
baseline" now starts from an empty critical/high ledger — net-new
critical or high findings will stand out, which is what the baseline is
supposed to enable.
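
As a rough illustration of what "fail on critical|high above baseline" means, the sketch below filters the current findings down to gated severities that are not already in the baseline. The module name is hypothetical, and the identity key (rule `type` plus `file`) is an assumption; the actual baseline format and matching rule are not shown in this PR.

```elixir
defmodule BaselineGateSketch do
  # Hypothetical gate; severities that block the build.
  @gated ["critical", "high"]

  # Assumed identity of a finding: rule type plus file path.
  defp key(finding), do: {finding["type"], finding["file"]}

  # Returns the gated findings that are absent from the accepted baseline.
  def net_new_gated(current, baseline) do
    accepted = MapSet.new(baseline, &key/1)

    Enum.filter(current, fn finding ->
      finding["severity"] in @gated and not MapSet.member?(accepted, key(finding))
    end)
  end
end
```

With an empty critical/high ledger, every critical or high finding in a new scan is returned by such a filter, which is the behaviour the refreshed snapshot is meant to restore.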

Generated with the canonical Elixir escript pipeline against this tree
(no rule changes, just a snapshot refresh). Severity threshold "low" so
the snapshot reflects the full advisory surface, not just gates.

The HYPATIA_DISPATCH_PAT was provisioned with read access to
secret-scanning alerts, code-scanning alerts, and Dependabot alerts.
Only Dependabot was actually being consumed (lib/rules/dependabot_alerts.ex,
DA001-DA004) — the other two alert surfaces were granted but unused.

Adds two new rule modules mirroring the DependabotAlerts shape:

  lib/rules/secret_scanning_alerts.ex (SSA001-SSA004)
    SSA001 — Open leaked-secret alerts (always :critical; staleness
             surfaced in the reason for triage prioritisation).
    SSA002 — Repo-level meta-finding when any open alert exists.
    SSA003 — Stale open alerts past the 7-day rotation threshold.
    SSA004 — Resolved alerts with no documented resolution vocabulary
             (anything outside revoked/used_in_tests/pattern_deleted/
             pattern_edited).

  lib/rules/code_scanning_alerts.ex (CSA001-CSA004)
    CSA001 — Open code-scanning alerts (CodeQL + third-party SARIF
             including Hypatia's own `hypatia` category). Severity
             mapped from `security_severity_level`/`severity` onto the
             canonical four-bucket scale.
    CSA002 — Severity summary (any critical, ≥5 high, or ≥10 total).
    CSA003 — Stale open alerts (3/7/30/90 days by severity bucket).
    CSA004 — Dismissed without documented reason.
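
The CSA001 severity mapping and the CSA002 summary thresholds might be sketched as follows. The thresholds (any critical, at least 5 high, at least 10 total) come from the description above; everything else is an assumption, including the module name, the string-keyed input maps, and in particular the fallback mapping from `severity` values (`error`, `warning`, anything else) onto buckets, which the commit does not spell out.

```elixir
defmodule CodeScanningSketch do
  # Hypothetical sketch; not the project's lib/rules/code_scanning_alerts.ex.
  @buckets ~w(critical high medium low)

  # Prefer security_severity_level when it names one of the four buckets.
  def bucket(%{"security_severity_level" => level}) when level in @buckets,
    do: String.to_atom(level)

  # Assumed fallback mapping from the plain rule severity.
  def bucket(%{"severity" => "error"}), do: :high
  def bucket(%{"severity" => "warning"}), do: :medium
  def bucket(_rule), do: :low

  # CSA002 thresholds: any critical, at least 5 high, or at least 10 total.
  def summary_fires?(rules) do
    counts = Enum.frequencies_by(rules, &bucket/1)

    Map.get(counts, :critical, 0) > 0 or
      Map.get(counts, :high, 0) >= 5 or
      length(rules) >= 10
  end
end
```

Keeping the bucket mapping separate from the threshold check means the same four-bucket scale could also drive the CSA003 staleness windows.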

Wires both into `Hypatia.CLI`:
  - registered in `@all_rule_modules` so the default scan includes them,
  - scan blocks emit normalised findings alongside the rest,
  - `format_module_name/1` gives them display names,
  - usage strings updated to list the new --rules tokens.

Workflow comment in `.github/workflows/hypatia-scan.yml` updated to
note that the existing `security-events: write` grant now covers all
three alert APIs, not just Dependabot. No new permissions needed.

Tests pin token-absent behaviour and the non-GitHub-remote error path
for each module's helpers.

PR #278 documented that the deployed escript had been silently dropping
the Elixir/Erlang/Coq/Lean/Agda/Zig/F*/Ada code_safety pattern families
for days because the binary was stale relative to the rule sources.
"No findings" looks identical whether the code is clean or the rule is
broken — that ambiguity is the soundness gap.

Closes it with the simplest possible mechanism: for every rule the
scanner is supposed to detect, keep a known-bad sample on disk, and
assert in CI that the rule fires on its sample at the expected severity.
A rule that goes silent (regex drift, file pruning, packaging
regression, module rename) breaks the build instead of silently
weakening the estate's security posture.

Layout:
  test/soundness/
    manifest.json                         -- rule -> fixture -> severity
    fixtures/code_safety/
      believe_me.idr                      -- Idris2
      sorry.lean                          -- Lean
      admitted.v                          -- Coq
      unsafe_coerce.hs                    -- Haskell
      obj_magic_ocaml.ml                  -- OCaml
      getexn_on_external.res              -- ReScript
      unwrap_without_check.rs             -- Rust
      transmute.rs                        -- Rust unsafe
      elixir_system_shell.ex              -- THE PR#278 false-negative
      elixir_os_cmd.ex                    -- Elixir :os.cmd
      elixir_code_eval.ex                 -- Elixir Code.eval
      shell_download_then_run.sh          -- curl|bash
      agda_postulate.agda                 -- Agda
      zig_ptr_cast.zig                    -- Zig
    README.adoc                           -- how to add a fixture

  test/soundness_test.exs                 -- runner, @moduletag :soundness

Manifest entries cover all the language families PR #278 specifically
called out as having been silently dropped. The runner is data-driven:
adding a rule means dropping a fixture + a manifest entry, no test code
change.
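
The data-driven idea can be sketched as a single function over manifest entries. This is a minimal illustration, not the actual `test/soundness_test.exs`: the module name, the entry shape (`rule`, `fixture`, `severity` keys), and the injected `scan_fun` are assumptions, and the real runner presumably decodes `manifest.json` and calls the scanner itself.

```elixir
defmodule SoundnessSketch do
  # Hypothetical helper. `entries` is a list of maps with :rule, :fixture
  # and :severity; `scan_fun` takes a fixture path and returns findings.
  # Returns the entries whose rule did NOT fire at the expected severity.
  def silent_rules(entries, scan_fun) do
    Enum.reject(entries, fn entry ->
      entry.fixture
      |> scan_fun.()
      |> Enum.any?(&(&1.rule == entry.rule and &1.severity == entry.severity))
    end)
  end
end
```

A test built on this would assert that the returned list is empty, so any rule that goes silent shows up by name in the failure output.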

Hand-run against the current tree: 14/14 fixtures fire at the expected
severity. The soundness gate is operational.

Out of scope (next iteration):
  - End-to-end escript-build soundness (build the escript, run it
    against the fixture corpus -- exact PR #278 reproduction). The
    in-process test catches rule-definition regressions, but a
    packaging regression that strips a module would still slip
    through.
  - Fixtures for non-code_safety families (workflow_audit, cicd_rules,
    structural_drift, scorecard, dependabot_alerts, ...).

The OutcomeTracker.verify_fix/3 re-scan mechanism existed but its result
was discarded on the success path: clean re-scans produced no marker,
unclean re-scans were re-recorded as :false_positive without preserving
the "this was verification, not an organic failure" distinction. The
outcomes log had no way to answer "what fraction of this recipe's
'successes' were actually verified clean by post-fix re-scan?"

That's the closed-loop metric this commit adds.

  lib/outcome_tracker.ex
    record_outcome/4,5
      Optional `metadata` map is merged into the record beneath the
      canonical fields, so a caller can't accidentally overwrite
      recipe_id/repo/file/outcome/timestamp/bot.
    record_and_verify/5
      Now persists the verification verdict on every branch:
        verified       -> success record with "verification" = "verified"
        still_present  -> success record with "verification" = "still_present"
                          PLUS a follow-up :false_positive record
                          (caused_by = "post_fix_rescan")
        scan_failed    -> success record with "verification" = "scan_failed"
        verify: false  -> outcome record with "verification" = "unverified"
      The distinction between "scan_failed" and "unverified" matters: a
      recipe is not penalised for being run in environments without
      panic-attack.
    verification_rate/2
      For a recipe_id, returns counts {verified, still_present,
      scan_failed, unverified} and a rate = verified / (verified +
      still_present). scan_failed and unverified records are excluded
      from the denominator so a low-verification-attempt environment
      doesn't artificially deflate the rate. Returns :insufficient_data
      below min_attempts.
    recipe_health/1
      Aggregates across every recipe with recorded outcomes. Returns a
      list of maps with dispatches / successes / failures / FPs /
      success_rate / verification breakdown / status, sorted so the
      most actionable rows (quarantine_candidate, degraded) surface
      first. Configurable thresholds.
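
The rate calculation described for `verification_rate/2` can be sketched as below. This is an illustration under stated assumptions rather than the actual `lib/outcome_tracker.ex`: the sketch takes a list of already-loaded records instead of a recipe_id, the default `min_attempts` of 5 is invented, and it assumes "attempts" means verified plus still_present.

```elixir
defmodule VerificationRateSketch do
  # Hypothetical sketch; the four verification markers named in the commit.
  @markers ~w(verified still_present scan_failed unverified)

  # `records` are maps that may carry a "verification" marker.
  def verification_rate(records, min_attempts \\ 5) do
    counts =
      records
      |> Enum.map(&Map.get(&1, "verification"))
      |> Enum.filter(&(&1 in @markers))
      |> Enum.frequencies()

    verified = Map.get(counts, "verified", 0)
    still_present = Map.get(counts, "still_present", 0)

    # scan_failed and unverified stay out of the denominator.
    attempts = verified + still_present

    if attempts == 0 or attempts < min_attempts do
      :insufficient_data
    else
      %{
        verified: verified,
        still_present: still_present,
        scan_failed: Map.get(counts, "scan_failed", 0),
        unverified: Map.get(counts, "unverified", 0),
        rate: verified / attempts
      }
    end
  end
end
```

In this sketch, records without a marker (such as the historical log entries, or the follow-up :false_positive record) are ignored, which is why an older log would report :insufficient_data.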

  lib/mix/tasks/hypatia.recipe_health.ex
    mix hypatia.recipe_health [--format json] [--only-actionable]
    Prints the report in a human-readable table or JSON.

  test/recipe_health_test.exs
    Pins the rate calculation (verified/still_present ratio, scan_failed
    + unverified excluded), the insufficient_data threshold, and the
    healthy/degraded/quarantine_candidate status mapping.

Hand-run against the current outcomes log: 4 recipes found, all flagged
:insufficient_data because the historical log was written before the
verification marker existed. From the next `record_and_verify`-enabled
dispatch onwards, recipes will accumulate verification data and migrate
to :healthy / :degraded / :quarantine_candidate based on real evidence.
@github-actions

🔍 Hypatia Security Scan

Findings: 2 issues detected

Severity      Count
🔴 Critical   0
🟠 High       1
🟡 Medium     1
View findings
[
  {
    "reason": "Js.Dict deprecated -- use Dict (2 occurrences)",
    "type": "deprecated_api",
    "file": "/home/runner/work/hypatia/hypatia/test/soundness/fixtures/code_safety/getexn_on_external.res",
    "action": "module_replace",
    "rule_module": "migration_rules",
    "severity": "high"
  },
  {
    "reason": "Repository has 2 non-main remote branch(es). Policy: single main branch only.",
    "type": "GS007",
    "file": ".",
    "action": "delete_remote_branches",
    "rule_module": "git_state",
    "severity": "medium"
  }
]

Powered by Hypatia Neurosymbolic CI/CD Intelligence

@hyperpolymath hyperpolymath merged commit f13d853 into main May 23, 2026
28 of 30 checks passed
@hyperpolymath hyperpolymath deleted the claude/charming-hamilton-2nLPc branch May 23, 2026 23:27