feat: CVE-usage check — second-stage usage filter on top of pip_audit #3

@misnaej

Problem

Forge ships pip_audit as a non-blocking pre-commit step. It flags vulnerable packages: every CVE in a dependency's advisory database is reported, whether or not the project actually uses the vulnerable code path. On a large dependency tree this produces chronic noise. A CVE in a function the project never calls is reported on every run, which trains contributors to ignore the step and means the step never produces a clean signal.

The missing layer is a second-stage filter: given the CVEs that pip-audit currently reports, grep the codebase for actual usage of the vulnerable functions. Only warn when the vulnerable code path is genuinely present. When the package is upgraded and the CVE drops off the pip-audit report, the warning disappears automatically — no manual list maintenance.

How it works (reference design, ~360 LOC)

pip-audit --skip-editable --format=json
        │
        ▼
  set of currently-active CVE IDs  ──────┐
                                          │ intersect
  CVE → vulnerable-usage-patterns map  ───┤  (consumer-maintained config)
                                          ▼
        for each active CVE that has a pattern entry:
            grep src/ projects/ scripts/ test/ for the patterns
                │
                ▼
        report: CVE, package, file:line of actual usage,
                risk note, mitigation
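
The first stage of the diagram above, turning the pip-audit report into a set of active IDs, could be sketched roughly as follows. This assumes the JSON shape emitted by recent pip-audit releases (a top-level "dependencies" list whose entries carry "vulns" with "id" and optional "aliases"); the function name `active_vuln_ids` and the sample package names and IDs are hypothetical placeholders, not part of forge.

```python
import json


def active_vuln_ids(report_json: str) -> set[str]:
    """Collect every advisory ID and alias from a pip-audit JSON report."""
    report = json.loads(report_json)
    # Assumption: recent pip-audit wraps results in {"dependencies": [...]},
    # while older releases emitted the bare list. Accept both shapes.
    deps = report.get("dependencies", []) if isinstance(report, dict) else report
    ids: set[str] = set()
    for dep in deps:
        # Skipped dependencies may have no "vulns" key at all.
        for vuln in dep.get("vulns", []):
            ids.add(vuln["id"])
            # CVE IDs often appear only as aliases of PYSEC/GHSA IDs.
            ids.update(vuln.get("aliases", []))
    return ids


# Placeholder report: package names and IDs are made up for illustration.
SAMPLE = json.dumps({
    "dependencies": [
        {"name": "examplepkg", "version": "1.0",
         "vulns": [{"id": "PYSEC-0000-0", "aliases": ["CVE-0000-00001"],
                    "fix_versions": ["1.1"]}]},
        {"name": "cleanpkg", "version": "2.0", "vulns": []},
    ]
})
print(sorted(active_vuln_ids(SAMPLE)))
```

Including aliases matters if the pattern map is keyed by CVE ID, since the primary ID in the report may be a PYSEC or GHSA identifier.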

Key properties:

  • Self-maintaining. The pattern map is keyed by CVE ID. A pattern is only checked if pip-audit is currently reporting that CVE. Upgrade the package → CVE leaves the report → that pattern is skipped → warning gone. No stale entries to prune.
  • Usage-scoped. A CVE in package.foo() with zero package.foo calls in the codebase produces no warning. Only matched usage is surfaced.
  • Risk + mitigation per finding. Each pattern entry carries a human-readable risk ("only risky if parsing untrusted XML") and mitigation ("ensure XML sources are trusted") so the contributor gets actionable context, not just a line number.
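
The self-maintaining property comes down to a plain intersection between the map keys and the active IDs. A minimal sketch, with a hypothetical helper name `patterns_to_check` and placeholder CVE IDs and package names:

```python
def patterns_to_check(active_ids: set[str], pattern_map: dict[str, dict]) -> dict[str, dict]:
    """Keep only the map entries whose CVE is currently reported."""
    return {cve: entry for cve, entry in pattern_map.items() if cve in active_ids}


# Placeholder map: IDs, packages, and patterns are invented for illustration.
pattern_map = {
    "CVE-0000-00001": {"package": "examplepkg", "patterns": [r"examplepkg\.parse\("]},
    "CVE-0000-00002": {"package": "otherpkg", "patterns": [r"otherpkg\.load\("]},
}

# Both CVEs active: both entries are checked.
before_upgrade = patterns_to_check({"CVE-0000-00001", "CVE-0000-00002"}, pattern_map)
# After upgrading examplepkg its CVE leaves the report, so its entry is skipped.
after_upgrade = patterns_to_check({"CVE-0000-00002"}, pattern_map)
print(sorted(before_upgrade), sorted(after_upgrade))
```

The stale entry can stay in the map indefinitely; it simply never runs until the CVE reappears in a report.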

Reference pattern-map shape (consumer-maintained)

CVE_PATTERNS: dict[str, dict] = {
    "CVE-XXXX-NNNNN": {
        "package": "<package>",
        "description": "<one-line CVE summary>",
        "patterns": [
            r"from\s+<pkg>\s+import\s+<vulnerable_symbol>",
            r"<pkg>\.<vulnerable_call>\(",
        ],
        "risk": "<when this is actually exploitable>",
        "mitigation": "<how to neutralize without upgrading, if possible>",
    },
}
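
Because the map is hand-written, the engine would probably want to validate it and compile the regexes once at load time. The sketch below is one way to do that; `compile_pattern_map` and `REQUIRED_KEYS` are hypothetical names, and treating all five keys as mandatory is an assumption rather than something the design above specifies.

```python
import re

# Assumption: every key from the reference shape is required.
REQUIRED_KEYS = {"package", "description", "patterns", "risk", "mitigation"}


def compile_pattern_map(raw: dict[str, dict]) -> dict[str, list[re.Pattern[str]]]:
    """Validate each entry and pre-compile its regexes, keyed by CVE ID."""
    compiled: dict[str, list[re.Pattern[str]]] = {}
    for cve_id, entry in raw.items():
        missing = REQUIRED_KEYS - entry.keys()
        if missing:
            raise ValueError(f"{cve_id}: missing keys {sorted(missing)}")
        try:
            compiled[cve_id] = [re.compile(p) for p in entry["patterns"]]
        except re.error as exc:
            # Surface the offending CVE so the consumer can fix the entry.
            raise ValueError(f"{cve_id}: invalid regex: {exc}") from exc
    return compiled
```

Failing early with the CVE ID in the message keeps a typo in one regex from silently disabling a check.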

The map is consumer config, not shipped by forge — each repo's vulnerable surface differs. Forge ships the engine (run pip-audit, intersect, grep, report) and reads the map from a known location.

Where the config lives

Two options:

  1. [tool.forge.cve_usage] in pyproject.toml — inline TOML table of CVE → patterns. Keeps everything in one config file; awkward for multi-line regex.
  2. A dedicated cve_usage_patterns.py (or .toml) at repo root — discovered by the CLI. Cleaner for regex-heavy entries.

Recommend option 2 (dedicated file) given the regex density, with the CLI falling back to "no patterns configured → skip cleanly" when absent.

Components to ship

| Component | Purpose | Approx LOC |
| --- | --- | --- |
| forge.verify_cve_usage (verify-forge-cve-usage) | Run pip-audit, intersect with the pattern map, grep, write code_health/cve_usage.log | ~360 |
| step_cve_usage in forge.precommit | Non-blocking step (mirrors pip_audit), opt-in via presence of a pattern config | small |
| FOUNDATION note | §2 or §4: document the two-stage model (pip_audit flags packages; cve-usage flags usage) | small |

Imports / dependencies

  • Reuses the existing pip-audit invocation forge already does in step_pip_audit. Consider sharing a helper rather than shelling out to pip-audit twice.
  • Pure stdlib for the grep + JSON parse. No new third-party dep.

How consumers use it

Pre-commit (non-blocking, opt-in)

# presence of cve_usage_patterns.* at repo root enables the step

The step runs after pip_audit and writes code_health/cve_usage.log. In normal mode it is advisory and never blocks; --strict (PR finalization) escalates findings to blocking, the same as pip_audit.
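
The advisory-versus-strict behavior reduces to a small decision on the exit code. A sketch, with a hypothetical function name and the assumption that exit code 1 is what blocks a commit:

```python
def step_exit_code(finding_count: int, strict: bool) -> int:
    """Map findings to an exit code: advisory by default, blocking in strict."""
    if strict and finding_count > 0:
        return 1  # strict mode: any finding blocks
    return 0      # normal mode: findings are logged but never block


print(step_exit_code(3, strict=False), step_exit_code(3, strict=True))
```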

CI

- name: Security Scan - CVE Usage Check
  run: |
    forge-precommit  # cve_usage step runs in sequence
    # or standalone:
    verify-forge-cve-usage 2>&1 | tee ./code_health/cve_usage.log

Behavioral guarantees

  • No false positives from inactive CVEs. A pattern whose CVE is not in the current pip-audit report is never checked.
  • Comment-aware. Lines starting with # are skipped (a pattern string inside a comment is not real usage).
  • Self-excluding. The pattern-config file itself is excluded from the grep (it contains the patterns verbatim).
  • Graceful when pip-audit unavailable. If pip-audit is missing or times out, the step logs a skip and exits 0 — never hard-fails on missing tooling (FOUNDATION §15 posture).
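
The comment-aware and self-excluding guarantees could be implemented with a stdlib scanner roughly like this. The names `scan_file` and `scan_tree` are hypothetical. Two choices here are assumptions rather than part of the design above: indented comment lines are skipped as well as those at column zero, and only *.py files are scanned.

```python
import re
from pathlib import Path


def scan_file(path: Path, patterns: list[re.Pattern[str]]) -> list[tuple[int, str]]:
    """Return (line number, stripped line) for each non-comment match."""
    hits: list[tuple[int, str]] = []
    text = path.read_text(encoding="utf-8", errors="replace")
    for lineno, line in enumerate(text.splitlines(), start=1):
        if line.lstrip().startswith("#"):
            continue  # a pattern inside a comment is not real usage
        if any(p.search(line) for p in patterns):
            hits.append((lineno, line.strip()))
    return hits


def scan_tree(roots: list[Path], patterns: list[re.Pattern[str]],
              exclude: list[Path]) -> dict[Path, list[tuple[int, str]]]:
    """Scan every .py file under the roots, skipping excluded files."""
    excluded = {p.resolve() for p in exclude}
    findings: dict[Path, list[tuple[int, str]]] = {}
    for root in roots:
        if not root.is_dir():
            continue  # e.g. a repo without a scripts/ directory
        for path in sorted(root.rglob("*.py")):
            if path.resolve() in excluded:
                continue  # the pattern config contains the patterns verbatim
            hits = scan_file(path, patterns)
            if hits:
                findings[path] = hits
    return findings
```

A line-based check like this does not catch patterns that sit inside docstrings or trailing comments; handling those would need tokenization, which goes beyond the plain grep the design describes.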

Acceptance criteria

  • verify-forge-cve-usage CLI registered under [project.scripts].
  • Reads a consumer pattern config; skips cleanly when absent.
  • Intersects pattern keys with live pip-audit CVE IDs; only checks active CVEs.
  • Writes code_health/cve_usage.log with file:line, risk, mitigation per finding.
  • step_cve_usage is non-blocking in normal mode and blocking in --strict mode.
  • Tests cover: active CVE with usage (finding), active CVE without usage (no finding), inactive CVE (skipped), comment-line exclusion, pip-audit-unavailable skip.

Out of scope

  • Maintaining the CVE pattern map for consumers (it's their config).
  • Replacing pip_audit (this layers on top; both stay).
  • Auto-generating patterns from CVE advisories (the regex needs human judgment about which call is the vulnerable one).

Related

  • forge.precommit.step_pip_audit — the first-stage package-level check this builds on.
  • FOUNDATION §15 runtime-context posture — the graceful-when-tooling-missing behavior.
  • issue_local_smart_test_suite.md — same "forge ships the engine, consumer supplies config" pattern.

Labels: ci-testing (CI / test infrastructure), feature (New capability), security (Security-sensitive), tier-2-high (Important + high ROI)
