
Enhance Snyk vuln analyzer for checksum-qualified Maven PURLs + surface Snyk match confidence #2106

@mehab

Description

Current Behavior

Summary

Hyades processes uploaded SBOMs. Some SBOM generators (e.g., those describing vendor-distributed Maven artifacts, such as Cloudera's) produce Maven PURLs with a checksum qualifier, e.g. pkg:maven/org.example/lib@1.0.0?checksum=sha1:….

Snyk’s packages/issues API can successfully evaluate those checksum-qualified PURLs and returns:

  • vulnerability data, and
  • authoritative per-submitted-package match metadata under meta.packages[].match indicating full, partial, or none.

We want to update only the Snyk vulnerability analyzer so it can:

  • correctly handle checksum qualifiers in PURLs (for Maven only, where this is required),
  • record/surface Snyk’s match confidence to users without gating (i.e., vulnerabilities returned by Snyk are still shown; they’re just flagged as uncertain where appropriate), and
  • do so with minimal caching/performance impact and correct cache isolation.

Current behavior

Qualifier handling

  • The Snyk analyzer correlates PURLs largely using coordinates-only keys (PackageURL.getCoordinates()), which drops qualifiers like checksum.
  • This can prevent correct correlation of the Snyk response to the exact uploaded SBOM identity when Snyk relies on checksum-qualified matching.

Match metadata visibility

  • Snyk returns meta.packages[*].match (starting with API version 2025-11-05), but the current analyzer does not use this to annotate results for users.

Cache-key considerations

  • Snyk and OSS Index both use the cache namespace name "results" and cache keys are strings derived from PURLs.
  • If Snyk changes keying strategy (e.g., to include checksum), we must avoid cache collisions—especially negative-cache (null) behavior that can suppress remote fetches.

Proposed Behavior

Part 1: Preserve and handle checksum qualifier PURLs (Snyk analyzer only)

Update hyades-apiserver’s Snyk vuln analyzer (vuln-analysis/snyk) so that for Maven PURLs that include a checksum qualifier:

  • The analyzer must treat the full canonical PURL (including qualifiers) as the correlation/request identity for that subset.
  • Correlation must align with Snyk’s response mapping:
    • Snyk response meta.packages entries are keyed by the submitted PURL, so we must correlate using that same identity form.
  • For Maven PURLs without checksum, keep existing coordinates-only behavior to minimize changes.
    (Implementation-wise, this means updating the logic in SnykVulnAnalyzer where it collects “analyzable PURLs”, where it keys cache entries, and where it matches Snyk response PURLs back to submitted components.)
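The key-selection rule above can be sketched as follows. This is an illustrative standalone sketch, not actual SnykAnalyzer code: the class and method names are hypothetical, and for self-containment it parses the PURL string directly instead of going through packageurl-java's PackageURL/getCoordinates() as the real analyzer would.

```java
// Hypothetical sketch of the correlation/cache identity described in Part 1.
// Names (SnykCorrelationKey, correlationKey) are illustrative only.
public final class SnykCorrelationKey {

    // Returns the key under which a component would be cached and correlated
    // with Snyk's meta.packages entries.
    static String correlationKey(final String canonicalPurl) {
        if (isChecksumQualifiedMaven(canonicalPurl)) {
            // Checksum-qualified Maven PURL: the full canonical PURL
            // (including qualifiers) is the identity.
            return canonicalPurl;
        }
        // Everything else keeps the existing coordinates-only behavior:
        // strip the qualifier part ("?...") and any subpath ("#...").
        return canonicalPurl.split("[?#]", 2)[0];
    }

    static boolean isChecksumQualifiedMaven(final String canonicalPurl) {
        return canonicalPurl.startsWith("pkg:maven/")
                && canonicalPurl.contains("?")
                && canonicalPurl.substring(canonicalPurl.indexOf('?') + 1)
                        .matches(".*\\bchecksum=.*");
    }
}
```

The same key form is then used in all three places noted above (collecting analyzable PURLs, keying cache entries, and matching Snyk response PURLs back to components), so a checksum-qualified and a checksum-less variant of the same coordinates never alias each other.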

Part 2: Surface meta.packages[*].match as a user-visible flag

Parse and store the match metadata from meta.packages[submittedPurl].match:

  • full → confidence 100% (applicable)
  • partial → show vuln but mark as partial match / verify
  • none → show vuln but mark as not matched / verify (since the vulnerability data may still be present; user must decide)

How to expose the flag in Hyades:

We can use existing attribution metadata in Hyades:

  • FindingAttribution.matchingPercentage (confidence)
  • FindingAttribution.alternateIdentifier (can store match type + match.description)

This avoids introducing new DB schema or changing vulnerability visibility logic.
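The full/partial/none mapping could look roughly like this. A hedged sketch: the MatchFlag record and the partial/none percentages (50 and 0) are placeholders chosen here for illustration; the real code would populate FindingAttribution.matchingPercentage and FindingAttribution.alternateIdentifier, and the exact percentage for partial matches is an open design choice.

```java
// Hypothetical sketch of mapping Snyk's meta.packages[*].match onto the
// existing FindingAttribution fields. MatchFlag stands in for the real entity.
public final class SnykMatchMapper {

    record MatchFlag(int matchingPercentage, String alternateIdentifier) {}

    static MatchFlag toFlag(final String match, final String description) {
        return switch (match) {
            // full: Snyk matched the exact submitted package identity.
            case "full" -> new MatchFlag(100, "snyk-match:full");
            // partial: show the vulnerability, but flag it for verification.
            // (50 is an arbitrary placeholder value.)
            case "partial" -> new MatchFlag(50, "snyk-match:partial; " + description);
            // none (or unrecognized): still shown, but marked as not matched.
            default -> new MatchFlag(0, "snyk-match:none; " + description);
        };
    }
}
```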

Part 3: Cache update with minimal performance impact and correct isolation

Because Snyk and OSS Index share the same cache namespace name "results", we must ensure Snyk’s new checksum-qualified keying cannot corrupt/mask OSS Index or vice versa.

Proposed cache handling:

  • Keep cache-key changes localized to Snyk. For Snyk, cache using the chosen identity key form:
    • coordinates-only for PURLs without checksum
    • full canonical PURL (including checksum) for Maven PURLs with checksum
  • Ensure cache isolation to prevent collisions:
    • Option A (preferred): use a Snyk-specific cache namespace name, e.g. cacheManager.getCache("snyk-results")
    • Option B: prefix cache keys with snyk: to prevent cross-analyzer collisions

This also protects against the “negative cache” problem (null cached entries) where one analyzer could otherwise suppress another analyzer’s remote fetch for a colliding key.
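Both isolation options can be illustrated with a plain Map standing in for the real cache manager. This is a hypothetical sketch only: AnalyzerCacheSketch and its methods are invented names, not Hyades APIs, and the real implementation would go through cacheManager.getCache(...) as described above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of Options A and B above, with a nested Map standing in
// for the real cache manager. Names here are illustrative only.
public final class AnalyzerCacheSketch {

    private final Map<String, Map<String, Object>> namespaces = new HashMap<>();

    // Option A: each analyzer gets its own namespace, so Snyk's
    // checksum-qualified keys (and its negative/null entries) can never
    // collide with OSS Index entries in "results".
    Map<String, Object> cache(final String namespace) {
        return namespaces.computeIfAbsent(namespace, k -> new HashMap<>());
    }

    // Option B: keep the shared "results" namespace, but prefix every key
    // with the analyzer name so keys cannot collide across analyzers.
    static String prefixedKey(final String analyzer, final String key) {
        return analyzer + ":" + key;
    }
}
```

With Option A, a null (negative) entry written under "snyk-results" is invisible to a lookup in "results", so one analyzer can no longer suppress the other's remote fetch.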

Labels: enhancement (New feature or request)