
feat(conda): add conda/pixi ecosystem support, baseline roots, and first exposure catalogs #36

Open
jviehhauser wants to merge 6 commits into perplexityai:main from jviehhauser:jv/conda-ecosystem

jviehhauser commented May 28, 2026

Adds conda/pixi as bumblebee's 9th ecosystem, wires up baseline roots so a default scan picks up the common install layouts, and ships two threat-intel catalogs that exercise the new code path.

What's in the diff

Scanner. internal/ecosystem/conda parses <env>/conda-meta/<name>-<version>-<build>.json — the canonical install record conda, mamba, micromamba, and pixi all share. Each record carries the exact name, version, build, and channel emitted by the package builder, so a parsed record is high-confidence proof of an installed version. Pip-installed packages that conda records under the same conda-meta/ directory (channel = pypi) surface as package_manager=pip so receivers can tell them apart from conda-channel installs.

Roots. Baseline now picks up ~/.pixi, ~/miniconda3, ~/anaconda3, ~/miniforge3, ~/mambaforge, ~/micromamba, and the Homebrew --cask anaconda prefixes (/opt/homebrew/anaconda*, /usr/local/anaconda* — glob-discovered so versioned variants are caught too).

File-size default. Bumped --max-file-size from 5 MiB → 16 MiB. conda-meta records enumerate every file a package ships and reach 16+ MiB on bundles like google-cloud-sdk; the old default silently skipped them.

Catalogs. Two new files in threat_intel/:

| File | Entries | What |
|---|---|---|
| conda-forge-metadata-2025-03-04.json | 1 × pypi | CVE-2025-27510: dep-confusion RCE in conda-forge-metadata ≤ 0.4.1 via the unregistered conda-oci-mirror optional dep |
| conda-tooling-2025-06-14.json | 3 × conda | CVE-2025-32798/-32799 against conda-build ≤ 25.3.2 (recipe-selector RCE + Tarslip) and CVE-2025-49824 against conda-smithy ≤ 3.47.0 (RSA padding-oracle), all from the 7ASecurity / OSTIF / STA conda-forge audit |

conda-build and conda-smithy ship only via conda-forge (their PyPI namesakes are inert placeholders pointing readers to the conda channel), so those advisories are correctly tagged ecosystem: conda and can only fire through this PR's scanner.

Proof it works

Live scan against a real macOS host with a Homebrew Anaconda install plus pixi envs:

profile=deep status=complete duration_ms=10597
files_considered=887751  findings_emitted=3

  conda/conda-build@3.21.9   → CVE-2025-32798 (recipe-selector RCE)
  conda/conda-build@3.21.9   → CVE-2025-32799 (Tarslip path traversal)
  conda/conda-smithy@3.21.3  → CVE-2025-49824 (RSA padding-oracle)

Patched installs (conda-build 26.3.0, conda-smithy 3.59.0 in pixi envs) produced zero findings. bumblebee selftest reports selftest OK (4 findings in 3ms), including a new conda fixture that exercises the channelFromURL fallback. That is the most bug-prone code path here: every real conda-meta record on a typical pixi host omits the schannel field that the parser would otherwise short-circuit on.

What's not here, and why

No conda "campaign" catalog. I looked — there isn't one to cite. conda-forge's Apr 2025 token-exposure postmortem found no shipped malicious packages, the Jul 2025 audit findings are infrastructure hardening, conda-forge was unaffected by xz CVE-2024-3094, and the Shai-Hulud / Mini-Shai-Hulud / TrapDoor campaigns are npm + PyPI with no public record of conda republication. When the next cross-ecosystem campaign hits a conda-forge feedstock, mirroring affected names under ecosystem: conda will be a few-line follow-up.

No pixi.lock / pixi.toml parsing. conda-meta is the authoritative installed-state source and is shared by every conda-compatible tool; a lockfile parser would need hand-rolled YAML (à la pnpm) and is its own PR.

Cross-ecosystem refactors deferred. A 3-agent maintainability audit on the final commit surfaced items worth doing — centralising the scanner dispatch, extracting readBounded (now duplicated across pypi/composer/conda), a typed PackageManager alias, per-ecosystem MaxFileSize, and conda cases in scanner_integration_test.go. Each touches all 9 ecosystems, so they belong in focused follow-up PRs rather than threading through this diff.

Security review

Multi-agent review on this branch returned zero HIGH/MEDIUM findings. Six concerns were investigated and cleared: symlink behaviour in walk dispatch, JSON-decode robustness against attacker input, NDJSON output injection, walk-dispatch case ordering, credential exposure via the new conda roots, and readBounded parity with the established pypi/composer precedent.

Test plan

  • go build ./..., go test ./..., go vet ./..., gofmt -l . clean
  • bumblebee selftest — 4 findings, including the conda fixture that exercises channelFromURL
  • Both catalogs load via --exposure-catalog without parse errors
  • 17 synthetic match probes pass (positive, negative, ecosystem isolation, case-insensitive normalisation, URL-shape coverage)
  • Real-host end-to-end run: 3 expected findings + 320 package records + 0 diagnostics
  • Cross-platform spot-check (Linux conda installs, micromamba envs)

Supersedes #37 (closed; commits cherry-picked here).

Parses per-package install records that conda, mamba, micromamba, and
pixi all write at <env>/conda-meta/<name>-<version>-<build>.json. Records
carry exact name/version/build emitted by the package builder, so each
parsed record is high-confidence proof of an installed package version
in the surrounding env prefix. pip-installed packages co-located in the
same env (schannel="pypi") are surfaced as package_manager=pip so
receivers can disambiguate install sources within a shared env.

Adds EcosystemConda="conda" (matching the PURL conda type — OSV does
not enumerate conda yet), a Conda() name normalizer, baseline roots for
~/.pixi, ~/miniconda3, ~/anaconda3, ~/miniforge3, ~/mambaforge, and
~/micromamba, and walk-time dispatch for any *.json file whose parent
directory is conda-meta/.

pixi.lock and pixi.toml manifests are intentionally not parsed in this
change; conda-meta is the authoritative installed-state source and is
shared by every conda-compatible package manager.

Co-Authored-By: Claude <noreply@anthropic.com>
jviehhauser added a commit to jviehhauser/bumblebee that referenced this pull request May 28, 2026
Adds threat_intel/conda-tooling-7asecurity-2025-06-14.json covering the
three conda-channel-package CVEs published 2025-06-14 from the
7ASecurity OSTIF/STA-sponsored conda-forge audit (March-April 2025):

  - CVE-2025-32798 / GHSA-6cc8-c3c9-3rgr: conda-build <=25.3.2 arbitrary
    code execution via unsafe evaluation of malicious recipe selectors
  - CVE-2025-32799 / GHSA-h499-pxgj-qh5h: conda-build <=25.3.2 Tarslip
    path traversal via crafted tar entry paths
  - CVE-2025-49824 / GHSA-2xf4-hg9q-m58q: conda-smithy <=3.47.0 RSA
    PKCS#1 v1.5 padding-oracle in travis_encrypt_binstar_token

All three are ecosystem:"conda" — the PyPI namesakes are inert
placeholders that point readers to the conda channel, so these
advisories only match against conda-meta records produced by the conda
scanner added in PR perplexityai#36. Affected version arrays enumerate every
conda-forge release at or below each advisory's "<=X.Y.Z" cutoff per
the project convention (112 conda-build versions, 228 conda-smithy
versions), pulled from the anaconda.org channel API.

Match coverage verified locally: every enumerated version produces the
expected hit count (1 or 2 per record), patched versions and current
releases produce zero hits, wrong-ecosystem records do not match, and
case-insensitive name normalization works through the existing
exposure-catalog lowercase fallback.

The broader 7ASecurity audit also produced infrastructure-level CVEs
(CVE-2025-31484 anaconda.org token exposure, CVE-2025-49823
staged-recipes weak permissions, CVE-2025-32784/-32797 conda-smithy CI
hardening) which are not catalogable as on-disk package presence; see
https://conda-forge.org/blog/2025/07/16/security-audit/ for the full
audit summary.

Co-Authored-By: Claude <noreply@anthropic.com>
jviehhauser and others added 3 commits May 28, 2026 15:39
Two follow-ups surfaced by a real conda-meta end-to-end run against
/opt/homebrew/anaconda3:

  - Add /opt/homebrew/anaconda3 (Apple Silicon) and /usr/local/anaconda3
    (Intel) to macOS systemRoots. Homebrew's `--cask anaconda` installs
    to these prefixes, which sit outside /opt/homebrew/lib. Without
    these roots a baseline scan misses the base env's conda-meta
    records entirely, even though it's the most common conda install
    location on macOS. filterExistingRoots drops the entries on hosts
    that lack the install, matching how /Library/Python is handled.

  - Raise the default --max-file-size from 5 MiB to 16 MiB. The 5 MiB
    default was set against PyPI dist-info METADATA sizes (typically
    well under 100 KiB); conda-meta records, by contrast, enumerate
    every file shipped by a package and can reach several MiB on
    bundles like google-cloud-sdk. On the test host a real install
    skipped a 16.5 MiB conda-meta record because the cap was 5 MiB.
    16 MiB covers the common case while still rejecting truly
    pathological inputs; operators can pass --max-file-size to tune
    further.

The selftest's hardcoded MaxFileSize is bumped to match the new CLI
default so the two stay coupled.

Co-Authored-By: Claude <noreply@anthropic.com>
…catalog

Catalogs the GHSA-vwfh-m3q7-9jpw dependency-confusion RCE in
conda-forge-metadata <=0.4.1. The package declares an optional
dependency on `conda-oci-mirror` (an unregistered PyPI name) under its
`[oci]` extras; an attacker who claimed that PyPI name before
conda-forge did could achieve RCE on anyone running
`pip install conda-forge-metadata[oci]`. The fix was applied upstream
by registering the placeholder name; affected installed releases
(0.3.0 and 0.4.1, the only two releases at or below 0.4.1 per PyPI
history) remain worth flagging on inventory scans.

Ecosystem is `pypi` because conda-forge-metadata ships via PyPI rather
than via the conda-forge channel. This is the first catalog covering
the conda/pixi tooling supply chain; pixi users picking up conda
tooling via pixi.lock's pypi section, or anyone with a
`pip install conda-forge-metadata` in their environment, would
surface here.

Co-Authored-By: Claude <noreply@anthropic.com>
jviehhauser changed the title from "feat(conda): add conda/pixi ecosystem support via conda-meta records" to "feat(conda): add conda/pixi ecosystem support, baseline roots, and first exposure catalogs" May 28, 2026
jviehhauser and others added 2 commits May 28, 2026 15:53
Audit pass over the conda branch surfaced two issues worth fixing before
merge:

1. channelFromURL mis-handled two common real-world shapes. On every
   pixi env scanned during local verification the `channel` field was
   `https://conda.anaconda.org/conda-forge/` (trailing slash, no
   subdir); the old "second-to-last segment" heuristic returned the
   hostname `conda.anaconda.org` instead of the channel name
   `conda-forge`. Likewise the bare-string `channel: "pypi"` that some
   mamba/micromamba versions write (instead of the schannel field)
   returned the empty string, so pip-installed packages were silently
   mis-attributed as `package_manager=conda`. Rewritten to strip
   scheme+host explicitly and return the first path segment, which is
   always the channel; subdir, when present, is always trailing.
   New unit tests cover URLs with/without subdir, trailing-slash URLs,
   non-anaconda hosts, bare-string channels, and the
   "schannel-absent + channel=pypi" pip-attribution path that exercises
   the fix end-to-end.

2. The bumblebee selftest had no conda coverage. The fleet rollout
   smoke test the README advertises would not have caught a regression
   in conda-meta detection, the normalize-Conda hookup, or the conda
   ecosystem's catalog-matching path. Added a conda fixture under
   selftest/fixtures/ (a realistic <name>-<version>-<build>.json
   conda-meta record using the bumblebee-selftest-evil sentinel name),
   a fourth selftest catalog entry for ecosystem "conda", and bumped
   expectedSelftestFindings from 3 to 4. `bumblebee selftest` now
   reports "selftest OK (4 findings in 3ms)" on a freshly built
   binary.

Docs updated to describe both the schannel and channel-fallback paths
and to spell out the first-path-segment channel extraction.

Co-Authored-By: Claude <noreply@anthropic.com>
Round of small, focused fixes surfaced by parallel code/test/catalog
maintainability reviews. None are functional regressions; all reduce
future-maintenance friction.

selftest:
  - Fixture now omits `schannel` and uses a trailing-slash channel URL
    (`https://conda.anaconda.org/bumblebee-selftest/`). Every live
    conda-meta record on a typical pixi install has this shape, but the
    previous fixture set both fields and short-circuited on `Schannel`,
    leaving `channelFromURL` — the most bug-prone code path in the
    conda scanner — unexercised by the fleet rollout smoke test.

catalogs:
  - Rename `conda-tooling-7asecurity-2025-06-14.json` ->
    `conda-tooling-2025-06-14.json`. The audit-firm slug felt dated
    and inconsistent with sibling catalogs that use campaign names or
    package names; the threat_intel/README row still attributes the
    audit.
  - Reformat the conda-build and conda-smithy version arrays from
    10/line to 1/line, matching the convention used by
    laravel-lang-2026-05-23.json and every other multi-version catalog.
  - Drop the redundant `audit_source` key from each entry's
    `indicators` block. Provenance is already captured in the
    catalog-level `_comment`, and `indicators` should stay reserved
    for IOCs (vectors, hashes, network markers) per the
    node-ipc/trapdoor precedent.
  - Split the giant single-string `_comment` into `_comment` (campaign
    scope) and a new `_comment_excluded_cves` (the related-but-not-
    catalogable CVE list: CVE-2025-31484, -49823, -32784, -32797,
    -49842, -49843). The exclusion list was buried in 1000 characters
    of prose; now it's discoverable.

docs:
  - threat_intel/README.md gains a "Catalog entry fields" section
    documenting the four required keys + optional severity/name, and
    explicitly calls out that everything else (cve, ghsa, cvss,
    published, patched_version, indicators, _comment, source, ...) is
    free-form documentation the loader silently drops. Previously this
    distinction lived only as Go comments on the Entry struct.
  - docs/inventory-sources.md's prose paragraph describing the conda
    channel-extraction rules is replaced with a one-line pointer to
    the channelFromURL doc-comment in
    internal/ecosystem/conda/conda.go, which already enumerates every
    URL shape with examples. Removes a drift point between code and
    docs.
  - README's coverage table row for Conda/pixi drops the
    "(conda, mamba, micromamba, pixi all share this layout)"
    parenthetical to match the wording shape of the other eight rows;
    the manager list is still documented in inventory-sources.md.
  - README gains a parallel "Example conda package record" details
    block alongside the existing pnpm one, so downstream consumers
    have a worked example of `source_type=conda-meta` /
    `package_manager=conda|pip` payloads.

roots:
  - macOS Homebrew anaconda roots are now glob-discovered
    (`/opt/homebrew/anaconda*`, `/usr/local/anaconda*`) instead of
    hardcoded `anaconda3`. Matches the existing pattern used for
    `/usr/lib/python*` on Linux, and picks up versioned variants
    (`anaconda3-2024.02`, etc.) without future edits.

tests:
  - TestScanCondaMetaRecord_MalformedJSON now asserts that the
    returned error message references the offending file path, not
    just that an error was returned. Locks in the error-wrapping
    contract so a future refactor that drops the path wrap can't
    silently degrade the orchestrator's downstream diagnostics.

Larger architectural items raised by the audit (centralised dispatch
table, extracted readBounded helper, typed PackageManager enum,
per-ecosystem MaxFileSize, conda coverage in scanner_integration_test)
are deferred to follow-up PRs as cross-ecosystem refactors rather than
threading them through the conda diff.

Co-Authored-By: Claude <noreply@anthropic.com>
jviehhauser marked this pull request as ready for review May 28, 2026 14:29