feat(conda): add conda/pixi ecosystem support, baseline roots, and first exposure catalogs #36
Open
jviehhauser wants to merge 6 commits into
Parses per-package install records that conda, mamba, micromamba, and pixi all write at <env>/conda-meta/<name>-<version>-<build>.json. Records carry exact name/version/build emitted by the package builder, so each parsed record is high-confidence proof of an installed package version in the surrounding env prefix. pip-installed packages co-located in the same env (schannel="pypi") are surfaced as package_manager=pip so receivers can disambiguate install sources within a shared env.
Adds EcosystemConda="conda" (matching the PURL conda type; OSV does not enumerate conda yet), a Conda() name normalizer, baseline roots for ~/.pixi, ~/miniconda3, ~/anaconda3, ~/miniforge3, ~/mambaforge, and ~/micromamba, and walk-time dispatch for any *.json file whose parent directory is conda-meta/.
pixi.lock and pixi.toml manifests are intentionally not parsed in this change; conda-meta is the authoritative installed-state source and is shared by every conda-compatible package manager.
Co-Authored-By: Claude <noreply@anthropic.com>
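The dispatch and attribution rules in this commit message can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: `condaRecord`, `isCondaMetaRecord`, `packageManager`, and `parseCondaRecord` are hypothetical names, and the example path and record contents are made up.

```go
package main

import (
	"encoding/json"
	"fmt"
	"path/filepath"
)

// condaRecord holds the subset of conda-meta fields this sketch reads.
// The struct itself is hypothetical; the field names follow the commit message.
type condaRecord struct {
	Name     string `json:"name"`
	Version  string `json:"version"`
	Build    string `json:"build"`
	Schannel string `json:"schannel"`
}

// isCondaMetaRecord reports whether path is a *.json file whose parent
// directory is named conda-meta.
func isCondaMetaRecord(path string) bool {
	return filepath.Ext(path) == ".json" &&
		filepath.Base(filepath.Dir(path)) == "conda-meta"
}

// packageManager attributes a record to pip when schannel is "pypi",
// and to conda otherwise.
func packageManager(rec condaRecord) string {
	if rec.Schannel == "pypi" {
		return "pip"
	}
	return "conda"
}

// parseCondaRecord decodes one record and rejects entries that lack a
// name or a version.
func parseCondaRecord(data []byte) (condaRecord, error) {
	var rec condaRecord
	if err := json.Unmarshal(data, &rec); err != nil {
		return condaRecord{}, fmt.Errorf("decode conda-meta record: %w", err)
	}
	if rec.Name == "" || rec.Version == "" {
		return condaRecord{}, fmt.Errorf("conda-meta record missing name or version")
	}
	return rec, nil
}

func main() {
	// Illustrative path and payload only.
	path := "/home/u/miniforge3/conda-meta/example-1.0.0-py_0.json"
	data := []byte(`{"name":"example","version":"1.0.0","build":"py_0","schannel":"pypi"}`)
	if !isCondaMetaRecord(path) {
		return
	}
	rec, err := parseCondaRecord(data)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(rec.Name, rec.Version, rec.Build, packageManager(rec))
}
```

Checking only the parent directory name keeps the walk cheap: no file is opened unless its location already marks it as an install record.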
4 tasks
jviehhauser added a commit to jviehhauser/bumblebee that referenced this pull request May 28, 2026
Adds threat_intel/conda-tooling-7asecurity-2025-06-14.json covering the three conda-channel-package CVEs published 2025-06-14 from the 7ASecurity OSTIF/STA-sponsored conda-forge audit (March-April 2025):
- CVE-2025-32798 / GHSA-6cc8-c3c9-3rgr: conda-build <=25.3.2 arbitrary code execution via unsafe evaluation of malicious recipe selectors
- CVE-2025-32799 / GHSA-h499-pxgj-qh5h: conda-build <=25.3.2 Tarslip path traversal via crafted tar entry paths
- CVE-2025-49824 / GHSA-2xf4-hg9q-m58q: conda-smithy <=3.47.0 RSA PKCS#1 v1.5 padding-oracle in travis_encrypt_binstar_token
All three are ecosystem:"conda"; the PyPI namesakes are inert placeholders that point readers to the conda channel, so these advisories only match against conda-meta records produced by the conda scanner added in PR perplexityai#36.
Affected version arrays enumerate every conda-forge release at or below each advisory's "<=X.Y.Z" cutoff per the project convention (112 conda-build versions, 228 conda-smithy versions), pulled from the anaconda.org channel API.
Match coverage verified locally: every enumerated version produces the expected hit count (1 or 2 per record), patched versions and current releases produce zero hits, wrong-ecosystem records do not match, and case-insensitive name normalization works through the existing exposure-catalog lowercase fallback.
The broader 7ASecurity audit also produced infrastructure-level CVEs (CVE-2025-31484 anaconda.org token exposure, CVE-2025-49823 staged-recipes weak permissions, CVE-2025-32784/-32797 conda-smithy CI hardening) which are not catalogable as on-disk package presence; see https://conda-forge.org/blog/2025/07/16/security-audit/ for the full audit summary.
Co-Authored-By: Claude <noreply@anthropic.com>
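The matching behaviour this commit message verifies (exact match against an enumerated version list, ecosystem gating, case-insensitive names) might look roughly like the sketch below. The `entry` type and `matches` function are hypothetical and do not reflect bumblebee's loader; the version list is an illustrative one-element subset of the 112 conda-build versions the catalog enumerates.

```go
package main

import (
	"fmt"
	"strings"
)

// entry is a hypothetical catalog entry: one package with an enumerated
// list of affected versions.
type entry struct {
	Ecosystem string
	Name      string
	Versions  []string
}

// matches reports whether an installed package hits the entry. The
// ecosystem must match exactly, the name is compared case-insensitively,
// and the version must appear verbatim in the enumerated list.
func matches(e entry, ecosystem, name, version string) bool {
	if e.Ecosystem != ecosystem || !strings.EqualFold(e.Name, name) {
		return false
	}
	for _, v := range e.Versions {
		if v == version {
			return true
		}
	}
	return false
}

func main() {
	// Illustrative subset only.
	e := entry{Ecosystem: "conda", Name: "conda-build", Versions: []string{"25.3.2"}}

	fmt.Println(matches(e, "conda", "Conda-Build", "25.3.2")) // affected version, mixed-case name
	fmt.Println(matches(e, "conda", "conda-build", "26.3.0")) // patched version
	fmt.Println(matches(e, "pypi", "conda-build", "25.3.2"))  // wrong ecosystem
}
```

Enumerating versions avoids having to implement conda's version-ordering rules in the matcher, at the cost of longer catalog files.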
Two follow-ups surfaced by a real conda-meta end-to-end run against
/opt/homebrew/anaconda3:
- Add /opt/homebrew/anaconda3 (Apple Silicon) and /usr/local/anaconda3
(Intel) to macOS systemRoots. Homebrew's `--cask anaconda` installs
to these prefixes, which sit outside /opt/homebrew/lib. Without
these roots a baseline scan misses the base env's conda-meta
records entirely, even though it's the most common conda install
location on macOS. filterExistingRoots drops the entries on hosts
that lack the install, matching how /Library/Python is handled.
- Raise the default --max-file-size from 5 MiB to 16 MiB. The 5 MiB
default was set against PyPI dist-info METADATA sizes (typically
well under 100 KiB); conda-meta records, by contrast, enumerate
every file shipped by a package and can reach several MiB on
bundles like google-cloud-sdk. On the test host a real install
skipped a 16.5 MiB conda-meta record because the cap was 5 MiB.
16 MiB covers the common case while still rejecting truly
pathological inputs; operators can pass --max-file-size to tune
further.
The selftest's hardcoded MaxFileSize is bumped to match the new CLI
default so the two stay coupled.
Co-Authored-By: Claude <noreply@anthropic.com>
…catalog
Catalogs the GHSA-vwfh-m3q7-9jpw dependency-confusion RCE in conda-forge-metadata <=0.4.1. The package declares an optional dependency on `conda-oci-mirror` (an unregistered PyPI name) under its `[oci]` extras; an attacker who claimed that PyPI name before conda-forge did could RCE on anyone running `pip install conda-forge-metadata[oci]`. The fix was applied upstream by registering the placeholder name, so affected installed releases (0.3.0 and 0.4.1, the only two releases at or below 0.4.1 per PyPI history) remain useful to flag on inventory scans.
Ecosystem is `pypi` because conda-forge-metadata ships via PyPI rather than via the conda-forge channel.
This is the first catalog covering the conda/pixi tooling supply chain; pixi users picking up conda tooling via pixi.lock's pypi section, or anyone with a `pip install conda-forge-metadata` in their environment, would surface here.
Co-Authored-By: Claude <noreply@anthropic.com>
Audit pass over the conda branch surfaced two issues worth fixing before merge:
1. channelFromURL mis-handled two common real-world shapes. On every pixi env scanned during local verification the `channel` field was `https://conda.anaconda.org/conda-forge/` (trailing slash, no subdir); the old "second-to-last segment" heuristic returned the hostname `conda.anaconda.org` instead of the channel name `conda-forge`. Likewise the bare-string `channel: "pypi"` that some mamba/micromamba versions write (instead of the schannel field) returned the empty string, so pip-installed packages were silently mis-attributed as `package_manager=conda`. Rewritten to strip scheme+host explicitly and return the first path segment, which is always the channel; subdir, when present, is always trailing. New unit tests cover URLs with/without subdir, trailing-slash URLs, non-anaconda hosts, bare-string channels, and the "schannel-absent + channel=pypi" pip-attribution path that exercises the fix end-to-end.
2. The bumblebee selftest had no conda coverage. The fleet rollout smoke test the README advertises would not have caught a regression in conda-meta detection, the normalize-Conda hookup, or the conda ecosystem's catalog-matching path. Added a conda-fixture under selftest/fixtures/ (a realistic <name>-<version>-<build>.json conda-meta record using the bumblebee-selftest-evil sentinel name), a fourth selftest catalog entry for ecosystem "conda", and bumped expectedSelftestFindings from 3 to 4. `bumblebee selftest` now reports "selftest OK (4 findings in 3ms)" on a freshly built binary.
Docs updated to describe both the schannel and channel-fallback paths and to spell out the first-path-segment channel extraction.
Co-Authored-By: Claude <noreply@anthropic.com>
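The rewritten extraction rule (strip scheme and host, return the first path segment, pass bare strings through) can be sketched as below. This is a reconstruction from the description in this commit message, not the function in internal/ecosystem/conda/conda.go, whose handling of edge cases may differ; `repo.example.com` is a placeholder host.

```go
package main

import (
	"fmt"
	"strings"
)

// channelFromURL returns the channel name for a conda-meta channel value.
// For a URL it strips the scheme and host and returns the first path
// segment; a bare string such as "pypi" is returned unchanged.
func channelFromURL(raw string) string {
	path := strings.TrimSpace(raw)
	if i := strings.Index(path, "://"); i >= 0 {
		rest := path[i+len("://"):]
		// Everything before the first slash is the host.
		_, after, found := strings.Cut(rest, "/")
		if !found {
			return "" // host only, no channel segment
		}
		path = after
	}
	// The first remaining segment is the channel; a subdir, when
	// present, trails it.
	first, _, _ := strings.Cut(strings.Trim(path, "/"), "/")
	return first
}

func main() {
	for _, in := range []string{
		"https://conda.anaconda.org/conda-forge/",
		"https://conda.anaconda.org/conda-forge/osx-arm64",
		"https://repo.example.com/internal/linux-64",
		"pypi",
	} {
		fmt.Printf("%q -> %q\n", in, channelFromURL(in))
	}
}
```

Anchoring on the first path segment rather than counting segments from the end is what makes the trailing-slash and missing-subdir shapes behave the same way.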
Round of small, focused fixes surfaced by parallel code/test/catalog
maintainability reviews. None are functional regressions; all reduce
future-maintenance friction.
selftest:
- Fixture now omits `schannel` and uses a trailing-slash channel URL
(`https://conda.anaconda.org/bumblebee-selftest/`). Every live
conda-meta record on a typical pixi install has this shape, but the
previous fixture set both fields and short-circuited on `Schannel`,
leaving `channelFromURL` — the most bug-prone code path in the
conda scanner — unexercised by the fleet rollout smoke test.
catalogs:
- Rename `conda-tooling-7asecurity-2025-06-14.json` ->
`conda-tooling-2025-06-14.json`. The audit-firm slug felt dated
and inconsistent with sibling catalogs that use campaign names or
package names; the threat_intel/README row still attributes the
audit.
- Reformat the conda-build and conda-smithy version arrays from
10/line to 1/line, matching the convention used by
laravel-lang-2026-05-23.json and every other multi-version catalog.
- Drop the redundant `audit_source` key from each entry's
`indicators` block. Provenance is already captured in the
catalog-level `_comment`, and `indicators` should stay reserved
for IOCs (vectors, hashes, network markers) per the
node-ipc/trapdoor precedent.
- Split the giant single-string `_comment` into `_comment` (campaign
scope) and a new `_comment_excluded_cves` (the related-but-not-
catalogable CVE list: CVE-2025-31484, -49823, -32784, -32797,
-49842, -49843). The exclusion list was buried in 1000 characters
of prose; now it's discoverable.
docs:
- threat_intel/README.md gains a "Catalog entry fields" section
documenting the four required keys + optional severity/name, and
explicitly calls out that everything else (cve, ghsa, cvss,
published, patched_version, indicators, _comment, source, ...) is
free-form documentation the loader silently drops. Previously this
distinction lived only as Go comments on the Entry struct.
- docs/inventory-sources.md's prose paragraph describing the conda
channel-extraction rules is replaced with a one-line pointer to
the channelFromURL doc-comment in
internal/ecosystem/conda/conda.go, which already enumerates every
URL shape with examples. Removes a drift point between code and
docs.
- README's coverage table row for Conda/pixi drops the
"(conda, mamba, micromamba, pixi all share this layout)"
parenthetical to match the wording shape of the other eight rows;
the manager list is still documented in inventory-sources.md.
- README gains a parallel "Example conda package record" details
block alongside the existing pnpm one, so downstream consumers
have a worked example of `source_type=conda-meta` /
`package_manager=conda|pip` payloads.
roots:
- macOS Homebrew anaconda roots are now glob-discovered
(`/opt/homebrew/anaconda*`, `/usr/local/anaconda*`) instead of
hardcoded `anaconda3`. Matches the existing pattern used for
`/usr/lib/python*` on Linux, and picks up versioned variants
(`anaconda3-2024.02`, etc.) without future edits.
tests:
- TestScanCondaMetaRecord_MalformedJSON now asserts that the
returned error message references the offending file path, not
just that an error was returned. Locks in the error-wrapping
contract so a future refactor that drops the path wrap can't
silently degrade the orchestrator's downstream diagnostics.
Larger architectural items raised by the audit (centralised dispatch
table, extracted readBounded helper, typed PackageManager enum,
per-ecosystem MaxFileSize, conda coverage in scanner_integration_test)
are deferred to follow-up PRs as cross-ecosystem refactors rather than
threading them through the conda diff.
Co-Authored-By: Claude <noreply@anthropic.com>
Adds conda/pixi as bumblebee's 9th ecosystem, wires up baseline roots so a default scan picks up the common install layouts, and ships two threat-intel catalogs that exercise the new code path.
What's in the diff
Scanner. `internal/ecosystem/conda` parses `<env>/conda-meta/<name>-<version>-<build>.json`, the canonical install record that conda, mamba, micromamba, and pixi all share. Each record carries the exact name, version, build, and channel emitted by the package builder, so a parsed record is high-confidence proof of an installed version. Pip-installed packages that conda records under the same `conda-meta/` directory (channel = `pypi`) surface as `package_manager=pip` so receivers can tell them apart from conda-channel installs.
Roots. Baseline now picks up `~/.pixi`, `~/miniconda3`, `~/anaconda3`, `~/miniforge3`, `~/mambaforge`, `~/micromamba`, and the Homebrew `--cask anaconda` prefixes (`/opt/homebrew/anaconda*`, `/usr/local/anaconda*`; glob-discovered so versioned variants are caught too).
File-size default. Bumped `--max-file-size` from 5 MiB to 16 MiB. conda-meta records enumerate every file a package ships and reach 16+ MiB on bundles like `google-cloud-sdk`; the old default silently skipped them.
Catalogs. Two new files in `threat_intel/`:
- `conda-forge-metadata-2025-03-04.json` (ecosystem `pypi`): `conda-forge-metadata` ≤ 0.4.1 via the unregistered `conda-oci-mirror` optional dep
- `conda-tooling-2025-06-14.json` (ecosystem `conda`): `conda-build` ≤ 25.3.2 (recipe-selector RCE + Tarslip), CVE-2025-49824 against `conda-smithy` ≤ 3.47.0 (RSA padding-oracle); all from the 7ASecurity / OSTIF / STA conda-forge audit
`conda-build` and `conda-smithy` ship only via conda-forge (their PyPI namesakes are inert placeholders pointing readers to the conda channel), so those advisories are properly `ecosystem: conda` and rely on this PR's scanner to fire.
Proof it works
Live scan against a real macOS host with a Homebrew Anaconda install plus pixi envs:
Patched installs (`conda-build 26.3.0`, `conda-smithy 3.59.0` in pixi envs) produced zero findings.
`bumblebee selftest` reports `selftest OK (4 findings in 3ms)` with a new conda fixture that exercises the `channelFromURL` fallback, the most bug-prone code path here, since every real conda-meta record on a typical pixi host omits the `schannel` field the parser would otherwise short-circuit on.
What's not here, and why
No conda "campaign" catalog. I looked; there isn't one to cite. conda-forge's Apr 2025 token-exposure postmortem found no shipped malicious packages, the Jul 2025 audit findings are infrastructure hardening, conda-forge was unaffected by xz CVE-2024-3094, and the Shai-Hulud / Mini-Shai-Hulud / TrapDoor campaigns are npm + PyPI with no public record of conda republication. When the next cross-ecosystem campaign hits a conda-forge feedstock, mirroring affected names under `ecosystem: conda` will be a few-line follow-up.
No `pixi.lock` / `pixi.toml` parsing. conda-meta is the authoritative installed-state source and is shared by every conda-compatible tool; a lockfile parser would need hand-rolled YAML (à la pnpm) and is its own PR.
Cross-ecosystem refactors deferred. A 3-agent maintainability audit on the final commit surfaced items worth doing: centralising the scanner dispatch, extracting `readBounded` (now duplicated across pypi/composer/conda), a typed `PackageManager` alias, per-ecosystem `MaxFileSize`, and conda cases in `scanner_integration_test.go`. Each touches all 9 ecosystems, so they belong in focused follow-up PRs rather than threading through this diff.
Security review
Multi-agent review on this branch returned zero HIGH/MEDIUM findings. Six concerns were investigated and cleared: symlink behaviour in walk dispatch, JSON-decode robustness against attacker input, NDJSON output injection, walk-dispatch case ordering, credential exposure via the new conda roots, and `readBounded` parity with the established pypi/composer precedent.
Test plan
- `go build ./...`, `go test ./...`, `go vet ./...`, `gofmt -l .` clean
- `bumblebee selftest`: 4 findings, including the conda fixture that exercises `channelFromURL`
- `--exposure-catalog` without parse errors
Supersedes #37 (closed; commits cherry-picked here).