diff --git a/content/posts/2026-05-05-yocto-sbom-deep-dive-introduction.md b/content/posts/2026-05-05-yocto-sbom-deep-dive-introduction.md
index c594e45..748fe74 100644
--- a/content/posts/2026-05-05-yocto-sbom-deep-dive-introduction.md
+++ b/content/posts/2026-05-05-yocto-sbom-deep-dive-introduction.md
@@ -73,6 +73,6 @@ Yocto's approach was designed for exactly this level of fidelity. The next posts
 
 - Part 1: How Yocto Generates SBOMs Behind the Scenes _(this post)_
 - Part 2: [A Deep Dive into Yocto's SPDX 2.2 Pipeline](/2026/05/12/yocto-spdx-2-2-pipeline/)
-- Part 3: SPDX 3.0 in Yocto: What Changed and Why It Matters _(coming soon)_
-- Part 4: VEX in the SBOM: How Yocto Embeds Vulnerability Assessments _(coming soon)_
+- Part 3: [SPDX 3.0 in Yocto: What Changed and Why It Matters](/2026/05/19/yocto-spdx-3-0-overview/)
+- Part 4: [VEX in the SBOM: How Yocto Embeds Vulnerability Assessments](/2026/05/26/yocto-vex-spdx-3-0/)
 - Part 5: Yocto SBOM in Production: Configuration, Tooling, and What's Still Missing _(coming soon)_
diff --git a/content/posts/2026-05-12-yocto-spdx-2-2-pipeline.md b/content/posts/2026-05-12-yocto-spdx-2-2-pipeline.md
index dd8edc4..743023a 100644
--- a/content/posts/2026-05-12-yocto-spdx-2-2-pipeline.md
+++ b/content/posts/2026-05-12-yocto-spdx-2-2-pipeline.md
@@ -160,6 +160,6 @@ Each file within the package gets a detailed entry with checksums:
 
 - Part 1: [How Yocto Generates SBOMs Behind the Scenes](/2026/05/05/yocto-sbom-deep-dive-introduction/)
 - Part 2: A Deep Dive into Yocto's SPDX 2.2 Pipeline _(this post)_
-- Part 3: SPDX 3.0 in Yocto: What Changed and Why It Matters _(coming soon)_
-- Part 4: VEX in the SBOM: How Yocto Embeds Vulnerability Assessments _(coming soon)_
+- Part 3: [SPDX 3.0 in Yocto: What Changed and Why It Matters](/2026/05/19/yocto-spdx-3-0-overview/)
+- Part 4: [VEX in the SBOM: How Yocto Embeds Vulnerability Assessments](/2026/05/26/yocto-vex-spdx-3-0/)
 - Part 5: Yocto SBOM in Production: Configuration, Tooling, and What's Still Missing _(coming soon)_
diff --git a/content/posts/2026-05-19-yocto-spdx-3-0-overview.md b/content/posts/2026-05-19-yocto-spdx-3-0-overview.md
new file mode 100644
index 0000000..68dedcb
--- /dev/null
+++ b/content/posts/2026-05-19-yocto-spdx-3-0-overview.md
@@ -0,0 +1,129 @@
+---
+title: "SPDX 3.0 in Yocto: What Changed and Why It Matters"
+description: "Part 3 of the Yocto SBOM series. SPDX 3.0 support arrived in Styhead (Yocto 5.1) with single-document JSON-LD output, first-class Build elements, native VEX support, and richer build provenance features."
+author:
+  display_name: Joshua Watt
+categories:
+  - guide
+tags: [sbom, yocto, openembedded, spdx, spdx-3, json-ld, embedded-linux]
+keywords: [yocto spdx 3.0, create-spdx-3.0 bbclass, spdx json-ld, yocto styhead spdx, build provenance sbom, spdx 3 vex]
+tldr: "SPDX 3.0 support landed in Yocto Styhead (5.1) and is a major architectural leap: single-document JSON-LD output instead of tarballs, first-class Build elements with hasInput/hasOutput relationships, profile-based architecture, and native VEX support through the security profile. The trade-off is size — SBOMs can run 250 MB compressed and 2 GB uncompressed."
+date: 2026-05-19
+slug: yocto-spdx-3-0-overview
+---
+
+SPDX 3.0 support was added in the Styhead release (Yocto 5.1) and represents a significant architectural leap. The implementation lives in `create-spdx-3.0.bbclass` with supporting libraries in `meta/lib/oe/spdx30.py` (auto-generated SPDX 3.0 bindings) and `meta/lib/oe/sbom30.py` (SBOM construction utilities).
+
+This is part 3 of a 5-part series on how Yocto generates SBOMs. [Part 1](/2026/05/05/yocto-sbom-deep-dive-introduction/) covered the high-level architecture and [Part 2](/2026/05/12/yocto-spdx-2-2-pipeline/) walked through the SPDX 2.2 pipeline.
+
+## What Changed Architecturally
+
+The most immediately visible difference is the output format: SPDX 3.0 uses JSON-LD (JSON for Linked Data) instead of plain JSON. This makes the documents RDF-compliant, meaning you can load them into any RDF tooling (like Python's `rdflib`) for sophisticated graph queries. The JSON-LD output also conforms to a strict JSON schema, so you do not necessarily need RDF tooling; simpler JSON parsers work just fine for most use cases.
+
+But the deeper changes are structural.
+
+**Single-document output.** Unlike SPDX 2.2's tarball of separate documents, the SPDX 3.0 implementation produces a single JSON-LD document that describes the entire image. This is possible because SPDX 3.0 uses globally unique IDs for all objects, which makes the merging algorithm much simpler since it never has to worry about name collisions. The class builds up per-recipe SPDX data during the build, then merges everything into one cohesive document at image time.
+
+**First-class Build objects.** SPDX 2.2 had no concept of a "build." The `create-spdx-2.2` class shoehorned build information into package descriptions. SPDX 3.0 introduces `Build` as a first-class element, with proper `hasInput` and `hasOutput` relationships. This means you can express that a specific build took in some source files as input and produced some packages as output.
+
+**Profile-based architecture.** SPDX 3.0 documents declare which profiles they conform to. The Yocto implementation generates documents conforming to: `core`, `build`, `software`, `simpleLicensing`, and `security`.
+
+**Native VEX support.** This is arguably the biggest win for security-conscious teams. SPDX 3.0 natively supports VEX information through its security profile, meaning CVE data and vulnerability assessments live inside the SBOM rather than in a separate file.
+
+## New Variables and Configuration
+
+```bash
+SPDX_VERSION = "3.0.0"
+SPDX_PROFILES ?= "core build software simpleLicensing security"
+
+# Build provenance
+SPDX_INCLUDE_BUILD_VARIABLES ??= "0"
+SPDX_INCLUDE_BITBAKE_PARENT_BUILD ??= "0"
+SPDX_INCLUDE_TIMESTAMPS ?= "0"
+
+# VEX control
+SPDX_INCLUDE_VEX ??= "current"
+
+# Identity and namespacing
+SPDX_UUID_NAMESPACE ??= "sbom.openembedded.org"
+SPDX_NAMESPACE_PREFIX ??= "http://spdx.org/spdxdocs"
+```
+
+Most of the new variables control build provenance features that are disabled by default because they make the output non-reproducible (build timestamps, variable dumps, and so on). The VEX variable, however, is on by default (set to `current`), which is a deliberate choice to make vulnerability information available out of the box.
+
+## SPDX 3.0 Task Flow
+
+**`spdx30_build_started_handler`** — A BitBake event handler (not a task) that fires at the beginning of the build. If `SPDX_INCLUDE_BITBAKE_PARENT_BUILD` is set, it creates a `Build` element representing the overall BitBake invocation and writes it to `bitbake.spdx.json` in the deploy directory. This is the parent build that individual recipe builds can reference.
+
+**`do_create_spdx`** — Similar in purpose to its SPDX 2.2 counterpart, but the output format and data model are very different. It creates an `ObjSet` (object set), a `software_Package` element for the recipe, a `Build` element representing the recipe's build, links source files as `hasInput` relationships on the `Build`, links produced packages as `hasOutput` relationships on the `Build`, adds license information using the `simpleLicensing` profile, and processes CVE data to create VEX relationship elements. The per-recipe data is written as individual JSON-LD files to the deploy directory.
+
+**`do_create_package_spdx`** — A new task (not present in SPDX 2.2) that creates SPDX data for each individual package, including file-level detail for packaged files with checksums.
+
+**`do_create_image_spdx` / `do_create_image_sbom`** — The image-level task merges all per-recipe JSON-LD documents into a single output file. The merging algorithm loads the image recipe's own SPDX data, then for each package included in the image loads its SPDX document and its recipe's SPDX document, merges all objects into a single object set deduplicating by SPDX ID, and serializes the merged object set as a single JSON-LD document. The result is a single `IMAGE-MACHINE.spdx.json` file in `tmp/deploy/images/MACHINE/`.
+
+## Build Provenance Features in SPDX 3.0
+
+**Build Variables** (`SPDX_INCLUDE_BUILD_VARIABLES = "1"`) — Captures every BitBake variable visible during the SPDX task and attaches it to the `Build` element. This is a lot of data, but it means you can determine exactly how a recipe was configured just from the SBOM.
+
+**Nested Builds** (`SPDX_INCLUDE_BITBAKE_PARENT_BUILD = "1"`) — Creates a hierarchy of `Build` elements. The top-level `Build` represents the BitBake invocation, and each recipe's `Build` is linked to it via `ancestorOf`. This is particularly useful for tracking shared state (sstate): you can see which recipes were rebuilt in a given BitBake run versus pulled from cache.
+
+**Agent Tracking:**
+
+```bash
+SPDX_INVOKED_BY_name = "GitHub Actions"
+SPDX_INVOKED_BY_type = "software"
+SPDX_ON_BEHALF_OF_name = "Jane Developer"
+SPDX_ON_BEHALF_OF_type = "person"
+SPDX_ON_BEHALF_OF_id_email = "jane@example.com"
+```
+
+This records that your CI system ran the build on behalf of a specific person. The idea here is that GitHub Actions is the software agent that mechanically ran BitBake, but it was triggered by a pull request or tag made by a specific user.
+
+**Build Host Linking** (`SPDX_BUILD_HOST`) — If you have an SBOM for the host system you are building on, you can link it into the generated documents using the `hasHost` relationship. This gives you a deep supply chain that extends from the build environment itself down through your target image.
+
+**Package Supplier:**
+
+```bash
+SPDX_PACKAGE_SUPPLIER_name = "Acme Corporation"
+SPDX_PACKAGE_SUPPLIER_type = "organization"
+```
+
+All of these provenance features are disabled by default because they make the SPDX output non-reproducible. In a CI/CD environment where reproducibility of the SPDX metadata is less important than traceability, you would enable the ones relevant to your compliance requirements.
+
+## The Supporting Libraries
+
+**`oe/spdx30.py`** — Auto-generated SPDX 3.0 Python bindings, roughly 6,000 lines of code. These are generated by the `shacl2code` tool from the official SPDX 3.0 RDF model. This means the Yocto implementation automatically stays in sync with the SPDX specification, and other tools can use these same bindings to manipulate SPDX 3.0 documents. `shacl2code` can also generate C++ and Go bindings and is available as a standalone project.
+
+**`oe/sbom30.py`** — SPDX 3.0 SBOM assembly utilities, including the document merging algorithm and convenience methods for creating VEX relationships.
+
+## The Size Question
+
+An SPDX 3.0 document for a standard Styhead distro can be around 250 MB compressed and roughly 2 GB uncompressed. This is partly because the single-document approach includes everything, and partly because the JSON-LD format with its `@context` declarations and full IRIs is more verbose than SPDX 2.2's simpler JSON.
+
+It is also easy to generate SPDX 3.0 output that is larger than the deliverable it describes, because compilers are very good at compressing source code into small binaries. The SBOM that describes a 50 MB root filesystem might be 500 MB of structured data.
+
+If you are generating a new SBOM with every release build (as you should be for traceability and compliance), you need a storage strategy for these large files.
+
+## Switching Between Versions
+
+```bash
+# For SPDX 2.2 (if 3.0 is default)
+INHERIT:remove = "create-spdx"
+INHERIT += "create-spdx-2.2"
+
+# For SPDX 3.0 (if 2.2 is default)
+INHERIT:remove = "create-spdx"
+INHERIT += "create-spdx-3.0"
+```
+
+SPDX 2.2 has broader tooling support today, while SPDX 3.0 offers richer data and a more future-proof format. There are no plans to backport SPDX 3.0 support to older Yocto releases. The implementation is invasive and touches many parts of the build system.
+
+---
+
+**Series: How Yocto Generates SBOMs Behind the Scenes**
+
+- Part 1: [How Yocto Generates SBOMs Behind the Scenes](/2026/05/05/yocto-sbom-deep-dive-introduction/)
+- Part 2: [A Deep Dive into Yocto's SPDX 2.2 Pipeline](/2026/05/12/yocto-spdx-2-2-pipeline/)
+- Part 3: SPDX 3.0 in Yocto: What Changed and Why It Matters _(this post)_
+- Part 4: [VEX in the SBOM: How Yocto Embeds Vulnerability Assessments](/2026/05/26/yocto-vex-spdx-3-0/)
+- Part 5: Yocto SBOM in Production: Configuration, Tooling, and What's Still Missing _(coming soon)_
diff --git a/content/posts/2026-05-26-yocto-vex-spdx-3-0.md b/content/posts/2026-05-26-yocto-vex-spdx-3-0.md
new file mode 100644
index 0000000..5a43c9c
--- /dev/null
+++ b/content/posts/2026-05-26-yocto-vex-spdx-3-0.md
@@ -0,0 +1,127 @@
+---
+title: "VEX in the SBOM: How Yocto Embeds Vulnerability Assessments with SPDX 3.0"
+description: "Part 4 of the Yocto SBOM series. How vulnerability information flows from CVE_STATUS recipe metadata into VEX relationship elements in the final SPDX 3.0 SBOM, and the kernel-specific tooling that cuts CVE false positives by 70-80%."
+author:
+  display_name: Joshua Watt
+categories:
+  - guide
+tags: [sbom, yocto, openembedded, spdx, spdx-3, vex, cve, vulnerability-management]
+keywords: [yocto vex, spdx 3 vex, yocto cve, cve_status yocto, kernel cve false positives, vex sbom embedded]
+tldr: "SPDX 3.0's security profile lets Yocto embed VEX assessments directly inside the SBOM. CVE data flows from CVE_STATUS recipe metadata, patch file scanning, and upstream version checks into VexFixedVulnAssessmentRelationship, VexAffectedVulnAssessmentRelationship, and VexNotAffectedVulnAssessmentRelationship elements. Kernel CVE noise is reduced 70-80% by cross-referencing the kernel CNA database with compiled source files."
+date: 2026-05-26
+slug: yocto-vex-spdx-3-0
+---
+
+VEX support is one of the most compelling reasons to adopt SPDX 3.0 for your Yocto builds. This post traces exactly how vulnerability information flows from recipe metadata into VEX statements in the final SBOM.
+
+This is part 4 of a 5-part series on how Yocto generates SBOMs. Earlier parts covered the [overall architecture](/2026/05/05/yocto-sbom-deep-dive-introduction/), the [SPDX 2.2 pipeline](/2026/05/12/yocto-spdx-2-2-pipeline/), and the [SPDX 3.0 implementation](/2026/05/19/yocto-spdx-3-0-overview/) that makes embedded VEX possible.
+
+## The CVE Infrastructure That Feeds VEX
+
+Before getting to the SPDX output, it helps to understand the CVE infrastructure that Yocto already maintains. The `cve-check` class and its associated tooling track CVEs using several key variables.
+
+**`CVE_PRODUCT`** — Maps a recipe to its identifier in the CVE database. Defaults to `BPN` but can be overridden per recipe (for example, `tiff.bb` sets `CVE_PRODUCT = "libtiff"`).
+
+**`CVE_VERSION`** — The version string used for CVE matching. Defaults to `PV`.
+
+**`CVE_STATUS`** — A per-CVE variable flag that records the status of individual CVEs for a recipe. Each flag entry encodes a status mapping, a detail string, and an optional description:
+
+```bash
+CVE_STATUS[CVE-2022-12345] = "not-applicable-config: Feature not enabled in our build"
+```
+
+The `cve-check` class also automatically detects CVEs that have been fixed in the upstream version being used, marking them as `fixed-version`. Additionally, patched CVEs are detected by scanning the recipe's patch files — BitBake looks for CVE identifiers in patch filenames and headers (using the `CVE:` tag in patch metadata) and the `get_patched_cves()` function collects these automatically.
+
+## How the SPDX 3.0 Class Processes CVE Data
+
+During `do_create_spdx`, the SPDX 3.0 class performs the following steps to generate VEX data.
+
+### Step 1: Check the VEX inclusion level
+
+The `SPDX_INCLUDE_VEX` variable controls how much CVE data to include:
+
+| Value | Behavior |
+| --------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `none` | Skip all VEX processing entirely. Useful if you do not care about vulnerability data in the SBOM and want faster builds. |
+| `current` _(default)_ | Only include VEX data for CVEs that are not already fixed by the upstream version. This is the recommended setting because it surfaces only the CVEs that are actually relevant to your build. |
+| `all` | Include every known CVE, including those already fixed upstream. This generates significantly more data, particularly for the Linux kernel, which has thousands of historical CVEs. |
+
+### Step 2: Collect patched CVEs
+
+The class calls `oe.cve_check.get_patched_cves(d)`, which scans the recipe's patch files for CVE references. Each patch file is checked for CVE identifiers in its filename and metadata. The result is a set of CVE IDs that have been addressed by patches applied in the recipe.
+
+### Step 3: Decode CVE status
+
+For each CVE, the class calls `oe.cve_check.decode_cve_status()` to extract the mapping (`Patched`, `Unpatched`, or `Ignored`), the detail string, and the description. For CVEs detected from patch files that do not have an explicit `CVE_STATUS` entry, the code falls back to a status of `Patched` with detail `fix-file-included`.
+
+### Step 4: Create SPDX Vulnerability and VEX elements
+
+For each CVE, the class creates a `security_Vulnerability` element with a unique SPDX ID based on the CVE identifier, and a VEX relationship element linking the vulnerability to the affected package:
+
+```python
+if status == "Patched":
+    pkg_objset.new_vex_patched_relationship([spdx_cve._id], [spdx_package])
+elif status == "Unpatched":
+    pkg_objset.new_vex_unpatched_relationship([spdx_cve._id], [spdx_package])
+elif status == "Ignored":
+    spdx_vex = pkg_objset.new_vex_ignored_relationship(
+        [spdx_cve._id], [spdx_package]
+    )
+```
+
+These correspond to the SPDX 3.0 security profile's `VexVulnAssessmentRelationship` subtypes:
+
+| Status | SPDX 3.0 Type | Meaning |
+| ----------- | ------------------------------------------ | ----------------------------------------------------------- |
+| `Patched` | `VexFixedVulnAssessmentRelationship` | The vulnerability has been patched in this package |
+| `Unpatched` | `VexAffectedVulnAssessmentRelationship` | The vulnerability is unpatched and this package is affected |
+| `Ignored` | `VexNotAffectedVulnAssessmentRelationship` | The vulnerability was evaluated and determined not to apply |
+
+Each VEX relationship carries the detail string and the human-readable description, giving downstream consumers the context they need to understand why a CVE has a particular status.
+
+## The Kernel: A Special Case for VEX
+
+The Linux kernel deserves special mention because it is by far the largest source of CVEs in any embedded Linux system. The kernel has its own CVE numbering authority (CNA), and the volume of CVEs is enormous.
+
+Yocto has a dedicated script, `improve_kernel_cve_report.py`, that enriches kernel CVE data using two techniques.
+
+It cross-references the Linux kernel CNA's vulnerability database (from `git.kernel.org`) to determine which CVEs affect specific kernel versions. And if SPDX source information is available (via `SPDX_INCLUDE_SOURCES` or `SPDX_INCLUDE_COMPILED_SOURCES`), it can check which source files were actually compiled into the kernel binary. A CVE that affects `drivers/mtd/nand/spi/core.c` is irrelevant if that file was never compiled due to kernel configuration. This technique alone can reduce kernel CVE false positives by 70–80%.
+
+To use this with the SPDX-based approach, you need to enable DWARF4 debug information in the kernel so BitBake can extract the list of compiled source files:
+
+```bash
+KERNEL_EXTRA_FEATURES:append = " features/debug/debug-kernel.scc"
+```
+
+The output of this script feeds back into the CVE status data, which in turn flows into the VEX elements in the SPDX 3.0 output. This creates a tight loop where kernel configuration directly influences the vulnerability assessment in the SBOM.
+
+## VEX in Practice: What Shows Up in the Output
+
+In the final SPDX 3.0 document, each CVE shows up as a `security_Vulnerability` element (carrying the CVE identifier and any external references) plus one of the VEX relationship subtypes covered above (`VexFixedVulnAssessmentRelationship`, `VexAffectedVulnAssessmentRelationship`, or `VexNotAffectedVulnAssessmentRelationship`) linking that vulnerability to the affected package. The relationship element also carries the detail string and human-readable description originally captured from `CVE_STATUS`.
+
+The practical result: any tool capable of reading SPDX 3.0 can extract a complete picture of which CVEs affect your image, which have been patched, and which have been assessed and dismissed, all from a single document.
+
+## Contrast: VEX in SPDX 2.2 vs. SPDX 3.0
+
+With SPDX 2.2, you get an SBOM that describes your software components, but vulnerability information must live elsewhere. You would typically run `cve-check` separately to produce a `cve-summary.json` file, and then correlate the two documents manually or with external tooling. There is no standard mechanism to embed VEX assessments in the SBOM itself.
+
+With SPDX 3.0, vulnerability assessments are first-class citizens in the SBOM. The security profile provides typed elements for vulnerabilities and VEX relationships, and the Yocto implementation populates these automatically from the same `CVE_STATUS` data that `cve-check` uses. The result is a single document that answers both "what is in my image?" and "which CVEs affect it, and what is their status?"
+
+For teams subject to regulatory requirements like the [EU Cyber Resilience Act](/compliance/eu-cra/), having integrated VEX data in the SBOM significantly simplifies compliance workflows.
+
+## A Note on Yocto 6.0 (Wrynose)
+
+Two changes are worth flagging for anyone tracking the current direction of the project:
+
+1. **`cve-check.bbclass` has been removed.** The `CVE_*` variables (`CVE_PRODUCT`, `CVE_VERSION`, `CVE_STATUS`, and friends) remain in the metadata and are still consumed by the SPDX class, so the recipe-level inputs described above continue to work as before.
+2. **CVE scanning has moved to [`sbom-cve-check`](https://sbom-cve-check.readthedocs.io/en/latest/sbom.html).** Instead of scanning the build tree, the project now runs CVE checks against the generated SBOM itself. The result is a workflow where the SBOM is the source of truth for vulnerability assessment.
+
+---
+
+**Series: How Yocto Generates SBOMs Behind the Scenes**
+
+- Part 1: [How Yocto Generates SBOMs Behind the Scenes](/2026/05/05/yocto-sbom-deep-dive-introduction/)
+- Part 2: [A Deep Dive into Yocto's SPDX 2.2 Pipeline](/2026/05/12/yocto-spdx-2-2-pipeline/)
+- Part 3: [SPDX 3.0 in Yocto: What Changed and Why It Matters](/2026/05/19/yocto-spdx-3-0-overview/)
+- Part 4: VEX in the SBOM: How Yocto Embeds Vulnerability Assessments _(this post)_
+- Part 5: Yocto SBOM in Production: Configuration, Tooling, and What's Still Missing _(coming soon)_
diff --git a/hugo.toml b/hugo.toml
index 4a6b6b5..6effe71 100644
--- a/hugo.toml
+++ b/hugo.toml
@@ -103,3 +103,9 @@ ignoreFiles = ["_config\\.yml$", "_config_development\\.yml$", "Gemfile", "Gemfi
 [build]
   [build.buildStats]
     enable = false
+
+# Hugo 0.162.0 tightened the default `security.allowContent` policy to disallow
+# `text/html` content files. The homepage lives in `content/_index.html`, so we
+# restore the prior permissive default.
+[security]
+  allowContent = ['.*']