ci: build NVIDIA GPU confidential Kata UVM image from source #31

Open
alhassankhedr-cohere wants to merge 8 commits into cohere from alhassankhedr/build-kata-uvm-cohere

@alhassankhedr-cohere alhassankhedr-cohere commented May 14, 2026

Summary

Add a workflow that builds the kata-containers nvidia-gpu-confidential UVM image with our cohere-fork guest-components (attestation-agent + api-server-rest) baked in at compile time, instead of post-hoc patching the stock NVIDIA image with losetup + veritysetup format (which is what fortress/scratch/oci-b200/k8s/06-patch-uvm.sh has been doing).

Mechanics

  1. Check out kata-containers @ inputs.kata_ref (default 3.30.0).
  2. Rewrite versions.yaml: point externals.coco-guest-components.url and .version at cohere-ai/guest-components @ <gc_ref> (resolved to a SHA via git ls-remote so the build is reproducible).
  3. make rootfs-image-nvidia-gpu-confidential-tarball — kata's existing build infrastructure clones our fork into the coco-guest-components builder container, statically builds AA + api-server-rest + CDH, and nvidia_rootfs.sh::coco_guest_components() copies them into the rootfs at /usr/local/bin/. From there the standard rootfs assembly + dm-verity formatting runs unchanged.
  4. Extract the .image and root_hash file from the tarball, surface dm-verity params (root_hash, salt, data_blocks, block sizes) and the image sha256 as a measurements.json layer.
  5. zstd -19 the .image, push to GHCR via oras as a 3-layer artifact with annotations covering build provenance + verity params.
  6. SLSA build provenance attestation.

Output

ghcr.io/cohere-ai/cloud-api-adaptor/kata-uvm-nvidia-gpu-confidential:<tag>

where <tag> is cohere-latest for branch pushes, kata-${kata_ref}-gc-${gc_ref} for workflow_dispatch, or the literal tag for kata-uvm-v* tag pushes.
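
The three tag rules above can be sketched as a small shell function. This is a minimal sketch, not the workflow's actual step: the names `flatten` and `uvm_tag` are hypothetical, the rule that flattens `/` to `-` is an assumption, and it covers only the three cases listed in this description.

```shell
# Flatten '/' so branch-style refs form a valid OCI tag (assumed sanitising rule).
flatten() { printf '%s' "$1" | tr '/' '-'; }

# Hypothetical helper: map the triggering event to the published tag.
uvm_tag() {
  # $1 = event name, $2 = git ref, $3 = kata_ref, $4 = gc_ref
  case "$1:$2" in
    workflow_dispatch:*)
      printf 'kata-%s-gc-%s\n' "$(flatten "$3")" "$(flatten "$4")" ;;
    push:refs/tags/kata-uvm-v*)
      # Tag pushes publish under the literal tag name.
      printf '%s\n' "${2#refs/tags/}" ;;
    push:refs/heads/*)
      printf 'cohere-latest\n' ;;
    *)
      echo "unsupported trigger: $1 $2" >&2
      return 1 ;;
  esac
}
```

For example, `uvm_tag workflow_dispatch refs/heads/cohere 3.30.0 cohere` prints `kata-3.30.0-gc-cohere`.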

Companion host-side script

fortress/scratch/oci-b200/k8s/08-install-uvm.sh (already pushed) consumes this artifact: it pulls, verifies sha256 against measurements.json, and rewrites kernel_verity_params in the kata config from the manifest. No host veritysetup needed. This replaces 06-patch-uvm.sh for production.

Why this is strictly better than the patch path

| | 06-patch-uvm.sh (patch) | build-kata-uvm-cohere + 08-install-uvm.sh (build) |
|---|---|---|
| Where binaries come from | Built locally, copied in via losetup + bind mount | Built from source by the same kata machinery as upstream, baked into the rootfs |
| dm-verity correctness | Re-run veritysetup format on the host, write the new hash chain in place, then update kata config | Verity computed once at build time, surfaced as annotations; host just pastes the values |
| Reproducibility | Depends on what's in /tmp/attestation-agent on the host | Pinned to kata_ref + gc_ref (resolved to SHA at build time) |
| Provenance | None | SLSA attestation + GHCR-signed |
| Host failure modes | "ContainerCreating context deadline exceeded" if config gets out of sync; needs 05-fix-uvm-verity.sh recovery | sha256 mismatch fails the install before touching anything live |

Pre-merge cleanup required

The on.push.branches block temporarily includes alhassankhedr/build-kata-uvm-cohere so the workflow can auto-trigger on this PR's push for end-to-end validation. Please remove that branch entry before merging — only cohere should remain in the final form.

Test plan

  • actionlint clean
  • YAML parses
  • Workflow run on this PR completes successfully and produces a valid OCI artifact at ghcr.io/cohere-ai/cloud-api-adaptor/kata-uvm-nvidia-gpu-confidential:<tag>
  • 08-install-uvm.sh on a B200 host pulls the artifact, swaps the UVM image, and a Kata TDX pod boots
  • 07-test-ita-attestation.sh against that pod returns nvgpu_overall: true

Note

Medium Risk
Introduces a new release workflow that builds and publishes a bootable Kata UVM + paired kernel to GHCR; failures or misconfiguration could block releases or publish incompatible images, but it does not change runtime application code.

Overview
Adds a new GitHub Actions workflow (.github/workflows/build-kata-uvm-cohere.yaml) to build the Kata nvidia-gpu-confidential UVM image from source with Cohere’s guest-components baked in, instead of patching a stock image.

The workflow checks out kata-containers, rewrites versions.yaml to point at a pinned guest-components ref (optionally overriding the NVIDIA driver and NVAT SDK pins), builds rootfs-image-nvidia-gpu-confidential-tarball, and extracts the .image + dm-verity root_hash.

It then stages the paired kernel artifacts, generates a measurements.json (verity params + image/kernel hashes), compresses the image, pushes everything to GHCR as an OCI artifact via oras with annotations, and publishes SLSA provenance attestation.

Reviewed by Cursor Bugbot for commit a55b3ac.


@github-advanced-security github-advanced-security AI left a comment

zizmor found more than 20 potential problems in the proposed changes. Check the Files changed tab for more details.

# TEMPORARY: enable end-to-end validation of the workflow on the
# feature branch before merge. Remove this entry as part of the
# final review; only `cohere` should remain.
- "alhassankhedr/build-kata-uvm-cohere"

Temporary feature branch trigger left in workflow

Medium Severity

The branch alhassankhedr/build-kata-uvm-cohere is included in the on.push.branches trigger for end-to-end validation during the PR. The PR description explicitly states "Please remove that branch entry before merging — only cohere should remain." If merged as-is, every push to that feature branch will trigger a full ~3-hour UVM build and push an artifact to GHCR.

Reviewed by Cursor Bugbot for commit 2d80833.

set -eux
git clone --depth 1 --branch "${{ needs.meta.outputs.kata_ref }}" \
"${{ inputs.kata_repo || 'https://github.com/kata-containers/kata-containers.git' }}" \
/tmp/kata

SHA input for kata_ref breaks shallow clone

Medium Severity

The kata_ref input is documented as accepting "tag, branch, or SHA", but git clone --depth 1 --branch only accepts branch and tag names — not commit SHAs. Providing a SHA causes git to error with "Remote branch not found in upstream origin", failing the entire build.
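
One possible fix, sketched under assumptions: fetch the ref explicitly instead of passing it to `--branch`, since `git fetch` accepts a commit SHA where `git clone --branch` does not. `shallow_checkout` is a hypothetical helper name, and fetching a bare SHA depends on the server permitting it (GitHub does).

```shell
# Hypothetical helper: shallow-fetch a tag, branch, or commit SHA into a new directory.
shallow_checkout() {
  # $1 = repository URL, $2 = tag/branch/SHA, $3 = destination directory
  git init -q "$3" &&
  git -C "$3" remote add origin "$1" &&
  # --depth 1 keeps the transfer as small as the original shallow clone.
  git -C "$3" fetch -q --depth 1 origin "$2" &&
  # FETCH_HEAD points at whatever was fetched, regardless of ref type.
  git -C "$3" checkout -q --detach FETCH_HEAD
}
```

In the workflow this would stand in for the `git clone --depth 1 --branch ...` call, for example `shallow_checkout "$KATA_REPO" "$KATA_REF" /tmp/kata`, where both variables are assumed to be supplied through the step's `env:` block.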

Reviewed by Cursor Bugbot for commit 2d80833.

jq -n \
--arg kata_ref "${{ needs.meta.outputs.kata_ref }}" \
--arg gc_repo "${{ needs.meta.outputs.gc_repo }}" \
--arg gc_ref "${{ needs.meta.outputs.gc_ref }}" \

Resolved guest-components SHA missing from provenance metadata

Medium Severity

The gc_ref is resolved to an immutable SHA via git ls-remote in the "Override coco-guest-components" step for build reproducibility, but this resolved SHA is never written to $GITHUB_OUTPUT. Both measurements.json and the OCI annotations record the original mutable ref (e.g., cohere) instead of the pinned SHA, undermining the reproducibility goal stated in the code comments.
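
A minimal sketch of how the resolved SHA could be carried forward, assuming the step already has the `git ls-remote` output in hand. The helper names and the `gc_sha` output key are hypothetical. Annotated tags would need the peeled `^{}` entry, which this sketch does not handle.

```shell
# Hypothetical helper: pick the SHA for a branch or tag out of `git ls-remote` output.
resolve_sha() {
  # $1 = ls-remote output ("<sha><TAB><refname>" per line), $2 = requested ref
  printf '%s\n' "$1" | awk -v ref="$2" '
    $2 == "refs/heads/" ref || $2 == "refs/tags/" ref { print $1; exit }'
}

# Hypothetical helper: append the requested ref and its resolved SHA to $GITHUB_OUTPUT.
record_gc_sha() {
  # $1 = requested ref, $2 = resolved SHA
  case "$2" in
    "" | *[!0-9a-f]*) echo "not a commit SHA: '$2'" >&2; return 1 ;;
  esac
  [ "${#2}" -eq 40 ] || { echo "unexpected SHA length: $2" >&2; return 1; }
  {
    printf 'gc_ref=%s\n' "$1"
    printf 'gc_sha=%s\n' "$2"   # assumed output name
  } >> "$GITHUB_OUTPUT"
}
```

Later steps could then read the immutable value from the step outputs when writing measurements.json and the OCI annotations, rather than the mutable ref.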

Reviewed by Cursor Bugbot for commit 2d80833.

# TEMPORARY: enable end-to-end validation of the workflow on the
# feature branch before merge. Remove this entry as part of the
# final review; only `cohere` should remain.
- "alhassankhedr/build-kata-uvm-cohere"

🔒 Agentic Security Review
Severity: HIGH

The workflow still triggers on a temporary feature branch and branch pushes can publish to the stable cohere-latest image tag. That expands artifact publish authority beyond the intended protected branch and makes it possible to overwrite a trusted mutable tag from non-release branch pushes.

Impact: if an attacker can push to this branch, they can publish a malicious UVM image under cohere-latest, creating a supply-chain compromise risk for downstream consumers.

Companion to fortress's k8s/ script reordering. The CI workflow's
header comments and the GHCR step summary now point at the new
numbering (05-install-uvm.sh) and reference the legacy patch path
(08-patch-uvm.sh) by its new number too.

kata 3.30+ nvidia_chroot.sh runs with set -u and only assigns
driver_version when NVIDIA_GPU_STACK contains a literal `driver=<ver>`
component. Without it the rootfs-assembly stage dies at the very last
step with `driver_version: unbound variable`, after the runner has
already done ~45 minutes of work (agent, busybox, pause-image,
coco-guest-components, kernel-nvidia-gpu).

This is exactly how run 25877534335 failed. Fix: derive the driver
pin from .assets.nvidia.driver.version in kata's own versions.yaml
and prepend driver=<ver> to NVIDIA_GPU_STACK in the build step.
Auto-tracks kata_ref.
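
The prepend step described above might look roughly like this. It is a sketch under assumptions: `with_driver_pin` is a hypothetical name, and NVIDIA_GPU_STACK is assumed to be a comma-separated list of components.

```shell
# Hypothetical helper: make sure the stack string carries a `driver=<ver>` component.
with_driver_pin() {
  # $1 = current NVIDIA_GPU_STACK (may be empty), $2 = driver version to pin
  case ",$1," in
    *,driver=*) printf '%s\n' "$1"; return 0 ;;   # already pinned, leave untouched
  esac
  [ -n "$2" ] || { echo "no driver version available" >&2; return 1; }
  if [ -n "$1" ]; then
    printf 'driver=%s,%s\n' "$2" "$1"
  else
    printf 'driver=%s\n' "$2"
  fi
}
```

The version argument would come from the versions.yaml key named in this commit message, so an existing explicit pin is never overwritten.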

alhassankhedr-cohere force-pushed the alhassankhedr/build-kata-uvm-cohere branch from 426dcd9 to fccd4d7 on May 14, 2026 19:24

git make curl ca-certificates jq python3 python3-pip
# Ensure yq is present (kata's build scripts rely on it).
if ! command -v yq >/dev/null 2>&1; then
sudo curl -fsSL -o /usr/local/bin/yq \

🔒 Agentic Security Review
Severity: HIGH

This workflow downloads executable binaries (yq and oras) from GitHub release URLs and runs them without any integrity verification (checksum/signature/provenance check).

Impact: a compromised upstream release asset could execute arbitrary code in a job with packages: write and id-token: write, enabling malicious image publication or credential/token abuse.
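
A common mitigation is to pin a SHA-256 digest for each downloaded tool and refuse to install on mismatch. The sketch below assumes `sha256sum` is available on the runner. `verify_sha256` is a hypothetical helper, and the digests themselves would have to be taken from the upstream release pages.

```shell
# Hypothetical helper: compare a file's SHA-256 against a pinned digest.
verify_sha256() {
  # $1 = file path, $2 = expected lowercase hex digest
  local actual
  actual="$(sha256sum "$1" | awk '{print $1}')"
  if [ "$actual" != "$2" ]; then
    echo "sha256 mismatch for $1: got '$actual'" >&2
    return 1
  fi
}

# Illustrative use (YQ_URL and YQ_SHA256 are placeholders, not real values):
#   curl -fsSL -o /tmp/yq "$YQ_URL"
#   verify_sha256 /tmp/yq "$YQ_SHA256" && sudo install -m 0755 /tmp/yq /usr/local/bin/yq
```

Downloading to a temporary path first means a rejected binary never lands on `PATH`.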

- name: Checkout kata-containers @ ${{ needs.meta.outputs.kata_ref }}
run: |
set -eux
git clone --depth 1 --branch "${{ needs.meta.outputs.kata_ref }}" \

🔒 Agentic Security Review
Severity: HIGH

workflow_dispatch inputs are injected directly into a shell run script via GitHub expression interpolation at clone time (${{ needs.meta.outputs.kata_ref }} and ${{ inputs.kata_repo }}). Because expressions are rendered before shell parsing, crafted values can trigger command substitution and execute arbitrary commands in this privileged job.

Impact: a caller who can dispatch this workflow can run attacker-controlled commands and publish malicious artifacts with trusted GHCR + provenance permissions (packages: write, id-token: write).
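
The usual remedy is to bind each input to an environment variable in the step's `env:` block and reference only the variable inside the script, so the shell never parses the input as code. A validation pass on top of that is sketched below. `validate_ref` and its allowed character set are assumptions, not part of the workflow.

```shell
# Hypothetical helper: accept only refs made of a conservative character set.
validate_ref() {
  case "$1" in
    "" | -* | *[!A-Za-z0-9._/-]*)
      # Empty, option-like, or containing shell metacharacters: refuse.
      echo "refusing suspicious ref: $1" >&2
      return 1 ;;
  esac
}

# Illustrative use, with KATA_REF supplied via `env:` rather than `${{ ... }}`:
#   validate_ref "$KATA_REF" || exit 1
```

Rejecting a leading `-` also keeps a crafted value from being read as a git option.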

Two bugs in the "Extract" / "Surface verity params" steps that together
caused the workflow to abort with `jq: error ... Expected JSON value
(while parsing '')` (exit 5) and would also have produced a junk
artifact even if jq had not failed:

1. root_hash.txt is a SINGLE comma-separated line written by kata's
   osbuilder, not five newline-separated key=value lines. The previous
   `awk -F'=' '/^salt=/ {print $2}'` parsers therefore returned empty
   strings for everything except root_hash (and even that came out with
   a trailing ",salt"), which crashed jq's `tonumber` on data_blocks.
   Replace with a single comma-split + case dispatch, plus regex
   sanity checks so a future format change fails loudly.

2. The .img inside the tarball is a symlink to the versioned .image
   alongside it. The previous `mv` only relocated the symlink, then
   `rm -rf opt/` deleted the underlying file. Resolve via `readlink -f`
   and `cp` the real file before tearing the directory down. Add a
   minimum-size assertion (>100 MiB) so a dangling symlink is caught
   immediately rather than producing measurements.json with bytes=57.

Also tightens the shell with `set -euxo pipefail` and a `jq -e .`
validation of the produced measurements.json.
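
Both fixes can be sketched as follows. This is an illustration under assumptions rather than the committed code: the function names are hypothetical, the key names `data_block_size` and `hash_block_size` are assumed, and the minimum size is a parameter so the helper can be exercised on small files.

```shell
is_hex()  { case "$1" in "" | *[!0-9a-f]*) return 1 ;; esac; }
is_uint() { case "$1" in "" | *[!0-9]*)    return 1 ;; esac; }

# Hypothetical helper: split one "k=v,k=v,..." line and dispatch on the key.
# Results land in the global variables named after the keys.
parse_verity_line() {
  local rest="$1" kv
  root_hash= salt= data_blocks= data_block_size= hash_block_size=
  while [ -n "$rest" ]; do
    kv="${rest%%,*}"
    case "$rest" in *,*) rest="${rest#*,}" ;; *) rest= ;; esac
    case "$kv" in
      root_hash=*)       root_hash="${kv#*=}" ;;
      salt=*)            salt="${kv#*=}" ;;
      data_blocks=*)     data_blocks="${kv#*=}" ;;
      data_block_size=*) data_block_size="${kv#*=}" ;;   # assumed key name
      hash_block_size=*) hash_block_size="${kv#*=}" ;;   # assumed key name
    esac
  done
  # Fail loudly if a future format change leaves a field empty or malformed.
  is_hex "$root_hash" && is_hex "$salt" && is_uint "$data_blocks"
}

# Hypothetical helper: copy the real file behind a symlink, then check its size.
stage_image() {
  # $1 = path that may be a symlink, $2 = destination, $3 = minimum size in bytes
  local real size
  real="$(readlink -f "$1")" || return 1
  [ -f "$real" ] || { echo "dangling image link: $1" >&2; return 1; }
  cp "$real" "$2" || return 1
  size="$(wc -c < "$2")"
  [ "$size" -ge "$3" ] || { echo "image too small: $size bytes" >&2; return 1; }
}
```

With the 100 MiB floor from this commit, the third argument would be 104857600.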
…iver

Kata 3.30.0 pins driver=595.58.03 in versions.yaml, but on 8x B200 OCI
hosts that driver hits a fabric-probe race where RmGpuFabricProbe times
out and fail-stops GPU init. The fix landed in 595.71.05 (which is also
the version present in the working mkosi-built images).

This adds an optional workflow_dispatch input `kata_nvidia_driver_ver`.
When set (e.g. to 595.71.05), the build:

- Rewrites .externals.nvidia.driver.version in kata's versions.yaml
  before the rootfs build, so the pin flows through to both
  open-gpu-kernel-modules (cloned from the GitHub tag) and the
  nvidia-driver-pinning-<ver> apt package.
- Surfaces the override in the OCI tag (kata-...-drv-<ver>), the
  com.cohere.kata-uvm.nvidia-driver annotation, measurements.json's
  new nvidia_driver.version field, and the job summary.

When unset, the build behaves exactly as before. measurements.json
always reflects the *actually baked-in* driver (read from the
post-rewrite versions.yaml) rather than the requested input, so it
stays truthful when the override is empty.

Mirrors the same mechanic in fortress/scratch/oci-b200/k8s/04-build-uvm-locally.sh.

--annotation "org.opencontainers.image.created=$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
--annotation "com.cohere.caa.commit=${GITHUB_SHA}" \
--annotation "com.cohere.kata.ref=${{ needs.meta.outputs.kata_ref }}" \
--annotation "com.cohere.guest-components.repo=${{ needs.meta.outputs.gc_repo }}" \

🔒 Agentic Security Review
Severity: HIGH

workflow_dispatch inputs are interpolated directly into this shell run script via GitHub expressions. If a caller provides a value containing shell substitution syntax (for example $(...)) in gc_repo or gc_ref, it is rendered into the script before execution and can execute attacker-controlled commands.

Impact: a user able to dispatch this workflow can run arbitrary commands in a job with packages: write and id-token: write, enabling malicious image publication and provenance abuse.

The plain `cohere` branch of guest-components has a `count == 1` guard
in `nvidia-attester::detect_platform()` that silently disables the
attester on multi-GPU systems. Multi-GPU pods on 8x B200 boot fine but
`/aa/additional_evidence` returns empty, which looks like a build issue
but is actually the userspace attester refusing to register.

Upstream main has a complete rewrite of nvidia-attester on top of the
NVAT SDK (no `count == 1` check). PR #9 in cohere-ai/guest-components
syncs that rewrite into our fork. Until PR #9 merges into `cohere`,
default `gc_ref` to `alhassankhedr/sync-main-to-cohere` so kata UVM
builds out of this workflow have a working multi-GPU attester.

Switch back to `cohere` once PR #9 is merged.

Adds a `kata_nvat_ver` workflow_dispatch input (default 2026.03.02) that
rewrites `.externals.nvidia.nvat.{version,url,desc}` in kata's
versions.yaml before the rootfs build.

Why this matters: kata's
tools/packaging/static-build/coco-guest-components/build.sh forwards
NVAT_VERSION from versions.yaml to the GC builder Dockerfile. The
Dockerfile gates the entire libnvat clone+cmake+install behind
`if [ -n "${NVAT_VERSION}" ]`, and upstream kata 3.30.0 ships *without*
that key set. Net effect on the cohere fork's UVM:

* libnvat is never built into the GC builder image.
* build-static-coco-guest-components.sh's second AA build pass — the
  one that compiles `attestation-agent` with `nvidia-attester` against
  /usr/local/lib/libnvat.so and installs the result as
  /usr/local/bin/attestation-agent-nv — silently no-ops because the
  required system lib is missing.
* The rootfs ends up with only the standard, non-NVIDIA AA. Symbol
  fingerprint of the installed UVM confirms it: zero `nvmlDeviceGetCount`,
  zero `nv_attestation_sdk`, zero `libnvat`.
* `/aa/additional_evidence` returns empty on multi-GPU pods regardless
  of which guest-components branch we baked. ITA appraisal can never
  see `nvgpu_overall: true`.

Pins 2026.03.02 to match the version the podvm-mkosi side already
builds against (NVAT_TAG in cloud-api-adaptor's
Dockerfile.podvm_binaries.ubuntu).

Tag, measurements.json, and OCI annotations all surface the pin so the
binding is inspectable from the registry (`-nvat-<ver>` tag suffix,
`nvat_sdk.version` field, `com.cohere.kata-uvm.nvat-sdk` annotation).

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 4 total unresolved issues (including 3 from previous reviews).

Reviewed by Cursor Bugbot for commit d5e0166.

GC_REPO: ${{ inputs.gc_repo || 'https://github.com/cohere-ai/guest-components.git' }}
GC_REF: ${{ inputs.gc_ref || 'alhassankhedr/sync-main-to-cohere' }}
DRIVER_VER: ${{ inputs.kata_nvidia_driver_ver || '' }}
NVAT_VER: ${{ inputs.kata_nvat_ver || '2026.03.02' }}

Empty kata_nvat_ver input silently overridden by fallback

Medium Severity

The input description for kata_nvat_ver says "Set "" to leave nvat unpinned" but the || operator on line 142 (${{ inputs.kata_nvat_ver || '2026.03.02' }}) treats empty strings as falsy, so an explicit "" gets replaced with '2026.03.02'. This makes it impossible to disable the NVAT SDK pin via workflow_dispatch, contradicting the documented behavior.
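
One way to keep an explicit empty value meaningful is to choose the fallback by event type rather than by truthiness. The sketch below assumes the event name and raw input reach the script as plain strings. `nvat_pin` is a hypothetical helper, and it relies on the input's declared default being filled in when a dispatch leaves the field untouched.

```shell
# Hypothetical helper: on workflow_dispatch the input wins even when empty;
# on every other event the default applies.
nvat_pin() {
  # $1 = event name, $2 = raw input value, $3 = default pin
  if [ "$1" = "workflow_dispatch" ]; then
    printf '%s\n' "$2"
  else
    printf '%s\n' "$3"
  fi
}
```

With this shape, `nvat_pin workflow_dispatch '' 2026.03.02` yields an empty pin, while `nvat_pin push '' 2026.03.02` yields `2026.03.02`.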

Reviewed by Cursor Bugbot for commit d5e0166.

kata's kernel-nvidia-gpu build emits a fresh random
certs/signing_key.pem per invocation; the NVIDIA modules baked into
kata-static-kernel-nvidia-gpu-modules.tar.zst (and therefore into the
rootfs) are signed against THAT key. If the host launches our UVM
against a kernel from a different build (e.g. the kata-deploy-bundled
one), every NVIDIA .ko is rejected at first modprobe with "Loading of
unsigned module is rejected", NVRC panics in src/execute.rs:24:9, the
guest powers down, and pods sit in Pending forever. Verified
end-to-end on the B200 host on 2026-05-15 (README "Bug F").

The host-side fix lives in fortress's 05-install-uvm.sh, which
atomically installs both the rootfs symlink and the kernel binary. For
that to work, the OCI artifact has to ship the kernel. Mirror the
local build pipeline (04-build-uvm-locally.sh) here:

  * Force a clean kernel + modules + rootfs rebuild whenever
    kata_nvidia_driver_ver is overridden, so kata's make can't reuse a
    cached kernel-nvidia-gpu builddir whose embedded signing key
    doesn't match the new modules tarball.

  * After "Build rootfs", stage the locally-built vmlinuz (+ vmlinux,
    System.map, config) into /tmp/uvm-out alongside the rootfs and
    write kernel.basename as a single source of truth for the install
    side.

  * Add a defensive signing-key sanity check that extracts the SKID
    from kernel-nvidia-gpu/builddir/.../certs/signing_key.x509 and
    confirms it appears in the trailing PKCS#7 signature of nvidia.ko.
    Fails the build if the modules tarball is signed by a different
    key than the kernel embeds.

  * Extend measurements.json with .kernel.{filename,sha256} so
    05-install-uvm.sh can validate the kernel post-pull.

  * Push the kernel files (vmlinuz/vmlinux/System.map/config and
    kernel.basename) into the OCI artifact with media type
    application/vnd.cohere.kata-uvm.kernel+octet-stream, and surface
    the kernel-basename + kernel-sha256 as OCI annotations.

After this, the UVM artifact is self-contained: pulling and installing
it places a kernel and rootfs that share a signing key, so guest
modprobe of nvidia.ko / nvidia-uvm.ko / nvidia-modeset.ko / nvidia-drm.ko
/ nvidia-peermem.ko succeeds and NVRC boots cleanly.