docs(evidence): add H100 GKE COS and GB200 EKS Ubuntu training attestations #1368

Merged
mchmarny merged 2 commits into main from feat/nkx-12124-recipe-evidence on Jun 16, 2026
Conversation

@atif1996

Contributor

Summary

Adds signed evidence pointers for two training recipes validated on real hardware:

  • h100-gke-cos-training: GKE H100 (europe-west4), GPU Operator v26.x, GKE COS
  • gb200-eks-ubuntu-training: EKS GB200 (us-east-1), Ubuntu 24.04, kernel 6.14

Both bundles are signed via Sigstore keyless signing and published to ghcr.io/atif1996/aicr-evidence.

Validation coverage

  • Deployment phase: operator-health, expected-resources, gpu-operator-version, nvidia-smi
  • Conformance phase: dra-support, gang-scheduling, accelerator-metrics, ai-service-metrics, pod-autoscaling, cluster-autoscaling, gpu-operator-health, platform-health
  • Performance phase (H100): nccl-all-reduce-bw; (GB200): nccl-all-reduce-bw-net + nccl-all-reduce-bw-nvls

Rekor log entries

Relates to #1354

@atif1996 atif1996 requested a review from a team as a code owner June 15, 2026 17:12
@atif1996 atif1996 changed the title from "feat(evidence): add H100 GKE COS and GB200 EKS Ubuntu training attestations" to "docs(evidence): add H100 GKE COS and GB200 EKS Ubuntu training attestations" Jun 15, 2026
@coderabbitai

coderabbitai Bot commented Jun 15, 2026


No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: 2d086987-ee68-46cd-85c3-58072b203755

📥 Commits

Reviewing files that changed from the base of the PR and between 93f77a0 and bb15024.

📒 Files selected for processing (2)
  • recipes/evidence/gb200-eks-ubuntu-training.yaml
  • recipes/evidence/h100-gke-cos-training.yaml

📝 Walkthrough

Two new YAML attestation manifests are added under recipes/evidence/: one for gb200-eks-ubuntu-training and one for h100-gke-cos-training. Each file uses schemaVersion: 1.0.0, declares the associated recipe name, and contains a single attestations entry. Each entry records an attestedAt timestamp, a bundle with an OCI image reference and sha256 digest and a predicateType URL, and signer metadata including identity, issuer, and rekorLogIndex.
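The shape described in the walkthrough can be sketched as a small set of Python dataclasses. This is only an illustration under stated assumptions: the authoritative contract lives in pkg/evidence/attestation/types.go, which is not shown in this thread, so the class names, the `image` field name, and the `validate_pointer` helper are hypothetical, and every value in `example` except the recipe name and registry path is a placeholder.

```python
from dataclasses import dataclass, field
from typing import List

HEX_DIGITS = set("0123456789abcdef")


@dataclass
class Bundle:
    image: str           # OCI image reference (field name is an assumption)
    digest: str          # expected form: "sha256:<64 hex chars>"
    predicate_type: str  # predicateType URL


@dataclass
class Signer:
    identity: str
    issuer: str
    rekor_log_index: int


@dataclass
class Attestation:
    attested_at: str  # timestamp string, kept opaque in this sketch
    bundle: Bundle
    signer: Signer


@dataclass
class EvidencePointer:
    schema_version: str
    recipe: str
    attestations: List[Attestation] = field(default_factory=list)


def validate_pointer(pointer: EvidencePointer) -> List[str]:
    """Return a list of shape problems; an empty list means none were found."""
    problems = []
    if pointer.schema_version != "1.0.0":
        problems.append("unexpected schemaVersion: " + pointer.schema_version)
    if not pointer.recipe:
        problems.append("recipe name is empty")
    if not pointer.attestations:
        problems.append("no attestations entries")
    for i, att in enumerate(pointer.attestations):
        algo, _, hex_part = att.bundle.digest.partition(":")
        if algo != "sha256" or len(hex_part) != 64 or not set(hex_part) <= HEX_DIGITS:
            problems.append(f"attestations[{i}]: digest is not a sha256 digest")
        if att.signer.rekor_log_index < 0:
            problems.append(f"attestations[{i}]: rekorLogIndex is negative")
    return problems


# Placeholder values only; none of these are taken from the real pointer files.
example = EvidencePointer(
    schema_version="1.0.0",
    recipe="gb200-eks-ubuntu-training",
    attestations=[
        Attestation(
            attested_at="1970-01-01T00:00:00Z",
            bundle=Bundle(
                image="ghcr.io/atif1996/aicr-evidence",
                digest="sha256:" + "0" * 64,
                predicate_type="https://example.com/predicate/v1",
            ),
            signer=Signer(
                identity="signer@example.com",
                issuer="https://issuer.example.com",
                rekor_log_index=0,
            ),
        )
    ],
)

print(validate_pointer(example))
```

The check is deliberately shallow: it looks only at the structure of a pointer and does not verify the Sigstore signature or the Rekor entry.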

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly and concisely summarizes the main change: adding evidence attestation manifests for two training recipes (H100 GKE COS and GB200 EKS Ubuntu). |
| Description check | ✅ Passed | The description is directly related to the changeset, providing clear context about the evidence manifests, validation details, signing approach, and Rekor log references. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@github-actions

github-actions Bot commented Jun 15, 2026

Contributor

Recipe evidence check

Affected leaf overlays: 2

| Recipe | Pointer | Verify | Digest match |
| --- | --- | --- | --- |
| gb200-eks-ubuntu-training | ✅ present | ✅ passed | ⚠️ stale (9aeea19f5b75… vs current 07a8ff1e9d4b…) |
| h100-gke-cos-training | ✅ present | ✅ passed | ⚠️ stale (b82cade51fd2… vs current 064f27fa945d…) |

How to refresh evidence

Run on a cluster matching the recipe's criteria:

aicr snapshot -o snapshot.yaml
aicr validate \
  -r recipes/overlays/<slug>.yaml \
  -s snapshot.yaml \
  --emit-attestation ./out \
  --push ghcr.io/<your-fork>/aicr-evidence
cp ./out/pointer.yaml recipes/evidence/<slug>.yaml

This gate is warning-only and never blocks merge. See ADR-007 for the trust model.

@github-actions

github-actions Bot commented Jun 15, 2026

Contributor

Coverage Report ✅

| Metric | Value |
| --- | --- |
| Coverage | 77.1% |
| Threshold | 75% |
| Status | Pass |
No Go source files changed in this PR.

@atif1996 atif1996 self-assigned this Jun 15, 2026
…ations

Adds signed evidence pointers for:
- h100-gke-cos-training: validated on GKE H100 (europe-west4), GPU Operator v26.x
- gb200-eks-ubuntu-training: validated on EKS GB200 (us-east-1), Ubuntu 24.04, kernel 6.14

Both bundles signed via Sigstore keyless (Rekor log indices 1826548328, 1826568564).

Relates to #1354
@atif1996 atif1996 force-pushed the feat/nkx-12124-recipe-evidence branch from 93f77a0 to bb15024 June 15, 2026 19:59

@mchmarny mchmarny left a comment

Member

Pointer files match the schema 1.0 contract in pkg/evidence/attestation/types.go cleanly. Two non-blocking notes: (1) The evidence gate flags both recipe digests as stale: the material-slice canonicalizer sees a recipe change since capture, so this evidence attests to an older recipe than the one being merged. Consider regenerating against current main, or confirm the drift is acceptable. (2) The bundles live in a personal GHCR namespace, which is fine under ADR-007's signer-identity trust model; just confirm the package is public so verification stays reproducible. Nothing blocks merge.
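The stale-digest warning in note (1) can be illustrated with a minimal sketch of a digest comparison. It assumes the gate hashes a canonical serialization of the recipe's material fields and compares that hash with the one recorded at capture time. The real material-slice canonicalizer is not shown in this thread, so `canonical_digest`, `is_stale`, and the sample recipe fields are hypothetical stand-ins, with sorted-key JSON substituted for whatever canonical form the project actually uses.

```python
import hashlib
import json


def canonical_digest(material: dict) -> str:
    """Hash a dict deterministically using sorted keys and compact separators."""
    payload = json.dumps(material, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def is_stale(recorded_digest: str, current_material: dict) -> bool:
    """Evidence is stale when the recipe no longer hashes to the recorded digest."""
    return recorded_digest != canonical_digest(current_material)


# Hypothetical material slice as it looked when the evidence was captured.
captured = {
    "recipe": "h100-gke-cos-training",
    "components": {"gpu-operator": "v26.x"},
}
recorded = canonical_digest(captured)

# Same content with a different key order: canonicalization hides the difference.
reordered = {
    "components": {"gpu-operator": "v26.x"},
    "recipe": "h100-gke-cos-training",
}

# A material change after capture (the new value is a made-up placeholder).
changed = {
    "recipe": "h100-gke-cos-training",
    "components": {"gpu-operator": "hypothetical-newer-version"},
}

print(is_stale(recorded, reordered))
print(is_stale(recorded, changed))
```

Under this model, a stale result means only that the recipe changed after capture. It says nothing about whether the signature on the bundle is valid, which is why the Verify column can pass while Digest match warns.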

Comment thread recipes/evidence/gb200-eks-ubuntu-training.yaml
@mchmarny mchmarny merged commit 9bd2268 into main Jun 16, 2026
137 checks passed
@mchmarny mchmarny deleted the feat/nkx-12124-recipe-evidence branch June 16, 2026 12:49