Yousef/sync main to cohere#32

Open
yousef-cohere wants to merge 104 commits into cohere from yousef/sync-main-to-cohere

Conversation

@yousef-cohere

@yousef-cohere yousef-cohere commented May 21, 2026

This PR introduces BYOM (Bring Your Own Machine) e2e tests and enhances the e2e test workflows by adding support for peerpod-ctrl and webhook images.

  • Add a BYOM e2e tests workflow and include it in the e2e run all workflow
  • Add support for peerpod-ctrl and webhook images to the e2e test workflows (AWS, libvirt, and docker)
  • Update the podvm mkosi ubuntu workflow to build and push the BYOM e2e image
  • Update the peerpod-ctrl build and push workflow to support multiple architectures
  • Update the publish images on push and release workflows to use the new peerpod-ctrl build and push workflow
  • Update the lib codeql workflow to use a newer version
  • Update the citation file to reflect the new version and release date
  • Update the Makefile to include byom in BUILTIN_CLOUD_PROVIDERS
  • Remove the Azure nightly build from the README

Note

Medium Risk
Touches CI pipelines and e2e provisioning logic across AWS/libvirt/docker, plus changes metadata fetching (AWS IMDSv2) and TDX RTMR3 measurement; failures could break test coverage or boot/attestation paths but are mostly additive and guarded with fallbacks.

Overview
Adds BYOM e2e coverage by introducing a new callable e2e_byom.yaml, wiring it into e2e_run_all.yaml/e2e_on_pull.yaml, and extending existing AWS/libvirt/docker workflows to accept and pass through peerpod_ctrl_image and webhook_image overrides.

Refactors peerpod-ctrl image publishing to build per-architecture images + a multi-arch manifest (peerpod-ctrl_build_and_push.yaml + new peerpod-ctrl_build_and_push_all_arches.yaml) and updates push/release pipelines to use it; podvm_mkosi_ubuntu.yaml now also builds/outputs a BYOM e2e podvm container image.

Improves runtime/provisioning behavior: AWS userdata retrieval now prefers IMDSv2 token auth with IMDSv1 fallback (new tests), AWS e2e VPC provisioning chooses subnets in AZs that support the configured podvm_instance_type and registers AMIs with UEFI+TPM support, libvirt adds optional vCPU pinning via LIBVIRT_CPUSET, and PodVM systemd overrides move RTMR3 extension into a dedicated extend-rtmr3-initdata script.

Housekeeping: removes the Azure nightly build workflow and legacy Azure image build docs, bumps CodeQL action version, updates IBM SDK deps, updates Helm chart dependency version pinning, and updates CITATION.cff to 0.20.0.

Reviewed by Cursor Bugbot for commit 670cecd.

ANJANA-A-R-K and others added 30 commits April 17, 2026 10:24
Enhance the test suite for the agent-protocol-forwarder component by adding new test cases across 5 test functions. The improvements include comprehensive tests for Config.Setup() method covering all command-line flag combinations, TLS configuration scenarios (enabled/disabled/skip-verify), edge cases for config file loading (empty files, null values, extra fields, permission errors), host interface configuration, default value handling, and error scenarios (missing files, invalid JSON).

Signed-off-by: Anjana A R K <anjana.a.r.k1@ibm.com>
When a pod uses initdata (e.g. KBS tests with cc_kbc), both kata-agent
and confidential-data-hub start after process-user-data completes.
The startup order is:

  process-user-data -> AA -> AA socket -> CDH (via CDH.path) -> CDH socket
                                      |
                                       -> kata-agent (direct enable)

kata-agent.path already exists to gate kata-agent on CDH socket
appearance (PathExists=/run/confidential-containers/cdh.sock). However,
kata-agent.service also had WantedBy=multi-user.target in its [Install]
section, causing systemd to activate it directly at boot without waiting
for the path unit condition to be satisfied.

Fix: remove [Install]/WantedBy=multi-user.target from kata-agent.service
so that systemd can only activate it via kata-agent.path.

Signed-off-by: Pradipta Banerjee <pradipta.banerjee@gmail.com>
The preset file referenced 'attestation-protocol-forwarder.service'
which does not exist. The correct service name is
'agent-protocol-forwarder.service'.

This was a no-op in practice because agent-protocol-forwarder.service
has WantedBy=multi-user.target in its [Install] section, so systemd
enables it via the symlink in multi-user.target.wants/ regardless.
The stale preset name has been corrected.

Signed-off-by: Pradipta Banerjee <pradipta.banerjee@gmail.com>
Add agent-protocol-forwarder.path (watches /run/kata-containers/agent.sock)
and scratch-storage.path (watches /run/peerpod/scratch-space.marker) so
both services start only when their prerequisite exists, eliminating the
Restart=on-failure polling loop for APF.

Remove redundant time-based After= deps from AA, CDH, and kata-agent that
are already implied by the path chain. Keep After=process-user-data.service
on AA and CDH: process-user-data writes aa.toml before cdh.toml, so without
this guard CDH can start before cdh.toml is written and lose its KBS config.
Keep After=kata-agent.service on api-server-rest so CDH finishes plugin
init before api-server-rest connects to it.

Remove [Install]/WantedBy=multi-user.target from kata-agent.service so
systemd can only activate it via kata-agent.path. Update 30-coco.preset
and multi-user.target.wants to enable the path units instead of the
services directly.

Activation chain:
  process-user-data -> aa.toml -> AA -> AA.sock -> CDH -> CDH.sock
    -> kata-agent -> agent.sock -> APF -> setup-nat-for-imds
    -> api-server-rest (After=kata-agent)
  scratch-space.marker -> scratch-storage (kata-agent After= orders it)
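
The main activation chain above can be read as a dependency graph. The following is only an illustrative sketch: the node names are shortened labels from this commit message rather than exact unit names, the scratch-storage branch is left out, and systemd resolves the real ordering through path units and After= rather than an explicit sort.

```python
from graphlib import TopologicalSorter

# Edges taken from the activation chain in the commit message.
# Each key becomes ready only after every node in its set.
# Names are shortened labels, not exact systemd unit names.
ACTIVATION_DEPS = {
    "AA": {"process-user-data"},
    "CDH": {"AA"},
    "kata-agent": {"CDH"},
    "APF": {"kata-agent"},
    "setup-nat-for-imds": {"APF"},
    "api-server-rest": {"kata-agent"},
}


def activation_order(deps):
    """Return one valid start order for the given dependency map."""
    return list(TopologicalSorter(deps).static_order())


order = activation_order(ACTIVATION_DEPS)
print(" -> ".join(order))
```

Any order it prints starts with process-user-data and places kata-agent before both APF and api-server-rest, which is the property the path units are meant to guarantee.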

Signed-off-by: Pradipta Banerjee <pradipta.banerjee@gmail.com>
Bumps [github.com/moby/spdystream](https://github.com/moby/spdystream) from 0.5.0 to 0.5.1.
- [Release notes](https://github.com/moby/spdystream/releases)
- [Commits](moby/spdystream@v0.5.0...v0.5.1)

---
updated-dependencies:
- dependency-name: github.com/moby/spdystream
  dependency-version: 0.5.1
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
…tlptracehttp

Bumps [go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp](https://github.com/open-telemetry/opentelemetry-go) from 1.21.0 to 1.43.0.
- [Release notes](https://github.com/open-telemetry/opentelemetry-go/releases)
- [Changelog](https://github.com/open-telemetry/opentelemetry-go/blob/main/CHANGELOG.md)
- [Commits](open-telemetry/opentelemetry-go@v1.21.0...v1.43.0)

---
updated-dependencies:
- dependency-name: go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp
  dependency-version: 1.43.0
  dependency-type: indirect
...
---

Updated modules with `go mod tidy`

Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Bumps [aws-actions/configure-aws-credentials](https://github.com/aws-actions/configure-aws-credentials) from 6.0.0 to 6.1.0.
- [Release notes](https://github.com/aws-actions/configure-aws-credentials/releases)
- [Changelog](https://github.com/aws-actions/configure-aws-credentials/blob/main/CHANGELOG.md)
- [Commits](aws-actions/configure-aws-credentials@8df5847...ec61189)

---
updated-dependencies:
- dependency-name: aws-actions/configure-aws-credentials
  dependency-version: 6.1.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Bumps [docker/build-push-action](https://github.com/docker/build-push-action) from 7.0.0 to 7.1.0.
- [Release notes](https://github.com/docker/build-push-action/releases)
- [Commits](docker/build-push-action@d08e5c3...bcafcac)

---
updated-dependencies:
- dependency-name: docker/build-push-action
  dependency-version: 7.1.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Bumps [actions/upload-artifact](https://github.com/actions/upload-artifact) from 7.0.0 to 7.0.1.
- [Release notes](https://github.com/actions/upload-artifact/releases)
- [Commits](actions/upload-artifact@bbbca2d...043fb46)

---
updated-dependencies:
- dependency-name: actions/upload-artifact
  dependency-version: 7.0.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Bumps [oras-project/setup-oras](https://github.com/oras-project/setup-oras) from 1.2.4 to 2.0.0.
- [Release notes](https://github.com/oras-project/setup-oras/releases)
- [Commits](oras-project/setup-oras@22ce207...38de303)

---
updated-dependencies:
- dependency-name: oras-project/setup-oras
  dependency-version: 2.0.0
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>

* github.com:confidential-containers/cloud-api-adaptor:
  build(deps): bump aws-actions/configure-aws-credentials
…updates

Bumps the google-cloud group with 1 update in the /src/cloud-api-adaptor directory: [cloud.google.com/go/compute](https://github.com/googleapis/google-cloud-go).
Bumps the google-cloud group with 3 updates in the /src/cloud-providers directory: [cloud.google.com/go/compute](https://github.com/googleapis/google-cloud-go), [cloud.google.com/go/resourcemanager](https://github.com/googleapis/google-cloud-go) and [cloud.google.com/go/auth](https://github.com/googleapis/google-cloud-go).

Updates `cloud.google.com/go/compute` from 1.58.0 to 1.59.0
- [Release notes](https://github.com/googleapis/google-cloud-go/releases)
- [Changelog](https://github.com/googleapis/google-cloud-go/blob/main/CHANGES.md)
- [Commits](googleapis/google-cloud-go@compute/v1.58.0...compute/v1.59.0)

Updates `cloud.google.com/go/compute` from 1.58.0 to 1.59.0
- [Release notes](https://github.com/googleapis/google-cloud-go/releases)
- [Changelog](https://github.com/googleapis/google-cloud-go/blob/main/CHANGES.md)
- [Commits](googleapis/google-cloud-go@compute/v1.58.0...compute/v1.59.0)

Updates `cloud.google.com/go/resourcemanager` from 1.11.0 to 1.12.0
- [Release notes](https://github.com/googleapis/google-cloud-go/releases)
- [Changelog](https://github.com/googleapis/google-cloud-go/blob/main/documentai/CHANGES.md)
- [Commits](googleapis/google-cloud-go@iap/v1.11.0...iap/v1.12.0)

Updates `cloud.google.com/go/auth` from 0.18.2 to 0.20.0
- [Release notes](https://github.com/googleapis/google-cloud-go/releases)
- [Changelog](https://github.com/googleapis/google-cloud-go/blob/main/CHANGES.md)
- [Commits](googleapis/google-cloud-go@auth/v0.18.2...v0.20.0)

---
updated-dependencies:
- dependency-name: cloud.google.com/go/compute
  dependency-version: 1.59.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: google-cloud
- dependency-name: cloud.google.com/go/compute
  dependency-version: 1.59.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: google-cloud
- dependency-name: cloud.google.com/go/resourcemanager
  dependency-version: 1.12.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: google-cloud
- dependency-name: cloud.google.com/go/auth
  dependency-version: 0.20.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: google-cloud
...

Signed-off-by: dependabot[bot] <support@github.com>
The CI job for AWS has failed for a while now and we still don't know the
cause. Instead of disabling it completely, let's just ignore its status
because it is still worth running it (e.g. catch build/setup/infra issues).

Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
iptables-wrapper-installer.sh was removed in
kubernetes-sigs/iptables-wrappers#14,
so call the binary directly.

Assisted-by: IBM Bob
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
setup-go lacks endian awareness, so port the fix from
caa_build_and_push_per_arch.yaml to the standard CAA build workflow.
This lets us run the workflow on the ppc runner rather than under
emulation, which can be slow.

Drive-by fix of a zizmor warning.

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
The difference between the caa_build_and_push and
caa_build_and_push_per_arch is confusing. I hope to address
this in this PR, but let's start by renaming for better clarity.

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Add a helper script that can create one or more multi-arch manifest
images for our three supported architectures, given a registry and
a list of one or more tags.

Based on the release.sh script from kata-containers

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Currently we have multiple formats of image tags for dev images:
- <release>-dev-<arch> for released images
- dev-<sha>-<arch> for interim published images
- latest-<arch>-dev for daily e2e test images
- ci-pr<pr number>-dev (no arch) for the x86 only packer PR e2e test images
- ci-pr<pr number>-<arch>-dev for mkosi specific-arch PR e2e test images

This shows that we have multiple code paths doing the same task.
To reduce duplication and increase consistency, let's move them all
to the release format:
<tag>-dev-<arch>
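
As a small illustration of the unified format, a tag builder could look like the sketch below. The function name and the example inputs are hypothetical and not taken from the workflows in this repo.

```python
def dev_image_tag(tag: str, arch: str) -> str:
    """Build a dev image tag in the unified <tag>-dev-<arch> form."""
    if not tag or not arch:
        raise ValueError("tag and arch must be non-empty")
    return f"{tag}-dev-{arch}"


# Example inputs are illustrative only.
print(dev_image_tag("latest", "amd64"))
```

With a single builder like this, released, interim, daily, and PR images would differ only in the value passed as `tag`.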

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Rather than having separate logic and builds for the multi-arch image,
including the confusing upload/download of a tags file to drive things,
we can use the existing CAA build workflow to build the images for each
arch and the new multi-arch publish step to create the multi-arch
manifest.

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Nothing should be calling the `image-with-arch` make target anymore
now that the process is unified, so remove it and the code that only it
called to simplify and remove duplication.

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Similar to kata-containers/kata-containers@a04df4f
disable the provenance and sbom for single arch images,
so that we can use them in a multi-arch image later

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
For what appear to be legacy reasons, AWS is using the non-arch-specific
CAA image build. Given that it is now the same as the x86 e2e image,
switch to that to reduce duplication.

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
The non-debug images have been published as debug by mistake.
inputs.debug is a boolean type and it should be handled as boolean.

Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Re-usable workflows inherit the workflow name from the caller, so
extend the concurrency group to make it unique to the instance.

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
We want to switch from the Fedora-based mkosi podvm image
to the Ubuntu-based one for stability and GPU support, so add
e2e tests to see how it's working.

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Bump components to match the kata 3.29.0 release

Assisted-by: IBM Bob
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Update to pick up the 3.29.0 release

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
`GetDiagnosticData` has gone into the agent protos,
so we need to add it in our implementations of this too.

Assisted-by: IBM Bob
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
- Enhanced test coverage for interceptor_test.go and forwarder_test.go with comprehensive subtests
- The subtests cover mount errors, namespace handling, DNS configuration, TLS setup, daemon lifecycle, and edge cases.

Signed-off-by: Anjana A R K <anjana.a.r.k1@ibm.com>
Bumps [github/codeql-action](https://github.com/github/codeql-action) from 4.35.1 to 4.35.2.
- [Release notes](https://github.com/github/codeql-action/releases)
- [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md)
- [Commits](github/codeql-action@c10b806...95e58e9)

---
updated-dependencies:
- dependency-name: github/codeql-action
  dependency-version: 4.35.2
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
wainersm and others added 11 commits May 19, 2026 15:40
Split the monolithic peerpod-ctrl_image.yaml into two workflows
following the same pattern as CAA:
- peerpod-ctrl_build_and_push.yaml: per-arch callable workflow
- peerpod-ctrl_build_and_push_all_arches.yaml: orchestrator + manifest

Assisted-by: Claude
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
After the kustomize-to-helm migration, nightly e2e tests deploy
peerpod-ctrl and webhook as sub-charts but use stale `latest` images
from quay.io. Since peerpod-ctrl shares the cloud-providers Go module
with caa, version skew can mask breaking changes.

Build both images alongside caa/podvm in the nightly pipeline and wire
their references through to the helm install via PEERPOD_CTRL_IMAGE and
WEBHOOK_IMAGE environment variables.

The subchart image override logic is centralized in Helm.ConfigureSubchartImages()
to avoid duplicating code across all provider implementations.
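
A rough sketch of what a centralized override step could look like is below. It is not the Helm.ConfigureSubchartImages() implementation: the function names and the Helm value keys (`peerpod-ctrl.image.*`, `webhook.image.*`) are assumptions for illustration; only the two environment variable names come from this commit message.

```python
import os


def split_image_ref(ref):
    """Split 'repo[:tag]' into (repo, tag); tag is None when absent.

    A ':' only counts as a tag separator if it comes after the last '/',
    so registry ports such as 'localhost:5000/img' stay in the repo part.
    Digest references (repo@sha256:...) are not handled in this sketch.
    """
    slash = ref.rfind("/")
    colon = ref.rfind(":")
    if colon > slash:
        return ref[:colon], ref[colon + 1:]
    return ref, None


def subchart_image_overrides(env=None):
    """Map image env vars to Helm value overrides (keys are hypothetical)."""
    env = os.environ if env is None else env
    # Hypothetical mapping of env var -> Helm value prefix.
    prefixes = {
        "PEERPOD_CTRL_IMAGE": "peerpod-ctrl.image",
        "WEBHOOK_IMAGE": "webhook.image",
    }
    overrides = {}
    for var, prefix in prefixes.items():
        ref = env.get(var)
        if not ref:
            continue  # unset or empty: keep the chart default
        repo, tag = split_image_ref(ref)
        overrides[f"{prefix}.repository"] = repo
        if tag is not None:
            overrides[f"{prefix}.tag"] = tag
    return overrides
```

Keeping the parsing in one place means each provider only has to pass the resulting overrides to its helm install call.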

Assisted-by: Claude
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
The packer-based podvm images for azure haven't been actively maintained
for years and are most likely insecure and not functional. To avoid
confusion we remove the packer build-infra from the repo.

Signed-off-by: Magnus Kulke <magnuskulke@microsoft.com>
The terraform-based ci-infra folder was only used by azure, so it makes
sense to move it to the azure subfolder for discoverability.

Signed-off-by: Magnus Kulke <magnuskulke@microsoft.com>
AWSUserDataProvider currently issues a bare GET to
/latest/user-data (IMDSv1). On EC2 instances configured with
MetadataOptions.HttpTokens=required (IMDSv2-only), this returns 401
and peer-pod boot fails before kata-agent starts. Many enterprise
AWS organizations enforce IMDSv2 via an SCP, so the bare IMDSv1 path
is unusable in those environments, and AWS now defaults new EC2
launches to v2-only as well.

This change adds an IMDSv2 token PUT before the user-data GET and
attaches the returned session token via the X-aws-ec2-metadata-token
header. If the token PUT fails for any reason (network policy
blocks PUT, legacy IMDSv1-only configuration, transient error), the
helper returns nil headers so the existing IMDSv1 GET path is
preserved as a fallback. No existing flow regresses.
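
The token-then-fallback flow described above can be sketched as follows. This is a Python illustration, not the Go code in AWSUserDataProvider: the function names and the injectable `request` transport are assumptions made so the logic can be exercised without a real metadata service. The endpoint paths and header names are the standard IMDS ones.

```python
import urllib.request

IMDS_BASE = "http://169.254.169.254/latest"
TOKEN_HEADER = "X-aws-ec2-metadata-token"
TTL_HEADER = "X-aws-ec2-metadata-token-ttl-seconds"


def http_request(method, url, headers, timeout=2.0):
    """Default transport: return (status, body); raises on HTTP errors."""
    req = urllib.request.Request(url, method=method, headers=headers)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status, resp.read()


def imds_v2_headers(request=http_request, ttl_seconds=21600):
    """Try to get an IMDSv2 token; return None to signal IMDSv1 fallback."""
    try:
        status, body = request(
            "PUT", f"{IMDS_BASE}/api/token", {TTL_HEADER: str(ttl_seconds)}
        )
    except Exception:
        return None  # PUT blocked, v1-only setup, or transient error
    if status != 200 or not body:
        return None
    return {TOKEN_HEADER: body.decode().strip()}


def fetch_user_data(request=http_request):
    """GET user-data, attaching the session token when one was obtained."""
    headers = imds_v2_headers(request) or {}
    status, body = request("GET", f"{IMDS_BASE}/user-data", headers)
    if status != 200:
        raise RuntimeError(f"user-data request failed with HTTP {status}")
    return body
```

Returning `None` instead of raising from the token helper is what keeps the bare IMDSv1 GET working when the PUT is refused.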

Validated on an AWS organization with SCP-enforced
HttpTokens=required: peer-pod boots end to end, /dev/sev-guest
reachable inside the SEV-SNP guest, attestation report retrieved.

Unit tests cover the success path, non-200 token response, the
returned headers shape, and an end-to-end fallback where the token
endpoint 401s and the user-data GET succeeds without the token
header.

Ref: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/configuring-instance-metadata-options.html
Signed-off-by: Abhishek Agrawal <abhishek.yours4@gmail.com>
Ensure mkosi debug builds use the active image-tree repart layout and include NVMe modules so GCP boot disks are visible during initrd root setup.

Co-authored-by: Cursor <cursoragent@cursor.com>
On AWS SEV-SNP enabled EC2 instances (the launch shape used for peer-pod
PodVMs when CpuOptions.AmdSevSnp=enabled is set), the mkosi-built
Fedora-based PodVM image consistently fails to boot. In my environment
the kernel panics during early init -- the EC2 serial console shows the
vmgenid driver loading and subsequent platform / device probing wedging
before systemd ever starts. Issue confidential-containers#2691 reports a related manifestation
on a different setup where the guest exits with
Client.InstanceInitiatedShutdown ~12 seconds into kernel init. The same
image boots normally on non-SEV-SNP instances; switching only the
AmdSevSnp CpuOption flips the build between booting and not booting.

Adding initcall_blacklist=vmgenid_plaform_driver_init to the kernel
command line skips the vmgenid platform driver registration, the early
init path no longer wedges, and the PodVM boots end to end. (The
'plaform' typo is intentional -- it matches the kernel symbol name in
drivers/virt/vmgenid.c, which declares
'static struct platform_driver vmgenid_plaform_driver'.)

This is the same one-line workaround posted by @bpradipt in issue confidential-containers#2691
on 2025-12-23, originally bundled into the Fedora 43 upgrade in confidential-containers#2729
but dropped when that PR was superseded by the slimmer confidential-containers#2914. The fix
was never re-extracted into its own PR. Per @mkulke's suggestion on
confidential-containers#2729 ("a PR would be better (if the fix is urgent), since there is
probably more work required for s390x"), this commit carries only the
kernel-cli line in isolation.

vmgenid is x86-only, so initcall_blacklist of vmgenid_plaform_driver_init
is a no-op on s390x kernels and should not require any s390x-specific
handling. Placed in the base mkosi.conf rather than a per-arch conf to
match the location of the patch originally posted on confidential-containers#2691.

Validated end to end on AWS SEV-SNP peer-pods (c6a.2xlarge in us-east-2):
PodVM boots, SEV-SNP is enabled (CpuOptions.AmdSevSnp=enabled), kata-agent
comes up and serves the agent endpoint over the vxlan tunnel from the
worker, and an AMD-signed SEV-SNP attestation report is retrieved from
inside the guest. The same kernel issue is currently being hit
cross-distro: siderolabs/talos#13118 reports the equivalent boot
hang on Talos 1.12 on AWS, suggesting the underlying kernel fix
(https://www.spinics.net/lists/kernel/msg5976520.html) has not yet
propagated to widely-used distro kernels.

Refs:
  - confidential-containers#2691 (boot-hang issue, closed with workaround in comment)
  - confidential-containers#2729 (original F43 upgrade PR that bundled this fix, closed)
  - confidential-containers#2914 (slim F43 bump that superseded confidential-containers#2729, did not include this)
  - siderolabs/talos#13118 (same kernel issue on Talos / AWS SNP)
Signed-off-by: Abhishek Agrawal <abhishek.yours4@gmail.com>
@yousef-cohere yousef-cohere changed the base branch from main to cohere May 21, 2026 19:24
Comment thread .github/workflows/e2e_run_all.yaml
Comment thread .github/workflows/caa_build_and_push.yaml
yousef-cohere and others added 3 commits May 21, 2026 15:44
Co-authored-by: Cursor <cursoragent@cursor.com>
…ohere

Co-authored-by: Cursor <cursoragent@cursor.com>

# Conflicts:
#	src/cloud-api-adaptor/install/charts/peerpods/values.yaml
Upstream merge brought in new transitive dependencies (otelgrpc,
otelhttp, google.golang.org/api, genproto, grpc) and bumped several
cloud.google.com/go modules. Tidy go.mod/go.sum to match.

Co-authored-by: Cursor <cursoragent@cursor.com>
Comment thread .github/workflows/podvm_publish.yaml
@yousef-cohere yousef-cohere changed the base branch from cohere to alhassankhedr/sync-main-to-cohere May 21, 2026 20:19
Base automatically changed from alhassankhedr/sync-main-to-cohere to cohere May 22, 2026 03:07
@alhassankhedr-cohere alhassankhedr-cohere dismissed their stale review May 22, 2026 03:07

The base branch was changed.

Comment thread src/peerpod-ctrl/chart/values.yaml
Co-authored-by: Cursor <cursoragent@cursor.com>

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 2 total unresolved issues (including 1 from previous review).

Reviewed by Cursor Bugbot for commit 59340ba.

Comment thread src/cloud-api-adaptor/podvm-mkosi/mkosi.images/system/mkosi.conf
Co-authored-by: Cursor <cursoragent@cursor.com>
The previous ExecStartPost pipeline in process-user-data.service relied on
/bin/sh + sed + printf to convert the hex digest in /run/peerpod/initdata.digest
into 48 raw bytes before piping into the rtmr3 sysfs node. On Ubuntu /bin/sh
is dash, whose printf does not honour \xHH escapes; combined with systemd
unit-file parsing collapsing \\\\x to \\x and GNU sed dropping the backslash
before the 'x' in its replacement, the bytes that actually reached the
kernel were the ASCII string "xHHxHHxHH..." for the first 16 hex pairs of
the digest. RTMR3 was therefore extended with garbage that bound to only
~128 bits of the digest and was sensitive to upstream dash/sed/systemd
parsing changes, so SHA384(0 || digest) predictions never matched.

Move the hex-to-binary conversion into a small dedicated bash helper at
/usr/local/bin/extend-rtmr3-initdata, which uses bash parameter expansion
and printf to write the raw 48 bytes. Verified on a live podvm peer pod
that the resulting RTMR3 matches SHA384(prev || digest) bit-for-bit.
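
To make the difference concrete, the sketch below contrasts the intended 48 raw bytes with the payload the old pipeline is described as producing, and computes the predicted register value SHA384(prev || digest). It is a Python illustration rather than the bash helper itself, and the function names are hypothetical.

```python
import hashlib

DIGEST_LEN = 48  # SHA-384 digest size in bytes


def hex_digest_to_bytes(hex_digest: str) -> bytes:
    """Convert a 96-character hex digest into its 48 raw bytes."""
    raw = bytes.fromhex(hex_digest.strip())
    if len(raw) != DIGEST_LEN:
        raise ValueError(f"expected {DIGEST_LEN} bytes, got {len(raw)}")
    return raw


def expected_rtmr(prev: bytes, hex_digest: str) -> bytes:
    """Predict the register after one extend: SHA384(prev || digest)."""
    return hashlib.sha384(prev + hex_digest_to_bytes(hex_digest)).digest()


def buggy_payload(hex_digest: str) -> bytes:
    """Rebuild the broken output as described in the commit message:
    ASCII 'xHH' for each of the first 16 hex pairs (48 bytes total)."""
    pairs = [hex_digest[i:i + 2] for i in range(0, 32, 2)]
    return "".join("x" + pair for pair in pairs).encode("ascii")
```

Both payloads are 48 bytes long, which is why the write itself succeeded, but the buggy one only depends on the first 16 bytes (128 bits) of the digest.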

Co-authored-by: Cursor <cursoragent@cursor.com>