Summary
We are experiencing a recurring firmware measurement mismatch at index 9 on H100 GPUs running in Confidential VMs (Intel TDX) on GCP. The issue occurs after VM reboots (both manual and GCP-initiated) and can only be resolved by performing a full stop + start of the VM instance from the GCP console.
This issue was previously discussed in #90, where @steven-bellock suggested opening a dedicated issue.
Environment
a3-highgpu-1g (Spot instance)
--confidential-compute-type=TDX
GH100 A01 GSP BROM
580.126.09 (open kernel module, nvidia-driver-580-server-open)
96.00.CF.00.01
6.17.0-1008-gcp (Ubuntu 24.04)
1.1.1.1770245582-1
ON (Production mode, not DevTools)
TERMINATE
/etc/modprobe.d/nvidia-lkca.conf
Behavior
Timeline of occurrences
measres: fail, index 9, result_code: 12
result_code: 0
measres: fail, index 9, result_code: 12
Reproducible pattern
After a fresh stop + start from GCP console → attestation passes consistently.
After a reboot (either sudo reboot, automatic GCP reboot, or host maintenance) → attestation fails with measurement mismatch at index 9.
Subsequent reboots do not fix the issue. Only a full stop + start resolves it.
We have a daily auto start/stop schedule on our Spot instances. The issue appears to be triggered when the VM reboots without a full hardware deallocation/reallocation cycle.
Failing attestation output
Command:
nvattest attest --device gpu --verifier local --format json --nonce <64-char-hex-nonce>
Result:
{
  "result_code": 12,
  "result_message": "Overall Attestation Result is False"
}
Claims (relevant fields)
{
  "hwmodel": "GH100 A01 GSP BROM",
  "measres": "fail",
  "oemid": "5703",
  "secboot": null,
  "dbgstat": null,
  "x-nvidia-gpu-driver-version": "580.126.09",
  "x-nvidia-gpu-vbios-version": "96.00.CF.00.01",
  "x-nvidia-gpu-attestation-report-signature-verified": true,
  "x-nvidia-gpu-attestation-report-nonce-match": true,
  "x-nvidia-gpu-driver-rim-version-match": true,
  "x-nvidia-gpu-driver-rim-signature-verified": true,
  "x-nvidia-gpu-vbios-rim-version-match": true,
  "x-nvidia-gpu-vbios-rim-signature-verified": true,
  "x-nvidia-gpu-arch-check": true,
  "x-nvidia-gpu-attestation-report-cert-chain-fwid-match": true,
  "x-nvidia-mismatch-measurement-records": [
    {
      "index": 9,
      "measurementSource": "Firmware",
      "goldenSize": 48,
      "goldenValue": "4b3ed0f834d10fef95e61615edc5b4e98ec78cff39323993b3218f0cd62507978cf64e4487520bc7e560fde71ea0fc75",
      "runtimeSize": 48,
      "runtimeValue": "c80a9b62ce0d41184bb1ad0f6334d9400a2d2514ef92003b1c043410f91b7309144325a3e01c58b8bd6e198f5dda3b9b"
    }
  ]
}
Key observations
Only index 9 fails. All other verification checks pass (signature, cert chain, RIM, nonce, driver/VBIOS version match).
The goldenValue and runtimeValue are identical across occurrences (same values on Feb 19 and Mar 2), suggesting a deterministic mismatch rather than random corruption.
secboot is null and dbgstat is null (known issue per NVIDIA internal issue 5916701, as mentioned by @steven-bellock in #90, "Measurement mismatch in idx 9").
good OCSP status.
Configuration verification
We followed the GCP Confidential VM with GPU guide and verified all steps:
Questions
Firmware) correspond to? Is there documentation on the semantics of each measurement index?
Related issues
goldenValue/runtimeValue, same index 9)
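For anyone trying to confirm the same pattern on their own instances, the key observations above (only index 9 mismatching, identical goldenValue/runtimeValue across occurrences) can be checked mechanically. The following is a minimal sketch in Python, not part of nvattest: the function names are hypothetical, and it assumes the claims of each failing run have been saved as a JSON object shaped like the "Claims (relevant fields)" excerpt in this issue. It makes no assumption about how nvattest nests those claims in its full output.

```python
import json

# Claims excerpt copied from this issue (only the fields the helpers need).
SAMPLE_CLAIMS = json.loads("""
{
  "measres": "fail",
  "x-nvidia-mismatch-measurement-records": [
    {
      "index": 9,
      "measurementSource": "Firmware",
      "goldenSize": 48,
      "goldenValue": "4b3ed0f834d10fef95e61615edc5b4e98ec78cff39323993b3218f0cd62507978cf64e4487520bc7e560fde71ea0fc75",
      "runtimeSize": 48,
      "runtimeValue": "c80a9b62ce0d41184bb1ad0f6334d9400a2d2514ef92003b1c043410f91b7309144325a3e01c58b8bd6e198f5dda3b9b"
    }
  ]
}
""")


def mismatch_records(claims: dict) -> list:
    """Return the mismatch records of one claims object (empty list if absent)."""
    return claims.get("x-nvidia-mismatch-measurement-records") or []


def mismatch_indices(claims: dict) -> list:
    """Sorted measurement indices that failed in this run."""
    return sorted(r["index"] for r in mismatch_records(claims))


def sizes_consistent(record: dict) -> bool:
    """Check that each hex value decodes to the byte length the record declares."""
    golden = bytes.fromhex(record["goldenValue"])
    runtime = bytes.fromhex(record["runtimeValue"])
    return (len(golden) == record["goldenSize"]
            and len(runtime) == record["runtimeSize"])


def same_mismatch(claims_a: dict, claims_b: dict) -> bool:
    """True if both runs failed on the same indices with identical values."""
    def fingerprint(claims):
        return {(r["index"], r["goldenValue"], r["runtimeValue"])
                for r in mismatch_records(claims)}
    fp_a, fp_b = fingerprint(claims_a), fingerprint(claims_b)
    # Two passing runs (no records) are not a "same mismatch".
    return bool(fp_a) and fp_a == fp_b


if __name__ == "__main__":
    print("failing indices:", mismatch_indices(SAMPLE_CLAIMS))
    for rec in mismatch_records(SAMPLE_CLAIMS):
        print("index", rec["index"], "sizes consistent:", sizes_consistent(rec))
```

Saving the claims after each failing reboot and passing two of them to same_mismatch would show whether the mismatch is deterministic, as the Feb 19 and Mar 2 occurrences suggest.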