Cross-Host Shared Key Demo: Observations & Discrepancies

What We Built

Three CVM instances sharing one Ethereum signing key, deployed across two platforms:

| CVM | Platform | Guest Image | KMS | instanceId | signerAddress |
|---|---|---|---|---|---|
| shared-key-demo-1 | Phala Cloud (prod5) | dstack-0.5.4 | kms.dstack-base-prod5.phala.network (v0.5.4) | a67d32cf... | 0x307c8457... |
| shared-key-demo-2 | Phala Cloud (prod5) | dstack-0.5.4 | kms.dstack-base-prod5.phala.network (v0.5.4) | c8678d67... | 0x307c8457... |
| shared-key-demo-selfhost | hosted.dstack.info | dstack-0.5.7 | kms.dstack-base-prod5.phala.network (v0.5.4, remote) | cb7b6f8c... | 0x307c8457... |

All three derive identical keys and produce byte-identical signatures for the same message.

  • On-chain identity: AppAuth contract 0xCAa3a03a6e34E78e71d669E41961Ca8c95FD1611 on Base
  • Deployer: 0x5A370b73385085091de23E0fD21B54F2724EAD8D
  • KMS contract: 0x2f83172A49584C017F2B256F0FB2Dca14126Ba9C on Base
  • Key path: getKey("/shared-demo", "ethereum")

Issue 1: Compose Hash Divergence

The docker_compose_file content is identical across all three CVMs (docker_compose_hash: 089a2789...). But the compose_hash (hash of the full manifest) differs for each deployment because each platform/deployment path injects different metadata fields.
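The effect can be reproduced in a few lines. This is a minimal sketch, assuming for illustration that a manifest hash is SHA-256 over sorted-key JSON; the actual serialization dstack uses for compose_hash may differ, and the compose text below is a placeholder rather than the real demo file. The metadata values are taken from the manifest comparison in this section.

```python
import hashlib
import json


def manifest_hash(manifest: dict) -> str:
    """SHA-256 over sorted-key JSON. Illustrative serialization only."""
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Placeholder compose content, shared by both manifests.
compose = "services:\n  app:\n    image: example/demo\n"

# Same compose file, different platform-injected metadata.
cli_manifest = {
    "docker_compose_file": compose,
    "name": "",
    "allowed_envs": ["DSTACK_AUTHORIZED_KEYS"],
}
api_manifest = {
    "docker_compose_file": compose,
    "name": "shared-key-demo-2",
    "allowed_envs": [],
}

# Hash of the compose file alone: identical for both deployments.
compose_only_equal = (
    hashlib.sha256(cli_manifest["docker_compose_file"].encode()).hexdigest()
    == hashlib.sha256(api_manifest["docker_compose_file"].encode()).hexdigest()
)
# Hash of the full manifest: diverges because of the injected fields.
full_hashes_equal = manifest_hash(cli_manifest) == manifest_hash(api_manifest)

print(compose_only_equal, full_hashes_equal)
```

Any field that enters the hashed manifest, even an empty name versus a populated one, is enough to change the result.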

Three different compose hashes for the same app:

| CVM | compose_hash | docker_compose_hash |
|---|---|---|
| demo-1 (Phala Cloud, CLI deploy) | 46e2aa51... | 089a2789... |
| demo-2 (Phala Cloud, API deploy) | f1a4f314... | 089a2789... |
| selfhost (hosted.dstack.info) | d24f2633... | 089a2789... |

Manifest field comparison (docker_compose_file and pre_launch_script excluded):

| Field | demo-1 (CLI) | demo-2 (API) | selfhost |
|---|---|---|---|
| allowed_envs | ["DSTACK_AUTHORIZED_KEYS"] | [] | [] |
| features | ["kms","tproxy-net"] | ["kms"] | (absent) |
| gateway_enabled | true | true | false (intentional) |
| kms_enabled | true | true | true |
| local_key_provider_enabled | false | false | false |
| manifest_version | 2 | 2 | 2 |
| name | "" | "shared-key-demo-2" | "" |
| no_instance_id | false | false | false |
| pre_launch_script | 12328 chars | 12328 chars | (absent) |
| public_logs | true | true | true |
| public_sysinfo | true | true | true |
| public_tcbinfo | true | true | (absent) |
| runner | "docker-compose" | "docker-compose" | "docker-compose" |
| secure_time | false | false | false |
| storage_fs | "zfs" | "zfs" | (absent) |
| tproxy_enabled | true | false | (absent) |
| key_provider_id | (absent) | (absent) | "" |

Key observations:

  1. demo-1 vs demo-2 differ even on Phala Cloud. The CLI deploy path (phala deploy) and the API deploy path (POST /cvms/provision + POST /cvms) produce different manifests. Differences: name (empty vs populated), allowed_envs, features list, tproxy_enabled.

  2. Phala Cloud injects fields the user didn't specify: pre_launch_script (12KB bash script for SSH, Docker cleanup, etc.), storage_fs: "zfs", public_tcbinfo: true, features: ["kms","tproxy-net"], allowed_envs: ["DSTACK_AUTHORIZED_KEYS"].

  3. Self-hosted VMM passes through the manifest mostly as-is, only adding key_provider_id: "". No pre_launch_script, no storage_fs, no features.

  4. Each unique compose_hash must be whitelisted on-chain on the AppAuth contract. For our 3 deployments, that took 3 on-chain transactions: the contract deployment, which auto-whitelisted the first hash, plus 2 explicit addComposeHash() calls.
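A comparison like the one in the table can be automated with a small helper. The sketch below is illustrative: diff_manifests and the ABSENT marker are hypothetical names, and the two dictionaries contain only a subset of the fields, copied from the table above.

```python
ABSENT = "(absent)"  # marker for a field missing from one manifest


def diff_manifests(a: dict, b: dict,
                   ignore=("docker_compose_file", "pre_launch_script")) -> dict:
    """Return {field: (a_value, b_value)} for every field whose value differs."""
    diffs = {}
    for field in sorted(set(a) | set(b)):
        if field in ignore:
            continue
        left, right = a.get(field, ABSENT), b.get(field, ABSENT)
        if left != right:
            diffs[field] = (left, right)
    return diffs


# Subset of the demo-1 (CLI) and demo-2 (API) manifests from the table.
demo1 = {
    "allowed_envs": ["DSTACK_AUTHORIZED_KEYS"],
    "features": ["kms", "tproxy-net"],
    "kms_enabled": True,
    "name": "",
    "tproxy_enabled": True,
}
demo2 = {
    "allowed_envs": [],
    "features": ["kms"],
    "kms_enabled": True,
    "name": "shared-key-demo-2",
    "tproxy_enabled": False,
}

changed = diff_manifests(demo1, demo2)
print(sorted(changed))
```

For these inputs the differing fields are allowed_envs, features, name, and tproxy_enabled, matching observation 1.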

Impact:

Updating the app (changing docker-compose.yaml) requires whitelisting a new compose_hash per deployment path per platform. For N platforms with M deployment methods, that's up to N×M hashes to whitelist per release.


Issue 2: --custom-app-id CLI Flag Doesn't Work with Base KMS

The phala deploy CLI has a --custom-app-id flag, but when used with --kms base, the resulting CVM gets a new app_id, ignoring the flag.

phala deploy --name shared-key-demo-2 \
  --kms base --private-key ... --rpc-url ... \
  --custom-app-id caa3a03a6e34e78e71d669e41961ca8c95fd1611
# Result: app_id = 7cc4a84c... (DIFFERENT, flag ignored)

The CLI help says --custom-app-id "requires --nonce for PHALA KMS", suggesting it may only be implemented for Phala KMS, not Base/Ethereum KMS.

Workaround:

We used the 2-step API directly (see deploy_replica.py):

  1. POST /cvms/provision — allocates resources, returns compose_hash
  2. POST /cvms — creates the CVM with an explicit app_id field

This works but produces a different manifest (and compose_hash) than the CLI path.
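The shape of the two calls can be sketched as follows. This is not the contents of deploy_replica.py: the base URL is a placeholder, authentication is omitted, and every payload field name other than app_id is an assumption. The requests are only built, never sent.

```python
import json
import urllib.request

# Hypothetical base URL; substitute the real API endpoint.
API_BASE = "https://cloud-api.example.invalid"


def post_request(path: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request. Authentication is omitted."""
    return urllib.request.Request(
        API_BASE + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def provision_request(name: str, compose_text: str) -> urllib.request.Request:
    # Step 1: allocate resources. Payload field names are placeholders.
    return post_request(
        "/cvms/provision",
        {"name": name, "docker_compose_file": compose_text},
    )


def create_request(app_id: str, compose_hash: str) -> urllib.request.Request:
    # Step 2: create the CVM with an explicit app_id. The compose_hash comes
    # from the step 1 response; how it is passed back is an assumption.
    return post_request("/cvms", {"app_id": app_id, "compose_hash": compose_hash})


step1 = provision_request("shared-key-demo-2", "services: {}\n")
step2 = create_request("caa3a03a6e34e78e71d669e41961ca8c95fd1611", "f1a4f314...")
for req in (step1, step2):
    print(req.get_method(), req.full_url)
```

Sending either request would be a matter of passing it to urllib.request.urlopen once the real base URL and credentials are filled in.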


Issue 3: allowAnyDevice Defaults to False

The AppAuth contract deployed by phala deploy --kms base sets allowAnyDevice = false and registers only the deploying node's device_id. When the same app is deployed to a different physical machine (hosted.dstack.info), the KMS rejects the boot with "Boot denied: Device not allowed".

Fix applied:

cast send $APP_AUTH_ADDR "setAllowAnyDevice(bool)" true --rpc-url https://mainnet.base.org --private-key $KEY

Recommendation:

The CLI could offer a --allow-any-device flag, or document that cross-node replication requires this on-chain change.


Issue 4: Version Mismatch (0.5.7 Guest vs 0.5.4 KMS)

hosted.dstack.info runs dstack-0.5.7 (the only guest image available there). The Phala Cloud prod5 KMS runs v0.5.4.

  • Key derivation works cross-version. The 0.5.7 guest successfully derives app keys from the 0.5.4 KMS.
  • Gateway cert signing fails cross-version. The 0.5.7 guest sends API v2 CSR requests; the 0.5.4 KMS rejects with "Unsupported API version: 2".

Error from boot logs:

Failed to request cert: Failed to sign the CSR:
Request failed with status=400 Bad Request,
error={"error": "Unsupported API version: 2"}
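When scanning boot logs for this failure, the rejected API version can be pulled out of the error body. The helper below is a minimal sketch; unsupported_api_version is a hypothetical name, and the only log format it assumes is the one quoted above.

```python
import json
import re


def unsupported_api_version(log_text: str):
    """Return the rejected API version as an int, or None if no such error is present."""
    match = re.search(r"error=(\{.*\})", log_text)
    if not match:
        return None
    try:
        message = json.loads(match.group(1)).get("error", "")
    except json.JSONDecodeError:
        return None
    version = re.fullmatch(r"Unsupported API version: (\d+)", message)
    return int(version.group(1)) if version else None


log = (
    "Failed to request cert: Failed to sign the CSR:\n"
    "Request failed with status=400 Bad Request,\n"
    'error={"error": "Unsupported API version: 2"}'
)
print(unsupported_api_version(log))
```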

Workaround applied:

We deployed the self-hosted CVM with gateway_enabled: false. The app derives keys and runs, but it is reachable only over SSH (curl localhost:9400 on the host), not via the HTTPS gateway.

Available Phala Cloud KMS versions (Base chain):

All Base-chain KMS instances are v0.5.4:

| KMS slug | Version | URL |
|---|---|---|
| kms-base-prod5 | v0.5.4 | kms.dstack-base-prod5.phala.network |
| kms-base-prod6 | v0.5.4 | kms.dstack-base-prod6.phala.network |
| kms-base-prod7 | v0.5.4 | kms.dstack-base-prod7.phala.network |
| kms-base-prod9 | v0.5.4 | kms.dstack-base-prod9.phala.network |
| kms-base-prod10 | v0.5.4 | kms.dstack-base-prod10.phala.network |
| kms-base-use1 | v0.5.4 | kms.dstack-base-use1.phala.network |

Phala-chain KMS instances are at v0.5.6, but those use a different chain (Phala, not Base/Ethereum) and cannot share our Base app_id.

Guest images up to 0.5.7 are available on all nodes (prod5, prod9, etc.), so choosing a different node does not help: the bottleneck is the KMS version, not guest image availability.

Conclusion: No Phala Cloud node can run a 0.5.7 guest with gateway + Base KMS without hitting the API v2 CSR error. The fix must come from upgrading the Base KMS instances to v0.5.6+.
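The same conclusion can be checked mechanically against the KMS list. This sketch assumes that API v2 CSR support starts at KMS v0.5.6, which is inferred from the conclusion above rather than verified; the function names are hypothetical.

```python
def parse_version(text: str) -> tuple:
    """Turn 'v0.5.4' or '0.5.4' into a comparable tuple such as (0, 5, 4)."""
    return tuple(int(part) for part in text.lstrip("v").split("."))


# Assumption: v0.5.6 is the first KMS version that accepts API v2 CSR requests.
MIN_KMS_FOR_CSR_V2 = parse_version("0.5.6")


def accepts_csr_v2(kms_version: str) -> bool:
    return parse_version(kms_version) >= MIN_KMS_FOR_CSR_V2


# Base-chain KMS instances and versions, as listed in the table above.
base_kms = {
    "kms-base-prod5": "v0.5.4",
    "kms-base-prod6": "v0.5.4",
    "kms-base-prod7": "v0.5.4",
    "kms-base-prod9": "v0.5.4",
    "kms-base-prod10": "v0.5.4",
    "kms-base-use1": "v0.5.4",
}

usable = [slug for slug, version in base_kms.items() if accepts_csr_v2(version)]
print(usable)  # empty list: no Base-chain KMS qualifies
```

Comparing parsed tuples rather than raw strings avoids misordering versions such as 0.5.10 and 0.5.9.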


Issue 5: No Public Access for Self-Hosted CVM

With gateway_enabled: false, the CVM doesn't register with the dstack gateway and has no public HTTPS URL. The host port (9400) is not exposed through the firewall.

Currently verified via:

ssh ubuntu@hosted.dstack.info curl -s localhost:9400

Possible fixes:

  1. Upgrade Phala Cloud Base KMS to v0.5.6+ — all Base-chain KMS instances are stuck at v0.5.4. Phala-chain KMS is already at v0.5.6. This is the real fix.

  2. Install dstack-0.5.4 guest image on hosted.dstack.info — images live at /var/lib/dstack/images/. A 0.5.4 guest would send API v1 CSR which the remote 0.5.4 KMS accepts. But the CVM can only have one kms_urls setting, and hosted's local KMS is 0.5.7 (different root key).

  3. Backport API v2 CSR support to 0.5.4 KMS — or make 0.5.7 guest fall back to API v1 for CSR.

  4. Expose via host-level reverse proxy — use nginx/socat on hosted.dstack.info to proxy port 9400 to a public port. No TDX gateway guarantees, but provides access.

  5. Deploy a standalone gateway on hosted.dstack.info — would need its own certs (not KMS-signed). The gateway is a Docker container (dstack-gateway) with WireGuard + TLS proxy, but it still needs KMS for cert signing, hitting the same version mismatch.


Summary of On-Chain Transactions Required

For this demo, we executed 4 on-chain transactions on Base:

  1. Contract deployment (performed automatically by phala deploy): Deployed the AppAuth contract with demo-1's compose_hash already whitelisted
  2. addComposeHash: Whitelisted demo-2's compose_hash (f1a4f314...)
  3. addComposeHash: Whitelisted self-hosted compose_hash (d24f2633...)
  4. setAllowAnyDevice(true): Allowed any TDX device to use this app_id
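The count generalizes in a straightforward way. The function below is a back-of-the-envelope sketch with a hypothetical name; it assumes one compose_hash is whitelisted by the contract deployment itself and that a single setAllowAnyDevice call covers all additional machines, as in this demo.

```python
def onchain_tx_count(distinct_compose_hashes: int, cross_device: bool) -> int:
    """Estimate the on-chain transactions needed for an initial rollout."""
    if distinct_compose_hashes < 1:
        raise ValueError("at least one compose_hash is required")
    deploy = 1                               # contract deployment, whitelists the first hash
    add_hash = distinct_compose_hashes - 1   # one addComposeHash per remaining hash
    allow_any = 1 if cross_device else 0     # setAllowAnyDevice(true), if needed
    return deploy + add_hash + allow_any


# This demo: 3 distinct compose hashes, deployed across different machines.
print(onchain_tx_count(3, cross_device=True))  # 4
```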

Files

  • docker-compose-demo.yaml — The demo app (Node.js, derives key, exposes / and /sign)
  • deploy_replica.py — 2-step API deploy script (workaround for --custom-app-id not working)

URLs