This README captures the thinking behind the first version of the compute interface and some of the decisions that shaped it. It's meant to provide context as we move toward more general, multi-cloud support.
The first version of this interface was designed around AWS EC2. At the time, AWS was the only provider we were supporting, so we built around its APIs and assumptions. The design has evolved, however you will notice AWS's momentum in the API.
- The difference between
Location,SubLocation, andAvailableAzsis unclear. - Some providers don’t expose availability zones or don’t map cleanly to this model.
- Tag updates are assumed to work everywhere.
- Many providers lack first-class tag support.
- LifecycleStatus enums are based on AWS terms.
- Relies on manual string formatting (e.g.,
location-subLoc-type).
- Uses a few top-level errors (e.g.,
ErrOutOfQuota) with no structured data. - Makes it hard to reason about retryability or provider-specific failure modes.
The terminology around instance-attached storage is one of the more confusing parts of the v1 design. The interface uses three overlapping terms:
- Disk: Used in Instance and CreateInstanceAttrs (AdditionalDisks, DiskSize)
- Volume: Used in other methods
- Storage: Used in capabilities (SupportedStorage), and types (StorageFilters)
- DiskSize appears both in CreateInstanceAttrs and UpdateInstanceRetireableArgs, but it’s unclear if it applies only to the root volume or all attached volumes.
- AdditionalDisks allows specifying multiple disks, but there’s no clear linkage to volume IDs or mount behavior post-creation.
- Some clouds (e.g. AWS) treat root and attached volumes differently (with separate APIs).
- Others (e.g. Lambda) don’t expose volumes at all — only a total storage value.
- Elastic volumes, ephemeral disks, and NVMe local storage are not modeled cleanly in v1.
- The v1 design is fundamentally instance-centric and not conducive to cluster support.
- No abstractions for cluster-level operations, networking, or orchestration.
- Instance management is treated as individual resources rather than as part of a larger distributed system.
- Missing concepts like cluster membership, inter-instance communication, shared state, or cluster lifecycle management.
- For support to be added we may need to more fomally implement networks/vpcs or instance groups.