Summary
mops workspace up currently reads resource settings from two sources:
- A legacy workspace YAML passed via --config
- The unified ~/.modelops/modelops.yaml (used when no --config is given)
The unified file was added later so that infra could be spun up from a single YAML config file, and the workspace config is meant to be a subset of its options. However, the current design needs to be cleaned up.
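A minimal sketch of how the two sources could at least be made visible to the user, assuming both files have already been loaded into nested dicts. The function names (diff_overrides, resolve_workspace_settings) and the example keys are hypothetical, not part of the mops codebase. The sketch keeps today's precedence (--config wins outright) and only warns about each unified value it replaces.

```python
import warnings
from typing import Any, Mapping, Optional


def diff_overrides(unified: Mapping[str, Any], custom: Mapping[str, Any], prefix: str = "") -> list:
    """List (dotted_key, unified_value, custom_value) for each leaf the custom config changes."""
    diffs = []
    for key, custom_val in custom.items():
        if key not in unified:
            continue  # keys absent from the unified file are additions, not overrides
        path = f"{prefix}{key}"
        unified_val = unified[key]
        if isinstance(custom_val, Mapping) and isinstance(unified_val, Mapping):
            # Recurse into nested sections such as workers.resources.requests
            diffs.extend(diff_overrides(unified_val, custom_val, path + "."))
        elif custom_val != unified_val:
            diffs.append((path, unified_val, custom_val))
    return diffs


def resolve_workspace_settings(
    unified: Mapping[str, Any], custom: Optional[Mapping[str, Any]] = None
) -> Mapping[str, Any]:
    """Hypothetical resolver: --config still replaces modelops.yaml, but each replaced value is reported."""
    if custom is None:
        return unified
    for key, old, new in diff_overrides(unified, custom):
        warnings.warn(f"--config overrides {key}: {old!r} from modelops.yaml -> {new!r}", stacklevel=2)
    return custom
```

With a unified file requesting 4Gi/2 CPU workers and a custom file requesting 12Gi/4 CPU, the user would get two warnings naming workers.resources.requests.memory and workers.resources.requests.cpu before anything is deployed. Whether --config should instead be merged on top of the unified file is a separate design decision this sketch does not make.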
Because these paths are separate, nothing warns the user when a custom YAML overrides the unified defaults. In dev we deployed workspace-medmem.yaml with 12 Gi/4 CPU worker requests, even though the cluster’s node pool only has 4 Gi/4 CPU nodes. Pulumi waited for 10 minutes and then failed with MinimumReplicasUnavailable; the CLI never surfaced the true cause (“Insufficient cpu/memory”) or suggested checking the node pool.
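A preflight check could have caught this before Pulumi started waiting. The sketch below is an assumption about how such a check might look: it parses Kubernetes resource quantities (binary suffixes Ki..Ei, decimal suffixes k..E, and m for milli) and compares the worker requests against one node's allocatable resources, which the caller would read from the status.allocatable field of the cluster's Node objects and pass in for the largest node. parse_quantity and check_worker_fit are hypothetical names. It ignores requests already placed on the node (system pods, DaemonSets), so it can only prove a request will not fit, never that it will.

```python
from decimal import Decimal
from typing import Mapping

# Kubernetes quantity suffixes: binary (Ki..Ei), decimal (k..E) and milli (m).
_BINARY = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40, "Pi": 2**50, "Ei": 2**60}
_DECIMAL = {"m": Decimal("0.001"), "k": 10**3, "M": 10**6, "G": 10**9,
            "T": 10**12, "P": 10**15, "E": 10**18}


def parse_quantity(quantity: str) -> Decimal:
    """Convert a quantity such as '12Gi', '500m' or '4' to a plain number (bytes or cores)."""
    q = str(quantity).strip()
    for suffix, factor in _BINARY.items():
        if q.endswith(suffix):
            return Decimal(q[: -len(suffix)]) * factor
    if q and q[-1] in _DECIMAL:
        return Decimal(q[:-1]) * _DECIMAL[q[-1]]
    return Decimal(q)  # plain number, e.g. '4' CPUs or '1e3'


def check_worker_fit(requests: Mapping[str, str], allocatable: Mapping[str, str]) -> list:
    """Return one message per resource whose request exceeds the largest node's allocatable amount."""
    problems = []
    for resource, wanted in requests.items():
        available = allocatable.get(resource)
        if available is None:
            problems.append(f"nodes report no allocatable {resource}")
        elif parse_quantity(wanted) > parse_quantity(available):
            problems.append(
                f"worker requests {resource}={wanted}, but the largest node only has {available} allocatable"
            )
    return problems
```

Because the kubelet reserves some capacity, a node's allocatable amount is usually below its nominal size, so a 4 CPU request against a 4 CPU node can fail just like the 12Gi memory request did. Comparing against allocatable rather than the advertised node size catches both cases.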
Problems
- Two configuration sources drift silently (YAML examples vs. WorkspaceSpec). Users may think editing modelops.yaml is enough, but --config bypasses it entirely. Better built-in documentation might be enough to address this.
- No validation that requested worker resources fit the referenced cluster. Scheduler errors only appear after a long timeout.
- Poor UX: the Pulumi error doesn’t translate to “your pods can’t schedule,” and there is no fast feedback or guidance.