
Workspace config sources drift and allow invalid resource specs #12

@vsbuffalo


Summary

mops workspace up currently reads resource settings from two sources:

  • A legacy workspace YAML passed via --config
  • The unified ~/.modelops/modelops.yaml (when no --config is given)
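
The source selection above amounts to "--config wins outright, otherwise use the unified file." Below is a minimal sketch of that selection with the missing warning added. It assumes both sources have already been loaded into plain dicts. `resolve_workspace_config` and the `workers.resources` layout are hypothetical and are not the actual mops code or schema.

```python
from typing import Any, Optional

_UNSET = object()  # sentinel: key present in the unified config but missing from --config


def _flatten(tree: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Flatten nested mappings into dotted paths, e.g. {"a": {"b": 1}} -> {"a.b": 1}."""
    flat: dict[str, Any] = {}
    for key, value in tree.items():
        path = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict):
            flat.update(_flatten(value, path))
        else:
            flat[path] = value
    return flat


def resolve_workspace_config(
    cli_config: Optional[dict[str, Any]],
    unified_workspace: dict[str, Any],
) -> tuple[dict[str, Any], list[str]]:
    """Hypothetical helper: pick the workspace config and report shadowed settings.

    Mirrors the current behavior: a --config file replaces the unified workspace
    section wholesale. The returned messages list every unified setting that the
    --config file changes or drops, so the CLI can print them before deploying.
    """
    if cli_config is None:
        return dict(unified_workspace), []
    custom_flat = _flatten(cli_config)
    messages = []
    for path, unified_value in _flatten(unified_workspace).items():
        custom_value = custom_flat.get(path, _UNSET)
        if custom_value is _UNSET:
            messages.append(f"--config drops {path} (unified value {unified_value!r})")
        elif custom_value != unified_value:
            messages.append(f"--config overrides {path}: {unified_value!r} -> {custom_value!r}")
    return dict(cli_config), messages


# Illustrative values only.
unified = {"workers": {"replicas": 2, "resources": {"cpu": "1", "memory": "2Gi"}}}
custom = {"workers": {"replicas": 2, "resources": {"cpu": "4", "memory": "12Gi"}}}
config, notes = resolve_workspace_config(custom, unified)
for note in notes:
    print("warning:", note)
```

Keeping the "replace wholesale" semantics while surfacing the differences is a deliberately small change. Deep-merging the --config file over the unified defaults would be the other obvious design, but it changes behavior for existing users.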

The unified file was added later to make it easier to spin up infra from a single YAML config file, and the workspace config is meant to be a subset of the unified config options. However, the design needs to be cleaned up.
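
If the workspace config really is a subset of the unified options, one way to keep the two from drifting is to validate both sources against a single spec and reject keys it does not define. The sketch below uses hypothetical dataclasses as stand-ins for WorkspaceSpec. The real field names are unknown here, and `unknown_keys` is an illustrative helper.

```python
from dataclasses import dataclass, field, fields, is_dataclass
from typing import Any


# Hypothetical stand-ins for WorkspaceSpec; the real fields may differ.
# Annotations are left unpostponed so `f.type` holds the actual class.
@dataclass
class WorkerResources:
    cpu: str = "1"
    memory: str = "2Gi"


@dataclass
class WorkspaceSketch:
    replicas: int = 1
    resources: WorkerResources = field(default_factory=WorkerResources)


def unknown_keys(data: dict[str, Any], spec: type, prefix: str = "") -> list[str]:
    """Return dotted paths in `data` that the dataclass `spec` does not define."""
    known = {f.name: f.type for f in fields(spec)}
    unknown = []
    for key, value in data.items():
        path = f"{prefix}{key}"
        if key not in known:
            unknown.append(path)
        elif isinstance(value, dict) and is_dataclass(known[key]):
            # Recurse into nested sections such as `resources`.
            unknown.extend(unknown_keys(value, known[key], path + "."))
    return unknown


# A typo in a --config file is reported instead of being silently ignored.
print(unknown_keys({"replicas": 2, "resources": {"memroy": "12Gi"}}, WorkspaceSketch))
```

If both the --config file and the workspace section of modelops.yaml go through the same check, a renamed or misspelled field becomes an immediate error in either source. If WorkspaceSpec happens to be a pydantic model, configuring it with `extra="forbid"` would give the same behavior without a hand-written walker.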

Because the two paths are separate, there is no warning when a custom YAML overrides the unified defaults. In dev we deployed a workspace-medmem.yaml with 12 Gi/4 CPU worker requests, even though the cluster’s node pool only has 4 Gi/4 CPU nodes. Pulumi sat for 10 minutes and then failed with MinimumReplicasUnavailable. The CLI never surfaced the true cause (“Insufficient cpu/memory”) or suggested checking the node pool.
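
The true cause is already available while Pulumi waits. Kubernetes records FailedScheduling events on pending pods, with messages such as "0/3 nodes are available: 3 Insufficient cpu, 3 Insufficient memory." The sketch below translates such a message into the hint the CLI could print, and the CLI could also fail fast at that point instead of waiting out the timeout. It assumes the CLI can read the events of the workspace's pending pods. `explain_scheduling_failure` is a hypothetical helper, and the wording of the hint is illustrative.

```python
import re
from typing import Optional

# Matches resource names in scheduler messages such as
# "0/3 nodes are available: 3 Insufficient cpu, 3 Insufficient memory."
# The trailing \w keeps sentence punctuation out of the captured name.
_INSUFFICIENT = re.compile(r"Insufficient ([\w./-]*\w)")


def explain_scheduling_failure(event_message: str) -> Optional[str]:
    """Hypothetical helper: turn a FailedScheduling message into an actionable hint."""
    resources = sorted(set(_INSUFFICIENT.findall(event_message)))
    if not resources:
        return None  # Not a resource shortage; show the raw message instead.
    return (
        f"Worker pods cannot be scheduled: no node has enough free {' and '.join(resources)}. "
        "Lower the worker requests in the workspace config, or check that the "
        "cluster's node pool uses larger nodes."
    )


print(explain_scheduling_failure(
    "0/3 nodes are available: 3 Insufficient cpu, 3 Insufficient memory."
))
```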

Problems

  • Two configuration sources drift silently (YAML examples vs WorkspaceSpec). Users may think editing modelops.yaml is enough, but --config bypasses it entirely. This might be acceptable with better built-in documentation.
  • No validation that requested worker resources fit the referenced cluster. Scheduler errors only appear after a long timeout.
  • Poor UX: the Pulumi error is never translated into “your pods can’t schedule,” and there is no fast feedback or guidance.
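
A pre-flight check could catch the workspace-medmem.yaml case in seconds rather than after a 10-minute timeout. The scheduler places pods against a node's allocatable resources (each Node's `status.allocatable`). Allocatable is lower than raw capacity because of kubelet and system reservations, so a 4 CPU request generally will not fit on a 4 CPU node either. The sketch assumes the CLI can look up one node's allocatable resources for the referenced node pool. The helper names, the supported quantity suffixes, and the example numbers are all hypothetical.

```python
from decimal import Decimal

# Subset of the Kubernetes quantity suffixes (e.g. milli-byte "m" for memory is omitted).
_MEMORY_SUFFIXES = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
}


def memory_bytes(quantity: str) -> int:
    """Parse a memory quantity like "12Gi" or "512M" into bytes."""
    for suffix, factor in _MEMORY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(Decimal(quantity[: -len(suffix)]) * factor)
    return int(Decimal(quantity))


def cpu_millicores(quantity: str) -> int:
    """Parse a CPU quantity like "4", "0.5" or "500m" into millicores."""
    if quantity.endswith("m"):
        return int(Decimal(quantity[:-1]))
    return int(Decimal(quantity) * 1000)


def preflight_fit(request: dict[str, str], allocatable: dict[str, str]) -> list[str]:
    """Hypothetical pre-deploy check: can one worker pod fit on one node?

    `request` is the worker's resource request; `allocatable` is a node's
    allocatable resources (assumed identical across the node pool).
    Returns human-readable problems; an empty list means the request fits.
    """
    problems = []
    if cpu_millicores(request["cpu"]) > cpu_millicores(allocatable["cpu"]):
        problems.append(
            f"worker requests cpu={request['cpu']}, but a node can allocate at most {allocatable['cpu']}"
        )
    if memory_bytes(request["memory"]) > memory_bytes(allocatable["memory"]):
        problems.append(
            f"worker requests memory={request['memory']}, but a node can allocate at most {allocatable['memory']}"
        )
    return problems


# Illustrative allocatable values, not measured from a real node pool.
for problem in preflight_fit({"cpu": "4", "memory": "12Gi"}, {"cpu": "3800m", "memory": "3Gi"}):
    print("error:", problem)
```

This only checks that a single worker fits on an otherwise empty node. It ignores DaemonSet pods and other workloads, so passing the check is necessary but not sufficient. Translating the FailedScheduling event at deploy time would still be needed as a backstop.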
