AIR CLI Integration: air run Command Pt. 1 - Add GPU accelerator type and compute config model #5602
Open
riddhibhagwat-db wants to merge 1 commit into
Add compute.go: the gpuType model and compute-block validation the upcoming `air run` config layer depends on. Defines the canonical GPU_* accelerator types, parseGPUType (exact, case-sensitive), gpusPerNode (partition counts), and computeConfig.validate (positive count, multiple-of-per-node, mutually exclusive node_pool_id/pool_name). Co-authored-by: Isaac
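As a rough illustration of the rules listed in the commit message, the sketch below shows one way such a model could be written. The names `gpuType`, `parseGPUType`, `gpusPerNode`, `computeConfig`, `validate`, `node_pool_id`, and `pool_name` come from the PR text, and the three type constants are the ones named in the PR description. Everything else is an assumption made for illustration and is not taken from the actual `compute.go`: the struct field names, reading the per-node count from the digits before `x` in the type name, and the error messages.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// gpuType is a canonical GPU_* accelerator type name.
type gpuType string

// Only the three types named in the PR description; the real list may differ.
const (
	gpu1xA10  gpuType = "GPU_1xA10"
	gpu1xH100 gpuType = "GPU_1xH100"
	gpu8xH100 gpuType = "GPU_8xH100"
)

// parseGPUType matches exactly and case-sensitively, so "gpu_8xh100" and
// legacy names such as "h100_80gb" are rejected.
func parseGPUType(s string) (gpuType, error) {
	switch t := gpuType(s); t {
	case gpu1xA10, gpu1xH100, gpu8xH100:
		return t, nil
	}
	return "", fmt.Errorf("unknown accelerator type %q", s)
}

// gpusPerNode derives the per-node count from the type name.
// Assumption: the count is the integer between "GPU_" and "x".
func gpusPerNode(t gpuType) (int, error) {
	if _, err := parseGPUType(string(t)); err != nil {
		return 0, err
	}
	rest := strings.TrimPrefix(string(t), "GPU_")
	count, _, _ := strings.Cut(rest, "x")
	return strconv.Atoi(count)
}

// computeConfig mirrors a YAML compute block; the field names are hypothetical.
type computeConfig struct {
	AcceleratorType string
	NumAccelerators int
	NodePoolID      string
	PoolName        string
}

// validate applies the checks from the commit message: known type, positive
// count, count divisible by the per-node count, and at most one pool selector.
func (c computeConfig) validate() error {
	t, err := parseGPUType(c.AcceleratorType)
	if err != nil {
		return err
	}
	perNode, err := gpusPerNode(t)
	if err != nil {
		return err
	}
	if c.NumAccelerators <= 0 {
		return errors.New("accelerator count must be positive")
	}
	if c.NumAccelerators%perNode != 0 {
		return fmt.Errorf("accelerator count %d is not a multiple of %d", c.NumAccelerators, perNode)
	}
	if c.NodePoolID != "" && c.PoolName != "" {
		return errors.New("node_pool_id and pool_name are mutually exclusive")
	}
	return nil
}

func main() {
	// 12 is not a multiple of 8, so validate returns an error here.
	cfg := computeConfig{AcceleratorType: "GPU_8xH100", NumAccelerators: 12}
	fmt.Println(cfg.validate())
}
```

Checking the type before the count means an unknown or legacy type is reported first, which is one possible ordering; the PR text does not say which check runs first.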
Contributor
Waiting for approval
Could not determine reviewers from git history. Eligible reviewers: Suggestions based on git history. See OWNERS for ownership rules.
Collaborator
Integration test report
Commit: b73298d
22 interesting tests: 15 SKIP, 7 RECOVERED
Top 33 slowest tests (at least 2 minutes):
Changes
Adds `experimental/air/cmd/compute.go`, which contains the `gpuType` model and the `compute` block validation that the `air run` configuration layer depends on. Specifically:
- The canonical `GPU_*` accelerator types (`GPU_1xA10`, `GPU_8xH100`, `GPU_1xH100`)
- `parseGPUType` resolves a YAML accelerator type string
- `gpusPerNode` is the per-node partition count, based on the type name
- `computeConfig` and `validate()` are the port of the Python `ComputeConfig` validators

Why
This is the first, leaf-most piece of the `air run` port for the AIR CLI and the root of the config validation layer's dependencies. The compute piece does not depend on anything else, so it lands first as a small, fully unit-tested unit.

Note that parsing is exact and case-sensitive, since a typo in the user's YAML could otherwise misroute the run. Additionally, only `GPU_*` training service types are supported: legacy MAPI types (e.g. `h100_80gb`) are no longer supported and are intentionally deprecated in this port, so no new runs can use the MAPI path. Rendering them in `get` is unaffected, since `format.go` keeps its own display map for historical runs.

Tests
Table-driven unit tests in compute_test.go: parseGPUType for valid types and rejected inputs (wrong casing, legacy types, unknown, empty); gpusPerNode counts plus its invalid-type error; and computeConfig.validate across valid configs and every failure mode (unknown/legacy type, non-positive count, non-multiple count, dual-pool conflict). go build, go test, and golangci-lint are clean.
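
For readers unfamiliar with the table-driven style mentioned above, here is a minimal sketch of the `parseGPUType` cases (valid, wrong casing, legacy, unknown, empty). It is not the content of `compute_test.go`: the parser is a hypothetical stand-in that returns a plain string, the row type and helper names are invented for illustration, and the loop runs from `main` rather than under the `testing` package so that the example is a single runnable file.

```go
package main

import "fmt"

// knownTypes holds only the types named in the PR description.
var knownTypes = map[string]bool{
	"GPU_1xA10":  true,
	"GPU_1xH100": true,
	"GPU_8xH100": true,
}

// parseGPUType is a hypothetical stand-in for the function under test:
// an exact, case-sensitive lookup.
func parseGPUType(s string) (string, error) {
	if !knownTypes[s] {
		return "", fmt.Errorf("unknown accelerator type %q", s)
	}
	return s, nil
}

// parseCase is one row of the table: an input and whether it must fail.
type parseCase struct {
	name    string
	input   string
	wantErr bool
}

var parseCases = []parseCase{
	{"valid", "GPU_8xH100", false},
	{"wrong casing", "gpu_8xh100", true},
	{"legacy type", "h100_80gb", true},
	{"unknown", "GPU_3xZ9", true},
	{"empty", "", true},
}

// runParseCases returns one message per failing row. In a _test.go file the
// loop body would normally sit inside t.Run(tc.name, ...) and call t.Errorf.
func runParseCases() []string {
	var failures []string
	for _, tc := range parseCases {
		_, err := parseGPUType(tc.input)
		if (err != nil) != tc.wantErr {
			failures = append(failures,
				fmt.Sprintf("%s: err=%v, wantErr=%v", tc.name, err, tc.wantErr))
		}
	}
	return failures
}

func main() {
	fmt.Println("failures:", len(runParseCases()))
}
```

Keeping the rejected inputs as rows in one table makes it cheap to add a new failure mode later without writing another test function.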