feat(cli,api): migrate CLI and REST server to consume aicr.Client#1104
feat(cli,api): migrate CLI and REST server to consume aicr.Client#1104hkii wants to merge 41 commits into
Conversation
… ResolveRecipeFromCriteria, SelectFromRecipe
LoadRecipe loads a recipe file (or cm:// ConfigMap URI) through the Client's own DataProvider, hydrating overlay inputs against that provider and returning an owner-stamped facade *RecipeResult ready for ValidateState / BundleComponents.
Add WithValidationPhases to restrict a validation run to a subset of phases (default unchanged: all phases in order). ValidateState now also threads the Client's DataProvider into the validator so the catalog loads from this Client's recipe source instead of the package global.
…l, list formatting
Add (*RecipeResult).Resolved() so internal consumers can reach the full pkg/recipe.RecipeResult behind the lossy facade shape (constraints, validation config, deployment order) needed for phase warnings and evidence emission. Add three ValidateOptions translating to validator options: WithValidationCommit, WithValidationImageRegistryOverride, and WithValidationImageTagOverride. Empty string is the unset sentinel, matching the validator's own empty-as-noop handling. Reject unknown WithValidationPhases values with ErrCodeInvalidRequest before any cluster work so a typed Phase typo fails closed rather than degrading to an empty/skipped run.
Add coverage proving catalog.Load reads the validator catalog from the DataProvider it is handed rather than the package global, and update the v1 README examples to the catalog.Load(dp, version, commit) signature. This is the catalog-side half of letting the validate evidence path use the command's data provider instead of the recipe package global, so a --data overlay resolves the validator catalog against the command's source.
Drive the validate command through the aicr.Client facade. The command builds a Client from the resolved data source (--data flag, else spec.recipe.data; filesystem-layered or embedded), loads the recipe via client.LoadRecipe, and runs the phases via client.ValidateState with the resolved CLI knobs mapped to WithValidation* options (namespace, run ID, cleanup, image-pull-secrets, no-cluster, tolerations, node selector, commit, image registry/tag overrides, and phases when explicitly selected). This removes the initDataProvider call from the validate Action, so validate no longer mutates the recipe package global: the Client owns its DataProvider. The same provider is built once and threaded into evidence emission (emitRecipeEvidence -> catalog.Load) so a --data overlay resolves the attestation catalog against the command's source. The phase-warning helper and evidence emission need the full pkg/recipe.RecipeResult, reached via the new RecipeResult.Resolved() accessor. LoadRecipe wraps loaded recipes with a nil-criteria-tolerant constructor so already-hydrated and bare RecipeResult files load the same way recipe.LoadFromFile accepted them.
Add Client.MakeBundle plus BundleOptions / BundleConfig / BundleAttester / BundleArtifact to the facade. MakeBundle wraps bundler.New + Make so callers get the same full deployer-mode bundle artifact the CLI bundle command and REST /v1/bundle handler produce today, with the Client's own DataProvider (via recipe.Resolved()) and allowlist fencing applied.
Route the bundle command through the aicr.Client facade: build a Client from the resolved data source (--data / spec.recipe.data), load the recipe via client.LoadRecipe, and generate the bundle via client.MakeBundle. Drops the process-global initDataProvider and the direct bundler.New / Make calls; output is unchanged.
Route mirror list's --recipe branch through the aicr.Client facade: a Client built from the command's data source loads the recipe via client.LoadRecipe, and rec.Resolved() yields the raw *recipe.RecipeResult the Lister needs. The criteria branch (buildRecipeFromCmdWithConfig) and initDataProvider remain unchanged because they still depend on the process-global criteria registry, de-globalized by NVIDIA#987 (Stage 4).
…rver.go Add a pkg/api bundleHandler that reproduces bundler.HandleBundles exactly — same method gate, body decode, allowlist pre-check, query-param parsing, and zip response — but routes through the aicr.Client facade (AdoptRecipe + MakeBundle) instead of constructing a bundler directly. server.go now builds no recipe.Builder and no bundler.Bundler, completing NVIDIA#1077 criterion NVIDIA#2. Supporting changes: - recipe.RecipeResult.BindDataProvider: exported binder so the facade can adopt an externally-decoded RecipeResult onto the Client's provider. - aicr.Client.AdoptRecipe / adoptRecipe: wrap a raw RecipeResult into a Client-owned, provider-bound, owner-stamped result for MakeBundle. - bundler.ParseBundleConfig / StreamZipResponse: exported boundaries so the facade-backed handler shares one config-construction and zip-streaming site with the legacy handler (no drift). - Parity test asserts identical status, content headers, and zip layout vs the legacy bundler.HandleBundles across deployer modes.
MakeBundle previously wrapped the context with a fixed defaults.BundleHandlerTimeout (~60s), which the CLI bundle command inherited — a regression for large bundles, --vendor-charts, and attestation/signing that can each exceed 60s. Add an opt-in BundleOptions.Timeout field: > 0 caps the run (bounded by the smaller of it and any tighter caller deadline); 0 imposes no facade-level cap and runs under the caller's ctx as-is. The REST /v1/bundle handler sets Timeout = defaults.BundleHandlerTimeout to keep its 60s request boundary; the CLI leaves it unset for uncapped bundles.
A prior change turned the exported catalog.Load(version, commit) into Load(dp, version, commit), breaking the exported API. Rename the provider-aware form to LoadWithDataProvider(dp, version, commit) and re-add Load(version, commit) as a thin back-compat wrapper that passes a nil provider (global fallback, preserving original behavior). Internal provider-aware callers (validator.go, validate_evidence.go) use LoadWithDataProvider; simple-path tests and README examples keep Load(version, commit).
Replace errors.Wrap(ErrCodeInternal, ...) with PropagateOrWrap at three validate-path sites so an inner *StructuredError's code survives instead of being overwritten with ErrCodeInternal: - validate_evidence.go: catalog.LoadWithDataProvider error - validate.go: client.ValidateState error - validate.go: aicr.NewClient error
- types.go: (*RecipeResult).Resolved() returns nil for a nil receiver instead of panicking. - aicr_test.go: allowlist-rejection test now asserts the structured error carries ErrCodeInvalidRequest, mirroring the invalid-phase test. - docs/integrator/go-library.md: reword the kubeconfig sentence to disambiguate which "path" the ConfigMap-URI condition refers to.
adoptRecipe bound the Client's DataProvider directly onto the caller-supplied *Recipe, mutating caller-owned state. A caller reusing one *Recipe across two Clients would have the second adopt overwrite the first's binding. Add RecipeResult.DeepCopy (deep-copies all exported fields, leaves the unexported provider nil for BindDataProvider) and adopt against the copy so neither the caller's recipe nor a sibling Client's binding is contaminated.
…ons, opt-in timeout - Thread the Client's version into the validator so :latest images are rewritten to the release tag and AICR_CLI_VERSION is populated (the pre-facade CLI passed validator.WithVersion). - Track tolerationsSet so WithValidationTolerations(nil) is an explicit override that clears the validator's default tolerate-all, instead of being dropped as "unset". - Make the facade validation deadline opt-in via WithValidationTimeout: nil keeps the 60m default (controllers), 0 imposes no cap (per-validator timeouts govern), >0 sets an explicit cap — matching MakeBundle.
Pass aicr.WithValidationTimeout(0) so CLI validation is uncapped at the facade level (per-validator timeouts govern), and rely on the now-always -emitted WithValidationTolerations override to clear the validator's default tolerate-all — restoring the pre-facade CLI behavior for all-phase runs that include the long inference-perf check.
…irror Explain that validate's second DataProvider exists for evidence emission (resolving against the command source, not the global), and log the external data directory in mirror's --recipe path for observability parity with validate/bundle.
De-globalize the criteria registry so it is cached per-DataProvider, mirroring the existing component-registry pattern. The package-global path stays working as a back-compat shim. - Add criteriaRegistryCache (sync.Map keyed by DataProvider) plus GetCriteriaRegistryFor / EvictCachedCriteriaRegistry and the _ForTesting helpers. DefaultRegistry now delegates to GetCriteriaRegistryFor(GetDataProvider()) so the global and embedded-provider paths share one registry. - seedCriteriaRegistry takes a DataProvider and registers into that provider's registry; buildMetadataStore threads its provider through, so loading a provider's metadata store seeds THAT provider's criteria registry. - Lift the parse + All*Types cores onto *CriteriaRegistry methods (ParseService/Accelerator/Intent/OS/Platform, All*Types). The package-level ParseCriteria*Type / AllCriteria*Types / BuildCriteria delegate to DefaultRegistry() so existing callers compile unchanged. - Add the registry-aware build path the CLI will consume: BuildCriteriaWithRegistry + WithServiceRegistry/... options that resolve against an explicit registry.
Add (*Client).LoadCatalog to eagerly seed THIS Client's per-provider criteria registry (via recipe.LoadMetadataStoreFor) and (*Client).CriteriaRegistry to return that registry, so CLI/library callers parse and validate criteria against the same DataProvider the Client resolves with. Includes a CriteriaRegistry type alias. Tests cover embedded-vs-filesystem registry isolation, strict-mode hiding of external values, and the nil-context/closed-Client guards.
Add (*RecipeSpec).ResolveCriteriaWithRegistry so config-sourced recipe criteria validate against a caller-supplied per-provider criteria registry instead of the package global. ResolveCriteria stays as a shim over DefaultRegistry() for existing callers. This lets a --data overlay's non-OSS criteria value supplied via spec.recipe.criteria validate the same way the CLI flag path does.
…ider Migrate the recipe, query, and mirror (criteria path) commands off the process-global recipe data provider and global criteria registry onto a per-command aicr.Client with a per-provider criteria registry. - Add recipeClientFromCmd in root.go: builds an aicr.Client bound to the command's resolved data source (--data / spec.recipe.data, else embedded), replacing initDataProvider, which is removed. - Thread a *recipe.CriteriaRegistry through buildRecipeFromCmdWithConfig, mergeCriteriaFromCmdAndConfig, buildCriteriaFromCmd, applyCriteriaOverrides, and applyCriteriaFromConfig so every criteria parse resolves against the Client's own provider registry (via BuildCriteriaWithRegistry / reg.Parse* / ResolveCriteriaWithRegistry). - recipe/query/mirror each build a Client, defer Close, and call client.LoadCatalog to seed their provider registry before parsing; query now seeds explicitly (fixing a latent lazy-global ordering bug). Strict mode applies to the Client's registry. Snapshot path resolves through client.ResolveRecipeFromSnapshot, criteria path through client.ResolveRecipeFromCriteria. - mirror builds one Client for both --recipe (LoadRecipe) and criteria paths; initDataProvider removed from mirror entirely. Retarget the former TestInitDataProvider_* tests at recipeClientFromCmd and pass DefaultRegistry() at the unit-test criteria-helper call sites (OSS canonical values resolve via the registry fast path). No initDataProvider/SetDataProvider references remain in pkg/cli.
There was a problem hiding this comment.
Thanks for taking this on, @hkii — you've been carrying the facade thread since #1072 and this PR is the logical next step. Direction is right and matches the project principle that the facade should be the single aggregation point. A few architectural shape questions before line-level review starts; flagging them now since the PR is still draft.
1. Two parallel public-API tiers on the Client.
types.go introduces transparent aliases (type Recipe = recipe.RecipeResult) alongside the existing facade-owned shape (type RecipeResult struct {…}), and the surface has parallel resolve methods returning each (ResolveRecipe → *RecipeResult vs ResolveRecipeFromCriteria/FromSnapshot → *Recipe). The Stability note on Snapshot already concedes the alias tier "inherits pkg/snapshotter's Public (evolving) tier rather than the facade's stable tier — field-shape changes propagate without notice," and a comment points to #1078 as the future wrap. The risk: this PR ships the alias surface that #1078 will need to wrap, and any caller built against *Recipe between now and #1078 has to migrate twice. Suggest landing #1078 first so the facade-owned wrapper protects external Go importers; then this PR consumes the stable types.
2. The PR bundles a wiring refactor with several behavior changes.
Title says "migrate CLI and API to consume aicr.Client" but the implementation notes also include: deep-copy fix in AdoptRecipe + ComponentRef.Overrides (fixes a prior aliasing bug — i.e., behavior change), validation timeout time.Duration → *time.Duration with the CLI passing 0 to uncap (real behavior change for CLI users), tolerations explicit-set flag (semantic change), and validate propagating structured codes via PropagateOrWrap. Each of those is independently bisectable and reviewable. Bundling them with a 4,900-line refactor across 41 files in 38 commits makes regression hunting much harder if something breaks downstream. The "preserves behavioral parity" framing also fights the substance — uncapping validation timeout in the CLI is a deliberate change worth calling out plainly.
Suggested split:
- Pre-PR: each correctness fix on its own commit with a regression test (
AdoptRecipedeep-copy is a particularly good standalone candidate — small, testable, ships behavior improvement without coupling to wiring). - Pre-PR: per-
DataProvidercriteria registry (replaces the process global) with the back-compat shim's deprecation marker and a tracking comment pointing at #987 for removal. - This PR (slimmed): only the CLI/API migration onto the now-stable Client surface. Should drop 1,500–2,000 lines once the above ship separately.
3. Back-compat shim sunset plan.
"Per-DataProvider criteria registry with a global shim for back-compat" — shims tend to be sticky, and #987 is the sibling that removes the global entirely. Worth confirming the shim is flagged with //nolint:staticcheck + a Deprecated: doc tag pointing at #987, and that there's a test asserting no in-tree caller still touches the global. Otherwise we end up with a permanent dual code path.
4. PR mechanics. None of this is substance; flagging for the squash-and-rebase pass:
- Squash: 38 commits → 1 per logical change (after splitting per concern #2 above)
- Rebase:
mergeable_state: dirtyafter #1100/#1101/#1102 landed - Sign:
git commit -S(checkbox is unchecked) make qualifycheckbox: unchecked — confirm pass locally before flipping to ready-for-review
What's clearly good and worth keeping: the direction (single integration surface, per-instance state), the docs updates landing in the same PR (go-library.md, public-api.md), the per-file test coverage discipline, and the Close() + per-Client guarantee documented in rollout notes. The architectural boundaries are respected — no kubectl orchestration, no in-cluster controllers, no new deployers.
Happy to pair on any of the splits, especially the #1078 sequencing question. Thanks again for the work.
|
Caution Review failedFailed to post review comments 📝 WalkthroughWalkthroughAdds per-Client version and allowlist fields to the aicr Client; threads version into recipe builders; implements LoadCatalog and CriteriaRegistry for per-provider registries; centralizes allowlist enforcement and shared resolveCriteria for criteria/snapshot resolution; binds data providers into loaded RecipeResults and adds DeepCopy/Bind helpers; adds selected-phase validation and provider-aware validator/catalog loading; introduces Client.AdoptRecipe/MakeBundle and bundler streaming/config helpers; migrates CLI and server to construct/use per-command/server Clients; and adds extensive tests and documentation updates. Estimated code review effort🎯 5 (Critical) | ⏱️ ~120 minutes Possibly related issues
Possibly related PRs
Suggested labels
Suggested reviewers
✨ Finishing Touches🧪 Generate unit tests (beta)
|
There was a problem hiding this comment.
Actionable comments posted: 5
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@pkg/cli/bundle.go`:
- Around line 605-626: The client construction block duplicates logic in
recipeClientFromCmd; replace the inline data-dir resolution and aicr.NewClient
call with a single call to recipeClientFromCmd(cmd, cfg), capture the returned
client and error, and keep the same defer to close the client (defer func() { _
= client.Close() }()). Ensure you remove the duplicated variables/dataDir logic
and still return the wrapped error from the recipeClientFromCmd failure path so
behavior and logging (including the "initializing external data provider"
message) remain identical.
In `@pkg/cli/validate.go`:
- Around line 747-771: This code duplicates client construction logic from
recipeClientFromCmd; replace the block that builds dataDir/source and calls
aicr.NewClient with a single call to recipeClientFromCmd (passing the command
context/flags and cfg and version as required), handle its returned client and
error the same way, and keep the defer client.Close() cleanup; ensure you remove
the local dataDir/source logic so the shared recipeClientFromCmd is the single
source of truth.
In `@pkg/recipe/criteria_registry.go`:
- Around line 235-258: The cache currently keys only by DataProvider identity
(criteriaRegistryCache and GetCriteriaRegistryFor), which preserves stale
registries when a mutable provider's internal state changes; change the cache
key to include the provider's generation counter so a provider state change
creates a new entry. Concretely, in GetCriteriaRegistryFor compute a composite
key (e.g., struct{dp DataProvider; gen uint64} or a string like
fmt.Sprintf("%p-%d", dp, gen)) where gen is obtained from the provider's
generation accessor (e.g., dp.Generation() or the appropriate
GetGeneration/GetVersion method), then LoadOrStore using that composite key and
keep the rest (criteriaRegistryCacheEntry, once, newCriteriaRegistry) unchanged;
apply the same keying change to the other cache use sites noted around the
299-301 region.
In `@pkg/validator/validator.go`:
- Around line 182-185: The catalog.LoadWithDataProvider call currently always
rewraps any error as errors.ErrCodeInternal (in the block around
catalog.LoadWithDataProvider(v.dataProvider, v.Version, v.Commit)), which hides
upstream structured codes; change the handling to propagate coded errors
unmodified and only wrap unknown errors: after the call, detect if the returned
err already carries a structured code (the package's coded/error type), and if
so return it directly, otherwise wrap with errors.Wrap(errors.ErrCodeInternal,
"failed to load validator catalog", err). Apply the same propagate-or-wrap
change to the other occurrence around lines 240-243 so existing catalog error
codes are preserved.
In `@query.go`:
- Around line 29-32: The current handling of errors from recipe.HydrateResult
always wraps the error with errors.ErrCodeInternal, which can overwrite existing
structured error codes; replace the return logic to use errors.PropagateOrWrap
(or the library's propagate helper) so that if the hydrated error already
carries a code it is propagated unchanged, otherwise it is wrapped with a
contextual message like "hydrate recipe" and ErrCodeInternal—update the call
site where recipe.HydrateResult(r) is checked to use errors.PropagateOrWrap(err,
errors.ErrCodeInternal, "hydrate recipe") instead of errors.Wrap.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Enterprise
Run ID: 0648c27e-82ec-46d1-8f17-7c3aa625303d
📒 Files selected for processing (41)
aicr.goaicr_internal_test.goaicr_test.gobundle.gobundle_test.godocs/integrator/go-library.mddocs/integrator/public-api.mdoptions.gopkg/api/bundle_handler.gopkg/api/bundle_handler_test.gopkg/api/recipe_handler.gopkg/api/recipe_handler_test.gopkg/api/server.gopkg/api/server_test.gopkg/bundler/handler.gopkg/cli/bundle.gopkg/cli/config_integration_test.gopkg/cli/mirror.gopkg/cli/query.gopkg/cli/recipe.gopkg/cli/recipe_test.gopkg/cli/root.gopkg/cli/validate.gopkg/cli/validate_evidence.gopkg/config/resolve.gopkg/recipe/criteria.gopkg/recipe/criteria_registry.gopkg/recipe/criteria_registry_test.gopkg/recipe/handler_query.gopkg/recipe/handler_query_test.gopkg/recipe/loader.gopkg/recipe/loader_provider_test.gopkg/recipe/metadata.gopkg/recipe/metadata_store.gopkg/validator/catalog/catalog.gopkg/validator/catalog/catalog_test.gopkg/validator/options.gopkg/validator/types.gopkg/validator/validator.goquery.gotypes.go
| // Build the Client from the resolved data source (--data flag, else | ||
| // spec.recipe.data). A filesystem source layers the external dir over | ||
| // the embedded data; an empty data dir uses the embedded data only. The | ||
| // Client owns its DataProvider — LoadRecipe and MakeBundle thread it | ||
| // through, replacing the old process-global data provider. | ||
| dataDir := cmd.String("data") | ||
| if dataDir == "" { | ||
| dataDir = cfg.Recipe().DataDir() | ||
| } | ||
| source := aicr.EmbeddedSource() | ||
| if dataDir != "" { | ||
| slog.Info("initializing external data provider", "directory", dataDir) | ||
| source = aicr.FilesystemSource(dataDir) | ||
| } | ||
| client, err := aicr.NewClient( | ||
| aicr.WithRecipeSource(source), | ||
| aicr.WithVersion(version), | ||
| ) | ||
| if err != nil { | ||
| return errors.Wrap(errors.ErrCodeInternal, "failed to initialize data provider", err) | ||
| } | ||
| defer func() { _ = client.Close() }() |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial | ⚡ Quick win
Extract shared client construction to avoid duplication.
This client construction block duplicates recipeClientFromCmd in root.go (same data-dir resolution, source selection, logging message, and error wrapping). Consider calling recipeClientFromCmd(cmd, cfg) here instead of reimplementing the logic inline.
♻️ Suggested refactor
- // Build the Client from the resolved data source (--data flag, else
- // spec.recipe.data). A filesystem source layers the external dir over
- // the embedded data; an empty data dir uses the embedded data only. The
- // Client owns its DataProvider — LoadRecipe and MakeBundle thread it
- // through, replacing the old process-global data provider.
- dataDir := cmd.String("data")
- if dataDir == "" {
- dataDir = cfg.Recipe().DataDir()
- }
- source := aicr.EmbeddedSource()
- if dataDir != "" {
- slog.Info("initializing external data provider", "directory", dataDir)
- source = aicr.FilesystemSource(dataDir)
- }
- client, err := aicr.NewClient(
- aicr.WithRecipeSource(source),
- aicr.WithVersion(version),
- )
- if err != nil {
- return errors.Wrap(errors.ErrCodeInternal, "failed to initialize data provider", err)
- }
+ client, err := recipeClientFromCmd(cmd, cfg)
+ if err != nil {
+ return err
+ }
defer func() { _ = client.Close() }()📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| // Build the Client from the resolved data source (--data flag, else | |
| // spec.recipe.data). A filesystem source layers the external dir over | |
| // the embedded data; an empty data dir uses the embedded data only. The | |
| // Client owns its DataProvider — LoadRecipe and MakeBundle thread it | |
| // through, replacing the old process-global data provider. | |
| dataDir := cmd.String("data") | |
| if dataDir == "" { | |
| dataDir = cfg.Recipe().DataDir() | |
| } | |
| source := aicr.EmbeddedSource() | |
| if dataDir != "" { | |
| slog.Info("initializing external data provider", "directory", dataDir) | |
| source = aicr.FilesystemSource(dataDir) | |
| } | |
| client, err := aicr.NewClient( | |
| aicr.WithRecipeSource(source), | |
| aicr.WithVersion(version), | |
| ) | |
| if err != nil { | |
| return errors.Wrap(errors.ErrCodeInternal, "failed to initialize data provider", err) | |
| } | |
| defer func() { _ = client.Close() }() | |
| client, err := recipeClientFromCmd(cmd, cfg) | |
| if err != nil { | |
| return err | |
| } | |
| defer func() { _ = client.Close() }() |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@pkg/cli/bundle.go` around lines 605 - 626, The client construction block
duplicates logic in recipeClientFromCmd; replace the inline data-dir resolution
and aicr.NewClient call with a single call to recipeClientFromCmd(cmd, cfg),
capture the returned client and error, and keep the same defer to close the
client (defer func() { _ = client.Close() }()). Ensure you remove the duplicated
variables/dataDir logic and still return the wrapped error from the
recipeClientFromCmd failure path so behavior and logging (including the
"initializing external data provider" message) remain identical.
| // Build the Client from the resolved data source (--data flag, | ||
| // else spec.recipe.data). A filesystem source layers the external | ||
| // dir over the embedded data; an empty data dir uses the embedded | ||
| // data only. The Client owns its DataProvider, replacing the old | ||
| // process-global data provider. dataProvider is the | ||
| // same provider, built once so it can also be threaded into evidence | ||
| // emission (so --data overlays resolve the validator catalog against | ||
| // the command's source rather than the package global). | ||
| dataDir := cmd.String("data") | ||
| if dataDir == "" { | ||
| dataDir = cfg.Recipe().DataDir() | ||
| } | ||
| source := aicr.EmbeddedSource() | ||
| if dataDir != "" { | ||
| slog.Info("initializing external data provider", "directory", dataDir) | ||
| source = aicr.FilesystemSource(dataDir) | ||
| } | ||
| client, err := aicr.NewClient( | ||
| aicr.WithRecipeSource(source), | ||
| aicr.WithVersion(version), | ||
| ) | ||
| if err != nil { | ||
| return errors.PropagateOrWrap(err, errors.ErrCodeInternal, "failed to initialize data provider") | ||
| } | ||
| defer func() { _ = client.Close() }() |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial | ⚡ Quick win
Consolidate client construction with recipeClientFromCmd.
This block duplicates the client construction logic from recipeClientFromCmd in root.go. Consider reusing that helper to avoid drift.
♻️ Suggested refactor
- // Build the Client from the resolved data source (--data flag,
- // else spec.recipe.data). A filesystem source layers the external
- // dir over the embedded data; an empty data dir uses the embedded
- // data only. The Client owns its DataProvider, replacing the old
- // process-global data provider. dataProvider is the
- // same provider, built once so it can also be threaded into evidence
- // emission (so --data overlays resolve the validator catalog against
- // the command's source rather than the package global).
- dataDir := cmd.String("data")
- if dataDir == "" {
- dataDir = cfg.Recipe().DataDir()
- }
- source := aicr.EmbeddedSource()
- if dataDir != "" {
- slog.Info("initializing external data provider", "directory", dataDir)
- source = aicr.FilesystemSource(dataDir)
- }
- client, err := aicr.NewClient(
- aicr.WithRecipeSource(source),
- aicr.WithVersion(version),
- )
- if err != nil {
- return errors.PropagateOrWrap(err, errors.ErrCodeInternal, "failed to initialize data provider")
- }
+ client, err := recipeClientFromCmd(cmd, cfg)
+ if err != nil {
+ return err
+ }
defer func() { _ = client.Close() }()🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@pkg/cli/validate.go` around lines 747 - 771, This code duplicates client
construction logic from recipeClientFromCmd; replace the block that builds
dataDir/source and calls aicr.NewClient with a single call to
recipeClientFromCmd (passing the command context/flags and cfg and version as
required), handle its returned client and error the same way, and keep the defer
client.Close() cleanup; ensure you remove the local dataDir/source logic so the
shared recipeClientFromCmd is the single source of truth.
| var criteriaRegistryCache sync.Map // map[DataProvider]*criteriaRegistryCacheEntry | ||
|
|
||
| // GetCriteriaRegistryFor returns the criteria registry for the supplied | ||
| // DataProvider, constructing it lazily on first access. Concurrent callers | ||
| // with the same provider observe the same singleton; distinct providers | ||
| // populate distinct cache entries and never share state. Each per-provider | ||
| // registry honors AICR_CRITERIA_STRICT at construction, exactly like the | ||
| // global default. | ||
| // | ||
| // A nil provider falls back to GetDataProvider() so the legacy global path — | ||
| // DefaultRegistry and the package-level ParseCriteria*Type shims — continues | ||
| // to work transparently and shares a single registry with the embedded | ||
| // provider. | ||
| func GetCriteriaRegistryFor(dp DataProvider) *CriteriaRegistry { | ||
| if dp == nil { | ||
| dp = GetDataProvider() //nolint:staticcheck // back-compat fallback for pre-WithDataProvider callers (#983 Stage 2) | ||
| } | ||
| e, _ := criteriaRegistryCache.LoadOrStore(dp, &criteriaRegistryCacheEntry{}) | ||
| entry := e.(*criteriaRegistryCacheEntry) | ||
| entry.once.Do(func() { | ||
| entry.registry = newCriteriaRegistry() | ||
| }) | ||
| return entry.registry | ||
| } |
There was a problem hiding this comment.
Key criteria-registry cache entries by provider generation, not only provider identity.
Caching only on DataProvider identity can preserve stale registries when a settable provider’s underlying state changes while identity remains stable. Include the provider generation in the cache key so late-bound configuration reliably rebuilds.
As per coding guidelines, "Sync.Once caching — Key cache by generation counter when state depends on settable global (e.g., DataProvider-tied registry); recompute on miss so late-bound config takes effect".
Also applies to: 299-301
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@pkg/recipe/criteria_registry.go` around lines 235 - 258, The cache currently
keys only by DataProvider identity (criteriaRegistryCache and
GetCriteriaRegistryFor), which preserves stale registries when a mutable
provider's internal state changes; change the cache key to include the
provider's generation counter so a provider state change creates a new entry.
Concretely, in GetCriteriaRegistryFor compute a composite key (e.g., struct{dp
DataProvider; gen uint64} or a string like fmt.Sprintf("%p-%d", dp, gen)) where
gen is obtained from the provider's generation accessor (e.g., dp.Generation()
or the appropriate GetGeneration/GetVersion method), then LoadOrStore using that
composite key and keep the rest (criteriaRegistryCacheEntry, once,
newCriteriaRegistry) unchanged; apply the same keying change to the other cache
use sites noted around the 299-301 region.
| cat, err := catalog.LoadWithDataProvider(v.dataProvider, v.Version, v.Commit) | ||
| if err != nil { | ||
| return nil, errors.Wrap(errors.ErrCodeInternal, "failed to load validator catalog", err) | ||
| } |
There was a problem hiding this comment.
Preserve structured catalog-load error codes instead of forcing ErrCodeInternal.
Both call sites currently rewrap catalog-load errors with ErrCodeInternal, which masks upstream structured codes (for example malformed catalog input). Propagate-or-wrap instead so existing coded errors stay intact.
Suggested patch
- cat, err := catalog.LoadWithDataProvider(v.dataProvider, v.Version, v.Commit)
+ cat, err := catalog.LoadWithDataProvider(v.dataProvider, v.Version, v.Commit)
if err != nil {
- return nil, errors.Wrap(errors.ErrCodeInternal, "failed to load validator catalog", err)
+ return nil, errors.PropagateOrWrap(err, errors.ErrCodeInternal, "failed to load validator catalog")
}
@@
- cat, err := catalog.LoadWithDataProvider(v.dataProvider, v.Version, v.Commit)
+ cat, err := catalog.LoadWithDataProvider(v.dataProvider, v.Version, v.Commit)
if err != nil {
- return nil, errors.Wrap(errors.ErrCodeInternal, "failed to load validator catalog", err)
+ return nil, errors.PropagateOrWrap(err, errors.ErrCodeInternal, "failed to load validator catalog")
}As per coding guidelines: “Don't double-wrap errors that already have proper codes; propagate as-is to preserve error code.”
Also applies to: 240-243
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@pkg/validator/validator.go` around lines 182 - 185, The
catalog.LoadWithDataProvider call currently always rewraps any error as
errors.ErrCodeInternal (in the block around
catalog.LoadWithDataProvider(v.dataProvider, v.Version, v.Commit)), which hides
upstream structured codes; change the handling to propagate coded errors
unmodified and only wrap unknown errors: after the call, detect if the returned
err already carries a structured code (the package's coded/error type), and if
so return it directly, otherwise wrap with errors.Wrap(errors.ErrCodeInternal,
"failed to load validator catalog", err). Apply the same propagate-or-wrap
change to the other occurrence around lines 240-243 so existing catalog error
codes are preserved.
| hydrated, err := recipe.HydrateResult(r) | ||
| if err != nil { | ||
| return nil, errors.Wrap(errors.ErrCodeInternal, "hydrate recipe", err) | ||
| } |
There was a problem hiding this comment.
Preserve structured error codes from hydration failures.
recipe.HydrateResult errors may already carry a structured code; forcing ErrCodeInternal here can clobber it. Use errors.PropagateOrWrap to preserve existing codes.
Suggested fix
hydrated, err := recipe.HydrateResult(r)
if err != nil {
- return nil, errors.Wrap(errors.ErrCodeInternal, "hydrate recipe", err)
+ return nil, errors.PropagateOrWrap(err, errors.ErrCodeInternal, "hydrate recipe")
}🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@query.go` around lines 29 - 32, The current handling of errors from
recipe.HydrateResult always wraps the error with errors.ErrCodeInternal, which
can overwrite existing structured error codes; replace the return logic to use
errors.PropagateOrWrap (or the library's propagate helper) so that if the
hydrated error already carries a code it is propagated unchanged, otherwise it
is wrapped with a contextual message like "hydrate recipe" and
ErrCodeInternal—update the call site where recipe.HydrateResult(r) is checked to
use errors.PropagateOrWrap(err, errors.ErrCodeInternal, "hydrate recipe")
instead of errors.Wrap.
# Conflicts: # aicr_test.go # pkg/cli/recipe.go
The criteria-registry seeding/strict tests construct per-provider registries that honor AICR_CRITERIA_STRICT at build time. make test sets it to 1 (the CI gate), which hid external --data criteria values and broke the seeding/precondition assertions. Clear the env via t.Setenv so each test's freshly-built registry starts non-strict; strict behavior is still exercised explicitly via SetStrict.
|
@hkii this PR now has merge conflicts with |
Thanks, this is a useful read and most of it I agree with. Going point by point:
And you caught the right thing on "parity": the validation-timeout uncap is a real behavior change, not parity. The old CLI was bounded only
The #1078 ordering is the one with real consequences, so I'd like to settle that before I cut PR-1. Happy to pair on it. |
|
🌿 Preview your docs: https://nvidia-preview-feat-cli-api-consume-client.docs.buildwithfern.com/aicr |
|
Closing in favor of the 2-PR split @hkii landed:
Thanks @hkii for splitting this. The 3 correctness fixes called out in earlier review (deep-copy Epic tracking moves to #1107 → #1108, then #1078 + #1080 to close #1076. |
Summary
Route the CLI and API server entirely through the public
aicr.Clientfacade, and replace the process-globalDataProvider/criteria registry with per-Clientstate. This makes the facade the single integration surface for both internal consumers and external Go importers, and lets multipleClients resolve recipes concurrently without clobbering shared globals.Motivation / Context
AICR maintained three independent component-initialization paths: the CLI used the deprecated
recipe.SetDataProviderprocess-global; the REST server directly constructedrecipe.Builder,bundler.Bundler, andvalidator.Validator; and the facade used the new per-instanceaicr.Client. Every change to wiring policy had to be applied in all three places. Per the project principle that "business logic lives in functional packages so CLI and API can both use it," the facade is the proper single aggregation point.This PR migrates the CLI and REST server to consume
aicr.Clientas thin presentation layers, retires the legacy global-state and direct-construction wiring, and extends the facade (rather than bypassing it) where new capabilities were needed — preserving behavioral parity across all CLI flags and REST endpoints.Fixes: #1077
Related: N/A
Type of Change
Component(s) Affected
cmd/aicr,pkg/cli)cmd/aicrd,pkg/api,pkg/server)pkg/recipe)pkg/bundler,pkg/component/*)pkg/validator)docs/,examples/)aicr.go,bundle.go,query.go,options.go,types.go)Implementation Notes
Facade surface (new):
Client.LoadRecipe(provider-bound file load),MakeBundle(full deployer-mode bundle),ResolveRecipeFromSnapshot(constraint eval vs a snapshot),ResolveRecipeFromCriteria,SelectFromRecipe(hydrate + select),LoadCatalog,Resolved(), plusCriteriaRegistryaccessor.WithVersion,WithAllowLists+ParseAllowListsFromEnv(with enforcement),EmbeddedSourcefor embedded-only data, and validation optionsWithValidationPhases/Timeout/Commit/ImageRegistryOverride/ImageTagOverride.Recipe/AllowLists/Criteria).Per-
Clientisolation (replaces process global):pkg/recipegains a per-DataProvidercriteria registry with a global shim for back-compat;LoadFromFileWithProviderandcatalog.Loadnow take aDataProviderinstead of reading a hidden global.pkg/configaddsResolveCriteriaWithRegistryfor per-provider validation.initDataProvideris dropped;recipe/query/mirror/bundle/validateand API/v1/recipe/v1/query/v1/bundleall consumeaicr.Client.bundler.Newis removed fromserver.go.Correctness fixes folded in:
ComponentRef.Overridesnested values in clone, and deep-copy the recipe inAdoptRecipe, to preserveClientisolation (no slice/map aliasing across overlay merges).validatepreserves structured error codes viaPropagateOrWrap.*time.Duration): the CLI passes0to uncap so an all-phase run isn't cut short by a fixed cap; controllers keep the default. Tolerations use an explicittolerationsSetflag so a nil/empty override correctly clears the validator's default tolerate-all instead of being dropped.Testing
New/expanded coverage:
aicr_test.go,aicr_internal_test.go,bundle_test.go,pkg/api/{recipe,bundle}_handler_test.go,pkg/recipe/{criteria_registry,loader_provider}_test.go,pkg/validator/catalog/catalog_test.go.Risk Assessment
DataProvider/criteria registry is wired. Behavior is preserved via a global shim and defensive copies; no external API contract changes.Rollout notes: No migration required for external importers — the
aicr.Clientsurface is additive and the prior process-global path retains a back-compat shim. Concurrent-Clientisolation is a v0.12+ guarantee (documented inaicr.go); long-running consumers should retain and reuseClients and callClose()to avoid monotonic cache growth.Checklist
make testwith-race)make lint)docs/integrator/go-library.md,docs/integrator/public-api.md)git commit -S)