Add azure deployment skill #842

Open. shrek wants to merge 7 commits into NVIDIA:main from shrek:add-azure-skill
shrek (Collaborator) commented May 1, 2026

Earth2Studio Pull Request

Description

Add a skill that helps deploy an Earth2Studio inference container on Azure. It guides the user through building the container, deploying it as an Azure ML online endpoint, and testing inference. In my tests, it was a useful helper for navigating the Azure CLI and debugging errors.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.
  • The CHANGELOG.md is up to date with these changes.
  • An issue is linked to this pull request.
  • Assess and address Greptile feedback (AI code review bot for guidance; use discretion, addressing all feedback is not required).

Dependencies

shrek requested review from NickGeneva and swbg on May 1, 2026 16:42
shrek marked this pull request as ready for review on May 4, 2026 21:53
greptile-apps Bot (Contributor) commented May 4, 2026

Greptile Summary

This PR adds a Claude skill and a set of Azure ML deployment runbooks, YAML templates, and a smoke-test request to help users deploy the Earth2Studio inference server as an Azure ML managed online endpoint. The documentation (runbooks, SKILL.md, smoke request) is well-structured and the operational guidance is accurate, but the canonical YAML templates carry real-looking internal resource identifiers instead of placeholders.

  • foundry_fcn3.deployment.yml hardcodes e2scontainerregistry.azurecr.io/e2s-scicomp:981d7bc2, azureml:e2s_fcn3_stormscope:1, and a date-specific endpoint name. In a public repo these expose internal Azure identifiers, and any external user who starts from this template will fail immediately because the image tag and model registration don't exist in their environment. These should use bracketed placeholders matching the style of the markdown runbooks.
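
As a rough illustration of how such values could be caught before merge, the sketch below scans YAML text for ACR image references and Azure ML model references that look concrete rather than bracketed. The function name `find_hardcoded_identifiers`, the regular expressions, and the angle-bracket placeholder style (for example `<acr-name>`) are assumptions made for this example; they are not part of the PR, and the runbooks' actual placeholder style may differ.

```python
import re

# Hypothetical patterns for values that look like real Azure resources
# rather than bracketed placeholders such as <acr-name> (assumed style).
HARDCODED_PATTERNS = {
    # <registry>.azurecr.io/<repository>:<tag> with concrete names
    "acr_image": re.compile(r"\b[a-z0-9]+\.azurecr\.io/[\w./-]+:[\w.-]+"),
    # azureml:<model-name>:<version> with a concrete name and numeric version
    "azureml_model": re.compile(r"\bazureml:[A-Za-z0-9_.-]+:\d+"),
}


def find_hardcoded_identifiers(yaml_text: str) -> list[tuple[int, str, str]]:
    """Return (line_number, kind, matched_text) for each suspicious value."""
    findings = []
    for lineno, line in enumerate(yaml_text.splitlines(), start=1):
        for kind, pattern in HARDCODED_PATTERNS.items():
            for match in pattern.finditer(line):
                findings.append((lineno, kind, match.group(0)))
    return findings


if __name__ == "__main__":
    # Illustrative snippet using the values quoted in the finding above.
    sample = (
        "model: azureml:e2s_fcn3_stormscope:1\n"
        "environment:\n"
        "  image: e2scontainerregistry.azurecr.io/e2s-scicomp:981d7bc2\n"
    )
    for finding in find_hardcoded_identifiers(sample):
        print(finding)
```

A check along these lines could run in CI or a pre-commit hook. Text that uses angle-bracket placeholders does not match either pattern, so a template written that way would pass.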

Confidence Score: 3/5

Safe to merge after replacing hardcoded internal resource identifiers in the YAML templates with bracketed placeholders.

One P1 security/usability finding in the deployment YAML template (real internal ACR URL, model registration, and date-specific endpoint name committed to a public repo). The rest of the documentation is accurate and well-written. Score is held at 3 due to the P1 with a security dimension.

serve/server/deployment/azure/foundry_fcn3.deployment.yml and serve/server/deployment/azure/foundry_fcn3.endpoint.yml need their hardcoded internal resource values replaced with placeholders.
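
Once the templates carry placeholders, a consumer still has to fill them in before deploying. The following is a minimal sketch of that step, assuming angle-bracket placeholders such as `<acr-name>`. The helper `render_template` and every placeholder name shown are hypothetical and do not come from the PR.

```python
import re

# Assumed placeholder syntax: lowercase names with hyphens inside angle brackets.
PLACEHOLDER = re.compile(r"<([a-z][a-z0-9-]*)>")


def render_template(template: str, values: dict[str, str]) -> str:
    """Replace every <name> placeholder; fail loudly if any value is missing."""
    missing = sorted(
        {name for name in PLACEHOLDER.findall(template) if name not in values}
    )
    if missing:
        raise KeyError(f"missing values for placeholders: {', '.join(missing)}")
    return PLACEHOLDER.sub(lambda m: values[m.group(1)], template)


if __name__ == "__main__":
    line = "image: <acr-name>.azurecr.io/<image>:<tag>"
    # All three values below are made up for the example.
    print(render_template(line, {"acr-name": "myregistry", "image": "e2s", "tag": "v1"}))
```

Raising on a missing value, rather than leaving the placeholder in the output, surfaces the problem before the file reaches the Azure CLI.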

Security Review

  • Infrastructure identifier exposure (serve/server/deployment/azure/foundry_fcn3.deployment.yml): The file committed to this public repo contains a real ACR hostname (e2scontainerregistry.azurecr.io), a specific git-SHA image tag, and an internal Azure ML model registration (azureml:e2s_fcn3_stormscope:1). These identifiers are enumerable by anyone with access to the repo and could be used to probe or target the underlying Azure infrastructure.

Important Files Changed

| Filename | Overview |
| --- | --- |
| .claude/skills/deploy-earth2studio-azure/SKILL.md | New Claude skill for Azure ML deployment, with clear workflow steps, IAM/networking guidance, and iterative improvement instructions. Well structured. |
| serve/server/deployment/azure/foundry_fcn3.deployment.yml | Deployment template contains hardcoded internal NVIDIA ACR URL, specific git-SHA image tag, and internal Azure ML model registration (not placeholder values), which exposes internal infra identifiers in a public repo and will fail for any external consumer. |
| serve/server/deployment/azure/foundry_fcn3.endpoint.yml | Minimal endpoint definition with key auth and a date-embedded name; functional but the specific name mirrors the deployment YAML issue. |
| serve/server/deployment/azure/azure-ml-managed-online.md | Clear operational runbook for creating endpoints, assigning roles, and getting logs; all placeholder values are properly bracketed. |
| serve/server/deployment/azure/earth2studio-serving.md | Build/push instructions and container behavior notes; placeholder values are bracketed and paths are correct. |
| serve/server/deployment/azure/inference-and-results.md | Comprehensive inference testing guide covering CLI, direct HTTP, and xarray access patterns; consolidated=True is correct since the server calls zarr.consolidate_metadata before upload. |
| serve/server/deployment/azure/requests/foundry_fcn3_smoke.json | Minimal smoke-test request with sensible defaults (n_steps=1, n_samples=1); looks correct. |

Reviews (1): Last reviewed commit: "Merge branch 'main' into add-azure-skill"

Comment thread serve/server/deployment/azure/foundry_fcn3.deployment.yml Outdated
Comment thread serve/server/deployment/azure/foundry_fcn3.endpoint.yml Outdated
shrek force-pushed the add-azure-skill branch from 1b44781 to c02002b on May 5, 2026 16:49