This guide covers deploying AI applications to production using Azure Developer CLI (azd) and infrastructure as code.
The fastest path to production using azd:

```bash
# Initialize project
azd init
# Deploy everything
azd up
# Deploy only infrastructure
azd provision
# Deploy only application
azd deploy
```

For enterprise deployments with full governance:

```bash
# Clone with submodules
git clone --recurse-submodules \
https://github.com/microsoft/Deploy-Your-AI-Application-In-Production.git
cd Deploy-Your-AI-Application-In-Production
# Deploy (~45 minutes)
azd up
```

What you get:
- Azure AI Foundry with OpenAI models
- Microsoft Fabric with lakehouses
- Azure AI Search with vector search
- Microsoft Purview governance
- Private networking throughout
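Once `azd up` finishes, `azd env get-values` lists the provisioned endpoints and settings for use in the steps below.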
Deploy the AI Landing Zone foundation separately with Bicep or Terraform:

```bash
# Bicep
cd AI-Landing-Zones/bicep
az deployment sub create -l eastus -f main.bicep
# Terraform
cd AI-Landing-Zones/terraform
terraform init
terraform apply
```

Before you deploy, verify these prerequisites:

| Requirement | How to Check |
|---|---|
| Azure Subscription | `az account show` |
| Sufficient quota | Azure Portal → Quotas |
| Required permissions | Owner, or Contributor + User Access Administrator |
| Azure CLI installed | `az --version` (≥2.61.0) |
| Azure Developer CLI | `azd version` (≥1.15.0) |
```bash
# List current quota
az cognitiveservices account list-skus \
--location eastus \
--query "[?contains(name, 'OpenAI')]"
```

The deployment requires these roles:

| Role | Scope | Purpose |
|---|---|---|
| Contributor | Subscription | Create resources |
| User Access Administrator | Subscription | Assign roles |
| Azure AI User | Resource Group | Use AI services |
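You can inspect your current assignments with `az role assignment list --assignee <your-user-or-service-principal>`.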
The infrastructure code is organized as follows:

```
infra/
├── main.bicep              # Entry point
├── main.bicepparam         # Parameters
├── modules/
│   ├── ai-foundry.bicep    # AI Foundry resources
│   ├── networking.bicep    # VNets, private endpoints
│   ├── security.bicep      # Key Vault, RBAC
│   └── monitoring.bicep    # Log Analytics, App Insights
└── scripts/
    └── post-deploy.ps1     # Post-deployment configuration
```

```bicep
// main.bicep
targetScope = 'resourceGroup'
param location string = resourceGroup().location
param environmentName string
param principalId string
module aiFoundry 'modules/ai-foundry.bicep' = {
  name: 'ai-foundry'
  params: {
    location: location
    name: 'aif-${environmentName}'
    principalId: principalId
  }
}

// networking.bicep creates the VNet and the private endpoints for AI Foundry
module networking 'modules/networking.bicep' = {
  name: 'networking'
  params: {
    location: location
    aiFoundryId: aiFoundry.outputs.id
  }
}
```

```yaml
# azure.yaml
name: my-ai-app
metadata:
  template: azd-ai-starter
services:
  api:
    project: ./src/api
    language: python
    host: containerapp
  web:
    project: ./src/web
    language: typescript
    host: staticwebapp
infra:
  provider: bicep
  path: ./infra
```
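The `api` service above should expose the health probe that the verification step later curls. A minimal sketch, assuming FastAPI (the framework is an assumption; any HTTP framework works):

```python
# src/api/main.py -- health probe for the api service (FastAPI assumed)
from fastapi import FastAPI

app = FastAPI()

@app.get("/health")
async def health() -> dict:
    # Liveness payload; extend with checks of downstream dependencies as needed
    return {"status": "ok"}
```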
All production deployments should use private endpoints:

```bicep
resource privateEndpoint 'Microsoft.Network/privateEndpoints@2023-05-01' = {
  name: 'pe-${serviceName}'
  location: location
  properties: {
    subnet: {
      id: subnetId
    }
    privateLinkServiceConnections: [
      {
        name: 'plsc-${serviceName}'
        properties: {
          privateLinkServiceId: serviceId
          groupIds: [groupId]
        }
      }
    ]
  }
}
```

Grant access through a user-assigned managed identity with a scoped role assignment:

```bicep
resource managedIdentity 'Microsoft.ManagedIdentity/userAssignedIdentities@2023-01-31' = {
  name: 'id-${environmentName}'
  location: location
}

resource roleAssignment 'Microsoft.Authorization/roleAssignments@2022-04-01' = {
  name: guid(resourceGroup().id, managedIdentity.id, 'Cognitive Services User')
  properties: {
    roleDefinitionId: subscriptionResourceId(
      'Microsoft.Authorization/roleDefinitions',
      'a97b65f3-24c7-4388-baec-2e87135dc908' // Cognitive Services User
    )
    principalId: managedIdentity.properties.principalId
    principalType: 'ServicePrincipal'
  }
}
```
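At runtime the application can authenticate as this identity instead of storing API keys. A minimal sketch using `azure-identity` with the OpenAI SDK's Azure client; the `AZURE_OPENAI_ENDPOINT` variable and the `gpt-4o` deployment name are assumptions carried over from the examples above:

```python
import os

from azure.identity import DefaultAzureCredential, get_bearer_token_provider
from openai import AzureOpenAI

# In Azure, DefaultAzureCredential resolves to the user-assigned managed identity
# (set AZURE_CLIENT_ID to its client ID); locally it falls back to developer logins.
credential = DefaultAzureCredential()
token_provider = get_bearer_token_provider(
    credential, "https://cognitiveservices.azure.com/.default"
)

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],  # assumed environment variable
    azure_ad_token_provider=token_provider,
    api_version="2024-06-01",
)

response = client.chat.completions.create(
    model="gpt-4o",  # your model deployment name
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)
```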
Instrument the application with OpenTelemetry so traces and metrics flow to Application Insights:

```python
# Python application
import os

from azure.monitor.opentelemetry import configure_azure_monitor
from opentelemetry import trace
configure_azure_monitor(
    connection_string=os.environ["APPLICATIONINSIGHTS_CONNECTION_STRING"]
)
tracer = trace.get_tracer(__name__)
```
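Custom metrics recorded through OpenTelemetry are exported to the `customMetrics` table queried below. A sketch of timing an agent call; the `agent.latency` metric name and the placeholder `run_agent` function are illustrative:

```python
import time

from opentelemetry import metrics, trace

tracer = trace.get_tracer(__name__)
meter = metrics.get_meter(__name__)
# configure_azure_monitor (above) exports this histogram to Application Insights
agent_latency = meter.create_histogram("agent.latency", unit="ms")

def run_agent(message: str) -> str:
    # Placeholder for the real agent invocation
    return f"echo: {message}"

def handle_chat(message: str) -> str:
    with tracer.start_as_current_span("agent.chat"):
        start = time.perf_counter()
        reply = run_agent(message)
        agent_latency.record((time.perf_counter() - start) * 1000.0)
        return reply
```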
Query agent performance in Log Analytics:

```kusto
// Agent performance metrics
customMetrics
| where name startswith "agent."
| summarize
    avg_latency = avg(value),
    p95_latency = percentile(value, 95),
    call_count = count()
    by name, bin(timestamp, 1h)
| order by timestamp desc
```

Automate provisioning and deployment with GitHub Actions and OIDC federated credentials:

```yaml
# .github/workflows/deploy.yml
name: Deploy to Azure

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

permissions:
  id-token: write
  contents: read

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Install azd
        uses: Azure/setup-azd@v1

      - name: Log in with Azure (Federated Credentials)
        run: |
          azd auth login \
            --client-id "${{ secrets.AZURE_CLIENT_ID }}" \
            --federated-credential-provider github \
            --tenant-id "${{ secrets.AZURE_TENANT_ID }}"

      - name: Provision Infrastructure
        run: azd provision --no-prompt
        env:
          AZURE_SUBSCRIPTION_ID: ${{ secrets.AZURE_SUBSCRIPTION_ID }}
          AZURE_ENV_NAME: ${{ github.ref == 'refs/heads/main' && 'prod' || 'staging' }}

      - name: Deploy Application
        run: azd deploy --no-prompt
```
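The workflow assumes a Microsoft Entra application with a GitHub federated credential, and that `AZURE_CLIENT_ID`, `AZURE_TENANT_ID`, and `AZURE_SUBSCRIPTION_ID` are configured as repository secrets.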
After deployment, verify the service end to end:

```bash
# Check service health
curl https://your-app.azurewebsites.net/health
# Verify AI endpoint
curl -X POST "https://your-ai-foundry.openai.azure.com/openai/deployments/gpt-4o/chat/completions?api-version=2024-06-01" \
-H "Authorization: Bearer $(az account get-access-token --resource https://cognitiveservices.azure.com --query accessToken -o tsv)" \
-H "Content-Type: application/json" \
-d '{"messages": [{"role": "user", "content": "Hello"}]}'import pytest
import httpx
@pytest.mark.smoke
async def test_agent_endpoint_responds():
async with httpx.AsyncClient() as client:
response = await client.post(
f"{BASE_URL}/api/agent/chat",
json={"message": "Hello"},
timeout=30.0
)
assert response.status_code == 200
assert "response" in response.json()