This template provides a complete foundation for building intelligent AI agents using the Microsoft Semantic Kernel SDK for Python with Azure AI Foundry integration. The solution features a two-tier architecture with a FastAPI-based backend agent and a web-based frontend, all deployed on Azure App Service.
The solution also implements the AI Gateway patterns and capabilities of Azure API Management, including secure, load-balanced access to backend AI models and policies for request rate limits and token quotas.
The deployment follows a multi-resource group design with infrastructure-as-code using Azure Bicep modules, providing enterprise-grade security, monitoring, and scalability.
- Features - Detailed overview of AI agent backend, frontend, infrastructure services and security features
- Getting Started - Setup options including GitHub Codespaces, VS Code Dev Containers, and local environment
- Quickstart - Provisioning, local development, extending the solution, and cleanup
- Guidance - Region availability, quotas, dependencies, configuration, monitoring, security and performance
- User enters a question in the frontend chat interface
- Frontend sends a POST request to the backend `/chat` endpoint with the session ID and user input
- Backend agent retrieves conversation history from Cosmos DB
- Semantic Kernel processes the input using configured plugins
- Agent may invoke MCP plugins (Microsoft Learn docs, weather data) as needed
- AI model response is generated via APIM-proxied Azure AI Foundry endpoints
- Response is persisted to Cosmos DB with tool usage tracking
- Backend returns response with answer, used tools, and token metrics
- Frontend displays the response in the chat interface
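The flow above can be sketched as a minimal in-memory model. This is illustrative only: the real backend uses FastAPI, Cosmos DB, and Semantic Kernel, and the function and variable names here are hypothetical.

```python
# In-memory stand-in for Cosmos DB conversation storage.
conversations: dict[str, list[dict]] = {}

def handle_chat(session_id: str, user_input: str) -> dict:
    """Mimics the backend /chat endpoint contract (sketch, not the real handler)."""
    # 1. Retrieve conversation history (Cosmos DB in the real template).
    history = conversations.setdefault(session_id, [])
    history.append({"role": "user", "content": user_input})

    # 2. Generate a reply (Semantic Kernel + APIM-proxied model in the template).
    answer = f"Echo: {user_input}"
    used_tools: list[str] = []  # Populated when MCP plugins are invoked.

    # 3. Persist the response with tool usage tracking.
    history.append({"role": "assistant", "content": answer, "tools": used_tools})

    # 4. Return the answer, used tools, and token metrics to the frontend.
    return {
        "answer": answer,
        "used_tools": used_tools,
        "token_usage": {"prompt": len(user_input.split()),
                        "completion": len(answer.split())},
    }
```

The response shape mirrors the last two steps of the flow: the frontend only needs the answer, the tool list, and the token metrics to render the chat turn.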
API Management service acts as an AI gateway and intelligent load balancer for Azure OpenAI model deployments:
- Round-Robin Distribution: Requests are distributed across multiple model deployment instances
- Retry Logic: Automatic retry on transient failures (429, 503 errors)
- Backend Selection: Policy-based routing to available model endpoints
- Monitoring: Full telemetry through Application Insights
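In APIM these behaviors are expressed as gateway policies, but the combined round-robin + retry logic can be sketched in a few lines of Python. Backend names here are hypothetical and the retry budget is an assumption:

```python
import itertools

# Hypothetical pool of model deployment backends behind the gateway.
BACKENDS = ["openai-eastus", "openai-westus", "openai-swedencentral"]
_rotation = itertools.cycle(BACKENDS)  # round-robin distribution
TRANSIENT = {429, 503}  # status codes treated as transient failures

def send_with_retry(call, max_attempts=3):
    """Try successive backends in round-robin order, retrying on 429/503."""
    last_status = None
    for _ in range(max_attempts):
        backend = next(_rotation)
        status, body = call(backend)
        if status not in TRANSIENT:
            return backend, status, body
        last_status = status  # transient failure: move on to the next backend
    raise RuntimeError(f"all attempts failed (last status {last_status})")
```

In the deployed solution this selection and retry logic lives in APIM policy XML, not application code, so the backend pool can change without redeploying the agent.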
| Component | Description |
|---|---|
| APIM Load Balancing | Load balancing types, configuration options, and traffic distribution strategies for Azure OpenAI model deployments |
| APIM Load Balancing Examples | Bicep configuration examples for round-robin, weighted, and priority-based load balancing scenarios |
| APIM Policies | APIM policy definitions for managed identity authentication, rate limiting, token quotas, and security controls |
| APIM Application Insights | Application Insights integration setup for API-level logging, sampling, and monitoring configuration |
| APIM Azure Monitor | Azure Monitor integration setup for API-level logging, sampling, and monitoring configuration including LLM messages |
The agent extends its capabilities through Model Context Protocol (MCP) servers, exposed to the AI agent as tools. When a tool is used, its name and parameters are returned and displayed in the user interface after the model response.
For a complete guide to MCP server integration, available servers, and adding new plugins, check the MCP Servers Guide.
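To illustrate how tool names and parameters can be captured for display, here is a small tracking sketch. The decorator and the `get_weather` stand-in are hypothetical; in the template this information comes from Semantic Kernel's function-invocation metadata rather than a wrapper like this:

```python
import functools

# Accumulates tool invocations so the frontend can show
# tool names and parameters alongside the model response.
invocations: list[dict] = []

def track_tool(func):
    """Decorator that records the tool name and parameters on each call."""
    @functools.wraps(func)
    def wrapper(**params):
        invocations.append({"tool": func.__name__, "parameters": params})
        return func(**params)
    return wrapper

@track_tool
def get_weather(city: str) -> str:
    # Stand-in for a Weather MCP server tool.
    return f"Sunny in {city}"
```

After a chat turn, the accumulated `invocations` list is what the UI would render beneath the model's answer.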
Example of a model response including the Weather MCP tool result:
You can estimate the cost of this project's architecture with Azure's pricing calculator.
Key cost components:
- Azure OpenAI/AI Foundry: Pay-per-token pricing based on model and usage
- API Management: Developer SKU for development/testing
- App Service: Linux-based plan (shared across frontend and backend)
- Cosmos DB: Request Units (RU/s) based on conversation activity
- Application Insights: Data ingestion and retention
- Key Vault: Transaction-based pricing
- Storage Account: Minimal cost for blob/table/queue storage
Cost optimization tips:
- Use GPT-4.1-mini for lower token costs when appropriate
- Limit conversation history (max_items) to reduce Cosmos DB RU consumption
- Use APIM Standard SKU for production (caching, higher limits)
- Configure Application Insights sampling in high-traffic scenarios
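The `max_items` tip can be sketched as a simple cap applied to the conversation history before each write. The helper name is hypothetical; the point is only that bounding the stored history bounds per-operation RU consumption:

```python
def trim_history(history: list[dict], max_items: int = 20) -> list[dict]:
    """Keep only the most recent messages to cap Cosmos DB item size and RU cost."""
    return history[-max_items:] if max_items > 0 else []

# Example: a long-running session trimmed before persisting.
messages = [{"role": "user", "content": f"msg {i}"} for i in range(50)]
trimmed = trim_history(messages, max_items=10)
```

Smaller stored documents mean cheaper reads and writes, at the cost of the agent losing context older than the cap.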
Microsoft Semantic Kernel:
- Semantic Kernel Documentation
- Semantic Kernel Python SDK
- Model Context Protocol (MCP)
- MCP Server Integration Guide
Azure Services:
- Azure Developer CLI (azd)
- Azure AI Foundry
- Azure OpenAI Service
- Azure API Management
- Azure App Service (Python)
- Azure Cosmos DB
- Azure Application Insights
Development Tools:

