Azure FastAPI Wrapper over Azure OpenAI & Azure AI Agent Service

A FastAPI-based wrapper service for Azure OpenAI and Azure AI Agent Service with health monitoring, designed to run on Azure Container Apps with API Management load balancing. This service can be exposed as an MCP (Model Context Protocol) server through APIM.

Features

✅ FastAPI wrapper for Azure OpenAI completion and chat endpoints
✅ Azure AI Agent Service wrapper with Bing grounding capabilities
✅ Structured JSON responses with citations from grounded agents
✅ MCP Server deployment via Azure API Management
✅ Health check endpoint with Azure OpenAI connectivity verification
✅ Returns specific HTTP status codes from the health check (200, 401, 429, 500, 503)
✅ Ready for Azure Container Apps deployment
✅ APIM policies for load balancing with session affinity
✅ Circuit breaker pattern for automatic failover
✅ Automatic backend recovery on health restoration
✅ Extensible agent architecture using Abstract Base Classes

Architecture Overview

┌──────────────────────────────────────────────────────────────┐
│                    [CLIENT APPLICATIONS]                      │
│              (MCP Clients, Web Apps, APIs)                   │
└─────────────────────────┬────────────────────────────────────┘
                          │
                          ▼
┌──────────────────────────────────────────────────────────────┐
│              [AZURE API MANAGEMENT]                          │
│              • MCP Server Endpoint                           │
│              • Load Balancing                                │
│              • Circuit Breaker                               │
│              • Session Affinity                              │
└─────────────────────────┬────────────────────────────────────┘
                          │
         ┌────────────────┼────────────────┐
         │                │                │
         ▼                ▼                ▼
┌─────────────────┐ ┌─────────────────┐ ...
│  [CONTAINER     │ │  [CONTAINER     │
│   APP #1]       │ │   APP #2]       │
│                 │ │                 │
│  FastAPI Server │ │  FastAPI Server │
│  ┌───────────┐  │ │  ┌───────────┐  │
│  │ OpenAI    │  │ │  │ OpenAI    │  │
│  │ Wrapper   │  │ │  │ Wrapper   │  │
│  └───────────┘  │ │  └───────────┘  │
│  ┌───────────┐  │ │  ┌───────────┐  │
│  │ AI Agent  │  │ │  │ AI Agent  │  │
│  │ Wrapper   │  │ │  │ Wrapper   │  │
│  └─────┬─────┘  │ │  └─────┬─────┘  │
└────────┼────────┘ └────────┼────────┘
         │                   │
         └───────────┬───────┘
                     │
                     ▼
         ┌─────────────────────────┐
         │ [AZURE AI AGENT SERVICE]│
         │  • Bing Grounding       │
         │  • Citation Extraction  │
         │  • Thread Management    │
         └─────────────────────────┘

Getting Started

Local Development

Create virtual environment
```
_env_create.bat
```
Activate virtual environment
```
_env_activate.bat
```
Install dependencies
```
_install.bat
```

Configure environment variables

Copy env.sample to .env

Fill in your Azure OpenAI credentials:

OPENAI_ENDPOINT="https://your-instance.openai.azure.com/"
OPENAI_API_KEY="your-api-key"
OPENAI_API_VERSION="2025-01-01-preview"
OPENAI_MODEL_DEPLOYMENT_NAME="gpt-4"
OPENAI_PROMPT="You are a helpful assistant."
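
Before starting the server it can help to confirm that the variables above are actually set. The following is a minimal sketch, not the project's own loader: the function name `missing_settings` is hypothetical, and treating `OPENAI_PROMPT` as optional is an assumption.

```python
import os
from typing import List, Mapping

# Variables from env.sample. Treating OPENAI_PROMPT as optional is an
# assumption made for this sketch.
REQUIRED_VARS = (
    "OPENAI_ENDPOINT",
    "OPENAI_API_KEY",
    "OPENAI_API_VERSION",
    "OPENAI_MODEL_DEPLOYMENT_NAME",
)


def missing_settings(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the names of required variables that are unset or blank."""
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]
```

Calling `missing_settings()` at startup and refusing to run when the list is non-empty gives a clearer failure than a 401 or 503 from the first request.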

Start the server
```
_run_server.bat
```

The API will be available at http://localhost:8000

API Endpoints

GET /health

Health check endpoint that verifies Azure OpenAI connectivity.

Response Codes:

200 - Service healthy, Azure OpenAI connected
401 - Azure OpenAI authentication failed
429 - Azure OpenAI rate limit exceeded
503 - Azure OpenAI service unavailable or connection error
500 - Unexpected error

Example:

curl http://localhost:8000/health

Success Response:

{
  "status": "ok",
  "azure_openai": "connected"
}

Error Response (503):

{
  "status": "error",
  "error": "service_unavailable",
  "message": "Azure OpenAI service unavailable",
  "details": "..."
}
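
The status codes above can be thought of as a small mapping from the outcome of an Azure OpenAI probe to an HTTP response. This is an illustrative sketch only, not the code in app/main.py: the function name `health_result` is hypothetical, and every error label except `service_unavailable` is an assumption, since only the 503 body is documented here.

```python
from typing import Dict, Optional, Tuple

# Only "service_unavailable" appears in the README; the other labels are
# hypothetical placeholders.
_ERRORS = {
    401: ("authentication_failed", "Azure OpenAI authentication failed"),
    429: ("rate_limited", "Azure OpenAI rate limit exceeded"),
    503: ("service_unavailable", "Azure OpenAI service unavailable"),
}


def health_result(upstream_status: Optional[int],
                  details: str = "") -> Tuple[int, Dict[str, str]]:
    """Map a probe outcome to (HTTP status, JSON body).

    upstream_status is the status returned by Azure OpenAI, or None when
    the connection itself failed.
    """
    if upstream_status == 200:
        return 200, {"status": "ok", "azure_openai": "connected"}
    if upstream_status is None:
        code = 503  # connection errors are reported as service unavailable
    elif upstream_status in _ERRORS:
        code = upstream_status
    else:
        code = 500  # anything else is treated as an unexpected error
    error, message = _ERRORS.get(code, ("unexpected_error", "Unexpected error"))
    return code, {"status": "error", "error": error,
                  "message": message, "details": details}
```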

GET /completion

Simple completion endpoint with a single query parameter.

Parameters:

query (string, optional) - Default: "how are you?"

Example:

curl "http://localhost:8000/completion?query=Tell%20me%20a%20joke"
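
Query strings containing spaces or punctuation need to be percent-encoded before they are sent. A minimal Python sketch of building the URL is shown here; the helper name `completion_url` is hypothetical.

```python
from urllib.parse import quote, urlencode

BASE_URL = "http://localhost:8000"  # local development address


def completion_url(query: str, base_url: str = BASE_URL) -> str:
    """Build a /completion URL with the query parameter percent-encoded."""
    return f"{base_url}/completion?" + urlencode({"query": query}, quote_via=quote)
```

With the server running, the result can be passed to `urllib.request.urlopen` to fetch the completion.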

POST /chat

Chat endpoint supporting message history.

Request Body:

{
  "messages": [
    {
      "role": "user",
      "content": "Hello, how are you?"
    }
  ]
}

Example:

curl -X POST http://localhost:8000/chat \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "What is Azure?"}
    ]
  }'
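
The same request can be prepared from Python using only the standard library. This is a sketch under assumptions: the helper name `build_chat_request` is hypothetical, and the set of accepted roles is assumed rather than taken from models/model.py.

```python
import json
from typing import Dict, List
from urllib.request import Request

ALLOWED_ROLES = {"system", "user", "assistant"}  # assumed set of accepted roles


def build_chat_request(messages: List[Dict[str, str]],
                       base_url: str = "http://localhost:8000") -> Request:
    """Return a POST request for /chat carrying the full message history."""
    for message in messages:
        if message.get("role") not in ALLOWED_ROLES:
            raise ValueError(f"unsupported role: {message.get('role')!r}")
        if not isinstance(message.get("content"), str):
            raise ValueError("each message needs string content")
    body = json.dumps({"messages": messages}).encode("utf-8")
    return Request(
        f"{base_url}/chat",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` sends it. To continue a conversation, append the assistant's reply and the next user message to the list and build a new request.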

POST /bing-grounding

Azure AI Agent wrapper endpoint with Bing grounding and citation support. This endpoint wraps an Azure AI Agent Service agent that uses Bing search for grounded responses.

Parameters:

query (string, required) - The user query to process

Response: JSON with structured content and citations

Example:

curl -X POST "http://localhost:8000/bing-grounding?query=What%20happened%20in%20finance%20today%3F"

Success Response:

{
  "content": "Today in finance, the U.S. stock market saw a sharp decline, with the Dow Jones Industrial Average plunging almost 800 points (down 1.6%), and both the Nasdaq and S&P 500 also posting significant losses...",
  "citations": [
    {
      "id": 1,
      "type": "url",
      "url": "https://www.marketwatch.com/...",
      "title": "Stock Market News Today"
    },
    {
      "id": 2,
      "type": "url",
      "url": "https://www.cnbc.com/...",
      "title": "Federal Reserve Commentary"
    }
  ]
}

Features:

✅ Grounded responses using Bing search
✅ Automatic citation extraction and formatting
✅ Clean content (inline citation markers removed)
✅ Structured JSON response
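
To illustrate what "clean content" and "citation extraction" mean in practice, here is a minimal sketch of the post-processing step. It is not the code in agent_bing_grounding.py: the marker format (`【3:0†source】`), the shape of the `annotations` list, and the function name `format_response` are all assumptions made for the example.

```python
import re
from typing import Dict, List

# Assumed inline marker style, e.g. "【3:0†source】".
MARKER = re.compile(r"【[^】]*】")


def format_response(text: str, annotations: List[Dict[str, str]]) -> Dict[str, object]:
    """Strip inline markers and build a numbered, de-duplicated citation list."""
    citations = []
    seen = set()
    for annotation in annotations:
        url = annotation["url"]
        if url in seen:
            continue  # keep one citation per URL
        seen.add(url)
        citations.append({
            "id": len(citations) + 1,
            "type": "url",
            "url": url,
            "title": annotation.get("title", ""),
        })
    content = MARKER.sub("", text)
    content = re.sub(r"\s+([.,;:!?])", r"\1", content)  # no space before punctuation
    content = re.sub(r"[ \t]{2,}", " ", content).strip()
    return {"content": content, "citations": citations}
```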

Azure AI Agent Wrapper

This service provides a FastAPI wrapper around Azure AI Agent Service, enabling you to expose AI agents as REST APIs that can be consumed by any application or deployed as an MCP server.

Agent Architecture

The wrapper uses an Abstract Base Class (ABC) pattern for extensibility:

agents/
├── base_agent.py              # Abstract base class for all agents
└── agent_bing_grounding.py    # Bing grounding agent implementation

BaseAgent (ABC)

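from abc import ABC, abstractmethod
from typing import Optional


class BaseAgent(ABC):
    """Abstract base class for all agents."""

    def __init__(self, endpoint: Optional[str] = None, agent_id: Optional[str] = None):
        # Connection details shared by every concrete agent
        self.endpoint = endpoint
        self.agent_id = agent_id

    @abstractmethod
    def chat(self, message: str) -> str:
        """Process a message and return the response."""
        pass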

BingGroundingAgent

Concrete implementation that:

Connects to Azure AI Agent Service
Creates conversation threads
Extracts and formats citations from Bing-grounded responses
Returns structured JSON with content and citations

Configuration

Add these environment variables to .env:

# Azure AI Agent Configuration
AZURE_AI_PROJECT_ENDPOINT="https://your-project.services.ai.azure.com/api/projects/yourProject"
AZURE_AI_AGENT_ID="asst_xxxxxxxxxxxxx"

Extending with New Agents

To add a new agent type, simply:

Create a new agent class that inherits from BaseAgent
Implement the chat() method
Add appropriate configuration to .env

Example:

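import os

from agents.base_agent import BaseAgent


class CustomAgent(BaseAgent):
    def __init__(self):
        # Read this agent's settings from the environment (.env)
        endpoint = os.getenv("CUSTOM_AGENT_ENDPOINT")
        agent_id = os.getenv("CUSTOM_AGENT_ID")
        super().__init__(endpoint=endpoint, agent_id=agent_id)

    def chat(self, message: str) -> str:
        # Your custom implementation goes here
        raise NotImplementedError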
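
The practical effect of the ABC pattern is that a subclass which forgets to implement chat() cannot be instantiated at all. The following self-contained sketch demonstrates this; `EchoAgent` and `BrokenAgent` are hypothetical, and `BaseAgent` is redefined here only so the example runs on its own.

```python
from abc import ABC, abstractmethod
from typing import Optional


class BaseAgent(ABC):
    """Stand-in for agents/base_agent.py so the sketch runs on its own."""

    def __init__(self, endpoint: Optional[str] = None, agent_id: Optional[str] = None):
        self.endpoint = endpoint
        self.agent_id = agent_id

    @abstractmethod
    def chat(self, message: str) -> str:
        """Process a message and return a response."""


class EchoAgent(BaseAgent):
    """Hypothetical agent that needs no Azure resources."""

    def chat(self, message: str) -> str:
        return f"echo: {message}"


class BrokenAgent(BaseAgent):
    """Forgets to implement chat(), so it cannot be instantiated."""


agent = EchoAgent(endpoint="http://example.invalid", agent_id="echo-1")
reply = agent.chat("ping")  # "echo: ping"

try:
    BrokenAgent()
except TypeError as exc:
    error = str(exc)  # Python refuses to create the incomplete subclass
```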

MCP Server via APIM

This FastAPI service can be deployed as an MCP (Model Context Protocol) server through Azure API Management, enabling AI applications to consume your Azure AI agents through a standardized protocol.

MCP Server Architecture

┌─────────────────────────────────────────┐
│      [MCP CLIENT APPLICATIONS]          │
│   (Claude Desktop, IDEs, AI Tools)      │
└──────────────────┬──────────────────────┘
                   │ MCP Protocol
                   ▼
┌─────────────────────────────────────────┐
│      [AZURE API MANAGEMENT]             │
│      • MCP Endpoint Mapping             │
│      • Authentication                   │
│      • Rate Limiting                    │
│      • Load Balancing                   │
└──────────────────┬──────────────────────┘
                   │ HTTPS
                   ▼
┌─────────────────────────────────────────┐
│  [AZURE CONTAINER APPS - FastAPI]       │
│  • /bing-grounding → AI Agent Wrapper   │
│  • /completion → OpenAI Wrapper         │
│  • /chat → OpenAI Chat Wrapper          │
│  • /health → Health Check               │
└─────────────────────────────────────────┘

MCP Server Benefits

Standardized Protocol - MCP clients can discover and use your agents automatically
Enterprise Security - APIM handles authentication, authorization, and rate limiting
Scalability - Load balance across multiple container instances
Monitoring - Centralized logging and analytics through APIM
Version Management - Deploy multiple versions side-by-side

MCP Server Deployment

1. Deploy FastAPI to Azure Container Apps (see the Production Deployment section)
2. Configure APIM to expose MCP endpoints:
   - Map MCP protocol operations to FastAPI endpoints
   - Configure CORS for web-based MCP clients
   - Set up authentication (API keys, OAuth, etc.)
3. Register with MCP clients:
   - Provide the APIM endpoint URL
   - Configure authentication credentials
   - MCP clients will auto-discover available agents
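
MCP clients invoke tools with JSON-RPC 2.0 `tools/call` messages, so "mapping MCP operations to FastAPI endpoints" amounts to translating such a message into an HTTP request. The sketch below shows one way that translation could look. It is illustrative only: the tool names, the `ROUTES` table, and the `route_tool_call` function are hypothetical, and the real mapping is whatever you configure in APIM.

```python
from typing import Any, Dict, Tuple

# Hypothetical tool-name -> (HTTP method, path) table; the paths are the
# endpoints exposed by this service.
ROUTES: Dict[str, Tuple[str, str]] = {
    "bing-grounding": ("POST", "/bing-grounding"),
    "completion": ("GET", "/completion"),
    "chat": ("POST", "/chat"),
}


def route_tool_call(message: Dict[str, Any]) -> Dict[str, Any]:
    """Translate an MCP tools/call message into an HTTP request description."""
    if message.get("jsonrpc") != "2.0" or message.get("method") != "tools/call":
        raise ValueError("not a JSON-RPC 2.0 tools/call message")
    params = message.get("params", {})
    name = params.get("name")
    if name not in ROUTES:
        raise KeyError(f"unknown tool: {name!r}")
    http_method, path = ROUTES[name]
    arguments = params.get("arguments", {})
    if name == "chat":
        # /chat takes the message history as a JSON body
        return {"method": http_method, "path": path, "query": {}, "json": arguments}
    # /completion and /bing-grounding take a single query parameter
    return {"method": http_method, "path": path,
            "query": {"query": arguments.get("query", "")}, "json": None}
```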

Example MCP Client Configuration

{
  "mcpServers": {
    "azure-ai-agents": {
      "url": "https://your-apim.azure-api.net",
      "apiKey": "your-apim-subscription-key",
      "endpoints": {
        "bing-grounding": "/bing-grounding",
        "completion": "/completion",
        "chat": "/chat"
      }
    }
  }
}

Azure API Management Setup

Architecture

┌──────────────────────────────────────────────────────────────┐
│                      [INTERNET]                              │
│                       Clients                                │
└─────────────────────────┬────────────────────────────────────┘
                          │
                          ▼
┌──────────────────────────────────────────────────────────────┐
│              [AZURE API MANAGEMENT]                          │
│                                                              │
│  ┌────────────────────────────────────────────────────┐    │
│  │        Load Balancer + Circuit Breaker              │    │
│  │  • Session Affinity (Sticky Sessions)               │    │
│  │  • Health-Based Routing                             │    │
│  │  • Auto Failover & Recovery                         │    │
│  └─────────────────┬────────────────────────────────────┘    │
└────────────────────┼──────────────────────────────────────────┘
                     │
         ┌───────────┼───────────┐
         │           │           │
         ▼           ▼           ▼
┌─────────────┐ ┌─────────────┐ ... (5 instances)
│ ✅ HEALTHY  │ │ ❌ UNHEALTHY│
│ Container   │ │ Container   │
│ App #1      │ │ App #2      │
│ [ACTIVE]    │ │ [REMOVED]   │
└─────────────┘ └─────────────┘

Legend:
  ✅ [HEALTHY]   - Backend available in pool, receiving traffic
  ❌ [UNHEALTHY] - Backend removed from pool, no traffic
  ⏱️ [UNKNOWN]   - Backend status being evaluated

Load Balancing Features

Session Affinity (Sticky Sessions) - Clients stick to the same backend via cookies
Circuit Breaker - Unhealthy backends automatically removed from pool
Auto-Recovery - Backends rejoin the pool once their 30-second "unhealthy" marker expires; a 200 OK response then marks them healthy again
Health-Aware Routing - Only route to healthy instances
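
Taken together, these features reduce to a short selection rule. The sketch below restates the logic of apim-policy.xml in Python for illustration only; APIM itself evaluates C# policy expressions, and the function names and the plain dict standing in for the APIM cache are assumptions.

```python
import random
import re
from typing import Dict, List, Optional, Sequence

ALL_BACKENDS = ("0", "1", "2", "3", "4")
COOKIE_PATTERN = re.compile(r"APIM-Backend-Instance=(\d+)")


def healthy_backends(health_cache: Dict[str, str],
                     all_backends: Sequence[str] = ALL_BACKENDS) -> List[str]:
    """Backends with no cache entry count as healthy; if none are healthy,
    fall back to the full list so requests are still attempted."""
    healthy = [b for b in all_backends
               if health_cache.get("backend-health-" + b, "healthy") == "healthy"]
    return healthy or list(all_backends)


def choose_backend(cookie_header: str, health_cache: Dict[str, str],
                   rng: Optional[random.Random] = None) -> str:
    """Honour the affinity cookie when its backend is healthy, else pick randomly."""
    rng = rng or random.Random()
    candidates = healthy_backends(health_cache)
    match = COOKIE_PATTERN.search(cookie_header)
    if match and match.group(1) in candidates:
        return match.group(1)
    return rng.choice(candidates)
```

Note the fallback in `healthy_backends`: when every backend is marked unhealthy, all of them become candidates again rather than the request being rejected outright.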

APIM Policy Details

Main Policy (apim-policy.xml)

This policy provides intelligent load balancing across 5 Azure Container App instances.

<!--
    Azure API Management Policy for Load Balancing with Session Affinity and Circuit Breaker
    
    Features:
    - Cookie-based session affinity (sticky sessions)
    - Automatic circuit breaking based on backend health
    - Subsequent requests fail over to healthy instances after a backend returns 5xx, 429, or 401
    - Automatic recovery when backends return 200 OK
    
    Apply this policy at the API level for your main endpoints
-->
<policies>
    <inbound>
        <base />
        
        <!-- Check for healthy backends from cache -->
        <set-variable name="healthyBackends" value="@{
            var allBackends = new[] { "0", "1", "2", "3", "4" };
            var healthyList = new System.Collections.Generic.List<string>();
            
            foreach (var id in allBackends)
            {
                string cacheKey = "backend-health-" + id;
                string healthStatus;
                
                if (context.Cache.TryGetValue(cacheKey, out healthStatus))
                {
                    if (healthStatus == "healthy")
                    {
                        healthyList.Add(id);
                    }
                }
                else
                {
                    healthyList.Add(id);
                }
            }
            
            return healthyList.Count > 0 ? healthyList.ToArray() : allBackends;
        }" />
        
        <!-- Session affinity with health check -->
        <choose>
            <when condition="@(context.Request.Headers.GetValueOrDefault("Cookie","").Contains("APIM-Backend-Instance"))">
                <set-variable name="backendInstance" value="@{
                    string cookie = context.Request.Headers.GetValueOrDefault("Cookie","");
                    var match = System.Text.RegularExpressions.Regex.Match(cookie, @"APIM-Backend-Instance=(\d+)");
                    string requestedId = match.Success ? match.Groups[1].Value : null;
                    var healthyBackends = (string[])context.Variables["healthyBackends"];
                    
                    if (requestedId != null && healthyBackends.Contains(requestedId))
                    {
                        return requestedId;
                    }
                    
                    var random = new Random();
                    return healthyBackends[random.Next(0, healthyBackends.Length)];
                }" />
            </when>
            <otherwise>
                <set-variable name="backendInstance" value="@{
                    var healthyBackends = (string[])context.Variables["healthyBackends"];
                    var random = new Random();
                    return healthyBackends[random.Next(0, healthyBackends.Length)];
                }" />
            </otherwise>
        </choose>
        
        <!-- Set backend URL based on instance ID -->
        <set-backend-service base-url="@{
            string id = context.Variables.GetValueOrDefault<string>("backendInstance", "0");
            var backends = new System.Collections.Generic.Dictionary<string, string> {
                { "0", "https://your-app-instance-1.azurecontainerapps.io" },
                { "1", "https://your-app-instance-2.azurecontainerapps.io" },
                { "2", "https://your-app-instance-3.azurecontainerapps.io" },
                { "3", "https://your-app-instance-4.azurecontainerapps.io" },
                { "4", "https://your-app-instance-5.azurecontainerapps.io" }
            };
            return backends.ContainsKey(id) ? backends[id] : backends["0"];
        }" />
        
        <set-header name="X-APIM-Correlation-Id" exists-action="skip">
            <value>@(Guid.NewGuid().ToString())</value>
        </set-header>
        
        <set-header name="X-Backend-Instance" exists-action="override">
            <value>@(context.Variables.GetValueOrDefault<string>("backendInstance", "0"))</value>
        </set-header>
    </inbound>
    
    <backend>
        <base />
    </backend>
    
    <outbound>
        <base />
        
        <!-- Circuit breaker: Update health status based on response -->
        <choose>
            <when condition="@(context.Response.StatusCode >= 500 || context.Response.StatusCode == 429 || context.Response.StatusCode == 401)">
                <!-- Mark backend as unhealthy for 30 seconds on errors -->
                <cache-store-value key="@("backend-health-" + context.Variables.GetValueOrDefault<string>("backendInstance"))" value="unhealthy" duration="30" />
            </when>
            <when condition="@(context.Response.StatusCode == 200)">
                <!-- Mark backend as healthy on 200 OK response -->
                <cache-store-value key="@("backend-health-" + context.Variables.GetValueOrDefault<string>("backendInstance"))" value="healthy" duration="30" />
            </when>
        </choose>
        
        <!-- Set session affinity cookie -->
        <set-header name="Set-Cookie" exists-action="append">
            <value>@{
                string instance = context.Variables.GetValueOrDefault<string>("backendInstance", "0");
                return $"APIM-Backend-Instance={instance}; Path=/; Max-Age=86400; HttpOnly; Secure; SameSite=Lax";
            }</value>
        </set-header>
        
        <set-header name="X-Served-By-Instance" exists-action="override">
            <value>@(context.Variables.GetValueOrDefault<string>("backendInstance", "0"))</value>
        </set-header>
    </outbound>
    
    <on-error>
        <base />
        
        <!-- Circuit breaker: Mark backend as unhealthy on connection errors -->
        <cache-store-value key="@("backend-health-" + context.Variables.GetValueOrDefault<string>("backendInstance", "0"))" value="unhealthy" duration="30" />
        
        <set-header name="X-Error-Backend-Instance" exists-action="override">
            <value>@(context.Variables.GetValueOrDefault<string>("backendInstance", "unknown"))</value>
        </set-header>
    </on-error>
</policies>

Setup Steps

Update Backend URLs

In the policy XML, replace the placeholder URLs:

var backends = new System.Collections.Generic.Dictionary<string, string> {
    { "0", "https://your-app-instance-1.azurecontainerapps.io" },
    { "1", "https://your-app-instance-2.azurecontainerapps.io" },
    { "2", "https://your-app-instance-3.azurecontainerapps.io" },
    { "3", "https://your-app-instance-4.azurecontainerapps.io" },
    { "4", "https://your-app-instance-5.azurecontainerapps.io" }
};

Apply Policy in Azure Portal

- Navigate to your APIM service
- Go to your API → Design tab
- Click "All operations" (or specific operations)
- In "Inbound processing", click the code editor (</>)
- Paste the policy XML
- Click Save

Deploy Container Apps

- Deploy 5 instances of this application to Azure Container Apps
- Ensure each has a unique URL
- Verify /health endpoint is accessible

Health Check & Circuit Breaker

How It Works

❌ UNHEALTHY - Marking Backends Unhealthy

A backend is marked [UNHEALTHY] (removed from pool for 30 seconds) when:

Returns 500, 502, 503, 504 (Server errors)
Returns 429 (Rate limit exceeded)
Returns 401 (Authentication failed)
Connection timeout or failure

✅ HEALTHY - Automatic Recovery

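A backend rejoins the pool (and can be marked [HEALTHY] again) when:

Its "unhealthy" cache entry expires (after 30 seconds); a backend with no cache entry is treated as healthy
It then returns 200 OK, which caches it as "healthy" for the next 30 seconds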

Health Status Flow

┌─────────────────────────┐
│   [START] Request       │
│   Incoming              │
└───────────┬─────────────┘
            │
            ▼
┌─────────────────────────┐
│ [CHECK] Read Cache      │
│ Get Healthy Backends    │
│ (Instances 0-4)         │
└───────────┬─────────────┘
            │
            ▼
┌─────────────────────────┐
│ [DECISION] Does client  │
│ have session cookie?    │
└───────────┬─────────────┘
            │
      ┌─────┴─────┐
      │           │
   [YES]        [NO]
      │           │
      ▼           ▼
┌─────────────┐  ┌──────────────────┐
│[CHECK] Is   │  │[ASSIGN] Pick     │
│cookie's     │  │random healthy    │
│backend      │  │backend (0-4)     │
│healthy?     │  └────────┬─────────┘
└──────┬──────┘           │
       │                  │
   ┌───┴───┐              │
   │       │              │
 [YES]   [NO]             │
   │       │              │
   │       └──────┬───────┘
   │              │
   │              ▼
   │       ┌──────────────────┐
   │       │[REASSIGN] Pick   │
   │       │different healthy │
   │       │backend           │
   │       └─────────┬────────┘
   │                 │
   └────────┬────────┘
            │
            ▼
┌─────────────────────────┐
│ [ROUTE] Forward to      │
│ Selected Backend        │
│ Instance (0-4)          │
└───────────┬─────────────┘
            │
            ▼
┌─────────────────────────┐
│ [RESPONSE] Backend      │
│ Returns Status Code     │
└───────────┬─────────────┘
            │
            ▼
┌─────────────────────────┐
│ [EVALUATE] Check Status │
│ Code from Backend       │
└───────────┬─────────────┘
            │
      ┌─────┴─────┐
      │           │
 [200 OK]   [ERROR: 401/429/500+]
      │           │
      ▼           ▼
┌─────────────┐  ┌──────────────────┐
│✅ HEALTHY   │  │❌ UNHEALTHY       │
│Cache as     │  │Cache as          │
│"healthy"    │  │"unhealthy"       │
│TTL: 30s     │  │TTL: 30s          │
│[AVAILABLE]  │  │[REMOVED FROM     │
│             │  │ POOL]            │
└─────────────┘  └──────────────────┘

Behavior Table

| Backend Response | Circuit Breaker Action | Status Symbol | Duration | Client Impact |
|---|---|---|---|---|
| `200 OK` | Mark healthy | ✅ [HEALTHY] | 30s cache | Continues routing |
| `401 Unauthorized` | Mark unhealthy, remove from pool | ❌ [UNHEALTHY] | 30s | Route to different backend |
| `429 Rate Limit` | Mark unhealthy, remove from pool | ❌ [UNHEALTHY] | 30s | Route to different backend |
| `500+ Server Error` | Mark unhealthy, remove from pool | ❌ [UNHEALTHY] | 30s | Route to different backend |
| Connection Error | Mark unhealthy, remove from pool | ❌ [UNHEALTHY] | 30s | Route to different backend |
| Cache Expired | Re-evaluate on next request | ⏱️ [UNKNOWN] | N/A | May retry backend |
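
The behavior table can be modelled as a small time-to-live cache. The class below is an in-memory sketch for reasoning about and testing the rules, not part of the service: `HealthCache` is a hypothetical name, and the injectable `clock` exists only so expiry can be exercised without waiting 30 seconds. Connection errors would be recorded the same way as a 5xx response.

```python
import time
from typing import Callable, Dict, Tuple

TTL_SECONDS = 30  # matches the cache duration used by the policy


class HealthCache:
    """In-memory stand-in for the APIM cache with per-entry expiry."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}

    def record(self, backend: str, status_code: int) -> None:
        """Apply the behavior table to one backend response."""
        if status_code >= 500 or status_code in (401, 429):
            self._entries[backend] = ("unhealthy", self._clock() + TTL_SECONDS)
        elif status_code == 200:
            self._entries[backend] = ("healthy", self._clock() + TTL_SECONDS)
        # other codes (for example 404) leave the current state untouched

    def is_available(self, backend: str) -> bool:
        """Expired or missing entries count as available."""
        state, expires_at = self._entries.get(backend, ("healthy", 0.0))
        if self._clock() >= expires_at:
            return True
        return state == "healthy"
```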

Monitoring & Debugging

Headers and Cookies

The APIM policy adds several headers for monitoring:

Request Headers (Added by APIM)

X-APIM-Correlation-Id: Unique request ID for tracing
X-Backend-Instance: Which backend (0-4) will handle the request

Response Headers

X-Served-By-Instance: Which backend actually served the response
X-Error-Backend-Instance: (On errors) Which backend caused the error

Cookies

APIM-Backend-Instance: Session affinity cookie (value 0-4, 24hr TTL)
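
When debugging affinity from a script, the assigned backend can be read out of the Set-Cookie value with the standard library. A minimal sketch follows; the helper name `backend_from_set_cookie` is hypothetical, and the sample header mirrors the cookie format set by the policy.

```python
from http.cookies import SimpleCookie
from typing import Optional


def backend_from_set_cookie(set_cookie_value: str) -> Optional[str]:
    """Extract the backend id from a Set-Cookie header value, if present."""
    jar = SimpleCookie()
    jar.load(set_cookie_value)
    morsel = jar.get("APIM-Backend-Instance")
    return morsel.value if morsel is not None else None


header = "APIM-Backend-Instance=3; Path=/; Max-Age=86400; HttpOnly; Secure; SameSite=Lax"
assigned = backend_from_set_cookie(header)  # "3"
```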

Testing Session Affinity

# First request - receives a backend assignment
curl -i https://your-apim.azure-api.net/completion

# Check the Set-Cookie header for: APIM-Backend-Instance=X

# Subsequent requests with cookie go to same backend
curl -i https://your-apim.azure-api.net/completion \
  -H "Cookie: APIM-Backend-Instance=0"

Testing Circuit Breaker

Simulate Failure

# Stop one Container App instance or cause it to return 500s

Observe Failover

# Requests automatically route to healthy instances
curl -i https://your-apim.azure-api.net/health | grep X-Served-By-Instance

Test Recovery

# Restart the instance and wait up to 30 seconds for its "unhealthy" cache entry to expire
# It then receives traffic again, and its first 200 response marks it healthy

Monitor Backend Health

Check which backends are currently healthy:

# Make requests and check which instances respond
for i in {1..10}; do
  curl -s https://your-apim.azure-api.net/completion \
    -i | grep "X-Served-By-Instance"
done
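
If the responses are collected programmatically, the per-instance distribution can be tallied rather than read by eye. This sketch assumes each response's headers are available as a mapping; the function name `tally_instances` is hypothetical.

```python
from collections import Counter
from typing import Iterable, Mapping


def tally_instances(responses: Iterable[Mapping[str, str]]) -> Counter:
    """Count responses per backend using the X-Served-By-Instance header."""
    counts: Counter = Counter()
    for headers in responses:
        # header names are case-insensitive, so normalise before lookup
        lowered = {name.lower(): value for name, value in headers.items()}
        counts[lowered.get("x-served-by-instance", "unknown")] += 1
    return counts
```

An instance that never appears in the tally over many cookie-less requests is likely marked unhealthy.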

Production Deployment

Prerequisites

Azure subscription
Azure API Management instance
5 Azure Container App instances
Container registry (Azure Container Registry recommended)

Container App Deployment

Build Docker image

docker build -t your-registry.azurecr.io/openai-wrapper:latest .

Push to registry

docker push your-registry.azurecr.io/openai-wrapper:latest

Deploy to Container Apps

az containerapp create \
  --name openai-wrapper-1 \
  --resource-group your-rg \
  --environment your-env \
  --image your-registry.azurecr.io/openai-wrapper:latest \
  --target-port 8000 \
  --ingress external \
  --env-vars \
    OPENAI_ENDPOINT="https://your-instance.openai.azure.com/" \
    OPENAI_API_KEY="your-key" \
    OPENAI_API_VERSION="2025-01-01-preview" \
    OPENAI_MODEL_DEPLOYMENT_NAME="gpt-4"

Repeat for instances 2-5 with different names.
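
Rather than editing the command by hand five times, the argument lists can be generated. The sketch below only builds the commands and uses the same flags as the example above; the function name `containerapp_commands` and the default placeholder values are assumptions to replace with your own.

```python
from typing import Dict, List


def containerapp_commands(count: int, env_vars: Dict[str, str],
                          image: str = "your-registry.azurecr.io/openai-wrapper:latest",
                          resource_group: str = "your-rg",
                          environment: str = "your-env") -> List[List[str]]:
    """Build one `az containerapp create` argument list per instance."""
    commands = []
    for index in range(1, count + 1):
        command = [
            "az", "containerapp", "create",
            "--name", f"openai-wrapper-{index}",
            "--resource-group", resource_group,
            "--environment", environment,
            "--image", image,
            "--target-port", "8000",
            "--ingress", "external",
            "--env-vars",
        ]
        # each variable is passed as KEY=value after --env-vars
        command += [f"{key}={value}" for key, value in env_vars.items()]
        commands.append(command)
    return commands
```

Each list can be passed to `subprocess.run(command, check=True)` once the Azure CLI is installed and logged in.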

APIM Configuration

Enable Internal Cache (Required for circuit breaker)

- Navigate to APIM → Caching
- Enable built-in cache

Import API

- Create or import your OpenAI wrapper API
- Add operations: /health, /completion, /chat, /bing-grounding

Apply Policy

- Use the policy XML from apim-policy.xml
- Update backend URLs
- Apply at API level or operation level

Production Checklist

Production Considerations

Cache TTL: 30 seconds is default, adjust based on recovery time needs
Monitoring: Set up Application Insights for both APIM and Container Apps
Alerts: Create alerts when >50% of backends are unhealthy
Scaling: Configure Container Apps autoscaling based on CPU/memory
Security: Use Azure Key Vault for storing OpenAI API keys
Rate Limits: Configure APIM rate limiting policies
Quotas: Set appropriate quotas per client/subscription

Troubleshooting

Issue: All requests go to same instance

Fix: Verify APIM cache is enabled
Fix: Check Set-Cookie header is being sent
Fix: Test without cookies to verify random distribution

Issue: Backends not marked unhealthy on failures

Fix: Verify /health endpoint returns correct status codes
Fix: Check APIM diagnostic logs
Fix: Ensure cache is properly configured

Issue: Circuit breaker not recovering

Fix: Wait 30 seconds for cache expiration
Fix: Ensure backend returns 200 OK
Fix: Check X-Served-By-Instance header

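Issue: Slow detection of unhealthy backends

Cause: Health checks are passive (inferred from regular traffic), so a failure is only noticed after a client request hits the backend
Fix: Consider implementing active health monitoring (see apim-healthcheck-monitor.xml)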

Project Structure

simple_openai_api_wrapper/
├── agents/                         # AI Agent implementations
│   ├── __init__.py
│   ├── base_agent.py              # Abstract base class for all agents
│   └── agent_bing_grounding.py    # Bing grounding agent with citation extraction
├── ai/                             # Azure OpenAI integration
│   ├── __init__.py
│   └── azure_openai_client.py     # Azure OpenAI client wrapper
├── app/                            # FastAPI application
│   ├── __init__.py
│   ├── chat_completion.py         # Completion and chat logic
│   ├── create_table.py            # Database table creation (optional)
│   └── main.py                    # FastAPI endpoints (/health, /completion, /chat, /bing-grounding)
├── models/                         # Data models
│   ├── __init__.py
│   └── model.py                   # Pydantic models (Messages, etc.)
├── apim-policy.xml                # Main APIM policy (load balancing + circuit breaker)
├── apim-policy-with-healthcheck.xml  # APIM policy with enhanced health monitoring
├── apim-healthcheck-monitor.xml   # Optional active health monitoring policy
├── docker-compose.yaml            # Local development with Docker
├── dockerfile                     # Container image definition
├── env.sample                     # Environment variable template
├── main.py                        # Application entry point
├── requirements.txt               # Python dependencies (openai, fastapi, azure-ai-projects, etc.)
├── _env_activate.bat              # Windows: Activate virtual environment
├── _env_create.bat                # Windows: Create virtual environment
├── _install.bat                   # Windows: Install dependencies
├── _run_server.bat                # Windows: Run FastAPI server locally
├── _up.bat                        # Windows: Start Docker Compose
├── _down.bat                      # Windows: Stop Docker Compose
└── README.md                      # This file

Environment Variables

| Variable | Description | Example |
|---|---|---|
| **Azure OpenAI Configuration** | | |
| `OPENAI_ENDPOINT` | Azure OpenAI endpoint URL | `https://your-instance.openai.azure.com/` |
| `OPENAI_API_KEY` | Azure OpenAI API key | `your-api-key` |
| `OPENAI_API_VERSION` | API version | `2025-01-01-preview` |
| `OPENAI_MODEL_DEPLOYMENT_NAME` | Deployment name | `gpt-4` or `o1` |
| `OPENAI_PROMPT` | Default system prompt | `You are a helpful assistant.` |
| **Azure AI Agent Configuration** | | |
| `AZURE_AI_PROJECT_ENDPOINT` | Azure AI Project endpoint | `https://your-project.services.ai.azure.com/api/projects/yourProject` |
| `AZURE_AI_AGENT_ID` | Azure AI Agent ID | `asst_xxxxxxxxxxxxx` |

License

MIT
