drewelewis/simple_openai_api_wrapper

Azure FastAPI Wrapper over Azure OpenAI & Azure AI Agent Service

A FastAPI-based wrapper service for Azure OpenAI and Azure AI Agent Service with health monitoring, designed to run on Azure Container Apps with API Management load balancing. This service can be exposed as an MCP (Model Context Protocol) server through APIM.

Features

✅ FastAPI wrapper for Azure OpenAI completion and chat endpoints
✅ Azure AI Agent Service wrapper with Bing grounding capabilities
✅ Structured JSON responses with citations from grounded agents
✅ MCP Server deployment via Azure API Management
✅ Health check endpoint with Azure OpenAI connectivity verification
✅ Returns proper HTTP status codes (200, 401, 429, 503)
✅ Ready for Azure Container Apps deployment
✅ APIM policies for load balancing with session affinity
✅ Circuit breaker pattern for automatic failover
✅ Automatic backend recovery on health restoration
✅ Extensible agent architecture using Abstract Base Classes

Architecture Overview

┌──────────────────────────────────────────────────────────────┐
│                    [CLIENT APPLICATIONS]                     │
│              (MCP Clients, Web Apps, APIs)                   │
└─────────────────────────┬────────────────────────────────────┘
                          │
                          ▼
┌──────────────────────────────────────────────────────────────┐
│              [AZURE API MANAGEMENT]                          │
│              • MCP Server Endpoint                           │
│              • Load Balancing                                │
│              • Circuit Breaker                               │
│              • Session Affinity                              │
└─────────────────────────┬────────────────────────────────────┘
                          │
         ┌────────────────┼────────────────┐
         │                │                │
         ▼                ▼                ▼
┌─────────────────┐ ┌─────────────────┐ ...
│  [CONTAINER     │ │  [CONTAINER     │
│   APP #1]       │ │   APP #2]       │
│                 │ │                 │
│  FastAPI Server │ │  FastAPI Server │
│  ┌───────────┐  │ │  ┌───────────┐  │
│  │ OpenAI    │  │ │  │ OpenAI    │  │
│  │ Wrapper   │  │ │  │ Wrapper   │  │
│  └───────────┘  │ │  └───────────┘  │
│  ┌───────────┐  │ │  ┌───────────┐  │
│  │ AI Agent  │  │ │  │ AI Agent  │  │
│  │ Wrapper   │  │ │  │ Wrapper   │  │
│  └─────┬─────┘  │ │  └─────┬─────┘  │
└────────┼────────┘ └────────┼────────┘
         │                   │
         └───────────┬───────┘
                     │
                     ▼
         ┌─────────────────────────┐
         │ [AZURE AI AGENT SERVICE]│
         │  • Bing Grounding       │
         │  • Citation Extraction  │
         │  • Thread Management    │
         └─────────────────────────┘

Getting Started

Local Development

  1. Create virtual environment

    _env_create.bat
  2. Activate virtual environment

    _env_activate.bat
  3. Install dependencies

    _install.bat
  4. Configure environment variables

    • Copy env.sample to .env
    • Fill in your Azure OpenAI credentials:
      OPENAI_ENDPOINT="https://your-instance.openai.azure.com/"
      OPENAI_API_KEY="your-api-key"
      OPENAI_API_VERSION="2025-01-01-preview"
      OPENAI_MODEL_DEPLOYMENT_NAME="gpt-4"
      OPENAI_PROMPT="You are a helpful assistant."
  5. Start the server

    _run_server.bat

The API will be available at http://localhost:8000
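
A startup check on the variables from step 4 can surface a missing setting before the first request fails. The sketch below is illustrative only: `Settings` and `load_settings` are hypothetical names that do not come from this repository, and it assumes the first four variables are required while `OPENAI_PROMPT` falls back to the sample prompt.

```python
import os
from dataclasses import dataclass
from typing import Mapping

# Assumption: these four variables are mandatory; the repository may treat them differently.
REQUIRED = (
    "OPENAI_ENDPOINT",
    "OPENAI_API_KEY",
    "OPENAI_API_VERSION",
    "OPENAI_MODEL_DEPLOYMENT_NAME",
)


@dataclass(frozen=True)
class Settings:
    endpoint: str
    api_key: str
    api_version: str
    deployment: str
    prompt: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the Azure OpenAI settings, failing fast if a required one is missing."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return Settings(
        endpoint=env["OPENAI_ENDPOINT"],
        api_key=env["OPENAI_API_KEY"],
        api_version=env["OPENAI_API_VERSION"],
        deployment=env["OPENAI_MODEL_DEPLOYMENT_NAME"],
        # Assumption: the prompt is optional and defaults to the sample value.
        prompt=env.get("OPENAI_PROMPT", "You are a helpful assistant."),
    )
```

Passing the environment as a parameter keeps the function easy to exercise with a plain dictionary in tests.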

API Endpoints

GET /health

Health check endpoint that verifies Azure OpenAI connectivity.

Response Codes:

  • 200 - Service healthy, Azure OpenAI connected
  • 401 - Azure OpenAI authentication failed
  • 429 - Azure OpenAI rate limit exceeded
  • 503 - Azure OpenAI service unavailable or connection error
  • 500 - Unexpected error

Example:

curl http://localhost:8000/health

Success Response:

{
  "status": "ok",
  "azure_openai": "connected"
}

Error Response (503):

{
  "status": "error",
  "error": "service_unavailable",
  "message": "Azure OpenAI service unavailable",
  "details": "..."
}
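
The status codes listed for this endpoint amount to a mapping from failure type to HTTP status. The sketch below shows one way such a mapping could be written. The exception classes are hypothetical stand-ins (the real service presumably catches the Azure OpenAI client's own exceptions), and only the 503 strings are taken from the example response; the other error codes and messages are illustrative.

```python
from typing import Optional, Tuple


# Hypothetical stand-ins for the errors an Azure OpenAI client might raise.
class AuthFailed(Exception):
    pass


class RateLimited(Exception):
    pass


class Unavailable(Exception):
    pass


# (status code, error code, message) per failure type. Only the 503 row's
# strings come from the documented example; the others are illustrative.
ERROR_MAP = {
    AuthFailed: (401, "authentication_failed", "Azure OpenAI authentication failed"),
    RateLimited: (429, "rate_limit_exceeded", "Azure OpenAI rate limit exceeded"),
    Unavailable: (503, "service_unavailable", "Azure OpenAI service unavailable"),
}


def health_response(error: Optional[Exception]) -> Tuple[int, dict]:
    """Map the outcome of a connectivity probe to (HTTP status, JSON body)."""
    if error is None:
        return 200, {"status": "ok", "azure_openai": "connected"}
    # Anything not in the map is reported as an unexpected error (500).
    status, code, message = ERROR_MAP.get(
        type(error), (500, "unexpected_error", "Unexpected error")
    )
    return status, {
        "status": "error",
        "error": code,
        "message": message,
        "details": str(error),
    }
```

Keeping the mapping in one table makes it easier to keep the health endpoint and the APIM circuit breaker, which reacts to 401, 429 and 5xx, in agreement.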

GET /completion

Simple completion endpoint with a single query parameter.

Parameters:

  • query (string, optional) - Default: "how are you?"

Example:

curl "http://localhost:8000/completion?query=Tell%20me%20a%20joke"
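
Query text that contains spaces or punctuation has to be URL-encoded before it is sent. As a small sketch, Python's standard library can build the request URL; `completion_url` is a hypothetical helper name, not part of this service.

```python
from urllib.parse import urlencode


def completion_url(base_url: str, query: str) -> str:
    """Return the /completion URL with the query parameter URL-encoded."""
    return f"{base_url.rstrip('/')}/completion?{urlencode({'query': query})}"


print(completion_url("http://localhost:8000", "Tell me a joke"))
# prints: http://localhost:8000/completion?query=Tell+me+a+joke
```

`urlencode` writes spaces as `+`, which is equivalent to `%20` inside a query string.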

POST /chat

Chat endpoint supporting message history.

Request Body:

{
  "messages": [
    {
      "role": "user",
      "content": "Hello, how are you?"
    }
  ]
}

Example:

curl -X POST http://localhost:8000/chat \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "What is Azure?"}
    ]
  }'
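
The same request can be built from Python with the standard library. This is a sketch under assumptions: `build_chat_request` is a hypothetical helper, and the multi-turn history assumes the endpoint accepts the usual `user` and `assistant` roles, which the request body above does not confirm.

```python
import json
from urllib.request import Request


def build_chat_request(base_url: str, messages: list) -> Request:
    """Build (but do not send) a POST /chat request carrying the message history."""
    body = json.dumps({"messages": messages}).encode("utf-8")
    return Request(
        f"{base_url.rstrip('/')}/chat",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Assumption: earlier turns are replayed with "user" and "assistant" roles.
history = [
    {"role": "user", "content": "What is Azure?"},
    {"role": "assistant", "content": "Azure is Microsoft's cloud platform."},
    {"role": "user", "content": "Which service hosts containers?"},
]
request = build_chat_request("http://localhost:8000", history)
# To send it against a running server: urllib.request.urlopen(request)
```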

POST /bing-grounding

Azure AI Agent wrapper endpoint with Bing grounding and citation support. This endpoint wraps an Azure AI Agent Service agent that uses Bing search for grounded responses.

Parameters:

  • query (string, required) - The user query to process

Response: JSON with structured content and citations

Example:

curl -X POST "http://localhost:8000/bing-grounding?query=What%20happened%20in%20finance%20today%3F"

Success Response:

{
  "content": "Today in finance, the U.S. stock market saw a sharp decline, with the Dow Jones Industrial Average plunging almost 800 points (down 1.6%), and both the Nasdaq and S&P 500 also posting significant losses...",
  "citations": [
    {
      "id": 1,
      "type": "url",
      "url": "https://www.marketwatch.com/...",
      "title": "Stock Market News Today"
    },
    {
      "id": 2,
      "type": "url",
      "url": "https://www.cnbc.com/...",
      "title": "Federal Reserve Commentary"
    }
  ]
}

Features:

  • ✅ Grounded responses using Bing search
  • ✅ Automatic citation extraction and formatting
  • ✅ Clean content (inline citation markers removed)
  • ✅ Structured JSON response
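
To illustrate what citation extraction and marker removal involve, here is a minimal sketch. It is not the repository's implementation: `build_response` is a hypothetical function, the annotation dictionaries are an assumed input shape, and the marker format `【3:0†source】` is an assumption about how the agent embeds inline citations.

```python
import re
from typing import Dict, List

# Assumption: inline markers look like "【3:0†source】"; the real format may differ.
MARKER = re.compile(r"【[^】]*】")


def build_response(raw_text: str, annotations: List[Dict[str, str]]) -> dict:
    """Strip inline markers and number cited URLs in order of first appearance."""
    citations = []
    seen = set()
    for annotation in annotations:
        url = annotation["url"]
        if url in seen:  # cite each URL only once
            continue
        seen.add(url)
        citations.append(
            {
                "id": len(citations) + 1,
                "type": "url",
                "url": url,
                "title": annotation.get("title", ""),
            }
        )
    content = MARKER.sub("", raw_text)
    content = re.sub(r"\s+([.,;:!?])", r"\1", content)  # close gaps left before punctuation
    content = re.sub(r" {2,}", " ", content).strip()
    return {"content": content, "citations": citations}
```

The returned dictionary has the same `content` and `citations` keys as the success response shown above.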

Azure AI Agent Wrapper

This service provides a FastAPI wrapper around Azure AI Agent Service, enabling you to expose AI agents as REST APIs that can be consumed by any application or deployed as an MCP server.

Agent Architecture

The wrapper uses an Abstract Base Class (ABC) pattern for extensibility:

agents/
├── base_agent.py              # Abstract base class for all agents
└── agent_bing_grounding.py    # Bing grounding agent implementation

BaseAgent (ABC)

from abc import ABC, abstractmethod
from typing import Optional


class BaseAgent(ABC):
    """Abstract base class for all agents"""

    def __init__(self, endpoint: Optional[str] = None, agent_id: Optional[str] = None):
        # Connection details shared by every concrete agent
        self.endpoint = endpoint
        self.agent_id = agent_id

    @abstractmethod
    def chat(self, message: str) -> str:
        """Process a message and return response"""
        pass

BingGroundingAgent

Concrete implementation that:

  • Connects to Azure AI Agent Service
  • Creates conversation threads
  • Extracts and formats citations from Bing-grounded responses
  • Returns structured JSON with content and citations

Configuration

Add these environment variables to .env:

# Azure AI Agent Configuration
AZURE_AI_PROJECT_ENDPOINT="https://your-project.services.ai.azure.com/api/projects/yourProject"
AZURE_AI_AGENT_ID="asst_xxxxxxxxxxxxx"

Extending with New Agents

To add a new agent type, simply:

  1. Create a new agent class that inherits from BaseAgent
  2. Implement the chat() method
  3. Add appropriate configuration to .env

Example:

import os

from agents.base_agent import BaseAgent


class CustomAgent(BaseAgent):
    def __init__(self):
        # Read this agent's settings from the environment (.env)
        endpoint = os.getenv("CUSTOM_AGENT_ENDPOINT")
        agent_id = os.getenv("CUSTOM_AGENT_ID")
        super().__init__(endpoint=endpoint, agent_id=agent_id)

    def chat(self, message: str) -> str:
        # Your custom implementation
        pass
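
For a version that runs on its own, the sketch below repeats the shape of `BaseAgent` and adds a trivial subclass. `EchoAgent` is a hypothetical example, not part of the repository, and returning a JSON string with `content` and `citations` keys is an assumption modelled on the /bing-grounding response.

```python
import json
from abc import ABC, abstractmethod
from typing import Optional


class BaseAgent(ABC):
    """Same shape as the BaseAgent above, repeated so the sketch is self-contained."""

    def __init__(self, endpoint: Optional[str] = None, agent_id: Optional[str] = None):
        self.endpoint = endpoint
        self.agent_id = agent_id

    @abstractmethod
    def chat(self, message: str) -> str:
        """Process a message and return response"""


class EchoAgent(BaseAgent):
    """Hypothetical agent that echoes its input; useful as a stub in tests."""

    def chat(self, message: str) -> str:
        return json.dumps({"content": message, "citations": []})


agent = EchoAgent(endpoint="https://example.invalid", agent_id="echo")
print(agent.chat("ping"))
# prints: {"content": "ping", "citations": []}
```

Because `chat()` is abstract, instantiating `BaseAgent` directly raises `TypeError`, so a subclass that forgets to implement it fails at construction time rather than at the first request.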

MCP Server via APIM

This FastAPI service can be deployed as an MCP (Model Context Protocol) server through Azure API Management, enabling AI applications to consume your Azure AI agents through a standardized protocol.

MCP Server Architecture

┌─────────────────────────────────────────┐
│      [MCP CLIENT APPLICATIONS]          │
│   (Claude Desktop, IDEs, AI Tools)      │
└──────────────────┬──────────────────────┘
                   │ MCP Protocol
                   ▼
┌─────────────────────────────────────────┐
│      [AZURE API MANAGEMENT]             │
│      • MCP Endpoint Mapping             │
│      • Authentication                   │
│      • Rate Limiting                    │
│      • Load Balancing                   │
└──────────────────┬──────────────────────┘
                   │ HTTPS
                   ▼
┌─────────────────────────────────────────┐
│  [AZURE CONTAINER APPS - FastAPI]       │
│  • /bing-grounding → AI Agent Wrapper   │
│  • /completion → OpenAI Wrapper         │
│  • /chat → OpenAI Chat Wrapper          │
│  • /health → Health Check               │
└─────────────────────────────────────────┘

MCP Server Benefits

  1. Standardized Protocol - MCP clients can discover and use your agents automatically
  2. Enterprise Security - APIM handles authentication, authorization, and rate limiting
  3. Scalability - Load balance across multiple container instances
  4. Monitoring - Centralized logging and analytics through APIM
  5. Version Management - Deploy multiple versions side-by-side

MCP Server Deployment

  1. Deploy FastAPI to Azure Container Apps (see Production Deployment section)
  2. Configure APIM to expose MCP endpoints:
    • Map MCP protocol operations to FastAPI endpoints
    • Configure CORS for web-based MCP clients
    • Set up authentication (API keys, OAuth, etc.)
  3. Register with MCP clients:
    • Provide APIM endpoint URL
    • Configure authentication credentials
    • MCP clients will auto-discover available agents

Example MCP Client Configuration

{
  "mcpServers": {
    "azure-ai-agents": {
      "url": "https://your-apim.azure-api.net",
      "apiKey": "your-apim-subscription-key",
      "endpoints": {
        "bing-grounding": "/bing-grounding",
        "completion": "/completion",
        "chat": "/chat"
      }
    }
  }
}

Azure API Management Setup

Architecture

┌──────────────────────────────────────────────────────────────┐
│                      [INTERNET]                              │
│                       Clients                                │
└─────────────────────────┬────────────────────────────────────┘
                          │
                          ▼
┌──────────────────────────────────────────────────────────────┐
│              [AZURE API MANAGEMENT]                          │
│                                                              │
│  ┌─────────────────────────────────────────────────────┐     │
│  │        Load Balancer + Circuit Breaker              │     │
│  │  • Session Affinity (Sticky Sessions)               │     │
│  │  • Health-Based Routing                             │     │
│  │  • Auto Failover & Recovery                         │     │
│  └─────────────────┬───────────────────────────────────┘     │
└────────────────────┼─────────────────────────────────────────┘
                     │
         ┌───────────┼───────────┐
         │           │           │
         ▼           ▼           ▼
┌─────────────┐ ┌─────────────┐ ... (5 instances)
│ ✅ HEALTHY  │ │ ❌ UNHEALTHY│
│ Container   │ │ Container   │
│ App #1      │ │ App #2      │
│ [ACTIVE]    │ │ [REMOVED]   │
└─────────────┘ └─────────────┘

Legend:
  ✅ [HEALTHY]   - Backend available in pool, receiving traffic
  ❌ [UNHEALTHY] - Backend removed from pool, no traffic
  ⏱️ [UNKNOWN]   - Backend status being evaluated

Load Balancing Features

  1. Session Affinity (Sticky Sessions) - Clients stick to the same backend via cookies
  2. Circuit Breaker - Unhealthy backends automatically removed from pool
  3. Auto-Recovery - Backends rejoin when returning 200 OK
  4. Health-Aware Routing - Only route to healthy instances

APIM Policy Details

Main Policy (apim-policy.xml)

This policy load-balances requests across 5 Azure Container App instances, using cookie-based session affinity and a cache-backed circuit breaker.

<!--
    Azure API Management Policy for Load Balancing with Session Affinity and Circuit Breaker
    
    Features:
    - Cookie-based session affinity (sticky sessions)
    - Automatic circuit breaking based on backend health
    - Marks a backend unhealthy when it returns 5xx, 429, or 401, so later requests fail over to healthy instances
    - Automatic recovery when backends return 200 OK
    
    Apply this policy at the API level for your main endpoints
-->
<policies>
    <inbound>
        <base />
        
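        <!-- Read each backend's cached health status; a cache miss defaults to "healthy" -->
        <cache-lookup-value key="backend-health-0" variable-name="health-0" default-value="healthy" />
        <cache-lookup-value key="backend-health-1" variable-name="health-1" default-value="healthy" />
        <cache-lookup-value key="backend-health-2" variable-name="health-2" default-value="healthy" />
        <cache-lookup-value key="backend-health-3" variable-name="health-3" default-value="healthy" />
        <cache-lookup-value key="backend-health-4" variable-name="health-4" default-value="healthy" />
        
        <!-- Build the list of healthy backends from the looked-up values -->
        <set-variable name="healthyBackends" value="@{
            var allBackends = new[] { "0", "1", "2", "3", "4" };
            var healthyList = new System.Collections.Generic.List<string>();
            
            foreach (var id in allBackends)
            {
                if (context.Variables.GetValueOrDefault<string>("health-" + id, "healthy") == "healthy")
                {
                    healthyList.Add(id);
                }
            }
            
            // If every backend is marked unhealthy, fall back to the full pool
            return healthyList.Count > 0 ? healthyList.ToArray() : allBackends;
        }" />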
        
        <!-- Session affinity with health check -->
        <choose>
            <when condition="@(context.Request.Headers.GetValueOrDefault("Cookie","").Contains("APIM-Backend-Instance"))">
                <set-variable name="backendInstance" value="@{
                    string cookie = context.Request.Headers.GetValueOrDefault("Cookie","");
                    var match = System.Text.RegularExpressions.Regex.Match(cookie, @"APIM-Backend-Instance=(\d+)");
                    string requestedId = match.Success ? match.Groups[1].Value : null;
                    var healthyBackends = (string[])context.Variables["healthyBackends"];
                    
                    if (requestedId != null && healthyBackends.Contains(requestedId))
                    {
                        return requestedId;
                    }
                    
                    var random = new Random();
                    return healthyBackends[random.Next(0, healthyBackends.Length)];
                }" />
            </when>
            <otherwise>
                <set-variable name="backendInstance" value="@{
                    var healthyBackends = (string[])context.Variables["healthyBackends"];
                    var random = new Random();
                    return healthyBackends[random.Next(0, healthyBackends.Length)];
                }" />
            </otherwise>
        </choose>
        
        <!-- Set backend URL based on instance ID -->
        <set-backend-service base-url="@{
            string id = context.Variables.GetValueOrDefault<string>("backendInstance", "0");
            var backends = new System.Collections.Generic.Dictionary<string, string> {
                { "0", "https://your-app-instance-1.azurecontainerapps.io" },
                { "1", "https://your-app-instance-2.azurecontainerapps.io" },
                { "2", "https://your-app-instance-3.azurecontainerapps.io" },
                { "3", "https://your-app-instance-4.azurecontainerapps.io" },
                { "4", "https://your-app-instance-5.azurecontainerapps.io" }
            };
            return backends.ContainsKey(id) ? backends[id] : backends["0"];
        }" />
        
        <set-header name="X-APIM-Correlation-Id" exists-action="skip">
            <value>@(Guid.NewGuid().ToString())</value>
        </set-header>
        
        <set-header name="X-Backend-Instance" exists-action="override">
            <value>@(context.Variables.GetValueOrDefault<string>("backendInstance", "0"))</value>
        </set-header>
    </inbound>
    
    <backend>
        <base />
    </backend>
    
    <outbound>
        <base />
        
        <!-- Circuit breaker: Update health status based on response -->
        <choose>
            <when condition="@(context.Response.StatusCode >= 500 || context.Response.StatusCode == 429 || context.Response.StatusCode == 401)">
                <!-- Mark backend as unhealthy for 30 seconds on errors -->
                <cache-store-value key="@("backend-health-" + context.Variables.GetValueOrDefault<string>("backendInstance"))" value="unhealthy" duration="30" />
            </when>
            <when condition="@(context.Response.StatusCode == 200)">
                <!-- Mark backend as healthy on 200 OK response -->
                <cache-store-value key="@("backend-health-" + context.Variables.GetValueOrDefault<string>("backendInstance"))" value="healthy" duration="30" />
            </when>
        </choose>
        
        <!-- Set session affinity cookie -->
        <set-header name="Set-Cookie" exists-action="append">
            <value>@{
                string instance = context.Variables.GetValueOrDefault<string>("backendInstance", "0");
                return $"APIM-Backend-Instance={instance}; Path=/; Max-Age=86400; HttpOnly; Secure; SameSite=Lax";
            }</value>
        </set-header>
        
        <set-header name="X-Served-By-Instance" exists-action="override">
            <value>@(context.Variables.GetValueOrDefault<string>("backendInstance", "0"))</value>
        </set-header>
    </outbound>
    
    <on-error>
        <base />
        
        <!-- Circuit breaker: Mark backend as unhealthy on connection errors -->
        <cache-store-value key="@("backend-health-" + context.Variables.GetValueOrDefault<string>("backendInstance", "0"))" value="unhealthy" duration="30" />
        
        <set-header name="X-Error-Backend-Instance" exists-action="override">
            <value>@(context.Variables.GetValueOrDefault<string>("backendInstance", "unknown"))</value>
        </set-header>
    </on-error>
</policies>
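
The inbound routing decision is easier to reason about outside of policy XML. The following Python sketch mirrors it under stated assumptions: a plain dictionary stands in for the APIM cache, and `healthy_backends` and `select_backend` are hypothetical names used only for illustration.

```python
import random
import re
from typing import Dict, List, Optional

ALL_BACKENDS = ["0", "1", "2", "3", "4"]


def healthy_backends(health_cache: Dict[str, str]) -> List[str]:
    """Backends without a cache entry count as healthy; if none are healthy, use all."""
    healthy = [
        backend
        for backend in ALL_BACKENDS
        if health_cache.get(f"backend-health-{backend}", "healthy") == "healthy"
    ]
    return healthy or ALL_BACKENDS


def select_backend(
    cookie_header: str,
    health_cache: Dict[str, str],
    rng: Optional[random.Random] = None,
) -> str:
    """Honour the affinity cookie when its backend is healthy, else pick at random."""
    rng = rng or random.Random()
    candidates = healthy_backends(health_cache)
    match = re.search(r"APIM-Backend-Instance=(\d+)", cookie_header)
    if match and match.group(1) in candidates:
        return match.group(1)
    return rng.choice(candidates)
```

A client whose cookie points at an unhealthy backend is reassigned, and the outbound `Set-Cookie` then pins it to the new instance.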

Setup Steps

  1. Update Backend URLs

    In the policy XML, replace the placeholder URLs:

    var backends = new System.Collections.Generic.Dictionary<string, string> {
        { "0", "https://your-app-instance-1.azurecontainerapps.io" },
        { "1", "https://your-app-instance-2.azurecontainerapps.io" },
        { "2", "https://your-app-instance-3.azurecontainerapps.io" },
        { "3", "https://your-app-instance-4.azurecontainerapps.io" },
        { "4", "https://your-app-instance-5.azurecontainerapps.io" }
    };
  2. Apply Policy in Azure Portal

    • Navigate to your APIM service
    • Go to your API → Design tab
    • Click "All operations" (or specific operations)
    • In "Inbound processing", click the code editor (</>)
    • Paste the policy XML
    • Click Save
  3. Deploy Container Apps

    • Deploy 5 instances of this application to Azure Container Apps
    • Ensure each has a unique URL
    • Verify /health endpoint is accessible

Health Check & Circuit Breaker

How It Works

❌ UNHEALTHY - Marking Backends Unhealthy

A backend is marked [UNHEALTHY] (removed from pool for 30 seconds) when:

  • Returns any 5xx status, such as 500, 502, 503, or 504 (Server errors)
  • Returns 429 (Rate limit exceeded)
  • Returns 401 (Authentication failed)
  • Connection timeout or failure

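✅ HEALTHY - Automatic Recovery

A backend is marked [HEALTHY] (rejoins pool) when:

  • Returns 200 OK
  • Health status cache expires (after 30 seconds)

Because an unhealthy backend receives no traffic while its cache entry is live, recovery normally starts when the entry expires: the backend becomes eligible again, and its next 200 OK caches it as healthy.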

Health Status Flow

┌─────────────────────────┐
│   [START] Request       │
│   Incoming              │
└───────────┬─────────────┘
            │
            ▼
┌─────────────────────────┐
│ [CHECK] Read Cache      │
│ Get Healthy Backends    │
│ (Instances 0-4)         │
└───────────┬─────────────┘
            │
            ▼
┌─────────────────────────┐
│ [DECISION] Does client  │
│ have session cookie?    │
└───────────┬─────────────┘
            │
      ┌─────┴─────┐
      │           │
   [YES]        [NO]
      │           │
      ▼           ▼
┌─────────────┐  ┌──────────────────┐
│[CHECK] Is   │  │[ASSIGN] Pick     │
│cookie's     │  │random healthy    │
│backend      │  │backend (0-4)     │
│healthy?     │  └────────┬─────────┘
└──────┬──────┘           │
       │                  │
   ┌───┴───┐              │
   │       │              │
 [YES]   [NO]             │
   │       │              │
   │       └──────┬───────┘
   │              │
   │              ▼
   │       ┌──────────────────┐
   │       │[REASSIGN] Pick   │
   │       │different healthy │
   │       │backend           │
   │       └─────────┬────────┘
   │                 │
   └────────┬────────┘
            │
            ▼
┌─────────────────────────┐
│ [ROUTE] Forward to      │
│ Selected Backend        │
│ Instance (0-4)          │
└───────────┬─────────────┘
            │
            ▼
┌─────────────────────────┐
│ [RESPONSE] Backend      │
│ Returns Status Code     │
└───────────┬─────────────┘
            │
            ▼
┌─────────────────────────┐
│ [EVALUATE] Check Status │
│ Code from Backend       │
└───────────┬─────────────┘
            │
      ┌─────┴─────┐
      │           │
 [200 OK]   [ERROR: 401/429/500+]
      │           │
      ▼           ▼
┌─────────────┐  ┌──────────────────┐
│✅ HEALTHY   │  │❌ UNHEALTHY      │
│Cache as     │  │Cache as          │
│"healthy"    │  │"unhealthy"       │
│TTL: 30s     │  │TTL: 30s          │
│[AVAILABLE]  │  │[REMOVED FROM     │
│             │  │ POOL]            │
└─────────────┘  └──────────────────┘

Behavior Table

| Backend Response | Circuit Breaker Action | Status Symbol | Duration | Client Impact |
|------------------|------------------------|---------------|----------|---------------|
| 200 OK | Mark healthy | ✅ [HEALTHY] | 30s cache | Continues routing |
| 401 Unauthorized | Mark unhealthy, remove from pool | ❌ [UNHEALTHY] | 30s | Route to different backend |
| 429 Rate Limit | Mark unhealthy, remove from pool | ❌ [UNHEALTHY] | 30s | Route to different backend |
| 500+ Server Error | Mark unhealthy, remove from pool | ❌ [UNHEALTHY] | 30s | Route to different backend |
| Connection Error | Mark unhealthy, remove from pool | ❌ [UNHEALTHY] | 30s | Route to different backend |
| Cache Expired | Re-evaluate on next request | ⏱️ [UNKNOWN] | N/A | May retry backend |
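
The behavior in the table can be modelled as a small cache with a time-to-live. This is a sketch for reasoning about the timing, not the APIM implementation: `HealthCache` is a hypothetical class, and the injectable clock exists only so that expiry can be exercised without waiting.

```python
import time
from typing import Callable, Dict, Tuple

TTL_SECONDS = 30


class HealthCache:
    """In-memory model of the per-backend health entries and their 30-second TTL."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}

    def record(self, backend: str, status_code: int) -> None:
        """Update the entry from a backend response, as the outbound policy does."""
        if status_code >= 500 or status_code in (401, 429):
            self._entries[backend] = ("unhealthy", self._clock() + TTL_SECONDS)
        elif status_code == 200:
            self._entries[backend] = ("healthy", self._clock() + TTL_SECONDS)
        # Any other status leaves the cached entry untouched.

    def status(self, backend: str) -> str:
        """Return "healthy", "unhealthy", or "unknown" (no entry or entry expired)."""
        entry = self._entries.get(backend)
        if entry is None or self._clock() >= entry[1]:
            return "unknown"
        return entry[0]
```

An "unknown" backend is eligible for traffic again, which is how a failed instance gets its first chance to return 200 OK after the 30 seconds elapse.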

Monitoring & Debugging

Response Headers

The APIM policy adds several headers for monitoring:

Request Headers (Added by APIM)

  • X-APIM-Correlation-Id: Unique request ID for tracing
  • X-Backend-Instance: Which backend (0-4) will handle the request

Response Headers

  • X-Served-By-Instance: Which backend actually served the response
  • X-Error-Backend-Instance: (On errors) Which backend caused the error

Cookies

  • APIM-Backend-Instance: Session affinity cookie (value 0-4, 24hr TTL)

Testing Session Affinity

# First request - receives a backend assignment
curl -i https://your-apim.azure-api.net/completion

# Check the Set-Cookie header for: APIM-Backend-Instance=X

# Subsequent requests with cookie go to same backend
curl -i https://your-apim.azure-api.net/completion \
  -H "Cookie: APIM-Backend-Instance=0"
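
When scripting this check, the backend ID can be read from the `Set-Cookie` header with Python's standard `http.cookies` module. `backend_from_set_cookie` is a hypothetical helper name; the cookie string in the usage line matches the format the policy emits.

```python
from http.cookies import SimpleCookie
from typing import Optional


def backend_from_set_cookie(set_cookie: str) -> Optional[str]:
    """Extract the APIM-Backend-Instance value from a Set-Cookie header, if present."""
    cookie = SimpleCookie()
    cookie.load(set_cookie)
    morsel = cookie.get("APIM-Backend-Instance")
    return morsel.value if morsel is not None else None


header = "APIM-Backend-Instance=3; Path=/; Max-Age=86400; HttpOnly; Secure; SameSite=Lax"
print(backend_from_set_cookie(header))
# prints: 3
```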

Testing Circuit Breaker

  1. Simulate Failure

    # Stop one Container App instance or cause it to return 500s
  2. Observe Failover

    # Requests automatically route to healthy instances
    curl -si https://your-apim.azure-api.net/health | grep X-Served-By-Instance
  3. Test Recovery

    # Restart the instance, wait 30 seconds
    # It automatically rejoins the pool on first 200 response

Monitor Backend Health

Check which backends are currently healthy:

# Make requests and check which instances respond
for i in {1..10}; do
  curl -s https://your-apim.azure-api.net/completion \
    -i | grep "X-Served-By-Instance"
done
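
The same sampling can be done from Python to get a per-instance count rather than raw header lines. This is a sketch: `served_by` and `tally_instances` are hypothetical helpers, and the network call is defined but not executed here.

```python
import urllib.error
import urllib.request
from collections import Counter
from typing import Iterable, Optional


def served_by(url: str) -> Optional[str]:
    """Return the X-Served-By-Instance header of one response (None if absent)."""
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            return response.headers.get("X-Served-By-Instance")
    except urllib.error.HTTPError as err:
        # Error responses (4xx/5xx) still carry headers worth inspecting.
        return err.headers.get("X-Served-By-Instance")


def tally_instances(header_values: Iterable[Optional[str]]) -> Counter:
    """Count responses per backend; a missing header is counted as "missing"."""
    return Counter(
        value if value is not None else "missing" for value in header_values
    )


# Example against a live gateway (not run here):
# counts = tally_instances(
#     served_by("https://your-apim.azure-api.net/completion") for _ in range(10)
# )
```

An instance that never appears in the tally while others do is a hint that it is currently marked unhealthy.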

Production Deployment

Prerequisites

  • Azure subscription
  • Azure API Management instance
  • 5 Azure Container App instances
  • Container registry (Azure Container Registry recommended)

Container App Deployment

  1. Build Docker image

    docker build -t your-registry.azurecr.io/openai-wrapper:latest .
  2. Push to registry

    docker push your-registry.azurecr.io/openai-wrapper:latest
  3. Deploy to Container Apps

    az containerapp create \
      --name openai-wrapper-1 \
      --resource-group your-rg \
      --environment your-env \
      --image your-registry.azurecr.io/openai-wrapper:latest \
      --target-port 8000 \
      --ingress external \
      --env-vars \
        OPENAI_ENDPOINT="https://your-instance.openai.azure.com/" \
        OPENAI_API_KEY="your-key" \
        OPENAI_API_VERSION="2025-01-01-preview" \
        OPENAI_MODEL_DEPLOYMENT_NAME="gpt-4"

    Repeat for instances 2-5 with different names.

APIM Configuration

  1. Enable Internal Cache (Required for circuit breaker)

    • Navigate to APIM → Caching
    • Enable built-in cache
  2. Import API

    • Create or import your OpenAI wrapper API
    • Add operations: /health, /completion, /chat, /bing-grounding
  3. Apply Policy

    • Use the policy XML from apim-policy.xml
    • Update backend URLs
    • Apply at API level or operation level

Production Checklist

  • Internal cache enabled in APIM
  • All 5 Container Apps deployed and running
  • Health endpoints returning 200 OK
  • Backend URLs updated in APIM policy
  • Policy applied and tested
  • Session affinity tested with cookies
  • Circuit breaker tested with simulated failures
  • Monitoring/alerts configured (Application Insights)
  • Security: APIM subscription keys configured
  • Security: Container Apps ingress restricted to APIM (if needed)

Production Considerations

  1. Cache TTL: Health entries are cached for 30 seconds (the duration values in the policy); adjust based on recovery time needs
  2. Monitoring: Set up Application Insights for both APIM and Container Apps
  3. Alerts: Create alerts when >50% of backends are unhealthy
  4. Scaling: Configure Container Apps autoscaling based on CPU/memory
  5. Security: Use Azure Key Vault for storing OpenAI API keys
  6. Rate Limits: Configure APIM rate limiting policies
  7. Quotas: Set appropriate quotas per client/subscription

Troubleshooting

Issue: All requests go to same instance

  • Fix: Verify APIM cache is enabled
  • Fix: Check Set-Cookie header is being sent
  • Fix: Test without cookies to verify random distribution

Issue: Backends not marked unhealthy on failures

  • Fix: Verify /health endpoint returns correct status codes
  • Fix: Check APIM diagnostic logs
  • Fix: Ensure cache is properly configured

Issue: Circuit breaker not recovering

  • Fix: Wait 30 seconds for cache expiration
  • Fix: Ensure backend returns 200 OK
  • Fix: Check X-Served-By-Instance header

Issue: High latency on health checks

  • Fix: Health checks are passive (based on regular traffic)
  • Fix: Consider implementing active health monitoring

Project Structure

simple_openai_api_wrapper/
├── agents/                         # AI Agent implementations
│   ├── __init__.py
│   ├── base_agent.py              # Abstract base class for all agents
│   └── agent_bing_grounding.py    # Bing grounding agent with citation extraction
├── ai/                             # Azure OpenAI integration
│   ├── __init__.py
│   └── azure_openai_client.py     # Azure OpenAI client wrapper
├── app/                            # FastAPI application
│   ├── __init__.py
│   ├── chat_completion.py         # Completion and chat logic
│   ├── create_table.py            # Database table creation (optional)
│   └── main.py                    # FastAPI endpoints (/health, /completion, /chat, /bing-grounding)
├── models/                         # Data models
│   ├── __init__.py
│   └── model.py                   # Pydantic models (Messages, etc.)
├── apim-policy.xml                # Main APIM policy (load balancing + circuit breaker)
├── apim-policy-with-healthcheck.xml  # APIM policy with enhanced health monitoring
├── apim-healthcheck-monitor.xml   # Optional active health monitoring policy
├── docker-compose.yaml            # Local development with Docker
├── dockerfile                     # Container image definition
├── env.sample                     # Environment variable template
├── main.py                        # Application entry point
├── requirements.txt               # Python dependencies (openai, fastapi, azure-ai-projects, etc.)
├── _env_activate.bat              # Windows: Activate virtual environment
├── _env_create.bat                # Windows: Create virtual environment
├── _install.bat                   # Windows: Install dependencies
├── _run_server.bat                # Windows: Run FastAPI server locally
├── _up.bat                        # Windows: Start Docker Compose
├── _down.bat                      # Windows: Stop Docker Compose
└── README.md                      # This file

Environment Variables

| Variable | Description | Example |
|----------|-------------|---------|
| Azure OpenAI Configuration | | |
| OPENAI_ENDPOINT | Azure OpenAI endpoint URL | https://your-instance.openai.azure.com/ |
| OPENAI_API_KEY | Azure OpenAI API key | your-api-key |
| OPENAI_API_VERSION | API version | 2025-01-01-preview |
| OPENAI_MODEL_DEPLOYMENT_NAME | Deployment name | gpt-4 or o1 |
| OPENAI_PROMPT | Default system prompt | You are a helpful assistant. |
| Azure AI Agent Configuration | | |
| AZURE_AI_PROJECT_ENDPOINT | Azure AI Project endpoint | https://your-project.services.ai.azure.com/api/projects/yourProject |
| AZURE_AI_AGENT_ID | Azure AI Agent ID | asst_xxxxxxxxxxxxx |

License

MIT
