A FastAPI-based wrapper service for Azure OpenAI and Azure AI Agent Service with health monitoring, designed to run on Azure Container Apps with API Management load balancing. This service can be exposed as an MCP (Model Context Protocol) server through APIM.
- Features
- Architecture Overview
- Getting Started
- API Endpoints
- Azure AI Agent Wrapper
- MCP Server via APIM
- Azure API Management Setup
- APIM Policy Details
- Health Check & Circuit Breaker
- Monitoring & Debugging
- Production Deployment
- ✅ FastAPI wrapper for Azure OpenAI completion and chat endpoints
- ✅ Azure AI Agent Service wrapper with Bing grounding capabilities
- ✅ Structured JSON responses with citations from grounded agents
- ✅ MCP Server deployment via Azure API Management
- ✅ Health check endpoint with Azure OpenAI connectivity verification
- ✅ Returns proper HTTP status codes (200, 401, 429, 503)
- ✅ Ready for Azure Container Apps deployment
- ✅ APIM policies for load balancing with session affinity
- ✅ Circuit breaker pattern for automatic failover
- ✅ Automatic backend recovery on health restoration
- ✅ Extensible agent architecture using Abstract Base Classes
┌──────────────────────────────────────────────────────────────┐
│                    [CLIENT APPLICATIONS]                     │
│                (MCP Clients, Web Apps, APIs)                 │
└───────────────────────────────┬──────────────────────────────┘
                                │
                                ▼
┌──────────────────────────────────────────────────────────────┐
│                    [AZURE API MANAGEMENT]                    │
│  • MCP Server Endpoint                                       │
│  • Load Balancing                                            │
│  • Circuit Breaker                                           │
│  • Session Affinity                                          │
└───────────────────────────────┬──────────────────────────────┘
                                │
           ┌────────────────────┼────────────────────┐
           │                    │                    │
           ▼                    ▼                    ▼
  ┌─────────────────┐  ┌─────────────────┐          ...
  │ [CONTAINER      │  │ [CONTAINER      │
  │  APP #1]        │  │  APP #2]        │
  │                 │  │                 │
  │ FastAPI Server  │  │ FastAPI Server  │
  │  ┌───────────┐  │  │  ┌───────────┐  │
  │  │ OpenAI    │  │  │  │ OpenAI    │  │
  │  │ Wrapper   │  │  │  │ Wrapper   │  │
  │  └───────────┘  │  │  └───────────┘  │
  │  ┌───────────┐  │  │  ┌───────────┐  │
  │  │ AI Agent  │  │  │  │ AI Agent  │  │
  │  │ Wrapper   │  │  │  │ Wrapper   │  │
  │  └─────┬─────┘  │  │  └─────┬─────┘  │
  └────────┼────────┘  └────────┼────────┘
           │                    │
           └─────────┬──────────┘
                     │
                     ▼
        ┌─────────────────────────┐
        │ [AZURE AI AGENT SERVICE]│
        │ • Bing Grounding        │
        │ • Citation Extraction   │
        │ • Thread Management     │
        └─────────────────────────┘
- Create virtual environment: `_env_create.bat`
- Activate virtual environment: `_env_activate.bat`
- Install dependencies: `_install.bat`
- Configure environment variables
  - Copy `env.sample` to `.env`
  - Fill in your Azure OpenAI credentials:

        OPENAI_ENDPOINT="https://your-instance.openai.azure.com/"
        OPENAI_API_KEY="your-api-key"
        OPENAI_API_VERSION="2025-01-01-preview"
        OPENAI_MODEL_DEPLOYMENT_NAME="gpt-4"
        OPENAI_PROMPT="You are a helpful assistant."

- Start the server: `_run_server.bat`

The API will be available at http://localhost:8000
Health check endpoint (`/health`) that verifies Azure OpenAI connectivity.
Response Codes:
- `200` - Service healthy, Azure OpenAI connected
- `401` - Azure OpenAI authentication failed
- `429` - Azure OpenAI rate limit exceeded
- `503` - Azure OpenAI service unavailable or connection error
- `500` - Unexpected error
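The mapping from probe outcome to status code can be sketched as a small lookup. This is only an illustration, not the service's actual handler: the error category names and the `health_response` helper are hypothetical, and the real endpoint would derive the category from whatever exception its Azure OpenAI client raises.

```python
from typing import Optional, Tuple

# Hypothetical error categories (assumption); the real service would inspect
# the exception raised by its Azure OpenAI client instead.
STATUS_BY_ERROR = {
    "authentication_failed": 401,
    "rate_limit_exceeded": 429,
    "service_unavailable": 503,
    "connection_error": 503,
}


def health_response(error: Optional[str] = None) -> Tuple[int, dict]:
    """Return (HTTP status, JSON body) for a health probe outcome."""
    if error is None:
        return 200, {"status": "ok", "azure_openai": "connected"}
    # Anything not recognised falls through to 500 (unexpected error).
    status = STATUS_BY_ERROR.get(error, 500)
    return status, {"status": "error", "error": error}


print(health_response())
print(health_response("rate_limit_exceeded"))
```

Keeping the mapping in one table makes it easy to check that the codes the circuit breaker reacts to (401, 429, 5xx) are the ones the endpoint can actually return.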
Example:
curl http://localhost:8000/health
Success Response:
{
"status": "ok",
"azure_openai": "connected"
}
Error Response (503):
{
"status": "error",
"error": "service_unavailable",
"message": "Azure OpenAI service unavailable",
"details": "..."
}
Simple completion endpoint (`/completion`) with a single query parameter.
Parameters:
- `query` (string, optional) - Default: "how are you?"
Example:
curl "http://localhost:8000/completion?query=Tell%20me%20a%20joke"
Chat endpoint (`/chat`) supporting message history.
Request Body:
{
"messages": [
{
"role": "user",
"content": "Hello, how are you?"
}
]
}
Example:
curl -X POST http://localhost:8000/chat \
-H "Content-Type: application/json" \
-d '{
"messages": [
{"role": "user", "content": "What is Azure?"}
]
}'
Azure AI Agent wrapper endpoint (`/bing-grounding`) with Bing grounding and citation support. It wraps an Azure AI Agent Service agent that uses Bing search for grounded responses.
Parameters:
- `query` (string, required) - The user query to process
Response: JSON with structured content and citations
Example:
curl -X POST "http://localhost:8000/bing-grounding?query=What%20happened%20in%20finance%20today%3F"
Success Response:
{
"content": "Today in finance, the U.S. stock market saw a sharp decline, with the Dow Jones Industrial Average plunging almost 800 points (down 1.6%), and both the Nasdaq and S&P 500 also posting significant losses...",
"citations": [
{
"id": 1,
"type": "url",
"url": "https://www.marketwatch.com/...",
"title": "Stock Market News Today"
},
{
"id": 2,
"type": "url",
"url": "https://www.cnbc.com/...",
"title": "Federal Reserve Commentary"
}
]
}
Features:
- ✅ Grounded responses using Bing search
- ✅ Automatic citation extraction and formatting
- ✅ Clean content (inline citation markers removed)
- ✅ Structured JSON response
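To make the "clean content" and "citation formatting" steps concrete, here is a minimal sketch of how raw agent text might be turned into the response shape shown above. The marker pattern (`【3:0†source】`) and the `build_response` helper are assumptions for illustration; the actual agent output format and the wrapper's implementation may differ.

```python
import re
from typing import Dict, List

# Assumed marker shape, e.g. "【3:0†source】"; the real agent output may differ.
MARKER = re.compile(r"【[^】]*】")


def build_response(raw_text: str, sources: List[Dict[str, str]]) -> dict:
    """Strip inline markers and number the cited sources starting at 1."""
    content = MARKER.sub("", raw_text)
    content = re.sub(r"\s+([.,;:!?])", r"\1", content)  # no space before punctuation
    content = re.sub(r"[ \t]{2,}", " ", content).strip()
    citations = [
        {"id": i, "type": "url", "url": s["url"], "title": s["title"]}
        for i, s in enumerate(sources, start=1)
    ]
    return {"content": content, "citations": citations}


result = build_response(
    "Stocks fell sharply today【3:0†source】.",
    [{"url": "https://example.com/markets", "title": "Markets Today"}],
)
print(result["content"])
print(result["citations"])
```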
This service provides a FastAPI wrapper around Azure AI Agent Service, enabling you to expose AI agents as REST APIs that can be consumed by any application or deployed as an MCP server.
The wrapper uses an Abstract Base Class (ABC) pattern for extensibility:
agents/
├── base_agent.py              # Abstract base class for all agents
└── agent_bing_grounding.py    # Bing grounding agent implementation

from abc import ABC, abstractmethod
from typing import Optional


class BaseAgent(ABC):
    """Abstract base class for all agents"""

    def __init__(self, endpoint: Optional[str] = None, agent_id: Optional[str] = None):
        self.endpoint = endpoint
        self.agent_id = agent_id

    @abstractmethod
    def chat(self, message: str) -> str:
        """Process a message and return response"""
        pass

The Bing grounding agent (`agent_bing_grounding.py`) is a concrete implementation that:
- Connects to Azure AI Agent Service
- Creates conversation threads
- Extracts and formats citations from Bing-grounded responses
- Returns structured JSON with content and citations
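The contract between the base class and a concrete agent can be tried out without any Azure dependency. The sketch below redefines a minimal `BaseAgent` so it runs on its own, and `EchoAgent` is a hypothetical stand-in for a real agent; it only shows that the abstract class cannot be instantiated while a subclass that implements `chat()` can.

```python
from abc import ABC, abstractmethod
from typing import Optional


class BaseAgent(ABC):
    """Minimal copy of the base class so this example is self-contained."""

    def __init__(self, endpoint: Optional[str] = None, agent_id: Optional[str] = None):
        self.endpoint = endpoint
        self.agent_id = agent_id

    @abstractmethod
    def chat(self, message: str) -> str:
        """Process a message and return a response."""


class EchoAgent(BaseAgent):
    """Hypothetical agent used only for illustration."""

    def chat(self, message: str) -> str:
        return f"echo: {message}"


try:
    BaseAgent()  # abstract method not implemented, so this raises TypeError
except TypeError as exc:
    print("cannot instantiate BaseAgent:", type(exc).__name__)

print(EchoAgent(endpoint="https://example.invalid", agent_id="demo").chat("hi"))
```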
Add these environment variables to .env:
# Azure AI Agent Configuration
AZURE_AI_PROJECT_ENDPOINT="https://your-project.services.ai.azure.com/api/projects/yourProject"
AZURE_AI_AGENT_ID="asst_xxxxxxxxxxxxx"

To add a new agent type:
- Create a new agent class that inherits from `BaseAgent`
- Implement the `chat()` method
- Add appropriate configuration to `.env`
Example:
import os

from agents.base_agent import BaseAgent


class CustomAgent(BaseAgent):
    def __init__(self):
        endpoint = os.getenv("CUSTOM_AGENT_ENDPOINT")
        agent_id = os.getenv("CUSTOM_AGENT_ID")
        super().__init__(endpoint=endpoint, agent_id=agent_id)

    def chat(self, message: str) -> str:
        # Your custom implementation
        raise NotImplementedError

This FastAPI service can be deployed as an MCP (Model Context Protocol) server through Azure API Management, enabling AI applications to consume your Azure AI agents through a standardized protocol.
┌─────────────────────────────────────────┐
│        [MCP CLIENT APPLICATIONS]        │
│    (Claude Desktop, IDEs, AI Tools)     │
└──────────────────┬──────────────────────┘
                   │ MCP Protocol
                   ▼
┌─────────────────────────────────────────┐
│         [AZURE API MANAGEMENT]          │
│  • MCP Endpoint Mapping                 │
│  • Authentication                       │
│  • Rate Limiting                        │
│  • Load Balancing                       │
└──────────────────┬──────────────────────┘
                   │ HTTPS
                   ▼
┌─────────────────────────────────────────┐
│    [AZURE CONTAINER APPS - FastAPI]     │
│  • /bing-grounding → AI Agent Wrapper   │
│  • /completion     → OpenAI Wrapper     │
│  • /chat           → OpenAI Chat Wrapper│
│  • /health         → Health Check       │
└─────────────────────────────────────────┘
- Standardized Protocol - MCP clients can discover and use your agents automatically
- Enterprise Security - APIM handles authentication, authorization, and rate limiting
- Scalability - Load balance across multiple container instances
- Monitoring - Centralized logging and analytics through APIM
- Version Management - Deploy multiple versions side-by-side
- Deploy FastAPI to Azure Container Apps (see Production Deployment section)
- Configure APIM to expose MCP endpoints:
  - Map MCP protocol operations to FastAPI endpoints
  - Configure CORS for web-based MCP clients
  - Set up authentication (API keys, OAuth, etc.)
- Register with MCP clients:
  - Provide APIM endpoint URL
  - Configure authentication credentials
  - MCP clients will auto-discover available agents
{
"mcpServers": {
"azure-ai-agents": {
"url": "https://your-apim.azure-api.net",
"apiKey": "your-apim-subscription-key",
"endpoints": {
"bing-grounding": "/bing-grounding",
"completion": "/completion",
"chat": "/chat"
}
}
}
}

┌──────────────────────────────────────────────────────────────┐
│                          [INTERNET]                          │
│                           Clients                            │
└──────────────────────────────┬───────────────────────────────┘
                               │
                               ▼
┌──────────────────────────────────────────────────────────────┐
│                    [AZURE API MANAGEMENT]                    │
│                                                              │
│     ┌──────────────────────────────────────────────────┐     │
│     │         Load Balancer + Circuit Breaker          │     │
│     │  • Session Affinity (Sticky Sessions)            │     │
│     │  • Health-Based Routing                          │     │
│     │  • Auto Failover & Recovery                      │     │
│     └────────────────────────┬─────────────────────────┘     │
└──────────────────────────────┼───────────────────────────────┘
                               │
              ┌────────────────┼────────────────┐
              │                │                │
              ▼                ▼                ▼
       ┌─────────────┐  ┌─────────────┐   ... (5 instances)
       │ ✅ HEALTHY  │  │ ❌ UNHEALTHY│
       │ Container   │  │ Container   │
       │ App #1      │  │ App #2      │
       │ [ACTIVE]    │  │ [REMOVED]   │
       └─────────────┘  └─────────────┘

Legend:
✅ [HEALTHY] - Backend available in pool, receiving traffic
❌ [UNHEALTHY] - Backend removed from pool, no traffic
⏱️ [UNKNOWN] - Backend status being evaluated
- Session Affinity (Sticky Sessions) - Clients stick to the same backend via cookies
- Circuit Breaker - Unhealthy backends automatically removed from pool
- Auto-Recovery - Backends rejoin when returning 200 OK
- Health-Aware Routing - Only route to healthy instances
This policy provides intelligent load balancing across 5 Azure Container App instances.
<!--
Azure API Management Policy for Load Balancing with Session Affinity and Circuit Breaker
Features:
- Cookie-based session affinity (sticky sessions)
- Automatic circuit breaking based on backend health
- Failover to healthy instances when backend returns 500, 429, or 401
- Automatic recovery when backends return 200 OK
Apply this policy at the API level for your main endpoints
-->
<policies>
<inbound>
<base />
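<!-- Check for healthy backends from cache -->
<!-- Policy expressions have no context.Cache member, so each status is read with cache-lookup-value; a missing entry counts as healthy -->
<cache-lookup-value key="backend-health-0" variable-name="health0" default-value="healthy" />
<cache-lookup-value key="backend-health-1" variable-name="health1" default-value="healthy" />
<cache-lookup-value key="backend-health-2" variable-name="health2" default-value="healthy" />
<cache-lookup-value key="backend-health-3" variable-name="health3" default-value="healthy" />
<cache-lookup-value key="backend-health-4" variable-name="health4" default-value="healthy" />
<set-variable name="healthyBackends" value="@{
var allBackends = new[] { "0", "1", "2", "3", "4" };
var healthyList = new System.Collections.Generic.List<string>();
foreach (var id in allBackends)
{
string healthStatus = context.Variables.GetValueOrDefault<string>("health" + id, "healthy");
if (healthStatus == "healthy")
{
healthyList.Add(id);
}
}
return healthyList.Count > 0 ? healthyList.ToArray() : allBackends;
}" />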
<!-- Session affinity with health check -->
<choose>
<when condition="@(context.Request.Headers.GetValueOrDefault("Cookie","").Contains("APIM-Backend-Instance"))">
<set-variable name="backendInstance" value="@{
string cookie = context.Request.Headers.GetValueOrDefault("Cookie","");
var match = System.Text.RegularExpressions.Regex.Match(cookie, @"APIM-Backend-Instance=(\d+)");
string requestedId = match.Success ? match.Groups[1].Value : null;
var healthyBackends = (string[])context.Variables["healthyBackends"];
if (requestedId != null && healthyBackends.Contains(requestedId))
{
return requestedId;
}
var random = new Random();
return healthyBackends[random.Next(0, healthyBackends.Length)];
}" />
</when>
<otherwise>
<set-variable name="backendInstance" value="@{
var healthyBackends = (string[])context.Variables["healthyBackends"];
var random = new Random();
return healthyBackends[random.Next(0, healthyBackends.Length)];
}" />
</otherwise>
</choose>
<!-- Set backend URL based on instance ID -->
<set-backend-service base-url="@{
string id = context.Variables.GetValueOrDefault<string>("backendInstance", "0");
var backends = new System.Collections.Generic.Dictionary<string, string> {
{ "0", "https://your-app-instance-1.azurecontainerapps.io" },
{ "1", "https://your-app-instance-2.azurecontainerapps.io" },
{ "2", "https://your-app-instance-3.azurecontainerapps.io" },
{ "3", "https://your-app-instance-4.azurecontainerapps.io" },
{ "4", "https://your-app-instance-5.azurecontainerapps.io" }
};
return backends.ContainsKey(id) ? backends[id] : backends["0"];
}" />
<set-header name="X-APIM-Correlation-Id" exists-action="skip">
<value>@(Guid.NewGuid().ToString())</value>
</set-header>
<set-header name="X-Backend-Instance" exists-action="override">
<value>@(context.Variables.GetValueOrDefault<string>("backendInstance", "0"))</value>
</set-header>
</inbound>
<backend>
<base />
</backend>
<outbound>
<base />
<!-- Circuit breaker: Update health status based on response -->
<choose>
<when condition="@(context.Response.StatusCode >= 500 || context.Response.StatusCode == 429 || context.Response.StatusCode == 401)">
<!-- Mark backend as unhealthy for 30 seconds on errors -->
<cache-store-value key="@("backend-health-" + context.Variables.GetValueOrDefault<string>("backendInstance"))" value="unhealthy" duration="30" />
</when>
<when condition="@(context.Response.StatusCode == 200)">
<!-- Mark backend as healthy on 200 OK response -->
<cache-store-value key="@("backend-health-" + context.Variables.GetValueOrDefault<string>("backendInstance"))" value="healthy" duration="30" />
</when>
</choose>
<!-- Set session affinity cookie -->
<set-header name="Set-Cookie" exists-action="append">
<value>@{
string instance = context.Variables.GetValueOrDefault<string>("backendInstance", "0");
return $"APIM-Backend-Instance={instance}; Path=/; Max-Age=86400; HttpOnly; Secure; SameSite=Lax";
}</value>
</set-header>
<set-header name="X-Served-By-Instance" exists-action="override">
<value>@(context.Variables.GetValueOrDefault<string>("backendInstance", "0"))</value>
</set-header>
</outbound>
<on-error>
<base />
<!-- Circuit breaker: Mark backend as unhealthy on connection errors -->
<cache-store-value key="@("backend-health-" + context.Variables.GetValueOrDefault<string>("backendInstance", "0"))" value="unhealthy" duration="30" />
<set-header name="X-Error-Backend-Instance" exists-action="override">
<value>@(context.Variables.GetValueOrDefault<string>("backendInstance", "unknown"))</value>
</set-header>
</on-error>
</policies>

- Update Backend URLs

  In the policy XML, replace the placeholder URLs:

      var backends = new System.Collections.Generic.Dictionary<string, string> {
          { "0", "https://your-app-instance-1.azurecontainerapps.io" },
          { "1", "https://your-app-instance-2.azurecontainerapps.io" },
          { "2", "https://your-app-instance-3.azurecontainerapps.io" },
          { "3", "https://your-app-instance-4.azurecontainerapps.io" },
          { "4", "https://your-app-instance-5.azurecontainerapps.io" }
      };

- Apply Policy in Azure Portal
  - Navigate to your APIM service
  - Go to your API → Design tab
  - Click "All operations" (or specific operations)
  - In "Inbound processing", click the code editor (`</>`)
  - Paste the policy XML
  - Click Save
- Deploy Container Apps
  - Deploy 5 instances of this application to Azure Container Apps
  - Ensure each has a unique URL
  - Verify the `/health` endpoint is accessible
A backend is marked [UNHEALTHY] (removed from pool for 30 seconds) when:
- Returns `500`, `502`, `503`, `504` (Server errors)
- Returns `429` (Rate limit exceeded)
- Returns `401` (Authentication failed)
- Connection timeout or failure

A backend is marked [HEALTHY] (rejoins pool) when:
- Returns `200 OK`
- Health status cache expires (after 30 seconds)
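The rules above can be simulated outside APIM to reason about edge cases such as cache expiry. The following is a rough in-memory model under stated assumptions, not the policy itself: `HealthCache` and `pick_backend` are hypothetical names, and the clock is injectable so that expiry can be exercised without waiting 30 seconds.

```python
import random
import time
from typing import Callable, Dict, List, Optional, Tuple

TTL_SECONDS = 30
ALL_BACKENDS = ["0", "1", "2", "3", "4"]


class HealthCache:
    """In-memory stand-in for the APIM cache: backend id -> (status, expiry)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}

    def record(self, backend: str, status_code: int) -> None:
        """Apply the circuit breaker rules to one backend response."""
        if status_code >= 500 or status_code in (401, 429):
            self._entries[backend] = ("unhealthy", self._clock() + TTL_SECONDS)
        elif status_code == 200:
            self._entries[backend] = ("healthy", self._clock() + TTL_SECONDS)

    def healthy(self) -> List[str]:
        now = self._clock()
        ok = [
            b for b in ALL_BACKENDS
            if b not in self._entries          # never seen: assume healthy
            or self._entries[b][1] <= now      # entry expired: eligible again
            or self._entries[b][0] == "healthy"
        ]
        return ok or list(ALL_BACKENDS)        # all down: fall back to the full pool


def pick_backend(cache: HealthCache, cookie_id: Optional[str] = None) -> str:
    """Honour the session cookie when its backend is healthy, else pick at random."""
    pool = cache.healthy()
    return cookie_id if cookie_id in pool else random.choice(pool)


now = [0.0]
cache = HealthCache(clock=lambda: now[0])
cache.record("2", 503)                      # backend 2 fails
print(pick_backend(cache, cookie_id="1"))   # cookie backend still healthy
print("2" in cache.healthy())               # removed from the pool
now[0] = 31.0                               # TTL has elapsed
print("2" in cache.healthy())               # eligible again
```

As in the policy, a backend that was never observed counts as healthy, and if every backend is marked unhealthy the full pool is used rather than failing every request.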
┌─────────────────────────┐
│ [START] Request         │
│ Incoming                │
└────────────┬────────────┘
             │
             ▼
┌─────────────────────────┐
│ [CHECK] Read Cache      │
│ Get Healthy Backends    │
│ (Instances 0-4)         │
└────────────┬────────────┘
             │
             ▼
┌─────────────────────────┐
│ [DECISION] Does client  │
│ have session cookie?    │
└────────────┬────────────┘
             │
       ┌─────┴────────────┐
       │                  │
     [YES]              [NO]
       │                  │
       ▼                  ▼
┌─────────────┐  ┌──────────────────┐
│[CHECK] Is   │  │[ASSIGN] Pick     │
│cookie's     │  │random healthy    │
│backend      │  │backend (0-4)     │
│healthy?     │  └────────┬─────────┘
└──────┬──────┘           │
       │                  │
   ┌───┴───┐              │
   │       │              │
 [YES]    [NO]            │
   │       │              │
   │       └──────┬───────┘
   │              │
   │              ▼
   │     ┌──────────────────┐
   │     │[REASSIGN] Pick   │
   │     │different healthy │
   │     │backend           │
   │     └────────┬─────────┘
   │              │
   └─────────┬────┘
             │
             ▼
┌─────────────────────────┐
│ [ROUTE] Forward to      │
│ Selected Backend        │
│ Instance (0-4)          │
└────────────┬────────────┘
             │
             ▼
┌─────────────────────────┐
│ [RESPONSE] Backend      │
│ Returns Status Code     │
└────────────┬────────────┘
             │
             ▼
┌─────────────────────────┐
│ [EVALUATE] Check Status │
│ Code from Backend       │
└────────────┬────────────┘
             │
       ┌─────┴────────────┐
       │                  │
   [200 OK]     [ERROR: 401/429/500+]
       │                  │
       ▼                  ▼
┌─────────────┐  ┌──────────────────┐
│✅ HEALTHY   │  │❌ UNHEALTHY      │
│Cache as     │  │Cache as          │
│"healthy"    │  │"unhealthy"       │
│TTL: 30s     │  │TTL: 30s          │
│[AVAILABLE]  │  │[REMOVED FROM     │
│             │  │ POOL]            │
└─────────────┘  └──────────────────┘
| Backend Response | Circuit Breaker Action | Status Symbol | Duration | Client Impact |
|---|---|---|---|---|
| `200 OK` | Mark healthy | ✅ [HEALTHY] | 30s cache | Continues routing |
| `401 Unauthorized` | Mark unhealthy, remove from pool | ❌ [UNHEALTHY] | 30s | Route to different backend |
| `429 Rate Limit` | Mark unhealthy, remove from pool | ❌ [UNHEALTHY] | 30s | Route to different backend |
| `500+ Server Error` | Mark unhealthy, remove from pool | ❌ [UNHEALTHY] | 30s | Route to different backend |
| Connection Error | Mark unhealthy, remove from pool | ❌ [UNHEALTHY] | 30s | Route to different backend |
| Cache Expired | Re-evaluate on next request | ⏱️ [UNKNOWN] | N/A | May retry backend |
The APIM policy adds several headers for monitoring:
- `X-APIM-Correlation-Id`: Unique request ID for tracing
- `X-Backend-Instance`: Which backend (0-4) will handle the request
- `X-Served-By-Instance`: Which backend actually served the response
- `X-Error-Backend-Instance`: (On errors) Which backend caused the error
- `APIM-Backend-Instance`: Session affinity cookie (value 0-4, 24hr TTL)
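When scripting checks against these headers, the affinity cookie can be read with the standard library rather than string splitting. A small sketch, assuming the `Set-Cookie` value has the shape emitted by the policy above; `backend_from_set_cookie` is a hypothetical helper name.

```python
from http.cookies import SimpleCookie
from typing import Optional


def backend_from_set_cookie(set_cookie: str) -> Optional[str]:
    """Extract the APIM-Backend-Instance value from a Set-Cookie header."""
    cookie = SimpleCookie()
    cookie.load(set_cookie)
    morsel = cookie.get("APIM-Backend-Instance")
    return morsel.value if morsel is not None else None


header = "APIM-Backend-Instance=3; Path=/; Max-Age=86400; HttpOnly; Secure; SameSite=Lax"
print(backend_from_set_cookie(header))
print(backend_from_set_cookie("other=1; Path=/"))
```

Comparing this value with `X-Served-By-Instance` on the same response is a quick way to confirm that affinity and routing agree.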
# First request - receives a backend assignment
curl -i https://your-apim.azure-api.net/completion
# Check the Set-Cookie header for: APIM-Backend-Instance=X
# Subsequent requests with cookie go to same backend
curl -i https://your-apim.azure-api.net/completion \
-H "Cookie: APIM-Backend-Instance=0"

- Simulate Failure

      # Stop one Container App instance or cause it to return 500s

- Observe Failover

      # Requests automatically route to healthy instances
      curl -i https://your-apim.azure-api.net/health | grep X-Served-By-Instance

- Test Recovery

      # Restart the instance, wait 30 seconds
      # It automatically rejoins the pool on first 200 response

Check which backends are currently healthy:
# Make requests and check which instances respond
for i in {1..10}; do
curl -s https://your-apim.azure-api.net/completion \
-i | grep "X-Served-By-Instance"
done

- Azure subscription
- Azure API Management instance
- 5 Azure Container App instances
- Container registry (Azure Container Registry recommended)
- Build Docker image

      docker build -t your-registry.azurecr.io/openai-wrapper:latest .

- Push to registry

      docker push your-registry.azurecr.io/openai-wrapper:latest

- Deploy to Container Apps

      az containerapp create \
        --name openai-wrapper-1 \
        --resource-group your-rg \
        --environment your-env \
        --image your-registry.azurecr.io/openai-wrapper:latest \
        --target-port 8000 \
        --ingress external \
        --env-vars \
          OPENAI_ENDPOINT="https://your-instance.openai.azure.com/" \
          OPENAI_API_KEY="your-key" \
          OPENAI_API_VERSION="2025-01-01-preview" \
          OPENAI_MODEL_DEPLOYMENT_NAME="gpt-4"

  Repeat for instances 2-5 with different names.
- Enable Internal Cache (Required for circuit breaker)
  - Navigate to APIM → Caching
  - Enable built-in cache
- Import API
  - Create or import your OpenAI wrapper API
  - Add operations: `/health`, `/completion`, `/chat`
- Apply Policy
  - Use the policy XML from `apim-policy.xml`
  - Update backend URLs
  - Apply at API level or operation level

- Internal cache enabled in APIM
- All 5 Container Apps deployed and running
- Health endpoints returning 200 OK
- Backend URLs updated in APIM policy
- Policy applied and tested
- Session affinity tested with cookies
- Circuit breaker tested with simulated failures
- Monitoring/alerts configured (Application Insights)
- Security: APIM subscription keys configured
- Security: Container Apps ingress restricted to APIM (if needed)
- Cache TTL: 30 seconds is default, adjust based on recovery time needs
- Monitoring: Set up Application Insights for both APIM and Container Apps
- Alerts: Create alerts when >50% of backends are unhealthy
- Scaling: Configure Container Apps autoscaling based on CPU/memory
- Security: Use Azure Key Vault for storing OpenAI API keys
- Rate Limits: Configure APIM rate limiting policies
- Quotas: Set appropriate quotas per client/subscription
Issue: All requests go to same instance
- Fix: Verify APIM cache is enabled
- Fix: Check `Set-Cookie` header is being sent
- Fix: Test without cookies to verify random distribution

Issue: Backends not marked unhealthy on failures
- Fix: Verify `/health` endpoint returns correct status codes
- Fix: Check APIM diagnostic logs
- Fix: Ensure cache is properly configured

Issue: Circuit breaker not recovering
- Fix: Wait 30 seconds for cache expiration
- Fix: Ensure backend returns `200 OK`
- Fix: Check `X-Served-By-Instance` header

Issue: High latency on health checks
- Fix: Health checks are passive (based on regular traffic)
- Fix: Consider implementing active health monitoring
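One possible shape for active monitoring is a small probe that polls each backend's `/health` endpoint on a schedule, independently of client traffic. The sketch below is an assumption about how that could look, not something shipped with this project: `http_status` and `probe` are hypothetical helpers, and the fetch function is injectable so the logic can be tested without network access.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict


def http_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status of a GET, or 503 if the backend is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:   # 4xx/5xx responses carry a status code
        return exc.code
    except (urllib.error.URLError, OSError):  # DNS failure, refused, timeout
        return 503


def probe(backends: Dict[str, str], fetch: Callable[[str], int] = http_status) -> Dict[str, bool]:
    """Map each backend id to True when its /health endpoint answers 200."""
    return {
        backend_id: fetch(base_url.rstrip("/") + "/health") == 200
        for backend_id, base_url in backends.items()
    }


# Offline demonstration with a fake fetch function instead of real HTTP calls.
fake_statuses = {
    "https://your-app-instance-1.azurecontainerapps.io/health": 200,
    "https://your-app-instance-2.azurecontainerapps.io/health": 503,
}
print(probe(
    {
        "0": "https://your-app-instance-1.azurecontainerapps.io",
        "1": "https://your-app-instance-2.azurecontainerapps.io/",
    },
    fetch=lambda url: fake_statuses.get(url, 503),
))
```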
simple_openai_api_wrapper/
├── agents/                           # AI Agent implementations
│   ├── __init__.py
│   ├── base_agent.py                 # Abstract base class for all agents
│   └── agent_bing_grounding.py       # Bing grounding agent with citation extraction
├── ai/                               # Azure OpenAI integration
│   ├── __init__.py
│   └── azure_openai_client.py        # Azure OpenAI client wrapper
├── app/                              # FastAPI application
│   ├── __init__.py
│   ├── chat_completion.py            # Completion and chat logic
│   ├── create_table.py               # Database table creation (optional)
│   └── main.py                       # FastAPI endpoints (/health, /completion, /chat, /bing-grounding)
├── models/                           # Data models
│   ├── __init__.py
│   └── model.py                      # Pydantic models (Messages, etc.)
├── apim-policy.xml                   # Main APIM policy (load balancing + circuit breaker)
├── apim-policy-with-healthcheck.xml  # APIM policy with enhanced health monitoring
├── apim-healthcheck-monitor.xml      # Optional active health monitoring policy
├── docker-compose.yaml               # Local development with Docker
├── dockerfile                        # Container image definition
├── env.sample                        # Environment variable template
├── main.py                           # Application entry point
├── requirements.txt                  # Python dependencies (openai, fastapi, azure-ai-projects, etc.)
├── _env_activate.bat                 # Windows: Activate virtual environment
├── _env_create.bat                   # Windows: Create virtual environment
├── _install.bat                      # Windows: Install dependencies
├── _run_server.bat                   # Windows: Run FastAPI server locally
├── _up.bat                           # Windows: Start Docker Compose
├── _down.bat                         # Windows: Stop Docker Compose
└── README.md                         # This file
| Variable | Description | Example |
|---|---|---|
| Azure OpenAI Configuration | | |
| `OPENAI_ENDPOINT` | Azure OpenAI endpoint URL | `https://your-instance.openai.azure.com/` |
| `OPENAI_API_KEY` | Azure OpenAI API key | `your-api-key` |
| `OPENAI_API_VERSION` | API version | `2025-01-01-preview` |
| `OPENAI_MODEL_DEPLOYMENT_NAME` | Deployment name | `gpt-4` or `o1` |
| `OPENAI_PROMPT` | Default system prompt | `You are a helpful assistant.` |
| Azure AI Agent Configuration | | |
| `AZURE_AI_PROJECT_ENDPOINT` | Azure AI Project endpoint | `https://your-project.services.ai.azure.com/api/projects/yourProject` |
| `AZURE_AI_AGENT_ID` | Azure AI Agent ID | `asst_xxxxxxxxxxxxx` |
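A startup check can catch a missing variable before the first request fails. This is a minimal sketch: the `missing_vars` helper is hypothetical, and treating `OPENAI_PROMPT` as optional is an assumption (it is described above only as a default system prompt).

```python
import os
from typing import List, Mapping

# Assumption: every variable except OPENAI_PROMPT is required.
REQUIRED_VARS = [
    "OPENAI_ENDPOINT",
    "OPENAI_API_KEY",
    "OPENAI_API_VERSION",
    "OPENAI_MODEL_DEPLOYMENT_NAME",
    "AZURE_AI_PROJECT_ENDPOINT",
    "AZURE_AI_AGENT_ID",
]


def missing_vars(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the names of required variables that are unset or blank."""
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


sample_env = {
    "OPENAI_ENDPOINT": "https://your-instance.openai.azure.com/",
    "OPENAI_API_KEY": "your-api-key",
    "OPENAI_API_VERSION": "2025-01-01-preview",
    "OPENAI_MODEL_DEPLOYMENT_NAME": "gpt-4",
}
print(missing_vars(sample_env))
```

With the sample above, the two Azure AI Agent variables are reported as missing, which is the situation to expect when only the OpenAI half of `.env` has been filled in.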
MIT