This repository contains a reference implementation of a hyper‑personalized multimodal AIOps agent built on serverless Azure services. The solution ingests telemetry (logs, metrics, documents and images), performs retrieval‑augmented reasoning with a large language model, and optionally executes remediation actions. A ready‑to‑use Microsoft Teams Adaptive Card is included to broadcast root cause analyses and recommendations to your operations teams.
The implementation is designed to accompany Edition 26 of the Dominant Forces in AI newsletter and is intended to give developers and solution architects a working example of a virtual SRE agent that can be deployed to the Azure cloud.
Wiki: https://github.com/Huzefaaa2/AIOps/wiki
aiops_code/
├── function_app/ # Main AIOps agent Azure Function
│ ├── __init__.py # Orchestrates logs, RAG, LLM and remediation
│ ├── function.json # HTTP trigger configuration
│ └── requirements.txt # Python dependencies
├── remediation/ # Secondary function for executing safe actions
│ ├── __init__.py
│ └── function.json
├── teams_rca_card_template.json # Adaptive Card schema for Teams posts
└── README.md # This documentation
Before deploying the AIOps agent you will need:
- Azure subscription and resource group where your resources will live.
- Log Analytics Workspace for ingesting logs and metrics. Note the workspace ID and ensure your services send telemetry to it.
- Azure Cognitive Search index populated with runbooks, past incidents, architecture diagrams or configuration snapshots. If you wish to use vector search you will need to generate embeddings for your documents ahead of time.
- Azure OpenAI Service (or another model provider) with a GPT‑4‑class model deployment. Record the resource endpoint, API key and deployment name.
- Microsoft Teams Incoming Webhook set up in a channel where you want root cause analyses to be posted.
- Azure Storage account with a Blob container, if you intend to index PDF or image documents into the search index.
git clone https://github.com/Huzefaaa2/AIOps.git
cd AIOps/aiops_code
Use the Azure Portal or the Azure CLI to create a Cognitive Search service and an index. The index schema should include at minimum:
- `id`: unique identifier (`Edm.String`)
- `title`: document title (`Edm.String`)
- `content`: full text content (`Edm.String`)
- `url`: link back to the source document (`Edm.String`)
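If you choose vector search, also include an `embedding` field of type `Collection(Edm.Single)` and populate it with embeddings generated from your documents using an Azure OpenAI embedding model. Use the same embedding model to embed queries at search time, because vectors produced by different models are not comparable.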
Populate the index with your runbooks, incident post‑mortems and architecture documentation. The AIOps agent will use this as its knowledge base for retrieval augmented generation (RAG).
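To make the schema above concrete, here is a minimal sketch that builds an index definition as a plain Python dictionary. The function name `build_index_definition`, the profile and algorithm names, and the attribute choices (`searchable`, `filterable`) are illustrative assumptions, not taken from this repository. The vector-related property names (`dimensions`, `vectorSearchProfile`, `vectorSearch`) follow the 2023-11-01 search REST API as I understand it, so check them against the API version you actually target.

```python
from typing import Any, Dict, List, Optional


def build_index_definition(
    index_name: str, embedding_dimensions: Optional[int] = None
) -> Dict[str, Any]:
    """Build a search index definition with the four minimum fields.

    When embedding_dimensions is given, an `embedding` vector field and a
    matching vector search configuration are added as well.
    """
    fields: List[Dict[str, Any]] = [
        {"name": "id", "type": "Edm.String", "key": True, "filterable": True},
        {"name": "title", "type": "Edm.String", "searchable": True},
        {"name": "content", "type": "Edm.String", "searchable": True},
        {"name": "url", "type": "Edm.String", "searchable": False},
    ]
    definition: Dict[str, Any] = {"name": index_name, "fields": fields}

    if embedding_dimensions is not None:
        # Assumption: property names as in the 2023-11-01 REST API.
        # "default-profile" and "default-hnsw" are hypothetical names.
        fields.append(
            {
                "name": "embedding",
                "type": "Collection(Edm.Single)",
                "searchable": True,
                "dimensions": embedding_dimensions,
                "vectorSearchProfile": "default-profile",
            }
        )
        definition["vectorSearch"] = {
            "algorithms": [{"name": "default-hnsw", "kind": "hnsw"}],
            "profiles": [
                {"name": "default-profile", "algorithm": "default-hnsw"}
            ],
        }
    return definition
```

The returned dictionary can be serialized with `json.dumps` and submitted to your search service, or used as a checklist when creating the index in the portal. The dimension count must match the output size of the embedding model you choose.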
The remediation function executes low/medium risk actions on your infrastructure. It is packaged separately to allow for distinct security boundaries. Deploy it first so you can reference its URL in the main agent configuration.
- Create a new Azure Function App targeting Python (3.10 or newer) in your resource group.
- Ensure the setting `FUNCTIONS_WORKER_RUNTIME` is set to `python`.
- Deploy the contents of `aiops_code/remediation` via Zip Deploy or the Azure Functions Core Tools. The simplest way from the root of this repository is:

      func azure functionapp publish <your-remediation-app-name> \
        --python \
        --build local \
        --no-bundler \
        --source ./aiops_code/remediation

- Note the URL of the function (e.g. `https://<your-remediation-app>.azurewebsites.net/api/remediation`) and, if enabled, the function key. You will refer to this when configuring the agent.
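
Because the remediation function only executes low and medium risk actions, it helps to picture the kind of gate it might apply before running anything. The sketch below is one possible shape for that check. The action names, the risk labels, the request field `action` and the function `validate_action` are all hypothetical; the actual whitelist lives in `remediation/__init__.py` and may be structured differently.

```python
from typing import Any, Dict, Tuple

# Hypothetical allow-list mapping action name -> risk level.
# The real list is defined in remediation/__init__.py.
ALLOWED_ACTIONS: Dict[str, str] = {
    "restart_app_service": "low",
    "scale_out_app_service": "medium",
}

# Ordering used to compare risk levels.
RISK_ORDER: Dict[str, int] = {"low": 0, "medium": 1, "high": 2}


def validate_action(
    request: Dict[str, Any], max_risk: str = "medium"
) -> Tuple[bool, str]:
    """Return (allowed, reason) for a remediation request.

    A request is rejected when its action is missing, is not on the
    allow-list, or carries a risk level above max_risk.
    """
    action = request.get("action")
    if not isinstance(action, str) or action not in ALLOWED_ACTIONS:
        return False, f"action {action!r} is not on the allow-list"
    risk = ALLOWED_ACTIONS[action]
    if RISK_ORDER[risk] > RISK_ORDER[max_risk]:
        return False, f"action {action!r} has risk {risk!r}, above {max_risk!r}"
    return True, "ok"
```

Rejecting by default and allowing only named actions keeps the security boundary narrow: a malformed or unexpected request from the agent cannot trigger anything that was not listed explicitly.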
- Create another Azure Function App for the agent (or deploy to the same app using a different route). Enable Managed Identity so the function can authenticate to Log Analytics via Azure AD.
- In the Function App Configuration blade, add the following application settings:

  | Setting | Description |
  | --- | --- |
  | `LOG_ANALYTICS_WORKSPACE_ID` | Workspace ID of your Log Analytics instance |
  | `KQL_QUERY` | KQL query used to sample recent logs (the default queries `AppTraces`) |
  | `SEARCH_ENDPOINT` | Endpoint URL of your Cognitive Search service |
  | `SEARCH_INDEX` | Name of the search index you created earlier |
  | `SEARCH_API_KEY` | Admin or query key for your search service |
  | `OPENAI_ENDPOINT` | Endpoint URL of your Azure OpenAI resource |
  | `OPENAI_API_KEY` | API key for your OpenAI resource |
  | `OPENAI_DEPLOYMENT` | Deployment name of your GPT‑4 model |
  | `TEAMS_WEBHOOK_URL` | Incoming webhook URL for posting Adaptive Cards |
  | `REMEDIATION_URL` | URL of the remediation function you deployed earlier |
  | `REMEDIATION_KEY` | Function key for the remediation endpoint (optional) |

- Deploy the contents of `aiops_code/function_app` to your Function App. Using the Functions Core Tools:

      func azure functionapp publish <your-agent-app-name> \
        --python \
        --build local \
        --no-bundler \
        --source ./aiops_code/function_app
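
Application settings reach a Python function as environment variables, so a misconfigured deployment is easiest to catch with a check that fails fast. The following sketch reads the settings listed above and reports any that are missing. The helper `load_settings` is hypothetical, and treating every setting except `KQL_QUERY` and `REMEDIATION_KEY` as required is an assumption; the agent code may be more lenient.

```python
import os
from typing import Dict, Mapping, Optional

# Assumption: these settings must always be present.
REQUIRED_SETTINGS = (
    "LOG_ANALYTICS_WORKSPACE_ID",
    "SEARCH_ENDPOINT",
    "SEARCH_INDEX",
    "SEARCH_API_KEY",
    "OPENAI_ENDPOINT",
    "OPENAI_API_KEY",
    "OPENAI_DEPLOYMENT",
    "TEAMS_WEBHOOK_URL",
    "REMEDIATION_URL",
)

# KQL_QUERY has a built-in default and REMEDIATION_KEY is optional.
OPTIONAL_SETTINGS = ("KQL_QUERY", "REMEDIATION_KEY")


def load_settings(
    env: Optional[Mapping[str, str]] = None,
) -> Dict[str, Optional[str]]:
    """Read agent settings from env (os.environ by default).

    Raises RuntimeError naming every required setting that is missing or
    empty. Optional settings that are absent are returned as None.
    """
    source = os.environ if env is None else env
    missing = [name for name in REQUIRED_SETTINGS if not source.get(name)]
    if missing:
        raise RuntimeError("Missing application settings: " + ", ".join(missing))

    settings: Dict[str, Optional[str]] = {
        name: source[name] for name in REQUIRED_SETTINGS
    }
    for name in OPTIONAL_SETTINGS:
        settings[name] = source.get(name) or None
    return settings
```

Passing a plain dictionary as `env` makes the check easy to exercise in a unit test without touching the real environment.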
Once deployed, you can test the agent by sending an HTTP request to the function endpoint. For example:
POST https://<agent-app>.azurewebsites.net/api/aiops-agent
Content-Type: application/json
{
"question": "Why did the response time spike overnight?",
"incident": {
"title": "Payments API latency spike",
"environment": "prod",
"severity": "Sev2",
"start_time_local": "2025-10-06T10:00:00",
"id": "INC-12345",
"service_name": "payments-api",
"region": "uk-south",
"change_ref": "deploy 1042",
"dashboard_url": "https://portal.azure.com/",
"incident_url": "https://dev.azure.com/"
}
}
The response will include the root cause summary, a list of proposed actions (and their execution results if applicable), the documents used for grounding, and the HTTP status of the Teams message post.
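The same request can be sent from Python using only the standard library. This is a minimal sketch: the helper name `build_agent_request` is hypothetical, and the commented usage assumes the agent replies with a JSON body, which the description above suggests but does not state explicitly. If your function requires a key, you would also need to supply it.

```python
import json
import urllib.request
from typing import Any, Dict


def build_agent_request(
    endpoint: str, question: str, incident: Dict[str, Any]
) -> urllib.request.Request:
    """Build a JSON POST request carrying the question and incident."""
    body = json.dumps({"question": question, "incident": incident}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Example usage (replace the placeholder host before running):
#
#   req = build_agent_request(
#       "https://<agent-app>.azurewebsites.net/api/aiops-agent",
#       "Why did the response time spike overnight?",
#       {"id": "INC-12345", "title": "Payments API latency spike"},
#   )
#   with urllib.request.urlopen(req, timeout=120) as resp:
#       print(resp.status)
#       print(json.loads(resp.read()))  # assumes a JSON response body
```

A generous timeout is sensible here because a single call covers log sampling, retrieval, the model call and the Teams post.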
- Modify the KQL query to suit your telemetry. For example, you might query across multiple tables or focus on specific services.
- Extend the search index with additional fields such as severity tags, runbook categories or component names. Update the retrieval logic in `function_app/__init__.py` accordingly.
- Add more remediation actions by editing the whitelist in `remediation/__init__.py` and implementing the corresponding automation logic.
- Adjust the model prompt in `_build_prompt()` to control how verbose the root cause summary is and to tailor the JSON schema.
- Style the Adaptive Card by editing `teams_rca_card_template.json` or the `_build_adaptive_card()` function to match your brand.
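
As a starting point for card styling, the sketch below assembles a small Adaptive Card and wraps it in the message envelope that Teams incoming webhooks accept for Adaptive Cards. It is not the repository's `_build_adaptive_card()` function or its template. The helper name `build_rca_message`, its parameters and the card layout are assumptions made for illustration.

```python
from typing import Any, Dict, List


def build_rca_message(
    title: str, summary: str, actions: List[str]
) -> Dict[str, Any]:
    """Build a Teams webhook payload containing one Adaptive Card.

    The card shows a bold title, the root cause summary, and one text
    line per recommended action.
    """
    body: List[Dict[str, Any]] = [
        {
            "type": "TextBlock",
            "text": title,
            "weight": "Bolder",
            "size": "Medium",
            "wrap": True,
        },
        {"type": "TextBlock", "text": summary, "wrap": True},
    ]
    if actions:
        body.append(
            {"type": "TextBlock", "text": "Recommended actions", "weight": "Bolder"}
        )
        body.extend(
            {"type": "TextBlock", "text": f"- {action}", "wrap": True}
            for action in actions
        )

    card = {
        "$schema": "http://adaptivecards.io/schemas/adaptive-card.json",
        "type": "AdaptiveCard",
        "version": "1.4",
        "body": body,
    }
    # Envelope expected by Teams incoming webhooks for Adaptive Cards.
    return {
        "type": "message",
        "attachments": [
            {
                "contentType": "application/vnd.microsoft.card.adaptive",
                "content": card,
            }
        ],
    }
```

Brand changes such as colors, images or extra fact sets would go into the `body` list, while the outer envelope normally stays the same.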
Pull requests are welcome! If you discover issues or have ideas to improve the agent (for example, integrating with Prometheus or adding role‑specific summarisation), feel free to open an issue or submit a PR.
This project is licensed under the GPL‑3.0. See the LICENSE file for details.