AI-powered incident response agent that receives Alertmanager webhooks and performs automated Kubernetes debugging. Learns from feedback and maintains a knowledge base for faster resolution.
- Automated Debugging: Gathers logs, events, pod status, services, and network policies from Kubernetes
- Multi-LLM Support: Ollama (self-hosted), OpenAI, Anthropic Claude, Google Gemini, or AWS Bedrock
- Real-Time Streaming: Analysis streams progressively to Slack as it develops
- Learning System: Rate analyses with ✅/❌ in Slack; system learns from feedback
- Knowledge Base (Optional): PostgreSQL + pgvector for semantic search of past incidents
- Slack Integration: Threaded conversations with historical context links
Ollama (Self-hosted)

```bash
export LLM_PROVIDER=ollama
export OLLAMA_URL=http://ollama.ollama.svc.cluster.local:11434
export OLLAMA_MODEL=llama3
```

OpenAI / Claude / Gemini

```bash
export LLM_PROVIDER=openai  # or anthropic, gemini, bedrock
export OPENAI_API_KEY=sk-...
export OPENAI_MODEL=gpt-4-turbo-preview
```

See LLM_PROVIDERS.md for all options.
```bash
docker build -t k8flex-agent:latest .
kubectl apply -f k8s/deployment.yaml
```

```yaml
receivers:
  - name: 'k8flex-ai-debug'
    webhook_configs:
      - url: 'http://k8flex-agent.k8flex.svc.cluster.local:8080/webhook'
```

Full setup: INTEGRATION.md
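For alerts to reach this receiver, the Alertmanager config also needs a matching route. A minimal sketch — the matcher and `continue` behavior here are illustrative choices, not required by K8flex:

```yaml
route:
  receiver: 'default'
  routes:
    - receiver: 'k8flex-ai-debug'
      matchers:
        - severity =~ "warning|critical"
      continue: true  # let other receivers (e.g. paging) still fire
```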
```bash
export SLACK_BOT_TOKEN=xoxb-...
export SLACK_CHANNEL_ID=C01234567
```

Required scopes: `chat:write`, `chat:write.public`, `reactions:read`

Details: SLACK_SETUP.md
```bash
export KB_ENABLED=true
export KB_DATABASE_URL="postgresql://user:pass@host:5432/k8flex"
export KB_EMBEDDING_PROVIDER=openai
```

Setup: KNOWLEDGE_BASE.md
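On the database side, pgvector must be enabled in the target database before the knowledge base can store embeddings. A one-time setup step, assuming a connection with sufficient privileges:

```sql
-- Enable the pgvector extension (one-time, per database)
CREATE EXTENSION IF NOT EXISTS vector;
```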
1. Alertmanager sends a webhook → K8flex receives the alert
2. AI categorizes the alert type (pod/service/node/network/resource)
3. The system searches the knowledge base for similar past cases (if enabled)
4. K8flex gathers targeted Kubernetes debug information
5. AI analyzes and streams results to Slack in real time
6. Users rate the analysis with ✅/❌ reactions
7. Validated solutions are stored for future incidents
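The categorization step above can be sketched in Go. The function and category names here are hypothetical illustrations, not the agent's actual API — the real agent uses an LLM for this step, but the effect is to bucket an alert by which Kubernetes labels it carries:

```go
package main

import "fmt"

// categorize buckets an alert into one of the pipeline's category names
// based on which Kubernetes labels are present. Hypothetical sketch only.
func categorize(labels map[string]string) string {
	switch {
	case labels["pod"] != "":
		return "pod"
	case labels["service"] != "":
		return "service"
	case labels["node"] != "":
		return "node"
	default:
		return "resource"
	}
}

func main() {
	alert := map[string]string{"namespace": "prod", "pod": "api-7f9c4d"}
	fmt.Println(categorize(alert)) // prints "pod"
}
```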
| Variable | Default | Description |
|---|---|---|
| `LLM_PROVIDER` | `ollama` | `ollama`, `openai`, `anthropic`, `gemini`, `bedrock` |
| `OLLAMA_URL` | `http://ollama.ollama.svc.cluster.local:11434` | Ollama endpoint |
| `OLLAMA_MODEL` | `llama3` | Model name |
| `OPENAI_API_KEY` | - | OpenAI API key |
| `SLACK_BOT_TOKEN` | - | Slack bot token (for advanced features) |
| `SLACK_CHANNEL_ID` | - | Slack channel ID |
| `KB_ENABLED` | `false` | Enable knowledge base |
| `KB_DATABASE_URL` | - | PostgreSQL connection string |
| `WEBHOOK_AUTH_TOKEN` | - | Webhook authentication token |
Full reference: See Complete Configuration Reference section below or Configuration Documentation.
For feedback system and threading:
- `chat:write` - Post messages
- `chat:write.public` - Post to public channels
- `reactions:read` - Detect emoji reactions
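If the bot is created from a Slack app manifest, the scope section would carry the same three scopes. A minimal fragment (manifest schema details may vary across Slack API versions):

```yaml
oauth_config:
  scopes:
    bot:
      - chat:write
      - chat:write.public
      - reactions:read
```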
Alerts must include these labels:
- `namespace` (required): Kubernetes namespace
- `pod` (optional): Pod name
- `service` (optional): Service name
- `alertname`: Alert identifier
- `severity`: Alert severity
Example:
```yaml
- alert: PodNotReady
  expr: kube_pod_status_phase{phase!="Running"} == 1
  labels:
    namespace: "{{ $labels.namespace }}"
    pod: "{{ $labels.pod }}"
    severity: warning
```

```bash
# Local testing
go run main.go

# Test webhook
curl -XPOST 'http://localhost:8080/webhook' \
  -H 'Content-Type: application/json' \
  -d @test-alert.json

# Build
go build -o k8flex-agent .
```

- INTEGRATION.md - Alertmanager/Prometheus setup
- ARCHITECTURE.md - Complete architecture and workflow
- QUICKSTART.md - Quick reference and examples
- USE_CASES.md - Use cases, benefits, and best practices
- LLM_PROVIDERS.md - All LLM provider configs
- SLACK_SETUP.md - Slack bot configuration
- FEEDBACK.md - Feedback system details
- KNOWLEDGE_BASE.md - Vector database setup
- WEBHOOK_SECURITY.md - Webhook authentication
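The `test-alert.json` referenced in the webhook test above can be any Alertmanager-style webhook payload. A minimal illustrative example — label values are placeholders, and the agent only requires the labels listed earlier:

```json
{
  "version": "4",
  "status": "firing",
  "receiver": "k8flex-ai-debug",
  "alerts": [
    {
      "status": "firing",
      "labels": {
        "alertname": "PodNotReady",
        "namespace": "prod",
        "pod": "api-7f9c4d",
        "severity": "warning"
      },
      "annotations": {
        "summary": "Pod has been in a non-Running phase"
      },
      "startsAt": "2024-01-01T00:00:00Z"
    }
  ]
}
```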
Click to expand full environment variable list
| Variable | Default | Description |
|---|---|---|
| `PORT` | `8080` | HTTP server port |
| `LLM_PROVIDER` | `ollama` | LLM provider |
| `OLLAMA_URL` | `http://ollama.ollama.svc.cluster.local:11434` | Ollama endpoint |
| `OLLAMA_MODEL` | `llama3` | Ollama model |
| `OPENAI_API_KEY` | - | OpenAI API key |
| `OPENAI_MODEL` | `gpt-4-turbo-preview` | OpenAI model |
| `ANTHROPIC_API_KEY` | - | Anthropic API key |
| `ANTHROPIC_MODEL` | `claude-3-5-sonnet-20241022` | Anthropic model |
| `GEMINI_API_KEY` | - | Gemini API key |
| `GEMINI_MODEL` | `gemini-1.5-pro` | Gemini model |
| `BEDROCK_REGION` | `us-east-1` | AWS region |
| `BEDROCK_MODEL` | `anthropic.claude-3-5-sonnet-20241022-v2:0` | Bedrock model ID |
| `SLACK_WEBHOOK_URL` | - | Slack webhook (basic) |
| `SLACK_BOT_TOKEN` | - | Slack bot token (advanced) |
| `SLACK_CHANNEL_ID` | - | Slack channel ID |
| `SLACK_WORKSPACE_ID` | - | Workspace ID for thread links |
| `WEBHOOK_AUTH_TOKEN` | - | Webhook auth token |
| `KB_ENABLED` | `false` | Enable knowledge base |
| `KB_DATABASE_URL` | - | PostgreSQL URL |
| `KB_EMBEDDING_PROVIDER` | `openai` | `openai` or `gemini` |
| `KB_EMBEDDING_API_KEY` | - | Embedding API key |
| `KB_EMBEDDING_MODEL` | `text-embedding-3-small` | Embedding model |
| `KB_SIMILARITY_THRESHOLD` | `0.75` | Similarity threshold (0-1) |
| `KB_MAX_RESULTS` | `5` | Max similar cases |
MIT
Contributions welcome! When adding support for new alert types, ensure their parameters are extracted from alert labels.