SRE Bot is a Slack bot designed for site reliability engineering at CDS. It automates incident management, integrates with cloud and collaboration platforms, and streamlines SRE workflows for modern teams.
- Incident Management
  - Create, update, and manage incidents (status, roles, conversations, documents, folders, notifications)
  - Display and update incident information
  - Notify about stale incident channels
  - Schedule incident retrospectives
  - On-call management for incidents
- AWS Integration & SCIM-like User/Group Management
  - Manage AWS access requests and approvals
  - Monitor AWS account health
  - Assign and manage AWS user and group memberships (SCIM bridge functionality)
  - Manage AWS Identity Center and SSO access
  - Track AWS spending and cost reports
- Slack Integration & Webhook Management
  - Create, list, and manage Slack webhooks
  - Send notifications to Slack channels
  - Integrate with Slack for incident and alert workflows
- Google Workspace Integration
  - Manage Google Workspace users and groups (provisioning, reporting)
  - Integrate with Google for incident and workflow automation
- Role & Talent Management
  - Manage organizational roles, including special workflows for "Talent" roles
  - Assign and update user roles within the organization
- Secret Management
  - Store, retrieve, and manage secrets securely
- SRE & Geolocation Features
  - Geolocate users or incidents (using MaxMind or similar)
  - SRE-specific workflows and reporting
- Notification & Alerting Integrations
  - Integrate with external notification systems (OpsGenie, Sentinel, Trello, etc.)
  - Send and manage alerts from various sources
- Reporting & Analytics
  - Generate and display reports (including Google Groups, AWS spending, etc.)
- Webhook Integrations (General)
  - Manage and process incoming webhooks from various sources (AWS SNS, custom, etc.)
  - Route and handle webhook-based notifications

Intended users:
- SRE teams, DevOps engineers, and incident responders at CDS or similar organizations.
- Teams looking to automate cloud, incident, and collaboration workflows in Slack.
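
The webhook handling listed among the features (incoming webhooks from AWS SNS, custom sources, and so on) could be routed along the lines of the sketch below. It is not taken from the SRE Bot code base: the function name `classify_webhook` and the custom `text` field are assumptions made for illustration. The only external fact relied on is that SNS HTTP(S) deliveries are JSON objects carrying `Type`, `TopicArn`, and either `Message` or `SubscribeURL`.

```python
import json


def classify_webhook(body):
    """Return (source, text) for a raw webhook body (hypothetical helper)."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        # Not JSON at all: treat the body as a plain-text notification.
        return "plain_text", body
    if not isinstance(payload, dict):
        return "unknown", body
    # SNS deliveries identify themselves with "Type" and "TopicArn".
    if "Type" in payload and "TopicArn" in payload:
        if payload["Type"] == "SubscriptionConfirmation":
            return "aws_sns_subscription", payload.get("SubscribeURL", "")
        return "aws_sns", payload.get("Message", "")
    # Assumed custom format: a JSON object with a "text" field.
    if "text" in payload:
        return "custom", str(payload["text"])
    return "unknown", body
```

A caller would then pick a Slack message format based on the returned source label and post the text to the channel that owns the webhook.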
- /sre help: Show all available SRE bot commands
- /sre incident: List incident management commands
- /sre geolocate <ip>: Geolocate an IP address
- /sre webhooks: List webhooks for the current channel
- /sre webhooks create: Create a new webhook
- /sre reports: List available reports
- /sre reports google-groups: Generate a Google Groups statistics report
- /aws users create <user>: Provision AWS user(s) via SCIM bridge (using Google Groups)
- /aws groups sync: Sync AWS groups with Google security groups
- /aws health: Query the health of an AWS account
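
As a rough illustration of how the text following /sre might be split into a subcommand and arguments, here is a minimal sketch. The names `parse_command` and `handle_sre` and the reply strings are hypothetical and do not come from the project. The geolocation lookup itself is left as a placeholder, and only the IP validation, done with Python's standard `ipaddress` module, is real.

```python
import ipaddress


def parse_command(text):
    """Split slash-command text into (subcommand, args)."""
    tokens = text.split()
    if not tokens:
        # An empty command falls back to help.
        return "help", []
    return tokens[0].lower(), tokens[1:]


def handle_sre(text):
    """Return a reply string for the text that follows /sre (hypothetical)."""
    subcommand, args = parse_command(text)
    if subcommand == "geolocate":
        if len(args) != 1:
            return "Usage: /sre geolocate <ip>"
        try:
            ip = ipaddress.ip_address(args[0])
        except ValueError:
            return f"Invalid IP address: {args[0]}"
        # Placeholder: a real handler would query a geolocation database here.
        return f"Looking up {ip}"
    if subcommand == "help":
        return "Available: help, incident, geolocate, webhooks, reports"
    return f"Unknown command: {subcommand}. Try /sre help"
```

Validating the address before any lookup keeps malformed input from reaching the geolocation backend and gives the user an immediate, specific error message.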
This project uses Visual Studio Code Remote - Containers.
- Docker installed and running
- VS Code
- Clone the repo
- Open VS Code with Dev Container (Quick start guide)
- Install Python dependencies:
    cd app && pip install --no-cache-dir -r requirements.txt
- Add a .env file to the /workspace/app folder (contact the SRE team for the project-specific .env setup)
- Launch the dev bot:
    make dev
- Test your development in the dedicated Slack channel (the SRE team will confirm which channel to use)
- app/integrations/: Integrations with external services (Google Workspace, Slack, AWS, etc.)
- app/modules/: Bot features and user-facing commands
- app/jobs/: Scheduled jobs (e.g., reminders, status checks)
SRE Bot handles sensitive data such as secrets and user/group assignments. Please review our security guidelines and ensure you follow best practices for environment configuration and access control.
- For questions or support, contact the SRE team.
- For feature requests or bug reports, open an issue in this repository.
