Agentic CloudOps with AWS CloudOps Assistant | AI-Powered AWS Operations with Strands Agent Framework
Conversational AI Agent for Day-to-Day AWS Operations | Manage EC2, IAM, S3, and more through natural language with persistent memory, fully automated deployment on AWS Bedrock AgentCore.
The AWS CloudOps Assistant is an intelligent AI agent designed to simplify day-to-day AWS operations through natural language conversations. Instead of navigating the AWS Console, simply tell the agent what you need, such as managing EC2 instances, performing IAM operations, querying resources, generating code and AWS CLI commands, analyzing costs, or managing security and networking, and it handles the rest.
Built on the Strands Agent Framework and powered by AWS Bedrock LLMs, this project showcases:
- Agentic AI Capabilities: Autonomous task execution with tool use, reasoning, and memory
- DevOps Best Practices: Infrastructure as Code, CI/CD automation, containerized deployments
- Enterprise-Ready Architecture: Managed serverless hosting on AWS Bedrock AgentCore
overview.mp4
| Capability | Description |
|---|---|
| Tool Use | Executes real AWS operations via the use_aws tool: create, modify, query, and delete resources |
| Contextual Memory | Remembers resources created, user preferences, and conversation history across sessions |
| Reasoning | Understands intent, asks clarifying questions when needed, and explains actions before executing |
| Guardrails | Focused exclusively on AWS operations; declines off-topic requests gracefully |
| Streaming Responses | Real-time response streaming for immediate feedback during long operations |
| Component | Technology | Purpose |
|---|---|---|
| AI Agent Framework | Strands Agent | Orchestrates tool use, reasoning, and conversation flow |
| LLM Platform | AWS Bedrock | Foundation model for natural language understanding and generation |
| Agent Hosting | AWS Bedrock AgentCore Runtime | Managed serverless infrastructure for running agents at scale |
| Long-Term Memory | AWS Bedrock AgentCore Memory | Persistent storage for user context, resource tracking, and session history |
| Web Interface | Streamlit | Interactive chat UI with session management and real-time streaming |
| API Layer | FastAPI | High-performance REST API with async support |
| Containerization | Docker & Docker Buildx | Builds the agent and web app container images |
| Infrastructure as Code | Terraform | Modular, reusable infrastructure definitions |
| CI/CD Pipeline | Jenkins | Automated build, scan, test, and deployment workflows |
| Backend Language | Python 3.11+ | Agent logic, API endpoints, and integrations |
- Natural Language Interface: Describe what you want in plain English
- Autonomous Execution: Agent plans and executes multi-step operations
- Memory Persistence: Resources and context remembered across sessions using AgentCore Memory
- Safe Operations: Confirmation prompts for destructive actions, clear explanations before execution
- Markdown Responses: Structured output with tables for resource listings
- Real-Time Streaming: See agent responses as they're generated
- Session Management: Unique session IDs with persistent actor identity
- Agent Status Display: Live connection status to AgentCore runtime
- Memory Indicators: Visual confirmation when memory is active
- Chat History: Full conversation history within sessions
- Modular Terraform: Reusable modules for ECR, AgentCore, VPC, ALB, ECS
- Multi-Environment Support: Deploy to dev, staging, or prod with parameter switches
- Security Scanning: Trivy for IaC and container images, Snyk for code analysis
- Approval Gates: Manual approval steps for infrastructure changes
- Slack Notifications: Pipeline status updates at every stage
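The "Safe Operations" behavior listed above (confirmation prompts for destructive actions) could be approximated by a check like the one below. This is a minimal sketch, not the project's implementation: the prefix list and the helper names `needs_confirmation` and `run_operation` are hypothetical.

```python
# Hypothetical list of verbs treated as destructive; tune to your own policy.
DESTRUCTIVE_PREFIXES = ("delete", "terminate", "remove", "detach", "revoke", "stop")


def needs_confirmation(operation: str) -> bool:
    """True when a snake_case operation name starts with a destructive verb."""
    return operation.lower().startswith(DESTRUCTIVE_PREFIXES)


def run_operation(operation: str, confirmed: bool = False) -> str:
    """Refuse destructive operations until the user has confirmed them."""
    if needs_confirmation(operation) and not confirmed:
        return f"Confirmation required before running {operation}"
    return f"Running {operation}"


print(run_operation("describe_instances"))
print(run_operation("terminate_instances"))
```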
The AWS CloudOps Assistant excels at everyday AWS operations. Below are demonstrated use cases with video walkthroughs.
"Create an EC2 instance in us-east-1 using the latest Amazon Linux 2023 AMI with t2.micro instance type. Use default VPC and subnet, no key pair needed, and use default security group settings."
The agent creates the instance and automatically stores the instance ID, ARN, and configuration in memory for future reference.
create_ec2.mp4
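As a rough illustration of what this request translates to, the sketch below builds the arguments for EC2's RunInstances call. The helper name `build_run_instances_params` and the placeholder AMI ID are hypothetical, and this README does not show how the `use_aws` tool constructs its calls internally.

```python
def build_run_instances_params(ami_id: str, instance_type: str = "t2.micro") -> dict:
    """Return keyword arguments for boto3's ec2.run_instances (hypothetical helper)."""
    return {
        "ImageId": ami_id,  # resolved beforehand, e.g. the latest Amazon Linux 2023 AMI
        "InstanceType": instance_type,
        "MinCount": 1,
        "MaxCount": 1,
        # KeyName, SubnetId, and SecurityGroupIds are omitted on purpose, so EC2
        # uses a default subnet in the default VPC and the default security group.
    }


params = build_run_instances_params("ami-0123456789abcdef0")  # placeholder AMI ID
# With boto3 installed and credentials configured, the call would look like:
#   boto3.client("ec2", region_name="us-east-1").run_instances(**params)
print(params)
```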
"Stop the EC2 instance that you created previously."
The agent retrieves the instance ID from memory and stops it, so there is no need to specify IDs manually.
stop_ec2.mp4
"Attach the policy AmazonEC2FullAccess to the IAM user named JohnDoe."
The agent attaches the managed policy and stores the action in memory.
iam_attach.mp4
"Detach the policy from the IAM user JohnDoe that you attached previously."
Even in a new session, the agent recalls the previous action from memory and detaches the correct policy.
iam_detach.mp4
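One way to picture the attach-then-detach flow: the attach step records the user name and policy ARN, and the later detach step reuses that record. The record format and the helper names below are assumptions for illustration; AgentCore Memory's actual storage format is not described in this README.

```python
AWS_MANAGED_POLICY_PREFIX = "arn:aws:iam::aws:policy/"


def attach_record(user_name: str, policy_name: str) -> dict:
    """Build a hypothetical memory record for an attach_user_policy action."""
    return {
        "action": "attach_user_policy",
        "UserName": user_name,
        "PolicyArn": AWS_MANAGED_POLICY_PREFIX + policy_name,
    }


def detach_args(record: dict) -> dict:
    """Derive the arguments for iam.detach_user_policy from a stored record."""
    return {"UserName": record["UserName"], "PolicyArn": record["PolicyArn"]}


record = attach_record("JohnDoe", "AmazonEC2FullAccess")
args = detach_args(record)
# With boto3: boto3.client("iam").detach_user_policy(**args)
print(args)
```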
"Generate a Python script using Boto3 that iterates through all S3 buckets and stores their information in a CSV file."
The agent generates ready-to-use Python code with proper error handling, CSV formatting, and AWS best practices.
boto3.mp4
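The generated script is not reproduced in this README. A minimal sketch of the same idea is shown below; it takes a response shaped like the output of boto3's `list_buckets` and writes one CSV row per bucket. The function name `write_bucket_csv` and the sample data are hypothetical.

```python
import csv
import io
from datetime import datetime, timezone


def write_bucket_csv(list_buckets_response: dict, out) -> int:
    """Write one CSV row per bucket and return the number of buckets written."""
    writer = csv.writer(out)
    writer.writerow(["Name", "CreationDate"])
    buckets = list_buckets_response.get("Buckets", [])
    for bucket in buckets:
        writer.writerow([bucket["Name"], bucket["CreationDate"].isoformat()])
    return len(buckets)


# Stand-in for boto3.client("s3").list_buckets(); real responses use the same keys.
sample = {
    "Buckets": [
        {"Name": "example-logs", "CreationDate": datetime(2024, 1, 15, tzinfo=timezone.utc)},
    ]
}
buffer = io.StringIO()
count = write_bucket_csv(sample, buffer)
print(buffer.getvalue())
```

To write to disk instead, pass a file opened with `open("buckets.csv", "w", newline="")` as `out`.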
"Give me the AWS CLI command to copy a local folder to an S3 bucket."
The agent provides the exact CLI command with explanations of flags and options for your specific use case.
cli.mp4
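For reference, a recursive folder copy is usually done with `aws s3 cp` and the `--recursive` flag. The sketch below assembles that command in Python; the folder, bucket, and prefix are placeholders, and `s3_copy_command` is a hypothetical helper.

```python
import shlex


def s3_copy_command(local_dir: str, bucket: str, prefix: str = "") -> list:
    """Build the argument list for a recursive `aws s3 cp` of a local folder."""
    destination = f"s3://{bucket}/{prefix}".rstrip("/") + "/"
    return ["aws", "s3", "cp", local_dir, destination, "--recursive"]


command = s3_copy_command("./reports", "example-bucket", "2024/reports")
print(shlex.join(command))
# The list form can be passed to subprocess.run(command, check=True).
```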
The AWS CloudOps Assistant can handle a wide range of additional AWS operations:
"Analyze my AWS costs for the last 30 days and identify the top 5 services by spending. Show me cost optimization recommendations."
The agent retrieves cost data from AWS Cost Explorer, generates detailed reports, and provides actionable recommendations for reducing expenses.
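A sketch of how such a report could be assembled: `cost_query` builds the arguments for Cost Explorer's `get_cost_and_usage`, and `top_services` ranks services from a response of that shape. Both helper names and the sample response are hypothetical; the granularity and metric are assumed choices.

```python
from collections import defaultdict
from datetime import date, timedelta


def cost_query(today: date, days: int = 30) -> dict:
    """Arguments for Cost Explorer's get_cost_and_usage, grouped by service."""
    return {
        # The End date is exclusive in Cost Explorer.
        "TimePeriod": {
            "Start": (today - timedelta(days=days)).isoformat(),
            "End": today.isoformat(),
        },
        "Granularity": "DAILY",
        "Metrics": ["UnblendedCost"],
        "GroupBy": [{"Type": "DIMENSION", "Key": "SERVICE"}],
    }


def top_services(response: dict, n: int = 5) -> list:
    """Sum cost per service across all periods and return the n largest."""
    totals = defaultdict(float)
    for period in response["ResultsByTime"]:
        for group in period["Groups"]:
            amount = float(group["Metrics"]["UnblendedCost"]["Amount"])
            totals[group["Keys"][0]] += amount
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:n]


def _group(service: str, amount: str) -> dict:
    return {"Keys": [service], "Metrics": {"UnblendedCost": {"Amount": amount, "Unit": "USD"}}}


# Made-up sample data in the shape returned by get_cost_and_usage.
sample = {
    "ResultsByTime": [
        {"Groups": [_group("Amazon Elastic Compute Cloud - Compute", "10.0"),
                    _group("Amazon Simple Storage Service", "2.0")]},
        {"Groups": [_group("Amazon Elastic Compute Cloud - Compute", "5.5")]},
    ]
}
ranking = top_services(sample)
print(ranking)
```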
"Create a CloudWatch alarm that triggers when CPU utilization exceeds 80% for any EC2 instance in us-east-1. Send notifications to my SNS topic."
The agent creates the alarm with proper thresholds, associates it with the SNS topic, and configures evaluation periods.
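The alarm described here maps onto CloudWatch's `put_metric_alarm`. The sketch below builds its arguments for a single instance; covering "any EC2 instance" would mean one alarm per instance. The period, evaluation count, alarm name, and ARN are assumptions, and `cpu_alarm_params` is a hypothetical helper.

```python
def cpu_alarm_params(instance_id: str, sns_topic_arn: str, threshold: float = 80.0) -> dict:
    """Arguments for cloudwatch.put_metric_alarm on one instance's CPU."""
    return {
        "AlarmName": f"high-cpu-{instance_id}",
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": 300,            # seconds per datapoint (assumed)
        "EvaluationPeriods": 2,   # consecutive breaching datapoints (assumed)
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [sns_topic_arn],
    }


alarm = cpu_alarm_params(
    "i-0123456789abcdef0",                              # placeholder instance ID
    "arn:aws:sns:us-east-1:123456789012:ops-alerts",    # placeholder topic ARN
)
# With boto3: boto3.client("cloudwatch", region_name="us-east-1").put_metric_alarm(**alarm)
print(alarm["AlarmName"])
```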
"Analyze CloudTrail logs from the past 7 days and show me all failed IAM authentication attempts. Identify any suspicious access patterns."
The agent queries CloudTrail logs, filters for security events, and provides insights on access patterns and potential security issues.
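One way the filtering step could work: CloudTrail's `lookup_events` returns each event with a `CloudTrailEvent` field holding the raw record as a JSON string, and failed console sign-ins carry `"ConsoleLogin": "Failure"` in `responseElements`. The helper name and sample events below are hypothetical.

```python
import json


def failed_console_logins(events: list) -> list:
    """Return time and source IP for ConsoleLogin events that failed."""
    failures = []
    for event in events:
        detail = json.loads(event["CloudTrailEvent"])
        outcome = (detail.get("responseElements") or {}).get("ConsoleLogin")
        if outcome == "Failure":
            failures.append(
                {"time": detail.get("eventTime"), "source_ip": detail.get("sourceIPAddress")}
            )
    return failures


# Made-up events in the shape of lookup_events(...)["Events"].
sample_events = [
    {"EventName": "ConsoleLogin", "CloudTrailEvent": json.dumps({
        "eventTime": "2024-01-15T10:00:00Z", "sourceIPAddress": "198.51.100.7",
        "responseElements": {"ConsoleLogin": "Failure"}})},
    {"EventName": "ConsoleLogin", "CloudTrailEvent": json.dumps({
        "eventTime": "2024-01-15T10:05:00Z", "sourceIPAddress": "198.51.100.7",
        "responseElements": {"ConsoleLogin": "Success"}})},
]
failures = failed_console_logins(sample_events)
print(failures)
```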
"Tag all EC2 instances in us-east-1 with Environment=Production and Project=WebApp. Also add a CostCenter tag with value IT-001."
The agent identifies all instances, applies the specified tags consistently, and verifies the tagging operation.
"Add an inbound rule to security group sg-12345678 allowing SSH access (port 22) from IP 203.0.113.0/24. Also remove any existing rules that allow access from 0.0.0.0/0 on port 22."
The agent modifies security group rules, adds new rules with proper CIDR blocks, and removes overly permissive rules for enhanced security.
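The two halves of this request correspond to EC2's `authorize_security_group_ingress` and `revoke_security_group_ingress`, both of which take an `IpPermissions` list. The sketch below shows the rule to add and a hypothetical helper, `open_ssh_rules`, that finds rules to revoke in a security group description. It only matches rules scoped exactly to port 22, not wider port ranges that include it.

```python
SSH_FROM_OFFICE = {
    "IpProtocol": "tcp",
    "FromPort": 22,
    "ToPort": 22,
    "IpRanges": [{"CidrIp": "203.0.113.0/24"}],
}


def open_ssh_rules(security_group: dict) -> list:
    """Return revoke-ready permissions for port 22 rules open to 0.0.0.0/0."""
    found = []
    for perm in security_group.get("IpPermissions", []):
        is_ssh = (
            perm.get("IpProtocol") == "tcp"
            and perm.get("FromPort") == 22
            and perm.get("ToPort") == 22
        )
        if is_ssh and any(r.get("CidrIp") == "0.0.0.0/0" for r in perm.get("IpRanges", [])):
            found.append(
                {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
                 "IpRanges": [{"CidrIp": "0.0.0.0/0"}]}
            )
    return found


# Made-up group in the shape of describe_security_groups(...)["SecurityGroups"][0].
group = {
    "GroupId": "sg-12345678",
    "IpPermissions": [
        {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
         "IpRanges": [{"CidrIp": "0.0.0.0/0"}, {"CidrIp": "10.0.0.0/8"}]},
        {"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443,
         "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
    ],
}
to_revoke = open_ssh_rules(group)
# With boto3:
#   ec2.authorize_security_group_ingress(GroupId="sg-12345678", IpPermissions=[SSH_FROM_OFFICE])
#   ec2.revoke_security_group_ingress(GroupId="sg-12345678", IpPermissions=to_revoke)
print(to_revoke)
```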
"Create an S3 bucket lifecycle policy that moves objects older than 30 days to Glacier storage and deletes objects older than 365 days."
The agent creates the lifecycle configuration, applies it to the bucket, and ensures proper transition and expiration rules.
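The lifecycle rule in this request could be expressed as the `LifecycleConfiguration` argument of S3's `put_bucket_lifecycle_configuration`. This is a sketch under assumptions: the rule ID is made up, the empty prefix applies the rule to every object, and `lifecycle_configuration` is a hypothetical helper.

```python
def lifecycle_configuration(transition_days: int = 30, expiration_days: int = 365) -> dict:
    """Build a rule that moves objects to Glacier, then expires them."""
    if expiration_days <= transition_days:
        raise ValueError("expiration must come after the Glacier transition")
    return {
        "Rules": [
            {
                "ID": "archive-then-expire",   # hypothetical rule name
                "Status": "Enabled",
                "Filter": {"Prefix": ""},      # empty prefix: all objects
                "Transitions": [{"Days": transition_days, "StorageClass": "GLACIER"}],
                "Expiration": {"Days": expiration_days},
            }
        ]
    }


config = lifecycle_configuration()
# With boto3:
#   s3.put_bucket_lifecycle_configuration(Bucket="example-bucket", LifecycleConfiguration=config)
print(config["Rules"][0]["Transitions"])
```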
"Update the environment variables for my Lambda function 'processData' to set LOG_LEVEL=DEBUG and TIMEOUT=300."
The agent updates the Lambda function configuration, modifies environment variables, and verifies the changes.
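One detail worth illustrating: Lambda's `update_function_configuration` replaces the whole environment variable map, so existing variables have to be merged in before the update. The sketch below shows that merge with a hypothetical helper, `merged_environment`, and made-up current values. Note that a `TIMEOUT` environment variable is separate from the function's own `Timeout` setting.

```python
def merged_environment(current_config: dict, updates: dict) -> dict:
    """Merge new variables into the existing ones without mutating the input."""
    variables = dict(current_config.get("Environment", {}).get("Variables", {}))
    variables.update(updates)
    return {"Variables": variables}


# Made-up stand-in for lambda_client.get_function_configuration(FunctionName="processData").
current = {
    "FunctionName": "processData",
    "Environment": {"Variables": {"STAGE": "dev", "LOG_LEVEL": "INFO"}},
}
environment = merged_environment(current, {"LOG_LEVEL": "DEBUG", "TIMEOUT": "300"})
# With boto3:
#   lambda_client.update_function_configuration(FunctionName="processData", Environment=environment)
print(environment)
```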
"Create a snapshot of my RDS database instance 'prod-db' and name it 'prod-db-backup-2024-01-15'. Also show me the last 5 snapshots."
The agent creates the database snapshot, monitors the snapshot creation process, and lists recent snapshots for backup management.
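Listing "the last 5 snapshots" amounts to sorting the `DBSnapshots` returned by RDS's `describe_db_snapshots` by creation time. The sketch below shows that step on made-up data; `latest_snapshots` is a hypothetical helper. Snapshots still being created may lack `SnapshotCreateTime`, so they are skipped here.

```python
from datetime import datetime, timezone


def latest_snapshots(response: dict, n: int = 5) -> list:
    """Return identifiers of the n most recently created snapshots."""
    finished = [s for s in response["DBSnapshots"] if "SnapshotCreateTime" in s]
    finished.sort(key=lambda s: s["SnapshotCreateTime"], reverse=True)
    return [s["DBSnapshotIdentifier"] for s in finished[:n]]


def _snap(name: str, day: int) -> dict:
    return {
        "DBSnapshotIdentifier": name,
        "SnapshotCreateTime": datetime(2024, 1, day, tzinfo=timezone.utc),
    }


# Made-up stand-in for rds.describe_db_snapshots(DBInstanceIdentifier="prod-db").
sample = {
    "DBSnapshots": [
        _snap("prod-db-backup-2024-01-13", 13),
        _snap("prod-db-backup-2024-01-15", 15),
        _snap("prod-db-backup-2024-01-14", 14),
        {"DBSnapshotIdentifier": "prod-db-in-progress"},
    ]
}
recent = latest_snapshots(sample, n=2)
# Creating the snapshot itself, with boto3:
#   rds.create_db_snapshot(DBInstanceIdentifier="prod-db",
#                          DBSnapshotIdentifier="prod-db-backup-2024-01-15")
print(recent)
```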
"Create a new VPC with CIDR 10.0.0.0/16 in us-east-1. Set up public and private subnets across two availability zones, and configure an internet gateway and NAT gateway."
The agent creates the complete VPC infrastructure with proper networking components, route tables, and gateway configurations.
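Before any resources are created, the VPC range has to be divided into subnet blocks. A minimal sketch of that planning step using only the standard library follows; the /24 subnet size and the availability zone names are assumptions, and `plan_subnets` is a hypothetical helper.

```python
import ipaddress


def plan_subnets(vpc_cidr: str, azs: list, new_prefix: int = 24) -> list:
    """Assign consecutive blocks to one public and one private subnet per AZ."""
    blocks = ipaddress.ip_network(vpc_cidr).subnets(new_prefix=new_prefix)
    plan = []
    for tier in ("public", "private"):
        for az in azs:
            plan.append({"tier": tier, "az": az, "cidr": str(next(blocks))})
    return plan


plan = plan_subnets("10.0.0.0/16", ["us-east-1a", "us-east-1b"])
for subnet in plan:
    print(subnet)
```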
"Configure an auto scaling group for my application that scales between 2 and 10 instances based on CPU utilization. Set up a target tracking policy to maintain 50% CPU."
The agent creates the auto scaling group, configures scaling policies, and sets up CloudWatch alarms for automatic scaling.
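The target tracking part of this request maps onto Auto Scaling's `put_scaling_policy`. The sketch below builds its arguments; the group and policy names are placeholders and `cpu_target_tracking_policy` is a hypothetical helper. The 2 to 10 instance bounds would be set separately as `MinSize` and `MaxSize` on the group itself.

```python
def cpu_target_tracking_policy(asg_name: str, target_cpu: float = 50.0) -> dict:
    """Arguments for autoscaling.put_scaling_policy with a CPU target."""
    return {
        "AutoScalingGroupName": asg_name,
        "PolicyName": "cpu-target-tracking",   # hypothetical policy name
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingConfiguration": {
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ASGAverageCPUUtilization"
            },
            "TargetValue": target_cpu,
        },
    }


policy_args = cpu_target_tracking_policy("webapp-asg")  # placeholder group name
# With boto3: boto3.client("autoscaling").put_scaling_policy(**policy_args)
print(policy_args["TargetTrackingConfiguration"])
```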
"Create a backup plan for all EBS volumes in us-east-1. Schedule daily backups with 7-day retention and weekly backups with 30-day retention."
The agent sets up AWS Backup plans, configures backup schedules, and applies retention policies for disaster recovery compliance.
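The two schedules in this request could be written as two rules in the `BackupPlan` argument of AWS Backup's `create_backup_plan`. This is a sketch under assumptions: the plan name, rule names, vault name, and the 05:00 UTC schedule are placeholders, and `ebs_backup_plan` is a hypothetical helper. Selecting the EBS volumes is a separate step (`create_backup_selection`) not shown here.

```python
def ebs_backup_plan(vault_name: str = "Default") -> dict:
    """Build a backup plan with daily (7-day) and weekly (30-day) rules."""
    return {
        "BackupPlanName": "ebs-daily-weekly",           # hypothetical plan name
        "Rules": [
            {
                "RuleName": "daily-7d",
                "TargetBackupVaultName": vault_name,
                "ScheduleExpression": "cron(0 5 ? * * *)",     # every day, 05:00 UTC
                "Lifecycle": {"DeleteAfterDays": 7},
            },
            {
                "RuleName": "weekly-30d",
                "TargetBackupVaultName": vault_name,
                "ScheduleExpression": "cron(0 5 ? * SUN *)",   # Sundays, 05:00 UTC
                "Lifecycle": {"DeleteAfterDays": 30},
            },
        ],
    }


plan = ebs_backup_plan()
# With boto3: boto3.client("backup", region_name="us-east-1").create_backup_plan(BackupPlan=plan)
print([rule["RuleName"] for rule in plan["Rules"]])
```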
"Generate a compliance report showing all S3 buckets that are publicly accessible. Also check which buckets don't have encryption enabled."
The agent audits S3 bucket configurations, identifies security and compliance issues, and generates a detailed report with remediation recommendations.
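One check such an audit might include is whether all four S3 Block Public Access settings are enabled on a bucket. The sketch below evaluates a response shaped like the output of `get_public_access_block`; `missing_public_access_blocks` is a hypothetical helper and the sample data is made up. A bucket with no configuration at all makes the real call raise an error, which an audit would treat as all four settings missing.

```python
REQUIRED_FLAGS = (
    "BlockPublicAcls",
    "IgnorePublicAcls",
    "BlockPublicPolicy",
    "RestrictPublicBuckets",
)


def missing_public_access_blocks(response: dict) -> list:
    """Return the Block Public Access settings that are absent or disabled."""
    config = response.get("PublicAccessBlockConfiguration", {})
    return [flag for flag in REQUIRED_FLAGS if not config.get(flag, False)]


# Made-up stand-in for s3.get_public_access_block(Bucket="example-bucket").
sample = {
    "PublicAccessBlockConfiguration": {
        "BlockPublicAcls": True,
        "IgnorePublicAcls": True,
        "BlockPublicPolicy": False,
        "RestrictPublicBuckets": True,
    }
}
gaps = missing_public_access_blocks(sample)
print(gaps)
```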
The entire agent lifecycle, from build to deployment to destruction, is fully automated through Terraform and Jenkins.
Infrastructure is organized using custom Terraform modules that I developed and maintain in separate repositories:
| Module | Source Repository | Resources Created |
|---|---|---|
| ECR | Terraform-AWS-ECR-ECS | Container registries for agent and app images |
| AgentCore Memory | Terraform-AWS-AgentCore | Long-term memory store for agent context |
| AgentCore Runtime | Terraform-AWS-AgentCore | Managed serverless agent hosting |
| VPC | Terraform-AWS-VPC-EKS | Network infrastructure with public/private subnets |
| ALB | Terraform-AWS-ECR-ECS | Application Load Balancer for web traffic |
| ECS | Terraform-AWS-ECR-ECS | Fargate cluster for Streamlit web app |
The CI/CD pipeline provides flexible deployment options with built-in safety:
| Feature | Description |
|---|---|
| Deployment Types | NewDeployment, FullRelease, AgentRelease, AppRelease, UpdateInfra |
| Multi-Environment | Deploy to dev, staging, or prod with a single parameter |
| Parameterized Builds | Agent name, version, region configurable per run |
| Security Scanning | Trivy scans for IaC and container vulnerabilities; Snyk for code |
| Plan Before Apply | Terraform plan with detailed exit codes; apply only on changes |
| Approval Gates | Manual approval required before infrastructure modifications |
| Slack Integration | Notifications for start, approval requests, success, and failure |
| Selective Builds | Build only agent, only app, or both based on deployment type |
| Destroy Capability | Safe teardown with plan preview and approval |
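The "Plan Before Apply" row relies on `terraform plan -detailed-exitcode`, which exits with 0 when there are no changes, 1 on error, and 2 when changes are present. The Jenkinsfile itself is not shown in this README, so the sketch below only illustrates the decision logic in Python; `plan_outcome` and `should_apply` are hypothetical names.

```python
def plan_outcome(exit_code: int) -> str:
    """Interpret the exit code of `terraform plan -detailed-exitcode`."""
    if exit_code == 0:
        return "no-changes"
    if exit_code == 2:
        return "changes-present"
    return "error"


def should_apply(exit_code: int, approved: bool) -> bool:
    """Apply only when the plan contains changes and a human has approved."""
    return plan_outcome(exit_code) == "changes-present" and approved


# Example of obtaining the exit code (requires Terraform on PATH):
#   import subprocess
#   code = subprocess.run(["terraform", "plan", "-detailed-exitcode", "-out=tfplan"]).returncode
print(plan_outcome(2), should_apply(2, approved=True))
```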
- Docker and Docker Compose installed
- AWS credentials configured (an access key and secret key, or an IAM role)
- AWS Bedrock model access enabled (Claude 3.5 Sonnet recommended)
1. Clone the repository

       git clone https://github.com/Tarique-B-DevOps/AWS-CloudOps-Agent.git
       cd AWS-CloudOps-Agent

2. Configure environment variables

   Required variables:

       AWS_ACCESS_KEY_ID=your_access_key
       AWS_SECRET_ACCESS_KEY=your_secret_key
       BEDROCK_MODEL_REGION=us-east-1
       BEDROCK_MODEL_ID=us.anthropic.claude-3-5-sonnet-20241022-v2:0

3. Start the services

       docker-compose up -d --build

4. Access the application
   - Web Interface: http://localhost:8501
   - API Health Check: http://localhost:8080/ping

5. Stop the services

       docker-compose down

To follow the service logs:

    docker-compose logs -f

Project structure:

    AWS-CloudOps-Agent/
    ├── agent.py              # FastAPI backend with Strands Agent
    ├── app.py                # Streamlit web interface
    ├── models.py             # Pydantic request/response schemas
    ├── Dockerfile.agent      # Agent container definition
    ├── Dockerfile.app        # Web app container definition
    ├── docker-compose.yml    # Local development orchestration
    ├── Jenkinsfile           # CI/CD pipeline definition
    ├── main.tf               # Root Terraform configuration
    ├── variables.tf          # Terraform variable definitions
    ├── outputs.tf            # Terraform output values
    └── frontend/             # React/Vite frontend (under development)
- Least Privilege Principle: The agent follows AWS security best practices. Grant only the specific permissions required for the operations you intend to perform. Avoid broad `*` permissions; scope IAM policies to the exact actions and resources the agent will access.
- Model Access: Ensure your AWS account has access to the Bedrock model specified in `BEDROCK_MODEL_ID`. Claude 3.5 Sonnet is recommended for optimal performance.
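To make the least-privilege note concrete, the sketch below shows an IAM policy document that allows only describing, starting, and stopping EC2 instances, plus a small check that flags wildcard actions. The account ID, region, and the helper `wildcard_actions` are placeholders for illustration; the permissions your deployment needs depend on the operations you let the agent perform.

```python
import json

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DescribeInstances",
            "Effect": "Allow",
            "Action": ["ec2:DescribeInstances"],
            "Resource": "*",  # Describe actions do not support resource-level scoping
        },
        {
            "Sid": "StartStopInstances",
            "Effect": "Allow",
            "Action": ["ec2:StartInstances", "ec2:StopInstances"],
            "Resource": "arn:aws:ec2:us-east-1:123456789012:instance/*",  # placeholder account
        },
    ],
}


def wildcard_actions(policy_document: dict) -> list:
    """Return actions that are '*' or end in ':*' (for example 'ec2:*')."""
    found = []
    for statement in policy_document["Statement"]:
        actions = statement["Action"]
        if isinstance(actions, str):
            actions = [actions]
        found.extend(a for a in actions if a == "*" or a.endswith(":*"))
    return found


print(json.dumps(policy, indent=2))
print(wildcard_actions(policy))
```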
This project is licensed under the MIT License - see the LICENSE file for details.
Built with Strands Agent Framework • Deployed on AWS Bedrock AgentCore • Automated with Terraform & Jenkins



