Tarique-B-DevOps/AWS-CloudOps-Agent

Agentic CloudOps with AWS CloudOps Assistant | AI-Powered AWS Operations with Strands Agent Framework

Conversational AI Agent for Day-to-Day AWS Operations | Manage EC2, IAM, S3, and more through natural language with persistent memory, fully automated deployment on AWS Bedrock AgentCore.


🎯 Overview

The AWS CloudOps Assistant is an intelligent AI agent designed to simplify day-to-day AWS operations through natural language conversations. Instead of navigating the AWS Console, tell the agent what you need: managing EC2 instances, performing IAM operations, querying resources, generating code and AWS CLI commands, analyzing costs, or handling security and network management. The agent handles the rest.

Built on the Strands Agent Framework and powered by AWS Bedrock LLMs, this project showcases:

  • Agentic AI Capabilities: Autonomous task execution with tool use, reasoning, and memory
  • DevOps Best Practices: Infrastructure as Code, CI/CD automation, containerized deployments
  • Enterprise-Ready Architecture: Managed serverless hosting on AWS Bedrock AgentCore
overview.mp4

🤖 Agentic Capabilities

What Makes This Agent Intelligent

| Capability | Description |
|---|---|
| Tool Use | Executes real AWS operations via the use_aws tool: create, modify, query, and delete resources |
| Contextual Memory | Remembers resources created, user preferences, and conversation history across sessions |
| Reasoning | Understands intent, asks clarifying questions when needed, and explains actions before executing |
| Guardrails | Focused exclusively on AWS operations; declines off-topic requests gracefully |
| Streaming Responses | Real-time response streaming for immediate feedback during long operations |

πŸ› οΈ Technology Stack

| Component | Technology | Purpose |
|---|---|---|
| AI Agent Framework | Strands Agent | Orchestrates tool use, reasoning, and conversation flow |
| LLM Platform | AWS Bedrock | Foundation model for natural language understanding and generation |
| Agent Hosting | AWS Bedrock AgentCore Runtime | Managed serverless infrastructure for running agents at scale |
| Long-Term Memory | AWS Bedrock AgentCore Memory | Persistent storage for user context, resource tracking, and session history |
| Web Interface | Streamlit | Interactive chat UI with session management and real-time streaming |
| API Layer | FastAPI | High-performance REST API with async support |
| Containerization | Docker & Docker Buildx | Container builds |
| Infrastructure as Code | Terraform | Modular, reusable infrastructure definitions |
| CI/CD Pipeline | Jenkins | Automated build, scan, test, and deployment workflows |
| Backend Language | Python 3.11+ | Agent logic, API endpoints, and integrations |

✨ Features

🧠 Intelligent Agent

  • Natural Language Interface: Describe what you want in plain English
  • Autonomous Execution: Agent plans and executes multi-step operations
  • Memory Persistence: Resources and context remembered across sessions using AgentCore Memory
  • Safe Operations: Confirmation prompts for destructive actions, clear explanations before execution
  • Markdown Responses: Structured output with tables for resource listings
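The "Safe Operations" behavior above can be illustrated with a minimal sketch. This is not the project's actual implementation: the prefix list, `is_destructive`, and `run_operation` are hypothetical names, and it assumes operations are identified by AWS CLI style names such as `terminate-instances`.

```python
# Hypothetical sketch of a confirmation gate; not taken from the repository.
# Assumption: operations are named in AWS CLI style, e.g. "terminate-instances".
DESTRUCTIVE_PREFIXES = ("delete", "terminate", "stop", "detach", "remove", "revoke")


def is_destructive(operation: str) -> bool:
    """Return True when the operation name starts with a destructive verb."""
    return operation.lower().startswith(DESTRUCTIVE_PREFIXES)


def run_operation(operation: str, execute, confirmed: bool = False):
    """Call `execute` only if the operation is safe or was explicitly confirmed."""
    if is_destructive(operation) and not confirmed:
        return f"Confirmation required before running '{operation}'."
    return execute()
```

Read-only calls such as `describe-instances` pass straight through, while `delete-bucket` returns the confirmation message until the caller passes `confirmed=True`.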

💬 Streamlit Web Interface

  • Real-Time Streaming: See agent responses as they're generated
  • Session Management: Unique session IDs with persistent actor identity
  • Agent Status Display: Live connection status to AgentCore runtime
  • Memory Indicators: Visual confirmation when memory is active
  • Chat History: Full conversation history within sessions
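As a rough illustration of "unique session IDs with persistent actor identity", the sketch below keeps one actor ID stable while issuing a fresh session ID per conversation. `ChatIdentity` and `new_session_id` are hypothetical names; how the Streamlit app actually stores these values is not shown in this README.

```python
import uuid
from dataclasses import dataclass, field


def new_session_id() -> str:
    """Generate a random, unique session identifier."""
    return str(uuid.uuid4())


@dataclass
class ChatIdentity:
    """Hypothetical holder for a stable actor ID plus a per-conversation session ID."""
    actor_id: str
    session_id: str = field(default_factory=new_session_id)

    def new_session(self) -> "ChatIdentity":
        """Start a new session while keeping the same actor identity."""
        return ChatIdentity(actor_id=self.actor_id)
```

Keeping the actor ID constant is what would let long-term memory be looked up again after the session ID changes.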

βš™οΈ Infrastructure & Deployment

  • Modular Terraform: Reusable modules for ECR, AgentCore, VPC, ALB, ECS
  • Multi-Environment Support: Deploy to dev, staging, or prod with parameter switches
  • Security Scanning: Trivy for IaC and container images, Snyk for code analysis
  • Approval Gates: Manual approval steps for infrastructure changes
  • Slack Notifications: Pipeline status updates at every stage

📖 Use Cases

The AWS CloudOps Assistant excels at everyday AWS operations. Below are demonstrated use cases with video walkthroughs.

🖥️ Resource Management

Create an EC2 Instance

"Create an EC2 instance in us-east-1 using the latest Amazon Linux 2023 AMI with t2.micro instance type. Use default VPC and subnet, no key pair needed, and use default security group settings."

The agent creates the instance and automatically stores the instance ID, ARN, and configuration in memory for future reference.

create_ec2.mp4

Stop a Previously Created Instance

"Stop the EC2 instance that you created previously."

The agent retrieves the instance ID from memory and stops it, so there is no need to specify IDs manually.

stop_ec2.mp4
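The create-then-stop flow above depends on the agent recalling what it did earlier. The sketch below shows the idea with an in-process stand-in; it does not use the AgentCore Memory API, and `ResourceMemory`, its methods, and the instance ID are all hypothetical.

```python
from typing import Optional


class ResourceMemory:
    """In-process stand-in for long-term memory, keyed by actor ID (illustration only)."""

    def __init__(self) -> None:
        self._records: dict = {}

    def remember(self, actor_id: str, resource_type: str, resource_id: str, action: str) -> None:
        """Append a record of an action the agent performed on a resource."""
        self._records.setdefault(actor_id, []).append(
            {"type": resource_type, "id": resource_id, "action": action}
        )

    def latest(self, actor_id: str, resource_type: str) -> Optional[dict]:
        """Return the most recent record of the given resource type, if any."""
        for record in reversed(self._records.get(actor_id, [])):
            if record["type"] == resource_type:
                return record
        return None
```

With a record stored under the actor's ID, a request like "stop the instance you created previously" can be resolved to a concrete instance ID without the user supplying it.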

🔐 IAM Policy Management

Attach a Policy to a User

"Attach the policy AmazonEC2FullAccess to the IAM user named JohnDoe."

The agent attaches the managed policy and stores the action in memory.

iam_attach.mp4

Detach a Policy from a User

"Detach the policy from the IAM user JohnDoe that you attached previously."

Even in a new session, the agent recalls the previous action from memory and detaches the correct policy.

iam_detach.mp4

💻 Code Generation

Generate a Boto3 Script

"Generate a Python script using Boto3 that iterates through all S3 buckets and stores their information in a CSV file."

The agent generates ready-to-use Python code with proper error handling, CSV formatting, and AWS best practices.

boto3.mp4
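For reference, a script answering that prompt might look roughly like the sketch below. It is not the agent's actual output. The function takes the S3 client as a parameter so it can be exercised without AWS access; the column choice and the name `write_bucket_csv` are assumptions.

```python
import csv


def write_bucket_csv(s3_client, out) -> int:
    """Write one CSV row per S3 bucket to `out`; return the number of buckets."""
    writer = csv.writer(out)
    writer.writerow(["name", "creation_date", "region"])
    buckets = s3_client.list_buckets().get("Buckets", [])
    for bucket in buckets:
        location = s3_client.get_bucket_location(Bucket=bucket["Name"])
        # S3 reports a null LocationConstraint for buckets in us-east-1.
        region = location.get("LocationConstraint") or "us-east-1"
        writer.writerow([bucket["Name"], bucket["CreationDate"].isoformat(), region])
    return len(buckets)


# Usage against a real account (requires boto3 and credentials):
#   import boto3
#   with open("buckets.csv", "w", newline="") as handle:
#       write_bucket_csv(boto3.client("s3"), handle)
```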

📋 Command Suggestions

Get AWS CLI Commands

"Give me the AWS CLI command to copy a local folder to an S3 bucket."

The agent provides the exact CLI command with explanations of flags and options for your specific use case.

cli.mp4
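One common answer to that prompt is `aws s3 cp` with the `--recursive` flag. The helper below is a small sketch that assembles such a command safely in Python; `s3_copy_command` and the bucket and folder names are hypothetical, and the agent may suggest `aws s3 sync` or other flags instead.

```python
import shlex


def s3_copy_command(local_dir: str, bucket: str, prefix: str = "") -> str:
    """Build an `aws s3 cp --recursive` command that uploads a local folder."""
    destination = f"s3://{bucket}/{prefix}".rstrip("/") + "/"
    argv = ["aws", "s3", "cp", local_dir, destination, "--recursive"]
    # shlex.join quotes arguments that contain spaces or shell metacharacters.
    return shlex.join(argv)
```

For example, `s3_copy_command("./site", "my-bucket")` yields `aws s3 cp ./site s3://my-bucket/ --recursive`.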

🔍 Additional Use Cases

The AWS CloudOps Assistant can handle a wide range of additional AWS operations:

Cost Analysis

"Analyze my AWS costs for the last 30 days and identify the top 5 services by spending. Show me cost optimization recommendations."

The agent retrieves cost data from AWS Cost Explorer, generates detailed reports, and provides actionable recommendations for reducing expenses.
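The ranking step can be sketched as follows, assuming the data comes from Cost Explorer's `get_cost_and_usage` grouped by service with the `UnblendedCost` metric. `top_services` is a hypothetical helper, not code from this repository.

```python
from collections import defaultdict


def top_services(results_by_time, limit=5):
    """Sum UnblendedCost per service across periods and return the largest `limit`."""
    totals = defaultdict(float)
    for period in results_by_time:
        for group in period.get("Groups", []):
            service = group["Keys"][0]
            totals[service] += float(group["Metrics"]["UnblendedCost"]["Amount"])
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return ranked[:limit]


# Usage (requires boto3 and credentials), with `response` from
#   boto3.client("ce").get_cost_and_usage(..., GroupBy=[{"Type": "DIMENSION", "Key": "SERVICE"}]):
#   top_services(response["ResultsByTime"])
```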

Alarm Creation

"Create a CloudWatch alarm that triggers when CPU utilization exceeds 80% for any EC2 instance in us-east-1. Send notifications to my SNS topic."

The agent creates the alarm with proper thresholds, associates it with the SNS topic, and configures evaluation periods.
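A sketch of the parameters such an alarm could use with CloudWatch's `put_metric_alarm` is shown below. It covers a single instance rather than "any instance"; the alarm name, the 5-minute period, the two evaluation periods, and the helper name `cpu_alarm_params` are assumptions.

```python
def cpu_alarm_params(instance_id: str, sns_topic_arn: str, threshold: float = 80.0) -> dict:
    """Build keyword arguments for CloudWatch put_metric_alarm on EC2 CPU utilization."""
    return {
        "AlarmName": f"high-cpu-{instance_id}",  # hypothetical naming scheme
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": 300,            # assumed: 5-minute datapoints
        "EvaluationPeriods": 2,   # assumed: two consecutive breaches
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [sns_topic_arn],
    }


# Usage (requires boto3 and credentials):
#   boto3.client("cloudwatch", region_name="us-east-1").put_metric_alarm(**params)
```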

CloudTrail Log Analysis

"Analyze CloudTrail logs from the past 7 days and show me all failed IAM authentication attempts. Identify any suspicious access patterns."

The agent queries CloudTrail logs, filters for security events, and provides insights on access patterns and potential security issues.

Resource Tagging

"Tag all EC2 instances in us-east-1 with Environment=Production and Project=WebApp. Also add a CostCenter tag with value IT-001."

The agent identifies all instances, applies the specified tags consistently, and verifies the tagging operation.

Security Group Rule Management

"Add an inbound rule to security group sg-12345678 allowing SSH access (port 22) from IP 203.0.113.0/24. Also remove any existing rules that allow access from 0.0.0.0/0 on port 22."

The agent modifies security group rules, adds new rules with proper CIDR blocks, and removes overly permissive rules for enhanced security.
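The two halves of that request, adding a scoped SSH rule and finding world-open SSH rules to remove, can be sketched with plain data structures in the shape EC2 uses for `IpPermissions`. Both helper names are hypothetical.

```python
import ipaddress


def ssh_ingress_rule(cidr: str) -> dict:
    """Build an IpPermissions entry allowing SSH from `cidr` (ValueError if malformed)."""
    network = ipaddress.ip_network(cidr)
    return {
        "IpProtocol": "tcp",
        "FromPort": 22,
        "ToPort": 22,
        "IpRanges": [{"CidrIp": str(network)}],
    }


def open_ssh_rules(ip_permissions: list) -> list:
    """Return the permissions that expose port 22 to 0.0.0.0/0."""
    flagged = []
    for perm in ip_permissions:
        covers_ssh = perm.get("IpProtocol") == "-1" or (
            perm.get("IpProtocol") == "tcp"
            and perm.get("FromPort", -1) <= 22 <= perm.get("ToPort", -1)
        )
        world_open = any(r.get("CidrIp") == "0.0.0.0/0" for r in perm.get("IpRanges", []))
        if covers_ssh and world_open:
            flagged.append(perm)
    return flagged
```

The new rule would be passed to `authorize_security_group_ingress` and the flagged ones to `revoke_security_group_ingress`, both as `IpPermissions`.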

S3 Bucket Operations

"Create an S3 bucket lifecycle policy that moves objects older than 30 days to Glacier storage and deletes objects older than 365 days."

The agent creates the lifecycle configuration, applies it to the bucket, and ensures proper transition and expiration rules.
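The lifecycle configuration described in that prompt could be expressed roughly as below, in the shape accepted by S3's `put_bucket_lifecycle_configuration`. The rule ID and the helper name are hypothetical, and the empty prefix filter applies the rule to every object.

```python
def lifecycle_configuration(transition_days: int = 30, expiration_days: int = 365) -> dict:
    """Build a lifecycle configuration: move to Glacier, then expire."""
    if expiration_days <= transition_days:
        raise ValueError("expiration_days must be greater than transition_days")
    return {
        "Rules": [
            {
                "ID": "archive-then-expire",  # hypothetical rule name
                "Status": "Enabled",
                "Filter": {"Prefix": ""},     # empty prefix matches all objects
                "Transitions": [{"Days": transition_days, "StorageClass": "GLACIER"}],
                "Expiration": {"Days": expiration_days},
            }
        ]
    }


# Usage (requires boto3 and credentials; the bucket name is a placeholder):
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="my-bucket", LifecycleConfiguration=lifecycle_configuration())
```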

Lambda Function Management

"Update the environment variables for my Lambda function 'processData' to set LOG_LEVEL=DEBUG and TIMEOUT=300."

The agent updates the Lambda function configuration, modifies environment variables, and verifies the changes.

RDS Database Operations

"Create a snapshot of my RDS database instance 'prod-db' and name it 'prod-db-backup-2024-01-15'. Also show me the last 5 snapshots."

The agent creates the database snapshot, monitors the snapshot creation process, and lists recent snapshots for backup management.

VPC Configuration

"Create a new VPC with CIDR 10.0.0.0/16 in us-east-1. Set up public and private subnets across two availability zones, and configure an internet gateway and NAT gateway."

The agent creates the complete VPC infrastructure with proper networking components, route tables, and gateway configurations.
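The subnet layout in that request can be planned before any API call is made. The sketch below carves /24 blocks out of the VPC CIDR, one public and one private subnet per availability zone; the /24 size and the name `plan_subnets` are assumptions.

```python
import ipaddress


def plan_subnets(vpc_cidr: str, availability_zones: list, new_prefix: int = 24) -> list:
    """Assign consecutive subnet CIDRs: public subnets first, then private, per zone."""
    blocks = ipaddress.ip_network(vpc_cidr).subnets(new_prefix=new_prefix)
    plan = []
    for tier in ("public", "private"):
        for zone in availability_zones:
            plan.append({"tier": tier, "az": zone, "cidr": str(next(blocks))})
    return plan
```

For `10.0.0.0/16` and two zones this assigns `10.0.0.0/24` and `10.0.1.0/24` to the public tier and `10.0.2.0/24` and `10.0.3.0/24` to the private tier.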

Auto Scaling Management

"Configure an auto scaling group for my application that scales between 2 and 10 instances based on CPU utilization. Set up a target tracking policy to maintain 50% CPU."

The agent creates the auto scaling group, configures scaling policies, and sets up CloudWatch alarms for automatic scaling.

Backup and Disaster Recovery

"Create a backup plan for all EBS volumes in us-east-1. Schedule daily backups with 7-day retention and weekly backups with 30-day retention."

The agent sets up AWS Backup plans, configures backup schedules, and applies retention policies for disaster recovery compliance.

Compliance Reporting

"Generate a compliance report showing all S3 buckets that are publicly accessible. Also check which buckets don't have encryption enabled."

The agent audits S3 bucket configurations, identifies security and compliance issues, and generates a detailed report with remediation recommendations.


🚀 Automated Agent Lifecycle on AgentCore

The entire agent lifecycle, from build to deployment to destruction, is fully automated through Terraform and Jenkins.

📦 Terraform Modular Architecture

Infrastructure is organized using custom Terraform modules that I developed and maintain in separate repositories:

| Module | Source Repository | Resources Created |
|---|---|---|
| ECR | Terraform-AWS-ECR-ECS | Container registries for agent and app images |
| AgentCore Memory | Terraform-AWS-AgentCore | Long-term memory store for agent context |
| AgentCore Runtime | Terraform-AWS-AgentCore | Managed serverless agent hosting |
| VPC | Terraform-AWS-VPC-EKS | Network infrastructure with public/private subnets |
| ALB | Terraform-AWS-ECR-ECS | Application Load Balancer for web traffic |
| ECS | Terraform-AWS-ECR-ECS | Fargate cluster for Streamlit web app |

🔧 Jenkins Pipeline Features

The CI/CD pipeline provides flexible deployment options with built-in safety:

| Feature | Description |
|---|---|
| Deployment Types | NewDeployment, FullRelease, AgentRelease, AppRelease, UpdateInfra |
| Multi-Environment | Deploy to dev, staging, or prod with a single parameter |
| Parameterized Builds | Agent name, version, region configurable per run |
| Security Scanning | Trivy scans for IaC and container vulnerabilities; Snyk for code |
| Plan Before Apply | Terraform plan with detailed exit codes; apply only on changes |
| Approval Gates | Manual approval required before infrastructure modifications |
| Slack Integration | Notifications for start, approval requests, success, and failure |
| Selective Builds | Build only agent, only app, or both based on deployment type |
| Destroy Capability | Safe teardown with plan preview and approval |
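
The "Selective Builds" and "Plan Before Apply" rows describe two decisions the pipeline makes. The sketch below models them in Python for illustration only; the real logic lives in the Jenkinsfile. The mapping from deployment type to images is inferred from the type names and is an assumption, as are the returned step labels. The exit-code meanings are those of `terraform plan -detailed-exitcode` (0 = no changes, 1 = error, 2 = changes present).

```python
# Assumed mapping, inferred from the deployment type names (not from the Jenkinsfile).
BUILD_TARGETS = {
    "NewDeployment": {"agent", "app"},
    "FullRelease": {"agent", "app"},
    "AgentRelease": {"agent"},
    "AppRelease": {"app"},
    "UpdateInfra": set(),
}


def images_to_build(deployment_type: str) -> set:
    """Return which container images a deployment type would build."""
    return BUILD_TARGETS[deployment_type]


def next_step(plan_exit_code: int) -> str:
    """Decide what follows `terraform plan -detailed-exitcode`."""
    if plan_exit_code == 0:
        return "skip-apply"        # no changes, so nothing to apply
    if plan_exit_code == 2:
        return "request-approval"  # changes present, so gate on manual approval
    raise RuntimeError(f"terraform plan failed with exit code {plan_exit_code}")
```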

📸 Screenshots

[Screenshot: Pipeline Parameters]

[Screenshot: Approval Gate]

[Screenshots: Pipeline Stage Overview (two images)]


🐳 Running Locally

Prerequisites

  • Docker and Docker Compose installed
  • AWS credentials configured (access key, secret key, or IAM role)
  • AWS Bedrock model access enabled (Claude 3.5 Sonnet recommended)

Quick Start with Docker Compose

  1. Clone the repository

    git clone https://github.com/Tarique-B-DevOps/AWS-CloudOps-Agent.git
    cd AWS-CloudOps-Agent
  2. Configure environment variables

    Required variables:

    AWS_ACCESS_KEY_ID=your_access_key
    AWS_SECRET_ACCESS_KEY=your_secret_key
    BEDROCK_MODEL_REGION=us-east-1
    BEDROCK_MODEL_ID=us.anthropic.claude-3-5-sonnet-20241022-v2:0
    
  3. Start the services

    docker-compose up -d --build
  4. Access the application

  5. Stop the services

    docker-compose down
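
If the services fail at startup, a missing variable from step 2 is a likely cause. The sketch below shows one way to validate the four required variables up front; `load_settings` is a hypothetical helper, not a function from `agent.py`.

```python
import os

REQUIRED_VARS = (
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "BEDROCK_MODEL_REGION",
    "BEDROCK_MODEL_ID",
)


def load_settings(environ=None) -> dict:
    """Return the required settings, or raise listing every missing variable."""
    env = os.environ if environ is None else environ
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing required environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```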

View Logs

docker-compose logs -f

📁 Project Structure

AWS-CloudOps-Agent/
├── agent.py              # FastAPI backend with Strands Agent
├── app.py                # Streamlit web interface
├── models.py             # Pydantic request/response schemas
├── Dockerfile.agent      # Agent container definition
├── Dockerfile.app        # Web app container definition
├── docker-compose.yml    # Local development orchestration
├── Jenkinsfile           # CI/CD pipeline definition
├── main.tf               # Root Terraform configuration
├── variables.tf          # Terraform variable definitions
├── outputs.tf            # Terraform output values
└── frontend/             # React/Vite frontend (under development)

📝 Notes

  • Least Privilege Principle: The agent follows AWS security best practices. Grant only the specific permissions required for the operations you intend to perform. Avoid broad * permissions; scope IAM policies to the exact actions and resources the agent will access.

  • Model Access: Ensure your AWS account has access to the Bedrock model specified in BEDROCK_MODEL_ID. Claude 3.5 Sonnet is recommended for optimal performance.
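
To make the least-privilege note concrete, the sketch below pairs an example policy scoped to starting and stopping EC2 instances with a small check that flags bare `*` actions or resources. The policy contents, the account ID, and `wildcard_statements` are illustrative assumptions, not the permissions this project requires.

```python
# Example only: a policy limited to two EC2 actions in one region and account.
SCOPED_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["ec2:StartInstances", "ec2:StopInstances"],
            "Resource": "arn:aws:ec2:us-east-1:123456789012:instance/*",
        }
    ],
}


def _as_list(value) -> list:
    """IAM allows a single string or a list for Action and Resource."""
    return [value] if isinstance(value, str) else list(value)


def wildcard_statements(policy: dict) -> list:
    """Return Allow statements whose Action or Resource is a bare "*"."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):
        statements = [statements]
    flagged = []
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        actions = _as_list(statement.get("Action", []))
        resources = _as_list(statement.get("Resource", []))
        if "*" in actions or "*" in resources:
            flagged.append(statement)
    return flagged
```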


📄 License

This project is licensed under the MIT License - see the LICENSE file for details.


Built with Strands Agent Framework • Deployed on AWS Bedrock AgentCore • Automated with Terraform & Jenkins
