Cloud Cost Optimization Agent

An autonomous multi-agent system for AWS cost optimization built with a Google ADK-inspired architecture and tree-of-thought reasoning. The system moves from one-off cost reports to continuous monitoring, prioritized recommendations, and safe automated remediation.

What Makes This Different

Traditional tools: static analysis → manual review → manual actions
This agent system: continuous monitoring → multi-path reasoning → autonomous actions → learning

Key Capabilities

Autonomous operation: runs scheduled and continuous workflows
Tree-of-thought reasoning: explores multiple decision paths before acting
Adaptive learning: stores patterns and prior decisions in local memory
Safety-first automation: risk tiers, approvals, and simulation mode
Real-time response: reacts to anomalies and high-confidence savings opportunities
Multi-agent architecture: monitor, analyzer, executor, and orchestrator agents

Agent Architecture

Cloud Cost Optimization Agent System
├── Orchestrator Agent    - workflow coordination and strategic insights
├── Monitor Agent         - continuous AWS resource monitoring
├── Analyzer Agent        - recommendations with tree-of-thought reasoning
├── Executor Agent        - safe autonomous action execution
├── Memory System         - persistent learning and pattern storage
├── Safety Framework      - risk assessment and approval workflows
└── Dashboard System      - runtime status and pending approvals
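
As a rough illustration of how the agents in the tree above could hand work to one another, here is a minimal sketch. The class names, method names, and data shapes are assumptions made for this example and are not taken from the project's agents/ package.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    resource_id: str
    kind: str            # e.g. "unattached_ebs_volume"
    monthly_cost: float  # estimated USD per month


@dataclass
class Recommendation:
    finding: Finding
    action: str
    risk: str            # "SAFE", "CAUTION", or "REVIEW_REQUIRED"


class MonitorAgent:
    """Hypothetical monitor: turns a pre-collected inventory into findings."""
    def __init__(self, inventory):
        self.inventory = inventory

    def scan(self):
        return [Finding(**item) for item in self.inventory]


class AnalyzerAgent:
    """Hypothetical analyzer: maps each finding to one recommended action."""
    RULES = {
        "unattached_ebs_volume": ("snapshot_then_delete", "SAFE"),
        "idle_ec2_instance": ("stop_instance", "CAUTION"),
    }

    def analyze(self, findings):
        recs = []
        for f in findings:
            # Anything without a rule falls back to the strictest tier.
            action, risk = self.RULES.get(f.kind, ("manual_review", "REVIEW_REQUIRED"))
            recs.append(Recommendation(f, action, risk))
        return recs


class ExecutorAgent:
    """Hypothetical executor: accepts SAFE actions, queues everything else."""
    def __init__(self):
        self.pending = []

    def execute(self, recs):
        executed = []
        for rec in recs:
            if rec.risk == "SAFE":
                executed.append(rec)  # a real executor would call AWS here
            else:
                self.pending.append(rec)
        return executed


class Orchestrator:
    """Coordinates one monitor -> analyze -> execute cycle."""
    def __init__(self, monitor, analyzer, executor):
        self.monitor, self.analyzer, self.executor = monitor, analyzer, executor

    def run_cycle(self):
        recs = self.analyzer.analyze(self.monitor.scan())
        executed = self.executor.execute(recs)
        return {
            "potential_savings": sum(r.finding.monthly_cost for r in recs),
            "executed": len(executed),
            "pending_approvals": len(self.executor.pending),
        }
```

The orchestrator only sequences the other agents and summarizes the result, which keeps each agent testable on its own.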

Enhanced Detection Capabilities

| Resource Type | Detection Method | Autonomous Actions | Learning Features |
| --- | --- | --- | --- |
| Unattached EBS Volumes | Age + attachment analysis | Auto-snapshot + delete | Pattern-based retention |
| Idle EC2 Instances | Multi-metric analysis | Auto-stop with schedules | Usage pattern learning |
| Stale Snapshots | Age + dependency tracking | Smart retention policies | Policy optimization |
| Idle Load Balancers | Traffic analysis + trends | Auto-consolidation | Load pattern recognition |
| Overprovisioned RDS | Performance + cost modeling | Automated rightsizing | Workload characterization |
| S3 Lifecycle Gaps | Access pattern analysis | Smart lifecycle policies | Data aging patterns |
| Unused Resources | Dependency mapping | Safe automated cleanup | Resource correlation |
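
To make the "age + attachment analysis" row concrete, the following sketch filters a list of volumes for ones that are unattached and older than a threshold. It is an assumption about how such a check might look, not the project's detector. The input mirrors the "Volumes" entries of a boto3 ec2.describe_volumes() response, and age is measured from CreateTime, which is a simplification because the time of detachment is not part of that response.

```python
from datetime import datetime, timedelta, timezone


def find_stale_unattached_volumes(volumes, min_age_days=30, now=None):
    """Return IDs of volumes that are unattached and older than min_age_days.

    `volumes` is a list of dicts shaped like the "Volumes" entries in a
    boto3 ec2.describe_volumes() response (CreateTime is timezone-aware).
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=min_age_days)
    stale = []
    for vol in volumes:
        # "available" is the EBS state for a volume not attached to any instance.
        unattached = vol["State"] == "available" and not vol.get("Attachments")
        if unattached and vol["CreateTime"] < cutoff:
            stale.append(vol["VolumeId"])
    return stale
```

Because the function takes plain dictionaries, it can be exercised in simulation mode or unit tests without any AWS calls.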

Quick Start

Prerequisites

Python 3.11+
AWS account with read permissions (write permissions for live remediation)
Google Cloud project (optional, for Vertex AI reasoning)
4GB+ RAM recommended for full agent operation

Installation

Clone the repository:

git clone https://github.com/maheshchebrolu-git/cloud-cost-optimization-agent.git
cd cloud-cost-optimization-agent

Install dependencies:

python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

Configure AWS credentials:

# Create a .env file with the following values
AWS_ACCESS_KEY_ID=your_access_key_here
AWS_SECRET_ACCESS_KEY=your_secret_key_here
AWS_DEFAULT_REGION=us-east-1

# Optional: Google Cloud configuration
GOOGLE_CLOUD_PROJECT=your-project-id
GOOGLE_CLOUD_REGION=us-central1

Usage Modes

Interactive mode (recommended for first use)

python cloud_cost_agent.py --mode interactive --simulation

Single optimization run

python cloud_cost_agent.py --mode single --simulation

Continuous operation

python cloud_cost_agent.py --mode continuous --simulation

Monitoring only

python cloud_cost_agent.py --mode monitor

Batch sweeper (read-only analysis)

python cloud_cost_sweeper.py

Interactive Demo

🤖 Cloud Cost Optimization Agent - Interactive Mode
agent> run
Running optimization cycle...
✅ Optimization complete!
   Potential savings: $247.50/month
   Actions executed: 5
   Pending approvals: 2

agent> approvals
📋 Pending Approvals (2)
1. action_20251003_171857_1 - $62.00/month
   Resource: i-0abc123def456789
   Action: terminate_stopped_instance
   Risk: caution

agent> approve
Enter approval number: 1
Approve action? (y/n): y
✅ Action approved and executed

Tree-of-Thought Reasoning in Action

Problem: should we delete a 30-day-old unattached EBS volume?

Traditional approach:

Volume is older than 30 days → delete

Tree-of-thought agent approach:

Reasoning Path 1 (Cost Focus):
├── $8/month waste for 30 days
├── Quick win with minimal risk
└── Recommendation: delete immediately

Reasoning Path 2 (Risk Analysis):
├── Check for recent snapshots
├── Verify no pending reattachments
├── Assess business criticality
└── Recommendation: snapshot first, then delete

Reasoning Path 3 (Pattern Learning):
├── Historical volume usage patterns
├── Team behavior analysis
├── Seasonal considerations
└── Recommendation: set smart retention policy

Synthesized Decision:
├── Confidence: 0.87
├── Action: create final snapshot, delete volume, update policy
└── Learning: update retention rules for this volume type
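
One simple way to combine several reasoning paths into a single decision is a confidence-weighted vote, sketched below. This is an illustrative assumption, not the project's synthesis logic: the ReasoningPath type, the synthesize function, and the scoring rule are all invented for this example, and the resulting score is not meant to reproduce the confidence value shown above.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ReasoningPath:
    name: str
    action: str
    confidence: float  # 0.0 to 1.0, assigned by whatever evaluates the path


def synthesize(paths):
    """Pick the action with the highest total confidence across paths.

    Returns (action, score), where score is that action's share of the
    total confidence, so it always lies between 0 and 1.
    """
    if not paths:
        raise ValueError("at least one reasoning path is required")
    totals = defaultdict(float)
    for path in paths:
        totals[path.action] += path.confidence
    best = max(totals, key=totals.get)
    overall = sum(totals.values())
    score = totals[best] / overall if overall > 0 else 0.0
    return best, score
```

With this rule, two moderately confident paths that agree can outweigh a single path that is confident but alone, which is the behavior the example above describes.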

Safety and Risk Management

Risk classification system

Risk Levels:
├── SAFE (auto-execute)
│   ├── Release unused Elastic IPs
│   ├── Add S3 lifecycle policies
│   └── Delete unattached volumes >30 days
│
├── CAUTION (request approval)
│   ├── Stop/resize instances
│   ├── Modify database configurations
│   └── Change security settings
│
└── REVIEW_REQUIRED (manual review)
    ├── Delete production databases
    ├── Modify network configurations
    └── Cross-service dependencies
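
The tiers above could be represented as a lookup table with a restrictive default, as in this sketch. The action names are hypothetical and were chosen to mirror the list; the project's own identifiers may differ.

```python
from enum import Enum


class Risk(Enum):
    SAFE = "auto-execute"
    CAUTION = "request approval"
    REVIEW_REQUIRED = "manual review"


# Hypothetical action names; the tier assignments follow the list above.
ACTION_RISK = {
    "release_unused_elastic_ip": Risk.SAFE,
    "add_s3_lifecycle_policy": Risk.SAFE,
    "delete_unattached_volume": Risk.SAFE,
    "stop_instance": Risk.CAUTION,
    "resize_instance": Risk.CAUTION,
    "modify_database_configuration": Risk.CAUTION,
    "delete_production_database": Risk.REVIEW_REQUIRED,
    "modify_network_configuration": Risk.REVIEW_REQUIRED,
}


def classify(action):
    """Return the risk tier; unknown actions get the most restrictive tier."""
    return ACTION_RISK.get(action, Risk.REVIEW_REQUIRED)


def may_auto_execute(action):
    """Only SAFE actions may run without a human in the loop."""
    return classify(action) is Risk.SAFE
```

Defaulting unknown actions to REVIEW_REQUIRED means a newly added action cannot run unattended until someone classifies it explicitly.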

Safety mechanisms

Simulation mode: test actions without mutating AWS resources
Human approval: required for CAUTION and REVIEW_REQUIRED actions
Rollback plans: reversal steps captured per action
Circuit breakers: stop execution on unexpected results
Audit trail: decision and action history stored locally
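
The sketch below shows one way simulation mode, a circuit breaker, and an audit trail might fit together in a single wrapper. The SafeExecutor class and its parameters are assumptions for illustration. In particular, it treats any raised exception as an "unexpected result", which may be narrower or broader than what the project's circuit breaker checks.

```python
import json
from datetime import datetime, timezone


class SafeExecutor:
    """Hypothetical wrapper: simulation mode, circuit breaker, audit log."""

    def __init__(self, simulation=True, max_failures=3):
        self.simulation = simulation
        self.max_failures = max_failures
        self.failures = 0
        self.audit_log = []

    def run(self, action_name, resource_id, fn):
        if self.failures >= self.max_failures:
            status = "blocked"      # breaker is open: refuse further actions
        elif self.simulation:
            status = "simulated"    # fn is never called, so nothing is mutated
        else:
            try:
                fn()
                status = "executed"
            except Exception:
                self.failures += 1
                status = "failed"
        # Every attempt is recorded, including simulated and blocked ones.
        self.audit_log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "action": action_name,
            "resource": resource_id,
            "status": status,
        })
        return status

    def dump_audit(self):
        """Serialize the audit trail, e.g. for writing to local storage."""
        return json.dumps(self.audit_log, indent=2)
```

Passing the mutation as a callable keeps the safety checks in one place: the AWS call itself only happens on the single code path where simulation is off and the breaker is closed.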

Production Deployment

Option 1: Google Cloud Run

./deploy/deploy.sh your-project-id us-central1
gcloud run services describe cloud-cost-agent --region=us-central1

Option 2: Docker

docker build -t cloud-cost-agent .
docker run -d \
  -e AWS_ACCESS_KEY_ID=... \
  -e AWS_SECRET_ACCESS_KEY=... \
  -v "$(pwd)/memory:/app/memory" \
  cloud-cost-agent

Option 3: Local development

python cloud_cost_agent.py --mode continuous --simulation --log-level DEBUG

Configuration

| Variable | Required | Description | Default |
| --- | --- | --- | --- |
| `AWS_ACCESS_KEY_ID` | ✅ | AWS access key | - |
| `AWS_SECRET_ACCESS_KEY` | ✅ | AWS secret key | - |
| `AWS_DEFAULT_REGION` | ❌ | Primary AWS region | us-east-1 |
| `GOOGLE_CLOUD_PROJECT` | ❌ | GCP project for Vertex AI | - |
| `SIMULATION_MODE` | ❌ | Enable simulation mode | true |
| `LOG_LEVEL` | ❌ | Logging verbosity | INFO |
| `MEMORY_PATH` | ❌ | Agent memory storage | ./memory |
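
For readers wiring these variables into their own tooling, here is a minimal sketch of reading them with the defaults from the table. The Settings class and load_settings function are hypothetical, and the set of strings accepted as "true" for SIMULATION_MODE is an assumption; the project may parse the value differently.

```python
import os
from dataclasses import dataclass
from typing import Optional


def _as_bool(value):
    # Assumed truthy spellings; anything else is treated as false.
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class Settings:
    aws_region: str
    gcp_project: Optional[str]
    simulation_mode: bool
    log_level: str
    memory_path: str


def load_settings(env=None):
    """Build Settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    required = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(f"missing required variables: {', '.join(missing)}")
    return Settings(
        aws_region=env.get("AWS_DEFAULT_REGION", "us-east-1"),
        gcp_project=env.get("GOOGLE_CLOUD_PROJECT"),
        simulation_mode=_as_bool(env.get("SIMULATION_MODE", "true")),
        log_level=env.get("LOG_LEVEL", "INFO"),
        memory_path=env.get("MEMORY_PATH", "./memory"),
    )
```

Failing fast on missing credentials surfaces configuration problems at startup rather than on the first AWS call.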

Required AWS Permissions

Minimum permissions (read-only)

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ec2:Describe*",
        "rds:Describe*",
        "s3:List*",
        "s3:GetLifecycleConfiguration",
        "elasticloadbalancing:Describe*",
        "cloudwatch:GetMetricStatistics"
      ],
      "Resource": "*"
    }
  ]
}

Extended permissions (for actions)

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ec2:CreateSnapshot",
        "ec2:DeleteVolume",
        "ec2:ReleaseAddress",
        "ec2:StopInstances",
        "s3:PutLifecycleConfiguration"
      ],
      "Resource": "*",
      "Condition": {
        "StringEquals": {
          "aws:RequestedRegion": ["us-east-1", "us-west-2"]
        }
      }
    }
  ]
}
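
Before attaching a policy to the agent's role, it can help to list which allowed actions are able to change resources. The helper below is a rough sketch based only on action-name prefixes (Describe, List, Get); it is a naming heuristic chosen for this example, not IAM's own access-level classification, and the function name is hypothetical.

```python
READ_ONLY_VERBS = ("Describe", "List", "Get")


def mutating_actions(policy):
    """Return allowed actions whose name does not start with a read-only verb."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):   # a single statement object is also valid
        statements = [statements]
    flagged = []
    for stmt in statements:
        if stmt.get("Effect") != "Allow":
            continue
        actions = stmt.get("Action", [])
        if isinstance(actions, str):   # "Action" may be a string or a list
            actions = [actions]
        for action in actions:
            verb = action.split(":", 1)[-1]   # strip the service prefix
            if not verb.startswith(READ_ONLY_VERBS):
                flagged.append(action)
    return flagged
```

Run against the read-only policy, the result should be an empty list; run against the extended policy, it should list every action the agent could use to modify resources.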

Troubleshooting

Agent startup fails

pip install -r requirements.txt
aws sts get-caller-identity

No metrics found

- New resources may not have CloudWatch history yet
- The agent improves as metrics accumulate

High memory usage

- Increase container memory or reduce scan scope for large accounts

Action approval timeouts

- Use interactive approval commands or integrate your own approval channel

Debug mode

python cloud_cost_agent.py --mode single --log-level DEBUG --simulation

Health checks

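# In interactive mode: show runtime status and pending approvals
agent> dashboard
# From a shell: confirm boto3 can reach AWS with the configured credentials
python -c "import boto3; print(boto3.client('ec2').describe_regions())"
# In interactive mode: run a full optimization cycle
agent> run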

Contributing

Adding new resource types

Add a detector in agents/analyzer.py
Add pricing constants for cost estimation
Implement execution logic in agents/executor.py
Add safety checks and risk classification
Test thoroughly in simulation mode
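
The first, second, and fourth steps above might look roughly like the following for an unused Elastic IP detector. Everything here is an assumption made for illustration: the registry, the decorator, the finding fields, and the placeholder price are not taken from agents/analyzer.py. The input mirrors the "Addresses" list of a boto3 ec2.describe_addresses() response.

```python
from typing import Callable

# Hypothetical registry; the project's real detector interface may differ.
DETECTORS: dict[str, Callable[[list[dict]], list[dict]]] = {}

# Placeholder value only; look up current AWS pricing for real estimates.
PLACEHOLDER_EIP_MONTHLY_USD = 1.00


def detector(kind):
    """Decorator that registers a detector function under a resource kind."""
    def register(fn):
        DETECTORS[kind] = fn
        return fn
    return register


@detector("unused_elastic_ip")
def detect_unused_eips(addresses):
    """Flag Elastic IPs that have no association.

    `addresses` mirrors the "Addresses" list from ec2.describe_addresses();
    an address that is in use carries an "AssociationId" key.
    """
    findings = []
    for addr in addresses:
        if "AssociationId" not in addr:
            findings.append({
                "resource_id": addr["AllocationId"],
                "kind": "unused_elastic_ip",
                "estimated_monthly_usd": PLACEHOLDER_EIP_MONTHLY_USD,
                "risk": "SAFE",   # matches the SAFE tier for releasing unused Elastic IPs
            })
    return findings
```

Keeping the detector a pure function over response-shaped dictionaries makes the final step, testing in simulation mode, straightforward.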

License

This project is licensed under the MIT License; see the LICENSE file for details.

Support

Issues: GitHub Issues
Discussions: GitHub Discussions
Source: Repository home

Important Disclaimers

Always run in simulation mode before enabling live actions
Validate recommendations against business requirements
Monitor agent decisions and outcomes continuously
Keep backups for critical resources
Confirm actions comply with your organization's governance policies

Google ADK Integration

The project includes adk_integration.CloudCostADKIntegration, which exposes the orchestrator through a Google Agent Development Kit agent surface.

from adk_integration import CloudCostADKIntegration

cloud_adk = CloudCostADKIntegration(project_id="my-gcp-project", simulation_mode=True)
adk_agent = cloud_adk.adk_agent

Available ADK tools include run_optimization_cycle, run_monitoring_cycle, get_dashboard_snapshot, list_pending_approvals, approve_pending_action, and toggle_simulation_mode.

Install the preview google.adk package following Google ADK guidance until the SDK is generally available.
