A comprehensive Python tool for monitoring AWS EC2 instances and identifying various failure scenarios including system status checks, CloudWatch metrics anomalies, disk space issues, memory problems, and more.
- EC2 Instance Status Monitoring: Checks instance state and system/instance status checks
- CloudWatch Metrics Analysis: Monitors CPU utilization, network metrics, and status check failures
- Disk Space Monitoring: Uses AWS Systems Manager (SSM) to check disk usage
- Memory Usage Tracking: Monitors memory consumption via SSM
- Security Group Validation: Checks for common security group misconfigurations
- Comprehensive Reporting: Generates detailed failure reports with recommendations
- Multiple Output Formats: Supports text and JSON output formats

Prerequisites:
- Python 3.7 or higher
- AWS account with appropriate permissions
- AWS CLI configured (or environment variables set)
- EC2 instances with SSM agent installed (for disk/memory checks)

Installation:
- Clone or download this repository
- Install required dependencies:

      pip install -r requirements.txt

- Configure AWS credentials using one of these methods:
  - AWS CLI: aws configure
  - Environment variables: AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY
  - IAM role (if running on EC2)
  - AWS profile: export AWS_PROFILE=your-profile

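As a quick sanity check before running the tool, the sketch below reports which of the environment-based credential methods listed above appear to be set. The function name `detect_credential_sources` is hypothetical and not part of this tool; it only inspects environment variables, so credentials stored by `aws configure` or supplied by an IAM role will not show up here.

```python
import os


def detect_credential_sources(env=None):
    """Return the environment-based credential methods that appear configured.

    Hypothetical helper: it does not validate the credentials, it only checks
    that the relevant variables are present and non-empty.
    """
    env = os.environ if env is None else env
    sources = []
    if env.get("AWS_ACCESS_KEY_ID") and env.get("AWS_SECRET_ACCESS_KEY"):
        sources.append("environment variables")
    if env.get("AWS_PROFILE"):
        sources.append("profile '{}'".format(env["AWS_PROFILE"]))
    return sources


if __name__ == "__main__":
    found = detect_credential_sources()
    if found:
        print("Credential sources found:", ", ".join(found))
    else:
        print("No credential environment variables set; "
              "the AWS CLI config or an IAM role may still apply.")
```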
The tool requires the following AWS permissions:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"ec2:DescribeInstances",
"ec2:DescribeInstanceStatus",
"ec2:DescribeInstanceAttribute",
"cloudwatch:GetMetricStatistics",
"ssm:DescribeInstanceInformation",
"ssm:SendCommand",
"ssm:GetCommandInvocation"
],
"Resource": "*"
}
]
}

Check all instances in the default region (us-east-1):

    python server_monitor.py

Check a specific instance:

    python server_monitor.py --instance-id i-1234567890abcdef0

Check instances in another region:

    python server_monitor.py --region us-west-2

Use a named AWS profile:

    python server_monitor.py --profile my-aws-profile

Skip SSM-based checks:

    python server_monitor.py --no-ssm

Skip CloudWatch checks:

    python server_monitor.py --no-metrics

Write the report to a file:

    python server_monitor.py --output report.txt

Write a JSON report to a file:

    python server_monitor.py --json --output report.json

Combine options:

    python server_monitor.py \
      --region us-west-2 \
      --instance-id i-1234567890abcdef0 \
      --profile production \
      --output failure_report.txt

The tool detects the following failure types:
- Instance not running
- System status check failures
- Status check failures (CloudWatch)
- Instance status check failures
- High CPU utilization (>80%)
- High memory usage (>90%)
- High disk usage (>85%)
- Security group misconfigurations
- SSM command failures
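
The percentage thresholds in the list above can be illustrated with a small sketch. The threshold values come from this README; the function name, the metric keys (`cpu`, `memory`, `disk`), and the message format are assumptions made for illustration and may not match how server_monitor.py implements the checks.

```python
from typing import Dict, List

# Thresholds (percent) as listed in this README.
THRESHOLDS = {"cpu": 80.0, "memory": 90.0, "disk": 85.0}


def find_threshold_breaches(metrics: Dict[str, float]) -> List[str]:
    """Return one message per metric that is strictly above its threshold.

    Hypothetical helper: metrics missing from the input are skipped.
    """
    breaches = []
    for name, limit in THRESHOLDS.items():
        value = metrics.get(name)
        if value is not None and value > limit:
            breaches.append(
                "High {} usage: {:.1f}% (> {:.0f}%)".format(name, value, limit)
            )
    return breaches


if __name__ == "__main__":
    # Disk at exactly 85% is not reported because the comparison is strict.
    print(find_threshold_breaches({"cpu": 92.5, "memory": 40.0, "disk": 85.0}))
```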
The tool generates detailed reports including:
- Timestamp of detection
- Instance ID and name
- Failure type and severity
- Detailed description
- Metrics data
- Actionable recommendations
================================================================================
AWS SERVER FAILURE REPORT
Generated: 2024-01-15 10:30:45 UTC
Total Failures: 2
================================================================================
[CRITICAL] Failures (1):
--------------------------------------------------------------------------------
Instance: web-server-01 (i-1234567890abcdef0)
Type: System Status Check
Time: 2024-01-15T10:30:00.000000
Description: System status check failed: impaired - failed
Metrics: {
"system_status": "impaired",
"details": "failed"
}
Recommendations:
- Check EC2 console for detailed status information
- Review instance logs via Systems Manager
- Consider rebooting the instance if issue persists
- Check for hardware failures
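
To show how the report fields listed above could fit together, here is a minimal sketch of a failure record with text and JSON rendering. The class name, field names, and JSON layout are assumptions for illustration only; the actual schema produced by `--json` may differ. The sample values are taken from the example report above.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List


@dataclass
class Failure:
    """One detected failure; field names are illustrative, not the tool's schema."""
    instance_id: str
    instance_name: str
    failure_type: str
    severity: str
    timestamp: str
    description: str
    metrics: Dict[str, Any] = field(default_factory=dict)
    recommendations: List[str] = field(default_factory=list)


def to_json(failures: List[Failure]) -> str:
    """Serialize failures to a JSON document (hypothetical layout)."""
    payload = {
        "total_failures": len(failures),
        "failures": [asdict(f) for f in failures],
    }
    return json.dumps(payload, indent=2)


def to_text(failure: Failure) -> str:
    """Render one failure in a layout similar to the text report above."""
    lines = [
        "Instance: {} ({})".format(failure.instance_name, failure.instance_id),
        "Type: {}".format(failure.failure_type),
        "Time: {}".format(failure.timestamp),
        "Description: {}".format(failure.description),
        "Recommendations:",
    ]
    lines += ["- {}".format(item) for item in failure.recommendations]
    return "\n".join(lines)


example = Failure(
    instance_id="i-1234567890abcdef0",
    instance_name="web-server-01",
    failure_type="System Status Check",
    severity="CRITICAL",
    timestamp="2024-01-15T10:30:00.000000",
    description="System status check failed: impaired - failed",
    metrics={"system_status": "impaired", "details": "failed"},
    recommendations=["Check EC2 console for detailed status information"],
)

if __name__ == "__main__":
    print(to_text(example))
    print(to_json([example]))
```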
For continuous monitoring, you can set up a cron job or scheduled task:
# Run every 15 minutes
*/15 * * * * cd /path/to/SERVER-SAVER && python server_monitor.py --output reports/$(date +\%Y\%m\%d_\%H\%M\%S).txt

On Windows, create a scheduled task to run:

    python D:\SERVER-SAVER\server_monitor.py --output reports\report_%date%.txt

If instances don't have SSM agent installed or configured:
- Use the --no-ssm flag to skip SSM-based checks
- Install SSM agent on instances: https://docs.aws.amazon.com/systems-manager/latest/userguide/ssm-agent.html
- Ensure IAM role has SSM permissions

If CloudWatch metrics are unavailable:
- Ensure CloudWatch monitoring is enabled (detailed monitoring recommended)
- Check IAM permissions for CloudWatch
- Use the --no-metrics flag to skip CloudWatch checks

If you get credential or permission errors:
- Verify AWS credentials are configured correctly
- Check IAM permissions
- Ensure the AWS region is correct

The tool logs all activities to server_monitor.log and stdout. Check the log file for detailed debugging information.
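
Logging to both a file and stdout, as described above, can be set up with the standard `logging` module. This is a sketch of one common way to do it, not the tool's actual configuration: the logger name, level, and message format are assumptions, and only the file name server_monitor.log comes from this README.

```python
import logging
import sys


def configure_logging(log_path="server_monitor.log"):
    """Return a logger that writes to log_path and to stdout (illustrative setup)."""
    logger = logging.getLogger("server_monitor")
    logger.setLevel(logging.INFO)

    # Remove and close any handlers from an earlier call to avoid duplicate lines.
    for handler in list(logger.handlers):
        logger.removeHandler(handler)
        handler.close()

    formatter = logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    for handler in (logging.FileHandler(log_path), logging.StreamHandler(sys.stdout)):
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    log = configure_logging()
    log.info("Monitoring run started")
```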
Exit codes:
- 0: No failures detected or only non-critical failures
- 1: Critical failures detected

This allows integration with monitoring systems and alerting tools.
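
As one way to hook the exit codes 0 and 1 into an alerting script, the sketch below runs a command and maps its exit code to a boolean. The wrapper function `run_monitor` is hypothetical. Treating any other exit code as an error is an assumption, since this README documents only 0 and 1.

```python
import subprocess
import sys


def run_monitor(command):
    """Run the monitor command; return True if it reported critical failures.

    Hypothetical wrapper: exit code 0 maps to False, 1 maps to True, and any
    other code raises, because only 0 and 1 are documented.
    """
    result = subprocess.run(command)
    if result.returncode == 0:
        return False
    if result.returncode == 1:
        return True
    raise RuntimeError("unexpected exit code {}".format(result.returncode))


if __name__ == "__main__":
    critical = run_monitor(
        [sys.executable, "server_monitor.py", "--region", "us-west-2"]
    )
    if critical:
        print("Critical failures detected; trigger your alerting here.")
```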
See the examples/ directory for:
- Configuration file templates (config.example.json)
- Example scripts you can copy and customize
- Setup guides and best practices
Important: Copy example files to the root directory and update with your values. Your actual config files are protected by .gitignore.
We welcome contributions! See CONTRIBUTING.md for guidelines.
Feel free to extend this tool with additional checks:
- Application-specific health checks
- Database connectivity tests
- Custom CloudWatch alarms
- Integration with notification services (SNS, Slack, etc.)

To set up a development environment:
- Clone the repository
- Copy examples/config.example.json to config.json
- Update config.json with your test instance details
- Install dependencies: pip install -r requirements.txt
- Set up AWS credentials (see AWS_CREDENTIALS_SETUP.md)
- Start contributing!

This tool is provided as-is for server monitoring and failure identification purposes.