[Feature][High] Add disk usage monitoring and auto-alerting for all GCE instances

## Summary

Multiple GCE instances have reached or are approaching critical disk thresholds with no automated alerting in place. The 2026-03-15 incident on `numbers-mainnet-validator-1` (auto-shutdown at 97% disk) and the current state of `numbers-testnet-validator-3` (96% disk as of 2026-03-17) demonstrate an urgent need for proactive disk monitoring.

## Current State (2026-03-17)

| Instance | Disk size | Used | Use% | Status |
|---|---|---|---|---|
| numbers-mainnet-validator-1 | 3.4T | 2.8T | 84% | Warning |
| numbers-mainnet-validator-a1 | 1.9T | 1.1T | 57% | OK |
| numbers-mainnet-validator-a2 | 2.0T | 970G | 49% | OK |
| numbers-testnet-validator-3 | 497G | 476G | 96% | CRITICAL |
| testnet-explorer | 29G | 25G | 84% | Warning |
| mainnet-explorer | 47G | 33G | 72% | OK |

## Proposed Implementation

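1. **GCP Cloud Monitoring alerting policies**: Create disk-utilization metric alerts that fire at 80% used (warning) and 90% used (critical). Filesystem usage is reported by the Ops Agent, so the agent must be installed on each instance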
2. **Notification channels**: Configure email and/or Slack notifications for disk alerts
3. **Runbook documentation**: Add a disk cleanup/expansion runbook to the repository covering:
   - How to expand GCE persistent disks (online resize)
   - AvalancheGo chain data pruning options
   - Blockscout/explorer database cleanup procedures
4. **Monitoring script**: Add a cron-based disk check script that can be deployed to each instance as a fallback
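
The fallback check in item 4 could look roughly like the sketch below. It is a minimal illustration, not an agreed implementation: the 80%/90% thresholds come from item 1, while the function names and the choice to stay silent when a disk is healthy are assumptions. It computes the percentage the same way `df` does for `Use%` (used space over used plus available space).

```python
#!/usr/bin/env python3
"""Fallback disk check, intended to be run from cron on each instance."""
import shutil
import socket
import sys

WARNING_PCT = 80.0   # warning threshold from the proposal
CRITICAL_PCT = 90.0  # critical threshold from the proposal


def classify(used_pct: float) -> str:
    """Map a used-space percentage to a status label."""
    if used_pct >= CRITICAL_PCT:
        return "CRITICAL"
    if used_pct >= WARNING_PCT:
        return "WARNING"
    return "OK"


def check(path: str = "/") -> tuple[str, float]:
    """Return (status, used percentage) for the filesystem containing path."""
    usage = shutil.disk_usage(path)
    usable = usage.used + usage.free  # excludes blocks reserved for root, like df
    used_pct = 100.0 * usage.used / usable if usable else 0.0
    return classify(used_pct), used_pct


def main(paths: list[str]) -> None:
    """Print one line per filesystem that is at or above the warning threshold."""
    host = socket.gethostname()
    for path in paths or ["/"]:
        status, used_pct = check(path)
        if status != "OK":
            # Hypothetical hook point: forward this line to email or Slack here.
            print(f"{status}: {host} {path} at {used_pct:.1f}% used")


if __name__ == "__main__":
    main(sys.argv[1:])
```

Because the script prints nothing for healthy disks, a cron job with `MAILTO` set (and working mail delivery on the host) would only send mail when a threshold is crossed. Mount points to check can be passed as arguments; with none given it checks `/`.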

## Immediate Actions Needed

- `numbers-testnet-validator-3` at **96%** needs immediate disk expansion or cleanup
- `testnet-explorer` and `numbers-mainnet-validator-1` at **84%** should be monitored closely

## Impact

High. Without disk monitoring, validators shut down automatically and without warning once free disk space falls below 3%, causing chain downtime and a transaction mempool backlog (as seen in the 2026-03-15 incident).
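
To make the urgency concrete, the sketch below estimates how much data each validator can still write before reaching the shutdown point. It uses the sizes and `Use%` values from the Current State table and reads "3% free" as 97% used. The function name is illustrative, and the results are approximations only, since `df` rounds its output and computes `Use%` against usable rather than raw capacity.

```python
SHUTDOWN_USED_PCT = 97.0  # assumption: "less than 3% free" read as 97% used


def headroom_gb(size_gb: float, used_pct: float) -> float:
    """Approximate GB that can still be written before the shutdown threshold."""
    return max(0.0, size_gb * (SHUTDOWN_USED_PCT - used_pct) / 100.0)


# (instance, disk size in GB, Use%) copied from the Current State table.
# Sizes listed in T are converted at 1T = 1024G, as df -h reports them.
VALIDATORS = [
    ("numbers-mainnet-validator-1", 3.4 * 1024, 84),
    ("numbers-mainnet-validator-a1", 1.9 * 1024, 57),
    ("numbers-mainnet-validator-a2", 2.0 * 1024, 49),
    ("numbers-testnet-validator-3", 497, 96),
]

# List the validators closest to the shutdown threshold first.
for name, size_gb, used_pct in sorted(
    VALIDATORS, key=lambda row: headroom_gb(row[1], row[2])
):
    print(f"{name}: ~{headroom_gb(size_gb, used_pct):.1f} GB of headroom")
```

On these figures `numbers-testnet-validator-3` has roughly 5 GB left before it would shut down.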

Generated by Health Monitor with [Omni](https://omniai.one/)
