Goal: Complete storage integration and deploy full observability stack
Milestone: User has production-ready monitoring, logging, and persistent storage
Phase 2 must be complete:
- ✓ K3s cluster running with VIP
- ✓ Helm client working
- ✓ Contour and cert-manager deployed
- ✓ OpenBAO and Zot installed
Architecture Decision - S3 Storage: SeaweedFS is deployed as the S3 endpoint for all components (Loki, Velero, etc.). Its backing storage comes from Longhorn PVCs, providing distributed and replicated storage.
This provides a consistent S3 API with high performance, simplifying component configuration.
Working States:
- Storage component deployed (supports Longhorn, NFS, local-path backends)
- StorageClass configured and set as default
- SeaweedFS component for S3-compatible storage
- S3-compatible storage architecture defined (SeaweedFS on Longhorn PVCs)
- Integration tests for PVC provisioning
Key Tasks:
Deploy storage component with Longhorn backend✓ Implemented (unified storage component)Configure StorageClass✓ ImplementedTest PVC provisioning and mounting✓ Integration tests existDeploy SeaweedFS via Helm✓ ImplementedConfigure SeaweedFS with Longhorn-backed PVCs✓ SeaweedFS uses Longhorn StorageClassCreate S3 buckets for various components✓ Helm chart handles bucket creationDocument storage architecture decisions✓ Documented in docs/storage.md
Files:
v1/internal/component/storage/types.go✓ (includes LonghornConfig)v1/internal/component/storage/install.go✓ (implements Longhorn, NFS, local-path)v1/internal/component/storage/*_test.go✓v1/internal/component/seaweedfs/types.go✓v1/internal/component/seaweedfs/install.go✓v1/internal/component/seaweedfs/*_test.go✓v1/cmd/foundry/commands/storage/provision.go✓
Working States:
- Prometheus deployed via kube-prometheus-stack Helm chart
- ServiceMonitors configured for auto-discovery
- Scraping K3s, OpenBAO, Zot metrics
- PVC for Prometheus TSDB
- Retention configured per stack.yaml
Key Tasks:
Deploy kube-prometheus-stack Helm chart✓ ImplementedConfigure Prometheus retention✓ Implemented (configurable retention_days and retention_size)Set up service discovery for✓ Configured with nil selectors for all namespacesfoundry-*namespacesCreate ServiceMonitors for core components (OpenBAO, Zot, etc.)✓ External targets auto-configured via additionalScrapeConfigsConfigure PVC for storage✓ Implemented with configurable storage class and size- Set up alerting rules (basic)
- Test metric collection
Files:
v1/internal/component/prometheus/types.go✓v1/internal/component/prometheus/install.go✓v1/internal/component/prometheus/types_test.go✓ (94% coverage)v1/internal/component/prometheus/install_test.go✓
Working States:
- Loki deployed via loki-stack Helm chart
- Collecting logs from all pods (via Promtail)
- S3-compatible storage for log retention (via SeaweedFS)
- Retention configured per stack.yaml
Key Tasks:
Deploy loki-stack Helm chart✓ Implemented (using grafana/loki chart)Configure S3 backend (SeaweedFS)✓ Implemented with full S3 configSet up retention policies✓ Implemented (configurable retention_days)Configure promtail for log collection✓ Implemented (optional, enabled by default)- Test log ingestion and querying
Files:
v1/internal/component/loki/types.go✓v1/internal/component/loki/install.go✓v1/internal/component/loki/types_test.go✓ (92.7% coverage)v1/internal/component/loki/install_test.go✓
Working States:
- Grafana deployed via Helm
- Prometheus and Loki configured as data sources
- Admin password stored in OpenBAO
- OIDC auth via OpenBAO (optional for now, required in Phase 4)
- Default dashboards provisioned
- Accessible via Ingress
Key Tasks:
Deploy Grafana Helm chart✓ Implemented (grafana/grafana chart v8.8.2)Configure data sources (Prometheus, Loki)✓ Implemented with auto-configurationStore/retrieve admin password from OpenBAO✓foundry-core/grafanawithadmin_passwordandadmin_userkeysCreate Ingress with TLS✓ Implemented via ContourProvision default dashboards (K3s, Prometheus, Loki)✓ Kubernetes cluster + node-exporter dashboards- Test dashboard access
Files:
v1/internal/component/grafana/types.go✓v1/internal/component/grafana/install.go✓ (includes datasources configuration)v1/internal/component/grafana/openbao.go✓ (OpenBAO credential storage/retrieval)v1/internal/component/grafana/types_test.go✓ (92.5% coverage)v1/internal/component/grafana/install_test.go✓v1/internal/component/grafana/openbao_test.go✓ (100% coverage on most functions)
Working States:
- External-DNS deployed via Helm
- Automatically creates DNS records for Ingress resources
- Configured for appropriate DNS provider (or left unconfigured for user setup)
Key Tasks:
Deploy External-DNS Helm chart✓ Implemented (external-dns/external-dns chart v1.15.0)Configure DNS provider✓ Supports PowerDNS, Cloudflare, RFC2136, Route53, Google, AzureStore DNS provider credentials in OpenBAO✓ Per-provider paths underfoundry-core/external-dns/<provider>- Test automatic DNS record creation
Document supported DNS providers✓ Documented in docs/dns.md
Supported Providers:
- PowerDNS (pdns) - with API URL and key → stored at
foundry-core/external-dns/pdns - Cloudflare - with API token, optional proxied mode → stored at
foundry-core/external-dns/cloudflare - RFC2136 - for dynamic DNS updates with TSIG → stored at
foundry-core/external-dns/rfc2136 - AWS Route53, Google Cloud DNS, Azure DNS - pass-through for custom config (use IAM roles/workload identity)
Files:
v1/internal/component/externaldns/types.go✓v1/internal/component/externaldns/install.go✓v1/internal/component/externaldns/openbao.go✓ (OpenBAO credential storage/retrieval for all providers)v1/internal/component/externaldns/types_test.go✓ (93.3% coverage)v1/internal/component/externaldns/install_test.go✓v1/internal/component/externaldns/openbao_test.go✓ (>80% coverage on all functions)
Working States:
- Velero deployed via Helm
- S3 backend configured (SeaweedFS)
- Can create cluster backups (CLI:
foundry backup create) - Can restore from backups (CLI:
foundry backup restore) - Scheduled backup configured (CLI:
foundry backup schedule)
Key Tasks:
Deploy Velero Helm chart✓ Implemented (vmware-tanzu/velero chart v8.0.0)Configure S3-compatible storage backend✓ Supports SeaweedFS and AWS S3Set up backup schedules✓ Configurable cron schedules with TTL- Test backup and restore (requires integration tests)
Create backup CLI commands✓foundry backup create/list/restore/schedule/delete
Supported Features:
- SeaweedFS and AWS S3 providers
- Configurable backup retention (TTL)
- Scheduled backups with cron expressions
- Namespace inclusion/exclusion filters
- CSI volume snapshots (optional)
- ServiceMonitor for Prometheus metrics
- Resource requests/limits configuration
Files:
v1/internal/component/velero/types.go✓v1/internal/component/velero/install.go✓v1/internal/component/velero/types_test.go✓ (94.8% coverage)v1/internal/component/velero/install_test.go✓
Commands Implemented:
# Storage
foundry storage provision <name> --size 10Gi ✓
foundry storage pvc list [--namespace NS] ✓
foundry storage pvc delete <name> ✓
# Backup
foundry backup create [--name NAME] ✓
foundry backup list ✓
foundry backup restore <name> ✓
foundry backup schedule --cron "0 2 * * *" ✓
foundry backup delete <name> ✓
# Observability
foundry dashboard open ✓ # Opens Grafana
foundry dashboard url ✓ # Print Grafana URL
foundry logs <pod> ✓ # Quick log access
foundry metrics <query> ✓ # Query Prometheus
foundry metrics list ✓ # List available metrics
foundry metrics targets ✓ # List scrape targetsFiles:
v1/cmd/foundry/commands/storage/provision.go✓v1/cmd/foundry/commands/storage/pvc.go✓v1/cmd/foundry/commands/storage/provision_test.go✓v1/cmd/foundry/commands/backup/commands.go✓v1/cmd/foundry/commands/backup/client.go✓v1/cmd/foundry/commands/backup/create.go✓v1/cmd/foundry/commands/backup/list.go✓v1/cmd/foundry/commands/backup/restore.go✓v1/cmd/foundry/commands/backup/schedule.go✓v1/cmd/foundry/commands/backup/delete.go✓v1/cmd/foundry/commands/backup/client_test.go✓v1/cmd/foundry/commands/dashboard/commands.go✓v1/cmd/foundry/commands/dashboard/commands_test.go✓v1/cmd/foundry/commands/logs/commands.go✓v1/cmd/foundry/commands/logs/commands_test.go✓v1/cmd/foundry/commands/metrics/commands.go✓v1/cmd/foundry/commands/metrics/commands_test.go✓
Test Scenarios:
- Deploy PVC and mount to pod
- Backup cluster and restore
- Metrics are scraped and visible in Grafana
- Logs are collected and queryable in Loki
- DNS records are created automatically
Files:
v1/test/integration/phase3_storage_test.go✓ (PVC provisioning, SeaweedFS deployment)v1/test/integration/phase3_observability_test.go✓ (Prometheus, Loki, Grafana, full stack)v1/test/integration/phase3_backup_test.go✓ (Velero deployment, backup/restore)v1/test/integration/phase3_helpers_test.go✓ (Shared test utilities)
Documents to Create/Update:
-
docs/storage.md- Full CSI driver setup, PVC usage ✓ -
docs/observability.md- Prometheus, Loki, Grafana usage ✓ -
docs/backups.md- Velero usage, restore procedures ✓ -
docs/dns.md- External-DNS configuration ✓
- All components deploy successfully
- Storage provisioning works
- Metrics are collected and visible
- Logs are collected and queryable
- Grafana dashboards show system health
- Backups can be created and restored
- DNS records are managed automatically
- All tests pass
- Documentation complete
# Install storage with Longhorn backend for distributed block storage
foundry component install storage --backend longhorn
# Install SeaweedFS for S3-compatible storage
foundry component install seaweedfs
# Install observability stack (depends on storage)
foundry component install prometheus
foundry component install loki
foundry component install grafana
foundry component install velero
# Test storage
foundry storage provision test-pvc --size 1Gi
kubectl get pvc
# Test backup
foundry backup create test-backup
foundry backup list
foundry backup restore test-backup
# Access Grafana
foundry dashboard openEstimated Working States: ~20 testable states Estimated LOC: ~3000-4000 lines Timeline: Not time-bound