This document describes the bucket drain functionality and explains the architectural decision not to implement tenant-level drain orchestration.
Bucket drain allows users to automatically delete all objects from an S3 bucket before deleting the bucket resource itself. This is essential because:
- StorageGrid buckets cannot be deleted while they contain objects
- Manual object deletion is impractical for buckets with many objects
- Users need a declarative way to express "delete this bucket and everything in it"
The bucket drain follows a clear state machine with the following phases:
```mermaid
stateDiagram-v2
    [*] --> Pending: Bucket Created
    Pending --> Ready: Bucket exists in StorageGrid
    Ready --> Draining: Annotation added\n(objects > 0)
    Draining --> Ready: Drain complete\n(objects == 0)
    Draining --> Ready: Annotation removed\n(drain canceled)
    Ready --> Deleting: kubectl delete\n(objects == 0)
    Draining --> Failed: Drain error
    Failed --> Ready: Error resolved

    note right of Ready
        Normal operation
        Bucket available for use
    end note

    note right of Draining
        StorageGrid deleting objects
        Operator polling progress
        Status tracking active
    end note

    note right of Deleting
        Finalizer cleanup
        Removing from StorageGrid
    end note
```
Phase Descriptions:
- Pending: Initial state after bucket creation, waiting for backend confirmation
- Ready: Normal operation, bucket available for object storage
- Draining: StorageGrid actively deleting objects, operator polling progress
- Failed: Error condition, requires intervention
- Deleting: Finalizer processing, removing bucket from StorageGrid
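For illustration, a minimal sketch of how these phases might be represented as typed constants in the API package; only `BucketPhaseDraining` is confirmed by the reconcile code later in this document, the other constant names are assumptions:

```go
// Sketch only: bucket lifecycle phases as typed constants.
// BucketPhaseDraining appears in the reconcile code; the other names are assumed.
type BucketPhase string

const (
    BucketPhasePending  BucketPhase = "Pending"  // waiting for backend confirmation
    BucketPhaseReady    BucketPhase = "Ready"    // normal operation
    BucketPhaseDraining BucketPhase = "Draining" // StorageGrid deleting objects
    BucketPhaseFailed   BucketPhase = "Failed"   // error condition, needs intervention
    BucketPhaseDeleting BucketPhase = "Deleting" // finalizer cleanup in progress
)
```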
1. Drain Detection (reconcileDrain)
The controller detects when to start, poll, cancel, or complete a drain:
```go
// State machine transitions
case wantsDrain && !isDraining && objectCount > 0:
    return r.initiateDrain(ctx, rctx)
case isDraining && wantsDrain && objectCount > 0:
    return r.pollDrainProgress(ctx, rctx, backendStatus)
case isDraining && !wantsDrain:
    return r.cancelDrain(ctx, rctx)
case isDraining && objectCount == 0:
    return r.completeDrain(ctx, rctx)
```
2. Drain Initiation (initiateDrain)
- Calls StorageGrid API to start drain operation
- Sets bucket phase to `Draining`
- Records initial object count and timestamp
- Computes next poll interval based on configuration
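As a rough illustration, a minimal sketch of what `initiateDrain` might do given the steps above; the backend client call (`StartBucketDrain`), the `reconcileContext` fields, and the status-update wiring are assumptions, not the actual implementation:

```go
// Sketch only: start the backend drain and record initial drain bookkeeping.
// rctx.Backend.StartBucketDrain is an assumed client method name.
func (r *S3BucketReconciler) initiateDrain(ctx context.Context, rctx *reconcileContext) (ctrl.Result, error) {
    // Ask StorageGrid to begin deleting all objects in the bucket.
    if err := rctx.Backend.StartBucketDrain(ctx, rctx.Bucket.Name); err != nil {
        return ctrl.Result{}, err
    }

    now := metav1.Now()
    rctx.Bucket.Status.Phase = s3v1alpha1.BucketPhaseDraining
    rctx.Bucket.Status.DrainStatus = &s3v1alpha1.BucketDrainStatus{
        StartedAt:          &now,
        IsDeletingObjects:  true,
        InitialObjectCount: rctx.Bucket.Status.ObjectCount,
    }

    // First poll uses the configured initial interval (3 minutes by default).
    interval := computeNextPollInterval(ctx, rctx, 0)
    rctx.Bucket.Status.DrainStatus.NextPollInterval = metav1.Duration{Duration: interval}

    return ctrl.Result{RequeueAfter: interval}, r.Status().Update(ctx, rctx.Bucket)
}
```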
3. Progress Polling (pollDrainProgress)
- Recomputes the poll interval on every reconciliation (picks up configuration changes)
- Two-tier polling strategy:
  - Initial: 3 minutes (first hour, when progress is faster)
  - Long-running: 15 minutes (after the first hour, for large buckets)
- Tracks object count changes to detect progress
- Emits a warning if stuck (no progress for 3 hours by default)
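A minimal sketch of the two-tier interval computation, assuming the resolved drain configuration is available on the reconcile context; `computeNextPollInterval` is referenced by the polling code below, but `rctx.DrainConfig` and its field names are assumptions:

```go
// Sketch only: choose the poll interval based on how long the drain has run.
// The tiers and defaults (3m initial, 15m long-running) come from the text above;
// rctx.DrainConfig and its field names are assumptions.
func computeNextPollInterval(ctx context.Context, rctx *reconcileContext, elapsed time.Duration) time.Duration {
    cfg := rctx.DrainConfig // already resolved: bucket spec > grid spec > defaults
    if elapsed < time.Hour {
        return cfg.InitialPollInterval // fast polling while progress is quick
    }
    return cfg.LongRunningPollInterval // slower polling for large buckets
}
```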
4. Completion (completeDrain)
- Verifies object count is zero
- Cleans up drain status
- Removes annotation
- Returns phase to `Ready`
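Under the same assumptions, a sketch of the completion path; `RemoveDrainAnnotation` is a hypothetical helper, the update calls are illustrative, and phase constant names other than `BucketPhaseDraining` are assumed:

```go
// Sketch only: finish a drain once the bucket reports zero objects.
func (r *S3BucketReconciler) completeDrain(ctx context.Context, rctx *reconcileContext) (ctrl.Result, error) {
    if rctx.Bucket.Status.ObjectCount != 0 {
        return ctrl.Result{}, fmt.Errorf("drain not complete: %d objects remain", rctx.Bucket.Status.ObjectCount)
    }

    // Remove the force-drain annotation so the drain is not re-triggered.
    rctx.Bucket.RemoveDrainAnnotation() // hypothetical helper
    if err := r.Update(ctx, rctx.Bucket); err != nil {
        return ctrl.Result{}, err
    }

    // Clear drain bookkeeping and return the bucket to normal operation.
    rctx.Bucket.Status.DrainStatus = nil
    rctx.Bucket.Status.Phase = s3v1alpha1.BucketPhaseReady
    return ctrl.Result{}, r.Status().Update(ctx, rctx.Bucket)
}
```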
5. Cancellation (cancelDrain)
- User removes annotation while draining
- Calls StorageGrid API to stop drain
- Cleans up drain status
- Returns phase to `Ready`
6. Orphaned Drain Detection
If the backend reports that it is deleting objects but the bucket has no drain status (e.g., a drain was initiated outside Kubernetes), the operator cancels it:
```go
if backendStatus.IsDeletingObjects &&
    (DrainStatus == nil || DrainStatus.StartedAt == nil) {
    return r.cancelOrphanedDrain(ctx, rctx, backendStatus)
}
```
Drain behavior is configurable at three levels with clear precedence:
Precedence: Bucket Spec > StorageGrid Spec > Hardcoded Defaults
```mermaid
flowchart TD
    A[Drain Configuration Request] --> B{Bucket Spec\nOverride?}
    B -->|Yes| C[Use Bucket drainPollInterval]
    B -->|No| D{StorageGrid Spec?}
    D -->|Yes| E[Use StorageGrid operations.drain.*]
    D -->|No| F[Use Hardcoded Defaults]
    C --> G[Apply Configuration]
    E --> G
    F --> G
```
Configuration Levels:
- Bucket-Level Override (Highest Priority)
  - `spec.drainPollInterval`: Single interval for all polling
  - `spec.drainStuckThreshold`: Custom stuck detection threshold
- Grid-Level Configuration (Medium Priority)
  - `spec.operations.drain.initialPollInterval`: First-hour polling interval
  - `spec.operations.drain.longRunningPollInterval`: Interval after the first hour
  - `spec.operations.drain.stuckThreshold`: Warning if no progress
- Hardcoded Defaults (Lowest Priority)
  - Initial: 3 minutes
  - Long-running: 15 minutes
  - Stuck threshold: 3 hours

```go
DefaultDrainLongRunningPollInterval = 15 * time.Minute
DefaultDrainStuckThreshold          = 3 * time.Hour
```
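For illustration, a minimal sketch of how this precedence could resolve the long-running interval; the spec field types and pointer semantics are assumptions:

```go
// Sketch only: resolve the effective long-running poll interval with the
// precedence Bucket spec > StorageGrid spec > hardcoded default.
// Field names follow the configuration levels above; optional fields are
// assumed to be pointers so "unset" can be detected.
func resolveLongRunningPollInterval(bucket *s3v1alpha1.S3Bucket, grid *s3v1alpha1.StorageGrid) time.Duration {
    if bucket.Spec.DrainPollInterval != nil {
        return bucket.Spec.DrainPollInterval.Duration // bucket-level override
    }
    if grid != nil && grid.Spec.Operations.Drain.LongRunningPollInterval != nil {
        return grid.Spec.Operations.Drain.LongRunningPollInterval.Duration // grid-level
    }
    return DefaultDrainLongRunningPollInterval // hardcoded default (15 minutes)
}
```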
The drain status provides full observability into the drain operation:
```go
type BucketDrainStatus struct {
    StartedAt           *metav1.Time    // When drain began
    IsDeletingObjects   bool            // Backend drain active
    InitialObjectCount  int64           // Objects at start
    InitialObjectBytes  int64           // Bytes at start
    LastCheckedAt       *metav1.Time    // Last poll time
    LastProgressAt      *metav1.Time    // Last time count decreased
    PreviousObjectCount int64           // For progress detection
    Message             string          // Human-readable status
    NextPollInterval    metav1.Duration // When to poll next
}
```
The controller emits events for all drain state transitions, providing real-time observability:
| Event Reason | Type | Description |
|---|---|---|
| `BucketDrainingStarted` | Normal | Drain initiated, shows initial object count |
| `BucketDrainingProgress` | Normal | Progress update, objects deleted |
| `BucketDrainingStuck` | Warning | No progress for threshold period |
| `BucketDrainingComplete` | Normal | Drain finished successfully |
| `BucketDrainingCanceled` | Normal | User canceled drain |
| `BucketOrphanedDrain` | Warning | Detected and canceled drain not initiated by operator |
| `BucketAlreadyEmpty` | Normal | Drain annotation added but no objects to delete |
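As a hedged example of how one of these events might be emitted with controller-runtime's standard event recorder (the `r.Recorder` field and message wording are assumptions; the event reason comes from the table above):

```go
// Sketch only: emit the BucketDrainingStarted event via a record.EventRecorder,
// assumed to be available on the reconciler as r.Recorder.
r.Recorder.Eventf(rctx.Bucket, corev1.EventTypeNormal, "BucketDrainingStarted",
    "Drain initiated: %d objects to delete", rctx.Bucket.Status.ObjectCount)
```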
The initial idea was to implement tenant-level drain that would:
- Fan out drain annotations to all buckets in the tenant
- Wait for all buckets to drain
- Delete buckets when empty
- Handle both K8s bucket resources and external buckets
- Delete tenant when all buckets are gone
Complexity explosion:
- Hybrid K8s/backend reconciliation (buckets may be created/deleted externally)
- State tracking for N buckets in tenant status
- Edge cases: stuck buckets, new buckets created during drain, external buckets
Users can already do this simply:

```bash
# Drain all buckets for a tenant
kubectl get s3buckets -l tenant=my-tenant -o name | \
  xargs -I {} kubectl annotate {} bucket.s3.bedag.ch/force-drain-bucket=true

# Wait for completion
kubectl wait --for=delete s3buckets -l tenant=my-tenant --timeout=24h

# Delete tenant
kubectl delete s3tenant my-tenant
```

This is more transparent and gives users full control.
Separation of concerns:
- Tenant controller should manage tenants
- Bucket controller should manage buckets
- Tenant orchestrating bucket operations creates tight coupling
- Blurs architectural boundaries
External buckets:
- If we ignore external buckets: feature is incomplete/broken
- If we handle external buckets: we bypass our own abstractions
- Mixed environments create confusion
- Better: users clean up external buckets manually (they created them outside K8s anyway)
Survey of comparable operators:
- AWS Controllers (ACK): No cascading drain operations
- Crossplane: Each resource independent, no orchestration
- CloudNativePG: Only drains single cluster resource (not "all databases")
- Redis Operator: No cascading operations
Pattern: Operators focus on infrastructure lifecycle, not data lifecycle orchestration.
Deleting production tenants is:
- Rare: Not a frequent operation
- High-stakes: Requires approval workflows
- Needs audit: Manual checkpoints and verification
- Requires rollback: May need to abort mid-process
A magic annotation doesn't fit these requirements. Production needs:
- Scripts with approval gates
- Manual verification checkpoints
- Backup verification before deletion
- Clear audit trail
"You Aren't Gonna Need It"
Every feature has a cost:
- Code complexity and maintenance
- Test coverage requirements
- Documentation burden
- Bug surface area
Only add features when:
- ✅ Users are actively requesting it
- ✅ Manual approach is truly painful
- ✅ Operator is mature enough for complexity
Tenant drain fails all three criteria.
The operator provides clear error messages and guidance when tenant deletion fails due to existing buckets, directing users to drain buckets first using the bucket drain annotation.
We will not implement tenant drain orchestration.
Focus areas:
- Rock-solid bucket drain implementation
- Comprehensive documentation in project README
- Clear error messages and webhook guidance
- Operator stability and maturity
If users require tenant drain automation after production usage, we'll have better context for requirements and can build it appropriately.
The operator stores NextPollInterval in status but recomputes it on every reconciliation. This hybrid approach:
- ✅ Keeps Reconcile() simple (one-line requeue)
- ✅ Picks up configuration changes during active drains
- ✅ Implements two-tier polling (fast initially, slower for large buckets)
```go
// Reconcile just reads the precomputed interval
if bucket.Status.Phase == s3v1alpha1.BucketPhaseDraining {
    return ctrl.Result{
        RequeueAfter: bucket.Status.DrainStatus.NextPollInterval.Duration,
    }, err
}

// Every drain reconciliation recomputes it
func pollDrainProgress() {
    elapsed := time.Since(drainStatus.StartedAt.Time)
    nextPollInterval := computeNextPollInterval(ctx, rctx, elapsed)
    drainStatus.NextPollInterval = metav1.Duration{Duration: nextPollInterval}
}
```
The drain reconciliation uses a clean state machine pattern with explicit transitions:
```go
func (r *S3BucketReconciler) reconcileDrain(ctx context.Context, rctx *reconcileContext) (ctrl.Result, error) {
    wantsDrain := rctx.Bucket.HasDrainAnnotation()
    isDraining := rctx.Bucket.Status.Phase == s3v1alpha1.BucketPhaseDraining
    objectCount := rctx.Bucket.Status.ObjectCount

    switch {
    case wantsDrain && !isDraining && objectCount > 0:
        return r.initiateDrain(ctx, rctx)
    case isDraining && wantsDrain && objectCount > 0:
        return r.pollDrainProgress(ctx, rctx, backendStatus)
    case isDraining && !wantsDrain:
        return r.cancelDrain(ctx, rctx)
    case isDraining && objectCount == 0:
        return r.completeDrain(ctx, rctx)
    }

    // No drain-related transition applies.
    return ctrl.Result{}, nil
}
```
Each state has a single-responsibility function handling that specific transition.