Drain Operations Architecture

Overview

This document describes the bucket drain functionality and explains the architectural decision not to implement tenant-level drain orchestration.

Bucket Drain

Purpose

Bucket drain allows users to automatically delete all objects from an S3 bucket before deleting the bucket resource itself. This is essential because:

  1. StorageGrid buckets cannot be deleted while they contain objects
  2. Manual object deletion is impractical for buckets with many objects
  3. Users need a declarative way to express "delete this bucket and everything in it"

State Machine

The bucket drain follows a clear state machine with the following phases:

```mermaid
stateDiagram-v2
    [*] --> Pending: Bucket Created
    Pending --> Ready: Bucket exists in StorageGrid
    Ready --> Draining: Annotation added\n(objects > 0)
    Draining --> Ready: Drain complete\n(objects == 0)
    Draining --> Ready: Annotation removed\n(drain canceled)
    Ready --> Deleting: kubectl delete\n(objects == 0)
    Draining --> Failed: Drain error
    Failed --> Ready: Error resolved

    note right of Ready
        Normal operation
        Bucket available for use
    end note

    note right of Draining
        StorageGrid deleting objects
        Operator polling progress
        Status tracking active
    end note

    note right of Deleting
        Finalizer cleanup
        Removing from StorageGrid
    end note
```

Phase Descriptions:

  • Pending: Initial state after bucket creation, waiting for backend confirmation
  • Ready: Normal operation, bucket available for object storage
  • Draining: StorageGrid actively deleting objects, operator polling progress
  • Failed: Error condition, requires intervention
  • Deleting: Finalizer processing, removing bucket from StorageGrid
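
A minimal sketch of these phases as Go constants; only `BucketPhaseDraining` appears verbatim in the reconciler excerpts below, so the remaining names are illustrative:

```go
// BucketPhase enumerates the drain state machine phases shown above.
type BucketPhase string

const (
    BucketPhasePending  BucketPhase = "Pending"
    BucketPhaseReady    BucketPhase = "Ready"
    BucketPhaseDraining BucketPhase = "Draining"
    BucketPhaseFailed   BucketPhase = "Failed"
    BucketPhaseDeleting BucketPhase = "Deleting"
)
```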

Implementation Details

1. Drain Detection (reconcileDrain)

The controller detects when to start, poll, cancel, or complete a drain:

```go
// State machine transitions (excerpt; full reconcileDrain listing below)
switch {
case wantsDrain && !isDraining && objectCount > 0:
    return r.initiateDrain(ctx, rctx)
case isDraining && wantsDrain && objectCount > 0:
    return r.pollDrainProgress(ctx, rctx, backendStatus)
case isDraining && !wantsDrain:
    return r.cancelDrain(ctx, rctx)
case isDraining && objectCount == 0:
    return r.completeDrain(ctx, rctx)
}
```
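
The `wantsDrain` input comes from an annotation check. A minimal sketch, assuming the annotation key shown in the tenant-drain example later in this document:

```go
// DrainAnnotation is the user-facing drain trigger (key taken from the
// kubectl example in the tenant-drain section below).
const DrainAnnotation = "bucket.s3.bedag.ch/force-drain-bucket"

// HasDrainAnnotation reports whether the user has requested a drain.
func (b *S3Bucket) HasDrainAnnotation() bool {
    return b.Annotations[DrainAnnotation] == "true"
}
```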

2. Drain Initiation (initiateDrain)

  • Calls StorageGrid API to start drain operation
  • Sets bucket phase to Draining
  • Records initial object count and timestamp
  • Computes next poll interval based on configuration
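
Initiation as a sketch; the StorageGrid client call and status field wiring are assumptions, not the actual API:

```go
// initiateDrain sketch: start the backend drain and record the baseline.
func (r *S3BucketReconciler) initiateDrain(ctx context.Context, rctx *reconcileContext) (ctrl.Result, error) {
    // Hypothetical client call; the real StorageGrid API method may differ.
    if err := rctx.GridClient.StartBucketDrain(ctx, rctx.Bucket.Name); err != nil {
        return ctrl.Result{}, err
    }

    now := metav1.Now()
    rctx.Bucket.Status.Phase = s3v1alpha1.BucketPhaseDraining
    rctx.Bucket.Status.DrainStatus = &s3v1alpha1.BucketDrainStatus{
        StartedAt:          &now,
        InitialObjectCount: rctx.Bucket.Status.ObjectCount,
    }

    // Elapsed time is zero at initiation, so this yields the fast initial interval.
    interval := computeNextPollInterval(ctx, rctx, 0)
    rctx.Bucket.Status.DrainStatus.NextPollInterval = metav1.Duration{Duration: interval}

    return ctrl.Result{RequeueAfter: interval}, r.Status().Update(ctx, rctx.Bucket)
}
```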

3. Progress Polling (pollDrainProgress)

  • Recomputes poll interval every reconciliation (picks up config changes)
  • Two-tier polling strategy:
    • Initial: 3 minutes (first hour when progress is faster)
    • Long-running: 15 minutes (after the first hour, for large buckets)
  • Tracks object count changes to detect progress
  • Emits warning if stuck (no progress for 3 hours by default)
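
The two-tier choice is compact to express. A sketch of `computeNextPollInterval`, assuming an illustrative helper that returns the merged bucket/grid/default configuration:

```go
// computeNextPollInterval sketch: poll fast for the first hour, then slow down.
func computeNextPollInterval(ctx context.Context, rctx *reconcileContext, elapsed time.Duration) time.Duration {
    cfg := resolveDrainConfig(rctx) // hypothetical helper merging the three config levels
    if elapsed < time.Hour {
        return cfg.InitialPollInterval // default: 3 minutes
    }
    return cfg.LongRunningPollInterval // default: 15 minutes
}
```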

4. Completion (completeDrain)

  • Verifies object count is zero
  • Cleans up drain status
  • Removes annotation
  • Returns phase to Ready

5. Cancellation (cancelDrain)

  • User removes annotation while draining
  • Calls StorageGrid API to stop drain
  • Cleans up drain status
  • Returns phase to Ready
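
Cancellation as a sketch; the stop call is an assumption about the backend client:

```go
// cancelDrain sketch: stop the backend drain and return to normal operation.
func (r *S3BucketReconciler) cancelDrain(ctx context.Context, rctx *reconcileContext) (ctrl.Result, error) {
    // Hypothetical client call; the real StorageGrid API method may differ.
    if err := rctx.GridClient.StopBucketDrain(ctx, rctx.Bucket.Name); err != nil {
        return ctrl.Result{}, err
    }

    rctx.Bucket.Status.DrainStatus = nil
    rctx.Bucket.Status.Phase = s3v1alpha1.BucketPhaseReady
    return ctrl.Result{}, r.Status().Update(ctx, rctx.Bucket)
}
```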

6. Orphaned Drain Detection

If the operator detects that the backend is draining but there is no recorded drain status (e.g., a drain initiated outside Kubernetes), it cancels the orphaned drain:

```go
if backendStatus.IsDeletingObjects &&
    (drainStatus == nil || drainStatus.StartedAt == nil) {
    return r.cancelOrphanedDrain(ctx, rctx, backendStatus)
}
```

Configuration

Drain behavior is configurable at three levels with clear precedence:

Precedence: Bucket Spec > StorageGrid Spec > Hardcoded Defaults

```mermaid
flowchart TD
    A[Drain Configuration Request] --> B{Bucket Spec\nOverride?}
    B -->|Yes| C[Use Bucket drainPollInterval]
    B -->|No| D{StorageGrid Spec?}
    D -->|Yes| E[Use StorageGrid operations.drain.*]
    D -->|No| F[Use Hardcoded Defaults]

    C --> G[Apply Configuration]
    E --> G
    F --> G
```

Configuration Levels:

  1. Bucket-Level Override (Highest Priority)

    • spec.drainPollInterval: Single interval for all polling
    • spec.drainStuckThreshold: Custom stuck detection threshold
  2. Grid-Level Configuration (Medium Priority)

    • spec.operations.drain.initialPollInterval: First hour polling interval
    • spec.operations.drain.longRunningPollInterval: After first hour interval
    • spec.operations.drain.stuckThreshold: Warning if no progress
  3. Hardcoded Defaults (Lowest Priority)

    • Initial: 3 minutes
    • Long-running: 15 minutes
    • Stuck threshold: 3 hours

The corresponding constants:

```go
DefaultDrainLongRunningPollInterval = 15 * time.Minute
DefaultDrainStuckThreshold          = 3 * time.Hour
```
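
The precedence lookup itself is straightforward. A sketch for the stuck threshold, following the spec field paths listed above (the helper name and exact types are illustrative):

```go
// resolveStuckThreshold sketch: bucket override, then grid config, then default.
func resolveStuckThreshold(bucket *S3Bucket, grid *StorageGrid) time.Duration {
    if bucket.Spec.DrainStuckThreshold != nil {
        return bucket.Spec.DrainStuckThreshold.Duration // 1. bucket-level override
    }
    if t := grid.Spec.Operations.Drain.StuckThreshold; t != nil {
        return t.Duration // 2. grid-level configuration
    }
    return DefaultDrainStuckThreshold // 3. hardcoded default (3 hours)
}
```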

Status Tracking

The drain status provides full observability into the drain operation:

```go
type BucketDrainStatus struct {
    StartedAt           *metav1.Time    // When drain began
    IsDeletingObjects   bool            // Backend drain active
    InitialObjectCount  int64           // Objects at start
    InitialObjectBytes  int64           // Bytes at start
    LastCheckedAt       *metav1.Time    // Last poll time
    LastProgressAt      *metav1.Time    // Last time count decreased
    PreviousObjectCount int64           // For progress detection
    Message             string          // Human-readable status
    NextPollInterval    metav1.Duration // When to poll next
}
```
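
A sketch of how `PreviousObjectCount` and `LastProgressAt` drive stuck detection (the helper name and exact logic are illustrative):

```go
// recordPollAndCheckStuck sketch: update progress bookkeeping and report
// whether the drain has made no progress for longer than the threshold.
func recordPollAndCheckStuck(ds *BucketDrainStatus, currentCount int64, threshold time.Duration) bool {
    now := metav1.Now()
    ds.LastCheckedAt = &now
    if currentCount < ds.PreviousObjectCount {
        ds.LastProgressAt = &now // object count decreased: progress was made
    }
    ds.PreviousObjectCount = currentCount
    return ds.LastProgressAt != nil &&
        now.Sub(ds.LastProgressAt.Time) > threshold
}
```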

Events

The controller emits events for all drain state transitions, providing real-time observability:

| Event Reason | Type | Description |
| --- | --- | --- |
| BucketDrainingStarted | Normal | Drain initiated, shows initial object count |
| BucketDrainingProgress | Normal | Progress update, objects deleted |
| BucketDrainingStuck | Warning | No progress for threshold period |
| BucketDrainingComplete | Normal | Drain finished successfully |
| BucketDrainingCanceled | Normal | User canceled drain |
| BucketOrphanedDrain | Warning | Detected and canceled drain not initiated by operator |
| BucketAlreadyEmpty | Normal | Drain annotation added but no objects to delete |
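
Emission goes through the standard controller-runtime event recorder. A minimal sketch for the progress event (message wording is an assumption):

```go
// Sketch: emitting a progress event via the standard record.EventRecorder.
r.Recorder.Eventf(rctx.Bucket, corev1.EventTypeNormal, "BucketDrainingProgress",
    "drain in progress: %d of %d objects remaining",
    rctx.Bucket.Status.ObjectCount, drainStatus.InitialObjectCount)
```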

Tenant Drain: Not Implemented

The Proposal

The initial idea was to implement tenant-level drain that would:

  1. Fan out drain annotations to all buckets in the tenant
  2. Wait for all buckets to drain
  3. Delete buckets when empty
  4. Handle both K8s bucket resources and external buckets
  5. Delete tenant when all buckets are gone

Why We Decided Not To Build This

1. Massive Complexity for Marginal Benefit

Complexity explosion:

  • Hybrid K8s/backend reconciliation (buckets may be created/deleted externally)
  • State tracking for N buckets in tenant status
  • Edge cases: stuck buckets, new buckets created during drain, external buckets

Users can already do this with a few kubectl commands:

```sh
# Drain all buckets for a tenant
kubectl get s3buckets -l tenant=my-tenant -o name | \
  xargs -I {} kubectl annotate {} bucket.s3.bedag.ch/force-drain-bucket=true

# Wait for completion
kubectl wait --for=delete s3buckets -l tenant=my-tenant --timeout=24h

# Delete tenant
kubectl delete s3tenant my-tenant
```

This is more transparent and gives users full control.

2. Violates Single Responsibility Principle

  • Tenant controller should manage tenants
  • Bucket controller should manage buckets
  • Tenant orchestrating bucket operations creates tight coupling
  • Blurs architectural boundaries

3. External Buckets Problem Is Unsolvable Cleanly

  • If we ignore external buckets: feature is incomplete/broken
  • If we handle external buckets: we bypass our own abstractions
  • Mixed environments create confusion
  • Better: users clean up external buckets manually (they created them outside K8s anyway)

4. Industry Precedent: Operators Don't Do This

Survey of comparable operators:

  • AWS Controllers (ACK): No cascading drain operations
  • Crossplane: Each resource independent, no orchestration
  • CloudNativePG: Only drains single cluster resource (not "all databases")
  • Redis Operator: No cascading operations

Pattern: Operators focus on infrastructure lifecycle, not data lifecycle orchestration.

5. Production Reality

Deleting production tenants is:

  • Rare: Not a frequent operation
  • High-stakes: Requires approval workflows
  • Needs audit: Manual checkpoints and verification
  • Requires rollback: May need to abort mid-process

A magic annotation doesn't fit these requirements. Production needs:

  • Scripts with approval gates
  • Manual verification checkpoints
  • Backup verification before deletion
  • Clear audit trail

The YAGNI Principle

"You Aren't Gonna Need It"

Every feature has a cost:

  • Code complexity and maintenance
  • Test coverage requirements
  • Documentation burden
  • Bug surface area

Only add features when:

  1. ✅ Users are actively requesting it
  2. ✅ Manual approach is truly painful
  3. ✅ Operator is mature enough for complexity

Tenant drain fails all three criteria.

What We Provide Instead

The operator provides clear error messages and guidance when tenant deletion fails due to existing buckets, directing users to drain buckets first using the bucket drain annotation.
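
A sketch of what that guidance can look like in the tenant reconciler (event reason and message wording are assumptions):

```go
// Sketch: refuse tenant deletion while buckets remain and point users at
// the bucket drain annotation.
if bucketCount > 0 {
    r.Recorder.Eventf(tenant, corev1.EventTypeWarning, "TenantHasBuckets",
        "cannot delete tenant: %d bucket(s) still exist; drain and delete them first "+
            "using the bucket.s3.bedag.ch/force-drain-bucket=true annotation", bucketCount)
    return ctrl.Result{RequeueAfter: time.Minute}, nil
}
```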

Decision

We will not implement tenant drain orchestration.

Focus areas:

  • Rock-solid bucket drain implementation
  • Comprehensive documentation in project README
  • Clear error messages and webhook guidance
  • Operator stability and maturity

If users require tenant drain automation after production usage, we'll have better context for requirements and can build it appropriately.

Architecture Patterns

Hybrid Polling Strategy

The operator stores NextPollInterval in status but recomputes it on every reconciliation. This hybrid approach:

  • ✅ Keeps Reconcile() simple (one-line requeue)
  • ✅ Picks up configuration changes during active drains
  • ✅ Implements two-tier polling (fast initially, slower for large buckets)

```go
// Reconcile just reads the precomputed interval
if bucket.Status.Phase == s3v1alpha1.BucketPhaseDraining {
    return ctrl.Result{
        RequeueAfter: bucket.Status.DrainStatus.NextPollInterval.Duration,
    }, err
}

// Every drain reconciliation recomputes it
func pollDrainProgress() {
    elapsed := time.Since(drainStatus.StartedAt.Time)
    nextPollInterval := computeNextPollInterval(ctx, rctx, elapsed)
    drainStatus.NextPollInterval = metav1.Duration{Duration: nextPollInterval}
}
```

State Machine Pattern

The drain reconciliation uses a clean state machine pattern with explicit transitions:

```go
func (r *S3BucketReconciler) reconcileDrain(ctx context.Context, rctx *reconcileContext) (ctrl.Result, error) {
    wantsDrain := rctx.Bucket.HasDrainAnnotation()
    isDraining := rctx.Bucket.Status.Phase == s3v1alpha1.BucketPhaseDraining
    objectCount := rctx.Bucket.Status.ObjectCount

    switch {
    case wantsDrain && !isDraining && objectCount > 0:
        return r.initiateDrain(ctx, rctx)
    case isDraining && wantsDrain && objectCount > 0:
        // backendStatus is fetched from StorageGrid earlier in the reconcile loop
        return r.pollDrainProgress(ctx, rctx, backendStatus)
    case isDraining && !wantsDrain:
        return r.cancelDrain(ctx, rctx)
    case isDraining && objectCount == 0:
        return r.completeDrain(ctx, rctx)
    }

    // No drain-related action needed
    return ctrl.Result{}, nil
}
```

Each state has a single-responsibility function handling that specific transition.
