This document describes the event recording implementation in the StorageGrid Operator and the design decisions behind it.
The operator emits Kubernetes events to provide observability into controller operations and state changes. Events are recorded for significant operations, errors, and state transitions across all five controllers.
Event constant definitions: `internal/controller/events.go`
We chose to implement events the way Kubernetes handles PersistentVolumes and PersistentVolumeClaims: each controller emits its own events without propagating them across related resources.
What this means:
- `S3TenantAccount` emits events about backend tenant operations
- `S3Tenant` emits events about account binding and configuration
- Events are visible on the specific resource where the operation occurred
Alternatives considered:
- Cross-resource propagation: Bubble events from S3TenantAccount up to S3Tenant
  - Rejected: Adds complexity and event noise
  - Rejected: Obscures which component is responsible for what
- Single event stream: All events on one resource
  - Rejected: Violates separation of concerns
  - Rejected: Makes troubleshooting harder
- No events: Rely solely on logs and conditions
  - Rejected: Events provide a user-visible timeline without log access
  - Rejected: Events are a standard Kubernetes pattern that operators expect
Events are emitted immediately when operations occur, using the `emitEvent()` method.
Implementation:

```go
// Emit event immediately when operation occurs
r.emitEvent(rctx, corev1.EventTypeWarning, EventTenantCreateFailed,
	fmt.Sprintf("Failed to create tenant: %v", err))
```

Why immediate instead of batching:
- Simpler implementation - no queue management needed
- Events appear in real-time as operations occur
- Same number of API calls either way (no batching API in Kubernetes)
- Events naturally reflect the operation timeline
Alternatives considered:
- End-of-reconciliation batching: Queue and emit all at once
  - Rejected: No actual batching in the Kubernetes event API (same number of calls)
  - Rejected: Added complexity with queue management for no benefit
  - Rejected: Delayed visibility of events until reconciliation completes
- Conditional emission with queuing: Complex state tracking
  - Rejected: Over-engineered for the problem
  - Rejected: Kubernetes already deduplicates similar events
Events are emitted only when state actually changes, not on every reconciliation.
Examples:
- ✅ Emit when tenant transitions from creating → ready
- ✅ Emit when grid transitions from unhealthy → healthy
- ✅ Emit when credentials are rotated (user action)
- ❌ Don't emit "BucketReady" on every successful reconciliation
- ❌ Don't emit "GridHealthy" when grid remains healthy
Why state-change only:
- Prevents event spam in steady-state
- Follows standard Kubernetes controller patterns
- Events remain useful signals, not noise
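The state-change-only rule above can be sketched in isolation. This is a minimal, self-contained illustration, not the operator's actual code: the `event` struct and `reconcileHealth` function are hypothetical stand-ins for the controllers' recorder-backed emission, while the reasons `GridUnhealthy` and `GridHealthRecovered` come from the event catalog described in this document.

```go
package main

import "fmt"

// event is a simplified stand-in for a recorded Kubernetes event.
type event struct {
	Type, Reason, Message string
}

// reconcileHealth emits a health event only when the healthy flag transitions,
// mirroring the state-change-only rule. It returns the events emitted this pass.
func reconcileHealth(wasHealthy, isHealthy bool) []event {
	switch {
	case wasHealthy && !isHealthy:
		return []event{{"Warning", "GridUnhealthy", "Grid has too many unavailable nodes"}}
	case !wasHealthy && isHealthy:
		return []event{{"Normal", "GridHealthRecovered", "Grid has recovered from unhealthy state"}}
	default:
		return nil // steady state: no event, no noise
	}
}

func main() {
	fmt.Println(reconcileHealth(true, false)) // transition: one Warning event
	fmt.Println(reconcileHealth(false, true)) // transition: one Normal event
	fmt.Println(reconcileHealth(true, true))  // steady state: no events
}
```

The key design point is that the gate compares previous and current state before emitting, so a healthy grid that stays healthy produces nothing.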
Events removed for violating this rule:

- `EventBucketReady` - fired every reconciliation while the bucket was ready
- `EventGridHealthy` - fired every health check while the grid remained healthy (kept `EventGridHealthRecovered` for transitions)
- `EventAllowlistSyncCompleted` - fired every sync regardless of changes
Periodic events (kept):
- `EventGatewayStatusRefreshed` - fires only when `spec.refreshInterval` elapses, not on every reconciliation
Each controller follows the same pattern:

```go
type XReconciler struct {
	client.Client
	Scheme   *runtime.Scheme
	Recorder record.EventRecorder // Injected in main.go
}

type xReconcileContext struct {
	Resource *v1alpha1.X
}

func (r *XReconciler) emitEvent(rctx *xReconcileContext, eventType, reason, message string) {
	r.Recorder.Event(rctx.Resource, eventType, reason, message)
}
```

Event recorders are created and injected in `cmd/main.go`:
```go
eventRecorder := mgr.GetEventRecorderFor("s3bucket-controller")

if err = (&controller.S3BucketReconciler{
	Client:   mgr.GetClient(),
	Scheme:   mgr.GetScheme(),
	Recorder: eventRecorder,
}).SetupWithManager(mgr); err != nil {
	// handle error
}
```

Each controller has its own recorder with a distinct name for event source identification.
Controllers require RBAC permissions to create and patch events:

```go
// +kubebuilder:rbac:groups="",resources=events,verbs=create;patch
```

Generated RBAC manifests include these permissions automatically.
Events are organized by controller and operation type. See internal/controller/events.go for the complete list (64 unique event reasons).
S3 Endpoint Access (S3Bucket controller):
- `EventS3EndpointConnectionFailed`: Cannot reach S3 loadbalancer
- `EventS3EndpointConnectionEstablished`: S3 endpoint accessible
These events are critical for diagnosing network connectivity issues between the operator and StorageGrid's S3 loadbalancer endpoints. If bucket policy operations are failing, check for these events.
Backend Connection (S3TenantAccount controller):
- `EventBackendConnectionFailed`: Cannot reach StorageGrid management API
- `EventBackendConnectionRestored`: Connection to management API restored
Health Monitoring (StorageGrid controller):
- `EventGridUnhealthy`: Grid has too many unavailable nodes
- `EventGridHealthRecovered`: Grid has recovered from unhealthy state
Events marking significant state changes:
- Lifecycle: Creating → Created → Deleting → Deleted
- Health: Unhealthy → Recovered
- Quota: Normal → Warning → Exceeded → Normal
Warning/error events for failed operations:
- `*Failed` events indicate operation failures
- Include error details in the message for debugging
- Always paired with condition updates
1. Define the constant in `internal/controller/events.go`:

   ```go
   const EventNewOperation = "NewOperation"
   ```

2. Emit during reconciliation:

   ```go
   r.emitEvent(rctx, corev1.EventTypeNormal, EventNewOperation,
   	fmt.Sprintf("Operation completed: %s", details))
   ```

3. Only emit on state changes:

   ```go
   if previousState != currentState {
   	r.emitEvent(rctx, ...)
   }
   ```
Events are visible on the resource where they occurred. Use standard kubectl commands or Kubernetes dashboard to view them.
Event emission can be verified in controller tests by inspecting the recorded events on the fake event recorder. See existing controller tests for examples.
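The fake-recorder pattern can be illustrated without the client-go dependency. The sketch below is a self-contained stand-in: `fakeRecorder` and `drain` are hypothetical names that mimic how client-go's `record.FakeRecorder` captures events on a buffered channel for test assertions (the real `Event` method also takes the involved object).

```go
package main

import "fmt"

// fakeRecorder mimics the shape of client-go's record.FakeRecorder: it
// captures emitted events on a buffered channel instead of sending them to
// the API server, so tests can assert on exactly what was recorded.
type fakeRecorder struct {
	Events chan string
}

// Event records a formatted "type reason message" string on the channel.
func (f *fakeRecorder) Event(eventType, reason, message string) {
	f.Events <- fmt.Sprintf("%s %s %s", eventType, reason, message)
}

// drain collects all captured events without blocking.
func (f *fakeRecorder) drain() []string {
	var out []string
	for {
		select {
		case e := <-f.Events:
			out = append(out, e)
		default:
			return out
		}
	}
}

func main() {
	rec := &fakeRecorder{Events: make(chan string, 10)}
	rec.Event("Normal", "TenantCreated", "Tenant created successfully")

	for _, e := range rec.drain() { // prints "Normal TenantCreated Tenant created successfully"
		fmt.Println(e)
	}
}
```

A controller test would inject such a recorder in place of the manager-provided one, run a reconciliation, and then assert on the drained events.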