FlushTracker stall: consumer alive but permanently stuck (no progress detection) #3983

@alco

Parent: #3980

Scenario

A consumer process is alive (Process.alive? returns true) but has stopped making progress on flushing transactions. The FlushTracker tracks the shape, but no handle_flush_notification ever arrives because the consumer is stuck.

How this can happen

  • Deadlock or infinite wait: The consumer is blocked waiting on a resource that will never become available (e.g., a GenServer.call with an :infinity timeout to a process that never replies, or a storage backend that hangs on I/O).
  • Infinite loop in event processing: A bug in change handling, move-in processing, or materializer interaction causes the consumer to loop without returning from handle_call.
  • Message queue starvation: The consumer's mailbox is flooded with other messages that are queued ahead of the storage :flushed callback, effectively starving the flush path indefinitely.

Why this is distinct

This scenario cannot be detected by process monitoring (Option A in the parent issue) because the consumer process is alive. Process.alive? returns true, and no :DOWN message is ever sent.
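A minimal sketch of why monitoring stays silent here. The `stuck` process is a hypothetical stand-in for a blocked consumer, not the real consumer module; it simply waits on a message that never arrives.

```elixir
# Hypothetical stand-in for a stuck consumer: blocks forever in receive.
stuck =
  spawn(fn ->
    receive do
      :never_sent -> :ok
    end
  end)

ref = Process.monitor(stuck)

# Wait briefly for a :DOWN message. None arrives, because the process never exits.
got_down =
  receive do
    {:DOWN, ^ref, :process, _pid, _reason} -> true
  after
    100 -> false
  end

alive = Process.alive?(stuck)
IO.inspect({alive, got_down})
# prints {true, false}

# Clean up the demo process.
Process.demonitor(ref, [:flush])
Process.exit(stuck, :kill)
```

Both signals that monitor-based detection relies on report a healthy process, even though it will never do any work again.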

Only a progress-based detection mechanism can catch this, for example by tracking when each shape's entry in FlushTracker.last_flushed last advanced and treating shapes that have not progressed within a timeout as stuck.

Fix

Extend the liveness sweep approach (Option B in #3980) with a staleness timeout: if a shape has been in FlushTracker.last_flushed for longer than N seconds without its last_flushed offset advancing, treat it as stuck and call handle_shape_removed (or trigger a consumer restart).
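The staleness check could look roughly like the sketch below. Everything in it is an assumption made for illustration: the `StalenessSketch` module, the `progress` map of `shape_id => {offset, last_advanced_at_ms}`, and the integer offsets are hypothetical and do not reflect the actual FlushTracker state layout.

```elixir
defmodule StalenessSketch do
  @moduledoc false

  # progress: %{shape_id => {last_offset, last_advanced_at_ms}} (hypothetical layout)

  # Record an observed offset. The timestamp is refreshed only when the
  # offset actually advances, so a shape that keeps reporting the same
  # offset keeps ageing.
  def record(progress, shape_id, offset, now_ms) do
    case progress do
      %{^shape_id => {prev, _at}} when offset <= prev -> progress
      _ -> Map.put(progress, shape_id, {offset, now_ms})
    end
  end

  # Return the shapes whose offset has not advanced within timeout_ms.
  def stale_shapes(progress, now_ms, timeout_ms) do
    for {shape_id, {_offset, at}} <- progress, now_ms - at > timeout_ms, do: shape_id
  end
end

progress =
  %{}
  |> StalenessSketch.record("shape-a", 10, 0)
  |> StalenessSketch.record("shape-b", 10, 0)
  # shape-a advances at t=25s, shape-b reports the same offset again.
  |> StalenessSketch.record("shape-a", 20, 25_000)
  |> StalenessSketch.record("shape-b", 10, 25_000)

stale = StalenessSketch.stale_shapes(progress, 40_000, 30_000)
IO.inspect(stale)
# prints ["shape-b"]
```

In a real sweep the timestamps would more likely come from `System.monotonic_time(:millisecond)`, and each stale shape would be passed to handle_shape_removed or used to trigger a consumer restart, as described above.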

This is the only scenario that requires timeout-based detection rather than monitor-based detection.
