Decouple handler fanout/scheduling to support per-handler backoff (vs event-level backoff_until) #79

@dillonstreator

Problem

Today txob tracks per-handler completion state in handler_results, but retry/backoff is event-scoped via a single events.backoff_until timestamp.

In src/processor.ts, when any handler errors the processor computes a backoff list:

  • per-handler custom backoff via TxOBError.backoffUntil
  • plus the default backoff(event.errors)

…and then sets:

  • lockedEvent.backoff_until = max(backoffs)

Because this is a single column, one slow/rate-limited handler can delay reprocessing for all other remaining handlers (even if those other handlers could run sooner with a different backoff policy).
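To make the coupling concrete, here is a minimal sketch of the event-level rule described above. `HandlerFailure`, `eventBackoffUntil`, and the example values are hypothetical and are not copied from src/processor.ts; only the "take the maximum of all backoffs" rule comes from this issue.

```typescript
// Hypothetical shape: one failed handler and the custom backoff it asked for, if any.
interface HandlerFailure {
  handlerName: string;
  backoffUntil?: Date; // e.g. the value carried by TxOBError.backoffUntil
}

// Event-level rule as described in this issue: the latest timestamp among the
// default backoff and every per-handler custom backoff wins.
function eventBackoffUntil(defaultBackoff: Date, failures: HandlerFailure[]): Date {
  const candidates: number[] = [defaultBackoff.getTime()];
  for (const failure of failures) {
    if (failure.backoffUntil) candidates.push(failure.backoffUntil.getTime());
  }
  return new Date(Math.max(...candidates));
}

const exampleNow = new Date("2024-01-01T00:00:00Z");
const secondsFromNow = (s: number): Date => new Date(exampleNow.getTime() + s * 1000);

// The webhook handler is rate limited for an hour; the email handler failed
// without a custom backoff and would be fine with the 5 second default.
const eventLevelBackoff = eventBackoffUntil(secondsFromNow(5), [
  { handlerName: "webhook", backoffUntil: secondsFromNow(3600) },
  { handlerName: "email" },
]);
// eventLevelBackoff is one hour out, so the email handler waits an hour too.
```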

We want to decouple handler processing / fanout scheduling so we can configure:

  • handler-specific backoff strategies (e.g. webhook handler vs email handler)
  • handler-specific retry counters / max errors
  • (optionally) handler-specific concurrency/priority in the future

Current behavior (references)

  • TxOBError explicitly documents that the processor uses the latest (maximum) backoff among handlers: src/error.ts.
  • processEvent() collects backoffs and sets event-level backoff_until: src/processor.ts.
  • The canonical schema is a single events table with handler_results JSONB + backoff_until TIMESTAMPTZ: README.md.

Goals / success criteria

  • Per-handler backoff without forcing unrelated handlers to wait.
  • Keep at-least-once semantics and existing handler idempotency story.
  • Preserve good query performance (indexable “due work” query).
  • Prefer additive/backwards-compatible migration paths where possible.

Design options

Option A (recommended): Separate table for handler work items

Introduce a new table that materializes handler fanout and scheduling:

  • event_handlers (or event_handler_jobs)
    • event_id (FK)
    • event_type
    • handler_name
    • status (pending|processed|unprocessable)
    • attempts (or errors)
    • backoff_until TIMESTAMPTZ NULL
    • processed_at TIMESTAMPTZ NULL
    • unprocessable_at TIMESTAMPTZ NULL
    • last_error (optional) / error_history (optional)
    • timestamps
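As a rough illustration of the row shape listed above, the sketch below models one event_handlers row in TypeScript and fans an event out into one pending row per handler. The names `EventHandlerRow`, `newHandlerRow`, and `fanOut`, the `created_at`/`updated_at` fields, and the use of string ids are assumptions made for this example, not a settled schema.

```typescript
type HandlerStatus = "pending" | "processed" | "unprocessable";

// Hypothetical in-memory view of one event_handlers row.
interface EventHandlerRow {
  event_id: string;
  event_type: string;
  handler_name: string;
  status: HandlerStatus;
  attempts: number;
  backoff_until: Date | null;
  processed_at: Date | null;
  unprocessable_at: Date | null;
  last_error: string | null;
  created_at: Date; // assumed name for the "timestamps" bullet
  updated_at: Date; // assumed name for the "timestamps" bullet
}

// Initial state for one (event, handler) pair: pending, zero attempts, due immediately.
function newHandlerRow(eventId: string, eventType: string, handlerName: string, now: Date): EventHandlerRow {
  return {
    event_id: eventId,
    event_type: eventType,
    handler_name: handlerName,
    status: "pending",
    attempts: 0,
    backoff_until: null,
    processed_at: null,
    unprocessable_at: null,
    last_error: null,
    created_at: now,
    updated_at: now,
  };
}

// Materialize the fanout: one row per handler registered for the event type.
function fanOut(eventId: string, eventType: string, handlerNames: string[], now: Date): EventHandlerRow[] {
  return handlerNames.map((handlerName) => newHandlerRow(eventId, eventType, handlerName, now));
}
```

A unique constraint on (event_id, handler_name) would probably be needed so that repeated materialization stays idempotent, but that is a design choice this issue leaves open.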

Processing model:

  • Poll/query due handler rows:
    • processed_at IS NULL
    • unprocessable_at IS NULL
    • backoff_until IS NULL OR backoff_until < now()
    • attempts < maxAttempts(handler)
  • Lock row FOR UPDATE SKIP LOCKED and execute one handler.
  • Update only that handler row (its backoff, attempts, status).
  • Mark the parent events.processed_at when all handler rows are done (processed or unprocessable) OR when a policy says to stop.
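The processing model above can be sketched as follows. The SQL string shows one possible Postgres form of the due-work query, and the functions mirror the same conditions in memory. Everything here is hypothetical: the names (`HandlerRow`, `isDue`, `recordFailure`, `isEventDone`), the single `$1` max-attempts parameter, and the choice to mark a row unprocessable once attempts reach the maximum are assumptions, not decisions made in this issue.

```typescript
// One possible due-work query; $1 is a single max-attempts value for simplicity.
// Per-handler maximums would need a join or an application-side check.
const DUE_HANDLER_SQL = `
  SELECT * FROM event_handlers
  WHERE processed_at IS NULL
    AND unprocessable_at IS NULL
    AND (backoff_until IS NULL OR backoff_until < now())
    AND attempts < $1
  LIMIT 1
  FOR UPDATE SKIP LOCKED`;

interface HandlerRow {
  handler_name: string;
  attempts: number;
  backoff_until: Date | null;
  processed_at: Date | null;
  unprocessable_at: Date | null;
}

// Same conditions as the query above, evaluated against one row.
function isDue(row: HandlerRow, now: Date, maxAttempts: number): boolean {
  return (
    row.processed_at === null &&
    row.unprocessable_at === null &&
    (row.backoff_until === null || row.backoff_until.getTime() < now.getTime()) &&
    row.attempts < maxAttempts
  );
}

// Record a failed attempt on this row only; sibling handler rows are untouched.
function recordFailure(
  row: HandlerRow,
  now: Date,
  backoffMs: (attempts: number) => number,
  maxAttempts: number,
): HandlerRow {
  const attempts = row.attempts + 1;
  if (attempts >= maxAttempts) {
    // Assumption: exhausting attempts marks the row unprocessable.
    return { ...row, attempts, unprocessable_at: now };
  }
  return { ...row, attempts, backoff_until: new Date(now.getTime() + backoffMs(attempts)) };
}

// The parent event is done once every handler row is processed or unprocessable.
function isEventDone(rows: HandlerRow[]): boolean {
  return rows.every((row) => row.processed_at !== null || row.unprocessable_at !== null);
}
```

Because `recordFailure` only returns an updated copy of the one row it was given, a rate-limited webhook row can sit in backoff while an email row for the same event is picked up by the next poll.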

Indexes:

  • A partial index on (processed_at, unprocessable_at, backoff_until, attempts), restricted to rows where processed_at IS NULL AND unprocessable_at IS NULL.

Pros:

  • True per-handler scheduling/backoff with an index-friendly due-work query.
  • Clean foundation for future features (priorities, per-handler concurrency, dead-lettering per handler).

Cons:

  • Requires schema changes + migration story.
  • Requires defining how/when to create handler rows (on enqueue vs on first processing attempt).

Option B: Keep single events table, add per-handler scheduling inside handler_results

Store backoff_until, attempts, etc. per handler in the JSONB.

Processing model:

  • When an event is locked, run only handlers whose JSONB indicates they are due.
  • Compute event-level “next wakeup” as the minimum next handler backoff (so the event row remains queryable by a single timestamp).
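A minimal sketch of the Option B bookkeeping, assuming each handler entry in handler_results stores ISO-8601 timestamp strings. The entry shape and the names `HandlerResultEntry`, `dueHandlers`, and `nextWakeup` are hypothetical and may not match the current JSONB layout.

```typescript
// Hypothetical per-handler entry inside handler_results (JSONB has no date type,
// so timestamps are assumed to be ISO-8601 strings).
interface HandlerResultEntry {
  processed_at?: string;
  unprocessable_at?: string;
  backoff_until?: string;
  attempts?: number;
}
type HandlerResults = Record<string, HandlerResultEntry>;

function isPending(entry: HandlerResultEntry): boolean {
  return !entry.processed_at && !entry.unprocessable_at;
}

// Handlers to run when the event is locked: pending and past their own backoff.
function dueHandlers(results: HandlerResults, now: Date): string[] {
  return Object.keys(results).filter((name) => {
    const entry = results[name];
    if (!isPending(entry)) return false;
    return !entry.backoff_until || new Date(entry.backoff_until).getTime() <= now.getTime();
  });
}

// Event-level "next wakeup": the earliest backoff among pending handlers.
// A pending handler without a backoff is treated as due at `now`.
// Returns null when no handler is pending.
function nextWakeup(results: HandlerResults, now: Date): Date | null {
  let earliest: number | null = null;
  for (const name of Object.keys(results)) {
    const entry = results[name];
    if (!isPending(entry)) continue;
    const dueAt = entry.backoff_until ? new Date(entry.backoff_until).getTime() : now.getTime();
    if (earliest === null || dueAt < earliest) earliest = dueAt;
  }
  return earliest === null ? null : new Date(earliest);
}
```

Writing the result of `nextWakeup` into events.backoff_until keeps the existing single-timestamp poll usable, at the cost of waking the event once per distinct handler backoff.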

Pros:

  • No extra table.

Cons:

  • Hard to query/index “events with at least one handler due” without expensive JSONB scans.
  • Hard to evolve cleanly; JSON shape becomes part of the storage contract.

Option C: Split by handler into separate events (“fanout events”)

When processing an event, create child events like UserCreated.sendWelcomeEmail.
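For illustration, the fanout could look roughly like this. `OutboxEvent`, `toChildEvents`, the `parent_id` field, and the deterministic `parentId:handlerName` id scheme are all assumptions made for this sketch; only the `UserCreated.sendWelcomeEmail` naming pattern comes from the description above.

```typescript
// Hypothetical minimal event shape; parent_id is an assumed correlation field.
interface OutboxEvent {
  id: string;
  type: string;
  data: unknown;
  parent_id?: string;
}

// Create one child event per handler, typed as "<ParentType>.<handlerName>".
// Deterministic ids are an assumption that would make re-running the fanout idempotent.
function toChildEvents(parent: OutboxEvent, handlerNames: string[]): OutboxEvent[] {
  return handlerNames.map((handlerName) => ({
    id: `${parent.id}:${handlerName}`,
    type: `${parent.type}.${handlerName}`,
    data: parent.data,
    parent_id: parent.id,
  }));
}
```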

Pros:

  • Reuses existing event queueing/backoff.

Cons:

  • Amplifies event volume; complicates correlation/observability.
  • Harder to treat the original event as “done” only when all children are done.

Open questions

  • When to materialize handler rows?
    • On insert (requires knowing handler map at enqueue time) vs on first processing (requires deriving from handler map at runtime).
    • Potential approach: materialize on first lock of the parent event.
  • Where do handler-specific configs live?
    • API: handlerMap value could become { handler, backoff?, maxErrors?, ... }.
    • Back-compat: accept bare function as today.
  • How do event-level errors / maxErrors change?
    • With per-handler attempts, events.errors may become less meaningful; it could become “processor-level attempts” or be deprecated.
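One way the handler-config question could play out is sketched below: a handlerMap value is either a bare function (as today) or an object carrying optional policy, and the processor normalizes both into one resolved form. The type names, the `backoff(errors, now)` signature, and the `resolveHandler` helper are hypothetical and do not describe the current txob API.

```typescript
type Handler = (event: unknown) => void | Promise<void>;
type BackoffFn = (errors: number, now: Date) => Date;

// Hypothetical object form of a handlerMap value.
interface HandlerPolicy {
  handler: Handler;
  backoff?: BackoffFn;
  maxErrors?: number;
}

// Back-compat: a bare function is still accepted.
type HandlerMapValue = Handler | HandlerPolicy;

interface ResolvedHandler {
  handler: Handler;
  backoff: BackoffFn;
  maxErrors: number;
}

// Normalize either form, filling unset policy fields from processor-level defaults.
function resolveHandler(
  value: HandlerMapValue,
  defaults: { backoff: BackoffFn; maxErrors: number },
): ResolvedHandler {
  const policy: HandlerPolicy = typeof value === "function" ? { handler: value } : value;
  return {
    handler: policy.handler,
    backoff: policy.backoff ?? defaults.backoff,
    maxErrors: policy.maxErrors ?? defaults.maxErrors,
  };
}
```

With this shape, existing handler maps keep working unchanged, and a webhook handler could opt into a longer backoff or a lower maxErrors without affecting the email handler registered next to it.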

Proposed next steps

  • Spike Option A with a minimal Postgres client implementation + migration SQL.
  • Decide API shape for handler-specific policy (backoff/maxErrors).
  • Add docs for migration from JSONB-only tracking.
