add cancellation watermark to close at-least-once gap on cancelled tasks #21

Merged
lesnik512 merged 1 commit into main from cancellation-watermark
May 3, 2026
@lesnik512

Today _extract_ready_prefixes drops a cancelled task and everything after it from the per-partition pending state, and _map_offsets_per_partition stops advancing the offset at the cancellation. That keeps the cancelled offset and every later one redeliverable on restart, but only because cancellation never actually occurs mid-stream today: it is gated by shutdown, after which no new tasks are absorbed.

If a future change ever allowed mid-stream cancellation, a new task arriving for the same partition with a higher offset would slip past the boundary: the cancelled task is gone from pending, _map_offsets_per_partition has no memory of the cancellation, and the new task's offset would be committed, silently skipping the cancelled-and-after window.
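The gap can be reproduced with a small model. This is only a sketch under assumptions: `Task` and `map_offsets_stateless` are hypothetical stand-ins, not the project's real types or the actual body of _map_offsets_per_partition. It assumes each batch is walked in offset order and that a committed offset means "next offset to fetch".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    partition: str          # stand-in for a real TopicPartition
    offset: int
    cancelled: bool = False


def map_offsets_stateless(batch):
    """Map a batch to {partition: offset to commit}.

    Stops at the first cancellation per partition, but keeps no state
    between calls (the behaviour described above, as assumed here).
    """
    commits = {}
    stopped = set()
    for task in sorted(batch, key=lambda t: (t.partition, t.offset)):
        if task.partition in stopped:
            continue                       # everything after a cancellation is ignored
        if task.cancelled:
            stopped.add(task.partition)    # boundary exists only within this call
            continue
        commits[task.partition] = task.offset + 1
    return commits


# Hypothetical mid-stream cancellation of offset 10.
first = map_offsets_stateless(
    [Task("tp", 9), Task("tp", 10, cancelled=True), Task("tp", 11), Task("tp", 12)]
)
# A later task for the same partition arrives in a new batch.
second = map_offsets_stateless([Task("tp", 13)])

print(first)   # {'tp': 10}
print(second)  # {'tp': 14}
```

In this model the second call commits 14, so a restart would resume at 14 and offsets 10 through 13 would never be redelivered.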

This change adds a per-partition cancellation watermark on the committer. When _map_offsets_per_partition sees a cancelled task at offset N for partition P, it records watermarks[P] = N (keeping the earliest offset if multiple batches see cancellations). On every subsequent batch the watermark blocks that partition from advancing: the partition's pending tasks still drain so that task_done() calls stay balanced, but no commit is issued for it until the watermark is cleared. The rebalance listener clears the watermark for revoked partitions after commit_all() runs, so the next assignment starts fresh.

Trace: tasks 9 (✓), 10 (✗), 11 (✓), 12 (✓) are all in pending → the first commit produces {tp: 10} and sets wm[tp] = 10. The second batch sees task 13 (✓), but its candidate commit offset 13+1 = 14 exceeds the watermark 10, so tp is dropped from the commit. On restart the consumer fetches from 10 and re-processes 10, 11, 12, 13. At-least-once is preserved.
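The watermark rule and the trace above can be replayed with a minimal sketch. `Task`, `WatermarkCommitter`, `map_offsets` and `on_partitions_revoked` are hypothetical names chosen for illustration, and the comparison "drop the partition when the candidate offset exceeds the watermark" is inferred from the trace rather than copied from the implementation.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Task:
    partition: str          # stand-in for a real TopicPartition
    offset: int
    cancelled: bool = False


@dataclass
class WatermarkCommitter:
    # partition -> earliest cancelled offset seen since the last revoke
    watermarks: dict = field(default_factory=dict)

    def map_offsets(self, batch):
        commits = {}
        stopped = set()
        for task in sorted(batch, key=lambda t: (t.partition, t.offset)):
            tp = task.partition
            if tp in stopped:
                continue
            if task.cancelled:
                # Keep the earliest watermark if one is already recorded.
                previous = self.watermarks.get(tp, task.offset)
                self.watermarks[tp] = min(previous, task.offset)
                stopped.add(tp)
                continue
            commits[tp] = task.offset + 1
        # Drop any partition whose candidate offset would pass its watermark.
        return {
            tp: nxt
            for tp, nxt in commits.items()
            if nxt <= self.watermarks.get(tp, nxt)
        }

    def on_partitions_revoked(self, revoked):
        # Assumed to run after the final commit for the revoked partitions.
        for tp in revoked:
            self.watermarks.pop(tp, None)


committer = WatermarkCommitter()
first = committer.map_offsets(
    [Task("tp", 9), Task("tp", 10, cancelled=True), Task("tp", 11), Task("tp", 12)]
)
second = committer.map_offsets([Task("tp", 13)])
wm_after_second = dict(committer.watermarks)
print(first, second, wm_after_second)  # {'tp': 10} {} {'tp': 10}

committer.on_partitions_revoked(["tp"])
after_revoke = committer.map_offsets([Task("tp", 20)])
print(after_revoke)  # {'tp': 21}
```

The state is deliberately a plain dict keyed by partition, so clearing on revoke is a single `pop` and a fresh assignment starts with no watermark.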

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
lesnik512 self-assigned this on May 3, 2026
lesnik512 merged commit 7595ee5 into main on May 3, 2026
5 checks passed
lesnik512 deleted the cancellation-watermark branch on May 3, 2026 20:07