fix: Resolve pending shapes in FlushTracker when consumer process receives commit fragment with no relevant changes#4064
Assert that the pattern matches on maps cover all map keys
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@           Coverage Diff           @@
##             main    #4064   +/-   ##
=======================================
  Coverage   88.67%   88.67%
=======================================
  Files          25       25
  Lines        2438     2438
  Branches      615      611     -4
=======================================
  Hits         2162     2162
  Misses        274      274
  Partials        2        2
Non-commit fragments no longer register shapes in FlushTracker. The Consumer will defer flush notifications until the commit fragment is processed, so early registration is no longer needed.
- FlushTracker.handle_txn_fragment now takes 3 args (removed unused shapes_with_changes parameter)
- SLC's dropped-fragment path only calls FlushTracker for commit fragments
- Remove misleading FlushTracker tests that claimed to test cross-fragment tracking (that's EventRouter's responsibility, already tested there)
- Update handle_txn test helper for arity 3
Refs: #4063
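The commit-only tracking described in this commit can be illustrated with a minimal sketch. This is not the real FlushTracker: the module name, the struct fields, the fragment shape (commit?, last_offset), the integer offsets and the handle_flush_notification/3 helper are all assumptions made for illustration. Only the idea that non-commit fragments register nothing and that the function has arity 3 comes from the commit message.

```elixir
defmodule FlushTrackerSketch do
  # Hypothetical stand-in for FlushTracker; field and argument names are assumptions.
  # `pending` maps a shape handle to the offset it still has to flush.
  defstruct pending: %{}

  # Non-commit fragment: nothing is registered, the tracker is returned unchanged.
  def handle_txn_fragment(%__MODULE__{} = tracker, %{commit?: false}, _affected_shapes) do
    tracker
  end

  # Commit fragment: every affected shape is registered uniformly,
  # awaiting a flush up to the commit's last offset.
  def handle_txn_fragment(
        %__MODULE__{} = tracker,
        %{commit?: true, last_offset: offset},
        affected_shapes
      ) do
    pending =
      Enum.reduce(affected_shapes, tracker.pending, fn shape, acc ->
        Map.put(acc, shape, offset)
      end)

    %{tracker | pending: pending}
  end

  # A flush notification resolves a shape once it reaches the awaited offset.
  def handle_flush_notification(%__MODULE__{} = tracker, shape, flushed_offset) do
    case Map.fetch(tracker.pending, shape) do
      {:ok, awaited} when flushed_offset >= awaited ->
        %{tracker | pending: Map.delete(tracker.pending, shape)}

      _ ->
        tracker
    end
  end

  # Sorted list of shapes that still have an outstanding flush.
  def pending_shapes(%__MODULE__{pending: pending}), do: pending |> Map.keys() |> Enum.sort()
end
```

In this sketch a shape can only become pending at commit time, so a flush that happens mid-transaction has nothing to resolve prematurely.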
When a {Storage, :flushed, offset} message arrives while a multi-fragment
transaction is pending, the Consumer now saves the offset instead of
immediately notifying the ShapeLogCollector. After the commit fragment
populates txn_offset_mapping, the deferred offset is aligned and sent
as a single notification.
This fixes the race condition where the consumer sent an unaligned
flush offset to FlushTracker because txn_offset_mapping was empty
at the time of the storage flush.
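A minimal sketch of the deferral described above, under stated assumptions: offsets are plain integers, txn_offset_mapping is modeled as a list of {last_shape_offset, txn_boundary} pairs, and the notifications list stands in for messages sent to the ShapeLogCollector. Only the names pending_flush_offset and txn_offset_mapping appear in the PR; the module, the fragment fields and the alignment rule are hypothetical.

```elixir
defmodule DeferredFlushSketch do
  # Hypothetical, simplified consumer state (not the real Consumer.State).
  defstruct txn_pending?: false,
            pending_flush_offset: nil,
            txn_offset_mapping: [],
            notifications: []

  # A non-commit fragment marks a multi-fragment transaction as in progress.
  def handle_fragment(%__MODULE__{} = state, %{commit?: false}) do
    %{state | txn_pending?: true}
  end

  # The commit fragment populates the mapping, then sends any deferred flush.
  def handle_fragment(%__MODULE__{} = state, %{commit?: true} = fragment) do
    entry = {fragment.last_shape_offset, fragment.txn_boundary}

    state = %{
      state
      | txn_pending?: false,
        txn_offset_mapping: state.txn_offset_mapping ++ [entry]
    }

    case state.pending_flush_offset do
      nil -> state
      offset -> notify(%{state | pending_flush_offset: nil}, offset)
    end
  end

  # Storage flushed while a transaction is pending: save the offset, do not notify.
  def handle_flushed(%__MODULE__{txn_pending?: true} = state, offset) do
    %{state | pending_flush_offset: max(state.pending_flush_offset || offset, offset)}
  end

  # No pending transaction: notify immediately.
  def handle_flushed(%__MODULE__{} = state, offset), do: notify(state, offset)

  # Align the flushed offset to the boundary of the last fully flushed transaction
  # (assumed rule), drop the consumed mapping entries, and record one notification.
  defp notify(state, flushed) do
    {covered, rest} =
      Enum.split_while(state.txn_offset_mapping, fn {shape_offset, _boundary} ->
        shape_offset <= flushed
      end)

    aligned =
      case List.last(covered) do
        nil -> flushed
        {_shape_offset, boundary} -> boundary
      end

    %{state | txn_offset_mapping: rest, notifications: state.notifications ++ [aligned]}
  end
end
```

Without the deferral, the flush at offset 5 in the middle of a transaction would be reported unaligned, because the mapping entry that translates it to the transaction boundary does not exist yet.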
Tests now verify that flush notifications are deferred during pending transactions and sent only after the commit fragment is processed.
Also fix a compilation error in Consumer (cannot call remote function in pattern match) and add a non-commit fragment clause to FlushTracker's handle_txn_fragment/4.
Refs: #4063
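The compilation error mentioned here is a general Elixir rule: a remote call such as Mod.fun() cannot appear inside a pattern. The PR does not show the offending Consumer code, so the sketch below uses hypothetical modules (Offsets, PatternFixSketch) to show two common ways around it: compare in the function body, or capture the value in a module attribute at compile time.

```elixir
defmodule Offsets do
  # Hypothetical helper standing in for whatever remote function was called.
  def initial, do: 0
end

defmodule PatternFixSketch do
  # Does not compile, because remote calls are not allowed inside a match:
  #   def fresh?(%{offset: Offsets.initial()}), do: true

  # Fix 1: bind the value in the pattern and compare in the body.
  def fresh?(%{offset: offset}), do: offset == Offsets.initial()

  # Fix 2: evaluate the call once at compile time and match on the attribute.
  @initial_offset Offsets.initial()
  def at_start?(%{offset: @initial_offset}), do: true
  def at_start?(_other), do: false
end
```

Fix 2 only suits values that are constant at compile time; when the value can change at runtime, the body comparison in Fix 1 is the safer choice.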
Force-pushed from 4f143f8 to 7b56ab0.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Force-pushed from 87d9295 to 7cd24ee.
Claude Code Review
Summary
Fixes a real production bug (#4063) where the FlushTracker could get permanently stuck when a storage buffer-size flush fires during multi-fragment transaction processing. The approach of deferring flush notifications at the Consumer level and simplifying FlushTracker to commit-only tracking is clean and well-tested.
What's Working Well
Issues Found
Critical (Must Fix): None.
Important (Should Fix): None.
Suggestions (Nice to Have)
Issue Conformance
Issue #4063 is well-specified with a precise root-cause analysis, exact race sequence, production state dumps, and three proposed fix directions. The PR implements Option 1 (deferred flush notification), the cleanest approach. Implementation fully addresses the race condition. No scope creep.
Previous Review Status
Review iteration: 2 | 2026-03-26
Summary
Fix #4063
When a {Storage, :flushed, offset} message arrives at a Consumer process while it's in the middle of processing a multi-fragment transaction, the flush notification could be lost. This caused the FlushTracker in ShapeLogCollector to get stuck waiting for a flush that was already completed.
Changes
Consumer (consumer.ex, consumer/state.ex):
- Adds pending_flush_offset to hold a flush offset that arrives while a multi-fragment transaction is pending
FlushTracker (flush_tracker.ex):
- Removes the shapes_with_changes parameter from handle_txn_fragment/3; all affected shapes are tracked uniformly at commit time
ShapeLogCollector (shape_log_collector.ex):
- Calls FlushTracker.handle_txn_fragment/3 only on commit fragments
- Removes the shapes_with_changes computation that is no longer needed
Test plan