fix(timewheel): use absolute target sequence in addCmd #6
Merged
The producer used to precompute (pos, rounds) from cursor.Load() and write the result to addCh. If the producer was descheduled between the cursor read and the channel send (common under -race), the handler could drain the cmd long after cursor had advanced past that slot, leaving the entry stranded for a full wheel cycle (65.5s with the default 64K capacity / 1ms tick), which exceeded the stress-test deadline and surfaced as "fired N-1 of N".
Carry the absolute targetSeq instead. The handler resolves the slot at drain time against the up-to-date seq via a new placeAdd helper, and fires immediately when targetSeq <= seq (i.e. the deadline has already passed). placeAdd is shared by the pre- and post-slot drain loops, removing the duplicated slot-append logic in Handle.
Stress tests now pass 8/8 under -race (previously ~2/3 failure rate on master). Two unit regression tests cover the past-deadline and past-deadline-but-cancelled paths directly.
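The "full wheel cycle" figure can be checked with the slot arithmetic alone. The following is a minimal sketch, not the package's code: `precompute` and `fireSeq` are hypothetical helpers, and it assumes a power-of-two wheel where the slot is `seq & mask` and the handler visits one slot per tick.

```go
package main

import (
	"fmt"
	"time"
)

const (
	wheelSize = 1 << 16 // 64K slots, the default capacity quoted above
	mask      = wheelSize - 1
	tick      = time.Millisecond
)

// precompute mimics the old producer-side logic (assumed shape): resolve the
// slot against whatever cursor value the producer happened to read.
func precompute(cursor, delayTicks uint64) (pos, rounds uint64) {
	if delayTicks == 0 {
		delayTicks = 1 // assumption: a zero delay is treated as one tick
	}
	target := cursor + delayTicks
	return target & mask, (delayTicks - 1) / wheelSize
}

// fireSeq returns the sequence at which an entry appended to slot pos fires
// when the handler appends it while its cursor sits at drainSeq (that slot
// already processed).
func fireSeq(drainSeq, pos, rounds uint64) uint64 {
	delta := (pos - drainSeq) & mask // ticks until the cursor next reaches pos
	if delta == 0 {
		delta = wheelSize
	}
	return drainSeq + delta + rounds*wheelSize
}

func main() {
	const cursorAtRead, delay, cursorAtDrain = 1000, 5, 1010
	target := uint64(cursorAtRead + delay)

	// The producer resolves the slot at cursor 1000, but the handler only
	// drains the command at cursor 1010, after slot 1005 has been passed.
	pos, rounds := precompute(cursorAtRead, delay)
	late := fireSeq(cursorAtDrain, pos, rounds) - target
	fmt.Printf("fires %d ticks late (%v)\n", late, time.Duration(late)*tick)
	// prints: fires 65536 ticks late (1m5.536s)
}
```

Under these assumptions, any drain that happens after the cursor has passed the precomputed slot costs exactly one extra revolution, which is where the 65.5s figure comes from.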
Closes #5.
Root cause
`add()` used to precompute `(pos, rounds)` from `cursor.Load()` and write the result to `addCh`. If the producer goroutine was descheduled between the cursor read and the channel send (common under `-race`, where the detector measurably slows producer goroutines), the handler could drain the cmd long after `cursor` had advanced past that slot. The entry would then be appended to a slot the cursor had already passed, stranding it for a full wheel cycle (65.5s with the default 64K-slot / 1ms-tick configuration). The stress test's 10s deadline elapsed first, surfacing as `fired N-1 of N`.
Fix
Carry the absolute `targetSeq` in `addCmd` instead of pre-resolving the slot. The handler computes `(pos, rounds)` against the up-to-date `seq` at drain time, and fires immediately when `targetSeq <= seq` (deadline already passed). The pre- and post-slot drain loops both delegate to a new `placeAdd` helper, which also folds in the cancelled-set check that previously lived in two places.
Test plan
- Unit regression tests for `placeAdd` covering past-deadline-fire and past-deadline-but-cancelled-drop
- `-race` (`go test -race ./...`)
- `TestStress_TimeWheel_Concurrent_4Writers` passes 8/8 under `-race` locally; previously ~2/3 failure rate on master
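For readers without the diff open, the drain-time resolution these tests exercise might look roughly like the sketch below. Only the names `placeAdd`, `addCmd`, `targetSeq` and `seq` come from this PR. The `wheel` and `entry` types, the `fired` slice, the `id` field, the rounds formula and the removal of a consumed id from the cancelled set are all assumptions made for illustration, not the merged implementation.

```go
package main

import "fmt"

const (
	wheelSize = 1 << 16 // assumed default capacity (64K slots)
	mask      = wheelSize - 1
)

// addCmd carries the absolute target sequence instead of a resolved slot.
type addCmd struct {
	id        uint64
	targetSeq uint64
}

// entry is a hypothetical slot element: rounds counts the full revolutions
// still to wait before firing.
type entry struct {
	id     uint64
	rounds uint64
}

// wheel is a hypothetical stand-in for the handler's state.
type wheel struct {
	slots     [][]entry
	cancelled map[uint64]struct{}
	fired     []uint64 // ids fired so far, recorded for inspection
}

func newWheel() *wheel {
	return &wheel{
		slots:     make([][]entry, wheelSize),
		cancelled: map[uint64]struct{}{},
	}
}

// placeAdd resolves cmd against the handler's current seq at drain time.
func (w *wheel) placeAdd(cmd addCmd, seq uint64) {
	if _, ok := w.cancelled[cmd.id]; ok {
		delete(w.cancelled, cmd.id) // assumption: a consumed cancellation is forgotten
		return                      // cancelled before placement: drop
	}
	if cmd.targetSeq <= seq {
		w.fired = append(w.fired, cmd.id) // deadline already passed: fire now
		return
	}
	dist := cmd.targetSeq - seq // ticks remaining, always >= 1 here
	pos := cmd.targetSeq & mask
	rounds := (dist - 1) / wheelSize // full revolutions before the firing visit
	w.slots[pos] = append(w.slots[pos], entry{id: cmd.id, rounds: rounds})
}

func main() {
	w := newWheel()
	const seq = 1010

	w.placeAdd(addCmd{id: 1, targetSeq: 1005}, seq) // past deadline
	w.cancelled[2] = struct{}{}
	w.placeAdd(addCmd{id: 2, targetSeq: 1005}, seq) // past deadline but cancelled
	w.placeAdd(addCmd{id: 3, targetSeq: 1020}, seq) // still in the future

	fmt.Println("fired:", w.fired)
	fmt.Println("entries in slot 1020:", len(w.slots[1020&mask]))
}
```

In this sketch the cancelled check runs before the past-deadline check, so a cancelled entry is never fired even when its deadline has already passed. This is the behaviour the past-deadline-but-cancelled test is described as covering.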