fix(eventbus): timer-drain hang + silent-drop observability (#112)#116
Merged
intel352 merged 2 commits intoMay 29, 2026
Merged
Conversation
Design+plan for issue #112. Passed adversarial-design-review (rev 2): 2 Critical + 4 Important found in rev 1, all resolved in rev 2. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Fixes three issues in modules/eventbus surfaced in #112: P1 (bug): MemoryEventBus.Publish hung forever in deliveryMode "timeout". The legacy `if !deadline.Stop() { <-deadline.C }` drain blocks on an already-drained timer channel (on Go 1.23+ timer channels are unbuffered). Replaced with a non-blocking, version-safe drain covering both the timer-fired and ctx-cancelled exit branches. P2 (observability): CustomMemoryEventBus dropped events silently. Added deliveredCount/droppedCount + Stats() mirroring MemoryEventBus, and changed EngineRouter.CollectStats/CollectPerEngineStats to dispatch over a new statsProvider interface so the custom engine participates in module-level stats (previously they type-asserted the concrete *MemoryEventBus and ignored every other engine). P3 (observability): Unsubscribe/Stop abandoned buffered events without counting them. Both engines now count events still buffered in a subscriber channel as dropped on handler-goroutine exit (drainSubscription, run via defer so no exit path skips it), plus the dequeued-but-cancelled event. MemoryEventBus also drains the worker pool after wg.Wait in Stop so abandoned async tasks are counted (closes the async conservation gap). CustomMemoryEventBus gained the WaitGroup + finished-channel shutdown synchronization the standard engine already had, so Stop waits for handler goroutines to drain and Stats() is final on return. Contract: at-most-once delivery across teardown (delivered + dropped == enqueued once publishers quiesce). Corrected the false "processes all in-flight events" Stop doc comments in eventbus.go and module.go. Decision recorded in decisions/0001-eventbus-at-most-once-teardown.md. Design + adversarial review: docs/plans/2026-05-28-eventbus-112-fix.md. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Contributor
📋 API Contract Changes Summary✅ No breaking changes detected - only additions and non-breaking modifications Changed Components:Core FrameworkContract diff saved to artifacts/diffs/core.json Module: authContract diff saved to artifacts/diffs/auth.json Module: cacheContract diff saved to artifacts/diffs/cache.json Module: databaseContract diff saved to artifacts/diffs/database.json Module: eventbusContract diff saved to artifacts/diffs/eventbus.json Module: jsonschemaContract diff saved to artifacts/diffs/jsonschema.json Module: letsencryptContract diff saved to artifacts/diffs/letsencrypt.json Module: reverseproxyContract diff saved to artifacts/diffs/reverseproxy.json Artifacts📁 Full contract diffs and JSON artifacts are available in the workflow artifacts. |
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! |
intel352
added a commit
that referenced
this pull request
May 29, 2026
…lity (#117) Post-merge retrospective for PR #116 (squash ca4bf71). Covers which gates worked (adversarial review caught C1+C2+I1+I3 pre-code; TDD RED→GREEN; CI race+lint+BDD all green) and carry-forward items (DurableMemoryEventBus stats arity, multi-engine config threading, residual publish-after-drain race). Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fixes #112 — one real bug plus two observability gaps in
modules/eventbus.P1 (bug) —
MemoryEventBus.Publishhang indeliveryMode: "timeout"The legacy
if !deadline.Stop() { <-deadline.C }drain blocks the publisher forever once the timer fires (on Go 1.23+ timer channels are unbuffered, so the unconditional re-receive never returns). Replaced with the race-free non-blocking drain, covering both the timer-fired and ctx-cancelled exit branches. Kept inline (notdefer) because the block is inside a per-subscriberforloop.P2 (observability) —
CustomMemoryEventBusdropped silentlyAdded
deliveredCount/droppedCount+Stats()mirroringMemoryEventBus. Also changedEngineRouter.CollectStats/CollectPerEngineStatsto dispatch over a newstatsProviderinterface instead of type-asserting the concrete*MemoryEventBus, so the custom engine (and any future stats-bearing engine) participates inmodule.Stats()/PerEngineStats().DurableMemoryEventBus.Stats()has a single-value return and intentionally does not satisfy the interface (it never drops).P3 (observability) —
Unsubscribe/Stopabandoned buffered events uncountedBoth engines now count events still buffered in a subscriber channel as dropped on handler-goroutine exit (
drainSubscription, run viadeferso no exit path skips it), plus the dequeued-but-cancelled event.MemoryEventBus.Stopadditionally drains the worker pool afterwg.Wait()(race-free: all senders/receivers are wg-tracked and have exited) so abandoned async tasks are counted — closing the async conservation gap.CustomMemoryEventBusgained theWaitGroup+finished-channel shutdown synchronization the standard engine already had, soStopwaits for handlers to drain andStats()is final on return.Contract: at-most-once delivery across teardown —
delivered + dropped == enqueuedonce publishers quiesce. Corrected the false "processes all in-flight events"Stopdoc comments ineventbus.goandmodule.go. Decision recorded indecisions/0001-eventbus-at-most-once-teardown.md.Process
docs/plans/2026-05-28-eventbus-112-fix.md. The async-workerPool conservation gap (C1) and the dequeued-cancelled accounting hole (C2) were both caught at design time.issue112_memory_test.go,issue112_custom_test.go(timer-hang guard, sync + async conservation across Unsubscribe and Stop, custom Stats, router statsProvider wiring).Regression-invariant proof (timer hang, P1)
The conservation tests showed the same RED→GREEN: pre-fix
delivered+droppedwas 1/6 (Unsubscribe) and 2/30 (async Stop — 28 events lost in the worker pool); post-fix both reconcile exactly.Verification
cd modules/eventbus && go test -race ./...→ ok (25.9s)gofmtclean,go vetclean,golangci-lint run→ 0 issuesCloses #112.