fix(handlers): stop flaky coverage CI from global-logger race in parallel tests#603
Closed
AbirAbbas wants to merge 1 commit into
Closed
fix(handlers): stop flaky coverage CI from global-logger race in parallel tests#603AbirAbbas wants to merge 1 commit into
AbirAbbas wants to merge 1 commit into
Conversation
…llel tests
TestDiscoveryLoggingIncludesOptionalRequestID and
TestExecutionCleanupService_StartStopBranches both call
setupExecutionCleanupTestLogger, which swaps the process-global
logger.Logger, while also calling t.Parallel(). When they ran alongside
another parallel test, a sibling reassigned the global logger mid-run, so
the discovery test's captured buffer came back empty and its assertions
("discovery request failed", `"request_id":"req-123"`, `"format":"compact"`)
failed. This broke the `coverage (control-plane)` job and cascaded into
`coverage-summary` (missing control-plane.total.txt).
The non-test build is unaffected, so it slipped past the required gates
and only showed up in the coverage job's full parallel run.
Drop t.Parallel() from both tests so they run in the serial phase, where
nothing else mutates the global logger concurrently -- matching the
existing non-parallel logger-swapping tests in execution_cleanup_test.go.
Verified by running the previously-failing command 20x clean.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Contributor
📊 Coverage gateThresholds from
✅ Gate passedNo surface regressed past the allowed threshold and the aggregate stayed above the floor. |
Contributor
📐 Patch coverage gateThreshold: 80% on lines this PR touches vs
✅ Patch gate passedEvery surface whose lines were touched by this PR has patch coverage at or above the threshold. |
Contributor
Author
|
Folded into #602 — keeping it to a single PR. The coverage flake fix (drop t.Parallel() from the two global-logger-swapping tests) is now the first commit on that branch. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
The
coverage (control-plane)CI job (and the cascadingcoverage-summaryjob) flake because two tests mutate the process-globallogger.Loggerwhile running witht.Parallel():TestDiscoveryLoggingIncludesOptionalRequestIDTestExecutionCleanupService_StartStopBranchesBoth call
setupExecutionCleanupTestLogger, which swapslogger.Loggerand restores it viat.Cleanup. Run in parallel, a sibling test reassigns the global logger mid-run, so the discovery test's captured buffer comes back empty and its assertions fail:coverage-summarythen fails withmissing input: control-plane.total.txtbecause the control-plane coverage step exited non-zero. The non-test build is unaffected, which is why this slipped past the required gates and only surfaced in the coverage job's full parallel run.Fix
Drop
t.Parallel()from both tests so they run in the serial phase, where nothing else mutates the global logger concurrently. This matches the existing convention — the logger-swapping tests inexecution_cleanup_test.goare already non-parallel for the same reason.Verification
go test ./internal/handlers/ -count=20 -run 'TestDiscoveryLoggingIncludesOptionalRequestID|StartStopBranches|TestExecution'— passes (this exact command reproduced the failure before the fix)go test ./internal/handlers/— passesgo vetclean; no newgofmtdrift introduced (the file has pre-existing struct-alignment drift onmain, left untouched to keep this PR scoped)Out of scope (flagged, not fixed here)
Running the package under
-racesurfaces several separate, pre-existing data races in the heartbeat coverage tests (processHeartbeatAsyncasync goroutines writing to test stubs inTestHeartbeatHandler_AdditionalCoverage/TestDiscoveryAndNodeHandlers_AdditionalCoverage). The coverage CI job does not run with-race, so those are not what breaks it — worth a separate follow-up.🤖 Generated with Claude Code