Get global retry coordinator only inside test #715

Merged
chombium merged 1 commit into cloudfoundry:main from jorbaum:fix-http-retry-test-race-condition
Jan 7, 2026

@jorbaum jorbaum commented Jan 5, 2026

Description

  • Gets the global retry coordinator only inside the test so that the test passes reliably
  • Also adds the missing coordinator.Release() for the second retrier. Its absence did not change the test outcome, but it looked wrong to me

Before this change, the tests failed intermittently. It is not 100% clear why. See for example https://github.com/cloudfoundry/loggregator-agent-release/actions/runs/20717887432/job/59473536901

It looks to me like WithParallelRetries recreates the RetryCoordinator, and GetGlobalRetryCoordinator does so as well.

Verified this assumption via:

syslog.WithParallelRetries(2)
coord1 := coordinator
coordinator = syslog.GetGlobalRetryCoordinator()
coord2 := coordinator
Expect(coord1).To(BeIdenticalTo(coord2)) // Not the same pointer
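
A minimal sketch of the suspected failure mode, assuming that WithParallelRetries replaces a package-level coordinator instance. All names below (retryCoordinator, withParallelRetries, getGlobalRetryCoordinator, staleAfterReconfigure) are hypothetical stand-ins and not the actual syslog package code. It only shows why a pointer captured before reconfiguration no longer matches the one fetched afterwards:

```go
package main

import "fmt"

// retryCoordinator is a hypothetical stand-in for the real coordinator type.
// It only models a fixed pool of retry slots.
type retryCoordinator struct {
	slots chan struct{}
}

func newRetryCoordinator(n int) *retryCoordinator {
	return &retryCoordinator{slots: make(chan struct{}, n)}
}

// globalCoordinator mimics a package-level singleton.
var globalCoordinator = newRetryCoordinator(1)

// withParallelRetries replaces the global instance.
// This replacement is an assumption about the real function's behavior.
func withParallelRetries(n int) {
	globalCoordinator = newRetryCoordinator(n)
}

// getGlobalRetryCoordinator returns whichever instance is current.
func getGlobalRetryCoordinator() *retryCoordinator {
	return globalCoordinator
}

// staleAfterReconfigure reports whether a pointer captured before a
// reconfiguration differs from the one returned afterwards.
func staleAfterReconfigure() bool {
	before := getGlobalRetryCoordinator()
	withParallelRetries(2)
	after := getGlobalRetryCoordinator()
	return before != after
}

func main() {
	// prints "stale pointer after reconfigure: true"
	fmt.Println("stale pointer after reconfigure:", staleAfterReconfigure())
}
```

Under that assumption, a coordinator variable captured at suite or container level can point at an instance the code under test no longer uses, which is why fetching it inside the test avoids the mismatch.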

I think Ginkgo is creating a race condition for some reason. I tried moving the WithParallelRetries call into BeforeSuite():

var _ = BeforeSuite(func() {
	syslog.WithParallelRetries(2)
})

But this did not work.

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

Testing performed?

  • Unit tests
  • Integration tests
  • Acceptance tests

Checklist:

  • This PR is being made against the main branch, or relevant version branch
  • I have made corresponding changes to the documentation
  • I have added testing for my changes

coordinator.Acquire("test-host", "test-app")
blocked.Wait()
coordinator.Acquire("test-host", "test-app")
coordinator.Release()

This was definitely wrong.
Nice catch 👍
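
To illustrate why an Acquire without a matching Release matters, here is a rough sketch that models a coordinator as a counting semaphore on a buffered channel. The slotPool type and its methods are hypothetical and non-blocking for the sake of a runnable example. The real Acquire takes host and app arguments and may block, and its internals are not shown in this PR:

```go
package main

import "fmt"

// slotPool is a hypothetical stand-in for a retry coordinator:
// a counting semaphore built on a buffered channel.
type slotPool struct {
	slots chan struct{}
}

func newSlotPool(n int) *slotPool {
	return &slotPool{slots: make(chan struct{}, n)}
}

// tryAcquire takes a slot without blocking and reports whether it succeeded.
func (p *slotPool) tryAcquire() bool {
	select {
	case p.slots <- struct{}{}:
		return true
	default:
		return false
	}
}

// release frees one previously acquired slot.
// It would block if no slot were held, so callers must pair it with an acquire.
func (p *slotPool) release() {
	<-p.slots
}

// inUse returns the number of slots currently held.
func (p *slotPool) inUse() int {
	return len(p.slots)
}

func main() {
	p := newSlotPool(2)
	p.tryAcquire()
	p.tryAcquire()
	p.release() // only one of the two acquires is released

	fmt.Println("slots still held:", p.inUse())       // prints "slots still held: 1"
	fmt.Println("third acquire ok:", p.tryAcquire())  // prints "third acquire ok: true"
	fmt.Println("fourth acquire ok:", p.tryAcquire()) // prints "fourth acquire ok: false"
}
```

In a model like this, a slot that is never released stays occupied for as long as the pool lives, so later users of the same pool see less capacity than was configured.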

Comment on lines 90 to +91
syslog.WithParallelRetries(2)
coordinator := syslog.GetGlobalRetryCoordinator()

@nicklas-dohrn nicklas-dohrn Jan 7, 2026

Moving the GetGlobalRetryCoordinator call here is correct, as this is the only test where it is used anyway.
I am pretty sure the issue was the second call to syslog.WithParallelRetries, which should only be made once per test and was made twice before.
Moving everything to where it is used is correct as well, and maybe even easier to follow.
So I am totally in agreement with your fix. 👍

@nicklas-dohrn nicklas-dohrn left a comment

Nice catch, this was inconsistent and is now fixed 👍

@chombium chombium left a comment

@jorbaum thanks for the fix and @nicklas-dohrn for the review.

LGTM!

@github-project-automation github-project-automation Bot moved this from Inbox to Pending Merge | Prioritized in Application Runtime Platform Working Group Jan 7, 2026
@chombium chombium merged commit ed9c681 into cloudfoundry:main Jan 7, 2026
5 checks passed
@github-project-automation github-project-automation Bot moved this from Pending Merge | Prioritized to Done in Application Runtime Platform Working Group Jan 7, 2026