@ChrisJBurns (Collaborator) commented Dec 15, 2025

Improves e2e test reliability by fixing MCP client handling and adding robust retry helpers.

Key changes:

  • Fix the root cause of the "Session not found" errors: an MCP client cannot be reliably restarted after a failed Start() or Initialize(). Added CreateInitializedMCPClientWithRetry and CreateAuthenticatedMCPClientWithRetry helpers, which create a fresh client on each retry attempt instead of reusing the failed one.
  • Replace time.Sleep with WaitForHealthy: Polls the /health endpoint to confirm server readiness instead of sleeping for an arbitrary delay.
  • Add tool discovery helpers: WaitForToolsDiscovered and WaitForSpecificToolsDiscovered ensure backends are fully connected before running tool-related assertions.
  • Reduce K8s API pressure: Increased polling intervals from 1s to 3s across tests.
  • Add FlakeAttempts(2) to the auth discovery test for additional resilience.

Signed-off-by: Chris Burns <29541485+ChrisJBurns@users.noreply.github.com>
@github-actions github-actions bot added the size/M Medium PR: 300-599 lines changed label Dec 15, 2025

codecov bot commented Dec 15, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 56.51%. Comparing base (589ecb5) to head (15ace12).

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #3058   +/-   ##
=======================================
  Coverage   56.51%   56.51%           
=======================================
  Files         334      334           
  Lines       33170    33170           
=======================================
  Hits        18747    18747           
  Misses      12839    12839           
  Partials     1584     1584           


@github-actions github-actions bot added size/XL Extra large PR: 1000+ lines changed and removed size/M Medium PR: 300-599 lines changed labels Dec 15, 2025
@github-actions bot (Contributor) left a comment

Large PR Detected

This PR exceeds 1000 lines of changes and requires justification before it can be reviewed.

How to unblock this PR:

Add a section to your PR description with the following format:

## Large PR Justification

[Explain why this PR must be large, such as:]
- Generated code that cannot be split
- Large refactoring that must be atomic
- Multiple related changes that would break if separated
- Migration or data transformation

Alternative:

Consider splitting this PR into smaller, focused changes (< 1000 lines each) for easier review and reduced risk.

See our Contributing Guidelines for more details.


This review will be automatically dismissed once you add the justification section.

@github-actions github-actions bot added size/M Medium PR: 300-599 lines changed and removed size/XL Extra large PR: 1000+ lines changed labels Dec 15, 2025
@ChrisJBurns changed the title from "draft: fixes tests" to "improves e2e test reliability with client handling and retry helpers" Dec 15, 2025
Signed-off-by: Chris Burns <29541485+ChrisJBurns@users.noreply.github.com>
@github-actions github-actions bot added size/M Medium PR: 300-599 lines changed and removed size/M Medium PR: 300-599 lines changed labels Dec 15, 2025
Comment on lines +1070 to +1082
func CreateAuthenticatedMCPClientWithRetry(
	nodePort int32,
	clientName string,
	httpClient *http.Client,
	timeout time.Duration,
	pollingInterval time.Duration,
) (*InitializedMCPClient, error) {
	return CreateInitializedMCPClientWithRetry(
		nodePort,
		clientName,
		timeout,
		pollingInterval,
		transport.WithHTTPBasicClient(httpClient),
	)
}
Contributor

Nit, feel free to ignore: I would delete this helper and have the caller create the transport.WithHTTPBasicClient(httpClient) option itself.

}

- It("should list and call tools from all backends with discovered auth", func() {
+ It("should list and call tools from all backends with discovered auth", FlakeAttempts(2), func() {
Contributor

I would strongly prefer not to introduce FlakeAttempts, because it will prolong CI and normalize handling flakiness with more retries. If this pattern proliferates, we'll end up with a lot of low-value, high-cost tests.

Collaborator Author

Oops, good spot, I forgot to take these out when the other changes were added 😅

@jhrozek (Contributor) commented Dec 16, 2025

Needs a rebase atop @jerm-dro's PR.


Labels

size/M Medium PR: 300-599 lines changed

4 participants