CSHARP-5935: Command activities may be skipped when using pooled connection#1918
CSHARP-5935: Command activities may be skipped when using pooled connection#1918ajcvickers wants to merge 2 commits intomongodb:mainfrom
Conversation
…ection I was looking at flaky tests, and found MongoClient_should_create_activities_when_tracing_enabled was failing occasionally. Turns out it was an actual bug. Junie (Opus 4.6) says: ### Root Cause `_shouldTrace` in `CommandEventHelper` was set **once** at connection construction time via `MongoTelemetry.ActivitySource.HasListeners()`. Since connections are pooled and reused, if a connection was created before an `ActivityListener` was registered (or by a test with tracing disabled), `_shouldTrace` remained `false` permanently for that connection — command activities were never created even when a listener was later active.
| kmsProviders.Add("local", localKey); | ||
|
|
||
| var keyVaultNamespace = CollectionNamespace.FromFullName("encryption.__keyVault"); | ||
| var keyVaultMongoClient = new MongoClient(); |
There was a problem hiding this comment.
This is a simple test fix to use the configured local connection string. Not related to the bug fix--I can create another PR if necessary.
| collection.Find(Builders<BsonDocument>.Filter.Empty).FirstOrDefault(); | ||
| collection.DeleteOne(Builders<BsonDocument>.Filter.Eq("name", "test")); | ||
|
|
||
| SpinWait.SpinUntil(() => capturedActivities.Count >= 6, millisecondsTimeout: 10000); |
There was a problem hiding this comment.
I'm not sure this is needed--I added it when I thought the test was flaky. But I don't think it does harm.
There was a problem hiding this comment.
Pull request overview
Fixes a tracing/instrumentation bug where command-level OpenTelemetry activities could be permanently skipped on pooled connections when an ActivityListener was registered after the connection was created.
Changes:
- Update
CommandEventHelperto decide whether to start command activities based on currentActivitySourcelisteners (instead of only at connection construction time). - Stabilize the OpenTelemetry smoke test by capturing activities thread-safely and waiting for async activity completion.
- Minor smoke test / test infrastructure adjustments (key vault client URI selection; unobserved-exception tracking test case selection).
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
src/MongoDB.Driver/Core/Connections/CommandEventHelper.cs |
Adjusts tracing gating and activity creation logic for command events. |
tests/SmokeTests/MongoDB.Driver.SmokeTests.Sdk/OpenTelemetryTests.cs |
Makes activity capture thread-safe and waits for expected activities. |
tests/SmokeTests/MongoDB.Driver.SmokeTests.Sdk/LibmongocryptTests.cs |
Changes how the key vault MongoClient chooses its connection string. |
tests/MongoDB.TestHelpers/XunitExtensions/TimeoutEnforcing/TimeoutEnforcingXunitTestAssemblyRunner.cs |
Avoids SingleOrDefault failure when multiple tracking test cases exist. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review. Take the survey.
tests/SmokeTests/MongoDB.Driver.SmokeTests.Sdk/LibmongocryptTests.cs
Outdated
Show resolved
Hide resolved
...goDB.TestHelpers/XunitExtensions/TimeoutEnforcing/TimeoutEnforcingXunitTestAssemblyRunner.cs
Show resolved
Hide resolved
| _tracingDisabled = tracingOptions?.Disabled == true; | ||
| _queryTextMaxLength = tracingOptions?.QueryTextMaxLength ?? 0; | ||
|
|
||
| _shouldTrackState = _shouldTrackSucceeded || _shouldTrackFailed || _shouldTrace; | ||
| _shouldTrackState = _shouldTrackSucceeded || _shouldTrackFailed || !_tracingDisabled; | ||
| _shouldProcessRequestMessages = _shouldTrackStarted || _shouldTrackState; |
There was a problem hiding this comment.
While copilot is not wrong here, making this change re-introduces the bug, since if we're not tracking state, then we never get to the point of creating the activity.
There was a problem hiding this comment.
It's important not to track stack when there are no listeners.
If it comes to that, we might need a more creative solution, or adjust our testing.
There was a problem hiding this comment.
Yep, I think you are right.
| public void ConnectionFailed(ConnectionId connectionId, ObjectId? serviceId, Exception exception, bool skipLogging) | ||
| { | ||
| if (!_shouldTrackFailed && !_shouldTrace) | ||
| if (!_shouldTrackFailed && (_tracingDisabled || !MongoTelemetry.ActivitySource.HasListeners())) |
There was a problem hiding this comment.
minor: probably it' cleaner to extract to IsTracingDisabled method.
| _tracingDisabled = tracingOptions?.Disabled == true; | ||
| _queryTextMaxLength = tracingOptions?.QueryTextMaxLength ?? 0; | ||
|
|
||
| _shouldTrackState = _shouldTrackSucceeded || _shouldTrackFailed || _shouldTrace; | ||
| _shouldTrackState = _shouldTrackSucceeded || _shouldTrackFailed || !_tracingDisabled; | ||
| _shouldProcessRequestMessages = _shouldTrackStarted || _shouldTrackState; |
There was a problem hiding this comment.
It's important not to track stack when there are no listeners.
If it comes to that, we might need a more creative solution, or adjust our testing.
I was looking at flaky tests, and found MongoClient_should_create_activities_when_tracing_enabled was failing occasionally. Turns out it was an actual bug. Junie (Opus 4.6) says:
Root Cause
_shouldTraceinCommandEventHelperwas set once at connection construction time viaMongoTelemetry.ActivitySource.HasListeners(). Since connections are pooled and reused, if a connection was created before anActivityListenerwas registered (or by a test with tracing disabled),_shouldTraceremainedfalsepermanently for that connection — command activities were never created even when a listener was later active.