CSHARP-5887: Simplify retryable read and writes #1882

Open
papafe wants to merge 9 commits into mongodb:main from papafe:csharp5887-simplify_retryability

Contributor

@papafe papafe commented Feb 17, 2026

No description provided.

@papafe papafe requested a review from sanych-sun February 17, 2026 15:39
@papafe papafe added the improvement Optimizations or refactoring (no new features or fixes). label Feb 17, 2026
@papafe papafe force-pushed the csharp5887-simplify_retryability branch 2 times, most recently from 3321aad to 5a8fcca Compare February 17, 2026 15:55
      serverResponse = commandException.Result;
  }
- catch (Exception exception)
+ catch (Exception exception) when (!context.ErrorDuringLastChannelAcquisition)
Contributor Author

This avoids an exception in EnsureCanProceedNextBatch and ToFinalResultsOrThrow: after a channel acquisition failure, the context no longer has a channel.
This error was not visible before, because channel acquisition happened outside the try/catch, so a failure simply made the whole method throw.
The question is whether we want a non-retryable channel acquisition failure to make the whole method throw (as it did before), or whether we need to find another way to make EnsureCanProceedNextBatch and ToFinalResultsOrThrow work.
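
For readers less familiar with C# exception filters, here is a minimal sketch of the behavior the `when` clause relies on. `FakeContext`, `FilterDemo`, and `RunBatch` are hypothetical stand-ins rather than driver types; only the property name `ErrorDuringLastChannelAcquisition` comes from the diff above.

```csharp
using System;

// Hypothetical stand-in for the retry context; only the flag matters here.
sealed class FakeContext
{
    public bool ErrorDuringLastChannelAcquisition { get; set; }
}

static class FilterDemo
{
    // Returns a description of what happened instead of a real batch result.
    public static string RunBatch(FakeContext context, Action attempt)
    {
        try
        {
            attempt();
            return "ok";
        }
        // The filter is evaluated before the catch block is entered. When it is
        // false, the exception keeps propagating to the caller unchanged.
        catch (Exception exception) when (!context.ErrorDuringLastChannelAcquisition)
        {
            return "handled: " + exception.Message;
        }
    }

    static void Main()
    {
        var context = new FakeContext();
        Console.WriteLine(RunBatch(context, () => throw new InvalidOperationException("write failed")));

        context.ErrorDuringLastChannelAcquisition = true;
        try
        {
            RunBatch(context, () => throw new InvalidOperationException("no channel"));
        }
        catch (InvalidOperationException ex)
        {
            Console.WriteLine("propagated: " + ex.Message);
        }
    }
}
```

In this sketch the second call behaves like the pre-PR code path: the failure escapes the method instead of reaching the result-building logic.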

  {
      _databaseNamespace = Ensure.IsNotNull(databaseNamespace, nameof(databaseNamespace));
-     _command = Ensure.IsNotNull(command, nameof(command));
+     _command = command; //can be null
Contributor Author

This is used so that we can modify the command after the ReadCommandOperation has been created.
This is done because some operations need the operationContext to create the command properly, but the operationContext is now not available until later.
I'm not a fan of this implementation, to be honest.
In my opinion, we have two other possibilities:

  1. We make ReadCommandOperation equivalent to WriteCommandOperation, so that it does not concern itself with retryability. Then we add a new class, RetryableReadCommandOperation, that handles retryability and uses ReadCommandOperation internally.
  2. Instead of allowing the command to be set, we add a method that modifies it once we have the operation context.

I think option 1 is more desirable, but it requires much more refactoring.
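
A rough sketch of what option 1 could look like, under heavy simplification. Every type here is a hypothetical stand-in (commands are plain strings, `Channel` carries only a wire version, `TimeoutException` plays the retryable error, and the version threshold is arbitrary); it only illustrates the split between a plain operation and a retrying wrapper that builds the command per attempt.

```csharp
using System;

// Simplified, hypothetical stand-ins; not the driver's real classes.
sealed class Channel
{
    public Channel(int maxWireVersion) => MaxWireVersion = maxWireVersion;
    public int MaxWireVersion { get; }
}

// Plain operation: runs one command on one channel and knows nothing about retries.
sealed class ReadCommandOperation
{
    public string Execute(Channel channel, string command) =>
        $"ran '{command}' on wire version {channel.MaxWireVersion}";
}

// Wrapper: owns the retry loop and builds the command only once a channel is known.
sealed class RetryableReadCommandOperation
{
    private readonly Func<Channel, string> _createCommand;
    private readonly ReadCommandOperation _inner = new ReadCommandOperation();

    public RetryableReadCommandOperation(Func<Channel, string> createCommand) =>
        _createCommand = createCommand;

    public string Execute(Func<Channel> acquireChannel, int maxAttempts)
    {
        for (var attempt = 0; attempt < maxAttempts; attempt++)
        {
            try
            {
                var channel = acquireChannel();         // may be a different server per attempt
                var command = _createCommand(channel);  // rebuilt for each attempt
                return _inner.Execute(channel, command);
            }
            // Swallow the error only while attempts remain; the last failure propagates.
            catch (TimeoutException) when (attempt < maxAttempts - 1)
            {
            }
        }
        throw new InvalidOperationException("maxAttempts must be positive");
    }
}

static class WrapperDemo
{
    static void Main()
    {
        var calls = 0;
        var operation = new RetryableReadCommandOperation(
            ch => ch.MaxWireVersion >= 21 ? "find (new form)" : "find (legacy form)");
        var result = operation.Execute(() =>
        {
            calls++;
            if (calls == 1) throw new TimeoutException("first acquisition failed");
            return new Channel(21);
        }, maxAttempts: 2);
        Console.WriteLine(result);
    }
}
```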

  {
      HashSet<ServerDescription> deprioritizedServers = null;
-     var attempt = 1;
+     var totalAttempts = 0;
Contributor Author

Renamed to align with the write version, and it now starts from 0 to be more compatible with the future client backpressure integration.

@papafe papafe requested a review from sanych-sun February 18, 2026 16:35
@papafe papafe marked this pull request as ready for review February 18, 2026 16:35
@papafe papafe requested a review from a team as a code owner February 18, 2026 16:35
Copilot AI review requested due to automatic review settings February 18, 2026 16:35
Contributor

Copilot AI left a comment

Pull request overview

Simplifies retryable read/write execution by centralizing channel acquisition/replacement inside the retry executors/contexts, and introduces dynamic command creation for retryable read command operations.

Changes:

  • Refactors RetryableReadOperationExecutor/RetryableWriteOperationExecutor to acquire/replace channels per attempt and track last acquired server.
  • Removes RetryableReadContext.Create* / RetryableWriteContext.Create* factories; callers now construct contexts directly.
  • Adds ICommandCreator + a ReadCommandOperation overload to build commands dynamically per attempt/connection (used by find/aggregate/count/distinct).

Reviewed changes

Copilot reviewed 27 out of 27 changed files in this pull request and generated 8 comments.

| File | Description |
| --- | --- |
| tests/MongoDB.Driver.Tests/Core/Operations/RetryableWriteOperationExecutorTests.cs | Updates test helper to construct RetryableWriteContext and acquire a channel explicitly. |
| tests/MongoDB.Driver.Tests/Core/Operations/CommandOperationBaseTests.cs | Removes test asserting command must be non-null (aligning with dynamic-command support). |
| tests/MongoDB.Driver.Tests/Core/LoadBalancingIntergationTests.cs | Updates helper methods to construct contexts and explicitly acquire channels sync/async. |
| src/MongoDB.Driver/Core/Operations/RetryableWriteOperationExecutor.cs | Moves channel acquisition into executor loop; adds attempt counters and uses LastAcquiredServer. |
| src/MongoDB.Driver/Core/Operations/RetryableWriteContext.cs | Removes Create/CreateAsync; adds ErrorDuringLastChannelAcquisition and LastAcquiredServer. |
| src/MongoDB.Driver/Core/Operations/RetryableWriteCommandOperationBase.cs | Switches to direct context construction (executor now acquires channels). |
| src/MongoDB.Driver/Core/Operations/RetryableUpdateCommandOperation.cs | Adjusts retry payload construction and payload variable used in the message section. |
| src/MongoDB.Driver/Core/Operations/RetryableReadOperationExecutor.cs | Moves channel acquisition into executor loop; uses LastAcquiredServer for deprioritization. |
| src/MongoDB.Driver/Core/Operations/RetryableReadContext.cs | Removes Create/CreateAsync; tracks LastAcquiredServer; pins channel after acquisition. |
| src/MongoDB.Driver/Core/Operations/RetryableInsertCommandOperation.cs | Adjusts retry payload construction for insert retries. |
| src/MongoDB.Driver/Core/Operations/RetryableDeleteCommandOperation.cs | Adjusts retry payload construction for delete retries. |
| src/MongoDB.Driver/Core/Operations/ReadCommandOperation.cs | Adds dynamic-command constructor via ICommandCreator; sets command per attempt. |
| src/MongoDB.Driver/Core/Operations/ListIndexesUsingCommandOperation.cs | Switches to direct RetryableReadContext construction. |
| src/MongoDB.Driver/Core/Operations/ListIndexesOperation.cs | Switches to direct RetryableReadContext construction. |
| src/MongoDB.Driver/Core/Operations/ListCollectionsOperation.cs | Switches to direct RetryableReadContext construction. |
| src/MongoDB.Driver/Core/Operations/ICommandCreator.cs | Introduces interface for creating commands dynamically from session/connection info. |
| src/MongoDB.Driver/Core/Operations/FindOperation.cs | Implements ICommandCreator; passes creator into ReadCommandOperation. |
| src/MongoDB.Driver/Core/Operations/EstimatedDocumentCountOperation.cs | Switches to direct RetryableReadContext construction; updates BeginOperation overload usage. |
| src/MongoDB.Driver/Core/Operations/DistinctOperation.cs | Implements ICommandCreator; passes creator into ReadCommandOperation; updates BeginOperation. |
| src/MongoDB.Driver/Core/Operations/CountOperation.cs | Implements ICommandCreator; passes creator into ReadCommandOperation. |
| src/MongoDB.Driver/Core/Operations/CommandOperationBase.cs | Allows null command to support dynamic command building; adds SetCommand. |
| src/MongoDB.Driver/Core/Operations/ClientBulkWriteOperation.cs | Constructs RetryableWriteContext directly; filters exception handling for acquisition failures. |
| src/MongoDB.Driver/Core/Operations/ChangeStreamOperation.cs | Switches to direct RetryableReadContext construction (aggregate operation will acquire channels). |
| src/MongoDB.Driver/Core/Operations/BulkUnmixedWriteOperationBase.cs | Constructs RetryableWriteContext directly; makes final result creation tolerant of null channel. |
| src/MongoDB.Driver/Core/Operations/BulkMixedWriteOperation.cs | Constructs RetryableWriteContext directly. |
| src/MongoDB.Driver/Core/Operations/AggregateOperation.cs | Implements ICommandCreator and uses dynamic ReadCommandOperation construction. |
| src/MongoDB.Driver/Core/Misc/BatchableSource.cs | Adds internal ctor to preserve processedCount; adjusts public ctor delegation/validation. |

@papafe papafe force-pushed the csharp5887-simplify_retryability branch from 866f8a9 to 0fc39ef Compare February 26, 2026 16:43
@papafe papafe requested a review from sanych-sun February 26, 2026 16:44
  }

- private EventContext.OperationNameDisposer BeginOperation() => EventContext.BeginOperation(OperationName);
+ private EventContext.OperationIdDisposer BeginOperation() => EventContext.BeginOperation(null, OperationName);
Contributor Author

This is something strange. If we use this overload, the name of the command gets set properly.
This happens because, before this PR, we connected to the server when creating the RetryableContext, and at that time the command name was set properly. Now this is done inside the retryable operation executor, and by that time (for example) ReadCommandOperation.Execute has been called, which contains BeginOperation(null, null) and sets the command name to null. This means that the command name in ClusterSelectingServerEvent is null.
We need to decide whether this is the way we want to go and, if so, do the same for the other read operations.

/// Sets the command to be executed. This is used by derived classes that build commands dynamically.
/// </summary>
/// <param name="command">The command.</param>
protected void SetCommand(BsonDocument command)
Member

As far as I understand, we need this method so that inherited classes can set the command just before executing it. Also, as far as I understand, we do not expose _command (other than for testing purposes). Instead of SetCommand, should we have an abstract CreateCommand method and call it from ExecuteProtocol, so that each inherited class can produce its own command?

Contributor Author

Your comment made me realise we could actually just pass the command to ExecuteProtocol, so I removed the SetCommand method.
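
A minimal sketch of the shape described here, assuming the command is simply handed to ExecuteProtocol as an argument. The class names, the string-based command, and the wire-version threshold are hypothetical simplifications; the real base class and signatures may differ.

```csharp
using System;

// Hypothetical, simplified shapes; not the driver's actual signatures.
abstract class CommandOperationBase
{
    // The command arrives as an argument, so no mutable _command field or setter is needed.
    protected string ExecuteProtocol(string channelName, string command) =>
        $"{channelName} <- {command}";
}

sealed class CountLikeOperation : CommandOperationBase
{
    public string Execute(string channelName, int maxWireVersion)
    {
        // Built at execution time, when connection details are known.
        // The threshold 21 is arbitrary and only there to show a per-connection decision.
        var command = maxWireVersion >= 21 ? "count (new form)" : "count (legacy form)";
        return ExecuteProtocol(channelName, command);
    }
}

static class PassCommandDemo
{
    static void Main() =>
        Console.WriteLine(new CountLikeOperation().Execute("channel-1", 21));
}
```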

server = context.SelectServer(operationContext, deprioritizedServers);
context.AcquireChannel(operationContext);

return operation.ExecuteAttempt(operationContext, context, totalAttempts, transactionNumber: null);
Member

Should we track executionAttempts separately? In the no-CSOT scenario we have only 2 attempts, so if there is an error during server selection or channel acquisition, we might exit after just 1 execution attempt.

Contributor Author

Are we talking about the attempt variable that goes into operation.ExecuteAttempt? If so, that variable is not actually used for read operations. Write operations are different: RetryableWriteOperationExecutor has both totalAttempts and operationExecutionAttempts. We could add it here too for the sake of symmetry, but it would not be used either.

If it's about counting how many attempts we have made, and therefore how many we have left, then as far as I understand from the retryable reads and writes specs, those errors should be counted as well.
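
To make the two counters concrete, here is a small sketch under the reading given in this reply, where selection and acquisition failures consume an attempt. The loop and the `acquireSucceeds`/`executeSucceeds` delegates are hypothetical; only the counter names come from the discussion.

```csharp
using System;

static class AttemptDemo
{
    // Hypothetical loop: the delegates stand in for channel acquisition and the
    // operation attempt, and report success or a retryable failure.
    public static (int totalAttempts, int operationExecutionAttempts) Run(
        Func<int, bool> acquireSucceeds, Func<int, bool> executeSucceeds, int maxAttempts)
    {
        var totalAttempts = 0;
        var operationExecutionAttempts = 0;
        while (totalAttempts < maxAttempts)
        {
            totalAttempts++;                                   // every pass counts
            if (!acquireSucceeds(totalAttempts)) continue;     // failed before executing
            operationExecutionAttempts++;                      // the command was actually sent
            if (executeSucceeds(operationExecutionAttempts)) break;
        }
        return (totalAttempts, operationExecutionAttempts);
    }

    static void Main()
    {
        // First acquisition fails; the second pass succeeds end to end.
        var counts = Run(n => n > 1, _ => true, maxAttempts: 2);
        Console.WriteLine($"{counts.totalAttempts} total, {counts.operationExecutionAttempts} executed");
    }
}
```

With a budget of two attempts, the sketch ends with two total attempts but only one execution, which is the situation raised in the question above.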

@papafe papafe requested a review from sanych-sun March 18, 2026 16:36
  {
      _databaseNamespace = Ensure.IsNotNull(databaseNamespace, nameof(databaseNamespace));
-     _command = Ensure.IsNotNull(command, nameof(command));
+     _command = command; //can be null
Member

I would prefer to get rid of _command altogether, especially now that it is passed as a parameter to ExecuteProtocol.

  var maxBatchCount = Math.Min(MaxBatchCount ?? int.MaxValue, channel.ConnectionDescription.MaxBatchCount);
  var maxDocumentSize = channel.ConnectionDescription.MaxWireDocumentSize;
- var payload = new Type1CommandMessageSection<UpdateRequest>("updates", _updates, UpdateRequestSerializer.Instance, NoOpElementNameValidator.Instance, maxBatchCount, maxDocumentSize);
+ var payload = new Type1CommandMessageSection<UpdateRequest>("updates", updates, UpdateRequestSerializer.Instance, NoOpElementNameValidator.Instance, maxBatchCount, maxDocumentSize);
Member

@sanych-sun sanych-sun Mar 18, 2026

I think we should revert this change too. Again, it clearly looks like an issue, but it could be the reason why we did not run into missing operations on retry. Let's do this in a separate PR/ticket after a deeper investigation.

3 participants