Streamline distribution of distributed messages to children #888

The logic that forwards (distributes) messages to distributed children is CPU intensive, or at least scales linearly with the number of children. This is somewhat unavoidable, but it is the hottest path in this software, and I want to make sure it has been optimized to a reasonable extent.

There are two candidates for optimization: `DistributedConnectionManager.BroadcastMessageAsync` and `Connection.WriteAsync`.

The role of `BroadcastMessageAsync` is to call `WriteAsync` on each of the child connections. Some things to consider:
* The `ChildConnectionDictionary` must be iterated and the `Connection` objects pulled out of their `Lazy` containers. A dedicated cache of connections might make this cheaper, but some benchmarking is necessary first to confirm that it actually helps.
* There's a `Task.WhenAll` that waits for all of the writes. It could possibly be omitted to avoid congestion in this method, but latency reporting would need to be refactored if so. I don't think the average latency metric is very useful.
* Other than these two things, I don't see a lot of opportunity for optimization.
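
To make the two considerations above concrete, here is a minimal sketch of the broadcast shape, not the library's actual code. `IChildConnection`, `CountingConnection`, and `BroadcastSketch` are hypothetical stand-ins, and the array snapshot in the second overload is an assumed form of the "dedicated cache" idea that would still need benchmarking.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for the library's Connection type.
public interface IChildConnection
{
    Task WriteAsync(byte[] bytes);
}

// In-memory fake used only to exercise the sketch.
public sealed class CountingConnection : IChildConnection
{
    public int Writes;

    public Task WriteAsync(byte[] bytes)
    {
        Interlocked.Increment(ref Writes);
        return Task.CompletedTask;
    }
}

public static class BroadcastSketch
{
    // Described shape: enumerate the dictionary, unwrap each Lazy,
    // start every write, then await them all with Task.WhenAll.
    public static Task BroadcastAsync(
        ConcurrentDictionary<string, Lazy<IChildConnection>> children,
        byte[] message)
    {
        var writes = new List<Task>();

        foreach (var entry in children)
        {
            writes.Add(entry.Value.Value.WriteAsync(message));
        }

        return Task.WhenAll(writes);
    }

    // Candidate optimization (assumption): iterate a pre-built array of
    // connections so the hot path skips dictionary enumeration and Lazy unwrapping.
    public static Task BroadcastAsync(IChildConnection[] snapshot, byte[] message)
    {
        var writes = new Task[snapshot.Length];

        for (int i = 0; i < snapshot.Length; i++)
        {
            writes[i] = snapshot[i].WriteAsync(message);
        }

        return Task.WhenAll(writes);
    }
}
```

The snapshot would have to be rebuilt whenever a child is added or removed, so whether it pays off depends on how often the child set changes relative to how often messages are broadcast.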

Regarding `WriteAsync`:

* The overload that accepts `byte[]` creates a `MemoryStream` and passes it along to another method that accepts a stream
* The `WriteInternalAsync` method then reads data from the stream and sends it to `Stream.WriteAsync` as either a `byte[]` or a `ReadOnlyMemory` wrapper around `byte[]`
* `ReadOnlyMemory` is there to avoid allocations under the hood; it should stay
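
For reference, the write path described in the list above can be sketched roughly as follows. This is an illustration under assumptions, not the real `Connection` code: the class name, the 16384-byte buffer size, and the exact loop are hypothetical, and events and inactivity handling are left out.

```csharp
using System;
using System.Buffers;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

public static class CurrentWritePathSketch
{
    // byte[] overload: wraps the payload in a MemoryStream (one allocation
    // per call) and hands it to the stream-based method.
    public static async Task WriteAsync(
        Stream destination, byte[] bytes, CancellationToken ct = default)
    {
        using (var source = new MemoryStream(bytes, writable: false))
        {
            await WriteInternalAsync(destination, source, bytes.Length, ct)
                .ConfigureAwait(false);
        }
    }

    // Stream-based method: copies from source to destination in chunks
    // through a buffer rented from ArrayPool (buffer size is an assumption).
    private static async Task WriteInternalAsync(
        Stream destination, Stream source, long length, CancellationToken ct)
    {
        var buffer = ArrayPool<byte>.Shared.Rent(16384);

        try
        {
            long remaining = length;

            while (remaining > 0)
            {
                int toRead = (int)Math.Min(buffer.Length, remaining);
                int read = await source.ReadAsync(buffer, 0, toRead, ct).ConfigureAwait(false);

                if (read == 0)
                {
                    throw new EndOfStreamException();
                }

                await destination.WriteAsync(buffer, 0, read, ct).ConfigureAwait(false);
                remaining -= read;
            }
        }
        finally
        {
            ArrayPool<byte>.Shared.Return(buffer);
        }
    }
}
```

For a message that is already fully in memory, the `MemoryStream`, the rented buffer, and the copy loop all exist only to move bytes from one array into another before they reach the socket.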

The biggest opportunity here is avoiding the ceremony within `WriteInternalAsync`. The direct path should be exposed as something like `WriteAllAsync()`, which accepts a `ReadOnlyMemory` and writes it straight to the connection while raising the necessary events and calling `ResetInactivityTime()`. Additional preprocessor directives will need to be added, probably within the calling `BroadcastMessageAsync`, because .NET Standard 2.0 and earlier don't include `ReadOnlyMemory` out of the box.
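
A rough sketch of what such a direct write could look like, assuming a wrapper around a `Stream`. Everything here except `ResetInactivityTime` and the `WriteAllAsync` name is hypothetical (the class, the `DataWritten` event, the `LastActivityUtc` property), and the `NETSTANDARD2_0` symbol is the one the SDK defines for that target; the real directive placement may differ.

```csharp
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

public sealed class DirectWriteSketch
{
    private readonly Stream stream;

    public DirectWriteSketch(Stream stream)
    {
        this.stream = stream;
    }

    // Hypothetical event; the argument is the number of bytes written.
    public event EventHandler<int> DataWritten;

    public DateTime LastActivityUtc { get; private set; }

#if NETSTANDARD2_0
    // Fallback for targets without the ReadOnlyMemory-based Stream overload.
    public async Task WriteAllAsync(byte[] bytes, CancellationToken ct = default)
    {
        await stream.WriteAsync(bytes, 0, bytes.Length, ct).ConfigureAwait(false);
        OnWritten(bytes.Length);
    }
#else
    // Writes the whole payload in one call: no MemoryStream, no rented buffer.
    public async Task WriteAllAsync(ReadOnlyMemory<byte> bytes, CancellationToken ct = default)
    {
        await stream.WriteAsync(bytes, ct).ConfigureAwait(false);
        OnWritten(bytes.Length);
    }
#endif

    private void OnWritten(int count)
    {
        ResetInactivityTime();
        DataWritten?.Invoke(this, count);
    }

    private void ResetInactivityTime()
    {
        LastActivityUtc = DateTime.UtcNow;
    }
}
```

A caller holding a `byte[]` can pass it to either variant, since arrays convert implicitly to `ReadOnlyMemory<byte>`; that may reduce how many directives the calling code actually needs.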

The optimization described above would eliminate a couple of allocations, the `ArrayPool` rental, and a few stack frames. It may not be worth the effort.
