Streamline distribution of distributed messages to children #888

@jpdillingham

The logic to forward/distribute messages to distributed children is CPU-intensive, or at least its cost scales linearly with the number of children. This is somewhat unavoidable, but I want to make sure that this path, which is the hottest path in this software, has been optimized to a reasonable extent.

There are two candidates for optimization: DistributedConnectionManager.BroadcastMessageAsync and Connection.WriteAsync.

The role of BroadcastMessageAsync is to call WriteAsync on each of the child connections. Some things to consider:

  • The ChildConnectionDictionary must be iterated and the Connection objects pulled out of their Lazy containers. A dedicated cache of connections could speed this up, but some benchmarking is needed first to confirm that it actually helps.
  • There's a Task.WhenAll that waits for all of the writes; this could possibly be omitted to avoid congestion in this method, but latency reporting would need to be refactored if this is done. I don't think the average latency metric is very useful anyway.
  • Other than these two things, I don't see a lot of opportunity for optimization.
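
To make the trade-off around Task.WhenAll concrete, here is a minimal sketch of the two broadcast shapes discussed above. It is not the library's code: IChildConnection, RecordingConnection, BroadcastSketch, and the dictionary's exact type are assumptions made up for illustration, based only on the description of connections held in Lazy containers.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for the library's connection type.
public interface IChildConnection
{
    Task WriteAsync(byte[] bytes, CancellationToken cancellationToken);
}

// Hypothetical in-memory connection, used only to exercise the sketch.
public sealed class RecordingConnection : IChildConnection
{
    public int BytesWritten;

    public Task WriteAsync(byte[] bytes, CancellationToken cancellationToken)
    {
        Interlocked.Add(ref BytesWritten, bytes.Length);
        return Task.CompletedTask;
    }
}

public sealed class BroadcastSketch
{
    // Assumed shape: each child connection sits behind a Lazy container.
    private readonly ConcurrentDictionary<string, Lazy<Task<IChildConnection>>> children =
        new ConcurrentDictionary<string, Lazy<Task<IChildConnection>>>();

    public void Add(string username, IChildConnection connection) =>
        children[username] = new Lazy<Task<IChildConnection>>(() => Task.FromResult(connection));

    // Variant A: wait for every write to finish, which is what makes
    // per-broadcast latency measurable.
    public async Task BroadcastAndWaitAsync(byte[] bytes, CancellationToken cancellationToken = default)
    {
        var writes = new List<Task>();

        foreach (var child in children.Values)
        {
            writes.Add(WriteToChildAsync(child, bytes, cancellationToken));
        }

        await Task.WhenAll(writes).ConfigureAwait(false);
    }

    // Variant B: start the writes and return immediately. The caller no longer
    // knows when the writes complete, so latency would have to be reported elsewhere.
    public void BroadcastWithoutWaiting(byte[] bytes, CancellationToken cancellationToken = default)
    {
        foreach (var child in children.Values)
        {
            _ = WriteToChildAsync(child, bytes, cancellationToken);
        }
    }

    private static async Task WriteToChildAsync(
        Lazy<Task<IChildConnection>> child, byte[] bytes, CancellationToken cancellationToken)
    {
        try
        {
            var connection = await child.Value.ConfigureAwait(false);
            await connection.WriteAsync(bytes, cancellationToken).ConfigureAwait(false);
        }
        catch (Exception)
        {
            // One failed child should not fail the whole broadcast; real code
            // would presumably log this and/or drop the connection.
        }
    }
}
```

Variant B trades observability for throughput: nothing in the calling method can time the writes any more, which is why the latency reporting would need to move if the wait is dropped.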

Regarding WriteAsync:

  • The overload that accepts byte[] creates a MemoryStream and passes it along to another method that accepts a stream.
  • The WriteInternalAsync method then reads data from the stream and sends it to Stream.WriteAsync as either a byte[] or a ReadOnlyMemory wrapper around byte[].
  • The ReadOnlyMemory wrapper is there to avoid allocations under the hood; it should stay.

The biggest opportunity here is in avoiding the ceremony within WriteInternalAsync; this should be exposed as something like WriteAllAsync() that accepts a ReadOnlyMemory and writes it directly to the connection while raising the necessary events and calling ResetInactivityTime(). Additional preprocessor directives will need to be added, probably within the calling BroadcastMessageAsync, because targets below .NET Standard 2.1 don't ship ReadOnlyMemory (or the Stream.WriteAsync overload that accepts it) in the box.
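
A rough sketch of what such a WriteAllAsync() could look like, under stated assumptions: ConnectionSketch, the DataWritten event, and LastActivityUtc are hypothetical names, and the body of ResetInactivityTime() is a placeholder. Only the two Stream.WriteAsync overloads and the NETSTANDARD2_0 symbol (defined by the SDK when targeting netstandard2.0) are real; everything else would need to be mapped onto the actual Connection class.

```csharp
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for Connection; only the write path is sketched.
public sealed class ConnectionSketch
{
    private readonly Stream stream;

    public ConnectionSketch(Stream stream) => this.stream = stream;

    // Hypothetical event carrying the number of bytes just written.
    public event Action<int> DataWritten;

    public DateTime LastActivityUtc { get; private set; } = DateTime.UtcNow;

#if NETSTANDARD2_0
    // Older target: Stream has no ReadOnlyMemory overload, so take the array as-is.
    public async Task WriteAllAsync(byte[] bytes, CancellationToken cancellationToken = default)
    {
        await stream.WriteAsync(bytes, 0, bytes.Length, cancellationToken).ConfigureAwait(false);
        AfterWrite(bytes.Length);
    }
#else
    // Newer targets: hand the memory straight to the stream, with no
    // intermediate MemoryStream, no rented buffer, and no read loop.
    public async Task WriteAllAsync(ReadOnlyMemory<byte> bytes, CancellationToken cancellationToken = default)
    {
        await stream.WriteAsync(bytes, cancellationToken).ConfigureAwait(false);
        AfterWrite(bytes.Length);
    }
#endif

    private void AfterWrite(int count)
    {
        DataWritten?.Invoke(count);
        ResetInactivityTime();
    }

    // Placeholder body; the real method presumably resets an inactivity timer.
    private void ResetInactivityTime() => LastActivityUtc = DateTime.UtcNow;
}
```

A caller holding a byte[] can pass it to either overload unchanged, since byte[] converts implicitly to ReadOnlyMemory<byte>; that might let the directive live inside the connection rather than in BroadcastMessageAsync, though whether that fits the existing code is untested.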

The optimization described above will eliminate a couple of allocations, a buffer rental from ArrayPool, and a few frames from the stack. It may not be worth it.

Labels: refactor (Internal refactoring/code improvements)
