

@wfaderhold21
Collaborator

What

Extends the congestion avoidance already present in the onesided alltoall so that it uses round-trip time (RTT) measurements to skip potentially congested peers and revisit them later.

Why?

The original congestion avoidance computes a number of tokens that limits how many messages are in flight. This works well for medium-sized messages but has limited effect for large messages.

How?

When the token calculation yields fewer than one token, the algorithm splits each message into segments, measures the RTT, and checks whether the RTT exceeds the estimated latency by more than a threshold (25%). If it does, the sender stops sending to the congested rank and moves on to the next available rank. A rank can be skipped only a set number of times; after that, the message is sent to it regardless.

@greptile-apps
Contributor

greptile-apps bot commented Dec 16, 2025

Skipped: This PR does not contain any of your configured labels: (Ready-For-Review)

wfaderhold21 marked this pull request as draft on December 16, 2025, 23:45.