
Reinforcement Learning RFC#69

Open
l45k wants to merge 1 commit into alpha from leo/rl_rfc


@l45k (Contributor) commented Oct 1, 2025

This RFC proposes how Hypha can be improved in order to support Reinforcement Learning.

@codecov bot commented Oct 1, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.


Comment on lines +70 to +72
The second requirement will be satisfied by improving the Scheduler to redirect Worker requests for data to
different RL Data Nodes. Thus, the Scheduler needs to balance latency between RL Data Nodes and Workers as
well as sampling and processing speed.
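A minimal sketch of the balancing described above, assuming the Scheduler keeps per-link latency and bandwidth estimates and a per-node sampling rate (with processing folded into that rate). `DataNode`, `Link`, `delivery_time`, and `route` are hypothetical names, not part of Hypha. For each candidate RL Data Node, the sketch estimates how long a batch would take to reach the requesting Worker and greedily routes the request to the fastest one.

```python
from dataclasses import dataclass


@dataclass
class DataNode:
    name: str
    sample_rate: float  # bytes/s of rollouts the node can produce (assumed metric)
    assigned: int = 0   # Workers currently routed to this node


@dataclass
class Link:
    latency_s: float    # Worker <-> data node latency in seconds (assumed metric)
    bandwidth: float    # bytes/s the link sustains (assumed metric)


def delivery_time(node: DataNode, link: Link, batch_bytes: int) -> float:
    """Estimated seconds until a batch of `batch_bytes` reaches the Worker."""
    # The node's sampling capacity is shared by the Workers already assigned plus this one.
    share = node.sample_rate / (node.assigned + 1)
    # The effective rate is capped by the slower of the link and the sampler.
    rate = min(link.bandwidth, share)
    return link.latency_s + batch_bytes / rate


def route(worker: str, nodes: list[DataNode], links: dict[tuple[str, str], Link],
          batch_bytes: int) -> str:
    """Greedily send `worker` to the data node with the lowest estimated delivery time."""
    if not nodes:
        raise ValueError("no RL data nodes available")
    best = min(nodes, key=lambda n: delivery_time(n, links[(worker, n.name)], batch_bytes))
    best.assigned += 1
    return best.name
```

Dividing the sampling rate by the number of assigned Workers keeps the greedy choice from piling every Worker onto the single fastest node. A real scheduler would also have to decrement `assigned` when Workers leave and refresh the latency and bandwidth estimates over time.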
Contributor

I wonder whether we should model this not in the scheduler but via the different connectors/bridges, much like the stochastic wiring described in the SWARM learning paper. We already have a many-reference with different selection strategies that lets us point one worker at many data nodes, so we only need to extend it with a strategy that considers connection and delivery speed (latency, bandwidth, generation rate) to optimally connect workers with data nodes.
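For comparison, a rough sketch of the selection strategy this comment suggests: each Worker picks a data node at random, weighted by its estimated delivery rate. This is only loosely in the spirit of SWARM's stochastic wiring, not a reproduction of it; `throughput` and `pick_data_node` are hypothetical helpers, and the latency, bandwidth, and generation-rate inputs are assumed measurements.

```python
from __future__ import annotations

import random


def throughput(latency_s: float, bandwidth: float, generation_rate: float,
               batch_bytes: int) -> float:
    """Estimated batches per second from one data node (hypothetical model).

    The usable rate is capped by the slower of link bandwidth and the node's
    generation rate (both bytes/s); latency is paid once per batch.
    """
    rate = min(bandwidth, generation_rate)
    return 1.0 / (latency_s + batch_bytes / rate)


def pick_data_node(estimates: dict[str, float], rng: random.Random) -> str:
    """Choose a node with probability proportional to its estimated throughput."""
    names = list(estimates)
    weights = [estimates[n] for n in names]
    if not names or sum(weights) <= 0:
        raise ValueError("need at least one data node with positive throughput")
    return rng.choices(names, weights=weights, k=1)[0]
```

Randomizing the choice spreads Workers across nodes without any component needing a global view of load, which is the trade-off against the centralized Scheduler approach discussed in this thread.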

Contributor Author

I think this would be possible. However, I would like to avoid mixing concepts. We decided to go with DiLoCo and a centralized scheduler. If we now start to loosen this by introducing a form of decentralized scheduling, it will complicate things more than it will help.

Contributor

Well, it would be super interesting to benchmark one approach against the other, no matter which one we use as the standard going forward.

Contributor Author

I fully agree. But I would rather have a working baseline and start improving from there.

Contributor

Yeah, let's start with the scheduler approach, then.

@orlandohohmeier (Contributor)

We should probably do #80 first.
