Conversation
This RFC proposes how Hypha can be improved to support Reinforcement Learning.
Codecov Report: ✅ All modified and coverable lines are covered by tests.
> The second requirement will be satisfied by improving the Scheduler to redirect Worker requests for data to different RL Data Nodes. Thus, the Scheduler needs to balance latency between RL Data Nodes and Workers as well as sampling and processing speed.
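To make the balancing idea above concrete, here is a minimal sketch of how a scheduler could score RL Data Nodes for a given Worker by mixing network latency with each node's sampling backlog. All names here (`DataNode`, `pick_data_node`, the latency and throughput fields, and the `alpha` weight) are hypothetical illustrations, not existing Hypha APIs:

```python
from dataclasses import dataclass, field

@dataclass
class DataNode:
    """Hypothetical view of an RL Data Node as the Scheduler might see it."""
    name: str
    latency_ms: dict = field(default_factory=dict)  # worker id -> smoothed RTT in ms
    pending_requests: int = 0                        # queued sample requests
    samples_per_sec: float = 1.0                     # measured sampling/processing rate

def pick_data_node(worker_id, data_nodes, alpha=0.5):
    """Pick the node with the best weighted mix of latency to the worker
    and current backlog (pending requests / throughput). Lower is better;
    alpha trades off network latency against sampling speed."""
    def score(node):
        latency = node.latency_ms[worker_id]
        backlog = node.pending_requests / max(node.samples_per_sec, 1e-9)
        return alpha * latency + (1 - alpha) * backlog
    return min(data_nodes, key=score)

nodes = [
    DataNode("dn-a", {"w1": 5.0}, pending_requests=100, samples_per_sec=10.0),
    DataNode("dn-b", {"w1": 50.0}, pending_requests=0, samples_per_sec=10.0),
]
# dn-a is close but backlogged; dn-b is idle but far away.
print(pick_data_node("w1", nodes).name)
```

With the numbers above, dn-a scores 0.5·5 + 0.5·10 = 12.5 against dn-b's 25, so the nearby-but-busy node still wins; shifting `alpha` toward 0 would favor the idle node instead.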
I wonder whether we should model this not in the scheduler but via the different connectors/bridges, much like the stochastic wiring described in the SWARM learning paper. We already have many-to-many references and different selection strategies that allow us to point from one worker to many data nodes; we would only need to extend this with a strategy that considers connection and delivery speed (latency, bandwidth, generation) to optimally connect workers with data nodes.
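For illustration, a latency-aware stochastic selection strategy in that spirit could look roughly like this. This is a sketch only; `Link`, `stochastic_wire`, and the latency field are invented names, and the inverse-latency weighting is just one plausible choice, not the SWARM paper's exact scheme:

```python
import random
from dataclasses import dataclass

@dataclass
class Link:
    """Hypothetical worker-to-data-node connection with a measured latency."""
    name: str
    latency_ms: float

def stochastic_wire(links, rng=random):
    """Sample a data node with probability inversely proportional to link
    latency: fast links are chosen more often, but slow links still get
    traffic occasionally, so no node is permanently starved."""
    weights = [1.0 / max(link.latency_ms, 1e-9) for link in links]
    return rng.choices(links, weights=weights, k=1)[0]

links = [Link("fast", 5.0), Link("slow", 50.0)]
rng = random.Random(0)
picks = [stochastic_wire(links, rng).name for _ in range(1000)]
print(picks.count("fast"), picks.count("slow"))
```

With a 10:1 latency ratio the fast link receives roughly ten times the traffic, while the slow link still sees a trickle of requests that keeps its latency estimate fresh.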
I think this would be possible. However, I would like to avoid mixing concepts. We decided to go with DiLoCo and a centralized scheduler. If we now start to loosen this by introducing a form of decentralized scheduling, it will complicate things more than it will help.
Well, it would be super interesting to benchmark one approach against the other, no matter which one we adopt as the standard moving forward.
I fully agree, but I would rather have a working baseline first and start improving from there.
Yeah, let's start with the scheduler approach then.
We should probably do #80 first.