feat(grpc): Implement PickFirst load balancer#2570
…ss, and updated sync testing framework
…A61 endpoint handling
This commit implements the PickFirst load balancer policy for Tonic gRPC, focusing on:
- Efficient subchannel management with backoff preservation.
- "Stickiness" support: continuing to use an existing Ready subchannel if it remains in resolver updates.
- Compliance with gRFC A61: endpoints are now shuffled before being flattened into an address list, ensuring multiple addresses for a single endpoint (e.g., IPv4/IPv6) stay together.
- Clean state reset: subchannels and selected state are now cleared when receiving an empty address list.
- Alignment with the updated synchronous testing framework in master.
Includes comprehensive test coverage for basic connection, failover, stickiness, exhaustion, deterministic endpoint shuffling, de-duplication, and empty updates.
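The shuffle-then-flatten ordering described in this commit can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the `Endpoint` struct, both function names, and the caller-supplied random source are hypothetical, chosen so the example needs only the standard library and stays deterministic under test.

```rust
use std::collections::HashSet;
use std::net::SocketAddr;

/// Hypothetical stand-in for a resolver endpoint: one logical backend that
/// may be reachable at several addresses (for example IPv4 and IPv6).
#[derive(Clone, Debug, PartialEq)]
pub struct Endpoint {
    pub addresses: Vec<SocketAddr>,
}

/// Fisher-Yates shuffle driven by a caller-supplied random source, so a test
/// can pass a fixed sequence and get a deterministic order.
pub fn shuffle_endpoints(endpoints: &mut [Endpoint], mut next_random: impl FnMut() -> u64) {
    for i in (1..endpoints.len()).rev() {
        let j = (next_random() % (i as u64 + 1)) as usize;
        endpoints.swap(i, j);
    }
}

/// Flattens endpoints into one address list. Addresses of the same endpoint
/// stay adjacent, and duplicates are dropped (the first occurrence wins).
pub fn flatten_endpoints(endpoints: &[Endpoint]) -> Vec<SocketAddr> {
    let mut seen = HashSet::new();
    let mut flat = Vec::new();
    for endpoint in endpoints {
        for addr in &endpoint.addresses {
            if seen.insert(*addr) {
                flat.push(*addr);
            }
        }
    }
    flat
}
```

Shuffling whole endpoints first and flattening afterwards is what keeps the IPv4 and IPv6 addresses of one backend next to each other; shuffling the flat list would scatter them.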
…active failover
This change enhances the PickFirst load balancing policy to better support gRFC A61 (Happy Eyeballs) and improve connection establishment latency.
Key changes:
- Implement IPv6/IPv4 address interleaving in `compile_address` to ensure subsequent connection attempts alternate between protocol families.
- Introduce a `subchannel_states` cache in `PickFirstPolicy` to track the connectivity status of managed subchannels.
- Refactor connection logic to use a `frontier_index` and proactively skip subchannels known to be in `TransientFailure` (e.g., during backoff).
- Update `advance_frontier` to safely maintain the index within the bounds of the address list, ensuring the policy remains reactive to recovery.
- Add deterministic unit tests for shuffling and interleaving logic.
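As a rough illustration of the two ideas in this commit, the sketch below interleaves address families and then finds the next index worth a connection attempt. It is not the PR's `compile_address` or `advance_frontier`: the function names, the `ConnectivityState` enum, and the choice to lead with the family of the first address are assumptions made for the example.

```rust
use std::net::SocketAddr;

/// Interleaves addresses so consecutive entries alternate between address
/// families where possible. The family of the first address leads (an
/// assumption of this sketch), and order within each family is preserved.
pub fn interleave_families(addresses: &[SocketAddr]) -> Vec<SocketAddr> {
    let Some(first) = addresses.first() else {
        return Vec::new();
    };
    let first_is_v6 = first.is_ipv6();
    let (primary, secondary): (Vec<SocketAddr>, Vec<SocketAddr>) = addresses
        .iter()
        .copied()
        .partition(|a| a.is_ipv6() == first_is_v6);

    let mut out = Vec::with_capacity(addresses.len());
    let mut p = primary.into_iter();
    let mut s = secondary.into_iter();
    loop {
        match (p.next(), s.next()) {
            (None, None) => break,
            // Push whichever of the pair exists, primary family first.
            (a, b) => out.extend(a.into_iter().chain(b)),
        }
    }
    out
}

/// Hypothetical connectivity states, mirroring the names used in the commit.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum ConnectivityState {
    Idle,
    Connecting,
    Ready,
    TransientFailure,
}

/// Returns the first index at or after `frontier` whose cached state is not
/// `TransientFailure`, or `None` if every remaining subchannel is failing.
/// An out-of-range `frontier` yields `None` rather than panicking.
pub fn next_attempt_index(states: &[ConnectivityState], frontier: usize) -> Option<usize> {
    (frontier..states.len()).find(|&i| states[i] != ConnectivityState::TransientFailure)
}
```

Skipping cached failures avoids spending a connection attempt on a subchannel that is still in backoff, while the bounds-safe index keeps the scan well defined once the list is exhausted.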
…into refactor/AddressHashing
…stLB # Conflicts: # grpc/Cargo.toml
Note that this now includes #2631, which will merge first so as to keep a clearer record. However, I needed the fix to get the tests ported from Go to work.
// Should NOT have any more events (no Connect, no UpdatePicker),
// because it stuck to the original selected subchannel.
std::thread::sleep(Duration::from_millis(50));
We should use `tokio::time::timeout` or `tokio::time::sleep` to avoid blocking the entire thread. The same applies to the other tests.
Good catch. By default, `#[tokio::test]` uses a current-thread runtime, so while the thread is blocked the code under test cannot run, and the check can never fail. (https://docs.rs/tokio/latest/tokio/attr.test.html#current-thread-runtime)
I've replaced everything to use helpers to pull from the channel.
This might be less perfect for test cases that know a call into the policy should trigger an event inline, but I think it's OK. Let me know what you think.
> This might be less perfect for test cases that know a call into the policy should trigger an event inline.
Looks good to me. I didn't notice that the mpsc channel was not an async channel, so I didn't consider the result of calling recv.
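For readers following the thread, helpers that pull from the channel instead of sleeping might look roughly like this. It is a guess at their shape, assuming the test events arrive on a `std::sync::mpsc` channel as the comment above suggests; the names `expect_event` and `expect_no_event` are hypothetical and the PR's actual helpers may differ.

```rust
use std::fmt::Debug;
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::Duration;

/// Waits up to `timeout` for the next event and panics if none arrives.
/// Returns as soon as an event is available instead of sleeping a fixed time.
pub fn expect_event<T>(rx: &Receiver<T>, timeout: Duration) -> T {
    rx.recv_timeout(timeout)
        .expect("expected an event before the timeout")
}

/// Asserts that no event arrives within `timeout`. A disconnected channel
/// also counts as "no event", since nothing more can be delivered.
pub fn expect_no_event<T: Debug>(rx: &Receiver<T>, timeout: Duration) {
    match rx.recv_timeout(timeout) {
        Err(RecvTimeoutError::Timeout) | Err(RecvTimeoutError::Disconnected) => {}
        Ok(event) => panic!("unexpected event: {event:?}"),
    }
}
```

Note that `recv_timeout` still blocks the calling thread for up to `timeout`, so a sketch like this only suits tests where the event is produced inline or from another thread.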
arjan-bal left a comment
Non-test changes look good. Left some minor comments on the test code.
Motivation
This PR provides a full implementation of the PickFirst load balancer, including the Happy Eyeballs features.
Solution
The policy picks the first available endpoint to connect to and, if configured, stays on it across endpoint updates (stickiness). It also handles accepting new LB configuration and reconstructing subchannels.
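The stickiness decision on a resolver update can be sketched as below. This is a simplified model under stated assumptions, not the policy code in this PR: `UpdateOutcome` and `apply_update` are hypothetical names, and a subchannel is identified here purely by its socket address.

```rust
use std::net::SocketAddr;

/// Hypothetical outcome of applying a resolver update to the policy.
#[derive(Debug, PartialEq)]
pub enum UpdateOutcome {
    /// The currently Ready subchannel's address is still listed: keep it.
    KeepSelected(SocketAddr),
    /// The selection is gone (or there was none): start a new connection pass.
    Reconnect,
    /// The update was empty: clear subchannels and the selected state.
    Reset,
}

/// Decides what to do with a new address list given the address of the
/// currently selected Ready subchannel, if any.
pub fn apply_update(selected: Option<SocketAddr>, new_addresses: &[SocketAddr]) -> UpdateOutcome {
    if new_addresses.is_empty() {
        return UpdateOutcome::Reset;
    }
    match selected {
        Some(addr) if new_addresses.contains(&addr) => UpdateOutcome::KeepSelected(addr),
        _ => UpdateOutcome::Reconnect,
    }
}
```

Keeping the selected subchannel when it survives the update avoids tearing down a healthy connection just because the resolver reordered or extended the list.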
Prototype is at https://github.com/nathanielford/grpc-rust-testbed/tree/main/pick_first_lib
Notes