Bugfix: RDMA client cannot fall back to TCP when RDMA is not enabled on the server#3350
Bugfix: RDMA client cannot fall back to TCP when RDMA is not enabled on the server#3350chenBright wants to merge 1 commit into
Conversation
There was a problem hiding this comment.
Pull request overview
This PR fixes a downgrade-path bug where an RDMA-enabled client connecting to a TCP-only server could block waiting for an RDMA server hello and fail to fall back cleanly. It does so by adding a server-side pseudo-protocol that recognizes RDMA handshake magic, drains the client hello, replies an intentionally “un-negotiable” hello to trigger client-side downgrade to TCP on the same connection, then drains the client ACK and returns control to normal protocol parsing.
Changes:
- Add a server-side
rdma_handshakefallback protocol to enable RDMA-client → TCP-server downgrade on the same TCP connection. - Centralize RDMA handshake wire constants into a shared header and update RDMA handshake code/tests to use them.
- Extend
RawPacker/RawUnpackerwith 16-bit and raw-bytes packing helpers and update v2 handshake serialization to use them.
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| test/brpc_rdma_unittest.cpp | Updates tests to use shared handshake constants and verifies RDMA-client → TCP-server calls no longer fail. |
| src/butil/raw_pack.h | Adds pack16/unpack16 and pack/unpack raw-bytes helpers used by handshake serialization. |
| src/brpc/server.cpp | Ensures rdma_handshake isn’t filtered out by enabled_protocols whitelist logic. |
| src/brpc/rdma/rdma_handshake.h | Switches handshake constants to the shared constants header. |
| src/brpc/rdma/rdma_handshake.cpp | Refactors v2 hello (de)serialization and uses shared constants for versions/sizes/magic. |
| src/brpc/rdma/rdma_handshake_constants.h | New shared header for RDMA handshake wire constants (magic, lengths, versions, bounds, ACK format). |
| src/brpc/rdma/rdma_endpoint.cpp | Uses shared constants for magic-length handling in server handshake path. |
| src/brpc/policy/rdma_handshake_fallback_protocol.h | Declares the new server-side RDMA-handshake fallback protocol parser. |
| src/brpc/policy/rdma_handshake_fallback_protocol.cpp | Implements the fallback handshake state machine using socket parsing context. |
| src/brpc/parse_result.h | Adds missing stddef.h include (size_t). |
| src/brpc/options.proto | Introduces PROTOCOL_RDMA_HANDSHAKE and changes protocol numeric ordering. |
| src/brpc/input_messenger.cpp | Removes the old special-case RDMA “TRY_OTHERS” hack in input parsing. |
| src/brpc/global.cpp | Registers the new rdma_handshake protocol at startup. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
4d56434 to
7b86333
Compare
|
Separate out the handshake protocol. After negotiation, can any currently supported protocol be used? |
After negotiation, any protocol currently supported by the RDMA client can be used. |
7b86333 to
52bd3cb
Compare
52bd3cb to
445b7fa
Compare
What problem does this PR solve?
Issue Number: resolve
Problem Summary:
When an RDMA-enabled client connects to a server that supports TCP only,
the client sends its handshake hello ("RDMA"/"RDM3" magic + body) and
then blocks waiting for the server hello. The server does not recognize those
leading bytes as any known protocol and simply closes the connection, so the
client only sees EOF on the socket and has no signal to downgrade to TCP.
What is changed and the side effects?
Changed:
A new server-side-only pseudo-protocol
PROTOCOL_RDMA_HANDSHAKE(registered ahead of
PROTOCOL_HTTPso the protocol selector tries it first)implements a tiny handshake state machine:
Detect: If the first 4 bytes are "RDMA" or "RDM3", enter the fallback path;
otherwise return
TRY_OTHERSand let normal protocol parsing continue.Drain client hello: Read and drain the full v2 or v3 hello.
Reply an un-negotiable hello:
hello_ver = 0xFFFF,which makes the client's
ValidHelloMessage()fail.ValidRdmaHello()becauseblock_size = 0 < MIN_BLOCK_SIZEandqp_num = 0.negotiated = falseand downgrades to TCP on thesame connection rather than erroring at the IO layer.
expecting the 4B handshake ACK; the next call drops those bytes, clears the
context, and returns
TRY_OTHERSso the subsequent real RPC payload goesthrough normal parsing.
Side effects:
Performance effects:
Breaking backward compatibility:
Check List: