perftest: Fix double frees during RDMA CM connection retry #368
Conversation
Force-pushed from 3f0bf12 to 3cf9835
Hi @SherrinZhou, thanks for the contribution. I did some tests and hit a segfault that does not reproduce on master.
rdma_destroy_event_channel(ctx->cma_master.channel);
if (ctx->cma_master.rai) {
	rdma_freeaddrinfo(ctx->cma_master.rai);

int connected_count = 0;
for (i = 0; i < user_param->num_of_qps; i++) {
	if (ctx->cma_master.nodes[i].connected) {
		connected_count++;
	}
}

free(ctx->cma_master.nodes);
Why were these removed? We still need them in order to clean up resources for the non-retry flows.
Regarding the cleanup of cma_master.channel, cma_master.rai, and the nodes array: they were removed from rdma_cm_destroy_cma() because they must be preserved to keep the global RDMA CM context alive during connection retries. Following your suggestion, I have introduced a new function, rdma_cm_destroy_master(struct pingpong_context *ctx), to manage the cleanup of these global resources. It is now called explicitly after rdma_cm_destroy_cma() inside destroy_ctx() and in the create_rdma_cm_connection() error path, so the non-retry flows still clean up properly.
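For reference, a minimal sketch of what rdma_cm_destroy_master could look like, assuming the cma_master fields quoted above (channel, rai, nodes) and a void return; the actual implementation in the patch may differ:

/* Sketch: release the global RDMA CM resources that must stay alive
 * across retries, and clear the pointers so later cleanup paths do not
 * free them twice. */
static void rdma_cm_destroy_master(struct pingpong_context *ctx)
{
	if (ctx->cma_master.channel) {
		rdma_destroy_event_channel(ctx->cma_master.channel);
		ctx->cma_master.channel = NULL;
	}
	if (ctx->cma_master.rai) {
		rdma_freeaddrinfo(ctx->cma_master.rai);
		ctx->cma_master.rai = NULL;
	}
	if (ctx->cma_master.nodes) {
		free(ctx->cma_master.nodes);
		ctx->cma_master.nodes = NULL;
	}
}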
I would suggest to create a new function to handle this cleanup.
Force-pushed from 48a7102 to 426759f
On top of the aforementioned fixes, and upon further review, I removed the flawed connection_index++ logic in rdma_cm_route_handler(). In the original code, the client relied on ctx->cma_master.connection_index to track which QP was being initialized during the RDMA_CM_EVENT_ROUTE_RESOLVED event, and it unconditionally incremented this counter for every resolved route, letting the index drift out of bounds during sequential retries.

To fix this, the client abandons the sequential counter. Since every cma_id is pre-allocated and statically bound to a specific node index inside rdma_cm_allocate_nodes(), rdma_cm_route_handler() now reverse-maps the incoming cma_id to find its true owner, keeping state strictly aligned:

for (int i = 0; i < user_param->num_of_qps; i++) {
	if (ctx->cma_master.nodes[i].cma_id == cma_id) {
		connection_index = i;
		break;
	}
}

As for the server side, it is not affected. The server is strictly passive: its connection_index only increments when a valid CONNECT_REQUEST actually arrives. If a client's request is lost or rejected, the server's index does not advance; it simply waits for the next valid request to fill its current empty slot. Because perftest configures all QPs symmetrically (same depth, same size, same attributes), it is fundamentally irrelevant whether the client's QP[1] connects to the server's QP[1] or QP[0]. The out-of-band data (remote keys, addresses, and QP numbers) is exchanged via ctx_hand_shake() after all RDMA CM connections are fully established; the handshake iterates over the successfully paired connections, ensuring the matching QPs exchange the correct keys. Thus, allowing the client to independently retry its specific failed QPs while the server sequentially accepts incoming requests guarantees both memory safety and robust connection recovery.
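For completeness, the same lookup can be wrapped in a small helper with a not-found guard; this is a sketch under the assumption that returning -1 for an unknown cma_id is acceptable to the callers, and the helper name is illustrative:

/* Sketch: reverse-map an incoming cma_id to its pre-allocated node slot. */
static int rdma_cm_find_node_index(struct pingpong_context *ctx,
				   struct perftest_parameters *user_param,
				   struct rdma_cm_id *cma_id)
{
	int i;

	for (i = 0; i < user_param->num_of_qps; i++) {
		if (ctx->cma_master.nodes[i].cma_id == cma_id)
			return i;
	}
	return -1;   /* stale or foreign cma_id: let the caller fail the event */
}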
Hi @sshaulnv,
@SherrinZhou, I tried the new patch, and it got stuck after the result print, waiting for an event:
When running perftest with RDMA CM enabled (-R), if the server rejects a connection request (RDMA_CM_EVENT_REJECTED), the client enters a retry loop. The initial retry logic suffered from severe state desynchronization and memory corruption during resource teardown. The following issues were identified and resolved:

1. Client State Desynchronization & Out-of-Bounds: The client's `rdma_cm_route_handler` previously used `connection_index++` to track QP creation. During a retry, a failed node would cause the index to increment anyway, eventually overflowing the `nodes` array and writing resources into incorrect slots. Fix: Replaced the sequential counter with explicit `cma_id` reverse mapping. The client now accurately identifies the node index bound to the `cma_id`, keeping state strictly aligned even after multiple retries.

2. Delayed Connection State Commitment: Nodes were marked as `connected = 1` too early (during route resolution), shielding them from proper cleanup if they failed during the final connect/accept phase. Fix: Moved the `connected = 1` assignment exclusively to the `rdma_cm_connection_established_handler` (see the sketch below).

Signed-off-by: Ruizhe Zhou <zhouruizhe@resnics.com>
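A minimal sketch of the second fix, assuming an established-event handler with a signature similar to the request handler quoted later in this thread and reusing the lookup sketch above; the real handler in the patch likely does more (bookkeeping of established connections, error reporting):

/* Sketch: commit connected = 1 only when the CM reports the connection
 * as established, so half-connected nodes remain eligible for cleanup
 * and retry. Parameter list and return convention are assumptions. */
static int rdma_cm_connection_established_handler(struct pingpong_context *ctx,
						   struct perftest_parameters *user_param,
						   struct rdma_cm_id *cma_id)
{
	int index = rdma_cm_find_node_index(ctx, user_param, cma_id);

	if (index < 0)
		return FAILURE;

	ctx->cma_master.nodes[index].connected = 1;
	return SUCCESS;
}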
Force-pushed from 426759f to 3d9d8d0
@SherrinZhou, thanks for the great effort, merged!
Hi @sshaulnv. I have amended the code, using the following injection in rdma_cm_connection_request_handler() to intentionally reject requests and exercise the retry path:

int rdma_cm_connection_request_handler(struct pingpong_context *ctx,
				       struct perftest_parameters *user_param,
				       struct rdma_cm_event *event, struct rdma_cm_id *cma_id)
{
	int rc, connection_index;
	char *error_message = "";
	struct cma_node *cm_node;
	struct rdma_conn_param conn_param;
	struct ibv_qp_attr rtr_attr = {
		.min_rnr_timer = MIN_RNR_TIMER,
	};

	/* Debug injection: reject the first two connection requests so the
	 * client is forced into its retry loop. */
	static int inject_reject_cnt = 0;
	if (inject_reject_cnt < 2) {
		inject_reject_cnt++;
		printf("\n[DEBUG] Intentionally rejecting connection request #%d\n", inject_reject_cnt);
		rdma_reject(cma_id, NULL, 0);
		return 0;
	}

	// rest of the code

There were also still some problems in the clean-up flow. I have fixed them and run tests on my machines to confirm the fix.
Hi @SherrinZhou, please open another PR with the latest fix. I will test it and merge if everything passes successfully.
@sshaulnv ok, done.
When running perftest with RDMA CM enabled, in an environment where the server is under high pressure and likely to reject a CM connection issued by the client, a rejected connection request (RDMA_CM_EVENT_REJECTED) sends the client into a retry loop in rdma_cm_client_connection. However, the previous retry logic contained multiple flaws, causing segmentation faults, double frees, and heap corruption.
The error print looked like this:
RDMA CM event error:
Event: RDMA_CM_EVENT_REJECTED; error: 8.
ERRNO:Operation not supported.
Failed to handle RDMA CM event.
ERRNO: Operation not supported.
Failed to connect RDMA CM events.
ERRNO:Operation not supported.
Failed to resolve RDMA CM address.
ERRNO: Bad file descriptor.
Failed to destroy RDMA CM ID number 0.
ERRNO: Bad file descriptor.
Failed to destroy RDMA CM contexts.
ERRNO: Bad file descriptor.
free(): double free detected in tcache 2
The backtrace of the segfault-triggered core dump looked like this:
(gdb) bt
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1 0x00007fcb90783db5 in __GI_abort () at abort.c:79
#2 0x00007fcb907dc4e7 in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x7fcb908ebaae "%s\n") at ../sysdeps/posix/libc_fatal.c:181
#3 0x00007fcb907e35ec in malloc_printerr (str=str@entry=0x7fcb908ed6d8 "free(): double free detected in tcache 2") at malloc.c:5374
#4 0x00007fcb907e535d in _int_free (av=0x7fcb90b21bc0 <main_arena>, p=0xf459c0, have_lock=) at malloc.c:4213
#5 0x000000000040dd86 in create_rdma_cm_connection (ctx=0x7ffcf6f18410, user_param=0x7ffcf6f17fe0, comm=0x7ffcf6f17fc0, my_dest=0xf41590, rem_dest=0xf41a00) at src/perftest_communication.c:2949
#6 0x0000000000404b83 in main (argc=26, argv=0x7ffcf6f186f8) at src/send_bw.c:273
The following issues were identified and fixed:

Double Free and Heap Corruption:
The cleanup logic inside the retry loop destroyed resources (event channel, IDs) without clearing their pointers. Subsequent error-handling paths tried to free them again, triggering "double free" or "Bad file descriptor" errors (the destroy-and-clear pattern is sketched after this list). Additionally, connection_index was incremented unconditionally on every attempt, eventually overflowing the nodes array and corrupting heap metadata.

Invalid Argument / Context Mismatch:
The previous logic destroyed the Protection Domain (PD) and Event Channel on every retry but failed to properly re-initialize them or update the Context pointer. This caused ibv_create_qp to fail with ENOENT (No such file or directory) because it attempted to use a stale PD with a new Context.

Client/Server State Desynchronization:
Resetting the connection flow completely on the client side caused state desynchronization with the server (which tracks connection indices linearly), leading to further rejections.
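The destroy-and-clear pattern and the ctx_init guard mentioned above can be illustrated with a minimal sketch; the helper names (rdma_cm_reset_node, rdma_cm_init_context_once) are illustrative assumptions, not the literal code of the patch, and they rely on perftest's internal types (struct cma_node, struct pingpong_context) as shown in the snippets earlier in this thread:

/* Sketch: tear down a node's CM id during a retry and clear the pointer,
 * so a later error path cannot destroy it a second time. */
static void rdma_cm_reset_node(struct cma_node *node)
{
	if (node->cma_id) {
		rdma_destroy_id(node->cma_id);
		node->cma_id = NULL;   /* avoids "Bad file descriptor" / double free */
	}
	node->connected = 0;           /* node becomes eligible for a fresh attempt */
}

/* Sketch: run ctx_init only on the first attempt, so retries keep using
 * a valid PD/context pair instead of a stale PD with a new context. */
static int rdma_cm_init_context_once(struct pingpong_context *ctx,
				     struct perftest_parameters *user_param)
{
	if (ctx->pd != NULL)
		return 0;
	return ctx_init(ctx, user_param);
}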
This patch implements a robust "incremental retry" strategy:

- The global RDMA CM resources (the event channel, the rdma_addrinfo result, and the nodes array) are preserved across retries.
- Only the resources of nodes that failed to connect are destroyed and re-created on each retry, to ensure resource validity.
- ctx_init is guarded to run only when the PD is not yet initialized.
- The handling of the hints->ai_src_addr allocation is fixed.
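A sketch of the incremental retry itself, under the same assumptions (field names from the quoted snippets, the rdma_cm_reset_node helper from the previous sketch, RDMA_PS_TCP as the port space, and perftest's SUCCESS/FAILURE return convention); address/route resolution and rdma_connect for the re-created IDs then follow the normal client flow:

/* Sketch: retry only the nodes that never reached the connected state;
 * the event channel, rdma_addrinfo and the nodes array stay alive. */
static int rdma_cm_retry_failed_nodes(struct pingpong_context *ctx,
				      struct perftest_parameters *user_param)
{
	int i;

	for (i = 0; i < user_param->num_of_qps; i++) {
		if (ctx->cma_master.nodes[i].connected)
			continue;   /* keep established connections untouched */

		rdma_cm_reset_node(&ctx->cma_master.nodes[i]);

		/* Re-create the cma_id on the preserved event channel. */
		if (rdma_create_id(ctx->cma_master.channel,
				   &ctx->cma_master.nodes[i].cma_id,
				   &ctx->cma_master.nodes[i], RDMA_PS_TCP)) {
			fprintf(stderr, "Failed to re-create RDMA CM ID %d.\n", i);
			return FAILURE;
		}
	}
	return SUCCESS;
}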