Commit 0b8b871

RDMA/rxe: Fix incomplete state save in rxe_requester
jira KERNEL-325
cve CVE-2023-53539
Rebuild_History Non-Buildable kernel-4.18.0-553.89.1.el8_10
commit-author Bob Pearson <rpearsonhpe@gmail.com>
commit 5d122db
Empty-Commit: Cherry-Pick Conflicts during history rebuild.
Will be included in final tarball splat. Ref for failed cherry-pick at:
ciq/ciq_backports/kernel-4.18.0-553.89.1.el8_10/5d122db2.failed

If a send packet is dropped by the IP layer in rxe_requester()
the call to rxe_xmit_packet() can fail with err == -EAGAIN.
To recover, the state of the wqe is restored to the state before
the packet was sent so it can be resent. However, the routines
that save and restore the state miss a significant part of the
variable state in the wqe, the dma struct which is used to process
through the sge table. And, the state is not saved before the packet
is built which modifies the dma struct.

Under heavy stress testing with many QPs on a fast node sending
large messages to a slow node, dropped packets are observed and
the resent packets are corrupted because the dma struct was not
restored. This patch fixes this behavior and allows the test cases
to succeed.

Fixes: 3050b99 ("IB/rxe: Fix race condition between requester and completer")
Link: https://lore.kernel.org/r/20230721200748.4604-1-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
(cherry picked from commit 5d122db)
Signed-off-by: Jonathan Maple <jmaple@ciq.com>

# Conflicts:
# drivers/infiniband/sw/rxe/rxe_req.c
1 parent 4f51ba2 commit 0b8b871

1 file changed, 90 additions, 0 deletions

@@ -0,0 +1,90 @@
RDMA/rxe: Fix incomplete state save in rxe_requester

jira KERNEL-325
cve CVE-2023-53539
Rebuild_History Non-Buildable kernel-4.18.0-553.89.1.el8_10
commit-author Bob Pearson <rpearsonhpe@gmail.com>
commit 5d122db2ff80cd2aed4dcd630befb56b51ddf947
Empty-Commit: Cherry-Pick Conflicts during history rebuild.
Will be included in final tarball splat. Ref for failed cherry-pick at:
ciq/ciq_backports/kernel-4.18.0-553.89.1.el8_10/5d122db2.failed

If a send packet is dropped by the IP layer in rxe_requester()
the call to rxe_xmit_packet() can fail with err == -EAGAIN.
To recover, the state of the wqe is restored to the state before
the packet was sent so it can be resent. However, the routines
that save and restore the state miss a significnt part of the
variable state in the wqe, the dma struct which is used to process
through the sge table. And, the state is not saved before the packet
is built which modifies the dma struct.

Under heavy stress testing with many QPs on a fast node sending
large messages to a slow node dropped packets are observed and
the resent packets are corrupted because the dma struct was not
restored. This patch fixes this behavior and allows the test cases
to succeed.

Fixes: 3050b9985024 ("IB/rxe: Fix race condition between requester and completer")
Link: https://lore.kernel.org/r/20230721200748.4604-1-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
(cherry picked from commit 5d122db2ff80cd2aed4dcd630befb56b51ddf947)
Signed-off-by: Jonathan Maple <jmaple@ciq.com>

# Conflicts:
# drivers/infiniband/sw/rxe/rxe_req.c
diff --cc drivers/infiniband/sw/rxe/rxe_req.c
index f63771207970,d8c41fd626a9..000000000000
--- a/drivers/infiniband/sw/rxe/rxe_req.c
+++ b/drivers/infiniband/sw/rxe/rxe_req.c
@@@ -746,9 -799,12 +748,12 @@@ int rxe_requester(void *arg
pkt.mask = rxe_opcode[opcode].mask;
pkt.wqe = wqe;

+ /* save wqe state before we build and send packet */
+ save_state(wqe, qp, &rollback_wqe, &rollback_psn);
+
av = rxe_get_av(&pkt, &ah);
if (unlikely(!av)) {
- rxe_dbg_qp(qp, "Failed no address vector\n");
+ pr_err("qp#%d Failed no address vector\n", qp_num(qp));
wqe->status = IB_WC_LOC_QP_OP_ERR;
goto err;
}
@@@ -790,17 -840,23 +789,33 @@@

err = rxe_xmit_packet(qp, &pkt, skb);
if (err) {
++<<<<<<< HEAD
+ qp->need_req_skb = 1;
+
+ rollback_state(wqe, qp, &rollback_wqe, rollback_psn);
+
+ if (err == -EAGAIN) {
+ rxe_run_task(&qp->req.task, 1);
+ goto exit;
++=======
+ if (err != -EAGAIN) {
+ wqe->status = IB_WC_LOC_QP_OP_ERR;
+ goto err;
++>>>>>>> 5d122db2ff80 (RDMA/rxe: Fix incomplete state save in rxe_requester)
}

- wqe->status = IB_WC_LOC_QP_OP_ERR;
- goto err;
+ /* the packet was dropped so reset wqe to the state
+ * before we sent it so we can try to resend
+ */
+ rollback_state(wqe, qp, &rollback_wqe, rollback_psn);
+
+ /* force a delay until the dropped packet is freed and
+ * the send queue is drained below the low water mark
+ */
+ qp->need_req_skb = 1;
+
+ rxe_sched_task(&qp->req.task);
+ goto exit;
}

update_state(qp, &pkt);
* Unmerged path drivers/infiniband/sw/rxe/rxe_req.c
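
The commit message describes saving the wqe state, including its dma struct, before the packet is built, and rolling it back when rxe_xmit_packet() returns -EAGAIN. The pattern can be sketched in user space roughly as follows. This is a minimal illustration only: every toy_* type and function is hypothetical and merely imitates the shape of the rxe code. It is not the kernel implementation and not a resolution of the conflict recorded above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the rxe structures, not the kernel definitions. */
struct toy_dma {
	size_t resid;      /* bytes of the message still to be sent */
	size_t cur_sge;    /* index of the current sge entry (unused here) */
	size_t sge_offset; /* offset inside the current sge entry */
};

struct toy_wqe {
	struct toy_dma dma;
};

struct toy_qp {
	unsigned int psn;  /* next packet sequence number */
	int drops_left;    /* how many sends should fail with -EAGAIN */
};

struct toy_rollback {
	struct toy_wqe wqe;
	unsigned int psn;
};

/* Save everything a resend needs; the struct copy includes the dma cursor. */
static void toy_save_state(const struct toy_wqe *wqe, const struct toy_qp *qp,
			   struct toy_rollback *rb)
{
	rb->wqe = *wqe;
	rb->psn = qp->psn;
}

static void toy_rollback_state(struct toy_wqe *wqe, struct toy_qp *qp,
			       const struct toy_rollback *rb)
{
	*wqe = rb->wqe;
	qp->psn = rb->psn;
}

/* Building a packet consumes payload, so it advances the dma cursor. */
static size_t toy_build_packet(struct toy_wqe *wqe, size_t mtu)
{
	size_t len;

	assert(mtu > 0);
	len = wqe->dma.resid < mtu ? wqe->dma.resid : mtu;
	wqe->dma.resid -= len;
	wqe->dma.sge_offset += len;
	return len;
}

/* Simulated transmit: fails with -EAGAIN while drops_left is positive. */
static int toy_xmit(struct toy_qp *qp)
{
	if (qp->drops_left > 0) {
		qp->drops_left--;
		return -EAGAIN;
	}
	return 0;
}

/* One requester step: save first, then build and send, roll back on a drop. */
static int toy_requester_step(struct toy_qp *qp, struct toy_wqe *wqe, size_t mtu)
{
	struct toy_rollback rb;
	int err;

	toy_save_state(wqe, qp, &rb); /* must happen before the build */
	toy_build_packet(wqe, mtu);
	qp->psn++;

	err = toy_xmit(qp);
	if (err == -EAGAIN)
		toy_rollback_state(wqe, qp, &rb);
	return err;
}

/* Returns 0 if a dropped send leaves the wqe and psn exactly as before. */
int toy_demo(void)
{
	struct toy_wqe wqe = { .dma = { .resid = 4096 } };
	struct toy_qp qp = { .psn = 100, .drops_left = 1 };

	if (toy_requester_step(&qp, &wqe, 1024) != -EAGAIN)
		return 1;
	if (wqe.dma.resid != 4096 || wqe.dma.sge_offset != 0 || qp.psn != 100)
		return 2;
	if (toy_requester_step(&qp, &wqe, 1024) != 0)
		return 3;
	if (wqe.dma.resid != 3072 || wqe.dma.sge_offset != 1024 || qp.psn != 101)
		return 4;
	return 0;
}
```

If toy_save_state() were called after toy_build_packet(), or if it skipped the dma member, the rollback would restore an already advanced cursor and the retry would send the wrong part of the message, which is the kind of corruption the commit message reports.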
