aboutsummaryrefslogtreecommitdiffstats
diff options
context:
space:
mode:
authorBob Pearson <rpearsonhpe@gmail.com>2024-03-29 09:55:04 -0500
committerJason Gunthorpe <jgg@nvidia.com>2024-04-22 16:54:32 -0300
commit2b23b6097303ed0ba5f4bc036a1c07b6027af5c6 (patch)
treebb21bd2651c8fc6469ca60f6efb10872d67c4bdc
parentca0b44e20a6f3032224599f02e7c8fb49525c894 (diff)
downloadrdma-2b23b6097303ed0ba5f4bc036a1c07b6027af5c6.tar.gz
RDMA/rxe: Fix seg fault in rxe_comp_queue_pkt
In rxe_comp_queue_pkt() an incoming response packet skb is enqueued to the resp_pkts queue and then a decision is made whether to run the completer task inline or schedule it. Finally the skb is dereferenced to bump a 'hw' performance counter. This is wrong because if the completer task is already running in a separate thread it may have already processed the skb and freed it which can cause a seg fault. This has been observed infrequently in testing at high scale. This patch fixes this by changing the order of enqueuing the packet until after the counter is accessed. Link: https://lore.kernel.org/r/20240329145513.35381-4-rpearsonhpe@gmail.com Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com> Fixes: 0b1e5b99a48b ("IB/rxe: Add port protocol stats") Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
-rw-r--r--drivers/infiniband/sw/rxe/rxe_comp.c6
1 files changed, 3 insertions, 3 deletions
diff --git a/drivers/infiniband/sw/rxe/rxe_comp.c b/drivers/infiniband/sw/rxe/rxe_comp.c
index b78b8c0856abd..c997b7cbf2a9e 100644
--- a/drivers/infiniband/sw/rxe/rxe_comp.c
+++ b/drivers/infiniband/sw/rxe/rxe_comp.c
@@ -131,12 +131,12 @@ void rxe_comp_queue_pkt(struct rxe_qp *qp, struct sk_buff *skb)
{
int must_sched;
- skb_queue_tail(&qp->resp_pkts, skb);
-
- must_sched = skb_queue_len(&qp->resp_pkts) > 1;
+ must_sched = skb_queue_len(&qp->resp_pkts) > 0;
if (must_sched != 0)
rxe_counter_inc(SKB_TO_PKT(skb)->rxe, RXE_CNT_COMPLETER_SCHED);
+ skb_queue_tail(&qp->resp_pkts, skb);
+
if (must_sched)
rxe_sched_task(&qp->comp.task);
else