librdmacm/cmtime: Skip waiting for disconnect reply
After the client sends a disconnect request (DREQ) to the server,
have it wait for the server to sync using the OOB mechanism
instead of waiting for the disconnect reply (DREP). As the number
of connections under test approaches 1000, the DREP frequently
fails to make it back to the client that sent the DREQ. The
client must then wait for the DREQ to time out completely before
it can proceed.
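The ordering this commit introduces (send the DREQ, count the disconnect step as done, then sync with the server out of band) can be sketched as below. This is a minimal sketch, not cmtime's actual code: it assumes the OOB channel is a connected stream socket and models the sync as a one-byte exchange in each direction. `oob_barrier()`, `client_side()`, `completed_disconnect` and `run_demo()` are hypothetical names, a `socketpair()` plus `fork()` stand in for the real client/server connection, and the `rdma_disconnect()` call is only indicated in a comment.

```c
#include <assert.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for completed[STEP_DISCONNECT] in cmtime. */
static int completed_disconnect;

/* Hypothetical OOB barrier: send one byte to the peer, then block until
 * the peer's byte arrives. Neither side returns before both sides have
 * reached the barrier, whether or not any DREP was delivered. */
static int oob_barrier(int fd)
{
	char out = 1, in = 0;

	assert(fd >= 0);
	if (write(fd, &out, 1) != 1)
		return -1;
	if (read(fd, &in, 1) != 1)
		return -1;
	return 0;
}

/* Client side after the change: the disconnect step counts as complete
 * as soon as the DREQ is sent, and the OOB barrier replaces the wait
 * for RDMA_CM_EVENT_DISCONNECTED. */
static int client_side(int fd)
{
	/* cmtime calls rdma_disconnect(n->id) here to send the DREQ. */
	completed_disconnect++;
	return oob_barrier(fd);
}

/* Runs the server (child process) and the client (parent process) over
 * a socketpair. Returns 0 if both sides passed the barrier. */
static int run_demo(void)
{
	int sv[2], status, ret;
	pid_t pid;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
		return -1;
	pid = fork();
	if (pid < 0) {
		close(sv[0]);
		close(sv[1]);
		return -1;
	}
	if (pid == 0) {
		/* Server: a real server would also handle the DREQ here. */
		close(sv[0]);
		_exit(oob_barrier(sv[1]) == 0 ? 0 : 1);
	}
	close(sv[1]);
	ret = client_side(sv[0]);
	close(sv[0]);
	if (waitpid(pid, &status, 0) != pid)
		return -1;
	if (ret != 0 || !WIFEXITED(status) || WEXITSTATUS(status) != 0)
		return -1;
	return 0;
}
```

The point of the design is that the client's progress depends only on the OOB channel, so a DREP that never arrives no longer stalls it for the full DREQ timeout.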

Note that this appears to expose undesirable behavior in the
kernel CM's handling of duplicate DREQs. However, because the
timeouts are so long, the problem hinders running the test to
collect connection setup timings.

Signed-off-by: Sean Hefty <[email protected]>
Sean Hefty committed Apr 23, 2024
1 parent abe5dff commit 737ac31
Showing 1 changed file with 9 additions and 1 deletion: librdmacm/examples/cmtime.c
```diff
@@ -369,6 +369,8 @@ static void client_disconnect(struct work_item *item)
 
 	start_perf(n, STEP_DISCONNECT);
 	rdma_disconnect(n->id);
+	end_perf(n, STEP_DISCONNECT);
+	completed[STEP_DISCONNECT]++;
 }
 
 static void server_disconnect(struct work_item *item)
```
```diff
@@ -439,10 +441,16 @@ static void cma_handler(struct rdma_cm_id *id, struct rdma_cm_event *event)
 		exit(EXIT_FAILURE);
 		break;
 	case RDMA_CM_EVENT_DISCONNECTED:
-		if (is_client()) {
+		if (!is_client()) {
+			/* To fix an issue where DREQs are not responded
+			 * to, the client completes its disconnect phase
+			 * as soon as it calls rdma_disconnect and does
+			 * not wait for a response from the server. The
+			 * OOB sync handles that coordination
 			end_perf(n, STEP_DISCONNECT);
 			completed[STEP_DISCONNECT]++;
 		} else {
+			 */
 			if (disc_events == 0) {
 				printf("\tDisconnecting\n");
 				start_time(STEP_DISCONNECT);
```
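In the second hunk the new comment opens after the `if` line and closes after `} else {`, so the old client branch is commented out rather than deleted, and the former `else` body now runs only on the server. That is why the condition had to be inverted from `is_client()` to `!is_client()`. The resulting control flow can be modelled as follows. This is a sketch under stated assumptions: `client_role`, `completed_disconnect`, `server_disc_handled`, `client_disconnect_model()` and `on_disconnected()` are hypothetical stand-ins for cmtime's real state and functions, and because the server's work is elided in the diff it is reduced to a counter.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for cmtime state. */
static bool client_role;           /* models is_client() */
static int completed_disconnect;   /* models completed[STEP_DISCONNECT] */
static int server_disc_handled;    /* times the server branch ran */

static bool is_client(void)
{
	return client_role;
}

/* Effective client_disconnect() after the patch (perf timing omitted). */
static void client_disconnect_model(void)
{
	/* rdma_disconnect(n->id) sends the DREQ here; nothing waits for
	 * the DREP before the step is counted. */
	completed_disconnect++;
}

/* Effective RDMA_CM_EVENT_DISCONNECTED handling after the patch. */
static void on_disconnected(void)
{
	int before = completed_disconnect;

	if (!is_client()) {
		/* Former else body: disc_events bookkeeping, start_time(), ... */
		server_disc_handled++;
	}
	/* The client branch is commented out, so the event never counts
	 * the disconnect step a second time. */
	assert(completed_disconnect == before);
}
```

On the client, the step is counted exactly once, in `client_disconnect_model()`, and a later DISCONNECTED event leaves the count unchanged.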
