librdmacm/cmtime: Add multi-thread support #1451

Merged on May 8, 2024 (17 commits)
librdmacm/cmtime: Skip waiting for disconnect reply
After the client sends a disconnect request to the server,
have it wait for the server to sync using the OOB mechanism.
As the number of connections under test approaches 1000, the
DREP frequently fails to make it back to the sender of the DREQ.
As a result, the DREQ must time out completely before
the client can proceed.

Note that this appears to be exposing undesirable behavior
from the kernel CM regarding duplicate DREQ handling.
However, because the timeouts are so long, they get in the
way of running the test to collect connection setup
timings.

Signed-off-by: Sean Hefty <[email protected]>
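
The accounting change described in the commit message can be sketched in isolation. This is a minimal illustration, not the cmtime.c code: `struct node`, `fake_disconnect`, `on_disconnected`, and `disconnect_all` are hypothetical stand-ins, `fake_disconnect` replaces the real `rdma_disconnect` call, and `clock()` is assumed here as a simple timestamp source. The point is only that the client's disconnect step is marked complete when the call returns, so a lost DREP cannot stall it.

```c
#include <time.h>

/* All names below are hypothetical stand-ins, not the cmtime.c definitions. */
enum step { STEP_DISCONNECT, STEP_CNT };

struct node {
	clock_t start[STEP_CNT];
	clock_t end[STEP_CNT];
	int dreq_sent;
};

static int completed[STEP_CNT];	/* steps finished, per step type */
static int disc_events;		/* disconnect events counted by the server */

static void start_perf(struct node *n, enum step s)
{
	n->start[s] = clock();
}

static void end_perf(struct node *n, enum step s)
{
	n->end[s] = clock();
}

/* Stub in place of rdma_disconnect(): only records that a DREQ went out. */
static int fake_disconnect(struct node *n)
{
	n->dreq_sent = 1;
	return 0;
}

/* Client: the step ends when the call returns, not when a reply arrives. */
static void client_disconnect(struct node *n)
{
	start_perf(n, STEP_DISCONNECT);
	fake_disconnect(n);
	end_perf(n, STEP_DISCONNECT);
	completed[STEP_DISCONNECT]++;
}

/* Disconnect event: no accounting on the client, counted on the server. */
static void on_disconnected(int is_client)
{
	if (is_client)
		return;
	disc_events++;
}

/* Disconnect every node without waiting for any event to be delivered. */
static int disconnect_all(struct node *nodes, int count)
{
	for (int i = 0; i < count; i++)
		client_disconnect(&nodes[i]);
	return completed[STEP_DISCONNECT];
}
```

Calling `disconnect_all` on an array of zero-initialized nodes marks every one of them complete even though no disconnect event was ever delivered. In the real test, the OOB sync (not modeled here) is what tells the client that the server has caught up.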
Sean Hefty committed Apr 23, 2024
commit 737ac31fd3bb31e36c7da234597b4cc5e335b430
10 changes: 9 additions & 1 deletion librdmacm/examples/cmtime.c
@@ -369,6 +369,8 @@ static void client_disconnect(struct work_item *item)
 
 	start_perf(n, STEP_DISCONNECT);
 	rdma_disconnect(n->id);
+	end_perf(n, STEP_DISCONNECT);
+	completed[STEP_DISCONNECT]++;
 }
 
 static void server_disconnect(struct work_item *item)
@@ -439,10 +441,16 @@ static void cma_handler(struct rdma_cm_id *id, struct rdma_cm_event *event)
 		exit(EXIT_FAILURE);
 		break;
 	case RDMA_CM_EVENT_DISCONNECTED:
-		if (is_client()) {
+		if (!is_client()) {
+			/* To fix an issue where DREQs are not responded
+			 * to, the client completes its disconnect phase
+			 * as soon as it calls rdma_disconnect and does
+			 * not wait for a response from the server. The
+			 * OOB sync handles that coordiation
 			end_perf(n, STEP_DISCONNECT);
 			completed[STEP_DISCONNECT]++;
 		} else {
+			 */
 			if (disc_events == 0) {
 				printf("\tDisconnecting\n");
 				start_time(STEP_DISCONNECT);