Problem with the performance of All_to_All communication #335

Open
ZidongGao opened this issue Aug 9, 2022 · 1 comment

ZidongGao commented Aug 9, 2022

Hi,
When I use Gloo's AlltoAll in my work, I find that it is much slower than expected.
Here are the results of a test in my environment.

rank_num : 2

element_per_rank   time_cost (s)   speed (MB/s)
           50000        0.042867          4.67
          100000        0.08761           4.57
          500000        0.350037          5.71
         1000000        2.96007           1.35
         2000000        8.18468           0.977
         5000000       24.6651            0.811

As shown, once the number of elements per rank is 1 million or more, AlltoAll throughput drops sharply.
Has anyone else run into this problem? I am not sure whether there is something wrong with my usage, or whether AlltoAll communication is expected to be as slow as listed above.
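For anyone checking the numbers: the speed column appears consistent with counting only element_per_rank values of a 4-byte int and using decimal megabytes (1 MB = 10^6 bytes). That reading is an inference from the table, not something the test code states. A minimal sketch of that calculation, where `ThroughputMBps` is a hypothetical helper and not part of Gloo:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not part of Gloo): throughput in MB/s, with
// 1 MB = 1e6 bytes, for `elements` values of type T moved in `seconds`.
template <typename T>
double ThroughputMBps(std::size_t elements, double seconds) {
  assert(seconds > 0.0);  // avoid dividing by zero
  const double bytes = static_cast<double>(elements) * sizeof(T);
  return bytes / seconds / 1e6;
}
```

Note that with this definition the full send buffer (nranks blocks per rank) is not counted, so the figures describe one block per rank rather than total traffic.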

Here is my AlltoAll test code, followed by the AllToAll wrapper it calls:

TEST(GlooCommTest, AllToAll) {
  GlooComm gloo_comm(g_nranks, g_rank);
  gloo_comm.Initialize("127.0.0.1", 12345, "127.0.0.1");
  const size_t stride = 1000000;  // elements sent to each rank
  std::vector<int> send(g_nranks * stride);
  std::vector<int> recv(g_nranks * stride);
  // Block i of the send buffer goes to rank i and is filled with the value i.
  for (size_t i = 0; i < g_nranks; ++i) {
    for (size_t j = 0; j < stride; ++j) {
      send[stride * i + j] = static_cast<int>(i);
    }
  }

  gloo_comm.AllToAll(send.data(), recv.data(), stride, 30);

  // Every block received by this rank should hold this rank's own id.
  for (size_t i = 0; i < g_nranks; ++i) {
    for (size_t j = 0; j < stride; ++j) {
      ASSERT_EQ(recv[stride * i + j], static_cast<int>(g_rank));
    }
  }
}
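
To make the expected result of the test explicit, here is a small in-process model of the usual all-to-all layout: block d of sender s becomes block s of receiver d. It only illustrates why every received value should equal the receiving rank's id; it does no networking, and `SimulateAllToAll` and `MakeTestSendBuffer` are hypothetical names, not Gloo functions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical in-process model of all-to-all (not Gloo code).
// send[s] is rank s's send buffer: nranks blocks of `stride` elements.
// Block d of sender s is copied into block s of receiver d.
std::vector<std::vector<int>> SimulateAllToAll(
    const std::vector<std::vector<int>>& send, std::size_t stride) {
  const std::size_t nranks = send.size();
  std::vector<std::vector<int>> recv(nranks,
                                     std::vector<int>(nranks * stride));
  for (std::size_t src = 0; src < nranks; ++src) {
    assert(send[src].size() == nranks * stride);
    for (std::size_t dst = 0; dst < nranks; ++dst) {
      for (std::size_t j = 0; j < stride; ++j) {
        recv[dst][stride * src + j] = send[src][stride * dst + j];
      }
    }
  }
  return recv;
}

// Builds the same send buffer as the test: block i is filled with value i.
std::vector<int> MakeTestSendBuffer(std::size_t nranks, std::size_t stride) {
  std::vector<int> send(nranks * stride);
  for (std::size_t i = 0; i < nranks; ++i) {
    for (std::size_t j = 0; j < stride; ++j) {
      send[stride * i + j] = static_cast<int>(i);
    }
  }
  return send;
}
```

Feeding every simulated rank the buffer from `MakeTestSendBuffer` yields receive buffers in which rank r holds only the value r, which is what the ASSERT_EQ loop checks.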
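  // Member of GlooComm: exchanges send_cnt_each elements with every rank.
  // timeout is given in seconds.
  template <typename T>
  void AllToAll(T* send, T* recv, size_t send_cnt_each, size_t timeout) {
    gloo::AlltoallOptions opts(gloo_context_);
    // Each buffer holds one block of send_cnt_each elements per rank.
    opts.setInput(send, send_cnt_each * nranks_);
    opts.setOutput(recv, send_cnt_each * nranks_);
    opts.setTimeout(std::chrono::seconds(timeout));
    gloo::alltoall(opts);
  }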
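
The code above does not show how time_cost was measured. One possible way to time the call, as a sketch only: wrap it in a callable and average over several runs with a monotonic clock. `MeanSeconds` is a hypothetical helper, and the iteration count is an arbitrary choice.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Hypothetical helper: runs fn() `iters` times and returns the mean
// wall-clock time per call in seconds, using a monotonic clock.
template <typename Fn>
double MeanSeconds(Fn&& fn, std::size_t iters) {
  assert(iters > 0);  // avoid dividing by zero
  using Clock = std::chrono::steady_clock;
  const auto start = Clock::now();
  for (std::size_t i = 0; i < iters; ++i) {
    fn();
  }
  const std::chrono::duration<double> elapsed = Clock::now() - start;
  return elapsed.count() / static_cast<double>(iters);
}
```

In the test this could be used as `MeanSeconds([&] { gloo_comm.AllToAll(send.data(), recv.data(), stride, 30); }, 5)`, ideally after one untimed warm-up call so that connection setup is not included in the measurement.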
@gavin1332

I have the same problem. Is there any response?
