Fix in stable_distribute and stable_distribute_inplace #25

Merged: 1 commit, Mar 29, 2020

@asrivast28 asrivast28 commented Mar 28, 2020

Problem: The mxx::stable_distribute* functions may crash on a process with rank >= p when the input size on every process with rank >= p is zero and the global input size is less than the total number of processes. Each process uses the result of mxx::exscan to determine the index of the first element of its partition. On the affected processes, passing that index to mxx::partition::blk_dist_buf::rank_of causes a divide-by-zero error.
Solution: A process should compute its send counts only if its local input size is non-zero.

I have also added a test that was failing before the fix.
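
For illustration, here is a minimal standalone sketch of the guard described above. It does not use mxx or MPI: `BlockDist` and `send_counts_for` are hypothetical names, and `rank_of` is a simplified stand-in that assumes a block distribution in which the first `n % p` ranks own `n / p + 1` elements. The `assert` in `rank_of` is an addition for this sketch and is not claimed to exist in the library.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified block distribution of n elements over p ranks.
struct BlockDist {
    std::size_t n;
    int p;

    // Number of elements owned by rank r.
    std::size_t size_of(int r) const {
        return n / p + (static_cast<std::size_t>(r) < n % p ? 1 : 0);
    }

    // Global index of the first element owned by rank r.
    std::size_t start_of(int r) const {
        return std::min<std::size_t>(r, n % p) + (n / p) * r;
    }

    // Rank that owns global index gidx. When n < p, n / p is zero, so
    // calling this with gidx == n would divide by zero in the last line.
    int rank_of(std::size_t gidx) const {
        assert(gidx < n);  // precondition added for this sketch
        const std::size_t div = n / p;
        const std::size_t mod = n % p;
        const std::size_t big = (div + 1) * mod;  // elements held by the larger blocks
        if (gidx < big) return static_cast<int>(gidx / (div + 1));
        return static_cast<int>(mod + (gidx - big) / div);
    }
};

// Send counts for one rank that holds local_size elements starting at
// global offset prefix (the value an exclusive scan would return).
std::vector<std::size_t> send_counts_for(std::size_t total_size, int p,
                                         std::size_t prefix,
                                         std::size_t local_size) {
    std::vector<std::size_t> counts(p, 0);
    if (local_size == 0) return counts;  // the guard: nothing to send, skip rank_of

    BlockDist part{total_size, p};
    std::size_t offset = prefix;
    std::size_t left = local_size;
    for (int r = part.rank_of(prefix); left > 0 && r < p; ++r) {
        const std::size_t end = part.start_of(r) + part.size_of(r);
        const std::size_t take = std::min(left, end - offset);
        counts[r] = take;
        offset += take;
        left -= take;
    }
    return counts;
}
```

With `total_size = 2` and `p = 4`, a rank holding both elements calls `send_counts_for(2, 4, 0, 2)` and sends one element each to ranks 0 and 1. The remaining ranks have `prefix == 2` and `local_size == 0`, so they return all-zero counts without ever reaching `rank_of`.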

@@ -111,15 +111,18 @@ void stable_distribute(_InIterator begin, _InIterator end, _OutIterator out, con
size_t prefix = mxx::exscan(local_size, comm);
Contributor Author

prefix may be the same as total_size if the processes with rank lower than the current process have all the input elements.

std::vector<size_t> send_counts(comm.size(), 0);
blk_dist part(total_size, comm.size(), comm.rank());
int first_p = part.rank_of(prefix);
Contributor Author

This is the line that triggers the error when prefix equals total_size.

size = 0;
}
// XXX: test_distribute fails with an assertion error in mxx::sort
// test_distribute<std::vector<int>>(size, gen, c);
Contributor Author

This seems to fail with an assertion error in sample_distribute. Is that expected?

@patflick
Owner

LGTM thanks

@patflick patflick merged commit 794ca69 into patflick:master Mar 29, 2020
@asrivast28 asrivast28 deleted the FixStableDistribute branch June 5, 2021 16:46