speed up of topk-operator #10997

asmushetzel · 2018-05-18T14:39:04Z

Description

Speed-up of the topk operator. Specifically:

on CPU, avoid doing 3 full sorts and instead do a single partial_sort, or a single full sort when K is reasonably large.
on CPU, parallelize over the batch size
on GPU, use a faster method for very small K (which works only in shared memory). This is an important special case, for example in beam search applications.
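
The CPU part of the list above can be illustrated with a minimal sketch. It is not the code from this PR: it uses Python's `heapq.nlargest` as a stand-in for `std::partial_sort`, and the names `topk_row`, `topk_batch` and the `full_sort_ratio` threshold are hypothetical, chosen only for illustration.

```python
import heapq


def topk_row(row, k, full_sort_ratio=0.5):
    """Return the indices of the k largest values in row, best first.

    full_sort_ratio is a hypothetical switch point, not the one used in the PR.
    """
    n = len(row)
    k = min(k, n)
    if k >= full_sort_ratio * n:
        # K is large relative to N: do one full sort, then truncate.
        order = sorted(range(n), key=lambda i: row[i], reverse=True)
        return order[:k]
    # K is small: heap-based partial selection, the analogue of a partial sort.
    return heapq.nlargest(k, range(n), key=lambda i: row[i])


def topk_batch(batch, k):
    # Each row is processed independently, which is why the batch
    # dimension is a natural place to parallelize.
    return [topk_row(row, k) for row in batch]


batch = [
    [3.0, 1.0, 4.0, 1.5, 9.0, 2.6],
    [0.5, 0.25, 8.0, 7.0, 6.0, 5.0],
]
print(topk_batch(batch, 2))
```

Both branches return the same indices; only the amount of sorting work differs, so the switch point is purely a performance choice.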

It is hard to set exact criteria in such code for when to switch between different sort algorithms. The CPU criteria should be reasonably stable, and the GPU one as well. Tests on Volta showed that a back-to-back full sort starts to outperform the special small-K version only in some cases (batch size 1, k > 500,000), and not by much. We could add more logic and try to be smarter for such cases, but I am not sure it is worth it.

This work is driven by the fact that the performance of the existing code for small K is unacceptable. We have an important application where we actually resorted to numpy to work around the current performance issues.

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

[x] Changes are complete (i.e. I finished coding on this PR)
[x] All changes have test coverage
[x] Code is well-documented
[x] To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

marcoabreu · 2018-05-19T00:36:24Z

Hi Asmus, thanks for the great work! Could you please elaborate on the test coverage? I don't think we have these exact test cases covered (specifically the different values of N, K and batch size). It would be great if you could add some tests for that.

asmushetzel · 2018-05-21T11:32:59Z

test_order() in test_operator.py tests full sort, partial sort, different choices of axis, ascending/descending order, different K (including cases where K is smaller/bigger than the threshold on GPUs), etc. It is quite comprehensive and IMO sufficient.
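
For readers who want a picture of the kind of sweep discussed here, the following is a rough sketch and not the actual test_order(). It checks a top-k function against a full-sort reference over several N, K and sort orders. The names `reference_topk`, `candidate_topk` and `sweep` are hypothetical, and `candidate_topk` is only a stand-in for the operator under test.

```python
import heapq
import random


def reference_topk(row, k, descending=True):
    # Reference result: fully sort the values, then truncate to k.
    return sorted(row, reverse=descending)[:k]


def candidate_topk(row, k, descending=True):
    # Stand-in for the operator under test (hypothetical; not mx.nd.topk).
    pick = heapq.nlargest if descending else heapq.nsmallest
    return pick(k, row)


def sweep(topk_fn, sizes=(1, 7, 64, 1000), seed=0):
    """Compare topk_fn against the reference over several N, K and orders.

    Returns the number of cases checked; raises AssertionError on a mismatch.
    """
    rng = random.Random(seed)
    checked = 0
    for n in sizes:
        row = [rng.random() for _ in range(n)]
        # Cover K = 1, a mid-range K, and K = N (the full-sort case).
        for k in sorted({1, max(1, n // 2), n}):
            for descending in (True, False):
                got = topk_fn(row, k, descending)
                want = reference_topk(row, k, descending)
                assert got == want, (n, k, descending)
                checked += 1
    return checked


print(sweep(candidate_topk))
```

Comparing returned values rather than indices keeps the check independent of how ties are broken, which can differ between sort implementations.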

sxjscience · 2018-05-22T11:15:25Z

Great!

asmushetzel force-pushed the topk2 branch 2 times, most recently from 89f5fd9 to 15474fd Compare May 18, 2018 16:59

speed up of topk-operator

39275ed

asmushetzel force-pushed the topk2 branch from 15474fd to 39275ed Compare May 21, 2018 11:36

marcoabreu requested review from piiswrong and szha May 21, 2018 12:57

piiswrong merged commit b746632 into apache:master May 21, 2018

jinhuang415 pushed a commit to jinhuang415/incubator-mxnet that referenced this pull request May 29, 2018

speed up of topk-operator (apache#10997)

33312e5

rahul003 pushed a commit to rahul003/mxnet that referenced this pull request Jun 4, 2018

speed up of topk-operator (apache#10997)

d712ba5

leezu mentioned this pull request Jun 14, 2018

nd.topk regression with nan values #11271

Closed

zheng-da pushed a commit to zheng-da/incubator-mxnet that referenced this pull request Jun 28, 2018

speed up of topk-operator (apache#10997)

37137a1
