Question aboout host copy #432

siposcsaba89 · 2024-09-02T11:07:26Z

Hi,

I would like to read back the data from an unordered map to a host vector. Before having custom pair implementation, the following code worked:

    thrust::host_vector<stdgpu::pair<T1, T2>> host_vector(hash_map.size());
    auto range_map = hash_map.device_range();
    thrust::copy(range_map.begin(), range_map.end(), host_vector.begin());

But now it fails to compile, that there is no assignement function which takes a thrust::reference<...> object.
The best I came up with:

    thrust::device_vector<stdgpu::pair<T1, T2>> device_vector(hash_map.size());
    auto range_map = hash_map.device_range();
    thrust::copy(range_map.begin(), range_map.end(), device_vector.begin());
    thrust::host_vector<stdgpu::pair<T1, T2>> host_vector = device_vector;

But in this way, I have to duplicate memory allocation on the gpu (device_vector), which can be an issue if the map takes lots of memory.

Do you have any solution to how to copy an unordered map back to host memory? Maybe there is a very simple solution, I cannot see.

Thanks for your help.

Best, Csaba

The text was updated successfully, but these errors were encountered:

stotko · 2024-09-02T17:26:43Z

I suspect that both code snippets you posted will likely result in the same amount of copying work. The device_range() function in fact just returns a fancy range that acts as a view to the underlying (internally sparsely distributed) values. Without digging too much in thrust::copy's implementation, I guess that its a two-step approach here it may first perform a temporary copy when it detects this fancy range/iterators and afterwards performs the actual cudaMemcpy to host memory.
(I would need to check their implementation to confirm this behavior, so this is mostly my intuition on how this could be done efficiently.)

Nonetheless, this looks like a regression and I agree that the former code snippet should at least compile and run fine. Could you share the concrete full compiler error that you observed here? Which CUDA/thrust version did you use?

siposcsaba89 · 2024-09-03T06:52:32Z

Hi, thanks for your quick reply.

I just made a small example, you can find it here: https://github.com/siposcsaba89/stdgpu_debug

In the main.cu file, there is a flag here, you can toggle between the old working code and the new one: https://github.com/siposcsaba89/stdgpu_debug/blob/6e44a52b261305f61ee880fb15e587f7116fd772/main.cu#L46

I tried both on windows 11 with CUDA 12.6 and visual studio 17.11.2, and on Ubuntu 22.04 with cuda 11.8 as well, the same compilation error occures.

This is on ubuntu:

/home/csaba/projects/stdgpu_debug/build/_deps/stdgpu-src/src/stdgpu/../stdgpu/impl/algorithm_detail.h(125): error: no operator "=" matches these operands
            operand types are: stdgpu::pair<A, B> = thrust::reference<stdgpu::pair<const A, B>, thrust::pointer<stdgpu::pair<const A, B>, thrust::system::cpp::detail::tag, thrust::use_default, thrust::use_default>, thrust::use_default>
          detected during:
            instantiation of "void stdgpu::detail::copy_functor<InputIt, OutputIt>::operator()(stdgpu::index_t) [with InputIt=thrust::pointer<stdgpu::pair<const A, B>, thrust::system::cpp::detail::tag, thrust::use_default, thrust::use_default>, OutputIt=thrust::detail::normal_iterator<stdgpu::pair<A, B> *>]"
/usr/local/cuda-11.8/targets/x86_64-linux/include/thrust/detail/function.h(125): here
....

Anyway, it would be great if there would be examples, how to copy data from gpu to host and back, maybe there are, but I wasn't able to find it.

Thanks for your help.

stotko · 2024-09-04T18:06:12Z

Thanks for the reproducer. I tested different implementations of stdgpu::pair with the following results:

using pair = thrust::pair<...> (old implementation) --> works
Duplicating old thrust::pair struct in stdgpu namespace --> same error as above
Custom struct (current implementation) --> same error

The issue seems to be related to how the namespace of copy is resolved. Due to Argument-Dependent Lookup (ADL), stdgpu::copy is called somehow as part of thrust::copy's implementation. It indeed seems to perform some temporary copies besides the actual cudaMemcpy. Our version does not support device-to-host (or vice versa) transfers and is actually only used in the tests. Perhaps, we can make it internal to circumvent ADL or more robustly perform the assignment. I will have a look into this.

Regarding an example for data copying: There is currently none for this case, but the existing one for unordered_map/unordered_set could be extended in this regard.

stotko added the bug label Sep 4, 2024

stotko mentioned this issue Sep 9, 2024

algorithm,memory,numeric: Improve thrust compatibility via ADL barriers #433

Merged

stotko closed this as completed in #433 Sep 9, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Question aboout host copy #432

Question aboout host copy #432

siposcsaba89 commented Sep 2, 2024

stotko commented Sep 2, 2024

siposcsaba89 commented Sep 3, 2024

stotko commented Sep 4, 2024 •

edited

Loading

Question aboout host copy #432

Question aboout host copy #432

Comments

siposcsaba89 commented Sep 2, 2024

stotko commented Sep 2, 2024

siposcsaba89 commented Sep 3, 2024

stotko commented Sep 4, 2024 • edited Loading

stotko commented Sep 4, 2024 •

edited

Loading