Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Question aboout host copy #432

Closed
siposcsaba89 opened this issue Sep 2, 2024 · 3 comments · Fixed by #433
Closed

Question aboout host copy #432

siposcsaba89 opened this issue Sep 2, 2024 · 3 comments · Fixed by #433
Labels

Comments

@siposcsaba89
Copy link

Hi,

I would like to read back the data from an unordered map to a host vector. Before having custom pair implementation, the following code worked:

    thrust::host_vector<stdgpu::pair<T1, T2>> host_vector(hash_map.size());
    auto range_map = hash_map.device_range();
    thrust::copy(range_map.begin(), range_map.end(), host_vector.begin());

But now it fails to compile, that there is no assignement function which takes a thrust::reference<...> object.
The best I came up with:

    thrust::device_vector<stdgpu::pair<T1, T2>> device_vector(hash_map.size());
    auto range_map = hash_map.device_range();
    thrust::copy(range_map.begin(), range_map.end(), device_vector.begin());
    thrust::host_vector<stdgpu::pair<T1, T2>> host_vector = device_vector;

But in this way, I have to duplicate memory allocation on the gpu (device_vector), which can be an issue if the map takes lots of memory.

Do you have any solution to how to copy an unordered map back to host memory? Maybe there is a very simple solution, I cannot see.

Thanks for your help.

Best, Csaba

@stotko
Copy link
Owner

stotko commented Sep 2, 2024

I suspect that both code snippets you posted will likely result in the same amount of copying work. The device_range() function in fact just returns a fancy range that acts as a view to the underlying (internally sparsely distributed) values. Without digging too much in thrust::copy's implementation, I guess that its a two-step approach here it may first perform a temporary copy when it detects this fancy range/iterators and afterwards performs the actual cudaMemcpy to host memory.
(I would need to check their implementation to confirm this behavior, so this is mostly my intuition on how this could be done efficiently.)

Nonetheless, this looks like a regression and I agree that the former code snippet should at least compile and run fine. Could you share the concrete full compiler error that you observed here? Which CUDA/thrust version did you use?

@siposcsaba89
Copy link
Author

Hi, thanks for your quick reply.

I just made a small example, you can find it here: https://github.com/siposcsaba89/stdgpu_debug

In the main.cu file, there is a flag here, you can toggle between the old working code and the new one: https://github.com/siposcsaba89/stdgpu_debug/blob/6e44a52b261305f61ee880fb15e587f7116fd772/main.cu#L46

I tried both on windows 11 with CUDA 12.6 and visual studio 17.11.2, and on Ubuntu 22.04 with cuda 11.8 as well, the same compilation error occures.

This is on ubuntu:

/home/csaba/projects/stdgpu_debug/build/_deps/stdgpu-src/src/stdgpu/../stdgpu/impl/algorithm_detail.h(125): error: no operator "=" matches these operands
            operand types are: stdgpu::pair<A, B> = thrust::reference<stdgpu::pair<const A, B>, thrust::pointer<stdgpu::pair<const A, B>, thrust::system::cpp::detail::tag, thrust::use_default, thrust::use_default>, thrust::use_default>
          detected during:
            instantiation of "void stdgpu::detail::copy_functor<InputIt, OutputIt>::operator()(stdgpu::index_t) [with InputIt=thrust::pointer<stdgpu::pair<const A, B>, thrust::system::cpp::detail::tag, thrust::use_default, thrust::use_default>, OutputIt=thrust::detail::normal_iterator<stdgpu::pair<A, B> *>]"
/usr/local/cuda-11.8/targets/x86_64-linux/include/thrust/detail/function.h(125): here
....

Anyway, it would be great if there would be examples, how to copy data from gpu to host and back, maybe there are, but I wasn't able to find it.

Thanks for your help.

@stotko
Copy link
Owner

stotko commented Sep 4, 2024

Thanks for the reproducer. I tested different implementations of stdgpu::pair with the following results:

  • using pair = thrust::pair<...> (old implementation) --> works
  • Duplicating old thrust::pair struct in stdgpu namespace --> same error as above
  • Custom struct (current implementation) --> same error

The issue seems to be related to how the namespace of copy is resolved. Due to Argument-Dependent Lookup (ADL), stdgpu::copy is called somehow as part of thrust::copy's implementation. It indeed seems to perform some temporary copies besides the actual cudaMemcpy. Our version does not support device-to-host (or vice versa) transfers and is actually only used in the tests. Perhaps, we can make it internal to circumvent ADL or more robustly perform the assignment. I will have a look into this.

Regarding an example for data copying: There is currently none for this case, but the existing one for unordered_map/unordered_set could be extended in this regard.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants