
Question: does ucx support FPGA to AMDGPU (ROCm) p2p transfer? #9598

Open
littlewu2508 opened this issue Jan 13, 2024 · 3 comments

Comments

@littlewu2508

Hello, I'm a researcher wishing to achieve p2p data transfer from an FPGA (Xilinx Alveo U50) to an AMDGPU. I read https://rocm.docs.amd.com/en/latest/how-to/gpu-enabled-mpi.html and found that ucx is probably the direction to look into. There is also an existing FPGA-to-NVIDIA-GPU implementation at https://github.com/RC4ML/FpgaNIC, which uses https://github.com/NVIDIA/gdrcopy, and I noticed there is rocm_gdr support in the ucx documentation.

However, beyond these I can't find more clues about how FPGA-AMDGPU p2p can be implemented. Does anyone know about this?

@edgargabriel
Contributor

@littlewu2508 the UCX software stack is not tested with the Alveo cards at the moment, and hence they are not officially supported. I would suspect that at least a tcp connection should be possible with the Alveo cards, but I cannot give any hints or advice since we do not test this scenario.

As a side note, please be aware that the rocm_gdr component has been removed from UCX starting with version 1.14.0, since it is not truly required for AMD GPUs. The way AMD GPUs are typically set up with large BAR support allows the CPU to map the entire GPU address space onto the host as well, so a regular memcpy works on GPU memory even without gdr_copy. (We might need to check and update the UCX documentation.)
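
To make the large-BAR point concrete, here is a minimal, untested sketch. It assumes a ROCm system where large BAR is enabled, so the memory returned by hipMalloc is also mapped into the host address space (compile with hipcc):

```cpp
#include <hip/hip_runtime.h>
#include <cstring>
#include <cstdio>

int main() {
    void *gpu_buf = nullptr;
    const char host_src[] = "hello from the host";

    if (hipMalloc(&gpu_buf, sizeof(host_src)) != hipSuccess) {
        std::fprintf(stderr, "hipMalloc failed\n");
        return 1;
    }

    // Assumption: on a large-BAR system the CPU can address the GPU allocation
    // directly, so a plain memcpy writes into GPU memory without any
    // gdr_copy-style helper.
    std::memcpy(gpu_buf, host_src, sizeof(host_src));

    // Read the data back through the regular HIP copy path to verify.
    char host_dst[sizeof(host_src)] = {0};
    hipMemcpy(host_dst, gpu_buf, sizeof(host_dst), hipMemcpyDeviceToHost);
    std::printf("read back: %s\n", host_dst);

    hipFree(gpu_buf);
    return 0;
}
```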

@littlewu2508
Author

> @littlewu2508 the UCX software stack is not tested with the Alveo cards at the moment, and hence they are not officially supported. I would suspect that at least a tcp connection should be possible with the Alveo cards, but I cannot give any hints or advice since we do not test this scenario.

Thank you very much! I have a question about the tcp connection: since I'm focusing on p2p transfers over the PCIe bus, I don't know what role tcp can play here.

> As a side note, please be aware that the rocm_gdr component has been removed from UCX starting with version 1.14.0, since it is not truly required for AMD GPUs. The way AMD GPUs are typically set up with large BAR support allows the CPU to map the entire GPU address space onto the host as well, so a regular memcpy works on GPU memory even without gdr_copy. (We might need to check and update the UCX documentation.)

That's very helpful, as it points out another way: memcpy via large BAR support. I found https://xilinx.github.io/XRT/2022.1/html/p2p.html, and it seems that Alveo cards also support this for P2P (including with third-party PCIe devices).
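
For illustration, a rough, untested sketch of the FPGA side based on that XRT page; it assumes P2P has been enabled on the card, and the memory-bank index 0 and the buffer size are placeholders:

```cpp
#include <xrt/xrt_device.h>
#include <xrt/xrt_bo.h>
#include <cstring>
#include <iostream>

int main() {
    xrt::device device(0);        // first Alveo card in the system (assumption)
    const size_t size = 4096;

    // Allocate a P2P buffer: it lives in FPGA device memory but is exposed
    // through the card's PCIe BAR.
    xrt::bo p2p_buf(device, size, xrt::bo::flags::p2p, 0 /* memory bank, placeholder */);

    // map() returns a host virtual address pointing into the BAR window, so a
    // plain CPU memcpy lands directly in FPGA memory -- conceptually the same
    // large-BAR access described above for the AMD GPU.
    char *fpga_ptr = p2p_buf.map<char*>();
    std::memcpy(fpga_ptr, "hello from the host", 20);

    std::cout << "wrote into the Alveo P2P buffer via its BAR mapping\n";
    return 0;
}
```

For an actual FPGA-GPU transfer, presumably one side's DMA engine would target the other side's BAR-mapped address rather than the CPU doing the copy, but that is beyond what this sketch covers.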

@edgargabriel
Contributor

> Thank you very much! I have a question about the tcp connection: since I'm focusing on p2p transfers over the PCIe bus, I don't know what role tcp can play here.

I unfortunately do not know enough about the Alveo cards. My comment regarding tcp was not about GPU-to-NIC transfers, but about data transfer between two nodes with Alveo cards (I am not sure whether the Alveo cards support verbs).
