
Question: does ucx support FPGA to AMDGPU (ROCm) p2p transfer? #9598

Open
littlewu2508 opened this issue Jan 13, 2024 · 3 comments

Comments

@littlewu2508

Hello, I'm a researcher wishing to achieve p2p data transfer from an FPGA (Xilinx Alveo U50) to an AMDGPU. I read https://rocm.docs.amd.com/en/latest/how-to/gpu-enabled-mpi.html and found that ucx is probably the direction to look into. There is also an existing FPGA-to-NVIDIA-GPU implementation at https://github.com/RC4ML/FpgaNIC, which uses https://github.com/NVIDIA/gdrcopy, and I noticed there is rocm_gdr support in the ucx documentation.

However, beyond these I can't find more clues about how FPGA-AMDGPU p2p can be implemented. Does anyone know about this?

@edgargabriel
Contributor

@littlewu2508 the UCX software stack is not tested with the Alveo cards at the moment, and hence they are not officially supported. I would suspect that at least a tcp connection should be possible with the Alveo cards, but I cannot give any hints or advice since we do not test this scenario.

As a side note, please be aware that the rocm_gdr component has been removed from UCX starting with version 1.14.0, since it is not truly required for AMD GPUs. The way AMD GPUs are typically set up with large BAR support allows the CPU to map the entire GPU address space onto the host as well, so a regular memcpy works on GPU memory even without gdr_copy. (We might need to check and update the UCX documentation.)
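
To make the large-BAR point concrete, here is a minimal, untested sketch. It assumes a ROCm system where large BAR is enabled, so the memory returned by hipMalloc is also mapped into the host address space (compile with hipcc):

```cpp
#include <hip/hip_runtime.h>
#include <cstring>
#include <cstdio>

int main() {
    void *gpu_buf = nullptr;
    const char host_src[] = "hello from the host";

    if (hipMalloc(&gpu_buf, sizeof(host_src)) != hipSuccess) {
        std::fprintf(stderr, "hipMalloc failed\n");
        return 1;
    }

    // Assumption: on a large-BAR system the CPU can address the GPU allocation
    // directly, so a plain memcpy writes into GPU memory without any
    // gdr_copy-style helper.
    std::memcpy(gpu_buf, host_src, sizeof(host_src));

    // Read the data back through the regular HIP copy path to verify.
    char host_dst[sizeof(host_src)] = {0};
    hipMemcpy(host_dst, gpu_buf, sizeof(host_dst), hipMemcpyDeviceToHost);
    std::printf("read back: %s\n", host_dst);

    hipFree(gpu_buf);
    return 0;
}
```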

@littlewu2508
Author

> @littlewu2508 the UCX software stack is not tested with the Alveo cards at the moment, and hence they are not officially supported. I would suspect that at least a tcp connection should be possible with the Alveo cards, but I cannot give any hints or advice since we do not test this scenario.

Thank you very much! I have a question about the tcp connection: since I'm focusing on p2p transfers over the PCIe bus, I don't know what role tcp can play here.

> As a side note, please be aware that the rocm_gdr component has been removed from UCX starting with version 1.14.0, since it is not truly required for AMD GPUs. The way AMD GPUs are typically set up with large BAR support allows the CPU to map the entire GPU address space onto the host as well, so a regular memcpy works on GPU memory even without gdr_copy. (We might need to check and update the UCX documentation.)

That's very helpful, as it points out another way: memcpy via large BAR support. I found https://xilinx.github.io/XRT/2022.1/html/p2p.html, and it seems that Alveo cards also support this for P2P (including with third-party PCIe devices).
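
For illustration, a rough, untested sketch of the FPGA side based on that XRT page; it assumes P2P has been enabled on the card, and the memory-bank index 0 and the buffer size are placeholders:

```cpp
#include <xrt/xrt_device.h>
#include <xrt/xrt_bo.h>
#include <cstring>
#include <iostream>

int main() {
    xrt::device device(0);        // first Alveo card in the system (assumption)
    const size_t size = 4096;

    // Allocate a P2P buffer: it lives in FPGA device memory but is exposed
    // through the card's PCIe BAR.
    xrt::bo p2p_buf(device, size, xrt::bo::flags::p2p, 0 /* memory bank, placeholder */);

    // map() returns a host virtual address pointing into the BAR window, so a
    // plain CPU memcpy lands directly in FPGA memory -- conceptually the same
    // large-BAR access described above for the AMD GPU.
    char *fpga_ptr = p2p_buf.map<char*>();
    std::memcpy(fpga_ptr, "hello from the host", 20);

    std::cout << "wrote into the Alveo P2P buffer via its BAR mapping\n";
    return 0;
}
```

For an actual FPGA-GPU transfer, presumably one side's DMA engine would target the other side's BAR-mapped address rather than the CPU doing the copy, but that is beyond what this sketch covers.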

@edgargabriel
Contributor

> Thank you very much! I have a question about the tcp connection: since I'm focusing on p2p transfers over the PCIe bus, I don't know what role tcp can play here.

I unfortunately do not know enough about the Alveo cards. My comment regarding tcp was not about GPU-to-NIC transfers, but about data transfer between two nodes with Alveo cards (I am not sure whether the Alveo cards support verbs).
