diff --git a/src/memory.jl b/src/memory.jl
index b26ec72b2f..4cfda7f7fa 100644
--- a/src/memory.jl
+++ b/src/memory.jl
@@ -127,13 +127,13 @@ const HOSTALLOC_WRITECOMBINED = CUDAdrv.CU_MEMHOSTALLOC_WRITECOMBINED
 Allocate `bytesize` bytes of page-locked memory on the host. This memory is accessible from
 the CPU, and makes it possible to perform faster memory copies to the GPU. Furthermore, if
 `flags` is set to `HOSTALLOC_DEVICEMAP` the memory is also accessible from the GPU.
+These accesses are direct, and go through the PCI bus.
 If `flags` is set to `HOSTALLOC_PORTABLE`, the memory is considered mapped by all CUDA contexts,
 not just the one that created the memory, which is useful if the memory needs to be accessed from
 multiple devices. Multiple `flags` can be set at one time using a bytewise `OR`:
 
     flags = HOSTALLOC_PORTABLE | HOSTALLOC_DEVICEMAP
 
-These accesses are direct, and go through the PCI bus.
 """
 function alloc(::Type{HostBuffer}, bytesize::Integer, flags=0)
     bytesize == 0 && return HostBuffer(C_NULL, 0, CuContext(C_NULL), false)
@@ -155,9 +155,9 @@ const HOSTREGISTER_IOMEMORY = CUDAdrv.CU_MEMHOSTREGISTER_IOMEMORY
 
 Page-lock the host memory pointed to by `ptr`. Subsequent transfers to and from devices will
 be faster, and can be executed asynchronously. If the `HOSTREGISTER_DEVICEMAP` flag is
-specified, the buffer will also be accessible directly from the GPU. If the
-`HOSTREGISTER_PORTABLE` flag is specified, any CUDA context can access the memory.
+specified, the buffer will also be accessible directly from the GPU.
 These accesses are direct, and go through the PCI bus.
+If the `HOSTREGISTER_PORTABLE` flag is specified, any CUDA context can access the memory.
 """
 function register(::Type{HostBuffer}, ptr::Ptr, bytesize::Integer, flags=0)
     bytesize == 0 && throw(ArgumentError("Cannot register an empty range of memory."))