UCT/CUDA_IPC: Fabric memory support #9982

Akshay-Venkatesh · 2024-06-28T17:20:11Z

What/Why?

Second part of #9787 and follow up to #9867. This implements IPC-support for fabric handle associated memory using the newer import/export API and caches import operations similar to legacy IPC handles.

Note: need to solve https://redmine.mellanox.com/issues/4005305 before merging

yosefe · 2024-07-04T12:04:19Z

/azp run UCX PR

azure-pipelines · 2024-07-04T12:04:31Z

Azure Pipelines successfully started running 1 pipeline(s).

yosefe · 2024-07-04T12:08:11Z

src/uct/cuda/cuda_ipc/cuda_ipc_cache.c

+uct_cuda_ipc_rem_mpool_cache_t uct_cuda_ipc_rem_mpool_cache;
+
+static ucs_status_t
+uct_cuda_ipc_get_rem_mpool_cache(CUmemFabricHandle *fabric_handle,


can we reuse the existing IPC remote cache instead of adding a new one?

IPC mechanism with memory pools involves two steps in the import phase. One expensive step to import the actual memory pool to allows local GPU to be capable of accessing locally mapped memory pool; and another low overhead step to import remote pointer. The exporting process is expected to make several allocations and deallocations from the memory pool and pack associated pointer and memory pool handle in the rkey. Importing process checks for remote memory pool handle and performs a mapping only when not present.

The existing IPC remote cache relies on handle as key and returns a VA as value and directly uses offsets from the VA range for copy purposes. This doesn't translate to remote memory pool cache as we need imported memory and use that with the exported pointer here to import associated VA. Hence there isn't a one-to-one mapping between the two remote caches.

maybe we could still unite them, if the hash key will contain the handle type?

@yosefe Today, we use exporter-side base pointer to lookup cache and perform import operation if base poiner isn't found. This works fine for legacy cuda-ipc because there is one independent backing for a given allocation. Two allocations can never have a common backing. So performing expensive import operation for each new unique base pointer is justified (we also need to check that the handle for given base pointer matches with cached handle to detect VA recycling).

For mempools, many sub-allocations will have the same common backing (the memory pool). The first time we encounter a base pointer corresponding to a sub-allocation from the mempool, we import the whole mempool and then use that imported mempool and exported data pointer to get local translation of the associated base pointer of the sub-allocation at the importer side. If another sub-allocation from the same mempool is encountered at the importing process, if we use existing IPC remote cache logic, we will re-import an already imported mempool (paying unnecessary cost) and get local translation. Instead, it is better to check if a given mempool has been imported. Therefore we have a new cache that takes exported handle as input and returns imported handle as output. This way, any new sub-allocation from already imported memory pool will avoid import operation and just data pointer is imported. Since the old cache takes base pointer as input and the new cache introduced takes exporter handle as input, I don't see an immediate way of unifying the cache as we do need a common key type for a common cache.

src/uct/cuda/cuda_ipc/cuda_ipc_cache.c

src/uct/cuda/cuda_ipc/cuda_ipc_md.c

yosefe · 2024-07-04T12:12:15Z

src/uct/cuda/cuda_ipc/cuda_ipc_md.c

+ status = UCT_CUDADRV_FUNC_LOG_ERR(cuPointerGetAttribute(
+ &allowed_handle_types,
+ CU_POINTER_ATTRIBUTE_ALLOWED_HANDLE_TYPES,
+ (CUdeviceptr)addr));
+ if ((status != UCS_OK) ||
+ (!(allowed_handle_types | fabric_type))) {
+ status = UCS_ERR_NO_RESOURCE;
+ goto err;
+ }


why is this needed?

Because user may allocate memory pool without fabric capabilities in which case cuda-ipc UCT cannot support fabric handle based IPC.

src/uct/cuda/cuda_ipc/cuda_ipc_md.c

brminich · 2024-07-03T14:44:14Z

src/uct/cuda/cuda_ipc/cuda_ipc_md.c


 key = ucs_malloc(sizeof(*key), "uct_cuda_ipc_lkey_t");
 if (key == NULL) {
 return UCS_ERR_NO_MEMORY;
 }

+ memset(key, 0, sizeof(*key));


consider using calloc instead of malloc if it is really needed

brminich · 2024-07-03T14:59:34Z

src/uct/cuda/cuda_ipc/cuda_ipc_md.h

+ CUmemFabricHandle vmm; /* VMM export handle */
+ CUmemFabricHandle mempool;/* MallocAsync handle */


do we really need two fields of the same type? Can we use some common name for them?

Can have a common variable fabric_handle. Will change this.

Akshay-Venkatesh · 2024-07-06T19:50:14Z

@yosefe / @brminich looks like tests are failing for reasons outside PR logic. Would you mind restarting failing tests?

src/uct/cuda/cuda_ipc/cuda_ipc_cache.c

src/uct/cuda/cuda_ipc/cuda_ipc_md.c

yosefe · 2024-07-07T08:48:37Z

src/uct/cuda/cuda_ipc/cuda_ipc_cache.c

+uct_cuda_ipc_rem_mpool_cache_t uct_cuda_ipc_rem_mpool_cache;
+
+static ucs_status_t
+uct_cuda_ipc_get_rem_mpool_cache(CUmemFabricHandle *fabric_handle,


maybe we could still unite them, if the hash key will contain the handle type?

Akshay-Venkatesh · 2024-07-07T15:18:19Z

@yosefe and @brminich let me know if the changes look good and if you other comments about the PR.

src/uct/cuda/cuda_ipc/cuda_ipc_cache.c

yosefe · 2024-07-09T12:27:23Z

src/uct/cuda/cuda_ipc/cuda_ipc_md.c

- goto err;
- }
-
+ if ((status == UCS_OK) && (mempool != 0)) {


could mempool be 0 even if status is OK?

Yes. mempool is 0 when buffer points to any memory that wasn't allocated using cudaMallocAsync. Strictly speaking we don't need this check because all such memory should be filtered out by ipc capability check. I decided to leave it in place in case some corner case is missed.

so maybe convert it to assertion to show this isn't expected?

src/uct/cuda/cuda_ipc/cuda_ipc_md.c

brminich

a couple of minor styling comments

brminich · 2024-07-10T07:17:41Z

src/uct/cuda/cuda_ipc/cuda_ipc_md.h

 /**
 * @brief cudar ipc region registered for exposure
 */
 typedef struct {
- CUipcMemHandle ph; /* Memory handle of GPU memory */
+ uct_cuda_ipc_md_handle_t ph; /* Memory handle of GPU memory */


minor: imo better to either keep comments alignment or remove extra spaces here

src/uct/cuda/cuda_ipc/cuda_ipc_md.h

brminich · 2024-07-10T07:18:51Z

src/uct/cuda/cuda_ipc/cuda_ipc_md.h

+ CUmemPoolPtrExportData ptr;
+ uct_cuda_ipc_key_handle_t handle_type;


maybe worth aligning these two fields, because previous fields (lines 66-67) are aligned

Suggested change

CUmemPoolPtrExportData ptr;

uct_cuda_ipc_key_handle_t handle_type;

CUmemPoolPtrExportData ptr;

uct_cuda_ipc_key_handle_t handle_type;

brminich · 2024-07-10T08:00:01Z

src/uct/cuda/cuda_ipc/cuda_ipc_md.c

+#if HAVE_CUDA_FABRIC
+ /* cuda_ipc can handle VMM, mallocasync, and legacy pinned device so need to
+ * pack appropriate handle */
+ legacy_handle = &key->ph.handle.legacy;


minor: maybe move it to line 78? i.e. initialize if legacy_capable only

src/uct/cuda/cuda_ipc/cuda_ipc_md.c

brminich · 2024-07-10T08:10:15Z

src/uct/cuda/cuda_ipc/cuda_ipc_iface.c

@@ -79,8 +79,10 @@ uct_cuda_ipc_iface_is_reachable_v2(const uct_iface_h tl_iface,
 const uct_iface_is_reachable_params_t *params)
 {
 return uct_iface_is_reachable_params_addrs_valid(params) &&
+#if !HAVE_CUDA_FABRIC


maybe we can introduce an env var which would manage MNNVL enablement (i.e. make devices on different nodes unreachable if it is disabled)

it seems that after this change, we will create cuda_ipc endpoints on ALL UCP endpoints, including systems without MNNVL, just if we compiled with a recent enough cuda version. And it will lead to extra resource consumption, increase of rkey size, trying to unpack cuda_ipc keys even of x86 setup, etc. IMO we need to at least check MNNVL is supported on the system, and ideally check if remote peer is really reachable by MNNVL.

brminich · 2024-07-10T08:13:26Z

src/uct/cuda/cuda_ipc/cuda_ipc_cache.c

+ return uct_cuda_ipc_open_memhandle_legacy(key->ph.handle.legacy,
+ mapped_addr);


minor:

Suggested change

return uct_cuda_ipc_open_memhandle_legacy(key->ph.handle.legacy,

mapped_addr);

return uct_cuda_ipc_open_memhandle_legacy(key->ph.handle.legacy,

mapped_addr);

brminich · 2024-07-10T08:18:46Z

src/uct/cuda/cuda_ipc/cuda_ipc_cache.c

+ *imported_mpool = kh_val(&uct_cuda_ipc_rem_mpool_cache.hash, khiter);
+ *key_present = 1;
+ } else {
+ ucs_error("unable to use cuda_ipc remote_cache hash");


Suggested change

ucs_error("unable to use cuda_ipc remote_cache hash");

ucs_error("unable to use cuda_ipc remote_cache hash: %d", khret);

src/uct/cuda/cuda_ipc/cuda_ipc_iface.c

src/uct/cuda/cuda_ipc/cuda_ipc_md.h

yosefe · 2024-07-14T13:47:28Z

@Akshay-Venkatesh also as we discussed let's remove the caching from this PR

Akshay-Venkatesh · 2024-07-15T17:20:48Z

@Akshay-Venkatesh also as we discussed let's remove the caching from this PR

@yosefe I've removed the cache now. Please let me know if the changes look good. As discussed offline, I'll create a follow up PR to add mempool cache logic back and mnnvl enablement through user env once this PR is approved.

Akshay-Venkatesh · 2024-07-18T17:12:03Z

@yosefe As checks are passing and there aren't more comments, let me know if I can squash the comments. Thanks

src/uct/cuda/cuda_ipc/cuda_ipc_cache.c

src/uct/cuda/cuda_ipc/cuda_ipc_md.c

yosefe · 2024-07-21T12:08:53Z

src/uct/cuda/cuda_ipc/cuda_ipc_md.c

+ key->ph.handle_type = UCT_CUDA_IPC_KEY_HANDLE_TYPE_VMM;
+ ucs_trace("packed vmm fabric handle for %p", addr);
+ goto common_path;
+ } else {


so we assume that if cuMemRetainAllocationHandle failed, it's a mempool?
can we check it explicitly by some poster query or as part of cuPointerGetAttribute?
in any case need to add documentation in the code that describes the logic (in high level)

We conclude mempool if it's not legacy ipc memory (viz cudaMalloc), and it is not cuMemcreate/VMM memory (because retainAllocHandle failed).

src/uct/cuda/cuda_ipc/cuda_ipc_md.h

Akshay-Venkatesh · 2024-07-24T18:59:15Z

@yosefe the latest changes have enable_mnnvl env var added back and necessary reachability checks. Let me know if the changes look good. Thanks

src/uct/cuda/cuda_ipc/cuda_ipc_cache.c

src/uct/cuda/cuda_ipc/cuda_ipc_iface.c

yosefe · 2024-07-26T13:08:57Z

src/uct/cuda/cuda_ipc/cuda_ipc_iface.c

 uct_device_type_t dev_type = UCT_DEVICE_TYPE_SHM;
+#if HAVE_CUDA_FABRIC
+ uct_cuda_ipc_md_t *ipc_md = ucs_derived_of(md, uct_cuda_ipc_md_t);


ipc_md -> md

md is already declared.

per convention in other places, let's rename the parameter from md to uct_md (and ipc_md to md)

src/uct/cuda/cuda_ipc/cuda_ipc_md.c

src/uct/cuda/cuda_ipc/cuda_ipc_md.h

yosefe · 2024-07-30T09:22:12Z

src/uct/cuda/cuda_ipc/cuda_ipc_iface.c

@@ -94,7 +94,7 @@ static int uct_cuda_ipc_iface_is_coherent()
 return 0;
 }

- return coherent;
+ return (coherent && (md->enable_mnnvl != UCS_NO));


pls remove the external ( )

yosefe · 2024-07-30T09:22:56Z

src/uct/cuda/cuda_ipc/cuda_ipc_iface.c

 uct_device_type_t dev_type = UCT_DEVICE_TYPE_SHM;
+#if HAVE_CUDA_FABRIC
+ uct_cuda_ipc_md_t *ipc_md = ucs_derived_of(md, uct_cuda_ipc_md_t);


per convention in other places, let's rename the parameter from md to uct_md (and ipc_md to md)

brminich · 2024-07-30T07:34:22Z

src/uct/cuda/cuda_ipc/cuda_ipc_cache.c

+uct_cuda_ipc_open_memhandle_mempool(uct_cuda_ipc_rkey_t *key,
+ CUdeviceptr *mapped_addr)
+{
+ CUmemAccessDesc access_desc = {};


Suggested change

CUmemAccessDesc access_desc = {};

CUmemAccessDesc access_desc = {};

brminich · 2024-07-30T07:37:29Z

src/uct/cuda/cuda_ipc/cuda_ipc_cache.c

+ goto release_va_range;
+ }
+
+ cuCtxGetDevice(&access_desc.location.id);


we need to check error code here

brminich · 2024-07-30T07:38:25Z

src/uct/cuda/cuda_ipc/cuda_ipc_cache.c

+ return status;
+ }
+
+ cuCtxGetDevice(&access_desc.location.id);


we need to check error code here
also maybe it is worth to create a small function initializing access_desc? It can be reused by uct_cuda_ipc_open_memhandle_vmm

brminich · 2024-07-30T09:17:45Z

src/uct/cuda/cuda_ipc/cuda_ipc_cache.c

@@ -134,8 +269,7 @@ static void uct_cuda_ipc_cache_invalidate_regions(uct_cuda_ipc_cache_t *cache,
 ucs_error("failed to remove address:%p from cache (%s)",
 (void *)region->key.d_bptr, ucs_status_string(status));
 }
- UCT_CUDADRV_FUNC_LOG_ERR(
- cuIpcCloseMemHandle((CUdeviceptr)region->mapped_addr));
+ uct_cuda_ipc_close_memhandle(region);


why not checking return code?

Akshay-Venkatesh

addressed comments

Akshay-Venkatesh · 2024-07-31T16:09:48Z

@yosefe shall I squash commits?

Akshay-Venkatesh · 2024-07-31T17:23:23Z

@yosefe and @brminich I created a branch #10036 by applying https://github.com/openucx/ucx/pull/9982.diff on top of current main branch. Squashing the commits in this branch proved painful after the merges in between. Hope it's ok to ignore this and just merge #10036 into main branch.

yosefe · 2024-08-01T08:27:32Z

@Akshay-Venkatesh it's better to squash and rebase in the current PR. Moving the changes to a new PR makes it harder to review the changes and track history.

Akshay-Venkatesh · 2024-08-13T17:14:06Z

@yosefe

@brminich provided the fix for https://redmine.mellanox.com/issues/4005305 and IMB/OMB tests are passing now. I've pushed the commit now. Let me know if the changes look good and if there is a further squash needed.

FYI, there were some fixes needed in openmpi collective layers and memory detection layers as well. I've started PRs for those and those fixes are needed for OMB/IMB collective tests to pass.

Akshay-Venkatesh · 2024-08-14T04:41:34Z

@yosefe / @brminich errors look unrelated. Would you mind restarting the tests?

brminich

👍 besides minor comment

brminich · 2024-08-14T08:40:26Z

src/uct/cuda/cuda_ipc/cuda_ipc_cache.c

+
+ status = uct_cuda_ipc_close_memhandle(region);
+ if (status != UCS_OK) {
+ ucs_error("failed to close memhandle for base addr:%p (%s)",


minor: maybe also print region->key.ph.handle_type

yosefe · 2024-08-14T09:01:25Z

src/uct/cuda/cuda_ipc/cuda_ipc_iface.c

+ int reachable;
+
+ reachable = uct_iface_is_reachable_params_addrs_valid(params) &&
+ (getpid() != *(pid_t*)params->iface_addr) &&
+ uct_iface_scope_is_reachable(tl_iface, params);
+
+#if HAVE_CUDA_FABRIC
+ if (uct_cuda_ipc_iface_is_mnnvl_supported(md)) {
+ return reachable;
+ }
+#endif
+
+ return reachable &&
+ (ucs_get_system_id() == *((const uint64_t*)params->device_addr));


Suggested change

int reachable;

reachable = uct_iface_is_reachable_params_addrs_valid(params) &&

(getpid() != *(pid_t*)params->iface_addr) &&

uct_iface_scope_is_reachable(tl_iface, params);

#if HAVE_CUDA_FABRIC

if (uct_cuda_ipc_iface_is_mnnvl_supported(md)) {

return reachable;

}

#endif

return reachable &&

(ucs_get_system_id() == *((const uint64_t*)params->device_addr));

if (!uct_iface_is_reachable_params_addrs_valid(params) ||

(getpid() == *(pid_t*)params->iface_addr) ||

!uct_iface_scope_is_reachable(tl_iface, params) {

return 0;

};

#if HAVE_CUDA_FABRIC

return uct_cuda_ipc_iface_is_mnnvl_supported(md);

#else

return ucs_get_system_id() == *((const uint64_t*)params->device_addr);

#endif

this does not seem to be correct. If we have HAVE_CUDA_FABRIC and mnvl is not supported (e.g. disabled by env var), this function will always return 0

yosefe · 2024-08-14T09:02:26Z

src/uct/cuda/cuda_ipc/cuda_ipc_md.c

@@ -320,16 +402,21 @@ uct_cuda_ipc_md_open(uct_component_t *component, const char *md_name,
 .mem_attach = ucs_empty_function_return_unsupported,
 .detect_memory_type = ucs_empty_function_return_unsupported
 };
- uct_md_t *md;
+


remove space line

brminich · 2024-08-15T05:11:11Z

error looks relevant

libtool: compile:  gcc -DHAVE_CONFIG_H -I. -I/__w/1/s/contrib/../src/uct/cuda -I../../.. -DCPU_FLAGS= -I/__w/1/s/contrib/../src -I/home/swx-azure-svc_azpcontainer/85835-20240814.18/build -I/home/swx-azure-svc_azpcontainer/85835-20240814.18/build/src -O3 -g -Wall -Werror -funwind-tables -Wno-missing-field-initializers -Wno-unused-parameter -Wno-unused-label -Wno-long-long -Wno-endif-labels -Wno-sign-compare -Wno-multichar -Wno-deprecated-declarations -Winvalid-pch -Wno-language-extension-token -fno-finite-math-only -Wno-recommended-option -Wno-c99-extensions -Wno-pointer-sign -Werror-implicit-function-declaration -Wno-format-zero-length -Wnested-externs -Wshadow -Werror=declaration-after-statement -MT cuda_ipc/libuct_cuda_la-cuda_ipc_cache.lo -MD -MP -MF cuda_ipc/.deps/libuct_cuda_la-cuda_ipc_cache.Tpo -c /__w/1/s/contrib/../src/uct/cuda/cuda_ipc/cuda_ipc_cache.c  -fPIC -DPIC -DUCX_SHARED_LIB -o cuda_ipc/.libs/libuct_cuda_la-cuda_ipc_cache.o
In file included from /__w/1/s/contrib/../src/ucs/debug/log.h:14:0,
                 from /__w/1/s/contrib/../src/uct/base/uct_iface.h:20,
                 from /__w/1/s/contrib/../src/uct/cuda/base/cuda_md.h:9,
                 from /__w/1/s/contrib/../src/uct/cuda/cuda_ipc/cuda_ipc_md.h:10,
                 from /__w/1/s/contrib/../src/uct/cuda/cuda_ipc/cuda_ipc_cache.h:14,
                 from /__w/1/s/contrib/../src/uct/cuda/cuda_ipc/cuda_ipc_cache.c:11:
/__w/1/s/contrib/../src/uct/cuda/cuda_ipc/cuda_ipc_cache.c: In function 'uct_cuda_ipc_cache_invalidate_regions':
/__w/1/s/contrib/../src/uct/cuda/cuda_ipc/cuda_ipc_cache.c:285:65: error: 'uct_cuda_ipc_md_handle_t' has no member named 'handle_type'
                       (void *)region->key.d_bptr, region->key.ph.handle_type,
                                                                 ^
/__w/1/s/contrib/../src/ucs/debug/log_def.h:35:84: note: in definition of macro 'ucs_log_component'
                              (ucs_log_level_t)(_level), _comp_log_config, _fmt, ## __VA_ARGS__); \
                                                                                    ^
/__w/1/s/contrib/../src/ucs/debug/log_def.h:44:37: note: in expansion of macro 'ucs_log'
 #define ucs_error(_fmt, ...)        ucs_log(UCS_LOG_LEVEL_ERROR, _fmt, ## __VA_ARGS__)
                                     ^
/__w/1/s/contrib/../src/uct/cuda/cuda_ipc/cuda_ipc_cache.c:284:13: note: in expansion of macro 'ucs_error'
             ucs_error("failed to close memhandle for base addr:%p type:%d (%s)",
             ^
/__w/1/s/contrib/../src/uct/cuda/cuda_ipc/cuda_ipc_cache.c: At top level:
cc1: error: unrecognized command line option "-Wno-c99-extensions" [-Werror]

Akshay-Venkatesh · 2024-08-15T15:50:36Z

error looks relevant

libtool: compile:  gcc -DHAVE_CONFIG_H -I. -I/__w/1/s/contrib/../src/uct/cuda -I../../.. -DCPU_FLAGS= -I/__w/1/s/contrib/../src -I/home/swx-azure-svc_azpcontainer/85835-20240814.18/build -I/home/swx-azure-svc_azpcontainer/85835-20240814.18/build/src -O3 -g -Wall -Werror -funwind-tables -Wno-missing-field-initializers -Wno-unused-parameter -Wno-unused-label -Wno-long-long -Wno-endif-labels -Wno-sign-compare -Wno-multichar -Wno-deprecated-declarations -Winvalid-pch -Wno-language-extension-token -fno-finite-math-only -Wno-recommended-option -Wno-c99-extensions -Wno-pointer-sign -Werror-implicit-function-declaration -Wno-format-zero-length -Wnested-externs -Wshadow -Werror=declaration-after-statement -MT cuda_ipc/libuct_cuda_la-cuda_ipc_cache.lo -MD -MP -MF cuda_ipc/.deps/libuct_cuda_la-cuda_ipc_cache.Tpo -c /__w/1/s/contrib/../src/uct/cuda/cuda_ipc/cuda_ipc_cache.c  -fPIC -DPIC -DUCX_SHARED_LIB -o cuda_ipc/.libs/libuct_cuda_la-cuda_ipc_cache.o
In file included from /__w/1/s/contrib/../src/ucs/debug/log.h:14:0,
                 from /__w/1/s/contrib/../src/uct/base/uct_iface.h:20,
                 from /__w/1/s/contrib/../src/uct/cuda/base/cuda_md.h:9,
                 from /__w/1/s/contrib/../src/uct/cuda/cuda_ipc/cuda_ipc_md.h:10,
                 from /__w/1/s/contrib/../src/uct/cuda/cuda_ipc/cuda_ipc_cache.h:14,
                 from /__w/1/s/contrib/../src/uct/cuda/cuda_ipc/cuda_ipc_cache.c:11:
/__w/1/s/contrib/../src/uct/cuda/cuda_ipc/cuda_ipc_cache.c: In function 'uct_cuda_ipc_cache_invalidate_regions':
/__w/1/s/contrib/../src/uct/cuda/cuda_ipc/cuda_ipc_cache.c:285:65: error: 'uct_cuda_ipc_md_handle_t' has no member named 'handle_type'
                       (void *)region->key.d_bptr, region->key.ph.handle_type,
                                                                 ^
/__w/1/s/contrib/../src/ucs/debug/log_def.h:35:84: note: in definition of macro 'ucs_log_component'
                              (ucs_log_level_t)(_level), _comp_log_config, _fmt, ## __VA_ARGS__); \
                                                                                    ^
/__w/1/s/contrib/../src/ucs/debug/log_def.h:44:37: note: in expansion of macro 'ucs_log'
 #define ucs_error(_fmt, ...)        ucs_log(UCS_LOG_LEVEL_ERROR, _fmt, ## __VA_ARGS__)
                                     ^
/__w/1/s/contrib/../src/uct/cuda/cuda_ipc/cuda_ipc_cache.c:284:13: note: in expansion of macro 'ucs_error'
             ucs_error("failed to close memhandle for base addr:%p type:%d (%s)",
             ^
/__w/1/s/contrib/../src/uct/cuda/cuda_ipc/cuda_ipc_cache.c: At top level:
cc1: error: unrecognized command line option "-Wno-c99-extensions" [-Werror]

@brminich that handle_type field was missing a guard. Can you please approve/reject again?

Akshay-Venkatesh · 2024-08-18T19:41:51Z

@yosefe gentle ping to check if latest commits look good

yosefe · 2024-08-19T12:28:25Z

src/uct/cuda/cuda_ipc/cuda_ipc_iface.c

-
- return reachable &&
- (ucs_get_system_id() == *((const uint64_t*)params->device_addr));
+ /* Not fabric capable or multi-node nvlink disabled, so iface has to be on


space line before

yosefe · 2024-08-19T12:29:02Z

src/uct/cuda/cuda_ipc/cuda_ipc_cache.c

@@ -281,8 +281,14 @@ static void uct_cuda_ipc_cache_invalidate_regions(uct_cuda_ipc_cache_t *cache,

 status = uct_cuda_ipc_close_memhandle(region);
 if (status != UCS_OK) {
+#if HAVE_CUDA_FABRIC
+ ucs_error("failed to close memhandle for base addr:%p type:%d (%s)",


can we put only the access to region->key.ph.handle_type in the ifdef?

Akshay-Venkatesh · 2024-08-21T15:21:42Z

@yosefe @brminich These errors look unrelated.

Akshay-Venkatesh requested review from yosefe and brminich June 28, 2024 17:20

Akshay-Venkatesh marked this pull request as ready for review June 30, 2024 20:08

yosefe mentioned this pull request Jul 4, 2024

UCT/CUDA_IPC: Add VMM/mallocasync support over fabric handles #9787

Open

yosefe reviewed Jul 4, 2024

View reviewed changes

brminich reviewed Jul 4, 2024

View reviewed changes

yosefe reviewed Jul 7, 2024

View reviewed changes

yosefe reviewed Jul 9, 2024

View reviewed changes

brminich reviewed Jul 10, 2024

View reviewed changes

yosefe reviewed Jul 14, 2024

View reviewed changes

src/uct/cuda/cuda_ipc/cuda_ipc_iface.c Outdated Show resolved Hide resolved

src/uct/cuda/cuda_ipc/cuda_ipc_md.h Outdated Show resolved Hide resolved

src/uct/cuda/cuda_ipc/cuda_ipc_md.h Outdated Show resolved Hide resolved

yosefe reviewed Jul 21, 2024

View reviewed changes

yosefe reviewed Jul 26, 2024

View reviewed changes

yosefe reviewed Jul 30, 2024

View reviewed changes

brminich reviewed Jul 30, 2024

View reviewed changes

Akshay-Venkatesh commented Jul 30, 2024

View reviewed changes

brminich previously approved these changes Jul 31, 2024

View reviewed changes

yosefe previously approved these changes Jul 31, 2024

View reviewed changes

Akshay-Venkatesh mentioned this pull request Jul 31, 2024

UCT/CUDA_IPC: Fabric memory support #10036

Closed

Akshay-Venkatesh dismissed yosefe’s stale review via cbf2539 August 1, 2024 19:09

Akshay-Venkatesh dismissed brminich’s stale review via cbf2539 August 1, 2024 19:09

Akshay-Venkatesh force-pushed the topic/cuda-ipc-fabric-support branch from a59aafa to cbf2539 Compare August 1, 2024 19:09

brminich reviewed Aug 14, 2024

View reviewed changes

yosefe reviewed Aug 14, 2024

View reviewed changes

brminich previously approved these changes Aug 15, 2024

View reviewed changes

Akshay-Venkatesh dismissed brminich’s stale review via 4407aa9 August 15, 2024 15:49

brminich previously approved these changes Aug 15, 2024

View reviewed changes

yosefe reviewed Aug 19, 2024

View reviewed changes

Akshay-Venkatesh dismissed brminich’s stale review via 0385a85 August 19, 2024 16:12

yosefe approved these changes Aug 20, 2024

View reviewed changes

UCT/CUDA_IPC: Fabric memory support

a3b7625

Akshay-Venkatesh force-pushed the topic/cuda-ipc-fabric-support branch from 0385a85 to a3b7625 Compare August 20, 2024 16:00

yosefe approved these changes Aug 20, 2024

View reviewed changes

yosefe enabled auto-merge August 20, 2024 16:16

yosefe merged commit f29ba60 into openucx:master Aug 22, 2024
142 checks passed

Akshay-Venkatesh mentioned this pull request Aug 22, 2024

UCT/CUDA_IPC: Cache for mempool import operation #10082

Open

		CUmemFabricHandle vmm; /* VMM export handle */
		CUmemFabricHandle mempool;/* MallocAsync handle */

		CUmemPoolPtrExportData ptr;
		uct_cuda_ipc_key_handle_t handle_type;

		return uct_cuda_ipc_open_memhandle_legacy(key->ph.handle.legacy,
		mapped_addr);

	ucs_error("unable to use cuda_ipc remote_cache hash");
	ucs_error("unable to use cuda_ipc remote_cache hash: %d", khret);

	CUmemAccessDesc access_desc = {};
	CUmemAccessDesc access_desc = {};

UCT/CUDA_IPC: Fabric memory support #9982

UCT/CUDA_IPC: Fabric memory support #9982

Conversation

Akshay-Venkatesh commented Jun 28, 2024 • edited by yosefe Loading

What/Why?

yosefe commented Jul 4, 2024

azure-pipelines bot commented Jul 4, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Akshay-Venkatesh commented Jul 6, 2024

Choose a reason for hiding this comment

Akshay-Venkatesh commented Jul 7, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

brminich left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

yosefe commented Jul 14, 2024

Akshay-Venkatesh commented Jul 15, 2024 • edited Loading

Akshay-Venkatesh commented Jul 18, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Akshay-Venkatesh commented Jul 24, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Akshay-Venkatesh left a comment

Choose a reason for hiding this comment

Akshay-Venkatesh commented Jul 31, 2024

Akshay-Venkatesh commented Jul 31, 2024

yosefe commented Aug 1, 2024

Akshay-Venkatesh commented Aug 13, 2024

Akshay-Venkatesh commented Aug 14, 2024 • edited Loading

brminich left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

brminich commented Aug 15, 2024

Akshay-Venkatesh commented Aug 15, 2024

Akshay-Venkatesh commented Aug 18, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Akshay-Venkatesh commented Aug 21, 2024

Akshay-Venkatesh commented Jun 28, 2024 •

edited by yosefe

Loading

Akshay-Venkatesh commented Jul 15, 2024 •

edited

Loading

Akshay-Venkatesh commented Aug 14, 2024 •

edited

Loading