-
Notifications
You must be signed in to change notification settings - Fork 407
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
UCP/CORE: Use rcache overhead env for protocol perf estimation. #9847
Conversation
src/ucp/core/ucp_mm.c
Outdated
@@ -1568,6 +1568,9 @@ ucs_status_t ucp_mem_rcache_init(ucp_context_h context, | |||
} | |||
} | |||
|
|||
context->config.ext.rcache_overhead = ucs_linear_func_make( | |||
ucs_time_units_to_sec(rcache_config->overhead, 50.0e-9), 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems the prev default was 180e-9, why changed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I understand it is changed to 50 since before that patch that value was mostly used for UCT level rcache, but now it is used for UCP level rcache, and the latest measurements on Selene shown that it is 50 nanoseconds which is exposed inside UCP_RCACHE_LOOKUP_FUNC
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shall we reuse (or delete) UCP_RCACHE_LOOKUP_FUNC?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated macro.
@@ -315,7 +315,7 @@ void ucp_proto_init_memreg_time(const ucp_proto_common_init_params_t *params, | |||
if (context->rcache != NULL) { | |||
perf_node = ucp_proto_perf_node_new_data("rcache lookup", ""); | |||
|
|||
*memreg_time = UCP_RCACHE_LOOKUP_FUNC; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only another usage of UCP_RCACHE_LOOKUP_FUNC
is inside ucp_wireup_mem_reg_cost
, shouldn't we use the value from the config there too? If so, then UCP_RCACHE_LOOKUP_FUNC
can be removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But shouldn't it be same values? Maybe it worth applying the new value by wire compatible way (with peer version check)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we have it defined in wire-compatible way in wireup (ucp_wireup_mem_reg_cost
). here wire compatibility is not important.
src/ucp/core/ucp_context.h
Outdated
@@ -181,6 +181,8 @@ typedef struct ucp_context_config { | |||
double proto_overhead_rndv_rtr; | |||
double proto_overhead_sw; | |||
double proto_overhead_rkey_ptr; | |||
/** Registration cache lookup overhead estimation */ | |||
ucs_linear_func_t rcache_overhead; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe save it as "double" and not a linear func?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed to double.
src/ucp/core/ucp_mm.c
Outdated
@@ -1568,6 +1568,9 @@ ucs_status_t ucp_mem_rcache_init(ucp_context_h context, | |||
} | |||
} | |||
|
|||
context->config.ext.rcache_overhead = ucs_linear_func_make( | |||
ucs_time_units_to_sec(rcache_config->overhead, 50.0e-9), 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shall we reuse (or delete) UCP_RCACHE_LOOKUP_FUNC?
085488a
to
42b2bd8
Compare
What
Removed
uct_md_rcache_overhead
. The method became unused after merging #8225.Added logic to use previously unused
UCX_RCACHE_OVERHEAD
environment variable. The variable can be used to configure thercache lookup
node (applied suggestion #8995 (comment)).Added
UCX_RCACHE_OVERHEAD
to[Fujitsu ARM]
section to restore adjustments made by #7583.