Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UCP/CORE: Use rcache overhead env for protocol perf estimation. #9847

Merged
merged 1 commit into from
Jun 5, 2024

Conversation

rakhmets
Copy link
Collaborator

What

Removed uct_md_rcache_overhead. The method became unused after merging #8225.
Added logic to use previously unused UCX_RCACHE_OVERHEAD environment variable. The variable can be used to configure the rcache lookup node (applied suggestion #8995 (comment)).
Added UCX_RCACHE_OVERHEAD to [Fujitsu ARM] section to restore adjustments made by #7583.

@@ -1568,6 +1568,9 @@ ucs_status_t ucp_mem_rcache_init(ucp_context_h context,
}
}

context->config.ext.rcache_overhead = ucs_linear_func_make(
ucs_time_units_to_sec(rcache_config->overhead, 50.0e-9), 0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems the prev default was 180e-9, why changed?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I understand it is changed to 50 since before that patch that value was mostly used for UCT level rcache, but now it is used for UCP level rcache, and the latest measurements on Selene shown that it is 50 nanoseconds which is exposed inside UCP_RCACHE_LOOKUP_FUNC

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shall we reuse (or delete) UCP_RCACHE_LOOKUP_FUNC?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated macro.

@@ -315,7 +315,7 @@ void ucp_proto_init_memreg_time(const ucp_proto_common_init_params_t *params,
if (context->rcache != NULL) {
perf_node = ucp_proto_perf_node_new_data("rcache lookup", "");

*memreg_time = UCP_RCACHE_LOOKUP_FUNC;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only another usage of UCP_RCACHE_LOOKUP_FUNC is inside ucp_wireup_mem_reg_cost, shouldn't we use the value from the config there too? If so, then UCP_RCACHE_LOOKUP_FUNC can be removed.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm afraid that it will break wire compatibility.
@brminich fixed similar issue here: #9273.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But shouldn't it be same values? Maybe it worth applying the new value by wire compatible way (with peer version check)?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we have it defined in wire-compatible way in wireup (ucp_wireup_mem_reg_cost). here wire compatibility is not important.

config/ucx.conf Show resolved Hide resolved
brminich
brminich previously approved these changes Jun 3, 2024
ivankochin
ivankochin previously approved these changes Jun 4, 2024
@rakhmets rakhmets requested a review from yosefe June 4, 2024 12:42
@@ -181,6 +181,8 @@ typedef struct ucp_context_config {
double proto_overhead_rndv_rtr;
double proto_overhead_sw;
double proto_overhead_rkey_ptr;
/** Registration cache lookup overhead estimation */
ucs_linear_func_t rcache_overhead;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe save it as "double" and not a linear func?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to double.

@@ -1568,6 +1568,9 @@ ucs_status_t ucp_mem_rcache_init(ucp_context_h context,
}
}

context->config.ext.rcache_overhead = ucs_linear_func_make(
ucs_time_units_to_sec(rcache_config->overhead, 50.0e-9), 0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shall we reuse (or delete) UCP_RCACHE_LOOKUP_FUNC?

@rakhmets rakhmets dismissed stale reviews from ivankochin and brminich via 085488a June 4, 2024 15:50
@yosefe yosefe enabled auto-merge June 5, 2024 06:56
@yosefe yosefe merged commit 18cf1dd into openucx:master Jun 5, 2024
148 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants