Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UCT/IB/MLX5: suggest user to increase PF_LOG_BAR_SIZE when UAR allocation fails #9986

Draft
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

michal-shalev
Copy link
Contributor

@michal-shalev michal-shalev commented Jul 1, 2024

What ?

Improve error logging for UAR allocation failure in uct_ib_mlx5_devx_alloc_uar.

Why ?

To inform users to increase PF_LOG_BAR_SIZE and reboot when UAR allocation fails due to insufficient memory.
see also: open-mpi/ompi#6854

How ?

Update the error message in uct_ib_mlx5_devx_alloc_uar to suggest increasing PF_LOG_BAR_SIZE and mention the reboot requirement.

@@ -554,7 +554,9 @@ uct_ib_mlx5_devx_alloc_uar(uct_ib_mlx5_md_t *md, unsigned flags, int log_level,
uar = mlx5dv_devx_alloc_uar(md->super.dev.ibv_context, flags);
if (uar == NULL) {
sprintf(buf, "mlx5dv_devx_alloc_uar(device=%s, flags=0x%x(%s)) "
"failed: %m", uct_ib_device_name(&md->super.dev), flags, title);
"failed: %m. This may be due to insufficient UAR space. "
"Consider increasing PF_LOG_BAR_SIZE from the default value of 5.",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. "Consider increasing PF_LOG_BAR_SIZE using mlxconfig tool",
  2. should we also check errno to have more specific error condition?
  3. i think there could be other places that may fail because of the same reason, for example "mlx5dv_devx_obj_create(CQ)"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. added the "mlxconfig tool" to the string.
  2. do you mean check "ENOMEM"?
  3. added this, checking for other places

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Yes

Copy link
Contributor Author

@michal-shalev michal-shalev Jul 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. As you mentioned here: openmpi-3.1.1 np=256 ppn=128 UCX_TLS=rc fail open-mpi/ompi#6854 this requires a reboot so I added it to the string as well.
  2. It looks like mlx5dv_devx_alloc_uar -> _mlx5dv_devx_alloc_uar -> mlx5_attach_dedicated_uar -> mlx5_alloc_dyn_uar -> execute_ioctl -> ioctl
    it returns the errno but I don't know if we can know for sure that it'll set ENOMEM when PF_LOG_BAR_SIZE is insufficient.
  3. mlx5dv_devx_obj_create(CQ) doesn't fail because of PF_LOG_BAR_SIZE, in the email "
    After applying the PF_LOG_BAR_SIZE parameter update".
    I removed the commit for it.

Regarding non-DEVX flows
uct_worker_tl_data_get -> uct_ib_mlx5_iface_get_res_domain -> uct_ib_mlx5_res_domain_init -> ibv_alloc_td -> mlx5_alloc_td -> mlx5_attach_dedicated_uar-> mlx5_alloc_dyn_uar -> execute_ioctl -> ioctl
If that fails due to insufficient PF_LOG_BAR_SIZE, uct_ib_mlx5_res_domain_init function does not fail. Instead, it returns UCS_OK and logs a debug message.
The fallback mechanism uses the PD directly from the MD structure. This ensures continued operation, though with potentially reduced performance, especially for users sending a high volume of messages to different destinations.
This is because the shared UAR may not handle the high message rate as efficiently.

I can add this if ibv_alloc_td fails in uct_ib_mlx5_res_domain_init:
ucs_log(log_level, "Consider increasing PF_LOG_BAR_SIZE using mlxconfig tool");
However, this log might not be noticed because the entire flow does not fail.

Another suggestion is to change the log level from ucs_debug to ucs_warn:
ucs_warn("ibv_alloc_td() on %s failed: %m. Falling back to using the PD directly from the MD structure, which may reduce performance.", uct_ib_device_name(&md->dev))
This would make the suggestion to increase PF_LOG_BAR_SIZE more noticeable and highlight the potential performance impact.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@michal-shalev per our offline discussion,

  • (2) ENOMEM - please check whether it indeed returns.
  • Other flows - IMO generally we shouldn't change the behavior in this PR. Flows should not turn from failing to succeeding as a result of this PR, neither logging levels should change. Only add a suggestion for the user, in the already existing log level.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gleon99 I'll write a test that calls mlx5dv_devx_alloc_uar until it fails and checks the errno to make sure what it returns exactly.
Other flows - I'll add a suggestion.

@michal-shalev michal-shalev marked this pull request as draft July 2, 2024 09:48
@michal-shalev michal-shalev added the WIP-DNM Work in progress / Do not review label Jul 2, 2024
@michal-shalev michal-shalev force-pushed the suggest-pf-log-bar-size branch 4 times, most recently from 6964498 to afd177f Compare July 8, 2024 18:37
@michal-shalev michal-shalev marked this pull request as ready for review July 8, 2024 18:46
@michal-shalev michal-shalev removed the WIP-DNM Work in progress / Do not review label Jul 8, 2024
@michal-shalev michal-shalev changed the title UCT/IB/MLX5: suggest user to increase PF_LOG_BAR_SIZE when can't allocate UAR UCT/IB/MLX5: suggest user to increase PF_LOG_BAR_SIZE when UAR allocation fails Jul 8, 2024
@michal-shalev michal-shalev force-pushed the suggest-pf-log-bar-size branch 2 times, most recently from d505e5c to 379d502 Compare July 11, 2024 09:47
@michal-shalev michal-shalev marked this pull request as draft July 11, 2024 12:28
@michal-shalev michal-shalev added the WIP-DNM Work in progress / Do not review label Jul 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
WIP-DNM Work in progress / Do not review
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants