HIP environment variables #3515

Open. Wants to merge 6 commits into base: docs/develop

Conversation

neon60
Contributor

@neon60 neon60 commented Jun 6, 2024

The important HIP env variables.

@neon60 neon60 changed the base branch from develop to docs/develop June 6, 2024 18:02
@neon60 neon60 requested a review from lpaoletti June 6, 2024 18:03
@neon60 neon60 marked this pull request as draft June 6, 2024 18:10
@neon60 neon60 marked this pull request as ready for review June 7, 2024 08:41
@neon60 neon60 force-pushed the hip_env_variables branch 2 times, most recently from 77517f1 to 3fd0c3e Compare June 8, 2024 12:13
@neon60 neon60 force-pushed the hip_env_variables branch 3 times, most recently from 6d6d95e to f7248d5 Compare June 16, 2024 19:26
HIP environment variables
*************************************************************

In this section, the reader can find all the important HIP environment variables. The full collection of the ROCm environment variables is on the :doc:`ROCm environment variables page<rocm:reference/env-variables>`.

I think this could use a little clarification as to the environment variables displayed here versus the additional variables shown on the referenced pages. Why are these environment variables worth looking at here, but not the others?

Also, I think the reference needs a space:
:doc:`ROCm environment variables page <rocm:reference/env-variables>`


* - | ``ROCR_VISIBLE_DEVICES``
    | A list of device indices or UUIDs that will be exposed to applications.
  - Example: ``0,GPU-DEADBEEFDEADBEEF``

Is GPU-DEADBEEFDEADBEEF an actual UUID or is it intended to represent something else?

Contributor Author

I think it's funny, and kudos to the original author. I can change it to a more realistic one.
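
To make the value format concrete, here is a minimal sketch of how a comma-separated ``ROCR_VISIBLE_DEVICES`` string such as the one in the example could be split into device indices and UUID-like tokens. The helper name ``split_visible_devices`` is hypothetical and is not part of HIP or ROCm; the only assumption about the format is what the table row states (a list of indices or UUIDs).

```python
def split_visible_devices(value):
    """Split a ROCR_VISIBLE_DEVICES-style string into indices and UUID-like tokens.

    Hypothetical helper for illustration only; the ROCm runtime does its own parsing.
    """
    indices, uuids = [], []
    for token in value.split(","):
        token = token.strip()
        if not token:
            continue  # ignore empty entries such as a trailing comma
        if token.isdigit():
            indices.append(int(token))  # plain device index
        else:
            uuids.append(token)  # anything else is treated as a UUID-like token
    return indices, uuids


indices, uuids = split_visible_devices("0,GPU-DEADBEEFDEADBEEF")
```

With the example value from the table, the first token is treated as an index and the second as a UUID-like string.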

GPU isolation variables
=======================

The GPU isolation environment variables in HIP are collected in the next table. For more information, check :doc:`GPU isolation page <rocm:conceptual/gpu-isolation>`.

The linked page has five environment variables, while here we see only three. Why the different numbers, and what do these three have that the others lack?

Contributor Author

The OMP_DEFAULT_DEVICE variable is for the OpenMP runtime, and HIP_VISIBLE_DEVICES and CUDA_VISIBLE_DEVICES are added as one line.
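
As an illustration of how such an isolation variable is typically applied, the sketch below sets ``HIP_VISIBLE_DEVICES`` only for a child process and leaves the parent environment untouched. The helper name ``env_with_visible_devices`` is hypothetical, and the child here is just a Python one-liner that echoes the variable, standing in for a real HIP application.

```python
import os
import subprocess
import sys


def env_with_visible_devices(device_ids, base_env=None):
    """Return a copy of the environment with HIP_VISIBLE_DEVICES set.

    Hypothetical helper: device_ids is an iterable of integer device indices.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["HIP_VISIBLE_DEVICES"] = ",".join(str(i) for i in device_ids)
    return env


child_env = env_with_visible_devices([0, 2])

# Stand-in for a HIP application: the child only reports what it sees.
result = subprocess.run(
    [sys.executable, "-c", "import os; print(os.environ['HIP_VISIBLE_DEVICES'])"],
    env=child_env,
    capture_output=True,
    text=True,
    check=True,
)
seen = result.stdout.strip()
```

Passing the environment per process keeps the restriction scoped to one launch instead of the whole shell session.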


This documentation set is open source. Consider adding your own tips to this section. They will be reviewed by the AMD ROCm documentation team before being committed to the documentation.

* The performance can be improved at `GROMACS <https://github.com/ROCM/Gromacs>`_ HIP backend, when the ``ROC_ACTIVE_WAIT_TIMEOUT=0`` and ``ROC_USE_FGS_KERNARG=0`` environment variables are set.

It is interesting to me these tips mention environment variables that are not discussed on this environment variables page. Where do I find out about ROC_ACTIVE_WAIT_TIMEOUT and ROC_USE_FGS_KERNARG?
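
For readers trying the tip quoted above, a small sketch of what "setting the variables" amounts to on a command line: the two assignments are prefixed to the launch command. The variable names and values come from the tip itself; the helper ``prefixed_command`` and the executable ``./my_gromacs_run`` are placeholders, not real GROMACS or ROCm names.

```python
import shlex

# Values taken from the tip in the diff; what they do is not described on this page.
GROMACS_TIP_ENV = {
    "ROC_ACTIVE_WAIT_TIMEOUT": "0",
    "ROC_USE_FGS_KERNARG": "0",
}


def prefixed_command(env, argv):
    """Render 'VAR=value ... command args' as a single shell-safe string."""
    assignments = [f"{name}={shlex.quote(value)}" for name, value in env.items()]
    return " ".join(assignments + [shlex.quote(arg) for arg in argv])


# './my_gromacs_run' is a placeholder for whatever binary or script is launched.
line = prefixed_command(GROMACS_TIP_ENV, ["./my_gromacs_run"])
```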

| 0x2000: Show raw bytes of AQL packet.
| 0x4000: Show code creation debug.
| 0x8000: More detailed command info, including barrier commands.
| 0x10000: Log message location.
Contributor

There are more levels here. Please refer to clr/rocclr/device/util/debug.hpp
LOG_MEM = 131072, //!< (0x20000) Memory allocation
LOG_MEM_POOL = 262144, //!< (0x40000) Memory pool allocation, including memory in graphs
LOG_TS = 524288, //!< (0x80000) Timestamp details

Contributor Author

Fixed.
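
Since the mask values listed in this thread are single bits, several of them can be combined with a bitwise OR. The sketch below shows that arithmetic using only the values quoted above. The names ``LOG_MEM``, ``LOG_MEM_POOL`` and ``LOG_TS`` come from the review comment; the lowercase keys are descriptive labels invented for this example, and both helper functions are hypothetical.

```python
# Bit values as quoted in this thread. Lowercase keys are illustrative labels only.
LOG_BITS = {
    "aql_raw_bytes": 0x2000,       # raw bytes of AQL packet
    "code_creation": 0x4000,       # code creation debug
    "detailed_commands": 0x8000,   # more detailed command info, incl. barrier commands
    "message_location": 0x10000,   # log message location
    "LOG_MEM": 0x20000,            # memory allocation
    "LOG_MEM_POOL": 0x40000,       # memory pool allocation, including memory in graphs
    "LOG_TS": 0x80000,             # timestamp details
}


def combine_mask(*names):
    """OR together the bits for the given labels."""
    mask = 0
    for name in names:
        mask |= LOG_BITS[name]
    return mask


def decode_mask(mask):
    """List the labels whose bit is set in mask."""
    return [name for name, bit in LOG_BITS.items() if mask & bit]


mask = combine_mask("LOG_MEM", "LOG_TS")
mask_hex = hex(mask)  # a string form that could be assigned to AMD_LOG_MASK
```

Decoding the same value with ``decode_mask`` is a quick way to check which categories a given mask enables.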

Fix typo

Add tipps section.


Minor fixes.
Co-authored-by: Leo Paoletti <[email protected]>

Update docs/reference/env_variables.rst

Co-authored-by: Leo Paoletti <[email protected]>

PR feedbacks

PR feedbacks
Fix typo

Fix minor problems

Remove white space characters from the end of lines

Fix minor problems

Fix units

Minor fix
Fixed AMD_LOG_MASK

4 participants