Releases · ROCm/aomp

These are the release notes for AOMP 18.0-0. AOMP uses AMD developer modifications to the upstream LLVM development trunk. These differences are managed in a branch called "amd-stg-open", found in a mirror of upstream LLVM at https://github.com/RadeonOpenCompute/llvm-project. The amd-stg-open branch changes constantly as AMD merges the upstream development trunk with its internal open development efforts. The AMD modifications are experimental and/or contributions under review for the upstream trunk. AOMP uses a snapshot of amd-stg-open at the commit ids and dates listed below. AOMP also includes builds of related ROCm components. We call AOMP a "standalone" build because it does not use or require ROCm, with the exception of the kernel module (dkms) and libdrm, which are often part of the Linux distribution. AOMP is isolated from any ROCm installation by installing into /usr/lib/aomp and by using RPATH for runtime libraries.

For AOMP 18.0-0, the last trunk commit is c3b979e6512b00a5bd9c3e0d4ed986cf500630 on Sept 8, 2023. The last amd-only commit is def7057717b5098f6a9f773fc6e7b2a7f59cdd50 on Sept 11, 2023. These commits form a frozen branch now called "aomp-18.0-0". See https://github.com/RadeonOpenCompute/llvm-project/tree/aomp-18.0-0.

The integrated ROCm components for this AOMP release were built with ROCm 5.6.1 sources.
This is the 1st AOMP release based on LLVM 18 development.

The changes from 17.0-3 to 18.0-0 include:

New driver default (opaque offload linker)
- This driver uses clang-offload-packager to create and extract heterogeneous objects.
- For amdgpu, the final link phase steps through a series of commands instead of making a single call to clang-linker-wrapper. clang-linker-wrapper obscures the process of linking and embedding offload and host objects. To use clang-linker-wrapper, use command line option --no-opaque-offload-linker.
- Fix support for multi-arch.
- Optimizations to remove initial hostexec malloc.
Zero copy support for MI300A.
Fixed data_share2 smoke test regression.
Fix new DeviceRTL schedule clause intermittent fail.
Support HIP bundles.
Upstream convergence (3490 lines removed)
- Remove old plugin code.
- Remove the hostRPC code.
- DeviceRTL cleanup - Synchronized threads
Set default OpenMP to 5.1.
Restore safe buffer usage warnings for MIOpen GTest.
Fix build to use LLVM-project mono-repo components, ROCm devicelibs and comgr.

Errata:

Smoke tests flang-272343-3 and flang-299043 get seg faults; both have PARALLEL DO with ENTER MAP and EXIT MAP.
fprintf fails intermittently (~15%) when writing to an open file descriptor; there are no problems with fprintf to stderr.
The non-default option --no-opaque-offload-linker often fails because of problems with clang-linker-wrapper.

ROCm release v5.6.1
29 Aug 20:06 · rocm-ci · tag rocm-5.6.1 · commit 73bc125

AOMP Release 17.0-3
17 Jul 23:39 · estewart08 · tag rel_17.0-3 · commit 4b021ca

These are the release notes for AOMP 17.0-3. AOMP uses AMD developer modifications to the upstream LLVM development trunk. These differences are managed in a branch called "amd-stg-open", found in a mirror of upstream LLVM at https://github.com/RadeonOpenCompute/llvm-project. The amd-stg-open branch changes constantly as AMD merges the upstream development trunk with its internal open development efforts. The AMD modifications are experimental and/or contributions under review for the upstream trunk. AOMP uses a snapshot of amd-stg-open at the commit ids and dates listed below. AOMP also includes builds of related ROCm components. We call AOMP a "standalone" build because it does not use or require ROCm, with the exception of the kernel module (dkms) and libdrm, which are often part of the Linux distribution. AOMP is isolated from any ROCm installation by installing into /usr/lib/aomp and by using RPATH for runtime libraries.

For AOMP 17.0-3, the last trunk commit is ec6b40ab9b577e6e9bf000ccd19d85a9753b6ca8 on July 13, 2023. The last amd-only commit is f959ea5d8d1e5aef4b6d06727a9698316d3d33cd on July 14, 2023. These commits form a frozen branch now called "aomp-17.0-3". See https://github.com/RadeonOpenCompute/llvm-project/tree/aomp-17.0-3.

The integrated ROCm components for this AOMP release were built with ROCm 5.6.0 sources.
This is the 4th AOMP release based on LLVM 17 development.
The changes from 17.0-2 to 17.0-3 include:

Non-compiler components are built with ROCm 5.6.0 sources
Support code object version 5. The libomptarget device library is now generated for both code object version 4 and code object version 5.
flang is no longer a symbolic link to clang. A new driver binary called flang-legacy now provides the driver support for flang, because the clang driver support for flang is going away. flang-legacy uses a frozen set of driver support from ROCm 5.6, now found in the flang repository.
Enabled Big Jump Loop by default.
Improved target teams loop transform.
Removed the link from flang to clang and replaced it with flang-legacy.
Implemented dynamic LDS accesses from non-kernel functions.
Performance improvements for small kernels via lazy HSA queue creation and tracking of busy queues.
Restored GPU_MAX_HW_QUEUES in AMDGPU nextgen plugin.
Extended environment variable ompx_apu_maps to MI200.
Added --archive to the clang-offload-packager which repackages the extracted files into a new static library. This allows a fat binary static library to become a static library for a single architecture.
Disabled PIE in LLVM until build issues on CentOS and SLES are resolved.

Errata:

A bug in the HIP 5.6.0 sources causes programs to crash when using code object v5 and -O0.
flang compilations require -fPIC (need fix in flang-legacy for 17.0-4)
Smoke test failures:
- fprintf (non-deterministic)
- complex_reduction (non-deterministic)
- schedule (non-deterministic)
- flang-274983
- flang-274983-2
- xteamr

ROCm release v5.6.0
28 Jun 23:07 · rocm-ci · tag rocm-5.6.0 · commit 73bc125

ROCm release v5.5.1
24 May 17:25 · rocm-ci · tag rocm-5.5.1 · commit f7d7686

ROCm release v5.5.0
01 May 19:50 · rocm-ci · tag rocm-5.5.0 · commit f7d7686

AOMP Release 17.0-2
28 Apr 22:15 · estewart08 · tag rel_17.0-2 · commit 85ad992

These are the release notes for AOMP 17.0-2. AOMP uses AMD developer modifications to the upstream LLVM development trunk. These differences are managed in a branch called the "amd-stg-open". This branch is found in a mirror of upstream LLVM found at https://github.com/RadeonOpenCompute/llvm-project. The amd-stg-open branch is constantly changing as AMD merges upstream development trunk with its internal open development efforts. The AMD modifications are experimental and/or contributions under review for the upstream trunk. AOMP uses a snapshot of amd-stg-open at the commit ids and dates listed below. AOMP also includes builds of related ROCm components. We call AOMP a "standalone" build as it does not use or require ROCm with the exception of the kernel module (dkms) and libdrm which are often part of the Linux distribution. AOMP is isolated from any ROCm installations by installing into /usr/lib/aomp and its use of RPATH on runtime libraries.

For AOMP 17.0-2, the last trunk commit is 921b45a855f09afe99ea9c0c173794ee4ccd5872 on April 27, 2023. The last amd-only commit is ad7b5d7a69c62dab21332cba131054d2b8a713cc on April 26, 2023. These commits form a frozen branch now called "aomp-17.0-2". See https://github.com/RadeonOpenCompute/llvm-project/tree/aomp-17.0-2.

The integrated ROCm components for this AOMP release were built with ROCm 5.4.4 sources.
This is the 3rd AOMP release based on LLVM 17 development.
The changes from 17.0-1 to 17.0-2 include:

Changed gpurun to set both GPU_MAX_HW_QUEUES and LIBOMPTARGET_AMDGPU_NUM_HSA_QUEUES to 1 if the GPU is shared by multiple MPI ranks. Each variable is set to 1 only if the caller has not already set it.
Added environment variables LIBOMPTARGET_AMDGPU_KERNEL_BUSYWAIT and LIBOMPTARGET_AMDGPU_DATA_BUSYWAIT to control how long to wait in an active state for kernel completion and data transfer completion, respectively. The default is 0, which means to wait indefinitely in a blocked state. If a value is set and the specified timeout expires, the runtime switches to waiting for the signal in a blocked state.
Changed run_babelstream.sh to set LIBOMPTARGET_AMDGPU_KERNEL_BUSYWAIT and LIBOMPTARGET_AMDGPU_DATA_BUSYWAIT to improve performance.
Fixed the amdgpu nextgen plugin to work for cov5 (code object version 5). The default code object version is cov4.
Fixed the amdgpu nextgen plugin to work with OMPT (OpenMP Tools environment).
Fixed the amdgpu nextgen plugin to work for multiple architectures supported in the same image. Additional patches are needed to support the device clause on a target region so that it offloads to the correct GPU when using different architectures from the same vendor.

AOMP Release 17.0-1
14 Apr 13:57 · estewart08 · tag rel_17.0-1 · commit 1da915a

These are the release notes for AOMP 17.0-1. AOMP uses AMD developer modifications to the upstream LLVM development trunk. These differences are managed in a branch called the "amd-stg-open". This branch is found in a mirror of upstream LLVM found at https://github.com/RadeonOpenCompute/llvm-project. The amd-stg-open branch is constantly changing as AMD merges upstream development trunk with its internal open development efforts. The AMD modifications are experimental and/or contributions under review for the upstream trunk. AOMP uses a snapshot of amd-stg-open at the commit ids and dates listed below. AOMP also includes builds of related ROCm components. We call AOMP a "standalone" build as it does not use or require ROCm with the exception of the kernel module (dkms) and libdrm which are often part of the Linux distribution. AOMP is isolated from any ROCm installations by installing into /usr/lib/aomp and its use of RPATH on runtime libraries.

For AOMP 17.0-1, the last trunk commit is 3712dd73a1d50b76624ee6a520be2b1ca94c02ee on April 11, 2023. The last amd-only commit is 1d8def5772d16c64652d68daac1b12af99fe3770 on April 12, 2023. These commits form a frozen branch now called "aomp-17.0-1". See https://github.com/RadeonOpenCompute/llvm-project/tree/aomp-17.0-1.

The integrated ROCm components for this AOMP release were built with ROCm 5.4.4 sources.
This is the 2nd AOMP release based on LLVM 17 development.
The changes from 17.0-0 to 17.0-1 include:

Switch to the nextgen plugin as the default. This has shown significant performance improvements. To revert to the old plugin, set LIBOMPTARGET_NEXTGEN_PLUGINS=OFF.
Switch from hostrpc to hostexec. hostexec is a significant rewrite of hostrpc. The device hostexec_invoke is now written in OpenMP for portability to other platforms. The wrapper (stub) functions that run a host function have been renamed to hostexec() and hostexec_<ReturnType>(). hostexec also uses a global variable to find the transfer payload buffer instead of AMD implicit kernel args. This will support portability of hostexec, printf, and fprintf to other platforms. The update to this device global is made with global variable services in the nextgen plugin.
An example of using hostexec to run MPI_Send and MPI_Recv in a target region is given. This example demonstrates how library owners can build a supplemental header file to enable transparent host execution of selected library functions within OpenMP target regions with the same host interface. This eliminates the need for any source changes in the user code when host execution from a target region is desired. Before hostexec, users would typically have to end their target region, execute a host-only function, then start another target region. This feature significantly increases the general-purpose computing capabilities of OpenMP on GPGPU platforms.
OMPT target support is incomplete with the nextgen plugin. To use OMPT, set the environment variable LIBOMPTARGET_NEXTGEN_PLUGINS=OFF.
Set GPU_MAX_HW_QUEUES in gpurun to 1 when there are multiple ranks per GPU. This limits GPU concurrency when the GPU is already shared. It should only be set if the caller (of gpurun or mpirun) did not already set it; in other words, gpurun should trust a value set by the user. This will be fixed in the next release. Also, the OpenMP nextgen plugin does not use GPU_MAX_HW_QUEUES; it uses the environment variable LIBOMPTARGET_AMDGPU_NUM_HSA_QUEUES.
Critical regions created via the critical directive are now more efficient: by relaxing the semantics of locks and combining that with the use of acquire and release fences we can limit the flushing of the GPU caches to every time the lock is acquired instead of at every lock check.
When inlining functions called from the kernel, move the allocas for their arguments into the kernel entry block instead of leaving them at the launch point.
Respect environment variable to force synchronous target region executions. Available via OMPX_FORCE_SYNC_REGIONS=1.

Errata:

Smoke test "schedule" occasionally fails with a memory fault or wrong ordering.
AMD code object version 5 does not work with the nextgen plugin. When testing cov5, use LIBOMPTARGET_NEXTGEN_PLUGINS=OFF.
