
[Tracking] ROCm packages #197885

Open · 25 of 34 tasks
Madouura opened this issue Oct 26, 2022 · 66 comments

Labels: 5. scope: tracking (long-lived issue tracking long-term fixes or multiple sub-problems), 6.topic: hardware, 6.topic: rocm

Comments

@Madouura (Contributor) commented Oct 26, 2022

Tracking issue for ROCm derivations.

moar packages

Key

  • Package
    • Dependencies

WIP

Ready

TODO

Merged

ROCm-related

Notes

  • Update command: nix-shell maintainers/scripts/update.nix --argstr commit true --argstr keep-going true --arg predicate '(path: pkg: builtins.elem (pkg.pname or null) [ "rocm-llvm-llvm" "rocm-core" "rocm-cmake" "rocm-thunk" "rocm-smi" "rocm-device-libs" "rocm-runtime" "rocm-comgr" "rocminfo" "clang-ocl" "rdc" "rocm-docs-core" "hip-common" "hipcc" "clr" "hipify" "rocprofiler" "roctracer" "rocgdb" "rocdbgapi" "rocr-debug-agent" "rocprim" "rocsparse" "rocthrust" "rocrand" "rocfft" "rccl" "hipcub" "hipsparse" "hipfort" "hipfft" "tensile" "rocblas" "rocsolver" "rocwmma" "rocalution" "rocmlir" "hipsolver" "hipblas" "miopengemm" "composable_kernel" "half" "miopen" "migraphx" "rpp-hip" "mivisionx-hip" "hsa-amd-aqlprofile-bin" ])'

Won't implement

  • ROCmValidationSuite
    • Too many assumptions; not going to rewrite half the CMake files
  • rocm_bandwidth_test
    • Not really needed; will implement on request
  • atmi
    • Out of date
  • aomp
    • We basically already do this
  • Implement strictDeps for all derivations
    • Seems pointless for now, and I don't see many other derivations doing this
@Madouura (Contributor, Author) commented Oct 30, 2022

Updating to 5.3.1, marking all WIP until pushed to their respective PRs and verified.

@Madouura (Contributor, Author) commented Oct 30, 2022

If anyone is interested in helping me debug rocBLAS, here's the current derivation.
(Edit: already fixed.)

@Flakebi (Member) commented Oct 31, 2022

Hi, thanks a lot for your work on ROCm packages!

So far, the updates were all aggregated in a single rocm: 5.a.b -> 5.x.y PR. I think that makes more sense than splitting the package updates into individual PRs, for a couple of reasons:

  • Often, packages have backward- (and forward-) incompatible changes, e.g. the 5.3.0 version of rocm-runtime only works with 5.3.0 of rocm-comgr, but not with 5.2.0 or 5.4.0 (made-up example).
  • Nobody tests a mixture of versions, i.e. only the set with all packages at the same version is known to work.
  • If I want to test hip, OpenCL and other things for an update, it's easier to do it once (and compile everything a single time) rather than 10 times.

tl;dr: do you mind merging all your 5.3.1 updates into a single PR?

PS: Not sure how you did the update; I usually do it with (fish syntax):

  for f in rocm-smi rocm-cmake rocm-thunk rocm-runtime rocm-opencl-runtime rocm-device-libs rocm-comgr rocclr rocminfo llvmPackages_rocm.llvm hip; nix-shell maintainers/scripts/update.nix --argstr commit true --argstr package $f; end

@Madouura (Contributor, Author)

I was actually afraid of the opposite being true so I split them up.
Got it, I'll aggregate them.
Thanks for the tip on the update script, that would have saved me a lot of time.

@Madouura (Contributor, Author) commented Oct 31, 2022

I think hip should stay separate though, since there are other changes.
Actually, never mind; it's just an extra dependency, so it should be fine to split it.

@Madouura (Contributor, Author)

Took me a bit to compile torchWithRocm. I need to fix openai-triton to be free again.
Anyway, the test worked just fine for me.
nix-shell -I nixpkgs=/home/mado/Documents/Development/nixpkgs -p python3Packages.torchWithRocm

python test.py
tensor([[[[ 0.1783, -0.3823, -0.0870],
          [ 0.1783, -0.3823, -0.0870],
          [ 0.1783, -0.3823, -0.0870]]]], grad_fn=<ConvolutionBackward0>)
tensor([[[[ 0.1783, -0.3823, -0.0870],
          [ 0.1783, -0.3823, -0.0870],
          [ 0.1783, -0.3823, -0.0870]]]], device='cuda:0',
       grad_fn=<ConvolutionBackward0>)
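(The contents of test.py aren't shown in the thread. Below is a minimal sketch of the kind of smoke test that would produce output like the above, wrapped as a Nix derivation; the script body, the torch-rocm-smoke-test name, and the Conv2d shapes are assumptions, not the actual script.)

  { pkgs ? import <nixpkgs> { } }:

  pkgs.writers.writePython3Bin "torch-rocm-smoke-test"
    { libraries = [ pkgs.python3Packages.torchWithRocm ]; }
    ''
      import torch

      # Run the same convolution on the CPU and on the GPU; ROCm builds
      # of torch expose the HIP device under the "cuda" name, which is
      # why the second tensor above prints with device='cuda:0'.
      conv = torch.nn.Conv2d(1, 1, 3, padding=1)
      x = torch.ones(1, 1, 3, 3)
      print(conv(x))
      print(conv.cuda()(x.cuda()))
    ''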

@kurnevsky (Member)

OK, thanks. I assume you're using a different GPU? Maybe it's a problem specifically with the 7900 XTX...

@Madouura (Contributor, Author) commented Oct 23, 2023

It's possible your GPU may not be fully supported yet.
I believe your GPU is GFX11? I wonder if that's why.

@kurnevsky (Member) commented Oct 23, 2023

I believe your GPU is GFX11?

Yes.

@Madouura (Contributor, Author) commented Oct 28, 2023

New tensorflow-rocm WIP at Madouura@344aa78.
The current blocker is an LLVM mismatch.
Most likely, tensorflow 2.13.0 isn't nearly up to date enough for rocm 5.7.1.

@Madouura (Contributor, Author) commented Oct 30, 2023

@Flakebi I have some basic impureTests stuff at https://github.com/Madouura/nixpkgs/blob/pr/rocm/pkgs/development/rocm-modules/5/rocm-thunk/generic.nix as well as some other stuff.
Please tell me if you think this is the best way forward.

@Flakebi (Member) commented Oct 31, 2023

Nice!
I think we shouldn’t add anything to <package>.tests that is not also runnable as a (pure) nix test because these get parsed by scripts and bots.
Why not set the testScript = "${rocmPackages_5.rocm-smi-variants.shared}/bin/rocm-smi"?
That would make it easier to build most tests :)

I think the rocminfo test can check from the output that it actually detected something (like rocminfo | grep -E 'Device Type: +GPU' and rocm_agent_enumerator | grep -E 'gfx[^0]'). That makes sure we don't ship something that's unable to find GPUs.
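For what it's worth, here is a minimal sketch of that check expressed as a derivation. Beyond the two grep pipelines quoted above, everything is an assumption: the test name, and that the rocminfo package also ships rocm_agent_enumerator. Since GPU access is impure, a check like this could only run with sandboxing relaxed (or through the impureTests mechanism mentioned earlier):

  { runCommand, rocminfo }:

  # Fails the build if rocminfo reports no GPU device, or if
  # rocm_agent_enumerator finds nothing beyond the gfx000 (CPU) agent.
  runCommand "rocminfo-detects-gpu" { nativeBuildInputs = [ rocminfo ]; } ''
    rocminfo | grep -E 'Device Type: +GPU'
    rocm_agent_enumerator | grep -E 'gfx[^0]'
    touch $out
  ''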

@Madouura (Contributor, Author)

I'm going to take a bit of a break from ROCm and work on another project.
I'll try to work on the major updates/upgrades here and there, but until early-mid next year the other project is going to be my focus.
If there are any major issues, or you just need something explained, don't hesitate to ping me.

@gjz010 commented Dec 15, 2023

Hi. Thanks for maintaining ROCm for Nix!

When I try to use torchWithRocm I get the following error:

MIOpen(HIP): Error [Compile] 'hiprtcCompileProgram(prog.get(), c_options.size(), c_options.data())' naive_conv.cpp: HIPRTC_ERROR_COMPILATION (6)
MIOpen(HIP): Error [BuildHip] HIPRTC status = HIPRTC_ERROR_COMPILATION (6), source file: naive_conv.cpp
MIOpen(HIP): Warning [BuildHip] hip runtime failed to load.
Error: Please provide architecture for which code is to be generated.
MIOpen Error: /build/source/src/hipoc/hipoc_program.cpp:304: Code object build failed. Source: naive_conv.cpp

Any idea what should be in the environment? I tried adding the recent meta.rocm-all but it didn't help.

Same problem here with the same GPU (7900 XTX). After running strace on your minimal example, I noticed:

openat(AT_FDCWD, "/nix/store/mkih90ygzxczv4k0fn6gapgi7i7wy292-rocm-llvm-libunwind-5.7.1/lib/libamdhip64.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
...
openat(AT_FDCWD, "./libamdhip64.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "./libamdhip64.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
write(2, "MIOpen(HIP): Error [Compile] 'hi"..., 145MIOpen(HIP): Error [Compile] 'hiprtcCompileProgram(prog.get(), c_options.size(), c_options.data())' naive_conv.cpp: HIPRTC_ERROR_COMPILATION (6)
) = 145

It appears that the important libamdhip64.so is not added to the runtime library path:

ls $(dirname $(nix-shell -p rocmPackages.meta.rocm-hip-runtime --run "which hipcc"))/../lib/libamdhip64.so
# /nix/store/09ic1qizx0aacml0vi83k9lgq23fz0wg-rocm-hip-runtime-meta/bin/../lib/libamdhip64.so

By setting the environment variable manually:

export LD_LIBRARY_PATH=/nix/store/bz15zrilgr04ghdiz4cd73sam5wvmhhw-clr-5.7.1/lib/

The problem is temporarily fixed and I can now run Stable Diffusion WebUI.
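(To avoid hard-coding a hash-specific store path in LD_LIBRARY_PATH, the same workaround can be expressed as a shell.nix. A small sketch, assuming the rocmPackages.clr and python3Packages.torchWithRocm attribute names; treat it as untested:)

  { pkgs ? import <nixpkgs> { } }:

  pkgs.mkShell {
    packages = [ pkgs.python3Packages.torchWithRocm ];
    # Put clr's lib directory (which contains libamdhip64.so) on the
    # runtime search path instead of exporting LD_LIBRARY_PATH by hand.
    LD_LIBRARY_PATH = pkgs.lib.makeLibraryPath [ pkgs.rocmPackages.clr ];
  }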

@Madouura (Contributor, Author) commented Dec 17, 2023

ROCm 6.0.0 has been released.
rocmPackages_5 is now in maintenance-mode.
I will eventually backport the changes I am making in rocmPackages_6 to rocmPackages_5; however, it is not a high priority.

@kurnevsky (Member) commented Dec 17, 2023

By setting environment variable manually

Interesting: now pytorch works for me, but it doesn't seem to work correctly. I'm trying to generate an image from SDXL + LoRA with diffusers, and it generates an incorrect image...

I tried identical code and model with manually defined seeds in Google Colab with CUDA; it works there. It also seems to work locally on CPU with f32 types.

(Or it might be some problem in one of the libs, since locally I use all Python libs from nix.)

@sersorrel (Contributor)

The export LD_LIBRARY_PATH=/nix/store/...-clr-5.7.1/lib solution fixed the same torchWithRocm problem for me, also with a 7900 XTX. I couldn't see how you got that path; it's returned by nix build --print-out-paths nixpkgs#rocmPackages.clr, right?

@ScatteredRay (Contributor)

Hey, giving this a try. Still very much WIP, but it's working so far for my current project.

@dwf (Contributor) commented Mar 21, 2024

@Madouura First, thanks for all your work on this front.

You left a comment to the effect that rocBLASLt is "Very broken with Tensile at the moment, only supports GFX9". It looks like other platforms might be supported now, but I wondered if you might be able to elaborate on the "very broken with Tensile" part. I notice that they ship a vendored "TensileLite"; was that what you were trying to use?

Any pointers you have on how I might manage to build this would be useful. I'm currently eyeing the rocBLAS derivation as a potentially good starting point.

Edit: no longer a priority for me

@yshui (Contributor) commented Apr 1, 2024

pytorch now fails to build after the 5 -> 6 transition, because it depends on miopengemm, which was removed.

@SomeoneSerge (Contributor)

I edited the description to add an entry for rocblaslt. It's apparently a dependency of zluda.

@errnoh mentioned this issue Apr 10, 2024
@samueldr added the 5. scope: tracking label Apr 23, 2024
@jalil-salame (Contributor)

Apparently pytorch now requires hipBLASLt:

python3.11-torch> CMake Error at cmake/public/LoadHIP.cmake:37 (find_package):
python3.11-torch>   By not providing "Findhipblaslt.cmake" in CMAKE_MODULE_PATH this project
python3.11-torch>   has asked CMake to find a package configuration file provided by
python3.11-torch>   "hipblaslt", but CMake did not find one.
python3.11-torch>   Could not find a package configuration file provided by "hipblaslt" with
python3.11-torch>   any of the following names:
python3.11-torch>     hipblasltConfig.cmake
python3.11-torch>     hipblaslt-config.cmake
python3.11-torch>   Add the installation prefix of "hipblaslt" to CMAKE_PREFIX_PATH or set
python3.11-torch>   "hipblaslt_DIR" to a directory containing one of the above files.  If
python3.11-torch>   "hipblaslt" provides a separate development package or SDK, be sure it has
python3.11-torch>   been installed.
python3.11-torch> Call Stack (most recent call first):
python3.11-torch>   cmake/public/LoadHIP.cmake:160 (find_package_and_print_version)
python3.11-torch>   cmake/Dependencies.cmake:1258 (include)
python3.11-torch>   CMakeLists.txt:754 (include)
python3.11-torch>
python3.11-torch> -- Configuring incomplete, errors occurred!

@ony (Contributor) commented Jun 16, 2024

As per pytorch/pytorch#119081 (comment), in 2.4.0+ (a future release) it should be possible to use something like this (as a nixpkgs overlay):

  final: prev: {
    pythonPackagesExtensions = prev.pythonPackagesExtensions ++ [
      (python-final: python-prev: {
        torch = python-prev.torch.overrideDerivation (oldAttrs: {
          TORCH_BLAS_PREFER_HIPBLASLT = 0;  # not yet in nixpkgs
        });
      })
    ];
  }

@AngryLoki

@ony, TORCH_BLAS_PREFER_HIPBLASLT is a runtime environment variable; pytorch still links against and requires hipblaslt even when it is unused. pytorch/pytorch#120551 should help, but I have no idea whether and when it will be accepted.

By the way, hipblaslt is not difficult to build. Just don't build the 6.0 release; skip directly to 6.1. When I tried, the bundled TensileLite in 6.0 generated a wall of unreadable errors, while 6.1 worked on the first attempt.
