Support for Renoir Zen 2 CPUs #36826

Closed
jebej opened this issue Jul 27, 2020 · 15 comments · Fixed by #36831

Comments

@jebej
Contributor

jebej commented Jul 27, 2020

On a laptop with a Ryzen 7 4700U CPU running Windows 10, this is what is being reported (also reported on Discourse under WSL2):

julia> versioninfo()
Julia Version 1.5.0-rc2.0
Commit 7f0ee122d7 (2020-07-27 15:24 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: AMD Ryzen 7 4700U with Radeon Graphics
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-9.0.1 (ORCJIT, znver1)

julia> LinearAlgebra.versioninfo()
BLAS: libopenblas (OpenBLAS 0.3.9  USE64BITINT DYNAMIC_ARCH NO_AFFINITY Prescott MAX_THREADS=32)
LAPACK: libopenblas64_

As far as I can tell, LLVM 9 should support znver2, and OpenBLAS should as well.

@yuyichao
Contributor

#36502

@jebej
Contributor Author

jebej commented Jul 27, 2020

This patch seems to have been included in rc2, which was the version I tried...

@yuyichao
Contributor

No it's not. Only the bug fix part of it was.

@jebej
Contributor Author

jebej commented Jul 27, 2020

I see, so support will only come in 1.6 then?

@yuyichao
Contributor

It should be in 1.6.

@jebej
Contributor Author

jebej commented Jul 27, 2020

I tried a nightly and got this:

julia> versioninfo()
Julia Version 1.6.0-DEV.548
Commit 9267bbf1fc (2020-07-27 16:57 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: AMD Ryzen 7 4700U with Radeon Graphics
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-9.0.1 (ORCJIT, btver1)

julia> LinearAlgebra.versioninfo()
BLAS: libopenblas (OpenBLAS 0.3.9  USE64BITINT DYNAMIC_ARCH NO_AFFINITY Prescott MAX_THREADS=32)
LAPACK: libopenblas64_

@yuyichao
Contributor

AFAICT our detection code is identical to LLVM's. Previously you saw a "better" result because the detection code misidentified the CPU as Zen 1 and bypassed LLVM's detection. If you can find out your CPUID, it'll be pretty easy to add it to the list of znver2 ones.

Also, for the OpenBLAS issue, you should report to https://github.com/xianyi/OpenBLAS instead. AFAICT we are using a version with AMD CPU detection identical to the latest master there.

@yuyichao yuyichao reopened this Jul 27, 2020
@jebej
Contributor Author

jebej commented Jul 28, 2020

CPUID:

CPUID signature: 860F01
Family: 23 (017h)
Model: 96 (060h)
Stepping: 1 (01h)

Full info uploaded to cpu-world: https://www.cpu-world.com/cgi-bin/CPUID.pl?CPUID=71632
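
For readers checking the numbers, here is a minimal sketch of how the signature above splits into family, model and stepping. It assumes the usual CPUID leaf 1 EAX layout (stepping in bits 3:0, base model in 7:4, base family in 11:8, extended model in 19:16, extended family in 27:20); the function name `decode_cpuid_signature` is made up for illustration.

```python
def decode_cpuid_signature(eax: int) -> dict:
    """Split a CPUID leaf-1 EAX value into family, model and stepping."""
    stepping = eax & 0xF
    base_model = (eax >> 4) & 0xF
    base_family = (eax >> 8) & 0xF
    ext_model = (eax >> 16) & 0xF
    ext_family = (eax >> 20) & 0xFF

    # The extended family is only added when the base family is 0xF.
    family = base_family + ext_family if base_family == 0xF else base_family

    # The extended model supplies the high nibble when the base family
    # is 0x6 or 0xF; otherwise only the base model is used.
    if base_family in (0x6, 0xF):
        model = (ext_model << 4) | base_model
    else:
        model = base_model

    return {"family": family, "model": model, "stepping": stepping}


# Signature reported above for the Ryzen 7 4700U.
print(decode_cpuid_signature(0x860F01))
# prints {'family': 23, 'model': 96, 'stepping': 1}
```

This matches the reported Family 23 (017h), Model 96 (060h), Stepping 1.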

@jebej
Contributor Author

jebej commented Jul 28, 2020

Regarding OpenBLAS, that might not necessarily be their fault. On Discourse, the OP installed the package from the AUR and the architecture was detected properly:

Update: I reinstalled and reconfigured the libopenblas from AUR, now the system Julia has:

julia> LinearAlgebra.versioninfo()
BLAS: libopenblas (OpenBLAS 0.3.10 NO_AFFINITY USE_OPENMP ZEN MAX_THREADS=12)
LAPACK: liblapack

EDIT: never mind that, see OpenMathLib/OpenBLAS#2738

yuyichao added a commit that referenced this issue Jul 28, 2020
* Missing feature from Apple A13
* Enable Cortex-A78 and Cortex-X1 on LLVM 11

    llvm/llvm-project@954db63
    https://reviews.llvm.org/D83206

* More relaxed Zen detection: treat all family 23 as Zen* and treat all model >= 0x30 as Zen2.

    GCC uses a similar fallback structure albeit based on feature.
    This should still generate **correct** code since that is always controlled by
    available features. It should be as good a scheduling model estimate as anything else.

    Fix #36826
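
The relaxed rule in the commit message can be sketched roughly as follows. This is only an illustration of the rule as worded there (family 23 is treated as Zen, model >= 0x30 as Zen 2), not the actual detection code; the function name `guess_amd_zen_target` and the `None` return for other families are assumptions.

```python
from typing import Optional


def guess_amd_zen_target(family: int, model: int) -> Optional[str]:
    """Hypothetical mirror of the relaxed rule from the commit message."""
    if family != 23:
        # Other families are outside the scope of this sketch.
        return None
    # Any family-23 part counts as Zen; models from 0x30 up count as Zen 2.
    return "znver2" if model >= 0x30 else "znver1"


# Renoir from this issue: family 23, model 0x60.
print(guess_amd_zen_target(23, 0x60))  # prints znver2
```

As the commit message notes, this choice only selects a scheduling model; the instructions that get generated are still governed by the detected features.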
yuyichao added a commit that referenced this issue Jul 30, 2020
@jebej
Contributor Author

jebej commented Jul 30, 2020

Thanks! Do you think the fix could be backported to 1.5?

@yuyichao
Contributor

#36831 is for master only so it won't be backported unless is. And you aren't missing out on much in 1.5: the CPU is identified as Zen 1, which affects the scheduling model a little bit, but none of the feature detection is affected.

KristofferC pushed a commit that referenced this issue Aug 4, 2020

(cherry picked from commit cd3fb4d)
@stillyslalom
Contributor

I have a Renoir processor that's correctly detected as znver2 by versioninfo(), but it's still described as Prescott in LinearAlgebra.versioninfo() on latest nightly, which leads to terrible BLAS performance:

               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.6.0-DEV.580 (2020-08-04)
 _/ |\__'_|_|_|\__'_|  |  Commit 8a6656016b (0 days old master)
|__/

julia> versioninfo()
Julia Version 1.6.0-DEV.580
Commit 8a6656016b (2020-08-04 16:02 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: AMD Ryzen 9 4900HS with Radeon Graphics
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-9.0.1 (ORCJIT, znver2)
Environment:
  JULIA_NUM_THREADS = 8

julia> using LinearAlgebra

julia> LinearAlgebra.versioninfo()
BLAS: libopenblas (OpenBLAS 0.3.9  USE64BITINT DYNAMIC_ARCH NO_AFFINITY Prescott MAX_THREADS=32)

julia> BLAS.set_num_threads(1)

julia> peakflops()/1e9
18.47128632768085

@yuyichao
Contributor

yuyichao commented Aug 4, 2020

Yes, that's an OpenBLAS issue. Ref OpenMathLib/OpenBLAS#2738. We need to carry a patch and/or bump the OpenBLAS version.

Also, OpenBLAS is in general way too conservative on dispatch. It appears never to dispatch based on features and only looks for an exact uarch match, without a gentle fallback...
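
To illustrate the difference between the two dispatch styles described here, a toy sketch follows. Everything in it is hypothetical: the `KNOWN_CORES` table, the feature names, and the choice of kernel sets are invented for illustration and are not taken from the OpenBLAS source.

```python
# Hypothetical table of (family, model) pairs that are recognised exactly.
KNOWN_CORES = {
    (23, 0x01): "ZEN",
    (23, 0x08): "ZEN",
}


def dispatch_exact(family, model, baseline="PRESCOTT"):
    """Exact-match dispatch: unknown models drop straight to the baseline."""
    return KNOWN_CORES.get((family, model), baseline)


def dispatch_with_fallback(family, model, features, baseline="PRESCOTT"):
    """Exact match first, then a feature-based guess, then the baseline."""
    exact = KNOWN_CORES.get((family, model))
    if exact is not None:
        return exact
    # Assumed rule: a CPU with AVX2 and FMA can run a Haswell-class kernel set.
    if {"avx2", "fma"} <= set(features):
        return "HASWELL"
    return baseline


# An unlisted model (such as 0x60 here) shows the difference.
print(dispatch_exact(23, 0x60))                           # prints PRESCOTT
print(dispatch_with_fallback(23, 0x60, ["avx2", "fma"]))  # prints HASWELL
```

With exact matching only, a new model that is missing from the table ends up on the slowest kernels even when it supports every feature the faster ones need.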

@jebej
Contributor Author

jebej commented Aug 4, 2020

In the meantime you can use an environment variable, as explained on Discourse: OPENBLAS_CORETYPE=ZEN.
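
One way to apply the override without changing the system-wide environment is to set it only for the Julia process. The sketch below assumes that `julia` is on the PATH and that the variable has to be present before OpenBLAS is loaded; the helper names `env_with_coretype` and `run_julia_with_coretype` are made up for illustration.

```python
import os
import subprocess


def env_with_coretype(coretype="ZEN", base_env=None):
    """Return a copy of the environment with OPENBLAS_CORETYPE set."""
    env = dict(os.environ if base_env is None else base_env)
    env["OPENBLAS_CORETYPE"] = coretype
    return env


def run_julia_with_coretype(args, coretype="ZEN"):
    """Start `julia` (assumed to be on PATH) with the override in place."""
    return subprocess.run(
        ["julia", *args],
        env=env_with_coretype(coretype),
        check=True,
    )


# Example (not executed here):
# run_julia_with_coretype(["-e", "using LinearAlgebra; LinearAlgebra.versioninfo()"])
```

If the override takes effect, the BLAS line printed by `LinearAlgebra.versioninfo()` should no longer report Prescott.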

@stillyslalom
Contributor

Can confirm, I'm seeing (roughly) expected speeds now:

julia> BLAS.set_num_threads(1)

julia> peakflops(4000)/1e9
61.15865161036225

julia> BLAS.set_num_threads(8)

julia> peakflops(4000)/1e9
225.9833945035955

julia> BLAS.set_num_threads(16)

julia> peakflops(4000)/1e9
239.07667095448187
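
For a rough sense of scale, the ratios between the figures reported in this thread can be worked out as below. This is plain arithmetic on the posted numbers; note that the earlier Prescott measurement used `peakflops()` with its default size while the later ones used `peakflops(4000)`, so the first ratio is only indicative.

```python
# GFLOPS figures copied from the comments in this thread.
prescott_1_thread = 18.47128632768085
zen_1_thread = 61.15865161036225
zen_8_threads = 225.9833945035955
zen_16_threads = 239.07667095448187


def ratio(new, old):
    """How many times faster `new` is than `old`."""
    return new / old


print(f"ZEN vs Prescott kernels, 1 thread: {ratio(zen_1_thread, prescott_1_thread):.2f}x")
print(f"8 threads vs 1 thread:             {ratio(zen_8_threads, zen_1_thread):.2f}x")
print(f"16 threads vs 1 thread:            {ratio(zen_16_threads, zen_1_thread):.2f}x")
```

The single-thread gain from the override comes out at a little over 3x, and going from 8 to 16 threads adds only a few percent.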

simeonschaub pushed a commit to simeonschaub/julia that referenced this issue Aug 11, 2020
KristofferC pushed a commit that referenced this issue Aug 19, 2020

(cherry picked from commit cd3fb4d)