Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CPUID] Add ISA entries for A64FX and M1 #44194

Merged
merged 5 commits into from
Feb 20, 2022

Conversation

giordano
Copy link
Contributor

On A64FX we get

julia> Base.BinaryPlatforms.CPUID.cpu_isa()
Base.BinaryPlatforms.CPUID.ISA(Set(UInt32[0x00000016, 0x00000004, 0x00000006, 0x00000007, 0x0000000c, 0x00000008]))

which corresponds to the following set

julia> @eval Base.BinaryPlatforms.CPUID JL_AArch64_lse, JL_AArch64_crc, JL_AArch64_rdm, JL_AArch64_aes, JL_AArch64_sha2, JL_AArch64_sve
(0x00000008, 0x00000007, 0x0000000c, 0x00000004, 0x00000006, 0x00000016)

which is literally the armv8.2-a+crypto set + JL_AArch64_sve.

I'd even go as far as removing armv8.4-a+crypto+sve, I don't think there is any existing CPU at the moment with all those capabilities, it isn't very useful there.

Also, what about backporting to v1.6 and v1.7?

@giordano giordano added the system:arm ARMv7 and AArch64 label Feb 16, 2022
@gbaraldi
Copy link
Member

gbaraldi commented Feb 16, 2022

Why add a separate ISA entry just for the a64fx? Is it needed somewhere upstream? Also since I think only the a64fx has that ISA combinations I guess you could add fullfp16 too. Besides that LGTM.

@staticfloat
Copy link
Sponsor Member

I'd even go as far as removing armv8.4-a+crypto+sve, I don't think there is any existing CPU at the moment with all those capabilities, it isn't very useful there.

Agreed; it's probably more useful to define something like armv8.5-a+fullfp16+fp16ml as an Apple M1 spec. Unfortunately, our CPU autodetection code doesn't work yet, so we'd need to improve that on the C side, in order for this to have an effect on the M1.

@KristofferC KristofferC added backport 1.6 Change should be backported to release-1.6 backport 1.7 backport 1.8 Change should be backported to release-1.8 labels Feb 16, 2022
@gbaraldi
Copy link
Member

The M1 Feature detection has been merged #41924, so that should be working right now, just needs the ISA then.

@giordano
Copy link
Contributor Author

giordano commented Feb 16, 2022

Why add a separate ISA entry just for the a64fx? Is it needed somewhere upstream?

This ia mainly used in BinaryBuilder to target specific microarchitectures. We definitely don't want to target all CPUs out there, but we need prototypes of somewhat relevant CPU families (like for the Intel chips above).

Also since I think only the a64fx has that ISA combinations I guess you could add fullfp16 too.

It doesn't have it?

For the record, with #41924, on the base M1 I get

julia> Base.BinaryPlatforms.CPUID.cpu_isa()
Base.BinaryPlatforms.CPUID.ISA(Set(UInt32[0x00000004, 0x00000006, 0x00000007, 0x00000014, 0x0000000c, 0x00000008, 0x00000017]))

which corresponds to the set

julia> @eval Base.BinaryPlatforms.CPUID JL_AArch64_aes, JL_AArch64_sha2, JL_AArch64_crc, JL_AArch64_dotprod, JL_AArch64_rdm, JL_AArch64_lse, JL_AArch64_fp16fml
(0x00000004, 0x00000006, 0x00000007, 0x00000014, 0x0000000c, 0x00000008, 0x00000017)

For reference, features enabled by Apple Clang on this CPU are

"target-features"="+aes,+crc,+crypto,+dotprod,+fp-armv8,+fp16fml,+fullfp16,+lse,+neon,+ras,+rcpc,+rdm,+sha2,+sha3,+sm4,+v8.5a,+zcm,+zcz"

I'll update the PR later

@gbaraldi
Copy link
Member

gbaraldi commented Feb 16, 2022

It doesn't have it?

constexpr auto fujitsu_a64fx = armv8_2a | get_feature_masks(sha2, fullfp16, sve, complxnum);

then this is wrong. Though I checked upstream llvm and I agreed with it.

@giordano
Copy link
Contributor Author

I mean, I showed above what we detect with cpu_isa() and fullfp16

JL_FEATURE_DEF(fullfp16, 9, 0) // HWCAP_FPHP
isn't there, so at very least we can't detect it.

Also, for reference these are the features enabled by the Fujitsu compiler on clang mode (based on LLVM 7):

"target-features"="+crc,+crypto,+fp-armv8,+lse,+neon,+ras,+rdm,+sve,+v8.2a"

fullfp16 isn't here either (while it is on the M1 for Apple Clang, see above)

@yuyichao
Copy link
Contributor

yuyichao commented Feb 16, 2022

@giordano
Copy link
Contributor Author

Alright:

julia> CPUID.test_cpu_feature(CPUID.JL_AArch64_fullfp16)
true

The ARM C/C++ compiler reference says that sve implies fullfp16 (at least for their compiler, but that probably means it's true in general).

I think we need to refine

all_features = last(last(get(ISAs_by_family, normalize_arch(String(Sys.ARCH)), "" => [ISA(Set{UInt32}())]))).features
which at the moment assumes the ISAs are listed in order of increasing capabilities, which is true for our x86_64 family, but not for aarch64, which is more varied.

@giordano
Copy link
Contributor Author

giordano commented Feb 16, 2022

For completeness, these are all the features we can detect on A64FX, among all those we know:

julia> using Base.BinaryPlatforms.CPUID

julia> aarch64_features = filter!(n -> startswith(String(n), "JL_AArch64"), (names(CPUID; all=true)));

julia> filter!(x -> last(x), [(feat, CPUID.test_cpu_feature(getfield(CPUID, feat))) for feat in aarch64_features])
11-element Vector{Tuple{Symbol, Bool}}:
 (:JL_AArch64_aes, 1)
 (:JL_AArch64_ccpp, 1)
 (:JL_AArch64_complxnum, 1)
 (:JL_AArch64_crc, 1)
 (:JL_AArch64_fullfp16, 1)
 (:JL_AArch64_lse, 1)
 (:JL_AArch64_rdm, 1)
 (:JL_AArch64_sha2, 1)
 (:JL_AArch64_sve, 1)
 (:JL_AArch64_v8_1a, 1)
 (:JL_AArch64_v8_2a, 1)

@giordano giordano force-pushed the mg/cpuid-a64fx-isa branch 2 times, most recently from 201fc6c to d485afb Compare February 16, 2022 20:34
base/cpuid.jl Outdated Show resolved Hide resolved
base/cpuid.jl Outdated
@eval function cpu_isa()
return ISA(Set{UInt32}(feat for feat in $(ALL_FEATURES[normalize_arch(String(Sys.ARCH))]) if test_cpu_feature(feat)))
end
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realised we can avoid always recomputing the list of features for the current architecture and just inline it at precompile time. On my laptop, before:

julia> @benchmark CPUID.cpu_isa()
BenchmarkTools.Trial: 10000 samples with 48 evaluations.
 Range (min … max):  895.604 ns … 90.301 μs  ┊ GC (min … max): 0.00% … 98.34%
 Time  (median):     984.552 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):     1.153 μs ±  2.866 μs  ┊ GC (mean ± σ):  9.56% ±  3.80%

   ▃▆█▆▃▁                                                       
  ▇██████▇▆▄▃▂▂▂▂▂▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▂
  896 ns          Histogram: frequency by time            2 μs <

 Memory estimate: 1.41 KiB, allocs estimate: 17.

after:

julia> @benchmark CPUID.cpu_isa()
BenchmarkTools.Trial: 10000 samples with 155 evaluations.
 Range (min … max):  679.871 ns …  18.916 μs  ┊ GC (min … max): 0.00% … 89.46%
 Time  (median):     745.200 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   849.687 ns ± 709.196 ns  ┊ GC (mean ± σ):  4.12% ±  4.95%

  ▁▆█▇▅▃▃▃▃▃▃▂▂▁▁▁                                           ▂▃ ▂
  █████████████████▇█▇▇▇▆▆▇▆▆▆▆▆▆▄▅▅▄▁▅▄▅▅▅▃▁▁▄▁▃▁▁▃▄▃▃▄▃▁▁▄▆██ █
  680 ns        Histogram: log(frequency) by time       1.97 μs <

 Memory estimate: 848 bytes, allocs estimate: 7.

Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! Less allocations, always a good thing!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the latest version:

julia> @benchmark Base.BinaryPlatforms.CPUID.cpu_isa()
BenchmarkTools.Trial: 10000 samples with 196 evaluations.
 Range (min … max):  480.342 ns …  12.980 μs  ┊ GC (min … max): 0.00% … 94.89%
 Time  (median):     527.505 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   598.004 ns ± 670.168 ns  ┊ GC (mean ± σ):  6.65% ±  5.69%

   ▄▆███▇▆▄▄▄▃▄▃▂▂▂▂▁▁▁▁▁▁▁▂▁▁▁▂▁                               ▂
  ▆██████████████████████████████▇▇▇▆█▇█▆▇▆▇▇▇▆▅▅▇▇▅▅▃▆▆▃▆▆▅▄▄▅ █
  480 ns        Histogram: log(frequency) by time          1 μs <

 Memory estimate: 848 bytes, allocs estimate: 7.

I believe it's a bit faster because the new version collects only the features we are interested in, instead of all of those for the given architecture, so we're just doing fewer iterations. The new version is also closer in spirit to what we're currently doing.

@giordano
Copy link
Contributor Author

@yuyichao do you know whether A64FX requires aes? The Fujitsu compiler uses for this chip +crypto, which should imply aes. However I'm on a cluster where

julia> CPUID.test_cpu_feature(CPUID.JL_AArch64_aes)
false

while on Fugaku I get

julia> CPUID.test_cpu_feature(CPUID.JL_AArch64_aes)
true

Also, https://github.com/llvm/llvm-project/blob/97c151de3de0266b896bb01e98b005fb31f6d3cd/llvm/lib/Target/AArch64/AArch64.td#L984-L986 lists only sha2. Fugaku has the chips with 2.0/2.2 GHz, while the other cluster has the 1.8 GHz, I was wondering if it's possible the 1.8 GHz chips don't have aes.

@giordano giordano changed the title [CPUID] Add ISA entry for A64FX [CPUID] Add ISA entries for A64FX and M1 Feb 17, 2022
@yuyichao
Copy link
Contributor

do you know whether A64FX requires aes? The Fujitsu compiler uses for this chip +crypto, which should imply aes. However I'm on a cluster where

As for the spec I only know as much as the llvm and gcc target feature set says....

An independent way to check the feature set would be LD_SHOW_AUXV=1 /bin/true, which should agree with /proc/cpuinfo.

Fugaku has the chips with 2.0/2.2 GHz, while the other cluster has the 1.8 GHz, I was wondering if it's possible the 1.8 GHz chips don't have aes.

I have no idea, but do the two have the same midr? Their values should be available under /sys/devices/system/cpu/cpu<n>/regs/identification/ (or /proc/cpuinfo). It's possible for different chip variance to have different feature set though I sure hope it's at least distinguishable from the cpuid...

@giordano
Copy link
Contributor Author

giordano commented Feb 19, 2022

Isambard:

$ cat /sys/devices/system/cpu/cpu1/regs/identification/midr_el1
0x00000000461f0010
$ LD_SHOW_AUXV=1 /bin/true
AT_SYSINFO_EHDR: 0xffffa28f0000
AT_??? (0x33): 0x1270
AT_HWCAP:        415fe7
AT_PAGESZ:       65536
AT_CLKTCK:       100
AT_PHDR:         0xaaaaaf000040
AT_PHENT:        56
AT_PHNUM:        9
AT_BASE:         0xffffa2900000
AT_FLAGS:        0x0
AT_ENTRY:        0xaaaaaf0016e0
AT_UID:          415400694
AT_EUID:         415400694
AT_GID:          415400694
AT_EGID:         415400694
AT_SECURE:       0
AT_RANDOM:       0xffffd9c9a258
AT_EXECFN:       /bin/true
AT_PLATFORM:     aarch64
$ head -n8 /proc/cpuinfo
processor       : 0
BogoMIPS        : 200.00
Features        : fp asimd evtstrm sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm fcma dcpop sve
CPU implementer : 0x46
CPU architecture: 8
CPU variant     : 0x1
CPU part        : 0x001
CPU revision    : 0
$ julia -E 'using Base.BinaryPlatforms.CPUID; CPUID.test_cpu_feature(CPUID.JL_AArch64_aes)'
false

So it looks like AES is indeed not here on this chip?

For comparison, on Fugaku:

$ cat /sys/devices/system/cpu/cpu1/regs/identification/midr_el1
0x00000000461f0010
$ LD_SHOW_AUXV=1 /bin/true
AT_SYSINFO_EHDR: 0x400000070000
AT_??? (0x33): 0x1270
AT_HWCAP:        415fff
AT_PAGESZ:       65536
AT_CLKTCK:       100
AT_PHDR:         0xaaaaaaaa0040
AT_PHENT:        56
AT_PHNUM:        9
AT_BASE:         0x400000000000
AT_FLAGS:        0x0
AT_ENTRY:        0xaaaaaaaa16e0
AT_UID:          14463
AT_EUID:         14463
AT_GID:          14026
AT_EGID:         14026
AT_SECURE:       0
AT_RANDOM:       0xffffffffe1f8
AT_EXECFN:       /bin/true
AT_PLATFORM:     aarch64
$ head -n8 /proc/cpuinfo
processor       : 0
BogoMIPS        : 200.00
Features        : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm fcma dcpop sve
CPU implementer : 0x46
CPU architecture: 8
CPU variant     : 0x1
CPU part        : 0x001
CPU revision    : 0
$ julia -E 'using Base.BinaryPlatforms.CPUID; CPUID.test_cpu_feature(CPUID.JL_AArch64_aes)'
true

@yuyichao
Copy link
Contributor

yuyichao commented Feb 19, 2022

Does seem like it... But I suspect it's mainly a question for Fujitsu. Their online document shows aes instructions in the performance section but didn't seem to mention anywhere in there if the support for it is conditional. Nor does it mention the values for the system registers the same way the one from ARM does...

@giordano
Copy link
Contributor Author

Ok, thanks for confirming it, then I'll remove AES, as it appears not to be always there (and LLVM doesn't seem to require it either).

@KristofferC KristofferC mentioned this pull request Feb 19, 2022
50 tasks
@giordano giordano merged commit f45b6ad into JuliaLang:master Feb 20, 2022
@giordano giordano deleted the mg/cpuid-a64fx-isa branch February 20, 2022 16:17
KristofferC pushed a commit that referenced this pull request Feb 22, 2022
* [CPUID] Rework how current ISA is determined

* [CPUID] Add ISA entry for A64FX

* [CPUID] Add ISA entry for Apple Silicon M1

* [CPUID] Simplify collection of full set of features for architecture

* [CPUID] Remove AES from A64FX ISA, not all chips appear to have it

(cherry picked from commit f45b6ad)
KristofferC pushed a commit that referenced this pull request Feb 23, 2022
* [CPUID] Rework how current ISA is determined

* [CPUID] Add ISA entry for A64FX

* [CPUID] Add ISA entry for Apple Silicon M1

* [CPUID] Simplify collection of full set of features for architecture

* [CPUID] Remove AES from A64FX ISA, not all chips appear to have it

(cherry picked from commit f45b6ad)
@KristofferC KristofferC mentioned this pull request Feb 23, 2022
40 tasks
@KristofferC KristofferC removed the backport 1.8 Change should be backported to release-1.8 label Feb 24, 2022
staticfloat pushed a commit to JuliaCI/julia-buildkite-testing that referenced this pull request Mar 2, 2022
* [CPUID] Rework how current ISA is determined

* [CPUID] Add ISA entry for A64FX

* [CPUID] Add ISA entry for Apple Silicon M1

* [CPUID] Simplify collection of full set of features for architecture

* [CPUID] Remove AES from A64FX ISA, not all chips appear to have it
LilithHafner pushed a commit to LilithHafner/julia that referenced this pull request Mar 8, 2022
* [CPUID] Rework how current ISA is determined

* [CPUID] Add ISA entry for A64FX

* [CPUID] Add ISA entry for Apple Silicon M1

* [CPUID] Simplify collection of full set of features for architecture

* [CPUID] Remove AES from A64FX ISA, not all chips appear to have it
KristofferC pushed a commit that referenced this pull request Mar 15, 2022
* [CPUID] Rework how current ISA is determined

* [CPUID] Add ISA entry for A64FX

* [CPUID] Add ISA entry for Apple Silicon M1

* [CPUID] Simplify collection of full set of features for architecture

* [CPUID] Remove AES from A64FX ISA, not all chips appear to have it

(cherry picked from commit f45b6ad)
@giordano
Copy link
Contributor Author

For the record, other people have observed the same differences between A64FX on Isambard and Fugaku: archspec/archspec-json#23 At least I'm glad it isn't just julia 🙂

@KristofferC KristofferC removed the backport 1.6 Change should be backported to release-1.6 label May 16, 2022
staticfloat pushed a commit that referenced this pull request Dec 23, 2022
* [CPUID] Rework how current ISA is determined

* [CPUID] Add ISA entry for A64FX

* [CPUID] Add ISA entry for Apple Silicon M1

* [CPUID] Simplify collection of full set of features for architecture

* [CPUID] Remove AES from A64FX ISA, not all chips appear to have it

(cherry picked from commit f45b6ad)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
system:arm ARMv7 and AArch64
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants