[CPUID] Add ISA entries for A64FX and M1 #44194

Merged
merged 5 commits into from
Feb 20, 2022
[CPUID] Rework how current ISA is determined
giordano committed Feb 16, 2022
commit 763146e04a8ccfdf844d5f3def8e710bc21301d8
6 changes: 3 additions & 3 deletions base/Base.jl
@@ -276,9 +276,6 @@ include("weakkeydict.jl")

include("env.jl")

-# BinaryPlatforms, used by Artifacts
-include("binaryplatforms.jl")
-
# functions defined in Random
function rand end
function randn end
@@ -336,6 +333,9 @@ using .Order
include("sort.jl")
using .Sort

+# BinaryPlatforms, used by Artifacts. Needs `Sort`.
+include("binaryplatforms.jl")
+
# Fast math
include("fastmath.jl")
using .FastMath
29 changes: 22 additions & 7 deletions base/cpuid.jl
@@ -56,9 +56,9 @@ const ISAs_by_family = Dict(
    "aarch64" => [
        # Implicit in all sets, because always required: fp, asimd
        "armv8.0-a" => ISA(Set{UInt32}()),
-        "armv8.1-a" => ISA(Set((JL_AArch64_lse, JL_AArch64_crc, JL_AArch64_rdm))),
-        "armv8.2-a+crypto" => ISA(Set((JL_AArch64_lse, JL_AArch64_crc, JL_AArch64_rdm, JL_AArch64_aes, JL_AArch64_sha2))),
-        "armv8.4-a+crypto+sve" => ISA(Set((JL_AArch64_lse, JL_AArch64_crc, JL_AArch64_rdm, JL_AArch64_fp16fml, JL_AArch64_aes, JL_AArch64_sha2, JL_AArch64_dotprod, JL_AArch64_sve))),
+        "armv8.1-a" => ISA(Set((JL_AArch64_v8_1a, JL_AArch64_lse, JL_AArch64_crc, JL_AArch64_rdm))),
+        "armv8.2-a+crypto" => ISA(Set((JL_AArch64_v8_2a, JL_AArch64_lse, JL_AArch64_crc, JL_AArch64_rdm, JL_AArch64_aes, JL_AArch64_sha2))),
+        "armv8.4-a+crypto+sve" => ISA(Set((JL_AArch64_v8_4a, JL_AArch64_lse, JL_AArch64_crc, JL_AArch64_rdm, JL_AArch64_fp16fml, JL_AArch64_aes, JL_AArch64_sha2, JL_AArch64_dotprod, JL_AArch64_sve))),
    ],
    "powerpc64le" => [
        # We have no way to test powerpc64le features yet, so we're only going to declare the lowest ISA:
@@ -88,14 +88,29 @@ function normalize_arch(arch::String)
    return arch
end

+const ALL_FEATURES = let
+    get_features(prefix::String) =
+        getfield.(Ref(@__MODULE__), filter(n -> startswith(String(n), prefix), (names(@__MODULE__; all=true))))
+    Dict(
+        "i686" => get_features("JL_X86"),
+        "x86_64" => get_features("JL_X86"),
+        "armv6l" => get_features("JL_AArch32"),
+        "armv7l" => get_features("JL_AArch32"),
+        "aarch64" => get_features("JL_AArch64"),
+        "powerpc64le" => UInt32[],
+    )
+end
+
+# Use `@eval` to statically determine the list of features for the current architecture.
+@eval function cpu_isa()
+    return ISA(Set{UInt32}(feat for feat in $(ALL_FEATURES[normalize_arch(String(Sys.ARCH))]) if test_cpu_feature(feat)))
+end
Contributor Author

I realised we can avoid always recomputing the list of features for the current architecture and just inline it at precompile time. On my laptop, before:

julia> @benchmark CPUID.cpu_isa()
BenchmarkTools.Trial: 10000 samples with 48 evaluations.
 Range (min … max):  895.604 ns … 90.301 μs  ┊ GC (min … max): 0.00% … 98.34%
 Time  (median):     984.552 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):     1.153 μs ±  2.866 μs  ┊ GC (mean ± σ):  9.56% ±  3.80%

   ▃▆█▆▃▁                                                       
  ▇██████▇▆▄▃▂▂▂▂▂▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▂
  896 ns          Histogram: frequency by time            2 μs <

 Memory estimate: 1.41 KiB, allocs estimate: 17.

after:

julia> @benchmark CPUID.cpu_isa()
BenchmarkTools.Trial: 10000 samples with 155 evaluations.
 Range (min … max):  679.871 ns …  18.916 μs  ┊ GC (min … max): 0.00% … 89.46%
 Time  (median):     745.200 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   849.687 ns ± 709.196 ns  ┊ GC (mean ± σ):  4.12% ±  4.95%

  ▁▆█▇▅▃▃▃▃▃▃▂▂▁▁▁                                           ▂▃ ▂
  █████████████████▇█▇▇▇▆▆▇▆▆▆▆▆▆▄▅▅▄▁▅▄▅▅▅▃▁▁▄▁▃▁▁▃▄▃▃▄▃▁▁▄▆██ █
  680 ns        Histogram: log(frequency) by time       1.97 μs <

 Memory estimate: 848 bytes, allocs estimate: 7.
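
The idea of inlining the feature list can be illustrated with a minimal, self-contained sketch. The names `FAKE_FEATURES`, `fake_test_feature`, `detect_dynamic` and `detect_inlined` are hypothetical stand-ins and are not part of `base/cpuid.jl`; the only point is that `$(...)` inside `@eval` splices an already-computed value into the method body when the method is defined.

```julia
# Hypothetical feature table keyed by architecture name (not the real CPUID data).
const FAKE_FEATURES = Dict(
    "x86_64"  => UInt32[1, 2, 3, 4],
    "aarch64" => UInt32[10, 11, 12],
)

# Stand-in for `test_cpu_feature`: pretend only even-numbered features are present.
fake_test_feature(feat::UInt32) = iseven(feat)

# Variant 1: look the list up in the dictionary on every call.
function detect_dynamic(arch::String)
    return Set{UInt32}(f for f in FAKE_FEATURES[arch] if fake_test_feature(f))
end

# Variant 2: the list for one fixed architecture is evaluated once, when the
# method is defined, and spliced into the method body by `$(...)`.
@eval function detect_inlined()
    return Set{UInt32}(f for f in $(FAKE_FEATURES["x86_64"]) if fake_test_feature(f))
end
```

Both variants return the same set for `"x86_64"`; the second simply skips the dictionary lookup at call time, which is the kind of saving the benchmarks above measure.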

Sponsor Member

Nice! Fewer allocations, always a good thing!

Contributor Author

With the latest version:

julia> @benchmark Base.BinaryPlatforms.CPUID.cpu_isa()
BenchmarkTools.Trial: 10000 samples with 196 evaluations.
 Range (min … max):  480.342 ns …  12.980 μs  ┊ GC (min … max): 0.00% … 94.89%
 Time  (median):     527.505 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   598.004 ns ± 670.168 ns  ┊ GC (mean ± σ):  6.65% ±  5.69%

   ▄▆███▇▆▄▄▄▃▄▃▂▂▂▂▁▁▁▁▁▁▁▂▁▁▁▂▁                               ▂
  ▆██████████████████████████████▇▇▇▆█▇█▆▇▆▇▇▇▆▅▅▇▇▅▅▃▆▆▃▆▆▅▄▄▅ █
  480 ns        Histogram: log(frequency) by time          1 μs <

 Memory estimate: 848 bytes, allocs estimate: 7.

I believe it's a bit faster because the new version collects only the features we are interested in, instead of all of those for the given architecture, so we're just doing fewer iterations. The new version is also closer in spirit to what we're currently doing.
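
That reasoning, probing only the features that some ISA entry actually names, might look roughly like the sketch below. `FAKE_ISAS` and `relevant_features` are hypothetical names chosen for illustration, and this is not the code from the PR.

```julia
# Hypothetical ISA table: each entry pairs a name with the feature IDs it requires.
const FAKE_ISAS = [
    "base"  => Set{UInt32}(),
    "mid"   => Set{UInt32}([1, 2]),
    "large" => Set{UInt32}([1, 2, 5]),
]

# Collect every feature mentioned by at least one ISA, without duplicates.
function relevant_features(isas)
    feats = UInt32[]
    for (_, required) in isas
        append!(feats, required)
    end
    return sort!(unique!(feats))
end
```

With this table, only three feature IDs would need to be tested at run time, however many feature constants the architecture defines overall.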


"""
    cpu_isa()

Return the [`ISA`](@ref) (instruction set architecture) of the current CPU.
"""
-function cpu_isa()
-    all_features = last(last(get(ISAs_by_family, normalize_arch(String(Sys.ARCH)), "" => [ISA(Set{UInt32}())]))).features
-    return ISA(Set{UInt32}(feat for feat in all_features if test_cpu_feature(feat)))
-end
+cpu_isa

end # module CPUID