@fastmath maximum segfaults for Float16 on master #49907

Closed · matthias314 opened this issue May 21, 2023 · 8 comments · Fixed by #50639

Labels: compiler:llvm (For issues that relate to LLVM), domain:float16, domain:fold (sum, maximum, reduce, foldl, etc.), kind:regression (Regression in behavior compared to a previous version), kind:upstream (The issue is with an upstream dependency, e.g. LLVM)


matthias314 (Contributor) commented May 21, 2023

On master I get

julia> @fastmath maximum(Float16[1,2,3]; init = Float16(0))

LLVM ERROR: Cannot select: 0xe31218: v16f16 = X86ISD::FMAX nnan ninf nsz arcp contract afn reassoc 0xe4a4b0, 0xe1ddc8, array.jl:938 @[ reduce.jl:60 @[ reduce.jl:48 @[ reduce.jl:44 @[ reducedim.jl:362 @[ reducedim.jl:357 @[ reducedim.jl:357 @[ reducedim.jl:406 @[ reducedim.jl:406 @[ fastmath.jl:380 @[ fastmath.jl:380 ] ] ] ] ] ] ] ] ] ]
  0xe4a4b0: v16f16,ch = CopyFromReg 0x91a8d8, Register:v16f16 %10, array.jl:938 @[ reduce.jl:60 @[ reduce.jl:48 @[ reduce.jl:44 @[ reducedim.jl:362 @[ reducedim.jl:357 @[ reducedim.jl:357 @[ reducedim.jl:406 @[ reducedim.jl:406 @[ fastmath.jl:380 @[ fastmath.jl:380 ] ] ] ] ] ] ] ] ] ]
    0xe28c10: v16f16 = Register %10
  0xe1ddc8: v16f16,ch = CopyFromReg 0x91a8d8, Register:v16f16 %11, array.jl:938 @[ reduce.jl:60 @[ reduce.jl:48 @[ reduce.jl:44 @[ reducedim.jl:362 @[ reducedim.jl:357 @[ reducedim.jl:357 @[ reducedim.jl:406 @[ reducedim.jl:406 @[ fastmath.jl:380 @[ fastmath.jl:380 ] ] ] ] ] ] ] ] ] ]
    0xe1e0a0: v16f16 = Register %11
In function: julia_maximum_fast_16
Complete output:
julia> @fastmath maximum(Float16[1,2,3]; init = Float16(0))
LLVM ERROR: Cannot select: 0x258e0d8: v16f16 = X86ISD::FMAX nnan ninf nsz arcp contract afn reassoc 0x25a7370, 0x2572848, array.jl:938 @[ reduce.jl:60 @[ reduce.jl:48 @[ reduce.jl:44 @[ reducedim.jl:362 @[ reducedim.jl:357 @[ reducedim.jl:357 @[ reducedim.jl:406 @[ reducedim.jl:406 @[ fastmath.jl:380 @[ fastmath.jl:380 ] ] ] ] ] ] ] ] ] ]
  0x25a7370: v16f16,ch = CopyFromReg 0x205e688, Register:v16f16 %10, array.jl:938 @[ reduce.jl:60 @[ reduce.jl:48 @[ reduce.jl:44 @[ reducedim.jl:362 @[ reducedim.jl:357 @[ reducedim.jl:357 @[ reducedim.jl:406 @[ reducedim.jl:406 @[ fastmath.jl:380 @[ fastmath.jl:380 ] ] ] ] ] ] ] ] ] ]
    0x2587730: v16f16 = Register %10
  0x2572848: v16f16,ch = CopyFromReg 0x205e688, Register:v16f16 %11, array.jl:938 @[ reduce.jl:60 @[ reduce.jl:48 @[ reduce.jl:44 @[ reducedim.jl:362 @[ reducedim.jl:357 @[ reducedim.jl:357 @[ reducedim.jl:406 @[ reducedim.jl:406 @[ fastmath.jl:380 @[ fastmath.jl:380 ] ] ] ] ] ] ] ] ] ]
    0x2572b20: v16f16 = Register %11
In function: julia_maximum_fast_19

[92681] signal (6.-6): Aborted
in expression starting at REPL[1]:1
gsignal at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
abort at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
_ZN4llvm18report_fatal_errorERKNS_5TwineEb at /tmp/julia-d2f5bbd7cf/bin/../lib/julia/libLLVM-15jl.so (unknown line)
_ZN4llvm16SelectionDAGISel15CannotYetSelectEPNS_6SDNodeE at /tmp/julia-d2f5bbd7cf/bin/../lib/julia/libLLVM-15jl.so (unknown line)
_ZN4llvm16SelectionDAGISel16SelectCodeCommonEPNS_6SDNodeEPKhj at /tmp/julia-d2f5bbd7cf/bin/../lib/julia/libLLVM-15jl.so (unknown line)
_ZN12_GLOBAL__N_115X86DAGToDAGISel6SelectEPN4llvm6SDNodeE at /tmp/julia-d2f5bbd7cf/bin/../lib/julia/libLLVM-15jl.so (unknown line)
_ZN4llvm16SelectionDAGISel22DoInstructionSelectionEv at /tmp/julia-d2f5bbd7cf/bin/../lib/julia/libLLVM-15jl.so (unknown line)
_ZN4llvm16SelectionDAGISel17CodeGenAndEmitDAGEv at /tmp/julia-d2f5bbd7cf/bin/../lib/julia/libLLVM-15jl.so (unknown line)
_ZN4llvm16SelectionDAGISel20SelectAllBasicBlocksERKNS_8FunctionE at /tmp/julia-d2f5bbd7cf/bin/../lib/julia/libLLVM-15jl.so (unknown line)
_ZN4llvm16SelectionDAGISel20runOnMachineFunctionERNS_15MachineFunctionE.part.950 at /tmp/julia-d2f5bbd7cf/bin/../lib/julia/libLLVM-15jl.so (unknown line)
_ZN12_GLOBAL__N_115X86DAGToDAGISel20runOnMachineFunctionERN4llvm15MachineFunctionE at /tmp/julia-d2f5bbd7cf/bin/../lib/julia/libLLVM-15jl.so (unknown line)
_ZN4llvm19MachineFunctionPass13runOnFunctionERNS_8FunctionE.part.68 at /tmp/julia-d2f5bbd7cf/bin/../lib/julia/libLLVM-15jl.so (unknown line)
_ZN4llvm13FPPassManager13runOnFunctionERNS_8FunctionE at /tmp/julia-d2f5bbd7cf/bin/../lib/julia/libLLVM-15jl.so (unknown line)
_ZN4llvm13FPPassManager11runOnModuleERNS_6ModuleE at /tmp/julia-d2f5bbd7cf/bin/../lib/julia/libLLVM-15jl.so (unknown line)
_ZN4llvm6legacy15PassManagerImpl3runERNS_6ModuleE at /tmp/julia-d2f5bbd7cf/bin/../lib/julia/libLLVM-15jl.so (unknown line)
_ZN4llvm3orc14SimpleCompilerclERNS_6ModuleE at /tmp/julia-d2f5bbd7cf/bin/../lib/julia/libLLVM-15jl.so (unknown line)
operator() at /cache/build/default-amdci4-2/julialang/julia-master/src/jitlayers.cpp:1272
_ZN4llvm3orc14IRCompileLayer4emitESt10unique_ptrINS0_29MaterializationResponsibilityESt14default_deleteIS3_EENS0_16ThreadSafeModuleE at /tmp/julia-d2f5bbd7cf/bin/../lib/julia/libLLVM-15jl.so (unknown line)
_ZN4llvm3orc16IRTransformLayer4emitESt10unique_ptrINS0_29MaterializationResponsibilityESt14default_deleteIS3_EENS0_16ThreadSafeModuleE at /tmp/julia-d2f5bbd7cf/bin/../lib/julia/libLLVM-15jl.so (unknown line)
emit at /cache/build/default-amdci4-2/julialang/julia-master/src/jitlayers.cpp:690
_ZN4llvm3orc31BasicIRLayerMaterializationUnit11materializeESt10unique_ptrINS0_29MaterializationResponsibilityESt14default_deleteIS3_EE at /tmp/julia-d2f5bbd7cf/bin/../lib/julia/libLLVM-15jl.so (unknown line)
_ZN4llvm3orc19MaterializationTask3runEv at /tmp/julia-d2f5bbd7cf/bin/../lib/julia/libLLVM-15jl.so (unknown line)
_ZN4llvm6detail18UniqueFunctionBaseIvJSt10unique_ptrINS_3orc4TaskESt14default_deleteIS4_EEEE8CallImplIPFvS7_EEEvPvRS7_ at /tmp/julia-d2f5bbd7cf/bin/../lib/julia/libLLVM-15jl.so (unknown line)
_ZN4llvm3orc16ExecutionSession22dispatchOutstandingMUsEv at /tmp/julia-d2f5bbd7cf/bin/../lib/julia/libLLVM-15jl.so (unknown line)
_ZN4llvm3orc16ExecutionSession17OL_completeLookupESt10unique_ptrINS0_21InProgressLookupStateESt14default_deleteIS3_EESt10shared_ptrINS0_23AsynchronousSymbolQueryEESt8functionIFvRKNS_8DenseMapIPNS0_8JITDylibENS_8DenseSetINS0_15SymbolStringPtrENS_12DenseMapInfoISF_vEEEENSG_ISD_vEENS_6detail12DenseMapPairISD_SI_EEEEEE at /tmp/julia-d2f5bbd7cf/bin/../lib/julia/libLLVM-15jl.so (unknown line)
_ZN4llvm3orc25InProgressFullLookupState8completeESt10unique_ptrINS0_21InProgressLookupStateESt14default_deleteIS3_EE at /tmp/julia-d2f5bbd7cf/bin/../lib/julia/libLLVM-15jl.so (unknown line)
_ZN4llvm3orc16ExecutionSession19OL_applyQueryPhase1ESt10unique_ptrINS0_21InProgressLookupStateESt14default_deleteIS3_EENS_5ErrorE at /tmp/julia-d2f5bbd7cf/bin/../lib/julia/libLLVM-15jl.so (unknown line)
_ZN4llvm3orc16ExecutionSession6lookupENS0_10LookupKindERKSt6vectorISt4pairIPNS0_8JITDylibENS0_19JITDylibLookupFlagsEESaIS8_EENS0_15SymbolLookupSetENS0_11SymbolStateENS_15unique_functionIFvNS_8ExpectedINS_8DenseMapINS0_15SymbolStringPtrENS_18JITEvaluatedSymbolENS_12DenseMapInfoISI_vEENS_6detail12DenseMapPairISI_SJ_EEEEEEEEESt8functionIFvRKNSH_IS6_NS_8DenseSetISI_SL_EENSK_IS6_vEENSN_IS6_SV_EEEEEE at /tmp/julia-d2f5bbd7cf/bin/../lib/julia/libLLVM-15jl.so (unknown line)
_ZN4llvm3orc16ExecutionSession6lookupERKSt6vectorISt4pairIPNS0_8JITDylibENS0_19JITDylibLookupFlagsEESaIS7_EENS0_15SymbolLookupSetENS0_10LookupKindENS0_11SymbolStateESt8functionIFvRKNS_8DenseMapIS5_NS_8DenseSetINS0_15SymbolStringPtrENS_12DenseMapInfoISI_vEEEENSJ_IS5_vEENS_6detail12DenseMapPairIS5_SL_EEEEEE at /tmp/julia-d2f5bbd7cf/bin/../lib/julia/libLLVM-15jl.so (unknown line)
addModule at /cache/build/default-amdci4-2/julialang/julia-master/src/jitlayers.cpp:1491
jl_add_to_ee at /cache/build/default-amdci4-2/julialang/julia-master/src/jitlayers.cpp:1896
_jl_compile_codeinst at /cache/build/default-amdci4-2/julialang/julia-master/src/jitlayers.cpp:243
jl_generate_fptr_impl at /cache/build/default-amdci4-2/julialang/julia-master/src/jitlayers.cpp:493
jl_compile_method_internal at /cache/build/default-amdci4-2/julialang/julia-master/src/gf.c:2475 [inlined]
jl_compile_method_internal at /cache/build/default-amdci4-2/julialang/julia-master/src/gf.c:2364
_jl_invoke at /cache/build/default-amdci4-2/julialang/julia-master/src/gf.c:2880 [inlined]
ijl_apply_generic at /cache/build/default-amdci4-2/julialang/julia-master/src/gf.c:3070
jl_apply at /cache/build/default-amdci4-2/julialang/julia-master/src/julia.h:1961 [inlined]
do_call at /cache/build/default-amdci4-2/julialang/julia-master/src/interpreter.c:125
eval_value at /cache/build/default-amdci4-2/julialang/julia-master/src/interpreter.c:222
eval_stmt_value at /cache/build/default-amdci4-2/julialang/julia-master/src/interpreter.c:173 [inlined]
eval_body at /cache/build/default-amdci4-2/julialang/julia-master/src/interpreter.c:602
jl_interpret_toplevel_thunk at /cache/build/default-amdci4-2/julialang/julia-master/src/interpreter.c:760
jl_toplevel_eval_flex at /cache/build/default-amdci4-2/julialang/julia-master/src/toplevel.c:911
jl_toplevel_eval_flex at /cache/build/default-amdci4-2/julialang/julia-master/src/toplevel.c:854
ijl_toplevel_eval_in at /cache/build/default-amdci4-2/julialang/julia-master/src/toplevel.c:970
eval at ./boot.jl:383 [inlined]
eval_user_input at /cache/build/default-amdci4-2/julialang/julia-master/usr/share/julia/stdlib/v1.10/REPL/src/REPL.jl:150
repl_backend_loop at /cache/build/default-amdci4-2/julialang/julia-master/usr/share/julia/stdlib/v1.10/REPL/src/REPL.jl:246
#start_repl_backend#46 at /cache/build/default-amdci4-2/julialang/julia-master/usr/share/julia/stdlib/v1.10/REPL/src/REPL.jl:231
start_repl_backend at /cache/build/default-amdci4-2/julialang/julia-master/usr/share/julia/stdlib/v1.10/REPL/src/REPL.jl:228
_jl_invoke at /cache/build/default-amdci4-2/julialang/julia-master/src/gf.c:2888 [inlined]
ijl_apply_generic at /cache/build/default-amdci4-2/julialang/julia-master/src/gf.c:3070
#run_repl#59 at /cache/build/default-amdci4-2/julialang/julia-master/usr/share/julia/stdlib/v1.10/REPL/src/REPL.jl:376
run_repl at /cache/build/default-amdci4-2/julialang/julia-master/usr/share/julia/stdlib/v1.10/REPL/src/REPL.jl:362
jfptr_run_repl_89532.1 at /tmp/julia-d2f5bbd7cf/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/default-amdci4-2/julialang/julia-master/src/gf.c:2888 [inlined]
ijl_apply_generic at /cache/build/default-amdci4-2/julialang/julia-master/src/gf.c:3070
#997 at ./client.jl:421
jfptr_YY.997_82669.1 at /tmp/julia-d2f5bbd7cf/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/default-amdci4-2/julialang/julia-master/src/gf.c:2888 [inlined]
ijl_apply_generic at /cache/build/default-amdci4-2/julialang/julia-master/src/gf.c:3070
jl_apply at /cache/build/default-amdci4-2/julialang/julia-master/src/julia.h:1961 [inlined]
jl_f__call_latest at /cache/build/default-amdci4-2/julialang/julia-master/src/builtins.c:812
#invokelatest#2 at ./essentials.jl:863 [inlined]
invokelatest at ./essentials.jl:860 [inlined]
run_main_repl at ./client.jl:405
exec_options at ./client.jl:322
_start at ./client.jl:541
jfptr__start_82698.1 at /tmp/julia-d2f5bbd7cf/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/default-amdci4-2/julialang/julia-master/src/gf.c:2888 [inlined]
ijl_apply_generic at /cache/build/default-amdci4-2/julialang/julia-master/src/gf.c:3070
jl_apply at /cache/build/default-amdci4-2/julialang/julia-master/src/julia.h:1961 [inlined]
true_main at /cache/build/default-amdci4-2/julialang/julia-master/src/jlapi.c:582
jl_repl_entrypoint at /cache/build/default-amdci4-2/julialang/julia-master/src/jlapi.c:734
main at /cache/build/default-amdci4-2/julialang/julia-master/cli/loader_exe.c:58
__libc_start_main at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
unknown function (ip: 0x401098)
Allocations: 2832 (Pool: 2823; Big: 9); GC: 0
Aborted (core dumped)

It works fine with Julia 1.9.0. Float32 and Float64 don't seem to be affected, and without @fastmath it also works for Float16.
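Based only on the observations above (plain `maximum` works for Float16, and Float32 is reported as unaffected), a minimal workaround sketch for affected builds might look like the following. It has not been checked against every Julia version. Since Float16 → Float32 conversion is exact and the maximum is always one of the inputs, narrowing the result back to Float16 loses nothing for ordinary (non-NaN) data.

```julia
x = Float16[1, 2, 3]

# Workaround 1 (from the report): drop @fastmath; the plain Float16 reduction works.
m1 = maximum(x; init = Float16(0))

# Workaround 2 (assumes, as reported, that Float32 fast-math reductions are fine):
# widen to Float32, reduce with fast math, then narrow back to Float16.
y = Float32.(x)
m2 = Float16(@fastmath maximum(y; init = 0.0f0))

println(m1, " ", m2)
```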

Julia Version 1.10.0-DEV.1347
Commit d2f5bbd7cfb (2023-05-20 10:45 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 4 × Intel(R) Core(TM) i3-10110U CPU @ 2.10GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, skylake)
  Threads: 1 on 4 virtual cores
giordano (Contributor) commented:

Likely due to #48153 (CC @mcabbott)

giordano added the kind:regression, domain:fold and domain:float16 labels May 21, 2023
giordano added this to the 1.10 milestone May 21, 2023
mcabbott (Contributor) commented May 21, 2023

Can reproduce. Note that it can be triggered by @fastmath reduce(max, x; init), but not without the init, nor by @fastmath max(x, y):

julia> versioninfo()
Julia Version 1.10.0-DEV.1351
Commit a6ad9ea099f (2023-05-21 08:01 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 12 × Intel(R) Xeon(R) CPU E5-2603 v4 @ 1.70GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, broadwell)
  Threads: 5 on 12 virtual cores
Environment:
  LD_LIBRARY_PATH = /usr/local/cuda/lib64
  JULIA_NUM_THREADS = 4

julia> @fastmath max(Float16(1), Float16(2))
Float16(2.0)

julia> @fastmath reduce(max, Float16[1,2,3])
Float16(3.0)

julia> @fastmath reduce(max, Float16[1,2,3]; init = Float16(0))
LLVM ERROR: Cannot select: 0x204fef8: v16f16 = X86ISD::FMAX nnan ninf nsz arcp contract afn reassoc 0x2031f30, 0x1eec5d0, array.jl:938 @[ reduce.jl:60 @[ reduce.jl:48 @[ reduce.jl:44 ] ] ]
  0x2031f30: v16f16,ch = CopyFromReg 0x1d83098, Register:v16f16 %9, array.jl:938 @[ reduce.jl:60 @[ reduce.jl:48 @[ reduce.jl:44 ] ] ]

Also triggered by some foldl calls:

julia> foldl(Base.FastMath.max_fast, Float16[1, 2, 3])
LLVM ERROR: Cannot select: 0x248ba98: v16f16 = X86ISD::FMAX nnan ninf nsz arcp contract afn reassoc 0x24abec0, 0x24819c8, array.jl:938 @[ reduce.jl:60 @[ reduce.jl:48 @[ reduce.jl:44 ] ] ]

On the same machine, a version from before #48153 does not have the problem:

julia> @fastmath reduce(max, Float16[1,2,3]; init = Float16(0))
Float16(3.0)

julia> versioninfo()
Julia Version 1.10.0-DEV.220
Commit 9ded051e9f8 (2022-12-29 10:05 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)

On an M1 mac, the problem does not seem to occur:

julia> @fastmath reduce(max, Float16[1,2,3]; init = Float16(0))
Float16(3.0)

julia> versioninfo()
Julia Version 1.10.0-DEV.1351
Commit a6ad9ea099 (2023-05-21 08:01 UTC)
Platform Info:
  OS: macOS (arm64-apple-darwin21.6.0)
  CPU: 8 × Apple M1
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, apple-m1)
  Threads: 5 on 4 virtual cores
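Because the LLVM "Cannot select" error aborts the whole process (see the backtrace in the first comment), comparing these cases interactively means restarting Julia after each crash. Below is a sketch of a small harness that runs each expression from this thread in its own child process. The names `CASES` and `check_cases` are made up for illustration; only `Base.julia_cmd`, `pipeline` and `success` are standard Julia.

```julia
# Expressions reported in this thread; the harness around them is illustrative.
const CASES = [
    "@fastmath max(Float16(1), Float16(2))",
    "@fastmath reduce(max, Float16[1,2,3])",
    "@fastmath reduce(max, Float16[1,2,3]; init = Float16(0))",
    "foldl(Base.FastMath.max_fast, Float16[1, 2, 3])",
]

# Hypothetical helper: run each expression in a fresh child process so that an
# abort in one case (SIGABRT from LLVM) cannot take down the caller.
function check_cases(cases)
    julia = Base.julia_cmd()  # command for the currently running Julia, with its flags
    results = Dict{String,Bool}()
    for code in cases
        cmd = `$julia --startup-file=no -e $code`
        # success() returns false when the child exits non-zero or dies from a signal
        results[code] = success(pipeline(cmd; stdout = devnull, stderr = devnull))
    end
    return results
end

results = check_cases(CASES)
for code in CASES
    println(results[code] ? "ok       " : "crashed  ", code)
end
```

On an affected x86 build one would expect the two `init`/`foldl` cases to report `crashed`, matching the outputs above; on a fixed build all four should report `ok`.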

gbaraldi (Member) commented:

This is probably an issue with the demote float16 pass. It would be cool to see the LLVM IR generated on the function that crashes.

giordano (Contributor) commented:

julia> @code_llvm Base.FastMath.maximum_fast(Float16[1, 2, 3]; init = Float16(0))
;  @ fastmath.jl:380 within `maximum_fast`
define half @julia_maximum_fast_289([1 x half]* nocapture noundef nonnull readonly align 2 dereferenceable(2) %0, {}* noundef nonnull align 16 dereferenceable(40) %1) #0 {
top:
  %thread_ptr = call i8* asm "movq %fs:0, $0", "=r"() #9
  %ppgcstack_i8 = getelementptr i8, i8* %thread_ptr, i64 -8
  %ppgcstack = bitcast i8* %ppgcstack_i8 to {}****
  %pgcstack = load {}***, {}**** %ppgcstack, align 8
  %ptls_field16 = getelementptr inbounds {}**, {}*** %pgcstack, i64 2
  %2 = bitcast {}*** %ptls_field16 to i64***
  %ptls_load1718 = load i64**, i64*** %2, align 8
  %3 = getelementptr inbounds i64*, i64** %ptls_load1718, i64 2
  %safepoint = load i64*, i64** %3, align 8
  fence syncscope("singlethread") seq_cst
  %4 = load volatile i64, i64* %safepoint, align 8
  fence syncscope("singlethread") seq_cst
; ┌ @ fastmath.jl:380 within `#maximum_fast#1`
; │┌ @ reducedim.jl:406 within `reduce`
; ││┌ @ reducedim.jl:406 within `#reduce#811`
; │││┌ @ reducedim.jl:357 within `mapreduce`
      %5 = getelementptr inbounds [1 x half], [1 x half]* %0, i64 0, i64 0
; ││││┌ @ reducedim.jl:357 within `#mapreduce#809`
; │││││┌ @ reducedim.jl:362 within `_mapreduce_dim`
; ││││││┌ @ reduce.jl:44 within `mapfoldl_impl`
; │││││││┌ @ reduce.jl:48 within `foldl_impl`
; ││││││││┌ @ reduce.jl:56 within `_foldl_impl`
; │││││││││┌ @ array.jl:938 within `iterate` @ array.jl:938
; ││││││││││┌ @ essentials.jl:10 within `length`
             %6 = bitcast {}* %1 to { i8*, i64, i16, i16, i32 }*
             %7 = getelementptr inbounds { i8*, i64, i16, i16, i32 }, { i8*, i64, i16, i16, i32 }* %6, i64 0, i32 1
             %8 = load i64, i64* %7, align 8
; ││││││││││└
; ││││││││││┌ @ int.jl:520 within `<` @ int.jl:513
             %.not = icmp eq i64 %8, 0
; ││││││││││└
            br i1 %.not, label %L19, label %L20

L19:                                              ; preds = %top
            %9 = load half, half* %5, align 2
            br label %L55

L20:                                              ; preds = %top
; ││││││││││┌ @ essentials.jl:13 within `getindex`
             %10 = bitcast {}* %1 to half**
             %11 = load half*, half** %10, align 8
             %12 = load half, half* %11, align 2
; │││││││││└└
; │││││││││ @ reduce.jl:58 within `_foldl_impl`
; │││││││││┌ @ reduce.jl:86 within `BottomRF`
; ││││││││││┌ @ fastmath.jl:251 within `max_fast`
; │││││││││││┌ @ fastmath.jl:191 within `gt_fast`
; ││││││││││││┌ @ fastmath.jl:189 within `lt_fast`
               %13 = load half, half* %5, align 2
; │││││││││││└└
; │││││││││││┌ @ essentials.jl:621 within `ifelse`
              %.inv = fcmp fast olt half %13, %12
              %14 = select fast i1 %.inv, half %12, half %13
; │││││││││└└└
; │││││││││ @ reduce.jl:60 within `_foldl_impl`
; │││││││││┌ @ array.jl:938 within `iterate`
; ││││││││││┌ @ int.jl:520 within `<` @ int.jl:513
             %.not1926.not = icmp eq i64 %8, 1
; ││││││││││└
            br i1 %.not1926.not, label %L55, label %iter.check

iter.check:                                       ; preds = %L20
            %15 = add nsw i64 %8, -1
            %min.iters.check = icmp ult i64 %15, 8
            br i1 %min.iters.check, label %vec.epilog.scalar.ph, label %vector.main.loop.iter.check

vector.main.loop.iter.check:                      ; preds = %iter.check
            %min.iters.check29 = icmp ult i64 %15, 32
            br i1 %min.iters.check29, label %vec.epilog.ph, label %vector.ph

vector.ph:                                        ; preds = %vector.main.loop.iter.check
            %n.vec = and i64 %15, -32
            %minmax.ident.splatinsert = insertelement <16 x half> poison, half %14, i64 0
            %minmax.ident.splat = shufflevector <16 x half> %minmax.ident.splatinsert, <16 x half> poison, <16 x i32> zeroinitializer
            br label %vector.body

vector.body:                                      ; preds = %vector.body, %vector.ph
            %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
            %vec.phi = phi <16 x half> [ %minmax.ident.splat, %vector.ph ], [ %22, %vector.body ]
            %vec.phi30 = phi <16 x half> [ %minmax.ident.splat, %vector.ph ], [ %23, %vector.body ]
            %offset.idx = or i64 %index, 1
; ││││││││││┌ @ essentials.jl:13 within `getindex`
             %16 = getelementptr inbounds half, half* %11, i64 %offset.idx
             %17 = bitcast half* %16 to <16 x half>*
             %wide.load = load <16 x half>, <16 x half>* %17, align 2
             %18 = getelementptr inbounds half, half* %16, i64 16
             %19 = bitcast half* %18 to <16 x half>*
             %wide.load31 = load <16 x half>, <16 x half>* %19, align 2
; │││││││││└└
; │││││││││ @ reduce.jl:62 within `_foldl_impl`
; │││││││││┌ @ reduce.jl:86 within `BottomRF`
; ││││││││││┌ @ fastmath.jl:251 within `max_fast`
; │││││││││││┌ @ essentials.jl:621 within `ifelse`
              %20 = fcmp fast olt <16 x half> %vec.phi, %wide.load
              %21 = fcmp fast olt <16 x half> %vec.phi30, %wide.load31
              %22 = select <16 x i1> %20, <16 x half> %wide.load, <16 x half> %vec.phi
              %23 = select <16 x i1> %21, <16 x half> %wide.load31, <16 x half> %vec.phi30
              %index.next = add nuw i64 %index, 32
              %24 = icmp eq i64 %index.next, %n.vec
              br i1 %24, label %middle.block, label %vector.body

middle.block:                                     ; preds = %vector.body
; │││││││││└└└
; │││││││││ @ reduce.jl:60 within `_foldl_impl`
; │││││││││┌ @ array.jl:938 within `iterate`
            %25 = call fast <16 x half> @llvm.maxnum.v16f16(<16 x half> %22, <16 x half> %23)
            %26 = call fast half @llvm.vector.reduce.fmax.v16f16(<16 x half> %25)
            %cmp.n = icmp eq i64 %15, %n.vec
            br i1 %cmp.n, label %L55, label %vec.epilog.iter.check

vec.epilog.iter.check:                            ; preds = %middle.block
            %ind.end36 = or i64 %n.vec, 2
            %ind.end34 = or i64 %n.vec, 1
            %n.vec.remaining = and i64 %15, 24
            %min.epilog.iters.check = icmp eq i64 %n.vec.remaining, 0
            br i1 %min.epilog.iters.check, label %vec.epilog.scalar.ph, label %vec.epilog.ph

vec.epilog.ph:                                    ; preds = %vec.epilog.iter.check, %vector.main.loop.iter.check
            %bc.merge.rdx = phi half [ %14, %vector.main.loop.iter.check ], [ %26, %vec.epilog.iter.check ]
            %vec.epilog.resume.val = phi i64 [ 0, %vector.main.loop.iter.check ], [ %n.vec, %vec.epilog.iter.check ]
            %n.vec33 = and i64 %15, -8
            %ind.end = or i64 %n.vec33, 1
            %ind.end35 = or i64 %n.vec33, 2
            %minmax.ident.splatinsert41 = insertelement <8 x half> poison, half %bc.merge.rdx, i64 0
            %minmax.ident.splat42 = shufflevector <8 x half> %minmax.ident.splatinsert41, <8 x half> poison, <8 x i32> zeroinitializer
            br label %vec.epilog.vector.body

vec.epilog.vector.body:                           ; preds = %vec.epilog.vector.body, %vec.epilog.ph
            %index39 = phi i64 [ %vec.epilog.resume.val, %vec.epilog.ph ], [ %index.next45, %vec.epilog.vector.body ]
            %vec.phi40 = phi <8 x half> [ %minmax.ident.splat42, %vec.epilog.ph ], [ %30, %vec.epilog.vector.body ]
            %offset.idx43 = or i64 %index39, 1
; ││││││││││┌ @ essentials.jl:13 within `getindex`
             %27 = getelementptr inbounds half, half* %11, i64 %offset.idx43
             %28 = bitcast half* %27 to <8 x half>*
             %wide.load44 = load <8 x half>, <8 x half>* %28, align 2
; │││││││││└└
; │││││││││ @ reduce.jl:62 within `_foldl_impl`
; │││││││││┌ @ reduce.jl:86 within `BottomRF`
; ││││││││││┌ @ fastmath.jl:251 within `max_fast`
; │││││││││││┌ @ essentials.jl:621 within `ifelse`
              %29 = fcmp fast olt <8 x half> %vec.phi40, %wide.load44
              %30 = select <8 x i1> %29, <8 x half> %wide.load44, <8 x half> %vec.phi40
              %index.next45 = add nuw i64 %index39, 8
              %31 = icmp eq i64 %index.next45, %n.vec33
              br i1 %31, label %vec.epilog.middle.block, label %vec.epilog.vector.body

vec.epilog.middle.block:                          ; preds = %vec.epilog.vector.body
; │││││││││└└└
; │││││││││ @ reduce.jl:60 within `_foldl_impl`
; │││││││││┌ @ array.jl:938 within `iterate`
            %32 = call fast half @llvm.vector.reduce.fmax.v8f16(<8 x half> %30)
            %cmp.n38 = icmp eq i64 %15, %n.vec33
            br i1 %cmp.n38, label %L55, label %vec.epilog.scalar.ph

vec.epilog.scalar.ph:                             ; preds = %vec.epilog.middle.block, %vec.epilog.iter.check, %iter.check
            %bc.resume.val = phi i64 [ %ind.end, %vec.epilog.middle.block ], [ %ind.end34, %vec.epilog.iter.check ], [ 1, %iter.check ]
            %bc.resume.val37 = phi i64 [ %ind.end35, %vec.epilog.middle.block ], [ %ind.end36, %vec.epilog.iter.check ], [ 2, %iter.check ]
            %bc.merge.rdx46 = phi half [ %32, %vec.epilog.middle.block ], [ %26, %vec.epilog.iter.check ], [ %14, %iter.check ]
            br label %L42

L42:                                              ; preds = %L42, %vec.epilog.scalar.ph
            %33 = phi i64 [ %value_phi628, %L42 ], [ %bc.resume.val, %vec.epilog.scalar.ph ]
            %value_phi628 = phi i64 [ %36, %L42 ], [ %bc.resume.val37, %vec.epilog.scalar.ph ]
            %value_phi527 = phi half [ %37, %L42 ], [ %bc.merge.rdx46, %vec.epilog.scalar.ph ]
; ││││││││││┌ @ essentials.jl:13 within `getindex`
             %34 = getelementptr inbounds half, half* %11, i64 %33
             %35 = load half, half* %34, align 2
; ││││││││││└
; ││││││││││┌ @ int.jl:87 within `+`
             %36 = add nuw nsw i64 %value_phi628, 1
; │││││││││└└
; │││││││││ @ reduce.jl:62 within `_foldl_impl`
; │││││││││┌ @ reduce.jl:86 within `BottomRF`
; ││││││││││┌ @ fastmath.jl:251 within `max_fast`
; │││││││││││┌ @ essentials.jl:621 within `ifelse`
              %.inv20 = fcmp fast olt half %value_phi527, %35
              %37 = select fast i1 %.inv20, half %35, half %value_phi527
; │││││││││└└└
; │││││││││ @ reduce.jl:60 within `_foldl_impl`
; │││││││││┌ @ array.jl:938 within `iterate`
; ││││││││││┌ @ int.jl:520 within `<` @ int.jl:513
             %exitcond.not = icmp eq i64 %value_phi628, %8
; ││││││││││└
            br i1 %exitcond.not, label %L55, label %L42

L55:                                              ; preds = %L42, %vec.epilog.middle.block, %middle.block, %L20, %L19
            %value_phi4 = phi half [ %9, %L19 ], [ %14, %L20 ], [ %26, %middle.block ], [ %32, %vec.epilog.middle.block ], [ %37, %L42 ]
; └└└└└└└└└└
  ret half %value_phi4
}

So the problem is that those halfs are not demoted to floats?
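To make the IR above easier to follow, here is a scalar Julia model of the loop structure LLVM's vectorizer produced. There is a main loop that consumes 32 elements per trip in two 16-lane accumulators, and `middle.block` combines them with `llvm.maxnum.v16f16` and `llvm.vector.reduce.fmax.v16f16`; the v16f16 `X86ISD::FMAX` node in the error message is presumably lowered from these. After that come an 8-lane epilogue and a scalar remainder. This is a hypothetical illustration only: the function name is made up, it uses plain `max` (ignoring the fast-math flags and NaN semantics), and it is not code Julia runs.

```julia
# Hypothetical scalar model of the vectorized loop in the IR above.
# `init` plays the role of %0 and `x` the role of %1.
function model_vectorized_max(x::AbstractVector{T}, init::T) where {T<:AbstractFloat}
    isempty(x) && return init                  # block L19: return init
    acc = max(init, x[1])                      # block L20: %14
    n = length(x) - 1                          # %15: elements left after x[1]
    i = 0                                      # 0-based offset into x[2:end]
    if n >= 32                                 # vector.ph: 32 elements per trip
        a = fill(acc, 16)                      # %vec.phi   (<16 x half>)
        b = fill(acc, 16)                      # %vec.phi30 (<16 x half>)
        nvec = n & -32                         # %n.vec
        while i < nvec
            for l in 1:16
                a[l] = max(a[l], x[1 + i + l])        # %wide.load
                b[l] = max(b[l], x[1 + i + 16 + l])   # %wide.load31
            end
            i += 32
        end
        # middle.block: maxnum of the two accumulators, then a horizontal
        # reduce.fmax over 16 lanes (the v16f16 operation that fails to select)
        acc = maximum(max.(a, b))
    end
    if n - i >= 8                              # vec.epilog: 8 elements per trip
        c = fill(acc, 8)                       # %vec.phi40 (<8 x half>)
        nvec8 = n & -8                         # %n.vec33
        while i < nvec8
            for l in 1:8
                c[l] = max(c[l], x[1 + i + l]) # %wide.load44
            end
            i += 8
        end
        acc = maximum(c)                       # llvm.vector.reduce.fmax.v8f16
    end
    while i < n                                # scalar remainder, block L42
        acc = max(acc, x[2 + i])
        i += 1
    end
    return acc
end
```

For example, `model_vectorized_max(Float16[1, 2, 3], Float16(0))` takes only the scalar path, since fewer than 8 elements follow `x[1]`.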

gbaraldi (Member) commented May 21, 2023

I love that llvm creates an intrinsic that it doesn't know how to lower.
What confuses me is that putting the same unoptimized IR into opt-15 doesn't do this.
This is an LLVM 15 regression; @mcabbott's PR just exposed it.
@pchintalapudi
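The comparison with `opt-15` starts from unoptimized IR. One way to dump it is sketched below, using `code_llvm` from the InteractiveUtils standard library with its `optimize`, `raw` and `dump_module` keywords. The wrapper name `fast_max16` and the output file name are made up for illustration. Note that Julia runs its own pass pipeline, including Julia-specific passes, so a stock `opt` run on this file need not reproduce what the JIT does.

```julia
using InteractiveUtils  # provides code_llvm (loaded automatically in the REPL)

# Hypothetical wrapper so the keyword-argument call has a plain signature.
fast_max16(x) = @fastmath maximum(x; init = Float16(0))

# Unoptimized, unstripped IR for the whole module, captured as a string.
ir = sprint(io -> code_llvm(io, fast_max16, Tuple{Vector{Float16}};
                            optimize = false, raw = true, dump_module = true))

write("fast_max16_unopt.ll", ir)   # e.g. for feeding to an external `opt`
```

Generating IR this way does not run instruction selection, so it should not hit the crash even on an affected build.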

gbaraldi (Member) commented:

This is llvm/llvm-project#59258, which was fixed in https://reviews.llvm.org/D139078. I saw that @maleadt was adding some patches, so could we get this one in as well?

maleadt (Member) commented Jul 20, 2023

I just finished rebuilding all of LLVM 😅

gbaraldi (Member) commented:

I'm so sorry

giordano added the kind:upstream and compiler:llvm labels Jul 20, 2023
maleadt linked a pull request Jul 22, 2023 that will close this issue