
Add an LLVM pass to remove Julia-specific address space information. #35645

Merged
maleadt merged 3 commits into master on May 3, 2020

Conversation

@maleadt (Member) commented Apr 29, 2020

Fixes #22414:

julia> @code_llvm debuginfo=:none [1]
define nonnull %jl_value_t addrspace(10)* @julia_vect_135(i64) {
top:
  %1 = call %jl_value_t addrspace(10)* inttoptr (i64 140092205758784 to %jl_value_t addrspace(10)* (%jl_value_t addrspace(10)*, i64)*)(%jl_value_t addrspace(10)* addrspacecast (%jl_value_t* inttoptr (i64 140091990275760 to %jl_value_t*) to %jl_value_t addrspace(10)*), i64 1)
  %2 = addrspacecast %jl_value_t addrspace(10)* %1 to %jl_value_t addrspace(11)*
  %3 = bitcast %jl_value_t addrspace(11)* %2 to i64 addrspace(13)* addrspace(11)*
  %4 = load i64 addrspace(13)*, i64 addrspace(13)* addrspace(11)* %3, align 8
  store i64 %0, i64 addrspace(13)* %4, align 8
  ret %jl_value_t addrspace(10)* %1
}

becomes

define nonnull %jl_value_t* @julia_vect_353(i64) {
top:
  %1 = call %jl_value_t* inttoptr (i64 140357775323744 to %jl_value_t* (%jl_value_t*, i64)*)(%jl_value_t* inttoptr (i64 140357553911312 to %jl_value_t*), i64 1)
  %2 = bitcast %jl_value_t* %1 to i64**
  %3 = load i64*, i64** %2, align 8
  store i64 %0, i64* %3, align 8
  ret %jl_value_t* %1
}

Maintains target-specific address spaces:

julia> function foo(a)
           @inbounds a[1] = 1
           return
       end
foo (generic function with 1 method)

julia> CUDAnative.code_llvm(foo, Tuple{CuDeviceArray{Float32,2,AS.Global}}; debuginfo=:none)
define dso_local void @julia_foo_4061({ [2 x i64], i64 } addrspace(11)* nocapture nonnull readonly dereferenceable(24)) {
top:
  %1 = getelementptr inbounds { [2 x i64], i64 }, { [2 x i64], i64 } addrspace(11)* %0, i64 0, i32 1
  %2 = bitcast i64 addrspace(11)* %1 to float* addrspace(11)*
  %3 = load float*, float* addrspace(11)* %2, align 8
  %4 = addrspacecast float* %3 to float addrspace(1)*
  store float 1.000000e+00, float addrspace(1)* %4, align 4
  ret void
}

becomes:

define dso_local void @julia_foo_4055({ [2 x i64], i64 }* nocapture nonnull readonly dereferenceable(24)) {
top:
  %1 = getelementptr inbounds { [2 x i64], i64 }, { [2 x i64], i64 }* %0, i64 0, i32 1
  %2 = bitcast i64* %1 to float**
  %3 = load float*, float** %2, align 8
  %4 = addrspacecast float* %3 to float addrspace(1)*
  store float 1.000000e+00, float addrspace(1)* %4, align 4
  ret void
}
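
(The before/after pairs above amount to a remapping rule along these lines: Julia's internal address spaces collapse to the generic one, while target-defined address spaces, such as the addrspace(1) "global" space in the CUDAnative example, are preserved. The sketch below illustrates that rule and is not the PR's exact code; the numbers 10 through 13 are Julia's Tracked, Derived, CalleeRooted and Loaded address spaces as used by its codegen.)

// Collapse Julia's internal address spaces to 0 and keep everything else.
static unsigned remapAddrspace(unsigned AS) {
    const unsigned FirstJuliaAS = 10; // Tracked
    const unsigned LastJuliaAS = 13;  // Loaded
    if (AS >= FirstJuliaAS && AS <= LastJuliaAS)
        return 0; // generic address space
    return AS;    // target-specific address spaces are left alone
}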

@maleadt maleadt added the compiler:codegen Generation of LLVM IR and native code label Apr 29, 2020
@maleadt maleadt marked this pull request as draft April 29, 2020 17:12
@vchuravy (Sponsor Member) left a comment

I like how short this is! It would be good if we could run @code_llvm both with and without address spaces, so that the AS information is still easily accessible when one needs to debug it.

src/aotcompile.cpp (inline review comment; outdated, resolved)
@maleadt (Member, Author) commented Apr 30, 2020

> It would be good if we could run @code_llvm both with and without address spaces, so that the AS information is still easily accessible when one needs to debug it.

I'm not sure it's worth a dedicated option; we don't have one for e.g. not lowering GC intrinsics either (which are similarly internal to Julia's IR).

@maleadt maleadt marked this pull request as ready for review April 30, 2020 11:29
@maleadt (Member, Author) commented Apr 30, 2020

Should be good to go, so feel free to review. I'd like this to be part of 1.5, as it is necessary for the SPIR-V compiler (and I'd rather not patch yet another back-end). Related to that, this PR should make it possible to remove those address space-related LLVM patches we've been carrying.

@Keno (Member) commented Apr 30, 2020

This is probably fine, but I do have to say I'm not a fan of all the extra memory traffic and latency this introduces.

@vchuravy (Sponsor Member) commented

If we switch to GlobalISel we might have to do this (https://www.youtube.com/watch?v=Oj1BNoL1jpM&t=143s), or backends generally need to start documenting which address spaces are okay for frontends to use. We can turn this off by default, but on the other hand it makes the IR much nicer to read ;)

@maleadt (Member, Author) commented Apr 30, 2020

Yeah, the pass is not cheap: around 2% of total LLVM pass time:

===-------------------------------------------------------------------------===
                      ... Pass execution timing report ...
===-------------------------------------------------------------------------===
  Total Execution Time: 33.9226 seconds (33.9245 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
   7.0328 ( 22.7%)   0.6318 ( 21.1%)   7.6646 ( 22.6%)   7.6644 ( 22.6%)  X86 DAG->DAG Instruction Selection
   2.2434 (  7.3%)   0.5846 ( 19.5%)   2.8279 (  8.3%)   2.8283 (  8.3%)  X86 Assembly Printer
   2.3667 (  7.7%)   0.1848 (  6.2%)   2.5515 (  7.5%)   2.5501 (  7.5%)  Combine redundant instructions
   1.4999 (  4.8%)   0.1041 (  3.5%)   1.6040 (  4.7%)   1.6035 (  4.7%)  Greedy Register Allocator
   1.1356 (  3.7%)   0.0420 (  1.4%)   1.1776 (  3.5%)   1.1762 (  3.5%)  Global Value Numbering
   1.0332 (  3.3%)   0.0721 (  2.4%)   1.1052 (  3.3%)   1.1043 (  3.3%)  Induction Variable Simplification
   0.9135 (  3.0%)   0.0449 (  1.5%)   0.9584 (  2.8%)   0.9575 (  2.8%)  Loop Strength Reduction
   0.8980 (  2.9%)   0.0308 (  1.0%)   0.9288 (  2.7%)   0.9275 (  2.7%)  Combine redundant instructions #2
   0.7757 (  2.5%)   0.0312 (  1.0%)   0.8069 (  2.4%)   0.8058 (  2.4%)  Combine redundant instructions #3
-- 0.6786 (  2.2%)   0.0724 (  2.4%)   0.7510 (  2.2%)   0.7501 (  2.2%)  Remove IR address space information.
   0.6703 (  2.2%)   0.0741 (  2.5%)   0.7444 (  2.2%)   0.7429 (  2.2%)  Machine Instruction Scheduler
   0.5619 (  1.8%)   0.0247 (  0.8%)   0.5866 (  1.7%)   0.5858 (  1.7%)  Loop Invariant Code Motion
   0.3563 (  1.2%)   0.0301 (  1.0%)   0.3864 (  1.1%)   0.3852 (  1.1%)  CodeGen Prepare
   0.3377 (  1.1%)   0.0147 (  0.5%)   0.3525 (  1.0%)   0.3517 (  1.0%)  SLP Vectorizer
   0.3007 (  1.0%)   0.0365 (  1.2%)   0.3372 (  1.0%)   0.3362 (  1.0%)  Late Lower GCFrame Pass
...
  30.9312 (100.0%)   2.9914 (100.0%)  33.9226 (100.0%)  33.9245 (100.0%)  Total

We could whitelist it for targets that are known not to care about random address spaces (like X86). But I'd definitely enable it for reflection. I'll have a look, maybe I can optimize the implementation a little.
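
(For illustration, a minimal sketch of what such a whitelist could look like when assembling the pass pipeline. The pass-constructor name and the choice of architectures are assumptions for the example, not code from this PR.)

#include <llvm/ADT/Triple.h>
#include <llvm/IR/LegacyPassManager.h>
#include <llvm/Pass.h>
using namespace llvm;

// Assumed entry point of the new pass; declared here only for the sketch.
extern Pass *createRemoveJuliaAddrspacesPass();

// Only schedule the addrspace-removal pass for targets assumed not to care
// about nonstandard address spaces; x86 is used purely as an example.
static void maybeStripAddrspaces(legacy::PassManagerBase &PM, const Triple &TT) {
    if (TT.getArch() == Triple::x86 || TT.getArch() == Triple::x86_64)
        PM.add(createRemoveJuliaAddrspacesPass());
}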

@maleadt (Member, Author) commented May 1, 2020

I removed the pass from the default pipeline; it's now only called during reflection. We can always add it back when it's needed, or if specific targets require it. I also removed the NVPTX and WASM address space patches, since these aren't host targets and we can just run the address space removal pass from the respective compilers (cc @tshort, see JuliaGPU/GPUCompiler.jl#22).

@maleadt maleadt merged commit b8d9e78 into master May 3, 2020
@maleadt maleadt deleted the tb/strip_addrspace branch May 3, 2020 06:32
@Keno (Member) commented May 3, 2020

I do still think we should start a discussion with upstream about reserving some AS numbers for frontend usage and requiring the backends to ignore them. Any amount of overhead seems sad for something that could easily be fixed by an if/else in the backend. The reflection use case is legitimate and not performance-sensitive, of course.

Inline review comment on the declaration Type *remapType(Type *SrcTy):
Sponsor Member:
This algorithm here doesn't appear to have a guaranteed fixed point (it doesn't deal with the fact that PointerType->getElementType can form cycles in the type tree). We avoid emitting any code like that, so it's probably okay for us, but probably not for upstream.

Member Author:

How does LLVM typically deal with that? A Seen set?
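
(For reference, a common way to make this terminate is a memoization table keyed on the source type, in the spirit of the TypeMapTy helper that LLVM's IR linker uses: an identified struct gets an empty placeholder published in the map before its body is visited, so a pointer that cycles back to the struct finds the placeholder instead of recursing forever. The sketch below is illustrative only, not this PR's code, the names are made up, and it ignores arrays, vectors, and literal structs for brevity.)

#include <llvm/ADT/DenseMap.h>
#include <llvm/ADT/SmallVector.h>
#include <llvm/IR/DerivedTypes.h>
using namespace llvm;

struct AddrspaceTypeRemapper {
    // Maps already-visited source types to their address-space-free versions;
    // doubles as the "Seen" set that breaks cycles through named structs.
    DenseMap<Type *, Type *> MappedTypes;

    Type *remapType(Type *SrcTy) {
        auto It = MappedTypes.find(SrcTy);
        if (It != MappedTypes.end())
            return It->second; // already mapped (or currently being mapped)

        Type *DstTy = SrcTy;
        if (auto *Ptr = dyn_cast<PointerType>(SrcTy)) {
            DstTy = PointerType::get(remapType(Ptr->getElementType()),
                                     /*AddressSpace=*/0);
        } else if (auto *ST = dyn_cast<StructType>(SrcTy)) {
            if (ST->hasName()) {
                // Create the placeholder and publish it *before* visiting the
                // body, so self-referential element types terminate.
                StructType *NewST =
                    StructType::create(SrcTy->getContext(), ST->getName());
                MappedTypes[SrcTy] = NewST;
                SmallVector<Type *, 8> Elements;
                for (Type *Elem : ST->elements())
                    Elements.push_back(remapType(Elem));
                NewST->setBody(Elements, ST->isPacked());
                return NewST;
            }
        }
        MappedTypes[SrcTy] = DstTy;
        return DstTy;
    }
};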

Comment on lines +140 to +142:

    Type *SrcTy = remapType(
        cast<PointerType>(Src->getType()->getScalarType())
            ->getElementType());
Sponsor Member:

Suggested change:

    -    Type *SrcTy = remapType(
    -        cast<PointerType>(Src->getType()->getScalarType())
    -            ->getElementType());
    +    Type *SrcTy = remapType(cast<GetElementPtrConstantExpr>(CE)
    +                                ->getSourceElementType());

Member Author:

I don't think we can use GetElementPtrConstantExpr; it looks like a private class?
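
(If GetElementPtrConstantExpr is indeed private, one public alternative might be the GEPOperator wrapper from llvm/IR/Operator.h, which matches both GEP instructions and GEP constant expressions and exposes the source element type directly. This is an observation about the LLVM API, not necessarily what the PR ended up merging.)

#include <llvm/IR/Operator.h>
using namespace llvm;

// In the context of the suggestion above, with CE being the GEP ConstantExpr
// under rewrite and remapType the function discussed earlier:
Type *SrcTy = remapType(cast<GEPOperator>(CE)->getSourceElementType());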

Labels: compiler:codegen (Generation of LLVM IR and native code)
Successfully merging this pull request may close: Strip address space after gcframe allocation
4 participants