upgrade LLVM to 10 #35318

Merged
merged 1 commit into from
Sep 24, 2020

Conversation

vchuravy
Sponsor Member

@vchuravy vchuravy commented Mar 31, 2020

and so it goes, on and on and on. We might either do it now and iron out the kinks
during 1.5 or do it very early in 1.6.

@vchuravy vchuravy force-pushed the vc/upgrade_llvm_10 branch 2 times, most recently from 174ecd2 to 8ae92cc, March 31, 2020 01:17
@vchuravy
Sponsor Member Author

This might be the first upgrade where Windows passes on the first attempt...

@Keno can I bother you with updating the GCAnalyzer?

@IanButterworth
Sponsor Member

What’s the current thinking on when this might merge?

I'm asking because of LLVM support for new ARM CPUs (related to #36485), a few of which were added in LLVM 10.

I guess it would need performance benchmarking, given the feared 10% compile-time regression that hit Rust: https://nikic.github.io/2020/05/10/Make-LLVM-fast-again.html

@vchuravy
Sponsor Member Author

vchuravy commented Jul 1, 2020

My current thinking is to wait for 10.0.1 or 11, but yes, additional benchmarking of the compile time would be good.

@vchuravy
Sponsor Member Author

vchuravy commented Jul 1, 2020

Also #35460 needs to be fixed first. @ianshmean if you want to try your hand at writing an LLVM patch ;)

@IanButterworth
Sponsor Member

I'm happy to take a look in some of my fun time. Can't promise much though..

@vchuravy would you mind rebasing this?

@yuyichao
Contributor

yuyichao commented Jul 1, 2020

.... How many relocators does LLVM have = = .......

@vchuravy vchuravy added the `external dependencies` label (Involves LLVM, OpenBLAS, or other linked libraries) Jul 1, 2020
@vchuravy
Sponsor Member Author

vchuravy commented Jul 1, 2020

> .... How many relocators does LLVM have = = .......

https://xkcd.com/927/

@yuyichao
Contributor

yuyichao commented Jul 1, 2020

Well, I assume they are not "competing".....

It's just that I feel like I've fixed a more complex aarch64 relocation before, and it was definitely neither R_AARCH64_ABS32 nor R_AARCH64_ABS64.

Is there an LLVM IR that triggers this? That'll make testing significantly easier.

edit: ah, found the ppc one in #35460

@yuyichao
Contributor

yuyichao commented Jul 14, 2020

Is there a patch for this (#35460) already? I finally have a working aarch64 system (actually two) again, so if no one is working on it, I can.

@vchuravy
Sponsor Member Author

No patch that I am aware of.

@vchuravy
Sponsor Member Author

I did some timings on my laptop (so take it with a grain of salt)

TLDR

| Task | LLVM Version | Time in opt | Total time (m:ss) |
| --- | --- | --- | --- |
| .ji | 9.0.1 | 21.6s | 1:12 |
| .ji | 10.0.1 | 20.5s | 1:07 |
| .o | 9.0.1 | 67.6s | 1:23 |
| .o | 10.0.1 | 72.3s | 1:28 |
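
To put the table above in relative terms, here is a small sketch that computes the percentage change from 9.0.1 to 10.0.1 for each row. The numbers are copied from the table; the helper names (`to_seconds`, `percent_change`, `summarize`) are made up for illustration, and since these are single runs on a laptop the results are indicative only.

```python
# Figures copied from the TLDR table; total time is m:ss wall clock.
TIMINGS = {
    ".ji": {"9.0.1": (21.6, "1:12"), "10.0.1": (20.5, "1:07")},
    ".o": {"9.0.1": (67.6, "1:23"), "10.0.1": (72.3, "1:28")},
}


def to_seconds(mmss: str) -> int:
    """Convert an 'm:ss' string to whole seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def percent_change(old: float, new: float) -> float:
    """Relative change from old to new, in percent (negative = faster)."""
    return (new - old) / old * 100.0


def summarize(timings=TIMINGS):
    """Return (task, opt change %, total change %) for each task."""
    rows = []
    for task, by_version in timings.items():
        old_opt, old_total = by_version["9.0.1"]
        new_opt, new_total = by_version["10.0.1"]
        rows.append((
            task,
            percent_change(old_opt, new_opt),
            percent_change(to_seconds(old_total), to_seconds(new_total)),
        ))
    return rows


if __name__ == "__main__":
    for task, opt_delta, total_delta in summarize():
        print(f"{task:>4}: opt {opt_delta:+.1f}%, total {total_delta:+.1f}%")
```

On these numbers, .ji generation is roughly 5% faster in the passes and about 7% faster overall with LLVM 10.0.1, while .o generation is about 7% slower in the passes and about 6% slower overall.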

Timing compilation of base

LLVM 10

Make.user

JULIA_PRECOMPILE=0

.ji generation

[vchuravy@thor julia-llvm10]$ export JULIA_LLVM_ARGS="-time-passes"
[vchuravy@thor julia-llvm10]$ cd /home/vchuravy/src/julia/base && JULIA_BINDIR=/home/vchuravy/builds/julia-llvm10/usr/bin time /home/vchuravy/builds/julia-llvm10/usr/bin/julia -g1 -O0 -C "native" --output-ji /home/vchuravy/builds/julia-llvm10/usr/lib/julia/sys.ji.tmp  --startup-file=no --warn-overwrite=yes --sysimage /home/vchuravy/builds/julia-llvm10/usr/lib/julia/corecompiler.ji sysimg.jl ../../../builds/julia-llvm10/base/
Base  ─────────── 21.935258 seconds
Base64  ─────────  3.164876 seconds
CRC32c  ─────────  0.006769 seconds
SHA  ────────────  0.166173 seconds
FileWatching  ───  0.087520 seconds
Unicode  ────────  0.004910 seconds
Mmap  ───────────  0.070412 seconds
Serialization  ──  0.314565 seconds
Libdl  ──────────  0.001383 seconds
Printf  ─────────  0.233093 seconds
Markdown  ───────  1.026469 seconds
LibGit2  ────────  1.516319 seconds
Logging  ────────  0.027973 seconds
Sockets  ────────  0.319219 seconds
Profile  ────────  0.224095 seconds
Dates  ──────────  1.960759 seconds
DelimitedFiles  ─  0.092659 seconds
Random  ─────────  0.474244 seconds
UUIDs  ──────────  0.011979 seconds
Future  ─────────  0.003785 seconds
LinearAlgebra  ──  7.283317 seconds
SparseArrays  ───  3.237229 seconds
SuiteSparse  ────  0.708540 seconds
Distributed  ────  0.776062 seconds
SharedArrays  ───  0.136102 seconds
Pkg  ──────────── 10.328175 seconds
Test  ───────────  0.218332 seconds
REPL  ───────────  0.000173 seconds
Statistics  ─────  0.151643 seconds
Stdlibs total  ── 32.561612 seconds
Sysimage built. Summary:
Total ───────  54.499388 seconds 
Base: ───────  21.935258 seconds 40.2486%
Stdlibs: ────  32.561612 seconds 59.7467%
===-------------------------------------------------------------------------===
                      ... Pass execution timing report ...
===-------------------------------------------------------------------------===
  Total Execution Time: 20.5503 seconds (20.4719 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
   5.9188 ( 35.6%)   2.3682 ( 60.7%)   8.2870 ( 40.3%)   8.2570 ( 40.3%)  X86 Assembly Printer
   6.7844 ( 40.8%)   0.9535 ( 24.4%)   7.7379 ( 37.7%)   7.7094 ( 37.7%)  X86 DAG->DAG Instruction Selection
   0.8588 (  5.2%)   0.0918 (  2.4%)   0.9507 (  4.6%)   0.9483 (  4.6%)  Late Lower GCFrame Pass
   0.3884 (  2.3%)   0.0486 (  1.2%)   0.4370 (  2.1%)   0.4345 (  2.1%)  Fast Register Allocator
   0.3709 (  2.2%)   0.0348 (  0.9%)   0.4057 (  2.0%)   0.4039 (  2.0%)  Live DEBUG_VALUE analysis
   0.2675 (  1.6%)   0.0484 (  1.2%)   0.3158 (  1.5%)   0.3138 (  1.5%)  Simplify the CFG
   0.2065 (  1.2%)   0.0256 (  0.7%)   0.2321 (  1.1%)   0.2304 (  1.1%)  Prologue/Epilogue Insertion & Frame Finalization
   0.1161 (  0.7%)   0.0158 (  0.4%)   0.1318 (  0.6%)   0.1312 (  0.6%)  Two-Address instruction pass
   0.0908 (  0.5%)   0.0181 (  0.5%)   0.1089 (  0.5%)   0.1081 (  0.5%)  Inliner for always_inline functions
   0.0909 (  0.5%)   0.0164 (  0.4%)   0.1073 (  0.5%)   0.1066 (  0.5%)  Final GC intrinsic lowering pass
   0.0858 (  0.5%)   0.0146 (  0.4%)   0.1003 (  0.5%)   0.0994 (  0.5%)  MemCpy Optimization
   0.0634 (  0.4%)   0.0100 (  0.3%)   0.0733 (  0.4%)   0.0731 (  0.4%)  Dominator Tree Construction #2
   0.0615 (  0.4%)   0.0124 (  0.3%)   0.0739 (  0.4%)   0.0729 (  0.4%)  Insert stack protectors
   0.0577 (  0.3%)   0.0129 (  0.3%)   0.0706 (  0.3%)   0.0703 (  0.3%)  Dominator Tree Construction #4
   0.0569 (  0.3%)   0.0099 (  0.3%)   0.0668 (  0.3%)   0.0664 (  0.3%)  Dominator Tree Construction
   0.0456 (  0.3%)   0.0173 (  0.4%)   0.0629 (  0.3%)   0.0623 (  0.3%)  Free MachineFunction
   0.0539 (  0.3%)   0.0082 (  0.2%)   0.0621 (  0.3%)   0.0619 (  0.3%)  MachineDominator Tree Construction
   0.0474 (  0.3%)   0.0069 (  0.2%)   0.0543 (  0.3%)   0.0541 (  0.3%)  Dominator Tree Construction #3
   0.0454 (  0.3%)   0.0071 (  0.2%)   0.0525 (  0.3%)   0.0525 (  0.3%)  Dominator Tree Construction #5
   0.0433 (  0.3%)   0.0078 (  0.2%)   0.0510 (  0.2%)   0.0509 (  0.2%)  Natural Loop Information
   0.0446 (  0.3%)   0.0062 (  0.2%)   0.0508 (  0.2%)   0.0504 (  0.2%)  Eliminate PHI nodes for register allocation
   0.0420 (  0.3%)   0.0066 (  0.2%)   0.0486 (  0.2%)   0.0482 (  0.2%)  Expand Atomic instructions
   0.0406 (  0.2%)   0.0073 (  0.2%)   0.0479 (  0.2%)   0.0474 (  0.2%)  CallGraph Construction
   0.0420 (  0.3%)   0.0054 (  0.1%)   0.0474 (  0.2%)   0.0471 (  0.2%)  Post-RA pseudo instruction expansion pass
   0.0363 (  0.2%)   0.0057 (  0.1%)   0.0420 (  0.2%)   0.0418 (  0.2%)  LowerPTLS Pass
   0.0369 (  0.2%)   0.0044 (  0.1%)   0.0414 (  0.2%)   0.0412 (  0.2%)  X86 vzeroupper inserter
   0.0352 (  0.2%)   0.0052 (  0.1%)   0.0404 (  0.2%)   0.0401 (  0.2%)  X86 EFLAGS copy lowering
   0.0337 (  0.2%)   0.0049 (  0.1%)   0.0386 (  0.2%)   0.0384 (  0.2%)  Finalize ISel and expand pseudo-instructions
   0.0332 (  0.2%)   0.0051 (  0.1%)   0.0383 (  0.2%)   0.0381 (  0.2%)  Lower constant intrinsics
   0.0307 (  0.2%)   0.0038 (  0.1%)   0.0344 (  0.2%)   0.0343 (  0.2%)  X86 pseudo instruction expansion pass
   0.0274 (  0.2%)   0.0055 (  0.1%)   0.0328 (  0.2%)   0.0329 (  0.2%)  Merge Duplicate Global Constants
   0.0292 (  0.2%)   0.0036 (  0.1%)   0.0329 (  0.2%)   0.0328 (  0.2%)  Check CFA info and insert CFI instructions if needed
   0.0245 (  0.1%)   0.0044 (  0.1%)   0.0288 (  0.1%)   0.0285 (  0.1%)  Function Alias Analysis Results
   0.0240 (  0.1%)   0.0041 (  0.1%)   0.0280 (  0.1%)   0.0279 (  0.1%)  Scalarize Masked Memory Intrinsics
   0.0222 (  0.1%)   0.0037 (  0.1%)   0.0259 (  0.1%)   0.0258 (  0.1%)  Expand reduction intrinsics
   0.0196 (  0.1%)   0.0038 (  0.1%)   0.0234 (  0.1%)   0.0232 (  0.1%)  Remove non-integral address space.
   0.0193 (  0.1%)   0.0038 (  0.1%)   0.0231 (  0.1%)   0.0230 (  0.1%)  GC Invariant Verification Pass
   0.0187 (  0.1%)   0.0043 (  0.1%)   0.0230 (  0.1%)   0.0228 (  0.1%)  Exception handling preparation
   0.0176 (  0.1%)   0.0029 (  0.1%)   0.0205 (  0.1%)   0.0205 (  0.1%)  Remove unreachable blocks from the CFG
   0.0164 (  0.1%)   0.0029 (  0.1%)   0.0193 (  0.1%)   0.0193 (  0.1%)  Basic Alias Analysis (stateless AA impl)
   0.0157 (  0.1%)   0.0027 (  0.1%)   0.0183 (  0.1%)   0.0182 (  0.1%)  Bundle Machine CFG Edges
   0.0131 (  0.1%)   0.0044 (  0.1%)   0.0175 (  0.1%)   0.0174 (  0.1%)  Assumption Cache Tracker
   0.0142 (  0.1%)   0.0027 (  0.1%)   0.0168 (  0.1%)   0.0170 (  0.1%)  Basic Alias Analysis (stateless AA impl) #2
   0.0138 (  0.1%)   0.0029 (  0.1%)   0.0167 (  0.1%)   0.0165 (  0.1%)  Lazy Branch Probability Analysis
   0.0135 (  0.1%)   0.0024 (  0.1%)   0.0159 (  0.1%)   0.0159 (  0.1%)  Memory Dependence Analysis
   0.0134 (  0.1%)   0.0025 (  0.1%)   0.0159 (  0.1%)   0.0159 (  0.1%)  Expand indirectbr instructions
   0.0132 (  0.1%)   0.0025 (  0.1%)   0.0157 (  0.1%)   0.0157 (  0.1%)  X86 Indirect Branch Tracking
   0.0118 (  0.1%)   0.0022 (  0.1%)   0.0140 (  0.1%)   0.0139 (  0.1%)  X86 PIC Global Base Reg Initialization
   0.0112 (  0.1%)   0.0021 (  0.1%)   0.0134 (  0.1%)   0.0134 (  0.1%)  Machine Optimization Remark Emitter
   0.0112 (  0.1%)   0.0021 (  0.1%)   0.0133 (  0.1%)   0.0133 (  0.1%)  Insert fentry calls
   0.0113 (  0.1%)   0.0020 (  0.1%)   0.0132 (  0.1%)   0.0132 (  0.1%)  Phi Values Analysis
   0.0100 (  0.1%)   0.0021 (  0.1%)   0.0122 (  0.1%)   0.0122 (  0.1%)  Lower Julia Exception Handlers
   0.0100 (  0.1%)   0.0020 (  0.1%)   0.0120 (  0.1%)   0.0122 (  0.1%)  Lazy Block Frequency Analysis
   0.0100 (  0.1%)   0.0019 (  0.0%)   0.0120 (  0.1%)   0.0120 (  0.1%)  Contiguously Lay Out Funclets
   0.0097 (  0.1%)   0.0022 (  0.1%)   0.0119 (  0.1%)   0.0119 (  0.1%)  Machine Optimization Remark Emitter #2
   0.0097 (  0.1%)   0.0018 (  0.0%)   0.0116 (  0.1%)   0.0116 (  0.1%)  X86 speculative load hardening
   0.0098 (  0.1%)   0.0018 (  0.0%)   0.0116 (  0.1%)   0.0115 (  0.1%)  Insert XRay ops
   0.0098 (  0.1%)   0.0019 (  0.0%)   0.0117 (  0.1%)   0.0115 (  0.1%)  X86 Indirect Thunks
   0.0096 (  0.1%)   0.0018 (  0.0%)   0.0114 (  0.1%)   0.0114 (  0.1%)  X86 FP Stackifier
   0.0095 (  0.1%)   0.0018 (  0.0%)   0.0114 (  0.1%)   0.0113 (  0.1%)  Implement the 'patchable-function' attribute
   0.0094 (  0.1%)   0.0018 (  0.0%)   0.0112 (  0.1%)   0.0113 (  0.1%)  Local Stack Slot Allocation
   0.0095 (  0.1%)   0.0019 (  0.0%)   0.0113 (  0.1%)   0.0113 (  0.1%)  StackMap Liveness Analysis
   0.0094 (  0.1%)   0.0017 (  0.0%)   0.0111 (  0.1%)   0.0112 (  0.1%)  Instrument function entry/exit with calls to e.g. mcount() (post inlining)
   0.0094 (  0.1%)   0.0017 (  0.0%)   0.0111 (  0.1%)   0.0111 (  0.1%)  X86 Load Value Injection (LVI) Load Hardening (Unoptimized)
   0.0092 (  0.1%)   0.0018 (  0.0%)   0.0110 (  0.1%)   0.0111 (  0.1%)  X86 WinAlloca Expander
   0.0092 (  0.1%)   0.0018 (  0.0%)   0.0110 (  0.1%)   0.0110 (  0.1%)  Analyze Machine Code For Garbage Collection
   0.0092 (  0.1%)   0.0018 (  0.0%)   0.0110 (  0.1%)   0.0110 (  0.1%)  X86 Load Value Injection (LVI) Ret-Hardening
   0.0092 (  0.1%)   0.0018 (  0.0%)   0.0110 (  0.1%)   0.0109 (  0.1%)  Lazy Machine Block Frequency Analysis
   0.0088 (  0.1%)   0.0021 (  0.1%)   0.0109 (  0.1%)   0.0109 (  0.1%)  Lazy Machine Block Frequency Analysis #2
   0.0090 (  0.1%)   0.0017 (  0.0%)   0.0107 (  0.1%)   0.0108 (  0.1%)  X86 Discriminate Memory Operands
   0.0085 (  0.1%)   0.0019 (  0.0%)   0.0105 (  0.1%)   0.0106 (  0.1%)  Safe Stack instrumentation pass
   0.0088 (  0.1%)   0.0017 (  0.0%)   0.0105 (  0.1%)   0.0104 (  0.1%)  X86 Insert Cache Prefetches
   0.0087 (  0.1%)   0.0016 (  0.0%)   0.0103 (  0.1%)   0.0103 (  0.1%)  Shadow Stack GC Lowering
   0.0085 (  0.1%)   0.0015 (  0.0%)   0.0100 (  0.0%)   0.0101 (  0.0%)  Lower Garbage Collection Instructions
   0.0072 (  0.0%)   0.0013 (  0.0%)   0.0085 (  0.0%)   0.0085 (  0.0%)  Pre-ISel Intrinsic Lowering
   0.0058 (  0.0%)   0.0011 (  0.0%)   0.0069 (  0.0%)   0.0070 (  0.0%)  LowerSIMDLoop Pass
   0.0055 (  0.0%)   0.0010 (  0.0%)   0.0065 (  0.0%)   0.0065 (  0.0%)  Rewrite Symbols
   0.0051 (  0.0%)   0.0010 (  0.0%)   0.0061 (  0.0%)   0.0061 (  0.0%)  A No-Op Barrier Pass
   0.0048 (  0.0%)   0.0008 (  0.0%)   0.0057 (  0.0%)   0.0057 (  0.0%)  LowerSIMDLoop Pass #2
   0.0021 (  0.0%)   0.0006 (  0.0%)   0.0027 (  0.0%)   0.0028 (  0.0%)  Target Pass Configuration
   0.0020 (  0.0%)   0.0006 (  0.0%)   0.0027 (  0.0%)   0.0028 (  0.0%)  Create Garbage Collector Module Metadata
   0.0021 (  0.0%)   0.0006 (  0.0%)   0.0028 (  0.0%)   0.0027 (  0.0%)  Target Transform Information
   0.0020 (  0.0%)   0.0006 (  0.0%)   0.0026 (  0.0%)   0.0027 (  0.0%)  Machine Module Information
   0.0021 (  0.0%)   0.0006 (  0.0%)   0.0027 (  0.0%)   0.0027 (  0.0%)  Profile summary info
   0.0020 (  0.0%)   0.0006 (  0.0%)   0.0027 (  0.0%)   0.0027 (  0.0%)  Target Library Information
   0.0020 (  0.0%)   0.0006 (  0.0%)   0.0026 (  0.0%)   0.0026 (  0.0%)  Machine Branch Probability Analysis
   0.0001 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.0%)   0.0001 (  0.0%)  Dominator Tree Construction #6
   0.0001 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.0%)   0.0001 (  0.0%)  Natural Loop Information #2
  16.6486 (100.0%)   3.9016 (100.0%)  20.5503 (100.0%)  20.4719 (100.0%)  Total

62.14user 5.68system 1:07.76elapsed 100%CPU (0avgtext+0avgdata 808736maxresident)k
7096inputs+157960outputs (10major+290167minor)pagefaults 0swaps
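
The report above lists repeated instances of a pass on separate rows with a `#N` suffix (`Dominator Tree Construction`, `Dominator Tree Construction #2`, ...), so the per-row figures understate what a pass costs in aggregate. A rough sketch for grouping such a report by pass name follows. The regex is inferred from the row layout shown here, not from any documented LLVM format, and `parse_report` / `group_by_pass` are hypothetical helper names.

```python
import re
from collections import defaultdict

# One report row: four "seconds ( pct%)" columns followed by the pass name.
ROW = re.compile(
    r"^\s*(\d+\.\d+) \(\s*\d+\.\d+%\)"   # user time
    r"\s+(\d+\.\d+) \(\s*\d+\.\d+%\)"    # system time
    r"\s+(\d+\.\d+) \(\s*\d+\.\d+%\)"    # user+system
    r"\s+(\d+\.\d+) \(\s*\d+\.\d+%\)"    # wall time
    r"\s+(.+?)\s*$"                      # pass name
)
INSTANCE_SUFFIX = re.compile(r" #\d+$")


def parse_report(text):
    """Yield (pass_name, wall_seconds) for each row, skipping the Total row."""
    for line in text.splitlines():
        m = ROW.match(line)
        if m and m.group(5) != "Total":
            yield m.group(5), float(m.group(4))


def group_by_pass(text):
    """Sum wall time over all instances of a pass ('Foo', 'Foo #2', ...)."""
    totals = defaultdict(float)
    for name, wall in parse_report(text):
        totals[INSTANCE_SUFFIX.sub("", name)] += wall
    return dict(totals)


# A few rows copied from the .ji report above.
SAMPLE = """\
   5.9188 ( 35.6%)   2.3682 ( 60.7%)   8.2870 ( 40.3%)   8.2570 ( 40.3%)  X86 Assembly Printer
   0.0634 (  0.4%)   0.0100 (  0.3%)   0.0733 (  0.4%)   0.0731 (  0.4%)  Dominator Tree Construction #2
   0.0569 (  0.3%)   0.0099 (  0.3%)   0.0668 (  0.3%)   0.0664 (  0.3%)  Dominator Tree Construction
  16.6486 (100.0%)   3.9016 (100.0%)  20.5503 (100.0%)  20.4719 (100.0%)  Total
"""

if __name__ == "__main__":
    grouped = group_by_pass(SAMPLE)
    for name, wall in sorted(grouped.items(), key=lambda kv: -kv[1]):
        print(f"{wall:8.4f}s  {name}")
```

Fed the full .ji report, this would fold the six `Dominator Tree Construction` rows into a single entry, which makes it easier to compare pass costs between the LLVM 9 and LLVM 10 runs.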

.o generation

[vchuravy@thor julia-llvm10]$ cd /home/vchuravy/src/julia/base &&  JULIA_BINDIR=/home/vchuravy/builds/julia-llvm10/usr/bin time /home/vchuravy/builds/julia-llvm10/usr/bin/julia -O3 -C "native" --output-o /home/vchuravy/builds/julia-llvm10/usr/lib/julia/sys-o.a.tmp  --startup-file=no --warn-overwrite=yes --sysimage /home/vchuravy/builds/julia-llvm10/usr/lib/julia/sys.ji /home/vchuravy/src/julia/contrib/generate_precompile.jl 0
===-------------------------------------------------------------------------===
                      ... Pass execution timing report ...
===-------------------------------------------------------------------------===
  Total Execution Time: 72.3439 seconds (73.0409 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
  10.2414 ( 16.2%)   1.6668 ( 18.2%)  11.9082 ( 16.5%)  11.9465 ( 16.4%)  X86 DAG->DAG Instruction Selection #2
   5.4743 (  8.7%)   1.4756 ( 16.1%)   6.9499 (  9.6%)   6.9702 (  9.5%)  X86 Assembly Printer #2
   2.4807 (  3.9%)   1.0775 ( 11.8%)   3.5581 (  4.9%)   3.5641 (  4.9%)  X86 Assembly Printer
   3.1646 (  5.0%)   0.3436 (  3.8%)   3.5082 (  4.8%)   3.5199 (  4.8%)  X86 DAG->DAG Instruction Selection
   2.2626 (  3.6%)   0.1227 (  1.3%)   2.3853 (  3.3%)   2.5037 (  3.4%)  Combine redundant instructions
   2.3029 (  3.6%)   0.0866 (  0.9%)   2.3894 (  3.3%)   2.3932 (  3.3%)  Global Value Numbering #2
   2.2222 (  3.5%)   0.1242 (  1.4%)   2.3463 (  3.2%)   2.3662 (  3.2%)  Global Value Numbering
   2.2053 (  3.5%)   0.1548 (  1.7%)   2.3601 (  3.3%)   2.3656 (  3.2%)  Greedy Register Allocator
   1.9264 (  3.0%)   0.1341 (  1.5%)   2.0605 (  2.8%)   2.0727 (  2.8%)  Combine redundant instructions #4
   1.2113 (  1.9%)   0.1504 (  1.6%)   1.3616 (  1.9%)   1.3636 (  1.9%)  Machine Instruction Scheduler
   0.9254 (  1.5%)   0.0831 (  0.9%)   1.0085 (  1.4%)   1.0177 (  1.4%)  Combine redundant instructions #2
   0.9179 (  1.5%)   0.0486 (  0.5%)   0.9665 (  1.3%)   0.9680 (  1.3%)  Late Lower GCFrame Pass #2
   0.8471 (  1.3%)   0.0946 (  1.0%)   0.9416 (  1.3%)   0.9598 (  1.3%)  CodeGen Prepare
   0.8457 (  1.3%)   0.0665 (  0.7%)   0.9122 (  1.3%)   0.9133 (  1.3%)  ReachingDefAnalysis
   0.7692 (  1.2%)   0.0301 (  0.3%)   0.7994 (  1.1%)   0.8147 (  1.1%)  Loop Strength Reduction
   0.7413 (  1.2%)   0.0629 (  0.7%)   0.8042 (  1.1%)   0.8114 (  1.1%)  Combine redundant instructions #3
   0.6270 (  1.0%)   0.0768 (  0.8%)   0.7038 (  1.0%)   0.7053 (  1.0%)  Live Variable Analysis
   0.6341 (  1.0%)   0.0558 (  0.6%)   0.6898 (  1.0%)   0.6958 (  1.0%)  Induction Variable Simplification
   0.5560 (  0.9%)   0.1354 (  1.5%)   0.6914 (  1.0%)   0.6923 (  0.9%)  Module Verifier #2
   0.4538 (  0.7%)   0.0631 (  0.7%)   0.5169 (  0.7%)   0.5281 (  0.7%)  Module Verifier
   0.4630 (  0.7%)   0.0478 (  0.5%)   0.5108 (  0.7%)   0.5116 (  0.7%)  Live Interval Analysis
   0.4468 (  0.7%)   0.0413 (  0.5%)   0.4880 (  0.7%)   0.4892 (  0.7%)  Simple Register Coalescing
   0.4259 (  0.7%)   0.0430 (  0.5%)   0.4689 (  0.6%)   0.4754 (  0.7%)  SLP Vectorizer
   0.4148 (  0.7%)   0.0334 (  0.4%)   0.4483 (  0.6%)   0.4489 (  0.6%)  Late Lower GCFrame Pass
   0.3650 (  0.6%)   0.0522 (  0.6%)   0.4173 (  0.6%)   0.4209 (  0.6%)  Memory SSA
   0.3175 (  0.5%)   0.0430 (  0.5%)   0.3605 (  0.5%)   0.3610 (  0.5%)  Machine Common Subexpression Elimination
   0.2676 (  0.4%)   0.0465 (  0.5%)   0.3141 (  0.4%)   0.3347 (  0.5%)  Early CSE
   0.3041 (  0.5%)   0.0294 (  0.3%)   0.3335 (  0.5%)   0.3343 (  0.5%)  Live DEBUG_VALUE analysis #2
   0.2940 (  0.5%)   0.0335 (  0.4%)   0.3274 (  0.5%)   0.3282 (  0.4%)  X86 Byte/Word Instruction Fixup
   0.2810 (  0.4%)   0.0417 (  0.5%)   0.3227 (  0.4%)   0.3229 (  0.4%)  Prologue/Epilogue Insertion & Frame Finalization #2
   0.2728 (  0.4%)   0.0326 (  0.4%)   0.3054 (  0.4%)   0.3071 (  0.4%)  Machine Copy Propagation Pass
   0.2809 (  0.4%)   0.0235 (  0.3%)   0.3044 (  0.4%)   0.3063 (  0.4%)  Loop Invariant Code Motion
   0.2791 (  0.4%)   0.0175 (  0.2%)   0.2966 (  0.4%)   0.3022 (  0.4%)  Induction Variable Users
   0.2616 (  0.4%)   0.0342 (  0.4%)   0.2958 (  0.4%)   0.2981 (  0.4%)  Memory SSA #2
   0.2623 (  0.4%)   0.0204 (  0.2%)   0.2827 (  0.4%)   0.2946 (  0.4%)  Simplify the CFG #2
   0.2234 (  0.4%)   0.0301 (  0.3%)   0.2535 (  0.4%)   0.2714 (  0.4%)  Jump Threading
   0.2221 (  0.4%)   0.0268 (  0.3%)   0.2489 (  0.3%)   0.2496 (  0.3%)  Machine Copy Propagation Pass #2
   0.2245 (  0.4%)   0.0218 (  0.2%)   0.2463 (  0.3%)   0.2487 (  0.3%)  Dead Store Elimination
   0.2291 (  0.4%)   0.0175 (  0.2%)   0.2467 (  0.3%)   0.2485 (  0.3%)  Jump Threading #2
   0.2140 (  0.3%)   0.0152 (  0.2%)   0.2292 (  0.3%)   0.2424 (  0.3%)  SROA
   0.1868 (  0.3%)   0.0251 (  0.3%)   0.2119 (  0.3%)   0.2259 (  0.3%)  JuliaMultiVersioning Pass
   0.1818 (  0.3%)   0.0405 (  0.4%)   0.2223 (  0.3%)   0.2225 (  0.3%)  Branch Probability Analysis #3
   0.1919 (  0.3%)   0.0119 (  0.1%)   0.2038 (  0.3%)   0.2134 (  0.3%)  Simplify the CFG #3
   0.1895 (  0.3%)   0.0203 (  0.2%)   0.2099 (  0.3%)   0.2097 (  0.3%)  Machine code sinking
   0.1758 (  0.3%)   0.0272 (  0.3%)   0.2029 (  0.3%)   0.2033 (  0.3%)  Peephole Optimizations
   0.1871 (  0.3%)   0.0138 (  0.2%)   0.2009 (  0.3%)   0.2031 (  0.3%)  Loop Invariant Code Motion #2
   0.1792 (  0.3%)   0.0196 (  0.2%)   0.1988 (  0.3%)   0.1994 (  0.3%)  Control Flow Optimizer
   0.1761 (  0.3%)   0.0200 (  0.2%)   0.1961 (  0.3%)   0.1961 (  0.3%)  Branch Probability Basic Block Placement
   0.1803 (  0.3%)   0.0123 (  0.1%)   0.1926 (  0.3%)   0.1937 (  0.3%)  Simplify the CFG #6
   0.1584 (  0.3%)   0.0227 (  0.2%)   0.1811 (  0.3%)   0.1842 (  0.3%)  Branch Probability Analysis #2
   0.1638 (  0.3%)   0.0191 (  0.2%)   0.1829 (  0.3%)   0.1832 (  0.3%)  Virtual Register Rewriter
   0.1425 (  0.2%)   0.0381 (  0.4%)   0.1805 (  0.2%)   0.1808 (  0.2%)  Insert stack protectors #2
   0.1416 (  0.2%)   0.0252 (  0.3%)   0.1668 (  0.2%)   0.1777 (  0.2%)  Remove redundant instructions
   0.1480 (  0.2%)   0.0181 (  0.2%)   0.1661 (  0.2%)   0.1775 (  0.2%)  Recognize loop idioms
   0.1596 (  0.3%)   0.0176 (  0.2%)   0.1772 (  0.2%)   0.1774 (  0.2%)  Eliminate PHI nodes for register allocation #2
   0.1581 (  0.3%)   0.0161 (  0.2%)   0.1742 (  0.2%)   0.1767 (  0.2%)  Loop Vectorization
   0.1627 (  0.3%)   0.0121 (  0.1%)   0.1749 (  0.2%)   0.1750 (  0.2%)  Fast Register Allocator
   0.1623 (  0.3%)   0.0105 (  0.1%)   0.1728 (  0.2%)   0.1732 (  0.2%)  Sparse Conditional Constant Propagation #2
   0.1510 (  0.2%)   0.0214 (  0.2%)   0.1724 (  0.2%)   0.1725 (  0.2%)  Two-Address instruction pass #2
   0.1620 (  0.3%)   0.0100 (  0.1%)   0.1720 (  0.2%)   0.1723 (  0.2%)  Live DEBUG_VALUE analysis
   0.1474 (  0.2%)   0.0223 (  0.2%)   0.1697 (  0.2%)   0.1705 (  0.2%)  Remove redundant instructions #2
   0.1465 (  0.2%)   0.0192 (  0.2%)   0.1658 (  0.2%)   0.1669 (  0.2%)  Branch Probability Analysis
   0.1337 (  0.2%)   0.0114 (  0.1%)   0.1451 (  0.2%)   0.1519 (  0.2%)  Dead Code Elimination
   0.1300 (  0.2%)   0.0168 (  0.2%)   0.1468 (  0.2%)   0.1480 (  0.2%)  Aggressive Dead Code Elimination
   0.1319 (  0.2%)   0.0147 (  0.2%)   0.1466 (  0.2%)   0.1476 (  0.2%)  Simplify the CFG #4
   0.1160 (  0.2%)   0.0228 (  0.2%)   0.1388 (  0.2%)   0.1475 (  0.2%)  Reassociate expressions
   0.1152 (  0.2%)   0.0315 (  0.3%)   0.1467 (  0.2%)   0.1471 (  0.2%)  Dominator Tree Construction #24
   0.1265 (  0.2%)   0.0184 (  0.2%)   0.1448 (  0.2%)   0.1452 (  0.2%)  Live Range Shrink
   0.1214 (  0.2%)   0.0141 (  0.2%)   0.1355 (  0.2%)   0.1440 (  0.2%)  Rotate Loops
   0.1274 (  0.2%)   0.0154 (  0.2%)   0.1428 (  0.2%)   0.1429 (  0.2%)  X86 Execution Dependency Fix
   0.1238 (  0.2%)   0.0159 (  0.2%)   0.1397 (  0.2%)   0.1407 (  0.2%)  Sparse Conditional Constant Propagation
   0.1174 (  0.2%)   0.0229 (  0.3%)   0.1403 (  0.2%)   0.1406 (  0.2%)  Simplify the CFG
   0.1242 (  0.2%)   0.0131 (  0.1%)   0.1373 (  0.2%)   0.1383 (  0.2%)  Simplify the CFG #5
   0.1258 (  0.2%)   0.0094 (  0.1%)   0.1351 (  0.2%)   0.1357 (  0.2%)  MachineDominator Tree Construction #10
   0.1172 (  0.2%)   0.0142 (  0.2%)   0.1315 (  0.2%)   0.1328 (  0.2%)  Remove redundant instructions #3
   0.1140 (  0.2%)   0.0105 (  0.1%)   0.1245 (  0.2%)   0.1244 (  0.2%)  Final GC intrinsic lowering pass #2
   0.1113 (  0.2%)   0.0127 (  0.1%)   0.1240 (  0.2%)   0.1244 (  0.2%)  Early Machine Loop Invariant Code Motion
   0.0996 (  0.2%)   0.0165 (  0.2%)   0.1161 (  0.2%)   0.1193 (  0.2%)  Expand Atomic instructions #2
   0.1077 (  0.2%)   0.0112 (  0.1%)   0.1189 (  0.2%)   0.1192 (  0.2%)  Debug Variable Analysis
   0.1152 (  0.2%)   0.0000 (  0.0%)   0.1152 (  0.2%)   0.1156 (  0.2%)  LowerPTLS Pass #2
   0.0958 (  0.2%)   0.0160 (  0.2%)   0.1117 (  0.2%)   0.1124 (  0.2%)  Remove dead machine instructions
   0.0941 (  0.1%)   0.0138 (  0.2%)   0.1079 (  0.1%)   0.1084 (  0.1%)  Merge disjoint stack slots
   0.0929 (  0.1%)   0.0149 (  0.2%)   0.1078 (  0.1%)   0.1082 (  0.1%)  Machine InstCombiner
   0.0963 (  0.2%)   0.0082 (  0.1%)   0.1045 (  0.1%)   0.1055 (  0.1%)  Dominator Tree Construction #18
   0.0896 (  0.1%)   0.0142 (  0.2%)   0.1038 (  0.1%)   0.1045 (  0.1%)  MachinePostDominator Tree Construction
   0.0909 (  0.1%)   0.0085 (  0.1%)   0.0994 (  0.1%)   0.1043 (  0.1%)  Propagate (non-)rootedness information
   0.0895 (  0.1%)   0.0125 (  0.1%)   0.1020 (  0.1%)   0.1024 (  0.1%)  MachinePostDominator Tree Construction #2
   0.0883 (  0.1%)   0.0079 (  0.1%)   0.0962 (  0.1%)   0.1009 (  0.1%)  Dominator Tree Construction #7
   0.0797 (  0.1%)   0.0145 (  0.2%)   0.0941 (  0.1%)   0.0999 (  0.1%)  Dominator Tree Construction #11
   0.0726 (  0.1%)   0.0247 (  0.3%)   0.0973 (  0.1%)   0.0981 (  0.1%)  Free MachineFunction #2
   0.0827 (  0.1%)   0.0085 (  0.1%)   0.0913 (  0.1%)   0.0962 (  0.1%)  Inliner for always_inline functions #2
   0.0857 (  0.1%)   0.0093 (  0.1%)   0.0950 (  0.1%)   0.0960 (  0.1%)  Loop Load Elimination
   0.0811 (  0.1%)   0.0139 (  0.2%)   0.0951 (  0.1%)   0.0953 (  0.1%)  Machine Block Frequency Analysis
   0.0808 (  0.1%)   0.0116 (  0.1%)   0.0924 (  0.1%)   0.0941 (  0.1%)  Dominator Tree Construction #20
   0.0825 (  0.1%)   0.0104 (  0.1%)   0.0929 (  0.1%)   0.0931 (  0.1%)  MachineDominator Tree Construction #8
   0.0785 (  0.1%)   0.0123 (  0.1%)   0.0907 (  0.1%)   0.0924 (  0.1%)  Constant Hoisting
   0.0759 (  0.1%)   0.0155 (  0.2%)   0.0914 (  0.1%)   0.0918 (  0.1%)  Dominator Tree Construction #25
   0.0855 (  0.1%)   0.0057 (  0.1%)   0.0911 (  0.1%)   0.0911 (  0.1%)  Dominator Tree Construction #17
   0.0787 (  0.1%)   0.0119 (  0.1%)   0.0906 (  0.1%)   0.0909 (  0.1%)  MachineDominator Tree Construction #2
   0.0780 (  0.1%)   0.0076 (  0.1%)   0.0856 (  0.1%)   0.0901 (  0.1%)  Dominator Tree Construction #9
   0.0760 (  0.1%)   0.0123 (  0.1%)   0.0883 (  0.1%)   0.0893 (  0.1%)  Dominator Tree Construction #13
   0.0815 (  0.1%)   0.0072 (  0.1%)   0.0887 (  0.1%)   0.0890 (  0.1%)  Dominator Tree Construction #16
   0.0779 (  0.1%)   0.0106 (  0.1%)   0.0885 (  0.1%)   0.0890 (  0.1%)  Machine Block Frequency Analysis #2
   0.0828 (  0.1%)   0.0000 (  0.0%)   0.0828 (  0.1%)   0.0884 (  0.1%)  CallGraph Construction #3
   0.0755 (  0.1%)   0.0118 (  0.1%)   0.0873 (  0.1%)   0.0880 (  0.1%)  Remove dead machine instructions #2
   0.0808 (  0.1%)   0.0061 (  0.1%)   0.0868 (  0.1%)   0.0874 (  0.1%)  X86 vzeroupper inserter #2
   0.0786 (  0.1%)   0.0070 (  0.1%)   0.0856 (  0.1%)   0.0860 (  0.1%)  Prologue/Epilogue Insertion & Frame Finalization
   0.0738 (  0.1%)   0.0106 (  0.1%)   0.0844 (  0.1%)   0.0859 (  0.1%)  Dominator Tree Construction #21
   0.0742 (  0.1%)   0.0071 (  0.1%)   0.0813 (  0.1%)   0.0857 (  0.1%)  Dominator Tree Construction #8
   0.0673 (  0.1%)   0.0176 (  0.2%)   0.0849 (  0.1%)   0.0853 (  0.1%)  Function Alias Analysis Results #17
   0.0788 (  0.1%)   0.0005 (  0.0%)   0.0793 (  0.1%)   0.0843 (  0.1%)  CallGraph Construction #2
   0.0775 (  0.1%)   0.0062 (  0.1%)   0.0837 (  0.1%)   0.0841 (  0.1%)  Dominator Tree Construction #19
   0.0722 (  0.1%)   0.0107 (  0.1%)   0.0829 (  0.1%)   0.0835 (  0.1%)  Post-Dominator Tree Construction
   0.0732 (  0.1%)   0.0097 (  0.1%)   0.0828 (  0.1%)   0.0833 (  0.1%)  MachinePostDominator Tree Construction #3
   0.0688 (  0.1%)   0.0113 (  0.1%)   0.0801 (  0.1%)   0.0805 (  0.1%)  X86 LEA Optimize
   0.0694 (  0.1%)   0.0092 (  0.1%)   0.0786 (  0.1%)   0.0802 (  0.1%)  Dominator Tree Construction #22
   0.0700 (  0.1%)   0.0092 (  0.1%)   0.0792 (  0.1%)   0.0796 (  0.1%)  Machine Dominance Frontier Construction
   0.0688 (  0.1%)   0.0089 (  0.1%)   0.0777 (  0.1%)   0.0793 (  0.1%)  Dominator Tree Construction #23
   0.0690 (  0.1%)   0.0084 (  0.1%)   0.0774 (  0.1%)   0.0779 (  0.1%)  MachineDominator Tree Construction #7
   0.0698 (  0.1%)   0.0073 (  0.1%)   0.0771 (  0.1%)   0.0778 (  0.1%)  Loop-Closed SSA Form Pass #3
   0.0624 (  0.1%)   0.0147 (  0.2%)   0.0771 (  0.1%)   0.0776 (  0.1%)  Natural Loop Information #17
   0.0656 (  0.1%)   0.0104 (  0.1%)   0.0760 (  0.1%)   0.0769 (  0.1%)  MemCpy Optimization #2
   0.0681 (  0.1%)   0.0083 (  0.1%)   0.0765 (  0.1%)   0.0768 (  0.1%)  Slot index numbering #2
   0.0643 (  0.1%)   0.0107 (  0.1%)   0.0750 (  0.1%)   0.0764 (  0.1%)  Block Frequency Analysis
   0.0628 (  0.1%)   0.0096 (  0.1%)   0.0724 (  0.1%)   0.0761 (  0.1%)  Loop-Closed SSA Form Pass
   0.0645 (  0.1%)   0.0102 (  0.1%)   0.0747 (  0.1%)   0.0761 (  0.1%)  Scalar Evolution Analysis #8
   0.0662 (  0.1%)   0.0083 (  0.1%)   0.0745 (  0.1%)   0.0747 (  0.1%)  MachineDominator Tree Construction #9
   0.0646 (  0.1%)   0.0096 (  0.1%)   0.0742 (  0.1%)   0.0745 (  0.1%)  Machine Block Frequency Analysis #3
   0.0621 (  0.1%)   0.0107 (  0.1%)   0.0728 (  0.1%)   0.0744 (  0.1%)  Expand memcmp() to load/stores
   0.0638 (  0.1%)   0.0089 (  0.1%)   0.0728 (  0.1%)   0.0739 (  0.1%)  Natural Loop Information #13
   0.0643 (  0.1%)   0.0091 (  0.1%)   0.0734 (  0.1%)   0.0738 (  0.1%)  Dominator Tree Construction #14
   0.0626 (  0.1%)   0.0096 (  0.1%)   0.0722 (  0.1%)   0.0734 (  0.1%)  Block Frequency Analysis #2
   0.0612 (  0.1%)   0.0105 (  0.1%)   0.0717 (  0.1%)   0.0724 (  0.1%)  Natural Loop Information #8
   0.0621 (  0.1%)   0.0098 (  0.1%)   0.0718 (  0.1%)   0.0719 (  0.1%)  Slot index numbering
   0.0547 (  0.1%)   0.0160 (  0.2%)   0.0707 (  0.1%)   0.0707 (  0.1%)  Lower Julia Exception Handlers #2
   0.0638 (  0.1%)   0.0065 (  0.1%)   0.0703 (  0.1%)   0.0706 (  0.1%)  Stack Slot Coloring
   0.0641 (  0.1%)   0.0056 (  0.1%)   0.0697 (  0.1%)   0.0702 (  0.1%)  Unswitch loops
   0.0542 (  0.1%)   0.0108 (  0.1%)   0.0650 (  0.1%)   0.0693 (  0.1%)  SROA #2
   0.0593 (  0.1%)   0.0088 (  0.1%)   0.0682 (  0.1%)   0.0686 (  0.1%)  MachineDominator Tree Construction #3
   0.0595 (  0.1%)   0.0079 (  0.1%)   0.0674 (  0.1%)   0.0679 (  0.1%)  MachineDominator Tree Construction #6
   0.0559 (  0.1%)   0.0105 (  0.1%)   0.0664 (  0.1%)   0.0667 (  0.1%)  Remove unreachable machine basic blocks
   0.0602 (  0.1%)   0.0048 (  0.1%)   0.0650 (  0.1%)   0.0654 (  0.1%)  Loop-Closed SSA Form Pass #4
   0.0566 (  0.1%)   0.0074 (  0.1%)   0.0639 (  0.1%)   0.0644 (  0.1%)  Dominator Tree Construction #15
   0.0555 (  0.1%)   0.0080 (  0.1%)   0.0635 (  0.1%)   0.0636 (  0.1%)  Machine Block Frequency Analysis #4
   0.0552 (  0.1%)   0.0081 (  0.1%)   0.0633 (  0.1%)   0.0634 (  0.1%)  MachineDominator Tree Construction #4
   0.0551 (  0.1%)   0.0077 (  0.1%)   0.0629 (  0.1%)   0.0633 (  0.1%)  MachineDominator Tree Construction #5
   0.0530 (  0.1%)   0.0098 (  0.1%)   0.0628 (  0.1%)   0.0632 (  0.1%)  Scalar Evolution Analysis #3
   0.0570 (  0.1%)   0.0054 (  0.1%)   0.0624 (  0.1%)   0.0624 (  0.1%)  Natural Loop Information #12
   0.0526 (  0.1%)   0.0070 (  0.1%)   0.0596 (  0.1%)   0.0608 (  0.1%)  Natural Loop Information #14
   0.0459 (  0.1%)   0.0115 (  0.1%)   0.0574 (  0.1%)   0.0604 (  0.1%)  Scalar Evolution Analysis
   0.0511 (  0.1%)   0.0089 (  0.1%)   0.0600 (  0.1%)   0.0603 (  0.1%)  Machine Natural Loop Construction
   0.0469 (  0.1%)   0.0123 (  0.1%)   0.0592 (  0.1%)   0.0597 (  0.1%)  Basic Alias Analysis (stateless AA impl) #16
   0.0551 (  0.1%)   0.0036 (  0.0%)   0.0587 (  0.1%)   0.0592 (  0.1%)  Unroll loops
   0.0492 (  0.1%)   0.0080 (  0.1%)   0.0572 (  0.1%)   0.0585 (  0.1%)  Canonicalize natural loops #6
   0.0479 (  0.1%)   0.0052 (  0.1%)   0.0531 (  0.1%)   0.0562 (  0.1%)  Natural Loop Information #3
   0.0453 (  0.1%)   0.0092 (  0.1%)   0.0546 (  0.1%)   0.0558 (  0.1%)  Function Alias Analysis Results #16
   0.0483 (  0.1%)   0.0071 (  0.1%)   0.0554 (  0.1%)   0.0556 (  0.1%)  Machine Natural Loop Construction #4
   0.0467 (  0.1%)   0.0076 (  0.1%)   0.0543 (  0.1%)   0.0549 (  0.1%)  PostRA Machine Sink
   0.0430 (  0.1%)   0.0081 (  0.1%)   0.0511 (  0.1%)   0.0544 (  0.1%)  Dominator Tree Construction #10
   0.0462 (  0.1%)   0.0076 (  0.1%)   0.0538 (  0.1%)   0.0539 (  0.1%)  BreakFalseDeps
   0.0473 (  0.1%)   0.0065 (  0.1%)   0.0538 (  0.1%)   0.0537 (  0.1%)  Function Alias Analysis Results #14
   0.0437 (  0.1%)   0.0076 (  0.1%)   0.0513 (  0.1%)   0.0525 (  0.1%)  Lower constant intrinsics #2
   0.0456 (  0.1%)   0.0067 (  0.1%)   0.0522 (  0.1%)   0.0524 (  0.1%)  Machine Natural Loop Construction #3
   0.0445 (  0.1%)   0.0077 (  0.1%)   0.0521 (  0.1%)   0.0524 (  0.1%)  X86 cmov Conversion
   0.0477 (  0.1%)   0.0046 (  0.1%)   0.0523 (  0.1%)   0.0524 (  0.1%)  Natural Loop Information #11
   0.0433 (  0.1%)   0.0081 (  0.1%)   0.0514 (  0.1%)   0.0517 (  0.1%)  Early Tail Duplication
   0.0443 (  0.1%)   0.0069 (  0.1%)   0.0512 (  0.1%)   0.0515 (  0.1%)  Post-RA pseudo instruction expansion pass #2
   0.0445 (  0.1%)   0.0059 (  0.1%)   0.0504 (  0.1%)   0.0509 (  0.1%)  Machine Loop Invariant Code Motion
   0.0447 (  0.1%)   0.0057 (  0.1%)   0.0503 (  0.1%)   0.0507 (  0.1%)  Function Alias Analysis Results #15
   0.0394 (  0.1%)   0.0107 (  0.1%)   0.0501 (  0.1%)   0.0507 (  0.1%)  Exception handling preparation #2
   0.0462 (  0.1%)   0.0041 (  0.0%)   0.0503 (  0.1%)   0.0503 (  0.1%)  Two-Address instruction pass
   0.0439 (  0.1%)   0.0058 (  0.1%)   0.0497 (  0.1%)   0.0500 (  0.1%)  Machine Natural Loop Construction #5
   0.0426 (  0.1%)   0.0067 (  0.1%)   0.0493 (  0.1%)   0.0498 (  0.1%)  Natural Loop Information #9
   0.0424 (  0.1%)   0.0063 (  0.1%)   0.0488 (  0.1%)   0.0496 (  0.1%)  Natural Loop Information #16
   0.0386 (  0.1%)   0.0080 (  0.1%)   0.0466 (  0.1%)   0.0492 (  0.1%)  Natural Loop Information #5
   0.0398 (  0.1%)   0.0090 (  0.1%)   0.0488 (  0.1%)   0.0491 (  0.1%)  Function Alias Analysis Results #8
   0.0400 (  0.1%)   0.0089 (  0.1%)   0.0488 (  0.1%)   0.0489 (  0.1%)  Scalar Evolution Analysis #5
   0.0418 (  0.1%)   0.0067 (  0.1%)   0.0485 (  0.1%)   0.0488 (  0.1%)  Shrink Wrapping analysis
   0.0380 (  0.1%)   0.0077 (  0.1%)   0.0457 (  0.1%)   0.0485 (  0.1%)  Natural Loop Information #6
   0.0400 (  0.1%)   0.0085 (  0.1%)   0.0485 (  0.1%)   0.0484 (  0.1%)  Scalar Evolution Analysis #7
   0.0394 (  0.1%)   0.0088 (  0.1%)   0.0483 (  0.1%)   0.0481 (  0.1%)  Scalar Evolution Analysis #4
   0.0410 (  0.1%)   0.0068 (  0.1%)   0.0477 (  0.1%)   0.0480 (  0.1%)  Canonicalize natural loops #4
   0.0397 (  0.1%)   0.0072 (  0.1%)   0.0469 (  0.1%)   0.0480 (  0.1%)  Partially inline calls to library functions
   0.0409 (  0.1%)   0.0066 (  0.1%)   0.0476 (  0.1%)   0.0479 (  0.1%)  Natural Loop Information #10
   0.0387 (  0.1%)   0.0089 (  0.1%)   0.0476 (  0.1%)   0.0478 (  0.1%)  Function Alias Analysis Results #6
   0.0365 (  0.1%)   0.0084 (  0.1%)   0.0449 (  0.1%)   0.0476 (  0.1%)  Canonicalize natural loops
   0.0404 (  0.1%)   0.0060 (  0.1%)   0.0464 (  0.1%)   0.0475 (  0.1%)  Natural Loop Information #15
   0.0423 (  0.1%)   0.0043 (  0.0%)   0.0467 (  0.1%)   0.0472 (  0.1%)  Dead Code Elimination #2
   0.0379 (  0.1%)   0.0088 (  0.1%)   0.0467 (  0.1%)   0.0471 (  0.1%)  Function Alias Analysis Results #7
   0.0376 (  0.1%)   0.0087 (  0.1%)   0.0463 (  0.1%)   0.0466 (  0.1%)  Function Alias Analysis Results #12
   0.0382 (  0.1%)   0.0059 (  0.1%)   0.0442 (  0.1%)   0.0462 (  0.1%)  Function Alias Analysis Results #2
   0.0394 (  0.1%)   0.0064 (  0.1%)   0.0458 (  0.1%)   0.0460 (  0.1%)  Machine Natural Loop Construction #2
   0.0386 (  0.1%)   0.0068 (  0.1%)   0.0454 (  0.1%)   0.0458 (  0.1%)  X86 EFLAGS copy lowering #2
   0.0383 (  0.1%)   0.0071 (  0.1%)   0.0454 (  0.1%)   0.0457 (  0.1%)  SROA #3
   0.0335 (  0.1%)   0.0094 (  0.1%)   0.0429 (  0.1%)   0.0457 (  0.1%)  Function Alias Analysis Results #3
   0.0333 (  0.1%)   0.0094 (  0.1%)   0.0427 (  0.1%)   0.0456 (  0.1%)  Function Alias Analysis Results #4
   0.0366 (  0.1%)   0.0084 (  0.1%)   0.0450 (  0.1%)   0.0454 (  0.1%)  Function Alias Analysis Results #10
   0.0370 (  0.1%)   0.0072 (  0.1%)   0.0443 (  0.1%)   0.0452 (  0.1%)  Merge contiguous icmps into a memcmp
   0.0360 (  0.1%)   0.0085 (  0.1%)   0.0445 (  0.1%)   0.0449 (  0.1%)  Function Alias Analysis Results #5
   0.0360 (  0.1%)   0.0082 (  0.1%)   0.0442 (  0.1%)   0.0445 (  0.1%)  Function Alias Analysis Results #9
   0.0371 (  0.1%)   0.0064 (  0.1%)   0.0435 (  0.1%)   0.0439 (  0.1%)  X86 Optimize Call Frame
   0.0356 (  0.1%)   0.0083 (  0.1%)   0.0439 (  0.1%)   0.0439 (  0.1%)  Scalar Evolution Analysis #2
   0.0353 (  0.1%)   0.0083 (  0.1%)   0.0436 (  0.1%)   0.0439 (  0.1%)  Function Alias Analysis Results #13
   0.0355 (  0.1%)   0.0081 (  0.1%)   0.0437 (  0.1%)   0.0439 (  0.1%)  Function Alias Analysis Results #11
   0.0375 (  0.1%)   0.0057 (  0.1%)   0.0432 (  0.1%)   0.0437 (  0.1%)  X86 LEA Fixup
   0.0367 (  0.1%)   0.0061 (  0.1%)   0.0428 (  0.1%)   0.0432 (  0.1%)  X86 Fixup SetCC
   0.0369 (  0.1%)   0.0060 (  0.1%)   0.0429 (  0.1%)   0.0431 (  0.1%)  Check CFA info and insert CFI instructions if needed #2
   0.0347 (  0.1%)   0.0076 (  0.1%)   0.0423 (  0.1%)   0.0424 (  0.1%)  Scalar Evolution Analysis #6
   0.0361 (  0.1%)   0.0058 (  0.1%)   0.0419 (  0.1%)   0.0423 (  0.1%)  X86 pseudo instruction expansion pass #2
   0.0343 (  0.1%)   0.0067 (  0.1%)   0.0411 (  0.1%)   0.0417 (  0.1%)  Live Register Matrix
   0.0343 (  0.1%)   0.0058 (  0.1%)   0.0401 (  0.1%)   0.0404 (  0.1%)  Finalize ISel and expand pseudo-instructions #2
   0.0331 (  0.1%)   0.0060 (  0.1%)   0.0391 (  0.1%)   0.0400 (  0.1%)  Interleaved Access Pass
   0.0321 (  0.1%)   0.0064 (  0.1%)   0.0386 (  0.1%)   0.0395 (  0.1%)  Basic Alias Analysis (stateless AA impl) #14
   0.0351 (  0.1%)   0.0039 (  0.0%)   0.0390 (  0.1%)   0.0391 (  0.1%)  Hoist/decompose integer division and remainder
   0.0323 (  0.1%)   0.0058 (  0.1%)   0.0381 (  0.1%)   0.0386 (  0.1%)  Tail Duplication
   0.0320 (  0.1%)   0.0061 (  0.1%)   0.0381 (  0.1%)   0.0386 (  0.1%)  Canonicalize natural loops #5
   0.0314 (  0.0%)   0.0064 (  0.1%)   0.0379 (  0.1%)   0.0380 (  0.1%)  Canonicalize natural loops #2
   0.0318 (  0.1%)   0.0052 (  0.1%)   0.0371 (  0.1%)   0.0371 (  0.1%)  Final GC intrinsic lowering pass
   0.0329 (  0.1%)   0.0040 (  0.0%)   0.0369 (  0.1%)   0.0370 (  0.1%)  Basic Alias Analysis (stateless AA impl) #13
   0.0305 (  0.0%)   0.0063 (  0.1%)   0.0368 (  0.1%)   0.0370 (  0.1%)  Canonicalize natural loops #3
   0.0303 (  0.0%)   0.0047 (  0.1%)   0.0351 (  0.0%)   0.0368 (  0.1%)  Basic Alias Analysis (stateless AA impl) #3
   0.0329 (  0.1%)   0.0037 (  0.0%)   0.0366 (  0.1%)   0.0367 (  0.1%)  Dominator Tree Construction #2
   0.0289 (  0.0%)   0.0055 (  0.1%)   0.0343 (  0.0%)   0.0344 (  0.0%)  MemCpy Optimization
   0.0272 (  0.0%)   0.0052 (  0.1%)   0.0324 (  0.0%)   0.0343 (  0.0%)  Natural Loop Information #4
   0.0278 (  0.0%)   0.0059 (  0.1%)   0.0337 (  0.0%)   0.0341 (  0.0%)  Machine Trace Metrics
   0.0278 (  0.0%)   0.0057 (  0.1%)   0.0335 (  0.0%)   0.0337 (  0.0%)  Inliner for always_inline functions
   0.0273 (  0.0%)   0.0053 (  0.1%)   0.0325 (  0.0%)   0.0333 (  0.0%)  Scalarize Masked Memory Intrinsics #2
   0.0276 (  0.0%)   0.0051 (  0.1%)   0.0327 (  0.0%)   0.0331 (  0.0%)  Spill Code Placement Analysis
   0.0270 (  0.0%)   0.0053 (  0.1%)   0.0322 (  0.0%)   0.0331 (  0.0%)  Expand reduction intrinsics #2
   0.0263 (  0.0%)   0.0054 (  0.1%)   0.0317 (  0.0%)   0.0323 (  0.0%)  Basic Alias Analysis (stateless AA impl) #16
   0.0266 (  0.0%)   0.0054 (  0.1%)   0.0320 (  0.0%)   0.0322 (  0.0%)  Post RA top-down list latency scheduler
   0.0257 (  0.0%)   0.0060 (  0.1%)   0.0317 (  0.0%)   0.0321 (  0.0%)  Basic Alias Analysis (stateless AA impl) #7
   0.0252 (  0.0%)   0.0061 (  0.1%)   0.0313 (  0.0%)   0.0317 (  0.0%)  Dominator Tree Construction #4
   0.0258 (  0.0%)   0.0040 (  0.0%)   0.0297 (  0.0%)   0.0315 (  0.0%)  Basic Alias Analysis (stateless AA impl) #4
   0.0246 (  0.0%)   0.0055 (  0.1%)   0.0301 (  0.0%)   0.0305 (  0.0%)  Basic Alias Analysis (stateless AA impl) #9
   0.0254 (  0.0%)   0.0047 (  0.1%)   0.0301 (  0.0%)   0.0303 (  0.0%)  Loop-Closed SSA Form Pass #2
   0.0257 (  0.0%)   0.0030 (  0.0%)   0.0287 (  0.0%)   0.0303 (  0.0%)  Promote heap allocation to stack
   0.0280 (  0.0%)   0.0000 (  0.0%)   0.0280 (  0.0%)   0.0298 (  0.0%)  Merge Duplicate Global Constants #2
   0.0218 (  0.0%)   0.0061 (  0.1%)   0.0279 (  0.0%)   0.0297 (  0.0%)  Basic Alias Analysis (stateless AA impl) #5
   0.0247 (  0.0%)   0.0045 (  0.0%)   0.0292 (  0.0%)   0.0296 (  0.0%)  Promote heap allocation to stack #3
   0.0241 (  0.0%)   0.0045 (  0.0%)   0.0285 (  0.0%)   0.0292 (  0.0%)  Remove unreachable blocks from the CFG #2
   0.0193 (  0.0%)   0.0084 (  0.1%)   0.0277 (  0.0%)   0.0277 (  0.0%)  Free MachineFunction
   0.0231 (  0.0%)   0.0044 (  0.0%)   0.0275 (  0.0%)   0.0276 (  0.0%)  Dominator Tree Construction
   0.0225 (  0.0%)   0.0041 (  0.0%)   0.0266 (  0.0%)   0.0269 (  0.0%)  Bundle Machine CFG Edges #2
   0.0248 (  0.0%)   0.0019 (  0.0%)   0.0267 (  0.0%)   0.0268 (  0.0%)  MachineDominator Tree Construction
   0.0235 (  0.0%)   0.0028 (  0.0%)   0.0263 (  0.0%)   0.0264 (  0.0%)  Memory Dependence Analysis #5
   0.0215 (  0.0%)   0.0045 (  0.0%)   0.0260 (  0.0%)   0.0263 (  0.0%)  Memory Dependence Analysis #2
   0.0218 (  0.0%)   0.0040 (  0.0%)   0.0258 (  0.0%)   0.0261 (  0.0%)  Process Implicit Definitions
   0.0213 (  0.0%)   0.0043 (  0.0%)   0.0256 (  0.0%)   0.0259 (  0.0%)  Optimize machine instruction PHIs
   0.0205 (  0.0%)   0.0045 (  0.0%)   0.0250 (  0.0%)   0.0257 (  0.0%)  Expand indirectbr instructions #2
   0.0212 (  0.0%)   0.0039 (  0.0%)   0.0250 (  0.0%)   0.0253 (  0.0%)  Bundle Machine CFG Edges #3
   0.0190 (  0.0%)   0.0046 (  0.0%)   0.0236 (  0.0%)   0.0251 (  0.0%)  Promote heap allocation to stack #2
   0.0207 (  0.0%)   0.0040 (  0.0%)   0.0247 (  0.0%)   0.0248 (  0.0%)  Live Stack Slot Analysis
   0.0181 (  0.0%)   0.0051 (  0.1%)   0.0231 (  0.0%)   0.0246 (  0.0%)  Basic Alias Analysis (stateless AA impl) #6
   0.0217 (  0.0%)   0.0030 (  0.0%)   0.0247 (  0.0%)   0.0245 (  0.0%)  Basic Alias Analysis (stateless AA impl) #12
   0.0200 (  0.0%)   0.0043 (  0.0%)   0.0243 (  0.0%)   0.0243 (  0.0%)  Insert stack protectors
   0.0193 (  0.0%)   0.0045 (  0.0%)   0.0238 (  0.0%)   0.0243 (  0.0%)  Basic Alias Analysis (stateless AA impl) #11
   0.0194 (  0.0%)   0.0044 (  0.0%)   0.0238 (  0.0%)   0.0241 (  0.0%)  Basic Alias Analysis (stateless AA impl) #10
   0.0212 (  0.0%)   0.0027 (  0.0%)   0.0239 (  0.0%)   0.0240 (  0.0%)  Dominator Tree Construction #5
   0.0196 (  0.0%)   0.0040 (  0.0%)   0.0236 (  0.0%)   0.0239 (  0.0%)  Insert fentry calls #2
   0.0189 (  0.0%)   0.0046 (  0.1%)   0.0236 (  0.0%)   0.0237 (  0.0%)  Lazy Branch Probability Analysis #13
   0.0194 (  0.0%)   0.0041 (  0.0%)   0.0236 (  0.0%)   0.0237 (  0.0%)  Lazy Value Information Analysis #2
   0.0198 (  0.0%)   0.0037 (  0.0%)   0.0235 (  0.0%)   0.0237 (  0.0%)  Promote heap allocation to stack #4
   0.0207 (  0.0%)   0.0026 (  0.0%)   0.0233 (  0.0%)   0.0233 (  0.0%)  Dominator Tree Construction #3
   0.0174 (  0.0%)   0.0049 (  0.1%)   0.0223 (  0.0%)   0.0223 (  0.0%)  GC Invariant Verification Pass #2
   0.0179 (  0.0%)   0.0037 (  0.0%)   0.0216 (  0.0%)   0.0218 (  0.0%)  Virtual Register Map
   0.0175 (  0.0%)   0.0040 (  0.0%)   0.0214 (  0.0%)   0.0217 (  0.0%)  Basic Alias Analysis (stateless AA impl) #8
   0.0193 (  0.0%)   0.0021 (  0.0%)   0.0214 (  0.0%)   0.0217 (  0.0%)  Combine mul and add to muladd
   0.0175 (  0.0%)   0.0038 (  0.0%)   0.0213 (  0.0%)   0.0216 (  0.0%)  Local Dynamic TLS Access Clean-up
   0.0156 (  0.0%)   0.0042 (  0.0%)   0.0198 (  0.0%)   0.0211 (  0.0%)  Lazy Value Information Analysis
   0.0170 (  0.0%)   0.0035 (  0.0%)   0.0205 (  0.0%)   0.0210 (  0.0%)  X86 Indirect Branch Tracking #2
   0.0153 (  0.0%)   0.0042 (  0.0%)   0.0195 (  0.0%)   0.0209 (  0.0%)  Lazy Branch Probability Analysis #3
   0.0161 (  0.0%)   0.0035 (  0.0%)   0.0196 (  0.0%)   0.0208 (  0.0%)  Lazy Branch Probability Analysis #12
   0.0165 (  0.0%)   0.0037 (  0.0%)   0.0201 (  0.0%)   0.0207 (  0.0%)  Early If-Conversion
   0.0166 (  0.0%)   0.0034 (  0.0%)   0.0201 (  0.0%)   0.0203 (  0.0%)  Machine Optimization Remark Emitter #4
   0.0162 (  0.0%)   0.0034 (  0.0%)   0.0196 (  0.0%)   0.0201 (  0.0%)  X86 Indirect Thunks #2
   0.0160 (  0.0%)   0.0034 (  0.0%)   0.0194 (  0.0%)   0.0200 (  0.0%)  Insert XRay ops #2
   0.0160 (  0.0%)   0.0033 (  0.0%)   0.0194 (  0.0%)   0.0199 (  0.0%)  Machine Optimization Remark Emitter #3
   0.0163 (  0.0%)   0.0035 (  0.0%)   0.0198 (  0.0%)   0.0199 (  0.0%)  Loop Access Analysis #2
   0.0161 (  0.0%)   0.0036 (  0.0%)   0.0197 (  0.0%)   0.0199 (  0.0%)  X86 speculative load hardening #2
   0.0161 (  0.0%)   0.0034 (  0.0%)   0.0195 (  0.0%)   0.0199 (  0.0%)  Implement the 'patchable-function' attribute #2
   0.0160 (  0.0%)   0.0036 (  0.0%)   0.0196 (  0.0%)   0.0198 (  0.0%)  Loop Access Analysis
   0.0157 (  0.0%)   0.0035 (  0.0%)   0.0192 (  0.0%)   0.0198 (  0.0%)  X86 Domain Reassignment Pass
   0.0157 (  0.0%)   0.0035 (  0.0%)   0.0192 (  0.0%)   0.0198 (  0.0%)  X86 PIC Global Base Reg Initialization #2
   0.0170 (  0.0%)   0.0023 (  0.0%)   0.0193 (  0.0%)   0.0196 (  0.0%)  Phi Values Analysis #5
   0.0158 (  0.0%)   0.0037 (  0.0%)   0.0195 (  0.0%)   0.0196 (  0.0%)  Lazy Branch Probability Analysis #4
   0.0157 (  0.0%)   0.0036 (  0.0%)   0.0193 (  0.0%)   0.0195 (  0.0%)  Memory Dependence Analysis #4
   0.0157 (  0.0%)   0.0024 (  0.0%)   0.0181 (  0.0%)   0.0194 (  0.0%)  Lazy Branch Probability Analysis #2
   0.0166 (  0.0%)   0.0027 (  0.0%)   0.0193 (  0.0%)   0.0194 (  0.0%)  Natural Loop Information
   0.0155 (  0.0%)   0.0037 (  0.0%)   0.0192 (  0.0%)   0.0193 (  0.0%)  Machine Optimization Remark Emitter #5
   0.0170 (  0.0%)   0.0023 (  0.0%)   0.0192 (  0.0%)   0.0193 (  0.0%)  Lazy Branch Probability Analysis #11
   0.0153 (  0.0%)   0.0033 (  0.0%)   0.0186 (  0.0%)   0.0193 (  0.0%)  Contiguously Lay Out Funclets #2
   0.0151 (  0.0%)   0.0036 (  0.0%)   0.0188 (  0.0%)   0.0192 (  0.0%)  Lazy Block Frequency Analysis #13
   0.0154 (  0.0%)   0.0032 (  0.0%)   0.0186 (  0.0%)   0.0190 (  0.0%)  StackMap Liveness Analysis #2
   0.0152 (  0.0%)   0.0036 (  0.0%)   0.0188 (  0.0%)   0.0189 (  0.0%)  Memory Dependence Analysis #3
   0.0172 (  0.0%)   0.0016 (  0.0%)   0.0187 (  0.0%)   0.0188 (  0.0%)  Eliminate PHI nodes for register allocation
   0.0161 (  0.0%)   0.0023 (  0.0%)   0.0185 (  0.0%)   0.0186 (  0.0%)  Lazy Branch Probability Analysis #10
   0.0150 (  0.0%)   0.0032 (  0.0%)   0.0182 (  0.0%)   0.0186 (  0.0%)  X86 Load Value Injection (LVI) Load Hardening
   0.0149 (  0.0%)   0.0032 (  0.0%)   0.0181 (  0.0%)   0.0186 (  0.0%)  X86 Atom pad short functions
   0.0151 (  0.0%)   0.0032 (  0.0%)   0.0183 (  0.0%)   0.0186 (  0.0%)  X86 FP Stackifier #2
   0.0147 (  0.0%)   0.0033 (  0.0%)   0.0180 (  0.0%)   0.0184 (  0.0%)  Lazy Machine Block Frequency Analysis #3
   0.0147 (  0.0%)   0.0035 (  0.0%)   0.0182 (  0.0%)   0.0184 (  0.0%)  Lazy Branch Probability Analysis #5
   0.0145 (  0.0%)   0.0035 (  0.0%)   0.0179 (  0.0%)   0.0184 (  0.0%)  Lazy Machine Block Frequency Analysis #11
   0.0146 (  0.0%)   0.0032 (  0.0%)   0.0178 (  0.0%)   0.0183 (  0.0%)  Local Stack Slot Allocation #2
   0.0146 (  0.0%)   0.0035 (  0.0%)   0.0181 (  0.0%)   0.0183 (  0.0%)  Lazy Branch Probability Analysis #7
   0.0145 (  0.0%)   0.0031 (  0.0%)   0.0176 (  0.0%)   0.0181 (  0.0%)  Rename Disconnected Subregister Components
   0.0145 (  0.0%)   0.0033 (  0.0%)   0.0177 (  0.0%)   0.0181 (  0.0%)  Lazy Machine Block Frequency Analysis #4
   0.0144 (  0.0%)   0.0031 (  0.0%)   0.0174 (  0.0%)   0.0180 (  0.0%)  X86 Load Value Injection (LVI) Ret-Hardening #2
   0.0144 (  0.0%)   0.0031 (  0.0%)   0.0175 (  0.0%)   0.0180 (  0.0%)  X86 Insert Cache Prefetches #2
   0.0143 (  0.0%)   0.0031 (  0.0%)   0.0174 (  0.0%)   0.0178 (  0.0%)  X86 WinAlloca Expander #2
   0.0142 (  0.0%)   0.0030 (  0.0%)   0.0172 (  0.0%)   0.0177 (  0.0%)  Lazy Machine Block Frequency Analysis #8
   0.0142 (  0.0%)   0.0033 (  0.0%)   0.0175 (  0.0%)   0.0177 (  0.0%)  Lazy Branch Probability Analysis #6
   0.0142 (  0.0%)   0.0030 (  0.0%)   0.0172 (  0.0%)   0.0176 (  0.0%)  Analyze Machine Code For Garbage Collection #2
   0.0155 (  0.0%)   0.0020 (  0.0%)   0.0175 (  0.0%)   0.0175 (  0.0%)  Expand Atomic instructions
   0.0139 (  0.0%)   0.0030 (  0.0%)   0.0169 (  0.0%)   0.0175 (  0.0%)  Lazy Machine Block Frequency Analysis #7
   0.0140 (  0.0%)   0.0033 (  0.0%)   0.0174 (  0.0%)   0.0175 (  0.0%)  Lazy Branch Probability Analysis #9
   0.0140 (  0.0%)   0.0031 (  0.0%)   0.0170 (  0.0%)   0.0174 (  0.0%)  Lazy Machine Block Frequency Analysis #10
   0.0140 (  0.0%)   0.0030 (  0.0%)   0.0170 (  0.0%)   0.0174 (  0.0%)  Lazy Machine Block Frequency Analysis #6
   0.0138 (  0.0%)   0.0033 (  0.0%)   0.0171 (  0.0%)   0.0173 (  0.0%)  Lazy Branch Probability Analysis #8
   0.0138 (  0.0%)   0.0032 (  0.0%)   0.0170 (  0.0%)   0.0173 (  0.0%)  Phi Values Analysis #2
   0.0131 (  0.0%)   0.0036 (  0.0%)   0.0167 (  0.0%)   0.0173 (  0.0%)  Safe Stack instrumentation pass #2
   0.0139 (  0.0%)   0.0030 (  0.0%)   0.0168 (  0.0%)   0.0173 (  0.0%)  Instrument function entry/exit with calls to e.g. mcount() (post inlining) #2
   0.0137 (  0.0%)   0.0029 (  0.0%)   0.0165 (  0.0%)   0.0171 (  0.0%)  Lazy Machine Block Frequency Analysis #9
   0.0137 (  0.0%)   0.0030 (  0.0%)   0.0167 (  0.0%)   0.0171 (  0.0%)  Compressing EVEX instrs to VEX encoding when possible
   0.0159 (  0.0%)   0.0011 (  0.0%)   0.0170 (  0.0%)   0.0171 (  0.0%)  Finalize ISel and expand pseudo-instructions
   0.0155 (  0.0%)   0.0015 (  0.0%)   0.0170 (  0.0%)   0.0171 (  0.0%)  Post-RA pseudo instruction expansion pass
   0.0136 (  0.0%)   0.0031 (  0.0%)   0.0167 (  0.0%)   0.0170 (  0.0%)  Lazy Machine Block Frequency Analysis #5
   0.0137 (  0.0%)   0.0029 (  0.0%)   0.0166 (  0.0%)   0.0170 (  0.0%)  Lazy Block Frequency Analysis #12
   0.0137 (  0.0%)   0.0030 (  0.0%)   0.0167 (  0.0%)   0.0169 (  0.0%)  X86 Avoid Store Forwarding Blocks
   0.0120 (  0.0%)   0.0034 (  0.0%)   0.0155 (  0.0%)   0.0167 (  0.0%)  Lazy Block Frequency Analysis #3
   0.0141 (  0.0%)   0.0025 (  0.0%)   0.0166 (  0.0%)   0.0166 (  0.0%)  CallGraph Construction
   0.0151 (  0.0%)   0.0014 (  0.0%)   0.0165 (  0.0%)   0.0166 (  0.0%)  LICM for julia specific intrinsics.
   0.0132 (  0.0%)   0.0029 (  0.0%)   0.0162 (  0.0%)   0.0166 (  0.0%)  Detect Dead Lanes
   0.0121 (  0.0%)   0.0034 (  0.0%)   0.0155 (  0.0%)   0.0165 (  0.0%)  Optimization Remark Emitter #2
   0.0143 (  0.0%)   0.0021 (  0.0%)   0.0164 (  0.0%)   0.0165 (  0.0%)  Optimization Remark Emitter #10
   0.0134 (  0.0%)   0.0029 (  0.0%)   0.0163 (  0.0%)   0.0165 (  0.0%)  X86 Discriminate Memory Operands #2
   0.0145 (  0.0%)   0.0019 (  0.0%)   0.0164 (  0.0%)   0.0165 (  0.0%)  Optimization Remark Emitter #11
   0.0134 (  0.0%)   0.0021 (  0.0%)   0.0155 (  0.0%)   0.0165 (  0.0%)  Lazy Block Frequency Analysis #2
   0.0152 (  0.0%)   0.0012 (  0.0%)   0.0164 (  0.0%)   0.0165 (  0.0%)  X86 vzeroupper inserter
   0.0130 (  0.0%)   0.0031 (  0.0%)   0.0162 (  0.0%)   0.0164 (  0.0%)  Optimization Remark Emitter #7
   0.0131 (  0.0%)   0.0031 (  0.0%)   0.0162 (  0.0%)   0.0164 (  0.0%)  Phi Values Analysis #4
   0.0142 (  0.0%)   0.0019 (  0.0%)   0.0161 (  0.0%)   0.0163 (  0.0%)  Lazy Block Frequency Analysis #11
   0.0131 (  0.0%)   0.0032 (  0.0%)   0.0162 (  0.0%)   0.0162 (  0.0%)  Demanded bits analysis
   0.0134 (  0.0%)   0.0021 (  0.0%)   0.0155 (  0.0%)   0.0162 (  0.0%)  Optimization Remark Emitter
   0.0128 (  0.0%)   0.0030 (  0.0%)   0.0158 (  0.0%)   0.0161 (  0.0%)  Phi Values Analysis #3
   0.0128 (  0.0%)   0.0030 (  0.0%)   0.0158 (  0.0%)   0.0160 (  0.0%)  Optimization Remark Emitter #8
   0.0138 (  0.0%)   0.0020 (  0.0%)   0.0158 (  0.0%)   0.0159 (  0.0%)  Lazy Block Frequency Analysis #10
   0.0127 (  0.0%)   0.0031 (  0.0%)   0.0157 (  0.0%)   0.0159 (  0.0%)  Optimization Remark Emitter #3
   0.0126 (  0.0%)   0.0027 (  0.0%)   0.0153 (  0.0%)   0.0159 (  0.0%)  Lower Garbage Collection Instructions #2
   0.0127 (  0.0%)   0.0030 (  0.0%)   0.0157 (  0.0%)   0.0159 (  0.0%)  Demanded bits analysis #2
   0.0126 (  0.0%)   0.0027 (  0.0%)   0.0153 (  0.0%)   0.0158 (  0.0%)  Shadow Stack GC Lowering #2
   0.0126 (  0.0%)   0.0030 (  0.0%)   0.0156 (  0.0%)   0.0157 (  0.0%)  Lazy Block Frequency Analysis #7
   0.0125 (  0.0%)   0.0030 (  0.0%)   0.0155 (  0.0%)   0.0156 (  0.0%)  Optimization Remark Emitter #9
   0.0125 (  0.0%)   0.0030 (  0.0%)   0.0155 (  0.0%)   0.0156 (  0.0%)  Lazy Block Frequency Analysis #4
   0.0124 (  0.0%)   0.0030 (  0.0%)   0.0154 (  0.0%)   0.0156 (  0.0%)  Optimization Remark Emitter #4
   0.0125 (  0.0%)   0.0029 (  0.0%)   0.0155 (  0.0%)   0.0156 (  0.0%)  Optimization Remark Emitter #6
   0.0114 (  0.0%)   0.0033 (  0.0%)   0.0147 (  0.0%)   0.0155 (  0.0%)  LCSSA Verifier
   0.0142 (  0.0%)   0.0013 (  0.0%)   0.0155 (  0.0%)   0.0154 (  0.0%)  X86 EFLAGS copy lowering
   0.0123 (  0.0%)   0.0029 (  0.0%)   0.0153 (  0.0%)   0.0154 (  0.0%)  Optimization Remark Emitter #5
   0.0122 (  0.0%)   0.0029 (  0.0%)   0.0151 (  0.0%)   0.0154 (  0.0%)  Lazy Block Frequency Analysis #9
   0.0119 (  0.0%)   0.0029 (  0.0%)   0.0148 (  0.0%)   0.0154 (  0.0%)  LCSSA Verifier #2
   0.0121 (  0.0%)   0.0029 (  0.0%)   0.0150 (  0.0%)   0.0152 (  0.0%)  Lazy Block Frequency Analysis #5
   0.0122 (  0.0%)   0.0029 (  0.0%)   0.0151 (  0.0%)   0.0152 (  0.0%)  Lazy Block Frequency Analysis #6
   0.0119 (  0.0%)   0.0029 (  0.0%)   0.0148 (  0.0%)   0.0150 (  0.0%)  Lazy Block Frequency Analysis #8
   0.0128 (  0.0%)   0.0018 (  0.0%)   0.0146 (  0.0%)   0.0146 (  0.0%)  LowerPTLS Pass
   0.0116 (  0.0%)   0.0028 (  0.0%)   0.0144 (  0.0%)   0.0145 (  0.0%)  LCSSA Verifier #3
   0.0116 (  0.0%)   0.0027 (  0.0%)   0.0143 (  0.0%)   0.0145 (  0.0%)  LCSSA Verifier #4
   0.0125 (  0.0%)   0.0016 (  0.0%)   0.0141 (  0.0%)   0.0142 (  0.0%)  Lower constant intrinsics
   0.0116 (  0.0%)   0.0010 (  0.0%)   0.0126 (  0.0%)   0.0128 (  0.0%)  LICM for julia specific intrinsics. #2
   0.0117 (  0.0%)   0.0010 (  0.0%)   0.0127 (  0.0%)   0.0127 (  0.0%)  X86 pseudo instruction expansion pass
   0.0110 (  0.0%)   0.0011 (  0.0%)   0.0121 (  0.0%)   0.0121 (  0.0%)  Check CFA info and insert CFI instructions if needed
   0.0092 (  0.0%)   0.0023 (  0.0%)   0.0115 (  0.0%)   0.0116 (  0.0%)  Merge Duplicate Global Constants
   0.0081 (  0.0%)   0.0007 (  0.0%)   0.0089 (  0.0%)   0.0090 (  0.0%)  Delete dead loops
   0.0078 (  0.0%)   0.0011 (  0.0%)   0.0089 (  0.0%)   0.0089 (  0.0%)  Scalarize Masked Memory Intrinsics
   0.0076 (  0.0%)   0.0008 (  0.0%)   0.0084 (  0.0%)   0.0086 (  0.0%)  Delete dead loops #2
   0.0074 (  0.0%)   0.0011 (  0.0%)   0.0085 (  0.0%)   0.0085 (  0.0%)  Expand reduction intrinsics
   0.0068 (  0.0%)   0.0012 (  0.0%)   0.0079 (  0.0%)   0.0079 (  0.0%)  GC Invariant Verification Pass
   0.0060 (  0.0%)   0.0014 (  0.0%)   0.0073 (  0.0%)   0.0073 (  0.0%)  Function Alias Analysis Results
   0.0054 (  0.0%)   0.0014 (  0.0%)   0.0068 (  0.0%)   0.0068 (  0.0%)  Exception handling preparation
   0.0059 (  0.0%)   0.0008 (  0.0%)   0.0067 (  0.0%)   0.0067 (  0.0%)  Remove unreachable blocks from the CFG
   0.0048 (  0.0%)   0.0019 (  0.0%)   0.0067 (  0.0%)   0.0067 (  0.0%)  Assumption Cache Tracker
   0.0061 (  0.0%)   0.0000 (  0.0%)   0.0061 (  0.0%)   0.0061 (  0.0%)  Assumption Cache Tracker #2
   0.0048 (  0.0%)   0.0011 (  0.0%)   0.0059 (  0.0%)   0.0059 (  0.0%)  Remove non-integral address space.
   0.0047 (  0.0%)   0.0006 (  0.0%)   0.0054 (  0.0%)   0.0054 (  0.0%)  Bundle Machine CFG Edges
   0.0041 (  0.0%)   0.0009 (  0.0%)   0.0050 (  0.0%)   0.0050 (  0.0%)  Basic Alias Analysis (stateless AA impl)
   0.0036 (  0.0%)   0.0009 (  0.0%)   0.0045 (  0.0%)   0.0045 (  0.0%)  Lazy Branch Probability Analysis
   0.0037 (  0.0%)   0.0007 (  0.0%)   0.0044 (  0.0%)   0.0044 (  0.0%)  Basic Alias Analysis (stateless AA impl) #2
   0.0035 (  0.0%)   0.0006 (  0.0%)   0.0041 (  0.0%)   0.0042 (  0.0%)  X86 Indirect Branch Tracking
   0.0035 (  0.0%)   0.0007 (  0.0%)   0.0042 (  0.0%)   0.0042 (  0.0%)  Lower Julia Exception Handlers
   0.0033 (  0.0%)   0.0008 (  0.0%)   0.0041 (  0.0%)   0.0041 (  0.0%)  Memory Dependence Analysis
   0.0034 (  0.0%)   0.0006 (  0.0%)   0.0040 (  0.0%)   0.0040 (  0.0%)  Expand indirectbr instructions
   0.0031 (  0.0%)   0.0005 (  0.0%)   0.0036 (  0.0%)   0.0036 (  0.0%)  X86 PIC Global Base Reg Initialization
   0.0030 (  0.0%)   0.0005 (  0.0%)   0.0035 (  0.0%)   0.0035 (  0.0%)  Insert fentry calls
   0.0028 (  0.0%)   0.0005 (  0.0%)   0.0034 (  0.0%)   0.0034 (  0.0%)  Machine Optimization Remark Emitter
   0.0026 (  0.0%)   0.0006 (  0.0%)   0.0032 (  0.0%)   0.0032 (  0.0%)  Phi Values Analysis
   0.0025 (  0.0%)   0.0005 (  0.0%)   0.0031 (  0.0%)   0.0032 (  0.0%)  Lazy Block Frequency Analysis
   0.0026 (  0.0%)   0.0005 (  0.0%)   0.0031 (  0.0%)   0.0031 (  0.0%)  Contiguously Lay Out Funclets
   0.0026 (  0.0%)   0.0005 (  0.0%)   0.0030 (  0.0%)   0.0031 (  0.0%)  X86 speculative load hardening
   0.0024 (  0.0%)   0.0006 (  0.0%)   0.0030 (  0.0%)   0.0030 (  0.0%)  Machine Optimization Remark Emitter #2
   0.0025 (  0.0%)   0.0005 (  0.0%)   0.0030 (  0.0%)   0.0030 (  0.0%)  Insert XRay ops
   0.0026 (  0.0%)   0.0004 (  0.0%)   0.0030 (  0.0%)   0.0030 (  0.0%)  X86 FP Stackifier
   0.0024 (  0.0%)   0.0005 (  0.0%)   0.0029 (  0.0%)   0.0029 (  0.0%)  X86 WinAlloca Expander
   0.0025 (  0.0%)   0.0004 (  0.0%)   0.0029 (  0.0%)   0.0029 (  0.0%)  Implement the 'patchable-function' attribute
   0.0024 (  0.0%)   0.0004 (  0.0%)   0.0029 (  0.0%)   0.0029 (  0.0%)  Lazy Machine Block Frequency Analysis
   0.0025 (  0.0%)   0.0004 (  0.0%)   0.0029 (  0.0%)   0.0029 (  0.0%)  X86 Indirect Thunks
   0.0024 (  0.0%)   0.0005 (  0.0%)   0.0029 (  0.0%)   0.0029 (  0.0%)  Instrument function entry/exit with calls to e.g. mcount() (post inlining)
   0.0024 (  0.0%)   0.0004 (  0.0%)   0.0028 (  0.0%)   0.0029 (  0.0%)  X86 Load Value Injection (LVI) Load Hardening (Unoptimized)
   0.0024 (  0.0%)   0.0004 (  0.0%)   0.0028 (  0.0%)   0.0029 (  0.0%)  Local Stack Slot Allocation
   0.0024 (  0.0%)   0.0004 (  0.0%)   0.0028 (  0.0%)   0.0029 (  0.0%)  StackMap Liveness Analysis
   0.0023 (  0.0%)   0.0004 (  0.0%)   0.0028 (  0.0%)   0.0028 (  0.0%)  X86 Load Value Injection (LVI) Ret-Hardening
   0.0022 (  0.0%)   0.0005 (  0.0%)   0.0027 (  0.0%)   0.0028 (  0.0%)  Lazy Machine Block Frequency Analysis #2
   0.0023 (  0.0%)   0.0004 (  0.0%)   0.0027 (  0.0%)   0.0028 (  0.0%)  Analyze Machine Code For Garbage Collection
   0.0023 (  0.0%)   0.0004 (  0.0%)   0.0027 (  0.0%)   0.0028 (  0.0%)  X86 Discriminate Memory Operands
   0.0021 (  0.0%)   0.0006 (  0.0%)   0.0027 (  0.0%)   0.0027 (  0.0%)  Safe Stack instrumentation pass
   0.0022 (  0.0%)   0.0004 (  0.0%)   0.0027 (  0.0%)   0.0027 (  0.0%)  X86 Insert Cache Prefetches
   0.0022 (  0.0%)   0.0004 (  0.0%)   0.0026 (  0.0%)   0.0026 (  0.0%)  Shadow Stack GC Lowering
   0.0021 (  0.0%)   0.0004 (  0.0%)   0.0025 (  0.0%)   0.0025 (  0.0%)  Lower Garbage Collection Instructions
   0.0020 (  0.0%)   0.0004 (  0.0%)   0.0023 (  0.0%)   0.0024 (  0.0%)  Pre-ISel Intrinsic Lowering
   0.0015 (  0.0%)   0.0003 (  0.0%)   0.0018 (  0.0%)   0.0019 (  0.0%)  LowerSIMDLoop Pass
   0.0017 (  0.0%)   0.0000 (  0.0%)   0.0017 (  0.0%)   0.0017 (  0.0%)  Pre-ISel Intrinsic Lowering #2
   0.0013 (  0.0%)   0.0003 (  0.0%)   0.0015 (  0.0%)   0.0016 (  0.0%)  Rewrite Symbols
   0.0012 (  0.0%)   0.0003 (  0.0%)   0.0015 (  0.0%)   0.0015 (  0.0%)  A No-Op Barrier Pass
   0.0011 (  0.0%)   0.0002 (  0.0%)   0.0014 (  0.0%)   0.0014 (  0.0%)  LowerSIMDLoop Pass #2
   0.0005 (  0.0%)   0.0002 (  0.0%)   0.0006 (  0.0%)   0.0007 (  0.0%)  Create Garbage Collector Module Metadata
   0.0005 (  0.0%)   0.0002 (  0.0%)   0.0006 (  0.0%)   0.0007 (  0.0%)  Machine Module Information
   0.0004 (  0.0%)   0.0002 (  0.0%)   0.0006 (  0.0%)   0.0006 (  0.0%)  Target Pass Configuration
   0.0004 (  0.0%)   0.0002 (  0.0%)   0.0006 (  0.0%)   0.0006 (  0.0%)  Machine Branch Probability Analysis
   0.0004 (  0.0%)   0.0002 (  0.0%)   0.0006 (  0.0%)   0.0006 (  0.0%)  Profile summary info
   0.0005 (  0.0%)   0.0002 (  0.0%)   0.0006 (  0.0%)   0.0006 (  0.0%)  Target Library Information
   0.0004 (  0.0%)   0.0002 (  0.0%)   0.0006 (  0.0%)   0.0006 (  0.0%)  Target Transform Information
   0.0003 (  0.0%)   0.0000 (  0.0%)   0.0003 (  0.0%)   0.0003 (  0.0%)  LowerSIMDLoop Pass #3
   0.0002 (  0.0%)   0.0000 (  0.0%)   0.0002 (  0.0%)   0.0002 (  0.0%)  Dominator Tree Construction #12
   0.0001 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.0%)   0.0001 (  0.0%)  Dominator Tree Construction #6
   0.0001 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.0%)   0.0001 (  0.0%)  Natural Loop Information #2
   0.0001 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.0%)   0.0001 (  0.0%)  Natural Loop Information #7
   0.0001 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.0%)   0.0001 (  0.0%)  Remove non-integral address space. #2
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Rewrite Symbols #2
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  A No-Op Barrier Pass #2
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Machine Branch Probability Analysis #2
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Profile summary info #2
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Target Library Information #2
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Type-Based Alias Analysis
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Create Garbage Collector Module Metadata #2
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Target Transform Information #2
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Machine Module Information #2
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Target Pass Configuration #2
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Scoped NoAlias Alias Analysis
  63.1949 (100.0%)   9.1490 (100.0%)  72.3439 (100.0%)  73.0409 (100.0%)  Total

76.46user 11.40system 1:28.76elapsed 98%CPU (0avgtext+0avgdata 1109236maxresident)k
2080inputs+196216outputs (13major+383548minor)pagefaults 0swaps
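
Since the same pass shows up many times in a report like the one above (as `#2`, `#3`, ...), it can help to fold the repeats together before comparing LLVM versions. Below is a minimal sketch, assuming the row layout shown in these reports (four `seconds (pct%)` columns followed by the pass name). The helper name `wall_time_by_pass` and the `sample` string are made up for illustration and are not part of Julia or LLVM.

```python
import re
from collections import defaultdict

# One report row: four "seconds ( pct%)" columns followed by the pass name.
ROW = re.compile(
    r"^\s*(\d+\.\d+) \(\s*[\d.]+%\)"   # user time
    r"\s+(\d+\.\d+) \(\s*[\d.]+%\)"    # system time
    r"\s+(\d+\.\d+) \(\s*[\d.]+%\)"    # user + system
    r"\s+(\d+\.\d+) \(\s*[\d.]+%\)"    # wall time
    r"\s+(.*\S)\s*$"                   # pass name
)

def wall_time_by_pass(report):
    """Sum wall-clock seconds per pass name, folding '#N' repeats together.

    Hypothetical helper: rows that do not match the layout (headers,
    separators, shell output) and the final 'Total' row are skipped.
    """
    totals = defaultdict(float)
    for line in report.splitlines():
        m = ROW.match(line)
        if m is None or m.group(5) == "Total":
            continue
        # "Natural Loop Information #12" -> "Natural Loop Information"
        name = re.sub(r"\s+#\d+$", "", m.group(5))
        totals[name] += float(m.group(4))
    return dict(totals)

# A few rows copied from the report above, used only as sample input.
sample = """\
   0.0602 (  0.1%)   0.0048 (  0.1%)   0.0650 (  0.1%)   0.0654 (  0.1%)  Loop-Closed SSA Form Pass #4
   0.0254 (  0.0%)   0.0047 (  0.1%)   0.0301 (  0.0%)   0.0303 (  0.0%)  Loop-Closed SSA Form Pass #2
   0.0551 (  0.1%)   0.0036 (  0.0%)   0.0587 (  0.1%)   0.0592 (  0.1%)  Unroll loops
  63.1949 (100.0%)   9.1490 (100.0%)  72.3439 (100.0%)  73.0409 (100.0%)  Total
"""

# Print passes sorted by aggregated wall time, largest first.
for name, secs in sorted(wall_time_by_pass(sample).items(), key=lambda kv: -kv[1]):
    print(f"{secs:8.4f}  {name}")
```

Running the same function over the LLVM 9 and LLVM 10 reports and diffing the two dictionaries would be one way to see which pass families account for a change in total time.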

LLVM 9

.ji generation

[vchuravy@thor base]$ export JULIA_LLVM_ARGS="-time-passes"
[vchuravy@thor base]$ cd /home/vchuravy/src/julia/base && JULIA_BINDIR=/home/vchuravy/builds/julia/usr/bin  time /home/vchuravy/builds/julia/usr/bin/julia -g1 -O0 -C "native" --output-ji /home/vchuravy/builds/julia/usr/lib/julia/sys.ji.tmp  --startup-file=no --warn-overwrite=yes --sysimage /home/vchuravy/builds/julia/usr/lib/julia/corecompiler.ji sysimg.jl ../../../builds/julia/base/
Base  ─────────── 23.812777 seconds
Base64  ─────────  3.484582 seconds
CRC32c  ─────────  0.007066 seconds
SHA  ────────────  0.178218 seconds
FileWatching  ───  0.094226 seconds
Unicode  ────────  0.005555 seconds
Mmap  ───────────  0.073344 seconds
Serialization  ──  0.345001 seconds
Libdl  ──────────  0.001368 seconds
Printf  ─────────  0.244708 seconds
Markdown  ───────  1.131422 seconds
LibGit2  ────────  1.648734 seconds
Logging  ────────  0.044262 seconds
Sockets  ────────  0.344069 seconds
Profile  ────────  0.218597 seconds
Dates  ──────────  2.130254 seconds
DelimitedFiles  ─  0.093202 seconds
Random  ─────────  0.499236 seconds
UUIDs  ──────────  0.012732 seconds
Future  ─────────  0.003969 seconds
LinearAlgebra  ──  8.142045 seconds
SparseArrays  ───  3.510902 seconds
SuiteSparse  ────  0.751374 seconds
Distributed  ────  0.791981 seconds
SharedArrays  ───  0.126962 seconds
Pkg  ──────────── 10.722990 seconds
Test  ───────────  0.251861 seconds
REPL  ───────────  0.000157 seconds
Statistics  ─────  0.175373 seconds
Stdlibs total  ── 35.050308 seconds
Sysimage built. Summary:
Total ───────  58.865508 seconds 
Base: ───────  23.812777 seconds 40.4529%
Stdlibs: ────  35.050308 seconds 59.543%
===-------------------------------------------------------------------------===
                      ... Pass execution timing report ...
===-------------------------------------------------------------------------===
  Total Execution Time: 21.5692 seconds (21.6326 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
   6.1113 ( 35.3%)   2.7502 ( 64.8%)   8.8615 ( 41.1%)   8.8903 ( 41.1%)  X86 Assembly Printer
   7.3960 ( 42.7%)   0.9386 ( 22.1%)   8.3346 ( 38.6%)   8.3583 ( 38.6%)  X86 DAG->DAG Instruction Selection
   0.9197 (  5.3%)   0.0840 (  2.0%)   1.0037 (  4.7%)   1.0092 (  4.7%)  Late Lower GCFrame Pass
   0.4049 (  2.3%)   0.0510 (  1.2%)   0.4559 (  2.1%)   0.4561 (  2.1%)  Fast Register Allocator
   0.2728 (  1.6%)   0.0503 (  1.2%)   0.3231 (  1.5%)   0.3230 (  1.5%)  Simplify the CFG
   0.2382 (  1.4%)   0.0304 (  0.7%)   0.2686 (  1.2%)   0.2685 (  1.2%)  Prologue/Epilogue Insertion & Frame Finalization
   0.1944 (  1.1%)   0.0202 (  0.5%)   0.2146 (  1.0%)   0.2152 (  1.0%)  Live DEBUG_VALUE analysis
   0.1219 (  0.7%)   0.0169 (  0.4%)   0.1388 (  0.6%)   0.1393 (  0.6%)  Two-Address instruction pass
   0.0987 (  0.6%)   0.0146 (  0.3%)   0.1133 (  0.5%)   0.1134 (  0.5%)  Final GC intrinsic lowering pass
   0.0944 (  0.5%)   0.0172 (  0.4%)   0.1116 (  0.5%)   0.1118 (  0.5%)  Inliner for always_inline functions
   0.0879 (  0.5%)   0.0158 (  0.4%)   0.1037 (  0.5%)   0.1038 (  0.5%)  MemCpy Optimization
   0.0667 (  0.4%)   0.0138 (  0.3%)   0.0805 (  0.4%)   0.0805 (  0.4%)  Insert stack protectors
   0.0700 (  0.4%)   0.0094 (  0.2%)   0.0794 (  0.4%)   0.0795 (  0.4%)  Dominator Tree Construction #2
   0.0614 (  0.4%)   0.0149 (  0.4%)   0.0763 (  0.4%)   0.0764 (  0.4%)  Dominator Tree Construction #4
   0.0517 (  0.3%)   0.0216 (  0.5%)   0.0733 (  0.3%)   0.0730 (  0.3%)  Free MachineFunction
   0.0601 (  0.3%)   0.0104 (  0.2%)   0.0705 (  0.3%)   0.0707 (  0.3%)  Dominator Tree Construction
   0.0580 (  0.3%)   0.0084 (  0.2%)   0.0664 (  0.3%)   0.0664 (  0.3%)  MachineDominator Tree Construction
   0.0499 (  0.3%)   0.0073 (  0.2%)   0.0572 (  0.3%)   0.0573 (  0.3%)  Dominator Tree Construction #3
   0.0466 (  0.3%)   0.0074 (  0.2%)   0.0541 (  0.3%)   0.0540 (  0.2%)  Eliminate PHI nodes for register allocation
   0.0445 (  0.3%)   0.0072 (  0.2%)   0.0517 (  0.2%)   0.0517 (  0.2%)  Expand Atomic instructions
   0.0445 (  0.3%)   0.0061 (  0.1%)   0.0506 (  0.2%)   0.0506 (  0.2%)  Post-RA pseudo instruction expansion pass
   0.0432 (  0.2%)   0.0074 (  0.2%)   0.0506 (  0.2%)   0.0505 (  0.2%)  CallGraph Construction
   0.0395 (  0.2%)   0.0055 (  0.1%)   0.0449 (  0.2%)   0.0449 (  0.2%)  LowerPTLS Pass
   0.0388 (  0.2%)   0.0061 (  0.1%)   0.0450 (  0.2%)   0.0448 (  0.2%)  X86 EFLAGS copy lowering
   0.0388 (  0.2%)   0.0048 (  0.1%)   0.0436 (  0.2%)   0.0438 (  0.2%)  X86 vzeroupper inserter
   0.0354 (  0.2%)   0.0060 (  0.1%)   0.0414 (  0.2%)   0.0413 (  0.2%)  Finalize ISel and expand pseudo-instructions
   0.0322 (  0.2%)   0.0043 (  0.1%)   0.0365 (  0.2%)   0.0366 (  0.2%)  X86 pseudo instruction expansion pass
   0.0290 (  0.2%)   0.0060 (  0.1%)   0.0349 (  0.2%)   0.0349 (  0.2%)  Merge Duplicate Global Constants
   0.0297 (  0.2%)   0.0041 (  0.1%)   0.0337 (  0.2%)   0.0339 (  0.2%)  Check CFA info and insert CFI instructions if needed
   0.0274 (  0.2%)   0.0043 (  0.1%)   0.0318 (  0.1%)   0.0316 (  0.1%)  Scalarize Masked Memory Intrinsics
   0.0263 (  0.2%)   0.0045 (  0.1%)   0.0308 (  0.1%)   0.0308 (  0.1%)  Function Alias Analysis Results
   0.0258 (  0.1%)   0.0041 (  0.1%)   0.0299 (  0.1%)   0.0299 (  0.1%)  Expand reduction intrinsics
   0.0207 (  0.1%)   0.0048 (  0.1%)   0.0255 (  0.1%)   0.0252 (  0.1%)  Exception handling preparation
   0.0213 (  0.1%)   0.0038 (  0.1%)   0.0251 (  0.1%)   0.0252 (  0.1%)  GC Invariant Verification Pass
   0.0203 (  0.1%)   0.0032 (  0.1%)   0.0235 (  0.1%)   0.0236 (  0.1%)  Remove unreachable blocks from the CFG
   0.0183 (  0.1%)   0.0031 (  0.1%)   0.0215 (  0.1%)   0.0215 (  0.1%)  Remove non-integral address space.
   0.0144 (  0.1%)   0.0045 (  0.1%)   0.0189 (  0.1%)   0.0189 (  0.1%)  Assumption Cache Tracker
   0.0156 (  0.1%)   0.0027 (  0.1%)   0.0183 (  0.1%)   0.0184 (  0.1%)  Bundle Machine CFG Edges
   0.0148 (  0.1%)   0.0029 (  0.1%)   0.0177 (  0.1%)   0.0178 (  0.1%)  X86 Indirect Branch Tracking
   0.0142 (  0.1%)   0.0025 (  0.1%)   0.0167 (  0.1%)   0.0168 (  0.1%)  Expand indirectbr instructions
   0.0133 (  0.1%)   0.0022 (  0.1%)   0.0156 (  0.1%)   0.0157 (  0.1%)  Basic Alias Analysis (stateless AA impl)
   0.0127 (  0.1%)   0.0021 (  0.1%)   0.0149 (  0.1%)   0.0150 (  0.1%)  Basic Alias Analysis (stateless AA impl) #2
   0.0123 (  0.1%)   0.0023 (  0.1%)   0.0146 (  0.1%)   0.0147 (  0.1%)  X86 PIC Global Base Reg Initialization
   0.0118 (  0.1%)   0.0020 (  0.0%)   0.0138 (  0.1%)   0.0139 (  0.1%)  Phi Values Analysis
   0.0114 (  0.1%)   0.0022 (  0.1%)   0.0136 (  0.1%)   0.0137 (  0.1%)  Machine Optimization Remark Emitter
   0.0112 (  0.1%)   0.0022 (  0.1%)   0.0134 (  0.1%)   0.0136 (  0.1%)  Insert fentry calls
   0.0115 (  0.1%)   0.0020 (  0.0%)   0.0135 (  0.1%)   0.0135 (  0.1%)  Memory Dependence Analysis
   0.0103 (  0.1%)   0.0024 (  0.1%)   0.0126 (  0.1%)   0.0129 (  0.1%)  Machine Optimization Remark Emitter #2
   0.0105 (  0.1%)   0.0020 (  0.0%)   0.0126 (  0.1%)   0.0128 (  0.1%)  Lower Julia Exception Handlers
   0.0106 (  0.1%)   0.0021 (  0.0%)   0.0127 (  0.1%)   0.0127 (  0.1%)  Contiguously Lay Out Funclets
   0.0103 (  0.1%)   0.0020 (  0.0%)   0.0123 (  0.1%)   0.0125 (  0.1%)  Insert XRay ops
   0.0104 (  0.1%)   0.0020 (  0.0%)   0.0123 (  0.1%)   0.0125 (  0.1%)  X86 speculative load hardening
   0.0101 (  0.1%)   0.0020 (  0.0%)   0.0121 (  0.1%)   0.0121 (  0.1%)  Implement the 'patchable-function' attribute
   0.0101 (  0.1%)   0.0019 (  0.0%)   0.0120 (  0.1%)   0.0121 (  0.1%)  X86 FP Stackifier
   0.0100 (  0.1%)   0.0020 (  0.0%)   0.0120 (  0.1%)   0.0120 (  0.1%)  StackMap Liveness Analysis
   0.0099 (  0.1%)   0.0020 (  0.0%)   0.0119 (  0.1%)   0.0119 (  0.1%)  X86 Retpoline Thunks
   0.0095 (  0.1%)   0.0023 (  0.1%)   0.0118 (  0.1%)   0.0119 (  0.1%)  Lazy Machine Block Frequency Analysis #2
   0.0099 (  0.1%)   0.0019 (  0.0%)   0.0118 (  0.1%)   0.0119 (  0.1%)  Analyze Machine Code For Garbage Collection
   0.0098 (  0.1%)   0.0019 (  0.0%)   0.0117 (  0.1%)   0.0119 (  0.1%)  Local Stack Slot Allocation
   0.0097 (  0.1%)   0.0019 (  0.0%)   0.0116 (  0.1%)   0.0119 (  0.1%)  Lazy Machine Block Frequency Analysis
   0.0101 (  0.1%)   0.0017 (  0.0%)   0.0118 (  0.1%)   0.0118 (  0.1%)  Instrument function entry/exit with calls to e.g. mcount() (post inlining)
   0.0098 (  0.1%)   0.0019 (  0.0%)   0.0117 (  0.1%)   0.0118 (  0.1%)  X86 WinAlloca Expander
   0.0096 (  0.1%)   0.0019 (  0.0%)   0.0114 (  0.1%)   0.0116 (  0.1%)  X86 Discriminate Memory Operands
   0.0091 (  0.1%)   0.0020 (  0.0%)   0.0111 (  0.1%)   0.0113 (  0.1%)  Safe Stack instrumentation pass
   0.0090 (  0.1%)   0.0018 (  0.0%)   0.0108 (  0.1%)   0.0109 (  0.1%)  X86 Insert Cache Prefetches
   0.0090 (  0.1%)   0.0015 (  0.0%)   0.0106 (  0.0%)   0.0108 (  0.0%)  Shadow Stack GC Lowering
   0.0090 (  0.1%)   0.0016 (  0.0%)   0.0105 (  0.0%)   0.0106 (  0.0%)  Lower Garbage Collection Instructions
   0.0078 (  0.0%)   0.0012 (  0.0%)   0.0090 (  0.0%)   0.0091 (  0.0%)  Pre-ISel Intrinsic Lowering
   0.0063 (  0.0%)   0.0010 (  0.0%)   0.0073 (  0.0%)   0.0073 (  0.0%)  LowerSIMDLoop Pass
   0.0058 (  0.0%)   0.0009 (  0.0%)   0.0067 (  0.0%)   0.0068 (  0.0%)  Rewrite Symbols
   0.0054 (  0.0%)   0.0009 (  0.0%)   0.0063 (  0.0%)   0.0064 (  0.0%)  A No-Op Barrier Pass
   0.0051 (  0.0%)   0.0008 (  0.0%)   0.0059 (  0.0%)   0.0059 (  0.0%)  LowerSIMDLoop Pass #2
   0.0022 (  0.0%)   0.0006 (  0.0%)   0.0029 (  0.0%)   0.0028 (  0.0%)  Create Garbage Collector Module Metadata
   0.0021 (  0.0%)   0.0006 (  0.0%)   0.0027 (  0.0%)   0.0028 (  0.0%)  Target Pass Configuration
   0.0021 (  0.0%)   0.0006 (  0.0%)   0.0028 (  0.0%)   0.0028 (  0.0%)  Target Library Information
   0.0021 (  0.0%)   0.0006 (  0.0%)   0.0027 (  0.0%)   0.0028 (  0.0%)  Machine Module Information
   0.0022 (  0.0%)   0.0006 (  0.0%)   0.0028 (  0.0%)   0.0028 (  0.0%)  Machine Branch Probability Analysis
   0.0023 (  0.0%)   0.0004 (  0.0%)   0.0027 (  0.0%)   0.0028 (  0.0%)  Profile summary info
   0.0020 (  0.0%)   0.0006 (  0.0%)   0.0026 (  0.0%)   0.0028 (  0.0%)  Target Transform Information
   0.0001 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.0%)   0.0001 (  0.0%)  Dominator Tree Construction #5
   0.0001 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.0%)   0.0001 (  0.0%)  Natural Loop Information
  17.3260 (100.0%)   4.2432 (100.0%)  21.5692 (100.0%)  21.6326 (100.0%)  Total

66.21user 6.11system 1:12.75elapsed 99%CPU (0avgtext+0avgdata 807512maxresident)k
0inputs+157952outputs (0major+297156minor)pagefaults 0swaps
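
Reports this long are hard to compare by eye. Below is a minimal sketch of how the rows of a `-time-passes` report could be summarized. It is not part of this PR. It assumes the four-column layout shown above (user, system, user+system, wall, then the pass name) and assumes that a trailing ` #N` marks another instance of the same pass, so those rows are summed. The names `parse_time_passes`, `top_passes` and `SAMPLE` are hypothetical, and `SAMPLE` just repeats a few rows from the LLVM 9 `.ji` report.

```python
import re
from collections import defaultdict

# One report row: four "seconds (percent%)" columns followed by the pass name.
ROW = re.compile(
    r"^\s*([\d.]+) \(\s*[\d.]+%\)\s+([\d.]+) \(\s*[\d.]+%\)\s+"
    r"([\d.]+) \(\s*[\d.]+%\)\s+([\d.]+) \(\s*[\d.]+%\)\s+(.+?)\s*$"
)
# Assumption: a trailing " #N" marks a repeated instance of the same pass.
INSTANCE_SUFFIX = re.compile(r" #\d+$")

# A few rows copied from the LLVM 9 .ji report above.
SAMPLE = """\
   6.1113 ( 35.3%)   2.7502 ( 64.8%)   8.8615 ( 41.1%)   8.8903 ( 41.1%)  X86 Assembly Printer
   7.3960 ( 42.7%)   0.9386 ( 22.1%)   8.3346 ( 38.6%)   8.3583 ( 38.6%)  X86 DAG->DAG Instruction Selection
   0.0700 (  0.4%)   0.0094 (  0.2%)   0.0794 (  0.4%)   0.0795 (  0.4%)  Dominator Tree Construction #2
   0.0614 (  0.4%)   0.0149 (  0.4%)   0.0763 (  0.4%)   0.0764 (  0.4%)  Dominator Tree Construction #4
  17.3260 (100.0%)   4.2432 (100.0%)  21.5692 (100.0%)  21.6326 (100.0%)  Total
"""


def parse_time_passes(text):
    """Return {pass name: summed wall-clock seconds}, merging '#N' instances."""
    totals = defaultdict(float)
    for line in text.splitlines():
        match = ROW.match(line)
        if match is None:
            continue  # banner, column header, or unrelated output
        name = INSTANCE_SUFFIX.sub("", match.group(5))
        if name == "Total":
            continue  # summary row, not a pass
        totals[name] += float(match.group(4))  # fourth column is wall time
    return dict(totals)


def top_passes(totals, n=5):
    """Return the n most expensive passes as (name, seconds), largest first."""
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:n]


for name, seconds in top_passes(parse_time_passes(SAMPLE), n=3):
    print(f"{seconds:8.4f}  {name}")
```

Running the same function over the LLVM 9 and LLVM 10 reports and diffing the two dictionaries would show which passes account for the change, without having to line up the `#N` instances manually.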

.o generation

[vchuravy@thor base]$ JULIA_BINDIR=/home/vchuravy/builds/julia/usr/bin  time /home/vchuravy/builds/julia/usr/bin/julia -O3 -C "native" --output-o /home/vchuravy/builds/julia/usr/lib/julia/sys-o.a.tmp  --startup-file=no --warn-overwrite=yes --sysimage /home/vchuravy/builds/julia/usr/lib/julia/sys.ji /home/vchuravy/src/julia/contrib/generate_precompile.jl 0
===-------------------------------------------------------------------------===
                      ... Pass execution timing report ...
===-------------------------------------------------------------------------===
  Total Execution Time: 67.6611 seconds (68.3778 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
  10.0683 ( 17.2%)   1.6105 ( 17.7%)  11.6787 ( 17.3%)  11.7260 ( 17.1%)  X86 DAG->DAG Instruction Selection #2
   5.4392 (  9.3%)   1.4493 ( 16.0%)   6.8884 ( 10.2%)   6.9174 ( 10.1%)  X86 Assembly Printer #2
   2.3136 (  3.9%)   1.2517 ( 13.8%)   3.5653 (  5.3%)   3.5721 (  5.2%)  X86 Assembly Printer
   3.0015 (  5.1%)   0.4392 (  4.8%)   3.4407 (  5.1%)   3.4494 (  5.0%)  X86 DAG->DAG Instruction Selection
   2.1803 (  3.7%)   0.0821 (  0.9%)   2.2624 (  3.3%)   2.3585 (  3.4%)  Combine redundant instructions
   2.1605 (  3.7%)   0.1059 (  1.2%)   2.2664 (  3.3%)   2.3561 (  3.4%)  Global Value Numbering #2
   2.1534 (  3.7%)   0.1526 (  1.7%)   2.3060 (  3.4%)   2.3126 (  3.4%)  Greedy Register Allocator
   2.1161 (  3.6%)   0.1285 (  1.4%)   2.2447 (  3.3%)   2.2630 (  3.3%)  Global Value Numbering
   1.6381 (  2.8%)   0.1168 (  1.3%)   1.7549 (  2.6%)   1.7556 (  2.6%)  Combine redundant instructions #4
   1.1605 (  2.0%)   0.1428 (  1.6%)   1.3033 (  1.9%)   1.3069 (  1.9%)  Machine Instruction Scheduler
   0.9392 (  1.6%)   0.0579 (  0.6%)   0.9971 (  1.5%)   1.0411 (  1.5%)  Late Lower GCFrame Pass #2
   0.8744 (  1.5%)   0.0821 (  0.9%)   0.9565 (  1.4%)   0.9668 (  1.4%)  Combine redundant instructions #2
   0.8447 (  1.4%)   0.0609 (  0.7%)   0.9055 (  1.3%)   0.9075 (  1.3%)  ReachingDefAnalysis
   0.6862 (  1.2%)   0.0662 (  0.7%)   0.7524 (  1.1%)   0.7612 (  1.1%)  Combine redundant instructions #3
   0.7061 (  1.2%)   0.0387 (  0.4%)   0.7449 (  1.1%)   0.7457 (  1.1%)  Loop Strength Reduction
   0.6268 (  1.1%)   0.0742 (  0.8%)   0.7009 (  1.0%)   0.7029 (  1.0%)  Live Variable Analysis
   0.5959 (  1.0%)   0.0730 (  0.8%)   0.6690 (  1.0%)   0.6686 (  1.0%)  CodeGen Prepare
   0.5207 (  0.9%)   0.1266 (  1.4%)   0.6473 (  1.0%)   0.6493 (  0.9%)  Module Verifier #2
   0.5101 (  0.9%)   0.0436 (  0.5%)   0.5536 (  0.8%)   0.5577 (  0.8%)  Induction Variable Simplification
   0.4502 (  0.8%)   0.0468 (  0.5%)   0.4970 (  0.7%)   0.4985 (  0.7%)  Live Interval Analysis
   0.4149 (  0.7%)   0.0622 (  0.7%)   0.4771 (  0.7%)   0.4768 (  0.7%)  Module Verifier
   0.3896 (  0.7%)   0.0460 (  0.5%)   0.4356 (  0.6%)   0.4411 (  0.6%)  SLP Vectorizer
   0.4033 (  0.7%)   0.0351 (  0.4%)   0.4383 (  0.6%)   0.4392 (  0.6%)  Late Lower GCFrame Pass
   0.3969 (  0.7%)   0.0374 (  0.4%)   0.4342 (  0.6%)   0.4354 (  0.6%)  Simple Register Coalescing
   0.3109 (  0.5%)   0.0426 (  0.5%)   0.3535 (  0.5%)   0.3541 (  0.5%)  Machine Common Subexpression Elimination
   0.2878 (  0.5%)   0.0436 (  0.5%)   0.3313 (  0.5%)   0.3314 (  0.5%)  Prologue/Epilogue Insertion & Frame Finalization #2
   0.2572 (  0.4%)   0.0488 (  0.5%)   0.3060 (  0.5%)   0.3199 (  0.5%)  Early CSE
   0.2866 (  0.5%)   0.0295 (  0.3%)   0.3161 (  0.5%)   0.3191 (  0.5%)  Loop Invariant Code Motion
   0.2569 (  0.4%)   0.0249 (  0.3%)   0.2818 (  0.4%)   0.3006 (  0.4%)  Simplify the CFG #2
   0.2526 (  0.4%)   0.0196 (  0.2%)   0.2722 (  0.4%)   0.2722 (  0.4%)  Induction Variable Users
   0.2225 (  0.4%)   0.0293 (  0.3%)   0.2518 (  0.4%)   0.2615 (  0.4%)  Jump Threading
   0.2103 (  0.4%)   0.0203 (  0.2%)   0.2305 (  0.3%)   0.2461 (  0.4%)  SROA
   0.2174 (  0.4%)   0.0180 (  0.2%)   0.2354 (  0.3%)   0.2374 (  0.3%)  Jump Threading #2
   0.2000 (  0.3%)   0.0252 (  0.3%)   0.2252 (  0.3%)   0.2283 (  0.3%)  Dead Store Elimination
   0.1959 (  0.3%)   0.0225 (  0.2%)   0.2185 (  0.3%)   0.2190 (  0.3%)  X86 Byte/Word Instruction Fixup
   0.1952 (  0.3%)   0.0079 (  0.1%)   0.2031 (  0.3%)   0.2112 (  0.3%)  Simplify the CFG #3
   0.1842 (  0.3%)   0.0262 (  0.3%)   0.2104 (  0.3%)   0.2110 (  0.3%)  JuliaMultiVersioning Pass
   0.1778 (  0.3%)   0.0299 (  0.3%)   0.2078 (  0.3%)   0.2100 (  0.3%)  Remove redundant instructions #2
   0.1871 (  0.3%)   0.0192 (  0.2%)   0.2063 (  0.3%)   0.2067 (  0.3%)  Live DEBUG_VALUE analysis #2
   0.1767 (  0.3%)   0.0258 (  0.3%)   0.2025 (  0.3%)   0.2029 (  0.3%)  Peephole Optimizations
   0.1788 (  0.3%)   0.0173 (  0.2%)   0.1961 (  0.3%)   0.1969 (  0.3%)  Machine code sinking
   0.1527 (  0.3%)   0.0320 (  0.4%)   0.1847 (  0.3%)   0.1919 (  0.3%)  Remove redundant instructions
   0.1747 (  0.3%)   0.0130 (  0.1%)   0.1877 (  0.3%)   0.1887 (  0.3%)  Loop Invariant Code Motion #2
   0.1639 (  0.3%)   0.0180 (  0.2%)   0.1819 (  0.3%)   0.1825 (  0.3%)  Branch Probability Basic Block Placement
   0.1620 (  0.3%)   0.0183 (  0.2%)   0.1803 (  0.3%)   0.1810 (  0.3%)  Virtual Register Rewriter
   0.1570 (  0.3%)   0.0130 (  0.1%)   0.1701 (  0.3%)   0.1767 (  0.3%)  Sparse Conditional Constant Propagation #2
   0.1389 (  0.2%)   0.0363 (  0.4%)   0.1752 (  0.3%)   0.1762 (  0.3%)  Insert stack protectors #2
   0.1617 (  0.3%)   0.0126 (  0.1%)   0.1743 (  0.3%)   0.1742 (  0.3%)  Simplify the CFG #6
   0.1510 (  0.3%)   0.0184 (  0.2%)   0.1694 (  0.3%)   0.1698 (  0.2%)  Fast Register Allocator
   0.1447 (  0.2%)   0.0201 (  0.2%)   0.1648 (  0.2%)   0.1655 (  0.2%)  Two-Address instruction pass #2
   0.1468 (  0.3%)   0.0146 (  0.2%)   0.1614 (  0.2%)   0.1636 (  0.2%)  Loop Vectorization
   0.1442 (  0.2%)   0.0163 (  0.2%)   0.1605 (  0.2%)   0.1612 (  0.2%)  Control Flow Optimizer
   0.1316 (  0.2%)   0.0145 (  0.2%)   0.1461 (  0.2%)   0.1554 (  0.2%)  Dead Code Elimination
   0.1337 (  0.2%)   0.0174 (  0.2%)   0.1512 (  0.2%)   0.1526 (  0.2%)  Simplify the CFG #4
   0.1306 (  0.2%)   0.0177 (  0.2%)   0.1484 (  0.2%)   0.1502 (  0.2%)  Aggressive Dead Code Elimination
   0.1306 (  0.2%)   0.0171 (  0.2%)   0.1477 (  0.2%)   0.1493 (  0.2%)  Remove redundant instructions #3
   0.1311 (  0.2%)   0.0172 (  0.2%)   0.1483 (  0.2%)   0.1490 (  0.2%)  Machine Copy Propagation Pass
   0.1154 (  0.2%)   0.0311 (  0.3%)   0.1465 (  0.2%)   0.1470 (  0.2%)  Dominator Tree Construction #23
   0.1117 (  0.2%)   0.0258 (  0.3%)   0.1375 (  0.2%)   0.1432 (  0.2%)  Reassociate expressions
   0.1253 (  0.2%)   0.0169 (  0.2%)   0.1422 (  0.2%)   0.1432 (  0.2%)  Live Range Shrink
   0.1245 (  0.2%)   0.0150 (  0.2%)   0.1394 (  0.2%)   0.1398 (  0.2%)  X86 Execution Dependency Fix
   0.1219 (  0.2%)   0.0119 (  0.1%)   0.1338 (  0.2%)   0.1393 (  0.2%)  Rotate Loops
   0.1198 (  0.2%)   0.0165 (  0.2%)   0.1363 (  0.2%)   0.1380 (  0.2%)  Sparse Conditional Constant Propagation
   0.1218 (  0.2%)   0.0139 (  0.2%)   0.1358 (  0.2%)   0.1374 (  0.2%)  Simplify the CFG #5
   0.1169 (  0.2%)   0.0140 (  0.2%)   0.1309 (  0.2%)   0.1356 (  0.2%)  Final GC intrinsic lowering pass #2
   0.1250 (  0.2%)   0.0099 (  0.1%)   0.1349 (  0.2%)   0.1352 (  0.2%)  MachineDominator Tree Construction #8
   0.1131 (  0.2%)   0.0133 (  0.1%)   0.1264 (  0.2%)   0.1318 (  0.2%)  Recognize loop idioms
   0.1078 (  0.2%)   0.0235 (  0.3%)   0.1313 (  0.2%)   0.1316 (  0.2%)  Simplify the CFG
   0.1131 (  0.2%)   0.0167 (  0.2%)   0.1299 (  0.2%)   0.1305 (  0.2%)  MachinePostDominator Tree Construction #2
   0.1042 (  0.2%)   0.0250 (  0.3%)   0.1292 (  0.2%)   0.1293 (  0.2%)  Branch Probability Analysis #3
   0.1117 (  0.2%)   0.0134 (  0.1%)   0.1251 (  0.2%)   0.1261 (  0.2%)  Eliminate PHI nodes for register allocation #2
   0.1051 (  0.2%)   0.0181 (  0.2%)   0.1232 (  0.2%)   0.1237 (  0.2%)  MachinePostDominator Tree Construction
   0.1082 (  0.2%)   0.0112 (  0.1%)   0.1194 (  0.2%)   0.1201 (  0.2%)  Debug Variable Analysis
   0.1047 (  0.2%)   0.0133 (  0.1%)   0.1180 (  0.2%)   0.1186 (  0.2%)  Machine Copy Propagation Pass #2
   0.0986 (  0.2%)   0.0179 (  0.2%)   0.1165 (  0.2%)   0.1167 (  0.2%)  Expand Atomic instructions #2
   0.1032 (  0.2%)   0.0121 (  0.1%)   0.1153 (  0.2%)   0.1158 (  0.2%)  Early Machine Loop Invariant Code Motion
   0.0941 (  0.2%)   0.0156 (  0.2%)   0.1097 (  0.2%)   0.1102 (  0.2%)  Remove dead machine instructions
   0.0919 (  0.2%)   0.0136 (  0.1%)   0.1055 (  0.2%)   0.1059 (  0.2%)  MachinePostDominator Tree Construction #3
   0.0891 (  0.2%)   0.0103 (  0.1%)   0.0994 (  0.1%)   0.1057 (  0.2%)  Dominator Tree Construction #6
   0.0876 (  0.1%)   0.0116 (  0.1%)   0.0992 (  0.1%)   0.1051 (  0.2%)  Propagate (non-)rootedness information
   0.0898 (  0.2%)   0.0142 (  0.2%)   0.1040 (  0.2%)   0.1045 (  0.2%)  Merge disjoint stack slots
   0.0944 (  0.2%)   0.0084 (  0.1%)   0.1028 (  0.2%)   0.1031 (  0.2%)  Dominator Tree Construction #17
   0.0797 (  0.1%)   0.0181 (  0.2%)   0.0978 (  0.1%)   0.1017 (  0.1%)  Dominator Tree Construction #10
   0.0883 (  0.2%)   0.0066 (  0.1%)   0.0949 (  0.1%)   0.0989 (  0.1%)  Dominator Tree Construction #16
   0.0825 (  0.1%)   0.0133 (  0.1%)   0.0958 (  0.1%)   0.0964 (  0.1%)  Machine InstCombiner
   0.0830 (  0.1%)   0.0090 (  0.1%)   0.0920 (  0.1%)   0.0960 (  0.1%)  Dominator Tree Construction #15
   0.0826 (  0.1%)   0.0129 (  0.1%)   0.0955 (  0.1%)   0.0956 (  0.1%)  Dominator Tree Construction #19
   0.0858 (  0.1%)   0.0086 (  0.1%)   0.0944 (  0.1%)   0.0955 (  0.1%)  Loop Load Elimination
   0.0708 (  0.1%)   0.0233 (  0.3%)   0.0941 (  0.1%)   0.0946 (  0.1%)  Free MachineFunction #2
   0.0801 (  0.1%)   0.0136 (  0.1%)   0.0936 (  0.1%)   0.0937 (  0.1%)  Machine Block Frequency Analysis
   0.0746 (  0.1%)   0.0151 (  0.2%)   0.0897 (  0.1%)   0.0912 (  0.1%)  Dominator Tree Construction #12
   0.0824 (  0.1%)   0.0084 (  0.1%)   0.0908 (  0.1%)   0.0909 (  0.1%)  Prologue/Epilogue Insertion & Frame Finalization
   0.0814 (  0.1%)   0.0059 (  0.1%)   0.0873 (  0.1%)   0.0907 (  0.1%)  Inliner for always_inline functions #2
   0.0745 (  0.1%)   0.0097 (  0.1%)   0.0842 (  0.1%)   0.0898 (  0.1%)  Dominator Tree Construction #7
   0.0740 (  0.1%)   0.0152 (  0.2%)   0.0892 (  0.1%)   0.0897 (  0.1%)  Dominator Tree Construction #24
   0.0808 (  0.1%)   0.0051 (  0.1%)   0.0859 (  0.1%)   0.0891 (  0.1%)  Dominator Tree Construction #8
   0.0779 (  0.1%)   0.0110 (  0.1%)   0.0889 (  0.1%)   0.0888 (  0.1%)  Branch Probability Analysis #2
   0.0772 (  0.1%)   0.0104 (  0.1%)   0.0876 (  0.1%)   0.0881 (  0.1%)  Machine Block Frequency Analysis #2
   0.0750 (  0.1%)   0.0116 (  0.1%)   0.0867 (  0.1%)   0.0871 (  0.1%)  MachineDominator Tree Construction #2
   0.0741 (  0.1%)   0.0107 (  0.1%)   0.0848 (  0.1%)   0.0862 (  0.1%)  Branch Probability Analysis
   0.0750 (  0.1%)   0.0103 (  0.1%)   0.0852 (  0.1%)   0.0860 (  0.1%)  Remove dead machine instructions #2
   0.0720 (  0.1%)   0.0120 (  0.1%)   0.0840 (  0.1%)   0.0852 (  0.1%)  Dominator Tree Construction #14
   0.0779 (  0.1%)   0.0071 (  0.1%)   0.0850 (  0.1%)   0.0851 (  0.1%)  Live DEBUG_VALUE analysis
   0.0846 (  0.1%)   0.0000 (  0.0%)   0.0846 (  0.1%)   0.0847 (  0.1%)  LowerPTLS Pass #2
   0.0776 (  0.1%)   0.0058 (  0.1%)   0.0834 (  0.1%)   0.0838 (  0.1%)  X86 vzeroupper inserter #2
   0.0732 (  0.1%)   0.0103 (  0.1%)   0.0835 (  0.1%)   0.0834 (  0.1%)  Dominator Tree Construction #20
   0.0715 (  0.1%)   0.0107 (  0.1%)   0.0822 (  0.1%)   0.0833 (  0.1%)  Post-Dominator Tree Construction
   0.0774 (  0.1%)   0.0000 (  0.0%)   0.0774 (  0.1%)   0.0816 (  0.1%)  CallGraph Construction #2
   0.0619 (  0.1%)   0.0195 (  0.2%)   0.0814 (  0.1%)   0.0815 (  0.1%)  Lower Julia Exception Handlers #2
   0.0689 (  0.1%)   0.0119 (  0.1%)   0.0808 (  0.1%)   0.0807 (  0.1%)  Constant Hoisting
   0.0739 (  0.1%)   0.0063 (  0.1%)   0.0802 (  0.1%)   0.0803 (  0.1%)  Dominator Tree Construction #18
   0.0789 (  0.1%)   0.0000 (  0.0%)   0.0789 (  0.1%)   0.0791 (  0.1%)  CallGraph Construction #3
   0.0687 (  0.1%)   0.0082 (  0.1%)   0.0768 (  0.1%)   0.0774 (  0.1%)  MachineDominator Tree Construction #7
   0.0670 (  0.1%)   0.0091 (  0.1%)   0.0761 (  0.1%)   0.0765 (  0.1%)  Machine Block Frequency Analysis #3
   0.0681 (  0.1%)   0.0079 (  0.1%)   0.0760 (  0.1%)   0.0764 (  0.1%)  Slot index numbering #2
   0.0676 (  0.1%)   0.0081 (  0.1%)   0.0757 (  0.1%)   0.0763 (  0.1%)  MachineDominator Tree Construction #6
   0.0625 (  0.1%)   0.0121 (  0.1%)   0.0746 (  0.1%)   0.0760 (  0.1%)  Block Frequency Analysis
   0.0640 (  0.1%)   0.0089 (  0.1%)   0.0729 (  0.1%)   0.0757 (  0.1%)  Loop-Closed SSA Form Pass
   0.0627 (  0.1%)   0.0114 (  0.1%)   0.0741 (  0.1%)   0.0755 (  0.1%)  MemCpy Optimization #2
   0.0652 (  0.1%)   0.0093 (  0.1%)   0.0745 (  0.1%)   0.0745 (  0.1%)  Dominator Tree Construction #21
   0.0598 (  0.1%)   0.0136 (  0.1%)   0.0733 (  0.1%)   0.0737 (  0.1%)  Natural Loop Information #16
   0.0650 (  0.1%)   0.0069 (  0.1%)   0.0719 (  0.1%)   0.0726 (  0.1%)  Loop-Closed SSA Form Pass #3
   0.0608 (  0.1%)   0.0107 (  0.1%)   0.0715 (  0.1%)   0.0715 (  0.1%)  Block Frequency Analysis #2
   0.0619 (  0.1%)   0.0092 (  0.1%)   0.0711 (  0.1%)   0.0712 (  0.1%)  Dominator Tree Construction #22
   0.0605 (  0.1%)   0.0093 (  0.1%)   0.0698 (  0.1%)   0.0703 (  0.1%)  Slot index numbering
   0.0595 (  0.1%)   0.0090 (  0.1%)   0.0686 (  0.1%)   0.0696 (  0.1%)  Dominator Tree Construction #13
   0.0550 (  0.1%)   0.0137 (  0.2%)   0.0688 (  0.1%)   0.0692 (  0.1%)  Function Alias Analysis Results #17
   0.0618 (  0.1%)   0.0063 (  0.1%)   0.0681 (  0.1%)   0.0685 (  0.1%)  Stack Slot Coloring
   0.0512 (  0.1%)   0.0142 (  0.2%)   0.0654 (  0.1%)   0.0681 (  0.1%)  SROA #2
   0.0583 (  0.1%)   0.0086 (  0.1%)   0.0669 (  0.1%)   0.0672 (  0.1%)  MachineDominator Tree Construction #3
   0.0589 (  0.1%)   0.0072 (  0.1%)   0.0660 (  0.1%)   0.0664 (  0.1%)  MachineDominator Tree Construction #5
   0.0594 (  0.1%)   0.0053 (  0.1%)   0.0647 (  0.1%)   0.0653 (  0.1%)  Loop-Closed SSA Form Pass #4
   0.0529 (  0.1%)   0.0103 (  0.1%)   0.0632 (  0.1%)   0.0645 (  0.1%)  Natural Loop Information #7
   0.0537 (  0.1%)   0.0099 (  0.1%)   0.0636 (  0.1%)   0.0643 (  0.1%)  Remove unreachable machine basic blocks
   0.0550 (  0.1%)   0.0085 (  0.1%)   0.0635 (  0.1%)   0.0638 (  0.1%)  X86 LEA Optimize
   0.0549 (  0.1%)   0.0075 (  0.1%)   0.0623 (  0.1%)   0.0628 (  0.1%)  Machine Block Frequency Analysis #4
   0.0543 (  0.1%)   0.0078 (  0.1%)   0.0621 (  0.1%)   0.0624 (  0.1%)  MachineDominator Tree Construction #4
   0.0531 (  0.1%)   0.0093 (  0.1%)   0.0624 (  0.1%)   0.0622 (  0.1%)  Scalar Evolution Analysis #8
   0.0544 (  0.1%)   0.0059 (  0.1%)   0.0603 (  0.1%)   0.0608 (  0.1%)  Unswitch loops
   0.0510 (  0.1%)   0.0087 (  0.1%)   0.0596 (  0.1%)   0.0600 (  0.1%)  Machine Natural Loop Construction
   0.0507 (  0.1%)   0.0083 (  0.1%)   0.0590 (  0.1%)   0.0591 (  0.1%)  Natural Loop Information #13
   0.0490 (  0.1%)   0.0091 (  0.1%)   0.0581 (  0.1%)   0.0582 (  0.1%)  Expand memcmp() to load/stores
   0.0524 (  0.1%)   0.0054 (  0.1%)   0.0579 (  0.1%)   0.0578 (  0.1%)  Natural Loop Information #12
   0.0488 (  0.1%)   0.0054 (  0.1%)   0.0542 (  0.1%)   0.0563 (  0.1%)  Natural Loop Information #11
   0.0459 (  0.1%)   0.0096 (  0.1%)   0.0555 (  0.1%)   0.0559 (  0.1%)  Scalar Evolution Analysis #3
   0.0515 (  0.1%)   0.0041 (  0.0%)   0.0556 (  0.1%)   0.0558 (  0.1%)  Unroll loops
   0.0396 (  0.1%)   0.0139 (  0.2%)   0.0535 (  0.1%)   0.0555 (  0.1%)  Scalar Evolution Analysis
   0.0488 (  0.1%)   0.0036 (  0.0%)   0.0524 (  0.1%)   0.0544 (  0.1%)  Natural Loop Information #2
   0.0469 (  0.1%)   0.0066 (  0.1%)   0.0535 (  0.1%)   0.0539 (  0.1%)  Machine Natural Loop Construction #4
   0.0420 (  0.1%)   0.0110 (  0.1%)   0.0530 (  0.1%)   0.0533 (  0.1%)  Exception handling preparation #2
   0.0449 (  0.1%)   0.0066 (  0.1%)   0.0514 (  0.1%)   0.0521 (  0.1%)  Machine Natural Loop Construction #3
   0.0429 (  0.1%)   0.0074 (  0.1%)   0.0503 (  0.1%)   0.0508 (  0.1%)  X86 EFLAGS copy lowering #2
   0.0417 (  0.1%)   0.0087 (  0.1%)   0.0504 (  0.1%)   0.0507 (  0.1%)  Dominator Tree Construction #9
   0.0432 (  0.1%)   0.0067 (  0.1%)   0.0498 (  0.1%)   0.0504 (  0.1%)  PostRA Machine Sink
   0.0430 (  0.1%)   0.0066 (  0.1%)   0.0496 (  0.1%)   0.0497 (  0.1%)  Shrink Wrapping analysis
   0.0433 (  0.1%)   0.0057 (  0.1%)   0.0489 (  0.1%)   0.0494 (  0.1%)  Machine Natural Loop Construction #5
   0.0415 (  0.1%)   0.0069 (  0.1%)   0.0484 (  0.1%)   0.0488 (  0.1%)  X86 cmov Conversion
   0.0371 (  0.1%)   0.0093 (  0.1%)   0.0463 (  0.1%)   0.0487 (  0.1%)  Natural Loop Information #4
   0.0412 (  0.1%)   0.0067 (  0.1%)   0.0479 (  0.1%)   0.0486 (  0.1%)  Natural Loop Information #8
   0.0418 (  0.1%)   0.0064 (  0.1%)   0.0482 (  0.1%)   0.0486 (  0.1%)  BreakFalseDeps
   0.0405 (  0.1%)   0.0067 (  0.1%)   0.0472 (  0.1%)   0.0482 (  0.1%)  Natural Loop Information #9
   0.0424 (  0.1%)   0.0056 (  0.1%)   0.0481 (  0.1%)   0.0481 (  0.1%)  Two-Address instruction pass
   0.0420 (  0.1%)   0.0057 (  0.1%)   0.0477 (  0.1%)   0.0481 (  0.1%)  Machine Loop Invariant Code Motion
   0.0413 (  0.1%)   0.0063 (  0.1%)   0.0476 (  0.1%)   0.0479 (  0.1%)  Post-RA pseudo instruction expansion pass #2
   0.0369 (  0.1%)   0.0087 (  0.1%)   0.0457 (  0.1%)   0.0478 (  0.1%)  Natural Loop Information #5
   0.0385 (  0.1%)   0.0078 (  0.1%)   0.0463 (  0.1%)   0.0474 (  0.1%)  SROA #3
   0.0403 (  0.1%)   0.0050 (  0.1%)   0.0453 (  0.1%)   0.0473 (  0.1%)  Dead Code Elimination #2
   0.0371 (  0.1%)   0.0095 (  0.1%)   0.0466 (  0.1%)   0.0472 (  0.1%)  Scalar Evolution Analysis #4
   0.0385 (  0.1%)   0.0064 (  0.1%)   0.0449 (  0.1%)   0.0466 (  0.1%)  Function Alias Analysis Results #14
   0.0367 (  0.1%)   0.0093 (  0.1%)   0.0460 (  0.1%)   0.0465 (  0.1%)  Scalar Evolution Analysis #5
   0.0393 (  0.1%)   0.0071 (  0.1%)   0.0463 (  0.1%)   0.0464 (  0.1%)  Canonicalize natural loops #6
   0.0369 (  0.1%)   0.0091 (  0.1%)   0.0460 (  0.1%)   0.0462 (  0.1%)  Basic Alias Analysis (stateless AA impl) #19
   0.0396 (  0.1%)   0.0061 (  0.1%)   0.0457 (  0.1%)   0.0458 (  0.1%)  Natural Loop Information #14
   0.0359 (  0.1%)   0.0089 (  0.1%)   0.0449 (  0.1%)   0.0456 (  0.1%)  Scalar Evolution Analysis #7
   0.0382 (  0.1%)   0.0062 (  0.1%)   0.0444 (  0.1%)   0.0448 (  0.1%)  Machine Natural Loop Construction #2
   0.0380 (  0.1%)   0.0057 (  0.1%)   0.0437 (  0.1%)   0.0444 (  0.1%)  X86 LEA Fixup
   0.0374 (  0.1%)   0.0063 (  0.1%)   0.0437 (  0.1%)   0.0443 (  0.1%)  Natural Loop Information #10
   0.0352 (  0.1%)   0.0081 (  0.1%)   0.0433 (  0.1%)   0.0434 (  0.1%)  Function Alias Analysis Results #16
   0.0367 (  0.1%)   0.0065 (  0.1%)   0.0432 (  0.1%)   0.0433 (  0.1%)  Natural Loop Information #15
   0.0360 (  0.1%)   0.0063 (  0.1%)   0.0423 (  0.1%)   0.0429 (  0.1%)  Canonicalize natural loops #4
   0.0357 (  0.1%)   0.0062 (  0.1%)   0.0418 (  0.1%)   0.0421 (  0.1%)  Finalize ISel and expand pseudo-instructions #2
   0.0317 (  0.1%)   0.0087 (  0.1%)   0.0404 (  0.1%)   0.0421 (  0.1%)  Canonicalize natural loops
   0.0326 (  0.1%)   0.0087 (  0.1%)   0.0413 (  0.1%)   0.0419 (  0.1%)  Scalar Evolution Analysis #2
   0.0350 (  0.1%)   0.0061 (  0.1%)   0.0411 (  0.1%)   0.0415 (  0.1%)  Early Tail Duplication
   0.0324 (  0.1%)   0.0085 (  0.1%)   0.0409 (  0.1%)   0.0414 (  0.1%)  Function Alias Analysis Results #12
   0.0318 (  0.1%)   0.0083 (  0.1%)   0.0401 (  0.1%)   0.0411 (  0.1%)  Function Alias Analysis Results #8
   0.0358 (  0.1%)   0.0051 (  0.1%)   0.0409 (  0.1%)   0.0411 (  0.1%)  Function Alias Analysis Results #15
   0.0351 (  0.1%)   0.0056 (  0.1%)   0.0408 (  0.1%)   0.0411 (  0.1%)  Check CFA info and insert CFI instructions if needed #2
   0.0323 (  0.1%)   0.0080 (  0.1%)   0.0403 (  0.1%)   0.0407 (  0.1%)  Scalar Evolution Analysis #6
   0.0275 (  0.0%)   0.0110 (  0.1%)   0.0385 (  0.1%)   0.0404 (  0.1%)  Function Alias Analysis Results #4
   0.0311 (  0.1%)   0.0082 (  0.1%)   0.0393 (  0.1%)   0.0402 (  0.1%)  Function Alias Analysis Results #10
   0.0351 (  0.1%)   0.0037 (  0.0%)   0.0388 (  0.1%)   0.0400 (  0.1%)  Function Alias Analysis Results #2
   0.0311 (  0.1%)   0.0081 (  0.1%)   0.0392 (  0.1%)   0.0399 (  0.1%)  Function Alias Analysis Results #7
   0.0308 (  0.1%)   0.0081 (  0.1%)   0.0389 (  0.1%)   0.0399 (  0.1%)  Function Alias Analysis Results #5
   0.0333 (  0.1%)   0.0066 (  0.1%)   0.0399 (  0.1%)   0.0398 (  0.1%)  Partially inline calls to library functions
   0.0271 (  0.0%)   0.0109 (  0.1%)   0.0380 (  0.1%)   0.0397 (  0.1%)  Function Alias Analysis Results #3
   0.0337 (  0.1%)   0.0054 (  0.1%)   0.0390 (  0.1%)   0.0396 (  0.1%)  X86 pseudo instruction expansion pass #2
   0.0304 (  0.1%)   0.0081 (  0.1%)   0.0384 (  0.1%)   0.0394 (  0.1%)  Function Alias Analysis Results #11
   0.0333 (  0.1%)   0.0053 (  0.1%)   0.0386 (  0.1%)   0.0394 (  0.1%)  X86 Optimize Call Frame
   0.0305 (  0.1%)   0.0081 (  0.1%)   0.0386 (  0.1%)   0.0393 (  0.1%)  Function Alias Analysis Results #13
   0.0319 (  0.1%)   0.0072 (  0.1%)   0.0390 (  0.1%)   0.0389 (  0.1%)  Merge contiguous icmps into a memcmp
   0.0324 (  0.1%)   0.0062 (  0.1%)   0.0386 (  0.1%)   0.0389 (  0.1%)  Live Register Matrix
   0.0300 (  0.1%)   0.0080 (  0.1%)   0.0380 (  0.1%)   0.0389 (  0.1%)  Function Alias Analysis Results #6
   0.0297 (  0.1%)   0.0083 (  0.1%)   0.0380 (  0.1%)   0.0388 (  0.1%)  Function Alias Analysis Results #9
   0.0298 (  0.1%)   0.0069 (  0.1%)   0.0367 (  0.1%)   0.0375 (  0.1%)  Canonicalize natural loops #2
   0.0304 (  0.1%)   0.0064 (  0.1%)   0.0367 (  0.1%)   0.0368 (  0.1%)  Interleaved Access Pass
   0.0324 (  0.1%)   0.0041 (  0.0%)   0.0365 (  0.1%)   0.0365 (  0.1%)  Hoist/decompose integer division and remainder
   0.0308 (  0.1%)   0.0042 (  0.0%)   0.0350 (  0.1%)   0.0351 (  0.1%)  Dominator Tree Construction #2
   0.0301 (  0.1%)   0.0047 (  0.1%)   0.0348 (  0.1%)   0.0350 (  0.1%)  Remove unreachable blocks from the CFG #2
   0.0294 (  0.1%)   0.0051 (  0.1%)   0.0345 (  0.1%)   0.0345 (  0.1%)  Final GC intrinsic lowering pass
   0.0276 (  0.0%)   0.0053 (  0.1%)   0.0330 (  0.0%)   0.0338 (  0.0%)  Loop-Closed SSA Form Pass #2
   0.0270 (  0.0%)   0.0058 (  0.1%)   0.0327 (  0.0%)   0.0328 (  0.0%)  Expand reduction intrinsics #2
   0.0265 (  0.0%)   0.0055 (  0.1%)   0.0320 (  0.0%)   0.0327 (  0.0%)  Canonicalize natural loops #5
   0.0242 (  0.0%)   0.0061 (  0.1%)   0.0303 (  0.0%)   0.0326 (  0.0%)  Basic Alias Analysis (stateless AA impl) #3
   0.0263 (  0.0%)   0.0062 (  0.1%)   0.0325 (  0.0%)   0.0326 (  0.0%)  Inliner for always_inline functions
   0.0263 (  0.0%)   0.0055 (  0.1%)   0.0318 (  0.0%)   0.0323 (  0.0%)  Machine Trace Metrics
   0.0268 (  0.0%)   0.0044 (  0.0%)   0.0312 (  0.0%)   0.0317 (  0.0%)  Tail Duplication
   0.0260 (  0.0%)   0.0057 (  0.1%)   0.0316 (  0.0%)   0.0317 (  0.0%)  Scalarize Masked Memory Intrinsics #2
   0.0264 (  0.0%)   0.0051 (  0.1%)   0.0316 (  0.0%)   0.0315 (  0.0%)  MemCpy Optimization
   0.0260 (  0.0%)   0.0053 (  0.1%)   0.0313 (  0.0%)   0.0313 (  0.0%)  Natural Loop Information #3
   0.0258 (  0.0%)   0.0051 (  0.1%)   0.0309 (  0.0%)   0.0313 (  0.0%)  Post RA top-down list latency scheduler
   0.0247 (  0.0%)   0.0056 (  0.1%)   0.0303 (  0.0%)   0.0311 (  0.0%)  Canonicalize natural loops #3
   0.0290 (  0.0%)   0.0000 (  0.0%)   0.0290 (  0.0%)   0.0309 (  0.0%)  Merge Duplicate Global Constants #2
   0.0256 (  0.0%)   0.0047 (  0.1%)   0.0303 (  0.0%)   0.0306 (  0.0%)  Spill Code Placement Analysis
   0.0246 (  0.0%)   0.0050 (  0.1%)   0.0296 (  0.0%)   0.0304 (  0.0%)  Promote heap allocation to stack #3
   0.0248 (  0.0%)   0.0040 (  0.0%)   0.0288 (  0.0%)   0.0302 (  0.0%)  Basic Alias Analysis (stateless AA impl) #15
   0.0229 (  0.0%)   0.0069 (  0.1%)   0.0298 (  0.0%)   0.0300 (  0.0%)  Dominator Tree Construction #4
   0.0265 (  0.0%)   0.0022 (  0.0%)   0.0287 (  0.0%)   0.0298 (  0.0%)  Promote heap allocation to stack
   0.0255 (  0.0%)   0.0036 (  0.0%)   0.0291 (  0.0%)   0.0291 (  0.0%)  Basic Alias Analysis (stateless AA impl) #16
   0.0236 (  0.0%)   0.0055 (  0.1%)   0.0290 (  0.0%)   0.0291 (  0.0%)  Basic Alias Analysis (stateless AA impl) #17
   0.0238 (  0.0%)   0.0045 (  0.0%)   0.0283 (  0.0%)   0.0287 (  0.0%)  X86 Indirect Branch Tracking #2
   0.0224 (  0.0%)   0.0040 (  0.0%)   0.0264 (  0.0%)   0.0267 (  0.0%)  Bundle Machine CFG Edges #2
   0.0218 (  0.0%)   0.0043 (  0.0%)   0.0261 (  0.0%)   0.0262 (  0.0%)  Dominator Tree Construction
   0.0227 (  0.0%)   0.0024 (  0.0%)   0.0252 (  0.0%)   0.0260 (  0.0%)  Basic Alias Analysis (stateless AA impl) #4
   0.0201 (  0.0%)   0.0052 (  0.1%)   0.0254 (  0.0%)   0.0259 (  0.0%)  Basic Alias Analysis (stateless AA impl) #12
   0.0177 (  0.0%)   0.0070 (  0.1%)   0.0247 (  0.0%)   0.0258 (  0.0%)  Basic Alias Analysis (stateless AA impl) #5
   0.0214 (  0.0%)   0.0032 (  0.0%)   0.0245 (  0.0%)   0.0257 (  0.0%)  Memory Dependence Analysis #5
   0.0185 (  0.0%)   0.0060 (  0.1%)   0.0245 (  0.0%)   0.0256 (  0.0%)  Promote heap allocation to stack #2
   0.0158 (  0.0%)   0.0098 (  0.1%)   0.0255 (  0.0%)   0.0255 (  0.0%)  Free MachineFunction
   0.0195 (  0.0%)   0.0051 (  0.1%)   0.0246 (  0.0%)   0.0253 (  0.0%)  Basic Alias Analysis (stateless AA impl) #11
   0.0209 (  0.0%)   0.0037 (  0.0%)   0.0246 (  0.0%)   0.0250 (  0.0%)  X86 Fixup SetCC
   0.0219 (  0.0%)   0.0030 (  0.0%)   0.0249 (  0.0%)   0.0250 (  0.0%)  MachineDominator Tree Construction
   0.0208 (  0.0%)   0.0037 (  0.0%)   0.0245 (  0.0%)   0.0250 (  0.0%)  Process Implicit Definitions
   0.0207 (  0.0%)   0.0036 (  0.0%)   0.0243 (  0.0%)   0.0249 (  0.0%)  Bundle Machine CFG Edges #3
   0.0200 (  0.0%)   0.0046 (  0.1%)   0.0246 (  0.0%)   0.0249 (  0.0%)  Basic Alias Analysis (stateless AA impl) #18
   0.0200 (  0.0%)   0.0041 (  0.0%)   0.0241 (  0.0%)   0.0245 (  0.0%)  Optimize machine instruction PHIs
   0.0199 (  0.0%)   0.0043 (  0.0%)   0.0242 (  0.0%)   0.0245 (  0.0%)  Promote heap allocation to stack #4
   0.0166 (  0.0%)   0.0076 (  0.1%)   0.0242 (  0.0%)   0.0243 (  0.0%)  GC Invariant Verification Pass #2
   0.0195 (  0.0%)   0.0047 (  0.1%)   0.0242 (  0.0%)   0.0243 (  0.0%)  Expand indirectbr instructions #2
   0.0186 (  0.0%)   0.0048 (  0.1%)   0.0234 (  0.0%)   0.0240 (  0.0%)  Basic Alias Analysis (stateless AA impl) #10
   0.0187 (  0.0%)   0.0045 (  0.0%)   0.0232 (  0.0%)   0.0236 (  0.0%)  Memory Dependence Analysis #2
   0.0183 (  0.0%)   0.0047 (  0.1%)   0.0230 (  0.0%)   0.0230 (  0.0%)  Insert stack protectors
   0.0194 (  0.0%)   0.0034 (  0.0%)   0.0228 (  0.0%)   0.0229 (  0.0%)  Dominator Tree Construction #3
   0.0192 (  0.0%)   0.0036 (  0.0%)   0.0228 (  0.0%)   0.0224 (  0.0%)  Live Stack Slot Analysis
   0.0170 (  0.0%)   0.0046 (  0.1%)   0.0216 (  0.0%)   0.0221 (  0.0%)  Basic Alias Analysis (stateless AA impl) #14
   0.0181 (  0.0%)   0.0036 (  0.0%)   0.0217 (  0.0%)   0.0220 (  0.0%)  Virtual Register Map
   0.0173 (  0.0%)   0.0036 (  0.0%)   0.0209 (  0.0%)   0.0216 (  0.0%)  Local Dynamic TLS Access Clean-up
   0.0178 (  0.0%)   0.0036 (  0.0%)   0.0214 (  0.0%)   0.0216 (  0.0%)  Machine Optimization Remark Emitter #3
   0.0173 (  0.0%)   0.0034 (  0.0%)   0.0206 (  0.0%)   0.0213 (  0.0%)  Insert fentry calls #2
   0.0160 (  0.0%)   0.0044 (  0.0%)   0.0204 (  0.0%)   0.0211 (  0.0%)  Basic Alias Analysis (stateless AA impl) #13
   0.0142 (  0.0%)   0.0058 (  0.1%)   0.0200 (  0.0%)   0.0210 (  0.0%)  Basic Alias Analysis (stateless AA impl) #6
   0.0160 (  0.0%)   0.0044 (  0.0%)   0.0204 (  0.0%)   0.0208 (  0.0%)  Basic Alias Analysis (stateless AA impl) #7
   0.0172 (  0.0%)   0.0028 (  0.0%)   0.0200 (  0.0%)   0.0207 (  0.0%)  Phi Values Analysis #5
   0.0157 (  0.0%)   0.0043 (  0.0%)   0.0200 (  0.0%)   0.0207 (  0.0%)  Basic Alias Analysis (stateless AA impl) #8
   0.0167 (  0.0%)   0.0036 (  0.0%)   0.0202 (  0.0%)   0.0205 (  0.0%)  Early If-Conversion
   0.0165 (  0.0%)   0.0033 (  0.0%)   0.0198 (  0.0%)   0.0204 (  0.0%)  Machine Optimization Remark Emitter #4
   0.0161 (  0.0%)   0.0036 (  0.0%)   0.0198 (  0.0%)   0.0203 (  0.0%)  Machine Optimization Remark Emitter #5
   0.0160 (  0.0%)   0.0039 (  0.0%)   0.0199 (  0.0%)   0.0203 (  0.0%)  Lazy Value Information Analysis #2
   0.0180 (  0.0%)   0.0022 (  0.0%)   0.0202 (  0.0%)   0.0203 (  0.0%)  Combine mul and add to muladd
   0.0159 (  0.0%)   0.0032 (  0.0%)   0.0192 (  0.0%)   0.0197 (  0.0%)  Implement the 'patchable-function' attribute #2
   0.0159 (  0.0%)   0.0034 (  0.0%)   0.0193 (  0.0%)   0.0196 (  0.0%)  X86 Domain Reassignment Pass
   0.0156 (  0.0%)   0.0032 (  0.0%)   0.0188 (  0.0%)   0.0196 (  0.0%)  Rename Disconnected Subregister Components
   0.0156 (  0.0%)   0.0032 (  0.0%)   0.0188 (  0.0%)   0.0193 (  0.0%)  X86 Atom pad short functions
   0.0157 (  0.0%)   0.0032 (  0.0%)   0.0189 (  0.0%)   0.0191 (  0.0%)  X86 Retpoline Thunks #2
   0.0132 (  0.0%)   0.0049 (  0.1%)   0.0182 (  0.0%)   0.0191 (  0.0%)  Lazy Value Information Analysis
   0.0153 (  0.0%)   0.0030 (  0.0%)   0.0184 (  0.0%)   0.0190 (  0.0%)  Contiguously Lay Out Funclets #2
   0.0146 (  0.0%)   0.0037 (  0.0%)   0.0183 (  0.0%)   0.0189 (  0.0%)  Phi Values Analysis #2
   0.0152 (  0.0%)   0.0031 (  0.0%)   0.0183 (  0.0%)   0.0188 (  0.0%)  Insert XRay ops #2
   0.0152 (  0.0%)   0.0030 (  0.0%)   0.0183 (  0.0%)   0.0187 (  0.0%)  X86 FP Stackifier #2
   0.0150 (  0.0%)   0.0031 (  0.0%)   0.0181 (  0.0%)   0.0186 (  0.0%)  X86 speculative load hardening #2
   0.0150 (  0.0%)   0.0032 (  0.0%)   0.0182 (  0.0%)   0.0186 (  0.0%)  X86 PIC Global Base Reg Initialization #2
   0.0148 (  0.0%)   0.0032 (  0.0%)   0.0181 (  0.0%)   0.0186 (  0.0%)  Local Stack Slot Allocation #2
   0.0149 (  0.0%)   0.0030 (  0.0%)   0.0178 (  0.0%)   0.0185 (  0.0%)  Lazy Machine Block Frequency Analysis #3
   0.0143 (  0.0%)   0.0037 (  0.0%)   0.0180 (  0.0%)   0.0185 (  0.0%)  Memory Dependence Analysis #4
   0.0147 (  0.0%)   0.0030 (  0.0%)   0.0177 (  0.0%)   0.0184 (  0.0%)  X86 Discriminate Memory Operands #2
   0.0149 (  0.0%)   0.0030 (  0.0%)   0.0180 (  0.0%)   0.0184 (  0.0%)  StackMap Liveness Analysis #2
   0.0147 (  0.0%)   0.0034 (  0.0%)   0.0182 (  0.0%)   0.0183 (  0.0%)  Lazy Machine Block Frequency Analysis #5
   0.0149 (  0.0%)   0.0031 (  0.0%)   0.0180 (  0.0%)   0.0183 (  0.0%)  X86 WinAlloca Expander #2
   0.0144 (  0.0%)   0.0035 (  0.0%)   0.0179 (  0.0%)   0.0182 (  0.0%)  Loop Access Analysis #2
   0.0140 (  0.0%)   0.0038 (  0.0%)   0.0178 (  0.0%)   0.0182 (  0.0%)  Loop Access Analysis
   0.0147 (  0.0%)   0.0031 (  0.0%)   0.0178 (  0.0%)   0.0182 (  0.0%)  X86 Avoid Store Forwarding Blocks
   0.0139 (  0.0%)   0.0037 (  0.0%)   0.0177 (  0.0%)   0.0180 (  0.0%)  Memory Dependence Analysis #3
   0.0146 (  0.0%)   0.0030 (  0.0%)   0.0176 (  0.0%)   0.0179 (  0.0%)  Analyze Machine Code For Garbage Collection #2
   0.0145 (  0.0%)   0.0025 (  0.0%)   0.0170 (  0.0%)   0.0179 (  0.0%)  Lazy Branch Probability Analysis #9
   0.0142 (  0.0%)   0.0030 (  0.0%)   0.0172 (  0.0%)   0.0177 (  0.0%)  Lazy Machine Block Frequency Analysis #4
   0.0156 (  0.0%)   0.0020 (  0.0%)   0.0176 (  0.0%)   0.0176 (  0.0%)  Eliminate PHI nodes for register allocation
   0.0135 (  0.0%)   0.0036 (  0.0%)   0.0172 (  0.0%)   0.0175 (  0.0%)  Basic Alias Analysis (stateless AA impl) #9
   0.0141 (  0.0%)   0.0025 (  0.0%)   0.0165 (  0.0%)   0.0173 (  0.0%)  Optimization Remark Emitter #10
   0.0117 (  0.0%)   0.0047 (  0.1%)   0.0165 (  0.0%)   0.0173 (  0.0%)  Lazy Branch Probability Analysis #2
   0.0141 (  0.0%)   0.0028 (  0.0%)   0.0168 (  0.0%)   0.0173 (  0.0%)  Compressing EVEX instrs to VEX encoding when possible
   0.0138 (  0.0%)   0.0029 (  0.0%)   0.0167 (  0.0%)   0.0172 (  0.0%)  Detect Dead Lanes
   0.0138 (  0.0%)   0.0028 (  0.0%)   0.0166 (  0.0%)   0.0171 (  0.0%)  X86 Insert Cache Prefetches #2
   0.0139 (  0.0%)   0.0024 (  0.0%)   0.0163 (  0.0%)   0.0171 (  0.0%)  Lazy Block Frequency Analysis #9
   0.0130 (  0.0%)   0.0035 (  0.0%)   0.0165 (  0.0%)   0.0169 (  0.0%)  Phi Values Analysis #4
   0.0115 (  0.0%)   0.0046 (  0.1%)   0.0160 (  0.0%)   0.0168 (  0.0%)  Optimization Remark Emitter #2
   0.0140 (  0.0%)   0.0028 (  0.0%)   0.0169 (  0.0%)   0.0168 (  0.0%)  Expand Atomic instructions
   0.0127 (  0.0%)   0.0034 (  0.0%)   0.0161 (  0.0%)   0.0167 (  0.0%)  Lazy Branch Probability Analysis #3
   0.0129 (  0.0%)   0.0034 (  0.0%)   0.0163 (  0.0%)   0.0166 (  0.0%)  Safe Stack instrumentation pass #2
   0.0126 (  0.0%)   0.0034 (  0.0%)   0.0160 (  0.0%)   0.0166 (  0.0%)  Lazy Branch Probability Analysis #4
   0.0144 (  0.0%)   0.0016 (  0.0%)   0.0160 (  0.0%)   0.0166 (  0.0%)  Lazy Branch Probability Analysis
   0.0127 (  0.0%)   0.0035 (  0.0%)   0.0162 (  0.0%)   0.0166 (  0.0%)  Lazy Branch Probability Analysis #6
   0.0145 (  0.0%)   0.0021 (  0.0%)   0.0166 (  0.0%)   0.0165 (  0.0%)  Post-RA pseudo instruction expansion pass
   0.0128 (  0.0%)   0.0034 (  0.0%)   0.0162 (  0.0%)   0.0165 (  0.0%)  Lazy Branch Probability Analysis #5
   0.0125 (  0.0%)   0.0034 (  0.0%)   0.0159 (  0.0%)   0.0165 (  0.0%)  Optimization Remark Emitter #9
   0.0127 (  0.0%)   0.0034 (  0.0%)   0.0161 (  0.0%)   0.0165 (  0.0%)  Phi Values Analysis #3
   0.0126 (  0.0%)   0.0035 (  0.0%)   0.0161 (  0.0%)   0.0165 (  0.0%)  Optimization Remark Emitter #6
   0.0126 (  0.0%)   0.0034 (  0.0%)   0.0160 (  0.0%)   0.0164 (  0.0%)  Optimization Remark Emitter #7
   0.0124 (  0.0%)   0.0034 (  0.0%)   0.0158 (  0.0%)   0.0164 (  0.0%)  Lazy Branch Probability Analysis #7
   0.0142 (  0.0%)   0.0021 (  0.0%)   0.0163 (  0.0%)   0.0163 (  0.0%)  Lazy Branch Probability Analysis #10
   0.0125 (  0.0%)   0.0034 (  0.0%)   0.0160 (  0.0%)   0.0163 (  0.0%)  Optimization Remark Emitter #4
   0.0131 (  0.0%)   0.0031 (  0.0%)   0.0162 (  0.0%)   0.0163 (  0.0%)  Instrument function entry/exit with calls to e.g. mcount() (post inlining) #2
   0.0126 (  0.0%)   0.0035 (  0.0%)   0.0160 (  0.0%)   0.0163 (  0.0%)  Optimization Remark Emitter #8
   0.0123 (  0.0%)   0.0034 (  0.0%)   0.0157 (  0.0%)   0.0162 (  0.0%)  Lazy Branch Probability Analysis #8
   0.0125 (  0.0%)   0.0033 (  0.0%)   0.0159 (  0.0%)   0.0162 (  0.0%)  Lazy Block Frequency Analysis #6
   0.0111 (  0.0%)   0.0046 (  0.1%)   0.0156 (  0.0%)   0.0162 (  0.0%)  Lazy Block Frequency Analysis #2
   0.0125 (  0.0%)   0.0034 (  0.0%)   0.0160 (  0.0%)   0.0162 (  0.0%)  Optimization Remark Emitter #3
   0.0123 (  0.0%)   0.0033 (  0.0%)   0.0156 (  0.0%)   0.0162 (  0.0%)  Optimization Remark Emitter #5
   0.0125 (  0.0%)   0.0033 (  0.0%)   0.0159 (  0.0%)   0.0162 (  0.0%)  Demanded bits analysis
   0.0141 (  0.0%)   0.0019 (  0.0%)   0.0160 (  0.0%)   0.0160 (  0.0%)  X86 vzeroupper inserter
   0.0122 (  0.0%)   0.0034 (  0.0%)   0.0156 (  0.0%)   0.0160 (  0.0%)  Demanded bits analysis #2
   0.0132 (  0.0%)   0.0027 (  0.0%)   0.0159 (  0.0%)   0.0160 (  0.0%)  CallGraph Construction
   0.0139 (  0.0%)   0.0016 (  0.0%)   0.0155 (  0.0%)   0.0160 (  0.0%)  Lazy Block Frequency Analysis
   0.0138 (  0.0%)   0.0015 (  0.0%)   0.0153 (  0.0%)   0.0159 (  0.0%)  Optimization Remark Emitter
   0.0121 (  0.0%)   0.0033 (  0.0%)   0.0154 (  0.0%)   0.0159 (  0.0%)  Lazy Block Frequency Analysis #5
   0.0137 (  0.0%)   0.0020 (  0.0%)   0.0158 (  0.0%)   0.0157 (  0.0%)  Optimization Remark Emitter #11
   0.0136 (  0.0%)   0.0020 (  0.0%)   0.0156 (  0.0%)   0.0157 (  0.0%)  Lazy Block Frequency Analysis #10
   0.0119 (  0.0%)   0.0033 (  0.0%)   0.0152 (  0.0%)   0.0157 (  0.0%)  Lazy Block Frequency Analysis #8
   0.0120 (  0.0%)   0.0033 (  0.0%)   0.0153 (  0.0%)   0.0157 (  0.0%)  Lazy Block Frequency Analysis #4
   0.0120 (  0.0%)   0.0033 (  0.0%)   0.0152 (  0.0%)   0.0156 (  0.0%)  Lazy Block Frequency Analysis #3
   0.0120 (  0.0%)   0.0033 (  0.0%)   0.0152 (  0.0%)   0.0156 (  0.0%)  Lazy Block Frequency Analysis #7
   0.0136 (  0.0%)   0.0020 (  0.0%)   0.0156 (  0.0%)   0.0155 (  0.0%)  Finalize ISel and expand pseudo-instructions
   0.0106 (  0.0%)   0.0043 (  0.0%)   0.0149 (  0.0%)   0.0154 (  0.0%)  LCSSA Verifier
   0.0117 (  0.0%)   0.0032 (  0.0%)   0.0148 (  0.0%)   0.0153 (  0.0%)  LCSSA Verifier #4
   0.0130 (  0.0%)   0.0020 (  0.0%)   0.0150 (  0.0%)   0.0151 (  0.0%)  X86 EFLAGS copy lowering
   0.0115 (  0.0%)   0.0032 (  0.0%)   0.0147 (  0.0%)   0.0150 (  0.0%)  LCSSA Verifier #2
   0.0115 (  0.0%)   0.0032 (  0.0%)   0.0146 (  0.0%)   0.0150 (  0.0%)  LCSSA Verifier #3
   0.0118 (  0.0%)   0.0029 (  0.0%)   0.0147 (  0.0%)   0.0148 (  0.0%)  Lower Garbage Collection Instructions #2
   0.0118 (  0.0%)   0.0029 (  0.0%)   0.0147 (  0.0%)   0.0147 (  0.0%)  Shadow Stack GC Lowering #2
   0.0125 (  0.0%)   0.0019 (  0.0%)   0.0144 (  0.0%)   0.0144 (  0.0%)  LowerPTLS Pass
   0.0124 (  0.0%)   0.0017 (  0.0%)   0.0141 (  0.0%)   0.0143 (  0.0%)  LICM for julia specific intrinsics.
   0.0109 (  0.0%)   0.0015 (  0.0%)   0.0124 (  0.0%)   0.0124 (  0.0%)  X86 pseudo instruction expansion pass
   0.0089 (  0.0%)   0.0024 (  0.0%)   0.0113 (  0.0%)   0.0113 (  0.0%)  Merge Duplicate Global Constants
   0.0101 (  0.0%)   0.0012 (  0.0%)   0.0113 (  0.0%)   0.0113 (  0.0%)  Check CFA info and insert CFI instructions if needed
   0.0082 (  0.0%)   0.0015 (  0.0%)   0.0097 (  0.0%)   0.0097 (  0.0%)  Scalarize Masked Memory Intrinsics
   0.0084 (  0.0%)   0.0011 (  0.0%)   0.0095 (  0.0%)   0.0096 (  0.0%)  LICM for julia specific intrinsics. #2
   0.0084 (  0.0%)   0.0008 (  0.0%)   0.0092 (  0.0%)   0.0094 (  0.0%)  Delete dead loops
   0.0079 (  0.0%)   0.0014 (  0.0%)   0.0093 (  0.0%)   0.0093 (  0.0%)  Expand reduction intrinsics
   0.0076 (  0.0%)   0.0006 (  0.0%)   0.0082 (  0.0%)   0.0084 (  0.0%)  Delete dead loops #2
   0.0065 (  0.0%)   0.0012 (  0.0%)   0.0077 (  0.0%)   0.0077 (  0.0%)  Remove unreachable blocks from the CFG
   0.0064 (  0.0%)   0.0012 (  0.0%)   0.0076 (  0.0%)   0.0076 (  0.0%)  GC Invariant Verification Pass
   0.0058 (  0.0%)   0.0013 (  0.0%)   0.0072 (  0.0%)   0.0072 (  0.0%)  Function Alias Analysis Results
   0.0049 (  0.0%)   0.0015 (  0.0%)   0.0065 (  0.0%)   0.0065 (  0.0%)  Exception handling preparation
   0.0061 (  0.0%)   0.0000 (  0.0%)   0.0061 (  0.0%)   0.0061 (  0.0%)  Assumption Cache Tracker #2
   0.0041 (  0.0%)   0.0019 (  0.0%)   0.0060 (  0.0%)   0.0059 (  0.0%)  Assumption Cache Tracker
   0.0044 (  0.0%)   0.0008 (  0.0%)   0.0051 (  0.0%)   0.0052 (  0.0%)  Bundle Machine CFG Edges
   0.0040 (  0.0%)   0.0009 (  0.0%)   0.0050 (  0.0%)   0.0050 (  0.0%)  Remove non-integral address space.
   0.0033 (  0.0%)   0.0008 (  0.0%)   0.0041 (  0.0%)   0.0041 (  0.0%)  Lower Julia Exception Handlers
   0.0034 (  0.0%)   0.0007 (  0.0%)   0.0041 (  0.0%)   0.0041 (  0.0%)  X86 Indirect Branch Tracking
   0.0033 (  0.0%)   0.0007 (  0.0%)   0.0040 (  0.0%)   0.0041 (  0.0%)  Expand indirectbr instructions
   0.0029 (  0.0%)   0.0006 (  0.0%)   0.0036 (  0.0%)   0.0036 (  0.0%)  Memory Dependence Analysis
   0.0031 (  0.0%)   0.0006 (  0.0%)   0.0037 (  0.0%)   0.0036 (  0.0%)  Basic Alias Analysis (stateless AA impl) #2
   0.0029 (  0.0%)   0.0007 (  0.0%)   0.0036 (  0.0%)   0.0036 (  0.0%)  Basic Alias Analysis (stateless AA impl)
   0.0027 (  0.0%)   0.0009 (  0.0%)   0.0036 (  0.0%)   0.0036 (  0.0%)  Phi Values Analysis
   0.0029 (  0.0%)   0.0006 (  0.0%)   0.0035 (  0.0%)   0.0035 (  0.0%)  X86 PIC Global Base Reg Initialization
   0.0027 (  0.0%)   0.0005 (  0.0%)   0.0033 (  0.0%)   0.0033 (  0.0%)  Insert fentry calls
   0.0026 (  0.0%)   0.0006 (  0.0%)   0.0032 (  0.0%)   0.0033 (  0.0%)  Machine Optimization Remark Emitter
   0.0025 (  0.0%)   0.0006 (  0.0%)   0.0031 (  0.0%)   0.0032 (  0.0%)  Contiguously Lay Out Funclets
   0.0023 (  0.0%)   0.0006 (  0.0%)   0.0029 (  0.0%)   0.0030 (  0.0%)  Machine Optimization Remark Emitter #2
   0.0024 (  0.0%)   0.0005 (  0.0%)   0.0029 (  0.0%)   0.0030 (  0.0%)  X86 speculative load hardening
   0.0024 (  0.0%)   0.0005 (  0.0%)   0.0029 (  0.0%)   0.0029 (  0.0%)  Insert XRay ops
   0.0024 (  0.0%)   0.0005 (  0.0%)   0.0029 (  0.0%)   0.0029 (  0.0%)  X86 FP Stackifier
   0.0023 (  0.0%)   0.0005 (  0.0%)   0.0028 (  0.0%)   0.0029 (  0.0%)  StackMap Liveness Analysis
   0.0023 (  0.0%)   0.0005 (  0.0%)   0.0028 (  0.0%)   0.0029 (  0.0%)  Implement the 'patchable-function' attribute
   0.0023 (  0.0%)   0.0005 (  0.0%)   0.0028 (  0.0%)   0.0029 (  0.0%)  X86 WinAlloca Expander
   0.0023 (  0.0%)   0.0005 (  0.0%)   0.0028 (  0.0%)   0.0029 (  0.0%)  X86 Retpoline Thunks
   0.0024 (  0.0%)   0.0005 (  0.0%)   0.0028 (  0.0%)   0.0028 (  0.0%)  Instrument function entry/exit with calls to e.g. mcount() (post inlining)
   0.0021 (  0.0%)   0.0006 (  0.0%)   0.0028 (  0.0%)   0.0028 (  0.0%)  Lazy Machine Block Frequency Analysis #2
   0.0023 (  0.0%)   0.0005 (  0.0%)   0.0028 (  0.0%)   0.0028 (  0.0%)  Local Stack Slot Allocation
   0.0022 (  0.0%)   0.0005 (  0.0%)   0.0027 (  0.0%)   0.0028 (  0.0%)  Analyze Machine Code For Garbage Collection
   0.0021 (  0.0%)   0.0006 (  0.0%)   0.0027 (  0.0%)   0.0028 (  0.0%)  Safe Stack instrumentation pass
   0.0023 (  0.0%)   0.0005 (  0.0%)   0.0027 (  0.0%)   0.0028 (  0.0%)  Lazy Machine Block Frequency Analysis
   0.0022 (  0.0%)   0.0005 (  0.0%)   0.0027 (  0.0%)   0.0028 (  0.0%)  X86 Discriminate Memory Operands
   0.0022 (  0.0%)   0.0004 (  0.0%)   0.0026 (  0.0%)   0.0027 (  0.0%)  X86 Insert Cache Prefetches
   0.0021 (  0.0%)   0.0004 (  0.0%)   0.0025 (  0.0%)   0.0025 (  0.0%)  Shadow Stack GC Lowering
   0.0021 (  0.0%)   0.0004 (  0.0%)   0.0025 (  0.0%)   0.0025 (  0.0%)  Lower Garbage Collection Instructions
   0.0019 (  0.0%)   0.0004 (  0.0%)   0.0023 (  0.0%)   0.0023 (  0.0%)  Pre-ISel Intrinsic Lowering
   0.0020 (  0.0%)   0.0000 (  0.0%)   0.0020 (  0.0%)   0.0020 (  0.0%)  Pre-ISel Intrinsic Lowering #2
   0.0014 (  0.0%)   0.0003 (  0.0%)   0.0017 (  0.0%)   0.0018 (  0.0%)  LowerSIMDLoop Pass
   0.0013 (  0.0%)   0.0003 (  0.0%)   0.0015 (  0.0%)   0.0016 (  0.0%)  Rewrite Symbols
   0.0011 (  0.0%)   0.0002 (  0.0%)   0.0014 (  0.0%)   0.0014 (  0.0%)  LowerSIMDLoop Pass #2
   0.0011 (  0.0%)   0.0003 (  0.0%)   0.0014 (  0.0%)   0.0014 (  0.0%)  A No-Op Barrier Pass
   0.0005 (  0.0%)   0.0002 (  0.0%)   0.0007 (  0.0%)   0.0007 (  0.0%)  Create Garbage Collector Module Metadata
   0.0004 (  0.0%)   0.0002 (  0.0%)   0.0006 (  0.0%)   0.0006 (  0.0%)  Target Library Information
   0.0005 (  0.0%)   0.0001 (  0.0%)   0.0006 (  0.0%)   0.0006 (  0.0%)  Profile summary info
   0.0005 (  0.0%)   0.0002 (  0.0%)   0.0007 (  0.0%)   0.0006 (  0.0%)  Target Pass Configuration
   0.0005 (  0.0%)   0.0002 (  0.0%)   0.0007 (  0.0%)   0.0006 (  0.0%)  Target Transform Information
   0.0004 (  0.0%)   0.0002 (  0.0%)   0.0006 (  0.0%)   0.0006 (  0.0%)  Machine Module Information
   0.0004 (  0.0%)   0.0002 (  0.0%)   0.0006 (  0.0%)   0.0006 (  0.0%)  Machine Branch Probability Analysis
   0.0003 (  0.0%)   0.0000 (  0.0%)   0.0003 (  0.0%)   0.0003 (  0.0%)  LowerSIMDLoop Pass #3
   0.0002 (  0.0%)   0.0000 (  0.0%)   0.0002 (  0.0%)   0.0002 (  0.0%)  Dominator Tree Construction #11
   0.0001 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.0%)   0.0001 (  0.0%)  Dominator Tree Construction #5
   0.0001 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.0%)   0.0001 (  0.0%)  Natural Loop Information
   0.0001 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.0%)   0.0001 (  0.0%)  Natural Loop Information #6
   0.0001 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.0%)   0.0001 (  0.0%)  Remove non-integral address space. #2
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Rewrite Symbols #2
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Target Transform Information #2
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  A No-Op Barrier Pass #2
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Scoped NoAlias Alias Analysis
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Machine Branch Probability Analysis #2
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Target Pass Configuration #2
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Profile summary info #2
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Target Library Information #2
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Machine Module Information #2
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Type-Based Alias Analysis
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Create Garbage Collector Module Metadata #2
  58.5775 (100.0%)   9.0836 (100.0%)  67.6611 (100.0%)  68.3778 (100.0%)  Total

71.27user 11.43system 1:23.81elapsed 98%CPU (0avgtext+0avgdata 1111856maxresident)k
184inputs+196584outputs (0major+384503minor)pagefaults 0swaps

@chriselrod
Contributor

Would LLVM 11 be an option for Julia 1.6? RC-2 is out now, with 11-final planned for August 26th.
If so, it'd be worth benchmarking as well. There's been a lot of recent work on improving LLVM's speed, mainly for Rust's sake:
https://nikic.github.io/2020/05/10/Make-LLVM-fast-again.html
https://blog.mozilla.org/nnethercote/2020/08/05/how-to-speed-up-the-rust-compiler-some-more-in-2020/
Would be interesting to benchmark and see if it helps us, too.

I believe LLVM 11 also includes this: https://reviews.llvm.org/D75016
I'd like to at least have that patch in Julia 1.6. If Julia 1.6 won't ship with LLVM 11, I'll look into how to apply it (i.e., by looking at Valentin's and Yuyichao's PRs to see how it's done).
That patch should allow LoopVectorization to stop using an asm call for vfmadd231p(d/s) and just use @llvm.fmuladd instead.
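
As a rough illustration of the plain-Julia side of this (not LoopVectorization's actual code), a loop written with Base's muladd permits, but does not require, LLVM to fuse the multiply and add, and one can check the native code for FMA instructions. A minimal sketch; dot_muladd is a hypothetical example function, and the "vfmadd" check only makes sense on x86 CPUs with FMA:

```julia
using InteractiveUtils  # provides code_native outside the REPL

# Hypothetical example kernel: a dot product accumulated with muladd.
function dot_muladd(x, y)
    s = zero(eltype(x))
    @inbounds @simd for i in eachindex(x, y)
        s = muladd(x[i], y[i], s)
    end
    return s
end

# Dump the native code for Float64 vectors and look for fused multiply-add mnemonics.
io = IOBuffer()
code_native(io, dot_muladd, Tuple{Vector{Float64}, Vector{Float64}}; debuginfo = :none)
asm = String(take!(io))
println("FMA instructions present: ", occursin("vfmadd", asm))
```

Which vfmadd variant gets selected, if any, depends on the LLVM version and the target CPU.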

BTW, not particularly relevant to this issue, but the AVX-512 down-clocking problem has been almost completely solved as of Ice Lake: https://travisdowns.github.io/blog/2020/08/19/icl-avx512-freq.html

Instructions: scalar/light 128 bit - heavy 256 bit - heavy 512 bit (clock speeds below in GHz)
My cascadelake 1 core: 4.6 - 4.3 - 4.1
My cascadelake all-core: 4.6 - 4.3 - 4.1
Ice Lake i5-1035G4 1 core: 3.7 - 3.7 - 3.6
Ice Lake i5-1035G4 4 core: 3.3 - 3.3 - 3.3
There are other cases where LLVM generates poor AVX-512 code, such that 256 bit vectors would be faster than 512 bit even at the same clock speed.

I haven't found out how to use JULIA_LLVM_ARGS to actually change the preferred vector width, so I can't run benchmarks without also using different LLVM versions. LLVM's args are different from Clang's (where it would be -mprefer-default-vector-width=$BITS).

@yuyichao
Contributor

I assume you mean -mprefer-vector-width=.

It's not an LLVM command line option but a (global and per-function) target property. Adding prefer-256-bit to the CPU specification should work, i.e. julia -C "native,prefer-256-bit".
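
To check whether the setting takes effect, one can look at the vector types in the optimized IR of a simple loop. A minimal sketch, assuming Julia was started as julia -C "native,prefer-256-bit" on an AVX-512 machine; axpy! and vector_widths are hypothetical helper names, not part of Julia or this PR:

```julia
using InteractiveUtils  # provides code_llvm outside the REPL

# Hypothetical example kernel; any simple SIMD-friendly loop would do.
function axpy!(y, a, x)
    @inbounds @simd for i in eachindex(x, y)
        y[i] = muladd(a, x[i], y[i])
    end
    return y
end

# Collect the distinct vector lengths N from `<N x double>` types in the optimized IR.
function vector_widths(f, types)
    io = IOBuffer()
    code_llvm(io, f, types; debuginfo = :none)
    ir = String(take!(io))
    widths = Int[parse(Int, m.captures[1]) for m in eachmatch(r"<(\d+) x double>", ir)]
    return sort!(unique(widths))
end

widths = vector_widths(axpy!, Tuple{Vector{Float64}, Float64, Vector{Float64}})
println("Float64 vector lengths in the IR: ", isempty(widths) ? "none (not vectorized)" : widths)
```

A length of 8 corresponds to 512 bit vectors of Float64 and a length of 4 to 256 bit. Whether the preference is honored for a given loop is still up to LLVM's cost model.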

@chriselrod
Contributor

Yes, perfect, thanks!

@IanButterworth
Sponsor Member

@chriselrod I was thinking the same. Getting to LLVM 11 in 1.6 seems like a good idea on the face of it. 10 does seem to be the ugly duckling, and the work on speed since then looks positive, but is it probably still a good idea to work through 10 to get to 11?

There's also support for some newer ARM chips in LLVM 11, such as NVIDIA Carmel, that would be nice to get into the next LTS.

@chriselrod
Contributor

chriselrod commented Aug 23, 2020

10 does seem to be the ugly duckling

It may be the ugly duckling, but I did notice some improvements in several LoopVectorization examples, where integer/indexing code was handled better by LLVM 10 than by LLVM <=9. So there're some definite improvements (that probably apply to other examples as well?).

A simple example of where 256-bit performs much better than 512-bit:

julia> function trmv!(y, A, x)
           fill!(y, 0)
           @inbounds for n ∈ axes(A,2)
               @simd for m ∈ 1:n
                   y[m] += A[m,n] * x[n]
               end
           end
       end
trmv! (generic function with 1 method)

julia> M = 64; C = rand(M, M); x = rand(M); y = similar(x);

julia> @benchmark trmv!($y, $C, $x)
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     825.942 ns (0.00% GC)
  median time:      830.907 ns (0.00% GC)
  mean time:        838.359 ns (0.00% GC)
  maximum time:     1.315 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     86

Versus starting with -C "native,prefer-256-bit":

julia> function trmv!(y, A, x)
           fill!(y, 0)
           @inbounds for n ∈ axes(A,2)
               @simd for m ∈ 1:n
                   y[m] += A[m,n] * x[n]
               end
           end
       end
trmv! (generic function with 1 method)

julia> M = 64; C = rand(M, M); x = rand(M); y = similar(x);

julia> @benchmark trmv!($y, $C, $x)
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     508.031 ns (0.00% GC)
  median time:      509.016 ns (0.00% GC)
  mean time:        509.541 ns (0.00% GC)
  maximum time:     668.363 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     193

FWIW, replacing @simd with @avx on the inner loop yields 350ns with 512 bit vectors, which is why I emphasize that this is a problem with LLVM, not a problem with wide vectors.
@avx doesn't work on the outer loop yet because the loop is triangular, but I'll find the time to do this some day.
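
For reference, the loop nest in trmv! reads only the entries with m ≤ n, i.e. it computes an upper-triangular matrix-vector product. A minimal sketch checking that against LinearAlgebra's UpperTriangular; trmv_ref! is a hypothetical name, and unlike the original it returns y rather than nothing:

```julia
using LinearAlgebra

# Same loop nest as trmv! above, repeated so this block is self-contained.
function trmv_ref!(y, A, x)
    fill!(y, 0)
    @inbounds for n in axes(A, 2)
        @simd for m in 1:n
            y[m] += A[m, n] * x[n]
        end
    end
    return y
end

M = 64
C = rand(M, M); x = rand(M); y = similar(x)
trmv_ref!(y, C, x)

# Compare against the library's upper-triangular product (up to floating-point rounding).
println(y ≈ UpperTriangular(C) * x)
```

The comparison uses ≈ rather than == because @simd lets the compiler reorder the floating-point operations.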

@vchuravy
Sponsor Member Author

Note that this is LLVM 10.0.1, so lots of fixes might already have been backported. Going to LLVM 11 is feasible, but higher risk, and might involve more work on our side to sort out regressions.

If you really want to see LLVM 11 for 1.6, I would appreciate help with that. Getting a WIP on Yggdrasil for rc2 would be the first step, as well as looking at the patch list. Currently I do these upgrades when I am plagued by insomnia.

@chriselrod
Contributor

I've read a lot of LLVM IR, but that's very different from being familiar with LLVM itself.

What's the procedure for these upgrades, and how would I start? Which patch list? Julia's list of LLVM patches, to see which are no longer needed in LLVM 11?

@vchuravy
Sponsor Member Author

@nanosoldier runbenchmarks(ALL, vs = ":master")

@vchuravy
Sponsor Member Author

@nanosoldier runtests(ALL, vs = ":master")

@nanosoldier
Collaborator

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan

@vchuravy
Sponsor Member Author

["array", "index", "(\"sumeach\", \"SubArray{Int32, 2, BaseBenchmarks.ArrayBenchmarks.ArrayLS{Int32, 2}, Tuple{Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}}, false}\")"] 15.31 (50%) 1.00 (1%)

Definitely worth looking into

@nanosoldier
Collaborator

Your package evaluation job has completed - possible new issues were detected. A full report can be found here. cc @maleadt

@maleadt
Member

maleadt commented Sep 24, 2020

@nanosoldier runtests(["CartesianGeneticProgramming", "Contour", "DecFP", "Gridap", "IncompleteLU", "Infinity", "IntervalTrees", "MusicManipulations", "NormalSplines"], vs = ":master")

@nanosoldier
Collaborator

Your package evaluation job has completed - possible new issues were detected. A full report can be found here. cc @maleadt

@vchuravy vchuravy merged commit 864582c into master Sep 24, 2020
@vchuravy vchuravy deleted the vc/upgrade_llvm_10 branch September 24, 2020 19:05
Labels
external dependencies Involves LLVM, OpenBLAS, or other linked libraries

6 participants