Performance Pitfalls and How to Catch them #727
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
Coverage Diff | main | #727 | +/- |
---|---|---|---|
Coverage | 96.48% | 96.41% | -0.07% |
Files | 54 | 57 | +3 |
Lines | 2729 | 2791 | +62 |
Hits | 2633 | 2691 | +58 |
Misses | 96 | 100 | +4 |
Benchmark Results
Benchmark suite | Current: d1eca68 | Previous: 4505c9f | Ratio |
---|---|---|---|
Dense(2 => 2)/cpu/reverse/ReverseDiff (compiled)/(2, 128) | 3674.375 ns | 3640.5 ns | 1.01 |
Dense(2 => 2)/cpu/reverse/Zygote/(2, 128) | 7185.166666666667 ns | 7198.333333333333 ns | 1.00 |
Dense(2 => 2)/cpu/reverse/Tracker/(2, 128) | 21771 ns | 21049 ns | 1.03 |
Dense(2 => 2)/cpu/reverse/ReverseDiff/(2, 128) | 9878.5 ns | 9856.2 ns | 1.00 |
Dense(2 => 2)/cpu/reverse/Flux/(2, 128) | 9057 ns | 9172 ns | 0.99 |
Dense(2 => 2)/cpu/reverse/SimpleChains/(2, 128) | 4460.875 ns | 4541 ns | 0.98 |
Dense(2 => 2)/cpu/reverse/Enzyme/(2, 128) | 1158.8402777777778 ns | 1160.112676056338 ns | 1.00 |
Dense(2 => 2)/cpu/forward/NamedTuple/(2, 128) | 1120.9571428571428 ns | 1169.4358974358975 ns | 0.96 |
Dense(2 => 2)/cpu/forward/ComponentArray/(2, 128) | 1170.55 ns | 1184.4857142857143 ns | 0.99 |
Dense(2 => 2)/cpu/forward/Flux/(2, 128) | 1788.5357142857142 ns | 1778.4666666666667 ns | 1.01 |
Dense(2 => 2)/cpu/forward/SimpleChains/(2, 128) | 179.50980392156862 ns | 178.93653032440056 ns | 1.00 |
Dense(20 => 20)/cpu/reverse/ReverseDiff (compiled)/(20, 128) | 17363 ns | 17282 ns | 1.00 |
Dense(20 => 20)/cpu/reverse/Zygote/(20, 128) | 17012 ns | 17022 ns | 1.00 |
Dense(20 => 20)/cpu/reverse/Tracker/(20, 128) | 40336 ns | 39183 ns | 1.03 |
Dense(20 => 20)/cpu/reverse/ReverseDiff/(20, 128) | 29545 ns | 29245 ns | 1.01 |
Dense(20 => 20)/cpu/reverse/Flux/(20, 128) | 20128 ns | 21791 ns | 0.92 |
Dense(20 => 20)/cpu/reverse/SimpleChains/(20, 128) | 17373 ns | 17312 ns | 1.00 |
Dense(20 => 20)/cpu/reverse/Enzyme/(20, 128) | 4333.714285714285 ns | 4330.857142857143 ns | 1.00 |
Dense(20 => 20)/cpu/forward/NamedTuple/(20, 128) | 3888.5 ns | 3845.875 ns | 1.01 |
Dense(20 => 20)/cpu/forward/ComponentArray/(20, 128) | 3986.25 ns | 3932.25 ns | 1.01 |
Dense(20 => 20)/cpu/forward/Flux/(20, 128) | 4992.142857142857 ns | 4924.857142857143 ns | 1.01 |
Dense(20 => 20)/cpu/forward/SimpleChains/(20, 128) | 1657.1 ns | 1653.1 ns | 1.00 |
Conv((3, 3), 3 => 3)/cpu/reverse/ReverseDiff (compiled)/(64, 64, 3, 128) | 46633485 ns | 43688609.5 ns | 1.07 |
Conv((3, 3), 3 => 3)/cpu/reverse/Zygote/(64, 64, 3, 128) | 57691066 ns | 57875220 ns | 1.00 |
Conv((3, 3), 3 => 3)/cpu/reverse/Tracker/(64, 64, 3, 128) | 111480015 ns | 94212606.5 ns | 1.18 |
Conv((3, 3), 3 => 3)/cpu/reverse/ReverseDiff/(64, 64, 3, 128) | 102473763 ns | 92167603 ns | 1.11 |
Conv((3, 3), 3 => 3)/cpu/reverse/Flux/(64, 64, 3, 128) | 105468734 ns | 78485746 ns | 1.34 |
Conv((3, 3), 3 => 3)/cpu/reverse/SimpleChains/(64, 64, 3, 128) | 12023231.5 ns | 11740759.5 ns | 1.02 |
Conv((3, 3), 3 => 3)/cpu/reverse/Enzyme/(64, 64, 3, 128) | 8474373 ns | 8461836 ns | 1.00 |
Conv((3, 3), 3 => 3)/cpu/forward/NamedTuple/(64, 64, 3, 128) | 7017652 ns | 7015431 ns | 1.00 |
Conv((3, 3), 3 => 3)/cpu/forward/ComponentArray/(64, 64, 3, 128) | 7003051 ns | 6997181 ns | 1.00 |
Conv((3, 3), 3 => 3)/cpu/forward/Flux/(64, 64, 3, 128) | 18435270 ns | 18434990 ns | 1.00 |
Conv((3, 3), 3 => 3)/cpu/forward/SimpleChains/(64, 64, 3, 128) | 6398236 ns | 6395518 ns | 1.00 |
vgg16/cpu/reverse/Zygote/(32, 32, 3, 16) | 756595885 ns | 735875069 ns | 1.03 |
vgg16/cpu/reverse/Zygote/(32, 32, 3, 64) | 2552773914 ns | 2560001225 ns | 1.00 |
vgg16/cpu/reverse/Zygote/(32, 32, 3, 2) | 145609200 ns | 134397378 ns | 1.08 |
vgg16/cpu/reverse/Tracker/(32, 32, 3, 16) | 863419788 ns | 978250119 ns | 0.88 |
vgg16/cpu/reverse/Tracker/(32, 32, 3, 64) | 3449542523 ns | 3570165964 ns | 0.97 |
vgg16/cpu/reverse/Tracker/(32, 32, 3, 2) | 223440476.5 ns | 240018428.5 ns | 0.93 |
vgg16/cpu/reverse/Flux/(32, 32, 3, 16) | 732301383 ns | 800431721.5 ns | 0.91 |
vgg16/cpu/reverse/Flux/(32, 32, 3, 64) | 3385189902 ns | 2845642648 ns | 1.19 |
vgg16/cpu/reverse/Flux/(32, 32, 3, 2) | 131036679.5 ns | 139114591 ns | 0.94 |
vgg16/cpu/forward/NamedTuple/(32, 32, 3, 16) | 174026417 ns | 173071950.5 ns | 1.01 |
vgg16/cpu/forward/NamedTuple/(32, 32, 3, 64) | 654732707.5 ns | 652361858.5 ns | 1.00 |
vgg16/cpu/forward/NamedTuple/(32, 32, 3, 2) | 34666742.5 ns | 34635607 ns | 1.00 |
vgg16/cpu/forward/ComponentArray/(32, 32, 3, 16) | 164981547.5 ns | 164826664 ns | 1.00 |
vgg16/cpu/forward/ComponentArray/(32, 32, 3, 64) | 645325827 ns | 645086099 ns | 1.00 |
vgg16/cpu/forward/ComponentArray/(32, 32, 3, 2) | 30335987.5 ns | 30506097 ns | 0.99 |
vgg16/cpu/forward/Flux/(32, 32, 3, 16) | 228157180 ns | 228376597 ns | 1.00 |
vgg16/cpu/forward/Flux/(32, 32, 3, 64) | 774764699.5 ns | 857070523 ns | 0.90 |
vgg16/cpu/forward/Flux/(32, 32, 3, 2) | 37522583 ns | 38093395.5 ns | 0.99 |
Conv((3, 3), 64 => 64)/cpu/reverse/ReverseDiff (compiled)/(64, 64, 64, 128) | 1249890656.5 ns | 1207490420.5 ns | 1.04 |
Conv((3, 3), 64 => 64)/cpu/reverse/Zygote/(64, 64, 64, 128) | 1861556584 ns | 1874166667 ns | 0.99 |
Conv((3, 3), 64 => 64)/cpu/reverse/Tracker/(64, 64, 64, 128) | 2409852401 ns | 2498454248 ns | 0.96 |
Conv((3, 3), 64 => 64)/cpu/reverse/ReverseDiff/(64, 64, 64, 128) | 2548713964 ns | 2542592633 ns | 1.00 |
Conv((3, 3), 64 => 64)/cpu/reverse/Flux/(64, 64, 64, 128) | 1967513931 ns | 2002502937 ns | 0.98 |
Conv((3, 3), 64 => 64)/cpu/reverse/Enzyme/(64, 64, 64, 128) | 362160925 ns | 357426065 ns | 1.01 |
Conv((3, 3), 64 => 64)/cpu/forward/NamedTuple/(64, 64, 64, 128) | 321814591 ns | 319693714 ns | 1.01 |
Conv((3, 3), 64 => 64)/cpu/forward/ComponentArray/(64, 64, 64, 128) | 323120623 ns | 317954298 ns | 1.02 |
Conv((3, 3), 64 => 64)/cpu/forward/Flux/(64, 64, 64, 128) | 411676251 ns | 473384184.5 ns | 0.87 |
Conv((3, 3), 1 => 1)/cpu/reverse/ReverseDiff (compiled)/(64, 64, 1, 128) | 11749693 ns | 11869677 ns | 0.99 |
Conv((3, 3), 1 => 1)/cpu/reverse/Zygote/(64, 64, 1, 128) | 17936724 ns | 17901011.5 ns | 1.00 |
Conv((3, 3), 1 => 1)/cpu/reverse/Tracker/(64, 64, 1, 128) | 19119798.5 ns | 19189143 ns | 1.00 |
Conv((3, 3), 1 => 1)/cpu/reverse/ReverseDiff/(64, 64, 1, 128) | 23788006.5 ns | 23958547 ns | 0.99 |
Conv((3, 3), 1 => 1)/cpu/reverse/Flux/(64, 64, 1, 128) | 17920112 ns | 17923251 ns | 1.00 |
Conv((3, 3), 1 => 1)/cpu/reverse/SimpleChains/(64, 64, 1, 128) | 1171676.5 ns | 1165364 ns | 1.01 |
Conv((3, 3), 1 => 1)/cpu/reverse/Enzyme/(64, 64, 1, 128) | 2530044 ns | 2517607 ns | 1.00 |
Conv((3, 3), 1 => 1)/cpu/forward/NamedTuple/(64, 64, 1, 128) | 2059060 ns | 2045179 ns | 1.01 |
Conv((3, 3), 1 => 1)/cpu/forward/ComponentArray/(64, 64, 1, 128) | 2035872.5 ns | 2030912 ns | 1.00 |
Conv((3, 3), 1 => 1)/cpu/forward/Flux/(64, 64, 1, 128) | 2087108.5 ns | 2067039 ns | 1.01 |
Conv((3, 3), 1 => 1)/cpu/forward/SimpleChains/(64, 64, 1, 128) | 204333 ns | 200071 ns | 1.02 |
Dense(200 => 200)/cpu/reverse/ReverseDiff (compiled)/(200, 128) | 294572 ns | 293791 ns | 1.00 |
Dense(200 => 200)/cpu/reverse/Zygote/(200, 128) | 268152.5 ns | 269561 ns | 0.99 |
Dense(200 => 200)/cpu/reverse/Tracker/(200, 128) | 372418 ns | 371871 ns | 1.00 |
Dense(200 => 200)/cpu/reverse/ReverseDiff/(200, 128) | 412313 ns | 412566 ns | 1.00 |
Dense(200 => 200)/cpu/reverse/Flux/(200, 128) | 276358 ns | 276704 ns | 1.00 |
Dense(200 => 200)/cpu/reverse/SimpleChains/(200, 128) | 416951 ns | 410473 ns | 1.02 |
Dense(200 => 200)/cpu/reverse/Enzyme/(200, 128) | 83687 ns | 83495 ns | 1.00 |
Dense(200 => 200)/cpu/forward/NamedTuple/(200, 128) | 82054 ns | 82302 ns | 1.00 |
Dense(200 => 200)/cpu/forward/ComponentArray/(200, 128) | 83116 ns | 85228 ns | 0.98 |
Dense(200 => 200)/cpu/forward/Flux/(200, 128) | 87404 ns | 87413 ns | 1.00 |
Dense(200 => 200)/cpu/forward/SimpleChains/(200, 128) | 104987 ns | 104644 ns | 1.00 |
Conv((3, 3), 16 => 16)/cpu/reverse/ReverseDiff (compiled)/(64, 64, 16, 128) | 195304591 ns | 199077378 ns | 0.98 |
Conv((3, 3), 16 => 16)/cpu/reverse/Zygote/(64, 64, 16, 128) | 327096306 ns | 328274228.5 ns | 1.00 |
Conv((3, 3), 16 => 16)/cpu/reverse/Tracker/(64, 64, 16, 128) | 436109596.5 ns | 449850574.5 ns | 0.97 |
Conv((3, 3), 16 => 16)/cpu/reverse/ReverseDiff/(64, 64, 16, 128) | 484444356 ns | 481685356 ns | 1.01 |
Conv((3, 3), 16 => 16)/cpu/reverse/Flux/(64, 64, 16, 128) | 409134479.5 ns | 416250117.5 ns | 0.98 |
Conv((3, 3), 16 => 16)/cpu/reverse/SimpleChains/(64, 64, 16, 128) | 340551010 ns | 324981397.5 ns | 1.05 |
Conv((3, 3), 16 => 16)/cpu/reverse/Enzyme/(64, 64, 16, 128) | 51609023 ns | 51576403 ns | 1.00 |
Conv((3, 3), 16 => 16)/cpu/forward/NamedTuple/(64, 64, 16, 128) | 44183475 ns | 43917353 ns | 1.01 |
Conv((3, 3), 16 => 16)/cpu/forward/ComponentArray/(64, 64, 16, 128) | 43960554.5 ns | 43756850 ns | 1.00 |
Conv((3, 3), 16 => 16)/cpu/forward/Flux/(64, 64, 16, 128) | 70769078 ns | 57875341 ns | 1.22 |
Conv((3, 3), 16 => 16)/cpu/forward/SimpleChains/(64, 64, 16, 128) | 28378372 ns | 28245618 ns | 1.00 |
Dense(2000 => 2000)/cpu/reverse/ReverseDiff (compiled)/(2000, 128) | 19111483 ns | 19116803 ns | 1.00 |
Dense(2000 => 2000)/cpu/reverse/Zygote/(2000, 128) | 19626256 ns | 19717359 ns | 1.00 |
Dense(2000 => 2000)/cpu/reverse/Tracker/(2000, 128) | 23688154 ns | 23591237 ns | 1.00 |
Dense(2000 => 2000)/cpu/reverse/ReverseDiff/(2000, 128) | 24392680 ns | 24305289 ns | 1.00 |
Dense(2000 => 2000)/cpu/reverse/Flux/(2000, 128) | 19770793 ns | 19690281 ns | 1.00 |
Dense(2000 => 2000)/cpu/reverse/Enzyme/(2000, 128) | 6541006 ns | 6530046 ns | 1.00 |
Dense(2000 => 2000)/cpu/forward/NamedTuple/(2000, 128) | 6553037 ns | 6543416 ns | 1.00 |
Dense(2000 => 2000)/cpu/forward/ComponentArray/(2000, 128) | 6526035 ns | 6530674.5 ns | 1.00 |
Dense(2000 => 2000)/cpu/forward/Flux/(2000, 128) | 6513413 ns | 6580813 ns | 0.99 |
This comment was automatically generated by workflow using github-action-benchmark.
doc: add docs for `match_eltype`
Main Changes
TODOs
- Type stability. Enable Dispatch Doctor for this part.
- `__match_eltype` function
- Dispatch Doctor? (maybe here or in some other PR)