Performance Pitfalls and How to Catch them #727

Merged
merged 9 commits into from
Jul 2, 2024

@avik-pal avik-pal commented Jun 24, 2024

Main Changes

  • List all preferences on a central page
    • Standardizes the naming of the preferences and deprecates the older versions.
  • Shortens the text in the sidebar

TODOs


codecov bot commented Jun 24, 2024

Codecov Report

Attention: Patch coverage is 92.64706% with 10 lines in your changes missing coverage. Please review.

Project coverage is 96.41%. Comparing base (4505c9f) to head (d1eca68).

| Files | Patch % | Lines |
|---|---|---|
| src/helpers/match_eltype.jl | 70.83% | 7 Missing ⚠️ |
| src/helpers/nested_ad.jl | 50.00% | 3 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #727      +/-   ##
==========================================
- Coverage   96.48%   96.41%   -0.07%     
==========================================
  Files          54       57       +3     
  Lines        2729     2791      +62     
==========================================
+ Hits         2633     2691      +58     
- Misses         96      100       +4     


@avik-pal avik-pal force-pushed the ap/perf_pitfalls branch 2 times, most recently from b7bc477 to dafc583 Compare June 24, 2024 05:35

@github-actions github-actions bot left a comment


Benchmark Results

| Benchmark suite | Current: d1eca68 | Previous: 4505c9f | Ratio |
|---|---|---|---|
| Dense(2 => 2)/cpu/reverse/ReverseDiff (compiled)/(2, 128) | 3674.375 ns | 3640.5 ns | 1.01 |
| Dense(2 => 2)/cpu/reverse/Zygote/(2, 128) | 7185.166666666667 ns | 7198.333333333333 ns | 1.00 |
| Dense(2 => 2)/cpu/reverse/Tracker/(2, 128) | 21771 ns | 21049 ns | 1.03 |
| Dense(2 => 2)/cpu/reverse/ReverseDiff/(2, 128) | 9878.5 ns | 9856.2 ns | 1.00 |
| Dense(2 => 2)/cpu/reverse/Flux/(2, 128) | 9057 ns | 9172 ns | 0.99 |
| Dense(2 => 2)/cpu/reverse/SimpleChains/(2, 128) | 4460.875 ns | 4541 ns | 0.98 |
| Dense(2 => 2)/cpu/reverse/Enzyme/(2, 128) | 1158.8402777777778 ns | 1160.112676056338 ns | 1.00 |
| Dense(2 => 2)/cpu/forward/NamedTuple/(2, 128) | 1120.9571428571428 ns | 1169.4358974358975 ns | 0.96 |
| Dense(2 => 2)/cpu/forward/ComponentArray/(2, 128) | 1170.55 ns | 1184.4857142857143 ns | 0.99 |
| Dense(2 => 2)/cpu/forward/Flux/(2, 128) | 1788.5357142857142 ns | 1778.4666666666667 ns | 1.01 |
| Dense(2 => 2)/cpu/forward/SimpleChains/(2, 128) | 179.50980392156862 ns | 178.93653032440056 ns | 1.00 |
| Dense(20 => 20)/cpu/reverse/ReverseDiff (compiled)/(20, 128) | 17363 ns | 17282 ns | 1.00 |
| Dense(20 => 20)/cpu/reverse/Zygote/(20, 128) | 17012 ns | 17022 ns | 1.00 |
| Dense(20 => 20)/cpu/reverse/Tracker/(20, 128) | 40336 ns | 39183 ns | 1.03 |
| Dense(20 => 20)/cpu/reverse/ReverseDiff/(20, 128) | 29545 ns | 29245 ns | 1.01 |
| Dense(20 => 20)/cpu/reverse/Flux/(20, 128) | 20128 ns | 21791 ns | 0.92 |
| Dense(20 => 20)/cpu/reverse/SimpleChains/(20, 128) | 17373 ns | 17312 ns | 1.00 |
| Dense(20 => 20)/cpu/reverse/Enzyme/(20, 128) | 4333.714285714285 ns | 4330.857142857143 ns | 1.00 |
| Dense(20 => 20)/cpu/forward/NamedTuple/(20, 128) | 3888.5 ns | 3845.875 ns | 1.01 |
| Dense(20 => 20)/cpu/forward/ComponentArray/(20, 128) | 3986.25 ns | 3932.25 ns | 1.01 |
| Dense(20 => 20)/cpu/forward/Flux/(20, 128) | 4992.142857142857 ns | 4924.857142857143 ns | 1.01 |
| Dense(20 => 20)/cpu/forward/SimpleChains/(20, 128) | 1657.1 ns | 1653.1 ns | 1.00 |
| Conv((3, 3), 3 => 3)/cpu/reverse/ReverseDiff (compiled)/(64, 64, 3, 128) | 46633485 ns | 43688609.5 ns | 1.07 |
| Conv((3, 3), 3 => 3)/cpu/reverse/Zygote/(64, 64, 3, 128) | 57691066 ns | 57875220 ns | 1.00 |
| Conv((3, 3), 3 => 3)/cpu/reverse/Tracker/(64, 64, 3, 128) | 111480015 ns | 94212606.5 ns | 1.18 |
| Conv((3, 3), 3 => 3)/cpu/reverse/ReverseDiff/(64, 64, 3, 128) | 102473763 ns | 92167603 ns | 1.11 |
| Conv((3, 3), 3 => 3)/cpu/reverse/Flux/(64, 64, 3, 128) | 105468734 ns | 78485746 ns | 1.34 |
| Conv((3, 3), 3 => 3)/cpu/reverse/SimpleChains/(64, 64, 3, 128) | 12023231.5 ns | 11740759.5 ns | 1.02 |
| Conv((3, 3), 3 => 3)/cpu/reverse/Enzyme/(64, 64, 3, 128) | 8474373 ns | 8461836 ns | 1.00 |
| Conv((3, 3), 3 => 3)/cpu/forward/NamedTuple/(64, 64, 3, 128) | 7017652 ns | 7015431 ns | 1.00 |
| Conv((3, 3), 3 => 3)/cpu/forward/ComponentArray/(64, 64, 3, 128) | 7003051 ns | 6997181 ns | 1.00 |
| Conv((3, 3), 3 => 3)/cpu/forward/Flux/(64, 64, 3, 128) | 18435270 ns | 18434990 ns | 1.00 |
| Conv((3, 3), 3 => 3)/cpu/forward/SimpleChains/(64, 64, 3, 128) | 6398236 ns | 6395518 ns | 1.00 |
| vgg16/cpu/reverse/Zygote/(32, 32, 3, 16) | 756595885 ns | 735875069 ns | 1.03 |
| vgg16/cpu/reverse/Zygote/(32, 32, 3, 64) | 2552773914 ns | 2560001225 ns | 1.00 |
| vgg16/cpu/reverse/Zygote/(32, 32, 3, 2) | 145609200 ns | 134397378 ns | 1.08 |
| vgg16/cpu/reverse/Tracker/(32, 32, 3, 16) | 863419788 ns | 978250119 ns | 0.88 |
| vgg16/cpu/reverse/Tracker/(32, 32, 3, 64) | 3449542523 ns | 3570165964 ns | 0.97 |
| vgg16/cpu/reverse/Tracker/(32, 32, 3, 2) | 223440476.5 ns | 240018428.5 ns | 0.93 |
| vgg16/cpu/reverse/Flux/(32, 32, 3, 16) | 732301383 ns | 800431721.5 ns | 0.91 |
| vgg16/cpu/reverse/Flux/(32, 32, 3, 64) | 3385189902 ns | 2845642648 ns | 1.19 |
| vgg16/cpu/reverse/Flux/(32, 32, 3, 2) | 131036679.5 ns | 139114591 ns | 0.94 |
| vgg16/cpu/forward/NamedTuple/(32, 32, 3, 16) | 174026417 ns | 173071950.5 ns | 1.01 |
| vgg16/cpu/forward/NamedTuple/(32, 32, 3, 64) | 654732707.5 ns | 652361858.5 ns | 1.00 |
| vgg16/cpu/forward/NamedTuple/(32, 32, 3, 2) | 34666742.5 ns | 34635607 ns | 1.00 |
| vgg16/cpu/forward/ComponentArray/(32, 32, 3, 16) | 164981547.5 ns | 164826664 ns | 1.00 |
| vgg16/cpu/forward/ComponentArray/(32, 32, 3, 64) | 645325827 ns | 645086099 ns | 1.00 |
| vgg16/cpu/forward/ComponentArray/(32, 32, 3, 2) | 30335987.5 ns | 30506097 ns | 0.99 |
| vgg16/cpu/forward/Flux/(32, 32, 3, 16) | 228157180 ns | 228376597 ns | 1.00 |
| vgg16/cpu/forward/Flux/(32, 32, 3, 64) | 774764699.5 ns | 857070523 ns | 0.90 |
| vgg16/cpu/forward/Flux/(32, 32, 3, 2) | 37522583 ns | 38093395.5 ns | 0.99 |
| Conv((3, 3), 64 => 64)/cpu/reverse/ReverseDiff (compiled)/(64, 64, 64, 128) | 1249890656.5 ns | 1207490420.5 ns | 1.04 |
| Conv((3, 3), 64 => 64)/cpu/reverse/Zygote/(64, 64, 64, 128) | 1861556584 ns | 1874166667 ns | 0.99 |
| Conv((3, 3), 64 => 64)/cpu/reverse/Tracker/(64, 64, 64, 128) | 2409852401 ns | 2498454248 ns | 0.96 |
| Conv((3, 3), 64 => 64)/cpu/reverse/ReverseDiff/(64, 64, 64, 128) | 2548713964 ns | 2542592633 ns | 1.00 |
| Conv((3, 3), 64 => 64)/cpu/reverse/Flux/(64, 64, 64, 128) | 1967513931 ns | 2002502937 ns | 0.98 |
| Conv((3, 3), 64 => 64)/cpu/reverse/Enzyme/(64, 64, 64, 128) | 362160925 ns | 357426065 ns | 1.01 |
| Conv((3, 3), 64 => 64)/cpu/forward/NamedTuple/(64, 64, 64, 128) | 321814591 ns | 319693714 ns | 1.01 |
| Conv((3, 3), 64 => 64)/cpu/forward/ComponentArray/(64, 64, 64, 128) | 323120623 ns | 317954298 ns | 1.02 |
| Conv((3, 3), 64 => 64)/cpu/forward/Flux/(64, 64, 64, 128) | 411676251 ns | 473384184.5 ns | 0.87 |
| Conv((3, 3), 1 => 1)/cpu/reverse/ReverseDiff (compiled)/(64, 64, 1, 128) | 11749693 ns | 11869677 ns | 0.99 |
| Conv((3, 3), 1 => 1)/cpu/reverse/Zygote/(64, 64, 1, 128) | 17936724 ns | 17901011.5 ns | 1.00 |
| Conv((3, 3), 1 => 1)/cpu/reverse/Tracker/(64, 64, 1, 128) | 19119798.5 ns | 19189143 ns | 1.00 |
| Conv((3, 3), 1 => 1)/cpu/reverse/ReverseDiff/(64, 64, 1, 128) | 23788006.5 ns | 23958547 ns | 0.99 |
| Conv((3, 3), 1 => 1)/cpu/reverse/Flux/(64, 64, 1, 128) | 17920112 ns | 17923251 ns | 1.00 |
| Conv((3, 3), 1 => 1)/cpu/reverse/SimpleChains/(64, 64, 1, 128) | 1171676.5 ns | 1165364 ns | 1.01 |
| Conv((3, 3), 1 => 1)/cpu/reverse/Enzyme/(64, 64, 1, 128) | 2530044 ns | 2517607 ns | 1.00 |
| Conv((3, 3), 1 => 1)/cpu/forward/NamedTuple/(64, 64, 1, 128) | 2059060 ns | 2045179 ns | 1.01 |
| Conv((3, 3), 1 => 1)/cpu/forward/ComponentArray/(64, 64, 1, 128) | 2035872.5 ns | 2030912 ns | 1.00 |
| Conv((3, 3), 1 => 1)/cpu/forward/Flux/(64, 64, 1, 128) | 2087108.5 ns | 2067039 ns | 1.01 |
| Conv((3, 3), 1 => 1)/cpu/forward/SimpleChains/(64, 64, 1, 128) | 204333 ns | 200071 ns | 1.02 |
| Dense(200 => 200)/cpu/reverse/ReverseDiff (compiled)/(200, 128) | 294572 ns | 293791 ns | 1.00 |
| Dense(200 => 200)/cpu/reverse/Zygote/(200, 128) | 268152.5 ns | 269561 ns | 0.99 |
| Dense(200 => 200)/cpu/reverse/Tracker/(200, 128) | 372418 ns | 371871 ns | 1.00 |
| Dense(200 => 200)/cpu/reverse/ReverseDiff/(200, 128) | 412313 ns | 412566 ns | 1.00 |
| Dense(200 => 200)/cpu/reverse/Flux/(200, 128) | 276358 ns | 276704 ns | 1.00 |
| Dense(200 => 200)/cpu/reverse/SimpleChains/(200, 128) | 416951 ns | 410473 ns | 1.02 |
| Dense(200 => 200)/cpu/reverse/Enzyme/(200, 128) | 83687 ns | 83495 ns | 1.00 |
| Dense(200 => 200)/cpu/forward/NamedTuple/(200, 128) | 82054 ns | 82302 ns | 1.00 |
| Dense(200 => 200)/cpu/forward/ComponentArray/(200, 128) | 83116 ns | 85228 ns | 0.98 |
| Dense(200 => 200)/cpu/forward/Flux/(200, 128) | 87404 ns | 87413 ns | 1.00 |
| Dense(200 => 200)/cpu/forward/SimpleChains/(200, 128) | 104987 ns | 104644 ns | 1.00 |
| Conv((3, 3), 16 => 16)/cpu/reverse/ReverseDiff (compiled)/(64, 64, 16, 128) | 195304591 ns | 199077378 ns | 0.98 |
| Conv((3, 3), 16 => 16)/cpu/reverse/Zygote/(64, 64, 16, 128) | 327096306 ns | 328274228.5 ns | 1.00 |
| Conv((3, 3), 16 => 16)/cpu/reverse/Tracker/(64, 64, 16, 128) | 436109596.5 ns | 449850574.5 ns | 0.97 |
| Conv((3, 3), 16 => 16)/cpu/reverse/ReverseDiff/(64, 64, 16, 128) | 484444356 ns | 481685356 ns | 1.01 |
| Conv((3, 3), 16 => 16)/cpu/reverse/Flux/(64, 64, 16, 128) | 409134479.5 ns | 416250117.5 ns | 0.98 |
| Conv((3, 3), 16 => 16)/cpu/reverse/SimpleChains/(64, 64, 16, 128) | 340551010 ns | 324981397.5 ns | 1.05 |
| Conv((3, 3), 16 => 16)/cpu/reverse/Enzyme/(64, 64, 16, 128) | 51609023 ns | 51576403 ns | 1.00 |
| Conv((3, 3), 16 => 16)/cpu/forward/NamedTuple/(64, 64, 16, 128) | 44183475 ns | 43917353 ns | 1.01 |
| Conv((3, 3), 16 => 16)/cpu/forward/ComponentArray/(64, 64, 16, 128) | 43960554.5 ns | 43756850 ns | 1.00 |
| Conv((3, 3), 16 => 16)/cpu/forward/Flux/(64, 64, 16, 128) | 70769078 ns | 57875341 ns | 1.22 |
| Conv((3, 3), 16 => 16)/cpu/forward/SimpleChains/(64, 64, 16, 128) | 28378372 ns | 28245618 ns | 1.00 |
| Dense(2000 => 2000)/cpu/reverse/ReverseDiff (compiled)/(2000, 128) | 19111483 ns | 19116803 ns | 1.00 |
| Dense(2000 => 2000)/cpu/reverse/Zygote/(2000, 128) | 19626256 ns | 19717359 ns | 1.00 |
| Dense(2000 => 2000)/cpu/reverse/Tracker/(2000, 128) | 23688154 ns | 23591237 ns | 1.00 |
| Dense(2000 => 2000)/cpu/reverse/ReverseDiff/(2000, 128) | 24392680 ns | 24305289 ns | 1.00 |
| Dense(2000 => 2000)/cpu/reverse/Flux/(2000, 128) | 19770793 ns | 19690281 ns | 1.00 |
| Dense(2000 => 2000)/cpu/reverse/Enzyme/(2000, 128) | 6541006 ns | 6530046 ns | 1.00 |
| Dense(2000 => 2000)/cpu/forward/NamedTuple/(2000, 128) | 6553037 ns | 6543416 ns | 1.00 |
| Dense(2000 => 2000)/cpu/forward/ComponentArray/(2000, 128) | 6526035 ns | 6530674.5 ns | 1.00 |
| Dense(2000 => 2000)/cpu/forward/Flux/(2000, 128) | 6513413 ns | 6580813 ns | 0.99 |

This comment was automatically generated by workflow using github-action-benchmark.
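
For readers scanning the benchmark results, the Ratio column appears to be the current time divided by the previous time, so values above 1 mean the current commit was slower. The sketch below recomputes the ratio for a few rows copied from the results and flags the slower ones. The function names and the 1.10 cutoff are assumptions made for illustration; they are not settings taken from this workflow.

```python
# Rows copied from the benchmark results: (name, current_ns, previous_ns).
ROWS = [
    ("Conv((3, 3), 3 => 3)/cpu/reverse/Flux/(64, 64, 3, 128)", 105468734.0, 78485746.0),
    ("Conv((3, 3), 3 => 3)/cpu/reverse/Tracker/(64, 64, 3, 128)", 111480015.0, 94212606.5),
    ("Dense(2 => 2)/cpu/reverse/ReverseDiff (compiled)/(2, 128)", 3674.375, 3640.5),
    ("vgg16/cpu/reverse/Tracker/(32, 32, 3, 16)", 863419788.0, 978250119.0),
]

# Hypothetical cutoff for illustration only; not a value used by the workflow.
THRESHOLD = 1.10


def ratio(current_ns, previous_ns):
    """Current time over previous time; above 1.0 means the current commit is slower."""
    return current_ns / previous_ns


def slowdowns(rows, threshold=THRESHOLD):
    """Return (name, ratio rounded to 2 places) for rows whose ratio exceeds the threshold."""
    flagged = []
    for name, current_ns, previous_ns in rows:
        r = ratio(current_ns, previous_ns)
        if r > threshold:
            flagged.append((name, round(r, 2)))
    return flagged


if __name__ == "__main__":
    for name, r in slowdowns(ROWS):
        print(f"{r:.2f}  {name}")
```

Single CI runs are noisy, so a ratio above the cutoff is a prompt to re-run the benchmark rather than proof of a regression.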

@avik-pal avik-pal force-pushed the ap/perf_pitfalls branch 9 times, most recently from b778734 to 7cd27af Compare June 26, 2024 15:38
@avik-pal avik-pal mentioned this pull request Jun 26, 2024
34 tasks
@avik-pal avik-pal added this to the v0.6 milestone Jun 28, 2024
@avik-pal avik-pal force-pushed the ap/perf_pitfalls branch 8 times, most recently from 9e7c641 to 784bb8c Compare June 28, 2024 16:00
@avik-pal avik-pal force-pushed the ap/perf_pitfalls branch 3 times, most recently from 8f74a06 to c08130d Compare July 1, 2024 01:15
src/helpers/match_eltype.jl: 3 review threads (outdated, resolved)
doc: add docs for `match_eltype`
@avik-pal avik-pal force-pushed the ap/perf_pitfalls branch 6 times, most recently from 8a2f743 to a6c6c79 Compare July 2, 2024 04:26
@avik-pal avik-pal changed the title from "[WIP] Performance Pitfalls and How to Catch them" to "Performance Pitfalls and How to Catch them" Jul 2, 2024
@avik-pal avik-pal merged commit 7720832 into main Jul 2, 2024
61 of 66 checks passed
@avik-pal avik-pal deleted the ap/perf_pitfalls branch July 2, 2024 05:36
Development

Successfully merging this pull request may close these issues.

Auto detect and warn against performance pitfalls
1 participant