perf: optimize hot paths and reduce overhead for low-end devices #368

Merged on Jul 20, 2024 (80 commits)

negezor
Contributor

@negezor negezor commented Jul 12, 2024

While profiling my website, I often see numerous unhead calls on hot paths on low-performance devices. Therefore, this PR aims to reduce the overall impact on performance. This PR includes:

  • Compare values directly instead of allocating an array and then iterating over it https://jsperf.app/megoxe
  • Use Set instead of an array since set.has() runs in constant time, unlike array.includes() https://jsperf.app/pikuzu/3
  • Instead of using string.startsWith() for the first character, it's better to use index access as it is faster https://jsbench.me/welq15149d/1
  • Instead of string.split()[0], it's better to use string.indexOf() + string.substring() https://jsperf.app/zovuke
  • Avoiding promises on hot paths greatly improves performance. I added a simple helper that costs little in readability but significantly improves performance. https://jsperf.app/gacuwo/3
  • Related to the previous point, the async modifier is also removed from functions that already return a Promise and never need await
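
A minimal sketch of three of the micro-optimizations listed above (Set lookup, index access, and indexOf() + substring()). The function names and the tag list are hypothetical and chosen only for illustration; they are not taken from the unhead source.

```typescript
// Hypothetical tag list, for illustration only; not unhead's actual data.
const SELF_CLOSING_TAGS = new Set<string>(["meta", "link", "base"]);

// Set.has() is a hash lookup, whereas Array.prototype.includes() scans linearly.
function isSelfClosing(tag: string): boolean {
  return SELF_CLOSING_TAGS.has(tag);
}

// Index access reads a single character instead of calling startsWith().
function startsWithPercent(value: string): boolean {
  return value[0] === "%";
}

// indexOf() + substring() avoids allocating the array that split() builds.
// Assumes a non-empty separator; returns the whole string when it is absent,
// which matches value.split(separator)[0] in that case.
function beforeFirst(value: string, separator: string): string {
  const index = value.indexOf(separator);
  return index === -1 ? value : value.substring(0, index);
}
```

For example, beforeFirst("og:image:width", ":") gives the same result as "og:image:width".split(":")[0] without the intermediate array.
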
Benchmark Result

There are a few other areas where performance can be improved, namely:

  • Avoiding Promise as they hurt performance. Done: implemented via the thenable helper. Promise is now treated as an edge case.
  • Reducing the number of Object.entries() calls. Done: if we want performance, we'll have to say goodbye to Object.entries, Object.values and Object.keys.
  • Possibly combining two sorts in the sort plugin. Done.
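
A rough sketch of what a helper like the thenable mentioned above can look like. Only the name comes from the PR description; the signature and body here are assumptions for illustration and may differ from the merged code.

```typescript
// Assumed shape of the helper: run the callback synchronously for plain values
// and only fall back to Promise chaining when the input really is a Promise.
function thenable<T, R>(
  value: T | Promise<T>,
  onValue: (value: T) => R,
): R | Promise<R> {
  if (value instanceof Promise) {
    // Edge case: the caller passed a Promise, so stay asynchronous.
    return (value as Promise<T>).then(onValue);
  }
  // Common case: no Promise allocation and no microtask hop.
  return onValue(value as T);
}

// Synchronous input stays synchronous.
const doubled = thenable(2, (n) => n * 2);

// Promise input yields a Promise of the mapped value.
const doubledLater = thenable(Promise.resolve(2), (n) => n * 2);
```

The point of the design is that callers write one code path, while the synchronous case never pays for a Promise.
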

Since this library is a building block for other applications, I believe it should have the maximum possible performance.

Performance was tested on the following hardware:
CPU: AMD Ryzen 7950x3D
System: WSL 2 Arch Linux on Windows 11
Node.js: v22.5.0

Before

✓ dom-useHead (1) 11339ms
  name          hz      min      max     mean      p75      p99     p995     p999     rme  samples
· x50        9.3537  99.2468  144.97   106.91   108.30   138.09   144.97   144.97  ±1.13%      100
✓ ssr bench (1) 11341ms
  name          hz      min      max     mean      p75      p99     p995     p999     rme  samples
· x50 ssr   89.0412  9.8624  18.1303  11.2308  11.4576  15.0532  15.3867  17.0052  ±0.47%     1000

After

✓ dom-useHead (1) 9408ms
  name          hz       min      max     mean      p75     p99    p995     p999      rme  samples
· x50       11.2612  80.9079   113.33  88.8004  89.7723  112.17  113.33   113.33   ±1.11%      100
✓ ssr bench (1) 7332ms
  name           hz      min      max     mean      p75     p99    p995     p999     rme   samples
· x50 ssr    138.38   6.4453  10.3630   7.2264   7.3634  9.5104  9.6495  10.1183  ±0.42%      1000

After using the forked hookable with these changes unjs/hookable#102

✓ dom-useHead (1) 9157ms
  name           hz      min      max     mean      p75     p99    p995    p999      rme   samples
· x50       11.5724  79.0402   118.18  86.4127  87.9506  106.06  118.18  118.18   ±1.16%       100
✓ ssr bench (1) 5274ms
  name           hz      min      max     mean      p75     p99    p995    p999      rme   samples
· x50 ssr    193.42   4.6108   8.3951   5.1700   5.3634  5.9843  6.4341  8.1951   ±0.41%      1000

@negezor negezor changed the title from "Perf" to "perf: optimize hot paths and reduce overhead for low-end devices" Jul 12, 2024
@harlan-zw
Collaborator

harlan-zw commented Jul 13, 2024

Woah, this is really impressive work, thank you so much. I'll try to get some benchmarks tomorrow, but all looks good so far.

> Avoiding Promise as they hurt performance

I think this can probably be dropped from the API in v2. I don't think they're really used and the logic is confusing (that is, users providing promises to useHead as input).

> Since this library is a building block for other applications, I believe it should have the maximum possible performance.

Agreed!

@negezor
Contributor Author

negezor commented Jul 14, 2024

Separately, I would like to share a small victory in processTemplateParams. This iteration moves to character-by-character parsing.
Before

 ✓ processTemplateParams (5) 5306ms
   name                        hz     min     max    mean     p75     p99    p995    p999     rme  samples
 · basic               124,256.92  0.0068  1.2928  0.0080  0.0079  0.0188  0.0203  0.0467  ±0.71%    62129   slowest
 · nested props        223,829.82  0.0036  0.6271  0.0045  0.0042  0.0144  0.0273  0.0359  ±0.51%   111915
 · not found props     321,787.56  0.0026  0.2824  0.0031  0.0030  0.0056  0.0139  0.0278  ±0.41%   160894
 · with url          4,358,487.01  0.0002  2.3345  0.0002  0.0002  0.0003  0.0004  0.0011  ±1.03%  2179244
 · simple string    15,700,280.18  0.0000  0.4865  0.0001  0.0001  0.0001  0.0001  0.0004  ±0.71%  7850141   fastest

After

 ✓ processTemplateParams (5) 5525ms
   name                        hz     min     max    mean     p75     p99    p995    p999     rme  samples
 · basic               318,078.28  0.0027  1.0443  0.0031  0.0031  0.0053  0.0066  0.0186  ±0.53%   159040
 · nested props        330,560.01  0.0026  0.3683  0.0030  0.0030  0.0038  0.0048  0.0154  ±0.27%   165281
 · not found props     316,737.26  0.0026  0.3337  0.0032  0.0031  0.0056  0.0060  0.0157  ±0.34%   158369   slowest
 · with url          5,279,266.29  0.0002  1.9639  0.0002  0.0002  0.0002  0.0002  0.0006  ±1.06%  2639634
 · simple string    18,801,730.95  0.0000  0.5143  0.0001  0.0001  0.0001  0.0001  0.0001  ±0.40%  9400866   fastest
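
To illustrate what character-by-character parsing means here, the following is a simplified sketch, not the actual processTemplateParams implementation. It assumes a hypothetical token syntax of "%" followed by letters, digits or "_", and it does not handle nested props.

```typescript
// Hypothetical token syntax: "%" followed by [A-Za-z0-9_]. Illustration only.
function isTokenChar(code: number): boolean {
  return (
    (code >= 48 && code <= 57) // 0-9
    || (code >= 65 && code <= 90) // A-Z
    || (code >= 97 && code <= 122) // a-z
    || code === 95 // _
  );
}

function fillTemplate(input: string, params: Record<string, string>): string {
  // Fast path: a string without "%" needs no work at all.
  if (input.indexOf("%") === -1) return input;

  let output = "";
  let i = 0;
  while (i < input.length) {
    if (input.charCodeAt(i) !== 37 /* "%" */) {
      output += input[i];
      i++;
      continue;
    }
    // Scan forward to the end of the token name that follows "%".
    let end = i + 1;
    while (end < input.length && isTokenChar(input.charCodeAt(end))) end++;
    const name = input.substring(i + 1, end);
    const value = Object.prototype.hasOwnProperty.call(params, name)
      ? params[name]
      : undefined;
    // Unknown tokens (for example "%20" in a URL) are kept verbatim.
    output += value !== undefined ? value : input.substring(i, end);
    i = end;
  }
  return output;
}
```

A single pass like this avoids building regular expressions or intermediate arrays per call, which is one plausible reason the "basic" case benefits most in the numbers above.
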

@harlan-zw
Collaborator

Just let me know when you're finished making the changes you'd like, I'll work on getting it merged and benchmarked further.

@negezor
Contributor Author

negezor commented Jul 14, 2024

@harlan-zw I have made all the changes I wanted. You can merge now 😊

@harlan-zw harlan-zw self-requested a review July 14, 2024 14:18
@harlan-zw
Collaborator

harlan-zw commented Jul 15, 2024

You've done a great job on this PR, thanks again.

I've tried to do some benchmarking on these changes, but the performance actually seems very slightly worse. Maybe you have a better idea of how to accurately bench this e2e, though.

Before

 ✓ test/vue/dom-useHead.bench.ts (1) 13515ms
   ✓ dom-useHead (1) 13514ms
     name      hz     min     max    mean     p75     p99    p995    p999     rme  samples
   · x50   7.9066  112.44  177.98  126.48  129.53  156.32  177.98  177.98  ±1.29%      100
 ✓ test/unhead/ssr/ssr-perf.bench.ts (1) 13381ms
   ✓ ssr bench (1) 13380ms
     name          hz      min      max     mean      p75      p99     p995     p999     rme  samples
   · x50 ssr  75.4054  12.1169  25.1118  13.2616  13.2981  17.7024  18.7785  22.4677  ±0.47%     1000

After

 ✓ test/vue/dom-useHead.bench.ts (1) 13672ms
   ✓ dom-useHead (1) 13671ms
     name      hz     min     max    mean     p75     p99    p995    p999     rme  samples
   · x50   7.7883  113.84  199.53  128.40  130.55  194.32  199.53  199.53  ±2.05%      100
 ✓ test/unhead/ssr/ssr-perf.bench.ts (1) 13599ms
   ✓ ssr bench (1) 13598ms
     name          hz      min      max     mean      p75      p99     p995     p999     rme  samples
   · x50 ssr  74.2828  12.2232  22.5889  13.4621  13.5607  19.1117  20.1525  22.3832  ±0.55%     1000

I think we can probably still merge this since we've slightly reduced the bundle size, but I wanted to give you the opportunity to provide e2e benchmarks, as it would be great if we could advertise a % improvement.

@negezor
Contributor Author

negezor commented Jul 15, 2024

I think it depends heavily on the specifics of the project. Typically an application has more than one useHead() call, plus an initial head.push() for the application itself. Not to mention, we often want schema-org to be included as well. I forgot to mention that I tested with these changes: unjs/hookable#102 😅

During testing I used the 6x slowdown in dev tools. For me, two factors were important in optimization:

  • Initial script execution time
  • Time required to process changes

A little later I will check what is slowing down in this benchmark.

@negezor
Contributor Author

negezor commented Jul 18, 2024

@harlan-zw I've finished improving other areas where performance was dropping. I updated the results in the main description of the PR. Could you please review it again?

The only thing that bothers me in the dom-useHead.bench.ts benchmark is that nearly all of the time is spent inside the jsdom library.

[Image: flamegraph]

It turns out that if we really want more performance, we need to move away from Object.entries(), Object.values(), and Object.keys() in favor of for...in with Object.prototype.hasOwnProperty. This slightly worsens code readability, but that's not critical considering it gives a 2x, sometimes 3x performance boost.
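
As a sketch of the for...in pattern described above (the function and its attribute handling are hypothetical, not unhead's code, and escaping is deliberately left out):

```typescript
// Serializes attributes without Object.entries(), which would allocate
// one [key, value] array per property plus the outer array.
function attrsToString(attrs: Record<string, string | boolean>): string {
  let out = "";
  for (const key in attrs) {
    // for...in also visits inherited enumerable keys, so filter to own keys.
    if (!Object.prototype.hasOwnProperty.call(attrs, key)) continue;
    const value = attrs[key];
    if (value === false) continue; // false means "omit the attribute"
    out += value === true ? ` ${key}` : ` ${key}="${value}"`;
  }
  return out;
}
```

The hasOwnProperty guard is what keeps the behavior equivalent to Object.entries(), which only ever returns own enumerable properties.
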

Another thing consuming about a quarter of the time was mini-loops that checked just a couple of properties to avoid code duplication. I replaced them with direct checks where possible, which doesn't add much duplication.
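
A small before/after sketch of the mini-loop pattern described above. The TagProps shape and both function names are hypothetical, picked only to show the idea.

```typescript
// Hypothetical props shape, for illustration only.
interface TagProps {
  src?: string;
  href?: string;
}

// Before: allocates an array and a closure on every call.
function hasUrlLoop(props: TagProps): boolean {
  return (["src", "href"] as const).some((key) => props[key] !== undefined);
}

// After: two plain comparisons, nothing allocated.
function hasUrlDirect(props: TagProps): boolean {
  return props.src !== undefined || props.href !== undefined;
}
```

Both versions return the same result; the direct form just spells out the two property names instead of iterating over them.
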

The overall package size has indeed increased, but executing JS is much more expensive than parsing it. We could reduce the package size by dropping support for Promise.

@harlan-zw
Collaborator

harlan-zw commented Jul 20, 2024

Hi @negezor, thanks again for your hard work on this, I really appreciate it.

I had some time to sit and make a proper SSR benchmark based on my site and all of the plugins being used. Running it on this PR it looks like you've improved the SSR by 20% 🎉 🚀!

The package size difference is negligible so all good there.

-- current --
 ✓ test/bench/ssr-harlanzw-com-e2e.bench.ts (1) 3211ms
   ✓ ssr e2e bench (1) 3209ms
     name        hz     min     max    mean     p75     p99    p995    p999     rme  samples
   · e2e   1,615.77  0.5511  2.1222  0.6189  0.6177  1.0534  1.2626  1.6639  ±0.50%     5000
 ✓ test/bench/ssr-harlanzw-com-e2e.bench.ts (1) 3243ms
   ✓ ssr e2e bench (1) 3242ms
     name        hz     min     max    mean     p75     p99    p995    p999     rme  samples
   · e2e   1,598.03  0.5499  1.8620  0.6258  0.6190  1.0233  1.1068  1.3735  ±0.46%     5000
 ✓ test/bench/ssr-harlanzw-com-e2e.bench.ts (1) 3213ms
   ✓ ssr e2e bench (1) 3211ms
     name        hz     min     max    mean     p75     p99    p995    p999     rme  samples
   · e2e   1,613.98  0.5546  2.0391  0.6196  0.6200  0.9634  1.0067  1.3202  ±0.39%     5000
 ✓ test/bench/ssr-harlanzw-com-e2e.bench.ts (1) 3211ms
   ✓ ssr e2e bench (1) 3210ms
     name        hz     min     max    mean     p75     p99    p995    p999     rme  samples
   · e2e   1,614.06  0.5492  2.0267  0.6196  0.6255  1.0152  1.0993  1.5132  ±0.45%     5000
 ✓ test/bench/ssr-harlanzw-com-e2e.bench.ts (1) 3220ms
   ✓ ssr e2e bench (1) 3219ms
     name        hz     min     max    mean     p75     p99    p995    p999     rme  samples
   · e2e   1,610.08  0.5545  1.9893  0.6211  0.6159  1.0911  1.2073  1.4338  ±0.48%     5000


-- this PR  -- 

✓ test/bench/ssr-harlanzw-com-e2e.bench.ts (1) 2678ms
   ✓ ssr e2e bench (1) 2676ms
     name        hz     min     max    mean     p75     p99    p995    p999     rme  samples
   · e2e   1,949.40  0.4521  1.7332  0.5130  0.5086  0.8895  0.9698  1.2820  ±0.50%     5000


 ✓ test/bench/ssr-harlanzw-com-e2e.bench.ts (1) 2650ms
   ✓ ssr e2e bench (1) 2648ms
     name        hz     min     max    mean     p75     p99    p995    p999     rme  samples
   · e2e   1,972.05  0.4497  1.8316  0.5071  0.5046  0.8682  0.9137  1.3332  ±0.46%     5000

 ✓ test/bench/ssr-harlanzw-com-e2e.bench.ts (1) 2615ms
   ✓ ssr e2e bench (1) 2613ms
     name        hz     min     max    mean     p75     p99    p995    p999     rme  samples
   · e2e   1,998.93  0.4499  1.6762  0.5003  0.5048  0.8365  0.8900  1.0098  ±0.42%     5000

 ✓ test/bench/ssr-harlanzw-com-e2e.bench.ts (1) 2665ms
   ✓ ssr e2e bench (1) 2664ms
     name        hz     min     max    mean     p75     p99    p995    p999     rme  samples
   · e2e   1,958.39  0.4502  2.1185  0.5106  0.5057  0.9516  1.0020  1.2473  ±0.55%     5000

 ✓ test/bench/ssr-harlanzw-com-e2e.bench.ts (1) 2678ms
   ✓ ssr e2e bench (1) 2677ms
     name        hz     min     max    mean     p75     p99    p995    p999     rme  samples
   · e2e   1,950.99  0.4508  1.7268  0.5126  0.5148  0.8579  0.8979  1.0237  ±0.40%     5000
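
For reference, the roughly 20% figure can be checked against the hz column of the ten runs above. Averaging the five runs on each side gives roughly 22% more operations per second, which corresponds to roughly 18% less mean time per iteration. The snippet below only redoes that arithmetic on the numbers already shown.

```typescript
// hz (operations per second) copied from the ten e2e runs above.
const currentHz = [1615.77, 1598.03, 1613.98, 1614.06, 1610.08];
const prHz = [1949.4, 1972.05, 1998.93, 1958.39, 1950.99];

function mean(values: number[]): number {
  return values.reduce((sum, value) => sum + value, 0) / values.length;
}

// Ratio of average throughput: values above 1 mean the PR is faster.
const speedup = mean(prHz) / mean(currentHz);

// Time per operation is the inverse of throughput.
const timeSaved = 1 - 1 / speedup;

console.log(`throughput: +${((speedup - 1) * 100).toFixed(1)}%`);
console.log(`time per op: -${(timeSaved * 100).toFixed(1)}%`);
```
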

As all tests are passing, I'm going to merge this. I'll hold off on the release until I can do a little bit more testing. Ideally, we release this as a patch but it might (?) be a little risky.

Future Performance Opportunities

I'd be very open to making breaking changes that will simplify the API, reduce internal code and improve the speed. These can be introduced in a v2 beta.

You identified that removing promises would help with this. Would you be interested in working on this in a separate PR? The only constraint is that the DOM rendering should still be delayed via a promise (to avoid thrashing DOM updates with incremental useHead calls).
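
One way to picture the constraint above is a renderer that coalesces several synchronous calls into a single deferred render. This is a hypothetical sketch of that idea, not unhead's actual scheduling code; createBatchedRenderer and scheduleRender are made-up names.

```typescript
// Wraps a render function so that any number of calls made in the same tick
// share one pending promise and trigger exactly one render.
function createBatchedRenderer(render: () => void): () => Promise<void> {
  let pending: Promise<void> | undefined;

  return function scheduleRender(): Promise<void> {
    if (pending === undefined) {
      pending = Promise.resolve().then(() => {
        // Clear first so calls made during render schedule a fresh pass.
        pending = undefined;
        render();
      });
    }
    return pending;
  };
}

// Usage: two incremental updates result in one deferred render.
let renderCount = 0;
const scheduleRender = createBatchedRenderer(() => {
  renderCount++;
});
const firstCall = scheduleRender();
const secondCall = scheduleRender();
```

Here the promise is used once per batch at the rendering boundary, which is different from allocating promises on every internal hot path.
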

Would be great to hear any other ideas you have also.

Btw you should setup github sponsors if you can 😛, would be great to give you a one-off amount.

@harlan-zw harlan-zw merged commit 9325dfc into unjs:main Jul 20, 2024
1 of 2 checks passed
@negezor
Contributor Author

negezor commented Jul 20, 2024

> Running it on this PR it looks like you've improved the SSR by 20% 🎉 🚀!

That's cool. I've also been testing with my SSR, where the numbers range from 10% for the simplest pages up to 25% on complex ones with multiple nested routers.

> ideally, we release this as a patch but it might (?) be a little risky.

In case of any regressions, a patch can be released 😅

> You identified removing promises would help with this, would you be interested in working on this in a separate PR?

Yeah, sure. Now it just comes down to deleting the branches that handle Promise.

> Would be great to hear any other ideas you have also.

The module is great as it is: it covers all my needs and is covered by almost every possible test. I don't even have an idea of what to suggest 😅

> Btw you should setup github sponsors if you can 😛, would be great to give you a one-off amount.

I'm not planning to set up GitHub Sponsors at the moment, but thank you for the suggestion 😊

@negezor
Contributor Author

negezor commented Jul 22, 2024

I deployed the fork with these changes to production and got this result:
[Image]
Current:
[Image]
This PR:
[Image]

@harlan-zw
Collaborator

harlan-zw commented Jul 23, 2024

Wow, very nice! 😍 🚀 I will try to get the release out in the next couple of days, sorry for the delay!
