perf: optimize hot paths and reduce overhead for low-end devices #368

negezor · 2024-07-12T18:29:46Z

While profiling my website, I often see numerous unhead calls on hot paths on low-performance devices. Therefore, this PR aims to reduce the overall impact on performance. This PR includes:

Compare values directly instead of allocating an array and then iterating over it https://jsperf.app/megoxe
Use Set instead of an array since set.has() runs in constant time, unlike array.includes() https://jsperf.app/pikuzu/3
Instead of using string.startsWith() for the first character, it's better to use index access as it is faster https://jsbench.me/welq15149d/1
Instead of string.split()[0], it's better to use string.indexOf() + string.substring() https://jsperf.app/zovuke
Avoiding promises on hot paths can greatly improve performance, since every promise adds allocation and scheduling overhead. I added a simple helper that keeps the code almost as readable while significantly improving performance. https://jsperf.app/gacuwo/3
Related to the previous point, we also remove the async modifier from functions that already return a Promise and do not need await
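
The first four items in the list above can be sketched as follows. This is a minimal illustration, not code from the PR: the tag names and the helper names (`isTitleOrBase`, `isSelfClosing`, `isTemplateParam`, `firstSegment`) are assumptions made up for the example.

```typescript
// Hypothetical helpers for illustration only; they are not unhead's internals.

// 1. Direct comparison instead of allocating an array on every call.
//    Slower form: ['title', 'base'].includes(tag)
function isTitleOrBase(tag: string): boolean {
  return tag === 'title' || tag === 'base'
}

// 2. A module-level Set is built once and has average constant-time lookups.
//    Slower form: ['meta', 'link', 'base'].includes(tag)
const SelfClosingTags = new Set(['meta', 'link', 'base'])
function isSelfClosing(tag: string): boolean {
  return SelfClosingTags.has(tag)
}

// 3. Index access instead of startsWith() when only one character matters.
//    Slower form: token.startsWith('%')
function isTemplateParam(token: string): boolean {
  return token[0] === '%'
}

// 4. indexOf() + substring() instead of split()[0], which builds a full array.
//    Slower form: input.split(separator)[0]
function firstSegment(input: string, separator: string): string {
  const idx = input.indexOf(separator)
  return idx === -1 ? input : input.substring(0, idx)
}
```

Each rewrite keeps the same result for ordinary inputs while skipping a temporary allocation, which is why it matters mostly on paths that run many times per render.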

Benchmark Result

There are a few other areas where performance can be improved, namely:

~~Avoiding Promise as they hurt performance~~ Implemented via the thenable helper. Promise is now seen as an edge case
~~Reducing the number of Object.entries()~~ Done. If we want performance, we'll have to say goodbye to Object.entries, Object.values and Object.keys
~~Possibly combining two sorts in the sort plugin.~~ Done
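
For readers unfamiliar with the idea behind the thenable helper mentioned above, here is a minimal sketch under the assumption that it simply runs a callback synchronously unless the value is a real Promise. The signature and the usage below are illustrative and may differ from what the PR actually ships.

```typescript
// Hypothetical sketch of a "thenable" helper; the real signature may differ.
// Plain values are handled synchronously, so no promise is allocated for them.
function thenable<T, R>(value: T | Promise<T>, onValue: (resolved: T) => R): R | Promise<R> {
  if (value instanceof Promise) {
    // Edge case: the caller really passed a promise, so chain onto it.
    return (value as Promise<T>).then(onValue)
  }
  // Common case: run the callback immediately.
  return onValue(value as T)
}

// Usage: the same callback serves both the sync and the async input.
const syncLength = thenable('title', (resolved: string) => resolved.length)
const asyncLength = thenable(Promise.resolve('title'), (resolved: string) => resolved.length)
```

With this shape, `syncLength` is a plain number, while `asyncLength` is a promise that resolves to the same number.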

Since this library is a building block for other applications, I believe it should have the maximum possible performance.

Performance was tested on the following hardware:
CPU: AMD Ryzen 9 7950X3D
System: WSL 2 Arch Linux on Windows 11
Node.js: v22.5.0

Before

✓ dom-useHead (1) 11339ms
  name          hz      min      max     mean      p75      p99     p995     p999     rme  samples
· x50        9.3537  99.2468  144.97   106.91   108.30   138.09   144.97   144.97  ±1.13%      100
✓ ssr bench (1) 11341ms
  name          hz      min      max     mean      p75      p99     p995     p999     rme  samples
· x50 ssr   89.0412  9.8624  18.1303  11.2308  11.4576  15.0532  15.3867  17.0052  ±0.47%     1000

After

✓ dom-useHead (1) 9408ms
  name          hz       min      max     mean      p75     p99    p995     p999      rme  samples
· x50       11.2612  80.9079   113.33  88.8004  89.7723  112.17  113.33   113.33   ±1.11%      100
✓ ssr bench (1) 7332ms
  name           hz      min      max     mean      p75     p99    p995     p999     rme   samples
· x50 ssr    138.38   6.4453  10.3630   7.2264   7.3634  9.5104  9.6495  10.1183  ±0.42%      1000

After using the forked hookable with these changes unjs/hookable#102

✓ dom-useHead (1) 9157ms
  name           hz      min      max     mean      p75     p99    p995    p999      rme   samples
· x50       11.5724  79.0402   118.18  86.4127  87.9506  106.06  118.18  118.18   ±1.16%       100
✓ ssr bench (1) 5274ms
  name           hz      min      max     mean      p75     p99    p995    p999      rme   samples
· x50 ssr    193.42   4.6108   8.3951   5.1700   5.3634  5.9843  6.4341  8.1951   ±0.41%      1000

harlan-zw · 2024-07-13T13:28:18Z

Woah, this is really impressive work, thank you so much. I'll try to get some benchmarks tomorrow, but all looks good so far.

> Avoiding Promise as they hurt performance

I think this (that is, users providing promises to useHead as input) can probably be dropped from the API in v2. I don't think they're really used, and the logic is confusing.

> Since this library is a building block for other applications, I believe it should have the maximum possible performance.

Agreed!

negezor · 2024-07-14T08:31:20Z

Separately, I would like to share a small victory in processTemplateParams. This iteration comes close to character-by-character parsing.
Before

 ✓ processTemplateParams (5) 5306ms
   name                        hz     min     max    mean     p75     p99    p995    p999     rme  samples
 · basic               124,256.92  0.0068  1.2928  0.0080  0.0079  0.0188  0.0203  0.0467  ±0.71%    62129   slowest
 · nested props        223,829.82  0.0036  0.6271  0.0045  0.0042  0.0144  0.0273  0.0359  ±0.51%   111915
 · not found props     321,787.56  0.0026  0.2824  0.0031  0.0030  0.0056  0.0139  0.0278  ±0.41%   160894
 · with url          4,358,487.01  0.0002  2.3345  0.0002  0.0002  0.0003  0.0004  0.0011  ±1.03%  2179244
 · simple string    15,700,280.18  0.0000  0.4865  0.0001  0.0001  0.0001  0.0001  0.0004  ±0.71%  7850141   fastest

After

 ✓ processTemplateParams (5) 5525ms
   name                        hz     min     max    mean     p75     p99    p995    p999     rme  samples
 · basic               318,078.28  0.0027  1.0443  0.0031  0.0031  0.0053  0.0066  0.0186  ±0.53%   159040
 · nested props        330,560.01  0.0026  0.3683  0.0030  0.0030  0.0038  0.0048  0.0154  ±0.27%   165281
 · not found props     316,737.26  0.0026  0.3337  0.0032  0.0031  0.0056  0.0060  0.0157  ±0.34%   158369   slowest
 · with url          5,279,266.29  0.0002  1.9639  0.0002  0.0002  0.0002  0.0002  0.0006  ±1.06%  2639634
 · simple string    18,801,730.95  0.0000  0.5143  0.0001  0.0001  0.0001  0.0001  0.0001  ±0.40%  9400866   fastest

harlan-zw · 2024-07-14T12:00:38Z

Just let me know when you're finished making the changes you'd like, I'll work on getting it merged and benchmarked further.

negezor · 2024-07-14T12:11:25Z

@harlan-zw I have made all the changes I wanted. You can merge now 😊

harlan-zw · 2024-07-15T08:02:14Z

You've done a great job on this PR, thanks again.

I've tried to do some benchmarking on these changes, but the performance actually seems very slightly worse. Maybe you have a better idea of how to accurately bench this e2e, though.

Before

 ✓ test/vue/dom-useHead.bench.ts (1) 13515ms
   ✓ dom-useHead (1) 13514ms
     name      hz     min     max    mean     p75     p99    p995    p999     rme  samples
   · x50   7.9066  112.44  177.98  126.48  129.53  156.32  177.98  177.98  ±1.29%      100
 ✓ test/unhead/ssr/ssr-perf.bench.ts (1) 13381ms
   ✓ ssr bench (1) 13380ms
     name          hz      min      max     mean      p75      p99     p995     p999     rme  samples
   · x50 ssr  75.4054  12.1169  25.1118  13.2616  13.2981  17.7024  18.7785  22.4677  ±0.47%     1000

After

 ✓ test/vue/dom-useHead.bench.ts (1) 13672ms
   ✓ dom-useHead (1) 13671ms
     name      hz     min     max    mean     p75     p99    p995    p999     rme  samples
   · x50   7.7883  113.84  199.53  128.40  130.55  194.32  199.53  199.53  ±2.05%      100
 ✓ test/unhead/ssr/ssr-perf.bench.ts (1) 13599ms
   ✓ ssr bench (1) 13598ms
     name          hz      min      max     mean      p75      p99     p995     p999     rme  samples
   · x50 ssr  74.2828  12.2232  22.5889  13.4621  13.5607  19.1117  20.1525  22.3832  ±0.55%     1000

I think we can probably still merge this since we've slightly reduced the bundle size, but I wanted to give you the opportunity to provide e2e benchmarks, as it would be great if we could advertise a % improvement.

negezor · 2024-07-15T08:43:08Z

I think it greatly depends on the specifics of the project. Typically we have more than one useHead() in an application, and there is also an initial head.push() that is contained for the application. Not to mention, we often want schema-org to be included as well. Forgot to mention that I tested with these changes unjs/hookable#102 😅

During testing I used a 6x CPU slowdown in dev tools. For me, two factors were important in optimization:

Initial script execution time
Time required to process changes

A little later I will check what is slowing down in this benchmark.

negezor · 2024-07-18T08:37:44Z

@harlan-zw I've finished improving other areas where performance was dropping. I updated the results in the main description of the PR. Could you please review it again?

The only thing that bothers me in the dom-useHead.bench.ts benchmark is that most of the time is spent inside the jsdom library.

flamegraph

It turns out that if we really want more performance, we need to move away from using Object.entries(), Object.values(), and Object.keys() in favor of for in with Object.prototype.hasOwnProperty. This slightly worsens code readability, but it's not critical considering it gives a 2x, sometimes 3x performance boost.
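
As a minimal sketch of the pattern described above, the function below walks an object with for in plus Object.prototype.hasOwnProperty and builds a string by concatenation. The name `renderAttrs` and its behaviour are assumptions for illustration, not the library's propsToString, and attribute escaping is deliberately left out.

```typescript
// Hypothetical example; not unhead's actual implementation.
const hasOwn = Object.prototype.hasOwnProperty

// Slower form: for (const [key, value] of Object.entries(props)) { ... }
// which allocates an array of [key, value] pairs before the loop starts.
function renderAttrs(props: Record<string, string | boolean | undefined>): string {
  let out = ''
  for (const key in props) {
    // for in also visits inherited enumerable keys, so filter them out.
    if (!hasOwn.call(props, key)) continue
    const value = props[key]
    if (value === undefined || value === false) continue
    // Boolean attributes render as a bare name; escaping is omitted for brevity.
    out += value === true ? ` ${key}` : ` ${key}="${value}"`
  }
  return out
}
```

The hasOwnProperty check is what keeps the loop equivalent to Object.entries(), which only ever returns own enumerable properties.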

Another thing consuming 1/4 of the performance is mini-loops that just checked a couple of properties to avoid code duplication. I removed them where direct checks are possible, but this doesn't add much duplication.
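
A sketch of what replacing such a mini-loop with direct checks can look like, assuming a tag object with optional textContent and innerHTML fields. The `TagLike` type and `resolveContent` function are hypothetical names used only for this example.

```typescript
// Hypothetical shapes for illustration only.
interface TagLike {
  textContent?: string
  innerHTML?: string
}

type ContentKey = 'textContent' | 'innerHTML'

// Loop form being replaced (allocates an array per call):
//   for (const key of ['textContent', 'innerHTML']) { if (tag[key] !== undefined) ... }
function resolveContent(tag: TagLike): { key: ContentKey, value: string } | undefined {
  // Two explicit checks: slightly more code, no array and no iterator.
  if (tag.textContent !== undefined) return { key: 'textContent', value: tag.textContent }
  if (tag.innerHTML !== undefined) return { key: 'innerHTML', value: tag.innerHTML }
  return undefined
}
```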

The overall package size has indeed increased, but executing JS is much more expensive than parsing it. We could reduce the package size by dropping support for Promise.

harlan-zw · 2024-07-20T02:30:40Z

Hi @negezor, thanks again for your hard work on this, I really appreciate it.

I had some time to sit down and make a proper SSR benchmark based on my site and all of the plugins it uses. Running it on this PR, it looks like you've improved the SSR by 20% 🎉 🚀!

The package size difference is negligible so all good there.

-- current --
 ✓ test/bench/ssr-harlanzw-com-e2e.bench.ts (1) 3211ms
   ✓ ssr e2e bench (1) 3209ms
     name        hz     min     max    mean     p75     p99    p995    p999     rme  samples
   · e2e   1,615.77  0.5511  2.1222  0.6189  0.6177  1.0534  1.2626  1.6639  ±0.50%     5000
 ✓ test/bench/ssr-harlanzw-com-e2e.bench.ts (1) 3243ms
   ✓ ssr e2e bench (1) 3242ms
     name        hz     min     max    mean     p75     p99    p995    p999     rme  samples
   · e2e   1,598.03  0.5499  1.8620  0.6258  0.6190  1.0233  1.1068  1.3735  ±0.46%     5000
 ✓ test/bench/ssr-harlanzw-com-e2e.bench.ts (1) 3213ms
   ✓ ssr e2e bench (1) 3211ms
     name        hz     min     max    mean     p75     p99    p995    p999     rme  samples
   · e2e   1,613.98  0.5546  2.0391  0.6196  0.6200  0.9634  1.0067  1.3202  ±0.39%     5000
 ✓ test/bench/ssr-harlanzw-com-e2e.bench.ts (1) 3211ms
   ✓ ssr e2e bench (1) 3210ms
     name        hz     min     max    mean     p75     p99    p995    p999     rme  samples
   · e2e   1,614.06  0.5492  2.0267  0.6196  0.6255  1.0152  1.0993  1.5132  ±0.45%     5000
 ✓ test/bench/ssr-harlanzw-com-e2e.bench.ts (1) 3220ms
   ✓ ssr e2e bench (1) 3219ms
     name        hz     min     max    mean     p75     p99    p995    p999     rme  samples
   · e2e   1,610.08  0.5545  1.9893  0.6211  0.6159  1.0911  1.2073  1.4338  ±0.48%     5000


-- this PR  -- 

✓ test/bench/ssr-harlanzw-com-e2e.bench.ts (1) 2678ms
   ✓ ssr e2e bench (1) 2676ms
     name        hz     min     max    mean     p75     p99    p995    p999     rme  samples
   · e2e   1,949.40  0.4521  1.7332  0.5130  0.5086  0.8895  0.9698  1.2820  ±0.50%     5000


 ✓ test/bench/ssr-harlanzw-com-e2e.bench.ts (1) 2650ms
   ✓ ssr e2e bench (1) 2648ms
     name        hz     min     max    mean     p75     p99    p995    p999     rme  samples
   · e2e   1,972.05  0.4497  1.8316  0.5071  0.5046  0.8682  0.9137  1.3332  ±0.46%     5000

 ✓ test/bench/ssr-harlanzw-com-e2e.bench.ts (1) 2615ms
   ✓ ssr e2e bench (1) 2613ms
     name        hz     min     max    mean     p75     p99    p995    p999     rme  samples
   · e2e   1,998.93  0.4499  1.6762  0.5003  0.5048  0.8365  0.8900  1.0098  ±0.42%     5000

 ✓ test/bench/ssr-harlanzw-com-e2e.bench.ts (1) 2665ms
   ✓ ssr e2e bench (1) 2664ms
     name        hz     min     max    mean     p75     p99    p995    p999     rme  samples
   · e2e   1,958.39  0.4502  2.1185  0.5106  0.5057  0.9516  1.0020  1.2473  ±0.55%     5000

 ✓ test/bench/ssr-harlanzw-com-e2e.bench.ts (1) 2678ms
   ✓ ssr e2e bench (1) 2677ms
     name        hz     min     max    mean     p75     p99    p995    p999     rme  samples
   · e2e   1,950.99  0.4508  1.7268  0.5126  0.5148  0.8579  0.8979  1.0237  ±0.40%     5000
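
The hz columns of the ten runs above can be averaged to check the quoted improvement. This is only arithmetic over the numbers already shown; the variable names are illustrative.

```typescript
// hz values copied from the "current" and "this PR" runs above.
const currentHz = [1615.77, 1598.03, 1613.98, 1614.06, 1610.08]
const prHz = [1949.40, 1972.05, 1998.93, 1958.39, 1950.99]

function mean(values: number[]): number {
  let sum = 0
  for (const v of values) sum += v
  return sum / values.length
}

// Throughput gain: how many more operations per second the PR achieves.
const throughputGainPct = (mean(prHz) / mean(currentHz) - 1) * 100
// Time reduction: how much less time a single operation takes.
const timeReductionPct = (1 - mean(currentHz) / mean(prHz)) * 100

console.log(throughputGainPct.toFixed(1), timeReductionPct.toFixed(1))
```

Throughput rises by about 22% and time per operation falls by about 18%, so the quoted 20% sits between the two ways of measuring the same change.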

As all tests are passing, I'm going to merge this. I'll hold off on the release until I can do a little more testing. Ideally we release this as a patch, but it might (?) be a little risky.

Future Performance Opportunities

I'd be very open to making breaking changes that will simplify the API, reduce internal code and improve the speed. These can be introduced in a v2 beta.

You identified removing promises would help with this, would you be interested in working on this in a separate PR? The only constraint is that the DOM rendering should still be delayed via a promise (to avoid thrashing DOM updates with incremental useHead calls).
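
To illustrate the constraint above, here is a minimal sketch of batching several synchronous calls into one deferred render using a single shared promise. `createBatchedRenderer` is a hypothetical name, and the sketch assumes microtask timing, which may not match how the library schedules DOM updates.

```typescript
// Hypothetical sketch; not unhead's debounced renderer.
function createBatchedRenderer(render: () => void): () => Promise<void> {
  let pending: Promise<void> | undefined
  return function schedule(): Promise<void> {
    if (pending === undefined) {
      // The first call in a tick creates the promise; later calls reuse it.
      pending = Promise.resolve().then(() => {
        pending = undefined
        render()
      })
    }
    return pending
  }
}

// Usage: three incremental updates result in a single render.
let renderCount = 0
const scheduleRender = createBatchedRenderer(() => { renderCount++ })
const firstCall = scheduleRender()
const secondCall = scheduleRender()
const thirdCall = scheduleRender()
```

All three calls return the same promise, and `render` runs once after the current synchronous work finishes, which is what avoids thrashing the DOM.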

Would be great to hear any other ideas you have also.

Btw you should set up GitHub Sponsors if you can 😛, it would be great to give you a one-off amount.

negezor · 2024-07-20T11:06:41Z

> Running it on this PR it looks like you've improved the SSR by 20% 🎉 🚀!

That's cool. I've also been testing with my SSR; there the numbers range from 10% for the simplest pages up to 25% for complex ones with multiple nested routers.

> ideally, we release this as a patch but it might (?) be a little risky.

In case of any regressions, a patch can be released 😅

> You identified removing promises would help with this, would you be interested in working on this in a separate PR?

Yeah sure, now it'll just come down to deleting branches for Promise.

> Would be great to hear any other ideas you have also.

The module is great as it is, covering all my needs and covered by almost every possible test. I don't even have an idea of what to suggest 😅

> Btw you should setup github sponsors if you can 😛, would be great to give you a one-off amount.

I'm not planning to set up GitHub Sponsors at the moment, but thank you for the suggestion 😊

negezor · 2024-07-22T21:44:41Z

I deployed the fork with these changes to production and got this result:

Current

This PR

harlan-zw · 2024-07-23T02:24:28Z

Wow, very nice! 😍 🚀 I will try to get the release out in the next couple of days, sorry for the delay!

negezor added 4 commits July 13, 2024 01:51

perf: compare values directly instead of includes

4e2f278

perf: compare first character via access index instead of startsWith

a8eddd5

perf: use Set + has instead of an array with includes

955b33e

perf(shared): implement split once for fixKeyCase and resolveMetaKeyType

d241e9c

negezor changed the title ~~Perf~~ perf: optimize hot paths and reduce overhead for low-end devices Jul 12, 2024

negezor added 2 commits July 13, 2024 23:55

feat(shared): introduce thenable helper

2552521

perf(shared): use thenable in normalise for reduce async/await functions

5f7418d

negezor added 12 commits July 14, 2024 01:25

perf(dom): use promise chain instead of async function in debouncedRe…

f845e8d

…nderDOMHead

chore(dom): remove async modifier for debouncedRenderDOMHead

0b84990

perf(shared): remove unnecessary array spread in tagDedupeKey

db93926

refactor: undefined is allowed for spread object

933380a

perf(schema-org): avoid new array allocation in dedupeMerge

159ea79

perf(dom): check empty class or style in foreach instead of allocatio…

e11e9c2

…n of filter

perf(dom): implement split once for style in trackCtx

54b9efa

perf(dom): convert tag name to lowercase once in renderDOMHead

d753005

refactor: compare with undefined without typeof in safe places

0ee43de

test(shared): add benchmark for processTemplateParams

598d8e6

perf(shared): move sub to top module in templateParams

a8a3f5a

perf(shared): avoid unnecessary operations in processTemplateParams

2846564

negezor added 2 commits July 14, 2024 20:00

perf(shared): use single replacer in processTemplateParams

a683e20

refactor(dom): move condition to else in renderDOMHead

1c03cce

Merge branch 'main' into perf

3b89662

harlan-zw self-requested a review July 14, 2024 14:18

harlan-zw approved these changes Jul 14, 2024

negezor added 24 commits July 18, 2024 04:35

perf(unhead): first check hasProps in dedupe plugin

667e15e

perf(unhead): handle classes & styles without loop in dedupe plugin

e74bbc5

perf(unhead): use object with null prototype in dedupe plugin

0476089

perf(unhead): use for of loop instead of map for patch entry

d1ccd89

perf(dom): use for in loop for props instead of Object.entries in ren…

e29952f

…derDOMHead

perf(dom): use for in loop for handle _eventHandlers in renderDOMHead

9214fe7

perf(dom): clear side effects in for in loop in renderDOMHead

32851ff

perf(dom): replace loop with a direct property check for textContent …

4193b98

…and innerHTML

perf(dom): remove cast to array HTMLCollection in renderDOMHead

f391c98

perf(unhead): speed up hashTag using for in loop & early returns

fd0c261

perf(vue): reduce overhead from resolveUnrefHeadInput

ec0f70f

perf(dom): use set to store taken dedupe keys

4dbbd59

perf(ssr): use string concatenation in propsToString

c2aa41b

perf(ssr): add space before attrs in propsToString

3c99210

perf(ssr): use string concatenation in ssrRenderTags

9e4e271

perf(ssr): use object.assign instead of spread operator

0967e7a

perf(shared): use for of instead of array map

c476861

perf(shared): check the static string first in fixKeyCase

1548f16

perf(shared): use for in & for of loops in meta

168c79a

perf(schema-org): use for in loop in resolveNodeId

bd3038f

perf(schema-org): use for in loop in stripEmptyProperties

c6e7c35

perf(unhead): delete tag._duped if exists in dedupe plugin

36b1dc9

perf(unhead): remove loop for check third party dedupe keys in dedupe…

a6da0f9

… plugin

perf(shared): reduce operations in eventHandler plugin

4f530d8

harlan-zw merged commit 9325dfc into unjs:main Jul 20, 2024
1 of 2 checks passed
