Use of reduce vs entries #846
-
@kenttregenza Hi. TypeBox's internal composition logic is written to run on older versions of JavaScript (which may not support `.flatMap`), but mostly it's written to replicate (as closely as possible) conditional type-level logic at runtime. TypeBox has been through a few implementations of this (including a very extensive one in 0.32.0, where a lot of this logic was introduced). The following shows a few iterations of the pattern and the general thinking.

```typescript
// -----------------------------------------------------------------
// type level (need to replicate this type at runtime)
// -----------------------------------------------------------------
type TMapping<T extends TSchema[], Acc extends TSchema[] = []> = (
  T extends [infer L extends TSchema, ...infer R extends TSchema[]]
    ? TMapping<R, [...Acc, L]>
    : Acc
)
// -----------------------------------------------------------------
// runtime level (#1 - verbatim)
// -----------------------------------------------------------------
const Mapping1 = <T extends TSchema[]>(T: [...T], Acc: TSchema[] = []): T => {
  const [L, ...R] = T
  return (
    T.length > 0
      ? Mapping1(R, [...Acc, L])
      : Acc
  ) as T
}
// -----------------------------------------------------------------
// runtime level (#2 - reduce)
// -----------------------------------------------------------------
const Mapping2 = <T extends TSchema[]>(T: [...T]): T => {
  return T.reduce((Acc, L) => {
    return [...Acc, L] // array copy (slow)
  }, [] as TSchema[]) as T
}
// -----------------------------------------------------------------
// runtime level (#3 - reduce - no copy)
// -----------------------------------------------------------------
const Mapping3 = <T extends TSchema[]>(T: [...T]): T => {
  return T.reduce((Acc, L) => {
    Acc.push(L) // avoid array copy
    return Acc
  }, [] as TSchema[]) as T
}
```

So, currently TypeBox uses the Mapping2-style pattern in most of its composition logic.
I think before implementing anything here, it would be good to understand the problem a little better. The 70-second delay you mention is very extensive (and certainly far more than I would expect from TypeBox, even with the Mapping2 pattern). Would you be able to provide a repro (tRPC + TypeBox) which demonstrates this delay? I would be keen to observe the issue first, then consider a strategy for optimizing (which may involve broadly implementing Mapping3). Keep me posted.
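To make the cost difference between the two reduce styles concrete, here is a small stand-alone micro-benchmark sketch (not TypeBox's actual code; `Item`, `copyingReduce` and `mutatingReduce` are illustrative names) comparing the Mapping2-style copying reduce against the Mapping3-style mutating reduce:

```typescript
import { performance } from 'node:perf_hooks'

type Item = { kind: string }

// Mapping2 style: re-copies the accumulator on every step -- O(n^2) overall.
const copyingReduce = (items: Item[]): Item[] =>
  items.reduce<Item[]>((acc, item) => [...acc, item], [])

// Mapping3 style: appends in place -- O(n) overall.
const mutatingReduce = (items: Item[]): Item[] =>
  items.reduce<Item[]>((acc, item) => { acc.push(item); return acc }, [])

const items: Item[] = Array.from({ length: 10_000 }, (_, i) => ({ kind: `k${i}` }))

const time = (fn: () => void): number => {
  const start = performance.now()
  fn()
  return performance.now() - start
}

console.log('copying :', time(() => copyingReduce(items)).toFixed(2), 'ms')
console.log('mutating:', time(() => mutatingReduce(items)).toFixed(2), 'ms')
```

Both functions produce identical results; the difference is purely in how many intermediate arrays are allocated, which is why the gap widens as the input grows.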
-
@kenttregenza Heya. Just to be clear on things, when you say the following:
Do you mean:
Differences between Inference and Runtime Performance

It's important to note that if you're experiencing "Inference Performance" or "Compiler Performance" issues (above), this would be unrelated to the use of `.reduce()`. Similarly, a `flatMap()` implementation wouldn't lead to improved inference performance. The reason is that the internal runtime logic used to construct types is decoupled from the type-level logic used to infer static types. Generally speaking, to improve inference or compiler performance, you would need to look specifically at the static inference logic. To improve composite performance, there may be room for gains via `flatMap` (or the Mapping3 pattern shown above). Can you let me know which kind of performance issue you're experiencing?
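The decoupling described above can be sketched in miniature. In this hypothetical example (not TypeBox internals; `MiniSchema`, `Keys`, and `keysOf` are made-up names), the static type is computed entirely by the compiler, so swapping the runtime loop for a reduce or a flatMap cannot change what is inferred:

```typescript
type MiniSchema = { type: string }

// Type-level mapping: evaluated by the compiler during inference.
type Keys<T extends Record<string, MiniSchema>> = (keyof T)[]

// Runtime mapping: whether this uses a for-in loop, reduce, or flatMap,
// the inferred type Keys<T> above is unchanged.
const keysOf = <T extends Record<string, MiniSchema>>(props: T): Keys<T> => {
  const acc: (keyof T)[] = []
  for (const key in props) acc.push(key) // implementation detail, invisible to inference
  return acc
}

const keys = keysOf({ name: { type: 'string' }, age: { type: 'number' } })
// keys is statically typed as ("name" | "age")[], regardless of the loop above
```

This is why runtime optimizations (Mapping2 → Mapping3) and inference optimizations are separate workstreams.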
-
You are a machine... So anecdotally I'd say that halved the runtime overhead we were experiencing. Thanks for that. An interesting perf/memory question (since I have no real way of measuring how TypeScript builds its IntelliSense tables): if I did this

```typescript
const schema1 = Type.Object({
  name: Type.String(),
  language: Type.String()
})
const schema2 = Type.Object({
  title: Type.String(),
  note: Type.String()
})
```

and made it

```typescript
const stringProp = Type.String()
const schema1 = Type.Object({
  name: stringProp,
  language: stringProp
})
const schema2 = Type.Object({
  title: stringProp,
  note: stringProp
})
```

I know this removes all the nice schema "options", but say I was fine with that, the question is whether this would improve inference in the TypeScript engine... since all the "string properties" are now essentially the same object under the hood. Would that reduce the burden on the TypeScript language server/inference engine? Or should I just be doing what your latest code is doing: returning `as never` and defining the function calls with explicit return types? I will investigate. In any event, thank you again for the updates. I think it has really been a good step forward.
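One observable consequence of the reuse idea above can be shown with plain objects (this sketch deliberately does not use TypeBox, whose constructors may clone schemas; it only illustrates what sharing a single property object means):

```typescript
// One shared property node, referenced from two schemas.
const stringProp = { type: 'string' } as const

const schema1 = { type: 'object', properties: { name: stringProp, language: stringProp } }
const schema2 = { type: 'object', properties: { title: stringProp, note: stringProp } }

// All four property slots point at the same object...
console.log(schema1.properties.name === schema2.properties.title) // true

// ...so attaching an option to one property would be visible everywhere.
// Reuse trades per-property options for fewer allocations and one shared
// `typeof stringProp` type node.
```

Whether this measurably helps tsserver depends on its internal caching; what is certain is that the four slots now share one runtime object and one static type, at the cost of being unable to vary options per property.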
-
We're using TypeBox for our new project alongside tRPC. Our schemas are numerous and complex, with schemas sometimes having 40-50 properties in "layers" (TRecords etc.). We use TypeBox extensively to help shape our tRPC routes (using compiled validators etc.) and data transformations to and from a document data store.
So, be it tRPC or TypeBox, our tsserver inside VS Code sometimes takes 70s to get through compiling/prepping all the files in our multi-package monorepo. That's fine; a one-off hit is OK if incremental compilation happens later (slow but bearable).
The Problem
What we did notice is that when we use the Value part of TypeBox inside our Fastify API (Node 20), we see a significant impact on performance. Now again, this could be attributed to the complexity/composition of how we construct our schemas (e.g. we use TComposite), but after delving into TypeBox and wrapping every chunk of code in performance timing, I saw that most of the problem was in Value and its use of the JavaScript `reduce` method.
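The "wrap every chunk in a performance timing" approach described above can be sketched with a small labelled timer (the names `timed` and `timings` are illustrative, not from any library):

```typescript
import { performance } from 'node:perf_hooks'

// Accumulated elapsed milliseconds per label.
const timings = new Map<string, number>()

// Runs fn, adding its wall-clock duration to the label's running total.
const timed = <T>(label: string, fn: () => T): T => {
  const start = performance.now()
  try {
    return fn()
  } finally {
    timings.set(label, (timings.get(label) ?? 0) + (performance.now() - start))
  }
}

// Usage: wrap each suspect code path, then inspect the totals afterwards.
const sum = timed('hot-path', () =>
  Array.from({ length: 1000 }, (_, i) => i).reduce((a, b) => a + b, 0)
)
console.log(sum, timings.get('hot-path'))
```

Sorting the accumulated totals is usually enough to spot which composition step dominates, without reaching for a full profiler.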
We are wondering why this is the case. Is this a legacy code thing? Based on perf metrics etc.?
AFAICT there are two "patterns" where `reduce` is used throughout the TypeBox code: one to reduce an array into an array, and another to reduce an object into another object.
Array to array
The example patterns
Maybe I am not following, but isn't this fairly inefficient? Essentially, for each item in the array it copies the whole array processed so far (the `...acc`) and then concatenates an additional array of results (from a transform function) onto the end.
This looks like an O(n²) operation... Could it really just be a `flatMap`?
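To make the comparison concrete, here is a minimal sketch of the pattern being described and its `flatMap` equivalent (`transform`, `viaReduce`, and `viaFlatMap` are illustrative names, not TypeBox functions):

```typescript
// An example transform that maps one item to several results.
const transform = (n: number): number[] => [n, n * 2]

// The spread-accumulate reduce: copies acc on every step.
const viaReduce = (xs: number[]): number[] =>
  xs.reduce<number[]>((acc, x) => [...acc, ...transform(x)], [])

// flatMap: one pass, no intermediate copies of the accumulator.
const viaFlatMap = (xs: number[]): number[] =>
  xs.flatMap(transform)

console.log(viaReduce([1, 2, 3]))  // [ 1, 2, 2, 4, 3, 6 ]
console.log(viaFlatMap([1, 2, 3])) // [ 1, 2, 2, 4, 3, 6 ]
```

The two are behaviorally identical; the difference is that the reduce version re-copies the growing accumulator for each item, while `flatMap` (ES2019+) builds the result once.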
Array of objects to a new object
The example patterns
Again this copies the whole object (`...acc`) for every "item"... so if the array is large, the copies are O(n²).
(There are variations of this where an array of objects is transformed into a single object where each unique key can be an array... something like TComposite.)
Could this again benefit from some ES6+ functions like `Object.fromEntries` and `Object.entries`?
(With TComposite there might be another step where a groupBy is involved.)
Like the array example, on small objects (1-5 props) this probably doesn't really "matter" for perf, but on objects with lots of properties (or lots of objects with lots of properties) it starts to be significant.
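Here is a sketch of the object-building version of the same trade-off, comparing the spread-accumulate reduce against a single `Object.entries` → `Object.fromEntries` pass (the names and the `* 10` transform are illustrative):

```typescript
const source = { a: 1, b: 2, c: 3 }

// Copies the accumulator object on every property -- O(n^2) in property count.
const viaReduce = Object.entries(source).reduce<Record<string, number>>(
  (acc, [key, value]) => ({ ...acc, [key]: value * 10 }),
  {}
)

// Single pass: transform the entries, then build the object once.
const viaFromEntries = Object.fromEntries(
  Object.entries(source).map(([key, value]) => [key, value * 10])
)

console.log(viaReduce)      // { a: 10, b: 20, c: 30 }
console.log(viaFromEntries) // { a: 10, b: 20, c: 30 }
```

`Object.entries`/`Object.fromEntries` require ES2017/ES2019 targets respectively, which is worth checking against whatever minimum JavaScript version the library supports. For the grouping step mentioned for TComposite, a `Map`-based grouping (or `Object.groupBy` on very recent runtimes) would be the analogous single-pass approach.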
I would be willing to go through the TypeBox codebase to just change all the cases and see, but to be honest I don't know how to set up the environment in order to do that and test it (is it just a simple clone, change, and submit a pull request? Or is there a no-go area on code changes, e.g. language/platform constraints?).
Anyway, I thought I'd start by asking first if I have missed something obvious (or not so obvious) before attempting it.