perf(core.js): introduce promise ring #9979
Conversation
Breaching ring sectors in batches doesn't seem to yield any performance gains ultimately, so I've dropped references to ring sectors; this should otherwise be good to merge, if it's correct.
…g bounds Given that nextPromiseId is an increment ahead of the cursor last written to
I see about a 3% real-world improvement from this optimization in the HTTP throughput benchmark. However, to my eye, this optimization seems rather complicated for such minimal benefit. If you don't mind, I'd rather leave this one queued up for the future and search for lower-hanging fruit for now.
Though it's not directly related to this PR, it seems that when promiseId reaches Number.MAX_SAFE_INTEGER, the next asyncOp might fail.
Yeah, I thought of that, and it would be relatively easy to handle, but it needs to be coordinated on the Rust side. We could easily wrap around or fall back to BigInts. In practice, it should be hard to hit that limit: a Deno program doing 10M opcalls/s non-stop would take ~28.5 years to reach it.
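The ~28.5-year figure can be checked with a quick back-of-envelope calculation (the 10M opcalls/s rate is the hypothetical sustained load from the comment above):

```javascript
// Promise IDs are plain JS numbers, so the hard ceiling before IDs lose
// integer precision is Number.MAX_SAFE_INTEGER (2^53 - 1).
const OPS_PER_SECOND = 10_000_000; // hypothetical sustained 10M opcalls/s
const SECONDS_PER_YEAR = 60 * 60 * 24 * 365;

const yearsToExhaust =
  Number.MAX_SAFE_INTEGER / OPS_PER_SECOND / SECONDS_PER_YEAR;
console.log(yearsToExhaust.toFixed(1)); // roughly 28.5, matching the estimate
```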
I'm working on other async op improvements, but this improvement is pretty significant: it's a 10% reduction in async-op baseline at current values, and it substantially helps improve tail latencies and close the tail-latency gap with Node on our TCP latency benches. These improvements come at a relatively small cost: ~20 lines of moderately complex JS. Ring buffers are a relatively common and well-understood data structure, nothing too exotic. A promise ring is naturally less straightforward than a plain Map or Object, but it significantly reduces the GC pressure of promises whilst being faster than both. Unless we have a better alternative for holding promises, I would be in favor of landing this.
The promise Map has a better op baseline than Object but has higher tail latencies due to the GC pressure; the ring is better than both.
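The GC-pressure argument can be illustrated with a minimal sketch contrasting the two storage strategies (illustrative names, not the actual core.js code):

```javascript
const RING_SIZE = 4 * 1024;
// Preallocated once; storing a promise just overwrites a slot in place,
// so the hot path does no allocation and creates no garbage.
const ring = new Array(RING_SIZE).fill(null);
// A Map allocates internal entry storage on every set and churns it on
// every delete, which is what drives the GC pressure.
const table = new Map();

function storeInRing(promiseId, promise) {
  ring[promiseId % RING_SIZE] = promise; // allocation-free slot reuse
}

function storeInMap(promiseId, promise) {
  table.set(promiseId, promise); // allocates a hash-table entry per promise
}
```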
Performance improvements from this PR, both the throughput increase and the tail latency decrease, are too good to let it wait. LGTM!
const promiseTable = new Map();
const promiseMap = new Map();
const RING_SIZE = 4 * 1024;
const NO_PROMISE = null; // Alias to null is faster than plain nulls
What, really??!
Yeah, it appears to save ~10ns/opcall in the benches I ran.
I haven't double-checked the V8-generated bytecode, but I assume it could be faster due to checking via referential equality instead of value equality.
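For reference, the aliased-sentinel pattern under discussion looks like this (a sketch, not the exact core.js code; `NO_PROMISE` is just a named reference to `null`, so `===` against it is still a strict reference comparison):

```javascript
// Aliased sentinel: a named constant bound to null.
const NO_PROMISE = null;

function isSlotFree(slot) {
  return slot === NO_PROMISE; // semantically identical to `slot === null`
}
```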
I will definitely need to double-check this and narrow it down, to see if there is a true performance difference or if it was just a statistical fluke in my local benches.
But it's easy to change either way and doesn't negatively impact performance or correctness in its current form.
@piscisaureus I've dumped the V8 bytecode and diffed the bytecode implementations of getPromise() and setPromise().
This is another optimization to help improve the baseline overhead of async ops. It shaves off ~55ns/op or ~7% of the current total async op overhead.
Though it's only 7% of the total async-op overhead, by my estimates it reduces the overhead of promise storing/lookup by somewhere between 33-50%.
It achieves these gains by taking advantage of the sequential nature of promise IDs: it optimistically stores them sequentially in a pre-allocated circular buffer and falls back to the promise Map for slow-to-resolve promises.
The promise ring adds some constant memory overhead: ~4 B per slot, so ~16 KB for 4K slots, etc.
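A rough self-contained sketch of the mechanism described above (hypothetical helper names and a simplified `deferred`; the actual core.js implementation may differ in details):

```javascript
const RING_SIZE = 4 * 1024;
const NO_PROMISE = null;
const promiseRing = new Array(RING_SIZE).fill(NO_PROMISE);
const promiseMap = new Map(); // fallback for slow-to-resolve promises
let nextPromiseId = 0;

// Minimal deferred helper so the sketch runs on its own.
function deferred() {
  let methods;
  const promise = new Promise((resolve, reject) => {
    methods = { resolve, reject };
  });
  return Object.assign(promise, methods);
}

function setPromise() {
  const promiseId = nextPromiseId++;
  const idx = promiseId % RING_SIZE;
  const oldPromise = promiseRing[idx];
  if (oldPromise !== NO_PROMISE) {
    // The promise stored here one full lap ago is still pending:
    // evict it to the fallback Map so it isn't clobbered.
    promiseMap.set(promiseId - RING_SIZE, oldPromise);
  }
  const promise = deferred();
  promiseRing[idx] = promise;
  return promiseId;
}

function getPromise(promiseId) {
  // Slow path: the id fell more than one lap behind the write cursor,
  // so the promise (if any) now lives in the fallback Map.
  if (promiseId < nextPromiseId - RING_SIZE) {
    const promise = promiseMap.get(promiseId);
    promiseMap.delete(promiseId);
    return promise;
  }
  // Fast path: a plain array read, then clear the slot for reuse.
  const idx = promiseId % RING_SIZE;
  const promise = promiseRing[idx];
  promiseRing[idx] = NO_PROMISE;
  return promise;
}
```

With sequential IDs and promises that resolve within one lap of the ring, every lookup stays on the allocation-free array fast path; only long-lived promises pay the Map cost.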
Benches
Todo
- setPromise