Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf(ops): Monomorphic sync op calls #15337

Merged
merged 46 commits into from
Aug 11, 2022

Conversation

aapoalas
Copy link
Collaborator

@aapoalas aapoalas commented Jul 28, 2022

Welcome to better optimised op calls! Currently opSync is called with parameters of every type and count. This most definitely makes the call megamorphic. Additionally, it seems that spread params leads to V8 not being able to optimise the calls quite as well (apparently Fast Calls cannot be used with spread params).

Monomorphising op calls should lead to some improved performance. Now that unwrapping of sync ops results is done on Rust side, this is pretty simple:

opSync("op_foo", param1, param2);
// -> turns to
ops.op_foo(param1, param2);

This means sync op calls are now just directly calling the native binding function. When V8 Fast API Calls are enabled, this will enable those to be called on the optimised path.

Monomorphising async ops likely requires using callbacks and is left as an exercise to the reader.

Performance

Comparison of nop op in a vacuum:

running 2 tests
test bench_op_nop        ... bench:      37,315 ns/iter (+/- 5,121)
test bench_op_nop_opSync ... bench:      37,925 ns/iter (+/- 4,071)

Comparison of FFI related ops:

benchmark                            time (avg)             (min … max)
-----------------------------------------------------------------------
ops.op_ffi_cstr_read(ptr)        121.16 ns/iter  (102.2 ns … 249.56 ns)
opSync('op_ffi_cstr_read', ptr)   131.3 ns/iter (111.42 ns … 302.48 ns)
ops.op_ffi_ptr_of(buf)           140.96 ns/iter (125.56 ns … 246.76 ns)
opSync('op_ffi_ptr_of', buf)     184.02 ns/iter (162.43 ns … 411.55 ns)

Discussion

There are good things about this:

  1. Performance
  2. Picking the fruits created by the ops macro PR.

Downsides are:

  1. None

cli/bench/deno_common.js Outdated Show resolved Hide resolved
core/01_core.js Show resolved Hide resolved
@ghost
Copy link

ghost commented Jul 29, 2022

If it makes it easier to get a first iteration done here, IMO it's reasonable to just do a swift inlining pass that ignores eluding unwrapOpResult.

@aapoalas
Copy link
Collaborator Author

If it makes it easier to get a first iteration done here, IMO it's reasonable to just do a swift inlining pass that ignores eluding unwrapOpResult.

Yeah, I think a lot of the files I'll just be doing that.

cli/bench/deno_common.js Outdated Show resolved Hide resolved
test_ffi/tests/bench.js Outdated Show resolved Hide resolved
@aapoalas aapoalas requested a review from a user August 11, 2022 10:45
@aapoalas
Copy link
Collaborator Author

So, apparently this is ready for review and merge if nothing out of order is found. The tests failed once in parsing strace output but the failure did not repeat on another go around, so apparently it was a flake.

test_ffi/tests/test.js Outdated Show resolved Hide resolved
test_ffi/tests/thread_safe_test_worker.js Outdated Show resolved Hide resolved
test_ffi/tests/thread_safe_test.js Outdated Show resolved Hide resolved
Copy link
Member

@littledivy littledivy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

love it!

Copy link
Member

@bartlomieju bartlomieju left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you @aapoalas, great work

@bartlomieju bartlomieju merged commit 2164f6b into denoland:main Aug 11, 2022
dsherret pushed a commit that referenced this pull request Aug 11, 2022
Welcome to better optimised op calls! Currently opSync is called with parameters of every type and count. This most definitely makes the call megamorphic. Additionally, it seems that spread params leads to V8 not being able to optimise the calls quite as well (apparently Fast Calls cannot be used with spread params).

Monomorphising op calls should lead to some improved performance. Now that unwrapping of sync ops results is done on Rust side, this is pretty simple:

```
opSync("op_foo", param1, param2);
// -> turns to
ops.op_foo(param1, param2);
```

This means sync op calls are now just directly calling the native binding function. When V8 Fast API Calls are enabled, this will enable those to be called on the optimised path.

Monomorphising async ops likely requires using callbacks and is left as an exercise to the reader.
@aapoalas aapoalas deleted the perf-ops-monomorphic-calls branch October 9, 2022 08:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants