Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor(encoding/varint): native implementation #3213

Closed
wants to merge 7 commits into from
Closed

refactor(encoding/varint): native implementation #3213

wants to merge 7 commits into from

Conversation

iuioiua
Copy link
Collaborator

@iuioiua iuioiua commented Feb 21, 2023

This change replaces the Wasm implementation of encoding/varint with a native-TypeScript implementation ported from the x/varint module by @keithamus.

My benchmarks show that this implementation is 2x slower than its Wasm predecessor. However, the benefit is the implementation is simpler, and a step in CI is removed.

See #3123 (comment)

@iuioiua iuioiua changed the title refactor(encoding/varint): use native implementation refactor(encoding/varint): native implementation Feb 21, 2023
@iuioiua iuioiua marked this pull request as ready for review February 21, 2023 22:12
@iuioiua iuioiua requested a review from kt3k as a code owner February 21, 2023 22:12
*/
export function decodeU32(buf: Uint8Array, offset = 0): number {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You’ll definitely want to somehow return how many bytes the varint was encoded in, so that consumers know which byte offset to continue reading data from.

Also, thanks ☺️

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. In that case, we'll have to deprecate the current API as the form of the return values will change. @kt3k, are you in favour?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that we should return the used bytes somehow.

we'll have to deprecate the current API as the form of the return values will change

I wonder if there's other design possibilities other than returning 2-tuple (Creating an array for each decode sounds a little expensive to me). For example, npm:varint sets the used bytes at varint.decode.bytes ref. https://github.com/chrisdickinson/varint#varintdecodebytes

How about exporting used bytes like export let decodedBytes and setting that value at each decode?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My vote would be against magic variables or properties on functions as they can release Zalgo in async contexts (though I don't have a compelling usecase for unpacking varints async 😂)

One option could be to simply add additional methods like decodeU32AndGetOffset(): [number, number]. I'm sure you can think of a better name though 😃

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I doubt that could cause zalgo as decode doesn't involve any async operations. I think the code like the below works reliably:

const n = decode(...);
const bytes = SOME_MAGIC_VAR;

One option could be to simply add additional methods like decodeU32AndGetOffset(): [number, number]. I'm sure you can think of a better name though 😃

This seems obvious and a good option too 👍

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking more about if you had something like:

const n = decode(...);
await someOp
const bytes = SOME_MAGIC_VAR

I can't think of a reason why you'd need to await between extraction... perhaps if you're streaming responses... but then you can shuffle the lines around without much difficulty. It's just a bit of an awkward api.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that we should return the used bytes somehow.

we'll have to deprecate the current API as the form of the return values will change

I wonder if there's other design possibilities other than returning 2-tuple (Creating an array for each decode sounds a little expensive to me). For example, npm:varint sets the used bytes at varint.decode.bytes ref. https://github.com/chrisdickinson/varint#varintdecodebytes

How about exporting used bytes like export let decodedBytes and setting that value at each decode?

How is it expensive to return a 2-tuple? I don't understand how it is.

If it turns out to be inexpensive to return a 2-tuple, would it be better to return { result: number, offset: number } for decodeX() and { buffer: UInt8Array, offset: number } for encodeX() instead?

We should introduce these as encode32Bit(), decode64Bit(), etc., in conjunction with the deprecation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@keithamus

I can't think of a reason why you'd need to await between extraction... perhaps if you're streaming responses... but then you can shuffle the lines around without much difficulty. It's just a bit of an awkward api.

Ok. Now I agree that magic var design is confusing

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@iuioiua

We should introduce these as encode32Bit(), decode64Bit(), etc., in conjunction with the deprecation.

If we change the API names, how about following x/varint naming style? (decode, decode32, encode). Fortunately they don't have overlaps with the existing APIs, and also they look simpler than the current APIs.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, we'll do a straight port of x/varint. However, I think it's worth splitting encode() into encode32() and encode64() if there are performance gains to be had. I'll create a new issue for the exploration of this idea.

@iuioiua iuioiua marked this pull request as draft February 22, 2023 23:14
@iuioiua
Copy link
Collaborator Author

iuioiua commented Feb 23, 2023

Closing in favour of #3215.

@iuioiua iuioiua closed this Feb 23, 2023
@iuioiua iuioiua deleted the refactor-varint branch March 14, 2023 03:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants