feat: Encoding API Improvements #498

Merged (4 commits) on Jul 27, 2024

@nabetti1720 nabetti1720 commented Jul 26, 2024

Description of changes

To improve compatibility with encodings supported by Node.js, the following actions have been taken.
https://nodejs.org/api/util.html#whatwg-supported-encodings

  • Encodings supported by LLRT can now also be specified by their aliases.
  • Reduced excessive normalization and made label matching slightly stricter.
  • Since iso-8859-1 is an alias of windows-1252, it is now handled under windows-1252.
  • Introduced ENCODING_MAP, with the expectation of easier maintenance and better lookup performance.
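
As a rough illustration of the alias-map idea in the list above, here is a minimal sketch. It is not the code from this PR: the `Encoder` enum, the `ALIASES` table, and the `lookup` function are hypothetical names, the alias list is only a small subset of the WHATWG labels, and the trim-plus-ASCII-lowercase step is an assumption about what "minimal normalization" could look like.

```rust
use std::collections::HashMap;
use std::sync::OnceLock;

// Hypothetical set of supported encodings (illustrative, not LLRT's enum).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Encoder {
    Utf8,
    Utf16le,
    Windows1252,
}

// Each label, including aliases, maps directly to one encoder.
// Only a small subset of the WHATWG labels is listed here.
const ALIASES: &[(&str, Encoder)] = &[
    ("utf-8", Encoder::Utf8),
    ("utf8", Encoder::Utf8),
    ("unicode-1-1-utf-8", Encoder::Utf8),
    ("utf-16le", Encoder::Utf16le),
    ("utf-16", Encoder::Utf16le),
    ("windows-1252", Encoder::Windows1252),
    ("iso-8859-1", Encoder::Windows1252),
    ("latin1", Encoder::Windows1252),
];

static ENCODING_MAP: OnceLock<HashMap<&'static str, Encoder>> = OnceLock::new();

// Build the map once, on first use.
fn encoding_map() -> &'static HashMap<&'static str, Encoder> {
    ENCODING_MAP.get_or_init(|| ALIASES.iter().copied().collect())
}

// Minimal normalization (assumed): trim whitespace and lowercase ASCII only.
// Separators are not rewritten, so "utf_8" is rejected rather than coerced.
pub fn lookup(label: &str) -> Option<Encoder> {
    let key = label.trim().to_ascii_lowercase();
    encoding_map().get(key.as_str()).copied()
}
```

With this shape, adding a new alias is one more row in the table rather than a new enum variant. For example, `lookup("ISO-8859-1")` resolves to `Encoder::Windows1252`, while an unlisted spelling such as `lookup("utf_8")` returns `None`.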

Checklist

  • Created unit tests in tests/unit and/or in Rust for my feature if needed
  • Ran make fix to format JS and apply Clippy auto fixes
  • Made sure my code didn't add any additional warnings: make check
  • Updated documentation if needed (API.md/README.md/Other)

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@richarddavison richarddavison left a comment

Hi @nabetti1720! I like the aliasing and the simpler approach. Not sure if this yields much performance benefit (except for maybe one less string allocation to normalize the key) but it's more maintainable for sure.

Review thread on llrt_utils/src/encoding.rs (outdated, resolved)

nabetti1720 commented Jul 27, 2024

Hi @richarddavison.
Yes. In the previous implementation, a new enum had to be created for any alias that fell outside the normalization range (e.g., treating unicode-1-1-utf-8 as utf-8), and this proposal was made to avoid that.

Also, we believe that by minimizing the normalization, labels can now be evaluated using the same string as the actual label.

Lastly, my apologies: I believe lookup performance is O(1) both before and after this change. I will correct the part of the description that claimed an improvement. :)

Review thread on types/buffer.d.ts (outdated, resolved)
@richarddavison richarddavison merged commit 701ce32 into awslabs:main Jul 27, 2024
8 checks passed
@nabetti1720 nabetti1720 deleted the feat/encoding-api branch July 27, 2024 05:19
3 participants