Faster UTF8 validation #705

Open
jonesmz opened this issue Oct 22, 2020 · 1 comment

Comments

Contributor

jonesmz commented Oct 22, 2020

Just making the project aware of this faster algorithm. https://lemire.me/blog/2020/10/20/ridiculously-fast-unicode-utf-8-validation/

Possible ways to take advantage of this are to provide some kind of hook for user code to supply its own UTF8 validation, or a compile-time option to specify a UTF8 validation function as a dependency.
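
To make the hook idea concrete, here is a minimal sketch of what a runtime-replaceable validator could look like. Every name in it (`utf8_validator`, `set_utf8_validator`, `validate_utf8`, `validate_utf8_scalar`) is hypothetical and not part of this project's API. The fallback is a plain scalar check for well-formed UTF-8 per RFC 3629, not the SIMD algorithm from the blog post.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string_view>
#include <utility>

// Hypothetical hook type: any callable that returns true for well-formed UTF-8.
using utf8_validator = std::function<bool(std::string_view)>;

// Portable scalar fallback. Rejects overlong encodings, UTF-16 surrogates,
// code points above U+10FFFF, stray continuation bytes, and truncated sequences.
inline bool validate_utf8_scalar(std::string_view s) {
    const std::size_t n = s.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char b0 = static_cast<unsigned char>(s[i]);
        if (b0 < 0x80) { ++i; continue; }            // ASCII
        std::size_t len = 0;
        unsigned char lo = 0x80, hi = 0xBF;          // allowed range of the second byte
        if (b0 >= 0xC2 && b0 <= 0xDF) {
            len = 2;
        } else if (b0 >= 0xE0 && b0 <= 0xEF) {
            len = 3;
            if (b0 == 0xE0) lo = 0xA0;               // excludes overlong 3-byte forms
            else if (b0 == 0xED) hi = 0x9F;          // excludes surrogates U+D800..U+DFFF
        } else if (b0 >= 0xF0 && b0 <= 0xF4) {
            len = 4;
            if (b0 == 0xF0) lo = 0x90;               // excludes overlong 4-byte forms
            else if (b0 == 0xF4) hi = 0x8F;          // excludes values above U+10FFFF
        } else {
            return false;                            // 0x80..0xC1 or 0xF5..0xFF as lead byte
        }
        if (n - i < len) return false;               // truncated sequence
        const unsigned char b1 = static_cast<unsigned char>(s[i + 1]);
        if (b1 < lo || b1 > hi) return false;
        for (std::size_t k = 2; k < len; ++k) {
            const unsigned char b = static_cast<unsigned char>(s[i + k]);
            if (b < 0x80 || b > 0xBF) return false;  // must be a continuation byte
        }
        i += len;
    }
    return true;
}

// Holds the active validator; defaults to the scalar fallback.
inline utf8_validator& current_utf8_validator() {
    static utf8_validator v = validate_utf8_scalar;
    return v;
}

// User code calls this once at startup to plug in a faster implementation.
// Not thread-safe in this sketch.
inline void set_utf8_validator(utf8_validator v) {
    assert(static_cast<bool>(v));                    // an empty std::function is a bug
    current_utf8_validator() = std::move(v);
}

// The library would call this wherever it validates strings today.
inline bool validate_utf8(std::string_view s) {
    return current_utf8_validator()(s);
}
```

User code would then register its own function once, for example `set_utf8_validator(my_simd_validate);`, where `my_simd_validate` is whatever wrapper the user writes around a faster implementation. The compile-time variant would replace the `std::function` with a macro or template parameter naming the function, which avoids the indirect call but forces the choice at build time.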

Contributor

kleunen commented Oct 23, 2020

It seems to use vectorized (SIMD) instructions. I would say it goes a bit too far to have this kind of optimization. The UTF8 validation overhead is only the tiniest percentage of the whole workload, so I am not sure optimizing it would give you any noticeable performance gain. I wonder why Boost.Locale does not have a validation function that selects an optimized version based on CPU architecture.
