Improve performance of length on UTF8String #11011

Merged: 1 commit, Apr 26, 2015

Conversation

ScottPJones
Contributor

I rewrote the u8_charnum function to speed up computing the length of a UTF-8 string.
In my tests, the new version was approximately twice as fast as the existing u8_charnum, and many times faster than the old approach using u8_strlen.
u8_strlen should be removed, along with u8_strwidth: if either is called with a malformed string that lacks a terminating \0, it can overrun the buffer and potentially cause an access violation.
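
To illustrate the overrun concern, here is a minimal sketch of a length-bounded character count. It is not the code merged in this PR; the name utf8_count_chars and its signature are hypothetical. It assumes the caller knows the byte length and counts every byte that is not a UTF-8 continuation byte, so it never searches for a terminating \0.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not the code from this PR): count the characters in
 * the first `nbytes` bytes of `s` by counting every byte that is not a UTF-8
 * continuation byte (bit pattern 10xxxxxx). The loop is bounded by `nbytes`,
 * so it never looks for a terminating '\0' and cannot read past the buffer. */
size_t utf8_count_chars(const char *s, size_t nbytes)
{
    assert(s != NULL || nbytes == 0);
    size_t count = 0;
    for (size_t i = 0; i < nbytes; i++) {
        /* ASCII bytes and lead bytes have top two bits different from 10. */
        if (((unsigned char)s[i] & 0xC0) != 0x80)
            count++;
    }
    return count;
}
```

Because the bound comes from the caller rather than from the data, a malformed or unterminated buffer can give a questionable count but not an out-of-bounds read.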

@JeffBezanson
Sponsor Member

Awesome, thanks!

@stevengj
Member

LGTM.

@stevengj
Member

As an aside, I noticed a bunch of places in string.jl that test length(s) == 1 and similar, which is O(n) when it should be O(1). We should really have a fast islength1(s) routine or something like that—should only be a couple lines long.
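
One way such a check could look, sketched in C for consistency with the u8_* helpers (a version for string.jl would be written in Julia). The names utf8_seq_len and utf8_is_length1 are hypothetical, and the sketch assumes the buffer holds well-formed UTF-8 of known byte length. It inspects only the first byte, so the cost does not depend on the string's size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: sequence length implied by a UTF-8 lead byte,
 * or 0 if the byte cannot start a sequence. */
size_t utf8_seq_len(unsigned char lead)
{
    if (lead < 0x80) return 1;           /* 0xxxxxxx: ASCII */
    if ((lead & 0xE0) == 0xC0) return 2; /* 110xxxxx */
    if ((lead & 0xF0) == 0xE0) return 3; /* 1110xxxx */
    if ((lead & 0xF8) == 0xF0) return 4; /* 11110xxx */
    return 0;                            /* continuation or invalid byte */
}

/* Hypothetical O(1) test: true when the `nbytes`-byte buffer holds exactly
 * one character. Assumes well-formed UTF-8; continuation bytes are not
 * validated here. */
bool utf8_is_length1(const char *s, size_t nbytes)
{
    assert(s != NULL || nbytes == 0);
    /* A single character occupies between 1 and 4 bytes. */
    if (nbytes == 0 || nbytes > 4)
        return false;
    /* Exactly one character iff the first sequence fills the whole buffer. */
    return utf8_seq_len((unsigned char)s[0]) == nbytes;
}
```

The same idea extends to other small constant comparisons, since at most a few lead bytes need to be examined before the answer is known.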

stevengj added a commit that referenced this pull request Apr 26, 2015
Improve performance of length on UTF8String
stevengj merged commit 47059be into JuliaLang:master on Apr 26, 2015
stevengj added the domain:unicode (Related to unicode characters and encodings) label on Apr 26, 2015
ScottPJones deleted the spj/utf8_length branch on April 27, 2015 03:36
3 participants