Improve performance of length on UTF8String #11011

ScottPJones · 2015-04-25T22:35:26Z

I rewrote the u8_charnum function to improve the speed of getting the length of a UTF-8 string.
In my tests, the new version was approximately twice as fast as the previous u8_charnum, and many times faster than the old approach using u8_strlen. (u8_strlen should be removed, along with u8_strwidth: if either is called on a malformed string that lacks a terminating \0, it can read past the end of the buffer, potentially causing an access violation.)

JeffBezanson · 2015-04-25T23:21:25Z

Awesome, thanks!

stevengj · 2015-04-26T02:12:29Z

LGTM.

stevengj · 2015-04-26T02:14:22Z

As an aside, I noticed a bunch of places in string.jl that test length(s) == 1 and similar, which is O(n) when it should be O(1). We should really have a fast islength1(s) routine or something like that; it should only be a couple of lines long.

Improve performance of length on UTF8String

a0f85aa

stevengj added a commit that referenced this pull request Apr 26, 2015

Merge pull request #11011 from ScottPJones/spj/utf8_length

47059be

Improve performance of length on UTF8String

stevengj merged commit 47059be into JuliaLang:master Apr 26, 2015

stevengj added the domain:unicode Related to unicode characters and encodings label Apr 26, 2015

ScottPJones deleted the spj/utf8_length branch April 27, 2015 03:36
