Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix startswith/endswith implementation #26922

Merged
merged 3 commits into from
May 29, 2018

Conversation

bkamins
Copy link
Member

@bkamins bkamins commented Apr 27, 2018

This PR implements a check that prefix b does not match partial character in a in startswith(a, b).

@ararslan ararslan added the domain:strings "Strings!" label Apr 28, 2018
@ararslan ararslan requested a review from nalimilan April 28, 2018 00:39
Copy link
Member

@nalimilan nalimilan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bkamins knows better than I do whether it's correct. For a serious review better ask @StefanKarpinski.

function startswith(a::String, b::String)
cub = ncodeunits(b)
(ncodeunits(a) < cub || ccall(:memcmp, Int32, (Ptr{UInt8}, Ptr{UInt8}, UInt),
a, b, sizeof(b)) ≠ 0) && return false
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing space for alignment. Would probably be more readable as a if.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rewritten to use if now.

@bkamins
Copy link
Member Author

bkamins commented Apr 28, 2018

The logic is that b is a prefix of a if next valid index in a is just after ncodeunits(b) codeunits in string a. The reason is that we know that in b position ncodeunits(b) is the end of a character in b (but it does not have to be the end in a in general so we have to test for this).

@vtjnash
Copy link
Sponsor Member

vtjnash commented May 3, 2018

I don't understand the original TODO: what's wrong with doing the naive byte-comparison and declaring success if they match?

@bkamins
Copy link
Member Author

bkamins commented May 3, 2018

I guess the original TODO says that it is not implemented yet 😄.
However given your question the answer is the following.
Consider strings:

x = "α"
y = "\xb1"

Now endswith(x, y) is false currently and it is a correct information.

However if you compared bytes you would say that there is a match because codeunit(x, 2) is the same as codeunit(y, 1).

@bkamins
Copy link
Member Author

bkamins commented May 4, 2018

I have added endswith implementation and extended both to handle SubString{String}.

@bkamins bkamins changed the title fix startswith implementation fix startswith/endswith implementation May 4, 2018
@bkamins
Copy link
Member Author

bkamins commented May 29, 2018

Is this fix good to merge?

@StefanKarpinski
Copy link
Sponsor Member

It seems like way too much of a performance hit, although these particular functions are usually not performance-critical – #26921 is more of a problem. These both need the same fast primitive: a memcmpx function that returns the index of the first byte where they differ.

@bkamins
Copy link
Member Author

bkamins commented May 29, 2018

Actually here the performance hit is much smaller than in #26921 and cannot be avoided (that is why here I wanted to merge, and I am waiting with the other one).
The reason is that where actually we can safely use memcmp (as I do now) and the only thing we have to check is if last characters in both strings are the same, which we have to do anyway (and memcmpx will not allow us to skip this test).

@StefanKarpinski
Copy link
Sponsor Member

Ok, fair enough. Let's merge this one then.

@StefanKarpinski StefanKarpinski merged commit 082be9a into JuliaLang:master May 29, 2018
@bkamins bkamins deleted the fix_startswith branch May 29, 2018 13:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
domain:strings "Strings!"
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants