Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix :noshift construction of an empty SubString #51923

Merged
merged 2 commits into from
Dec 30, 2023

Conversation

tecosaur
Copy link
Contributor

This fixes an edge case in the recently introduced :noshift substring constructor.

@tecosaur
Copy link
Contributor Author

Also, unless I was doing something wrong it appears that @inline SubString{String}(...) didn't elide the boundscheck.

@tecosaur tecosaur added the domain:strings "Strings!" label Oct 29, 2023
@@ -37,7 +37,7 @@ struct SubString{T<:AbstractString} <: AbstractString
return new(s, i-1, nextind(s,j)-i)
end
function SubString{T}(s::T, i::Int, j::Int, ::Val{:noshift}) where T<:AbstractString
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This whole method feels kind of weird... Is there any benchmark or anything to show the nextind call is significant which I guess is the reason for this to exist?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method isn't about avoiding the overhead, but being able to construct a SubString with the exact same offset and ncodeunits as another substring. This was added as part of AnnotatedStrings because it's very useful for e.g. taking regex matches on the underlying string and "lifting" them to the AnnotatedString without calling Expr(:new, ...) to avoid the SubString inner constructor.

Copy link
Sponsor Member

@KristofferC KristofferC Nov 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this should be some internal thing to AnnotatedStrings (that maybe uses Expr(:new)) then because it feels weird to me to have a constructor that is undocumented and doesn't do what the docs of the constructor says that it does.

Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just something like

julia> Base.SubString{String}("fαr", 1, 1)
"f"

julia> Base.SubString{String}("fαr", 1, 1, Val{:noshift}())
"Error showing value of type SubString{String}:
ERROR: BoundsError: attempt to access 1-codeunit SubString{String} at index [3]
Stacktrace:
  [1] checkbounds
    @ ./strings/basic.jl:214 [inlined]
  [2] iterate
    @ ./strings/substring.jl:92 [inlined]
  [3] popfirst!
    @ ./iterators.jl:1492 [inlined]
  [4] iterate (repeats 2 times)
    @ ./iterators.jl:1503 [inlined]
  [5] escape_string(io::IOContext{Base.TTY}, s::SubString{String}, esc::Tuple{Char, Char}; keep::Tuple{})
    @ Base ./strings/io.jl:430
  [6] escape_string
    @ ./strings/io.jl:428 [inlined]
  [7] print_quoted
    @ ./strings/io.jl:462 [inlined]
  [8] show
    @ ./strings/io.jl:197 [inlined]

feels non ideal.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I initially had eval(Expr(:new, ... but was told by someone (I forget if it was Stefan, Jameson, or someone else) that it would be preferable to have a variant inner constructor.

My 2c: I just need a way of directly contructing a substring without the index shifting, and I don't care how so much as that it's possible and that it works. We probably want to avoid eval.

@JeffBezanson JeffBezanson added needs tests Unit tests are required for this change kind:bugfix This change fixes an existing bug labels Nov 9, 2023
@tecosaur
Copy link
Contributor Author

Since this is a bugfix, would it be possible to merge this and redirect the design conversation into a separate issue/subsequent PR?

I've just added some basic tests for the :noshift constructor in a second commit, feel free to squash this or not, depending on how much you care about atomicity in the git history.

@tecosaur tecosaur removed the needs tests Unit tests are required for this change label Dec 14, 2023
@oscardssmith oscardssmith added the status:merge me PR is reviewed. Merge when all tests are passing label Dec 14, 2023
An empty constructed SubString is over the range 0:0. The :noshift
constructor is intended to allow for copying the raw range of a
SubString without modification. However, when trying to copy an empty
SubString, the 0:0 range runs afoul of the boundscheck. This can be
adjusted for by eliding the boundscheck when the range is 0:0.
It's nigh trivial, but why not have the test coverage.
@tecosaur
Copy link
Contributor Author

Regarding the "Merge when all tests are passing" part of the merge me label, I've just noticed the CI failures aren't unrelated here, but that there was a set -> est typo in @testset 🤦. I've just fixed that, and so CI test failures should be unrelated after this.

@tecosaur
Copy link
Contributor Author

tecosaur commented Dec 28, 2023

Confirmation: I can't see any sign of related failures in the failed CI tasks.

@IanButterworth
Copy link
Sponsor Member

I just re-triggered the failed job just to check

@IanButterworth IanButterworth merged commit 2091058 into JuliaLang:master Dec 30, 2023
7 checks passed
@inkydragon inkydragon removed the status:merge me PR is reviewed. Merge when all tests are passing label Jan 4, 2024
@tecosaur tecosaur deleted the substring-checkfix branch February 1, 2024 19:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
domain:strings "Strings!" kind:bugfix This change fixes an existing bug
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants