Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Substrings mask annotations #53042

Closed
LilithHafner opened this issue Jan 24, 2024 · 8 comments
Closed

Substrings mask annotations #53042

LilithHafner opened this issue Jan 24, 2024 · 8 comments
Labels
domain:strings "Strings!" kind:bug Indicates an unexpected problem or unintended behavior

Comments

@LilithHafner
Copy link
Member

SubString{Base.AnnotatedString{String}} (produced by annotated_string[1:2]) is an un-annotated string in the sense that it compares equal to strings with the same content without annotations and unequal to strings with annotations.

julia> astr = Base.AnnotatedString("abc", [(1:2, :a => 1)])
"abc"

julia> astr == "abc"
false

julia> astr[1:2] == "ab"
true

julia> astr[1:2] == Base.AnnotatedString("ab", [(1:2, :a => 1)])
false

julia> typeof(astr[1:2])
SubString{Base.AnnotatedString{String}}
@LilithHafner LilithHafner added the domain:strings "Strings!" label Jan 24, 2024
@LilithHafner LilithHafner added this to the 1.11 milestone Jan 24, 2024
@JeffBezanson JeffBezanson added the kind:bug Indicates an unexpected problem or unintended behavior label Mar 7, 2024
@JeffBezanson
Copy link
Sponsor Member

We could patch up the == methods, but it's a bit whack-a-mole --- because of SubString and things like it, any AbstractString can potentially have annotations, so it seems all AbstractString code needs to handle annotations. Oops!

One way to approach it is to try to make AnnotatedString as closed as possible under various operations, e.g. have indexing return an AnnotatedString instead of a SubString. Possibly an AnnotatedString{SubString} but I'm not sure if that would cause problems too?

@JeffBezanson
Copy link
Sponsor Member

JeffBezanson commented Mar 8, 2024

One bug here is that annotations(::SubString) is implemented, making it inconsistent with the behavior in the OP. We could remove those methods and say only AnnotatedStrings have annotations, and use AnnotatedString{SubString} instead of SubString{AnnotatedString} whenever possible. What do you think @tecosaur ?

@tecosaur
Copy link
Contributor

tecosaur commented Apr 3, 2024

Apologies that I've been rather poor at responding to this; I've found this problem somewhat tricky.

For starters, the behaviour Lilith demonstrates is clearly erroneous, and Jeff's idea of using AnnotatedString{SubString} suggested by itself is somewhat appealing, at first glance.

However, I'm concerned that it will cause its own problems around code that uses parametric substring types. Currently calling match(::Regex, ::AnnotatedString{String}) returns a RegexMatch{Base.AnnotatedString{String}}. Why is this a problem? Well, RegexMatch uses SubString{S<:AbstractString} for the match, and I imagine that similar application code probably exists.

I haven't had any nice ideas on how to resolve this tension, beyond just going for the whack-a-mole SubString specialisation as a potential "least bad" option, but none of these options seem to work nicely and I haven't been able to come up with any nice alternatives 😕.

@tecosaur
Copy link
Contributor

From the comments above, I currently don't have a better idea than implementing a specialised equality methods for SubString + AnnotatedString, despite the potentially whack-a-mole nature.

If anybody has a better thought (cc: @JeffBezanson), please do let me know. If I don't hear anything in the near future I'll make a PR doing this. We can always make the solution better/nicer, but making the test case Lilith shows not broken at least seems like a clear improvement even if the mechanism isn't ideal.

@LilithHafner
Copy link
Member Author

SGTM. Fixing bugs is ususally a good first step :)

fingolfin pushed a commit that referenced this issue May 2, 2024
The least-bad idea I've had so far for fixing #53042.

I figure this fixes the bug raised there, and we can always switch to a
clearly-better solution if one appears.

The fact that only a string without annotations is equal to a
non-annotated string (in order to preserve the transitivity of
equality), makes the generic fallback methods for string comparison
insufficient.

As such, ==(::AnnoatedString, ::AbstractString) is implemented in
annotated.jl, but this issue re-appears when dealing with substrings.

The obvious solution is to just implement a specialised method for
substrings. This does seem potentially a bit whack-a-mole, but I'm
worried that cleverer solutions might come with subtle issues of their
own. For now, let's try the simple and obvious solution, and improve it
later if we can work out a nicer way of handling this issue in general.
KristofferC pushed a commit that referenced this issue May 6, 2024
The least-bad idea I've had so far for fixing #53042.

I figure this fixes the bug raised there, and we can always switch to a
clearly-better solution if one appears.

The fact that only a string without annotations is equal to a
non-annotated string (in order to preserve the transitivity of
equality), makes the generic fallback methods for string comparison
insufficient.

As such, ==(::AnnoatedString, ::AbstractString) is implemented in
annotated.jl, but this issue re-appears when dealing with substrings.

The obvious solution is to just implement a specialised method for
substrings. This does seem potentially a bit whack-a-mole, but I'm
worried that cleverer solutions might come with subtle issues of their
own. For now, let's try the simple and obvious solution, and improve it
later if we can work out a nicer way of handling this issue in general.

(cherry picked from commit 185f058)
@KristofferC
Copy link
Sponsor Member

This is (arguably) a bug in the new StyledStrings library but I don't see this being release blocking.

@KristofferC KristofferC removed this from the 1.11 milestone May 8, 2024
@tecosaur
Copy link
Contributor

tecosaur commented May 8, 2024

I believe the fix is in 1.11-beta2 anyway?

@KristofferC
Copy link
Sponsor Member

All I know is that this issue is still open. Feel free to close if it has been addressed.

@tecosaur tecosaur closed this as completed May 8, 2024
lazarusA pushed a commit to lazarusA/julia that referenced this issue Jul 12, 2024
The least-bad idea I've had so far for fixing JuliaLang#53042.

I figure this fixes the bug raised there, and we can always switch to a
clearly-better solution if one appears.

The fact that only a string without annotations is equal to a
non-annotated string (in order to preserve the transitivity of
equality), makes the generic fallback methods for string comparison
insufficient.

As such, ==(::AnnoatedString, ::AbstractString) is implemented in
annotated.jl, but this issue re-appears when dealing with substrings.

The obvious solution is to just implement a specialised method for
substrings. This does seem potentially a bit whack-a-mole, but I'm
worried that cleverer solutions might come with subtle issues of their
own. For now, let's try the simple and obvious solution, and improve it
later if we can work out a nicer way of handling this issue in general.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
domain:strings "Strings!" kind:bug Indicates an unexpected problem or unintended behavior
Projects
None yet
Development

No branches or pull requests

4 participants