Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix excessive escaping when using ucl_object_fromstring() #262

Merged
merged 1 commit into from
Feb 1, 2024

Conversation

flowerysong
Copy link
Contributor

UCL_STRING_ESCAPE might be useful for something (though I'm not sure what) but it's not good default behaviour. Strings are escaped during output to meet the needs of the output format, so adding escape sequences while building the object results in things being escaped twice.

UCL_STRING_ESCAPE might be useful for something but it's not good
default behaviour. Strings are escaped during output to meet the needs
of the output format, so adding escape sequences while building the
object results in things being escaped twice.
@vstakhov
Copy link
Owner

This is clearly an example of breaking POLA policy. If one wants a behaviour to skip escaping, it is always possible to use ucl_object_fromstring_common with UCL_STRING_RAW flag. I don't want to switch the default behaviour of ucl_object_fromstring as it is clearly a major breaking change.

@flowerysong
Copy link
Contributor Author

I think it's far more astonishing that ucl_object_fromstring() is unsafe to use (I was certainly surprised.) Ultimately it's up to you, but POLA doesn't mean "never fix major bugs."

@vstakhov
Copy link
Owner

I understand and share your frustration: the original decision of the default behaviour of ucl_object_fromstring was definitely not right as it led to double escaping in emitter/parser chains. I would not say that this behaviour is unsafe, as it is actually too safe. The reason why I don't like the idea to merge it is that it might break some use cases when an API user expects that all data imported by this function is perfetly JSON safe. With your change this assumption is broken. And whilst I also think that it is a right thing I don't think that it is a good reason to break the (potentially) existing code after all.

@flowerysong
Copy link
Contributor Author

I called it "unsafe" because it destroys the integrity of the data that you hand it. We discovered this behaviour in a part of our program that reads in data from a JSON file, does some processing, then writes out the JSON file. Without knowing the number of times the file has gone through this process we cannot retrieve the original value.

Take as an example:

{
  "foo": "\\\\n"
}

This value could be \\n that has been processed once or \n that has been processed twice. There's no way to tell just by looking at the data.

It's simple to avoid this behaviour once you know it exists, but I believe that fixing the broken interface is better than leaving it there as a trap for the unwary.

@vstakhov
Copy link
Owner

Please don't close this issue, I'm still thinking of a better choice here. I think you are right but I just need to sit and check the use cases more carefully. Thank you!

@vstakhov vstakhov reopened this May 17, 2022
@dch
Copy link

dch commented Jul 15, 2023

changing existing API is hard, but perhaps its possible to add an additional API that addresses this.

@vstakhov
Copy link
Owner

vstakhov commented Feb 1, 2024

I think this should be really done

@vstakhov vstakhov merged commit 07a7dc3 into vstakhov:master Feb 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants