textual format does not correctly decode unicode strings #192

Open
rogpeppe opened this issue Dec 11, 2019 · 0 comments

rogpeppe commented Dec 11, 2019

In JSON, the Unicode characters with code points 0x7f to 0xff can be encoded either as those characters directly, or with a Unicode escape sequence (e.g. \u00ff).

As such, goavro should treat JSON that uses either of these two alternatives the same way.

Unfortunately, it does not.
This code demonstrates the issue: https://play.golang.org/p/FxpmTjfmI15
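
For comparison, here is a minimal sketch of the expected behaviour, using Go's standard `encoding/json` package rather than goavro. It is not the playground program linked above, and the helper name `decodeJSONString` is only illustrative. Both spellings of the character decode to the same Go string:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// decodeJSONString (hypothetical helper) unmarshals a JSON string
// literal into a Go string using the standard library decoder.
func decodeJSONString(lit string) (string, error) {
	var s string
	err := json.Unmarshal([]byte(lit), &s)
	return s, err
}

func main() {
	// The raw string keeps the backslash, so this is the escaped JSON form.
	escaped, err := decodeJSONString(`"\u00ff"`)
	if err != nil {
		panic(err)
	}
	// The same character written directly in the JSON text.
	direct, err := decodeJSONString(`"ÿ"`)
	if err != nil {
		panic(err)
	}

	fmt.Println(escaped == direct)      // true
	fmt.Printf("% x\n", []byte(escaped)) // c3 bf (UTF-8 encoding of U+00FF)
}
```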

This issue means that it's not possible to take JSON that's been encoded with or round-tripped through a normal JSON encoder and decode it correctly with goavro.
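
A rough sketch of such a round trip, assuming Go's standard `encoding/json` as the "normal JSON encoder" (the function name `roundTrip` is made up for this illustration). The escaped input comes back out as the literal character, which is the form described above as not decoding correctly:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// roundTrip (hypothetical helper) decodes a JSON string literal and
// re-encodes it with the standard library encoder.
func roundTrip(lit string) (string, error) {
	var s string
	if err := json.Unmarshal([]byte(lit), &s); err != nil {
		return "", err
	}
	out, err := json.Marshal(s)
	if err != nil {
		return "", err
	}
	return string(out), nil
}

func main() {
	out, err := roundTrip(`"\u00ff"`)
	if err != nil {
		panic(err)
	}
	// encoding/json does not re-escape U+00FF; it writes the character as UTF-8.
	fmt.Println(out) // "ÿ"
}
```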

For example, this means that Avro JSON data that's piped through the jq command can be corrupted:

% echo '"\u00ff"' | jq .
"ÿ"