Let json:decode/3 keep whitespaces #8809

Merged
merged 1 commit into from
Sep 23, 2024

dgud
Contributor

@dgud dgud commented Sep 13, 2024

json:decode/3 always stripped leading whitespace from the Rest binary, which could be a problem if the user expected it to be preserved.

E.g. json:decode(<<"\"foo\"\n bar">>, ok, #{}) returned
{<<"foo">>, ok, <<"bar">>} instead of
{<<"foo">>, ok, <<"\n bar">>}.

If Rest contains only whitespace, it is still removed, so that the user can match on the empty binary to know whether to continue the decoding loop.

E.g. json:decode(<<"\"foo\"\n ">>, ok, #{}) still returns
{<<"foo">>, ok, <<>>}.
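The decoding loop mentioned above could look roughly like the following. This is a minimal sketch, assuming OTP 27 or later (where the json module exists); the module name json_stream and the function decode_all/1 are made up for illustration and are not part of this PR.

```erlang
-module(json_stream).
-export([decode_all/1]).

%% Hypothetical helper: decode every top-level JSON value in Bin, in order.
decode_all(Bin) when is_binary(Bin) ->
    decode_all(Bin, []).

%% An empty Rest means nothing (or only whitespace) was left after the
%% previous value, so the loop stops here.
decode_all(<<>>, Values) ->
    lists:reverse(Values);
decode_all(Bin, Values) ->
    %% With #{} the default decoders are used and the accumulator `ok`
    %% is returned unchanged. Rest may start with whitespace, which the
    %% next json:decode/3 call skips before reading a value.
    {Value, ok, Rest} = json:decode(Bin, ok, #{}),
    decode_all(Rest, [Value | Values]).
```

For example, json_stream:decode_all(<<"\"foo\"\n [1,2]  ">>) returns [<<"foo">>, [1,2]]. An input that is non-empty but consists only of whitespace is not handled by this sketch; json:decode/3 raises an error for it.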

@dgud dgud changed the base branch from master to maint September 13, 2024 09:41
Contributor

github-actions bot commented Sep 13, 2024

CT Test Results

    2 files     95 suites   57m 32s ⏱️
2 153 tests 2 105 ✅ 48 💤 0 ❌
2 512 runs  2 462 ✅ 50 💤 0 ❌

Results for commit 5b6e52e.

♻️ This comment has been updated with latest results.

To speed up review, make sure that you have read Contributing to Erlang/OTP and that all checks pass.

See the TESTING and DEVELOPMENT HowTo guides for details about how to run tests locally.

Artifacts

// Erlang/OTP Github Action Bot

@dgud dgud self-assigned this Sep 13, 2024
@dgud dgud force-pushed the dgud/stdlib/json-decode-keep-ws/OTP-19227 branch from c142d08 to ceb3c11 Compare September 13, 2024 10:14
Contributor Author

dgud commented Sep 13, 2024

@michalmuskala Can you take a look?

@@ -1386,16 +1386,18 @@ object_key(_, Original, Skip, Acc, Stack, Decode) ->

 continue(<<Rest/bits>>, Original, Skip, Acc, Stack0, Decode, Value) ->
     case Stack0 of
-        [] -> terminate(Rest, Original, Skip, Acc, Value);
+        [] -> terminate(Rest, Rest, Acc, Value);
Contributor

This breaks the binary pattern optimisation since the tail is now used in two places. I'm fairly sure it will significantly affect performance.

Contributor Author

From what we could see in the BEAM assembly, the only difference is that it creates a sub-binary in the case when Stack = [], which should be OK since that only happens for the last "term" in the JSON string.

But I'm not fluent in BEAM assembly.

Contributor

It looks fine: the optimization is disabled for this particular call but not for the others. Should this become a performance problem, it's not a huge deal for the compiler to allow the reused Rest context to be passed alongside a new tail binary representing Rest.

Contributor Author

I pushed a new variant that postpones the creation of the sub-binary and only creates it when necessary.
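To illustrate the idea of postponing the sub-binary, here is a rough sketch. It is not the code from json.erl: the module ws_tail and the function rest/2 are hypothetical, and it only assumes that the caller knows the byte offset (Skip) in the original binary where the decoded value ended.

```erlang
-module(ws_tail).
-export([rest/2]).

%% Hypothetical helper, not taken from json.erl.
%% Skip is the byte offset in Original where the decoded value ended.
rest(Original, Skip)
  when is_binary(Original), is_integer(Skip),
       Skip >= 0, Skip =< byte_size(Original) ->
    <<_:Skip/binary, Tail/binary>> = Original,
    scan(Tail, Original, Skip).

%% Only the first argument is pattern matched while scanning; Original and
%% Skip are just carried along. The intent is that the tail is never used
%% as an ordinary binary inside the loop.
scan(<<Byte, Rest/binary>>, Original, Skip)
  when Byte =:= $\s; Byte =:= $\t; Byte =:= $\n; Byte =:= $\r ->
    scan(Rest, Original, Skip);
scan(<<>>, _Original, _Skip) ->
    %% Nothing but whitespace was left: report an empty rest.
    <<>>;
scan(<<_/binary>>, Original, Skip) ->
    %% Something other than whitespace follows: only now build the result,
    %% starting at Skip so that the leading whitespace is kept.
    binary_part(Original, Skip, byte_size(Original) - Skip).
```

With this sketch, ws_tail:rest(<<"\"foo\"\n bar">>, 5) returns <<"\n bar">>, and ws_tail:rest(<<"\"foo\"\n ">>, 5) returns <<>>. Whether the compiler actually avoids intermediate sub-binaries for code shaped like this would have to be checked in the generated BEAM assembly, as was done in the discussion above.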

@rickard-green rickard-green added team:VM Assigned to OTP team VM team:PS Assigned to OTP team PS and removed team:VM Assigned to OTP team VM labels Sep 16, 2024
@dgud dgud force-pushed the dgud/stdlib/json-decode-keep-ws/OTP-19227 branch from ea95878 to 5b6e52e Compare September 16, 2024 09:01
@dgud dgud added the testing currently being tested, tag is used by OTP internal CI label Sep 20, 2024
@dgud dgud merged commit 2ef18fe into erlang:maint Sep 23, 2024
16 checks passed
Labels
team:PS Assigned to OTP team PS testing currently being tested, tag is used by OTP internal CI