Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add JsonStringArrayExtractScalar function #22379

Merged
merged 3 commits into from
Jun 20, 2024
Merged

Conversation

Dith3r
Copy link
Member

@Dith3r Dith3r commented Jun 13, 2024

Description

Optimize common case of transform(cast(json_parse), json_extract_scalar) to be single and more performant call of JsonStringArrayExtractScalar.

Example query:

WITH t AS (SELECT a.id as id, a.data AS data FROM table_a AS a) 
SELECT b.data FROM table_b AS b, t AS a 
WHERE b.id = a.id AND  
CASE WHEN b.data <> 'second' THEN contains(transform(cast(json_parse(a.data) as array<json>), (x) -> json_extract_scalar("x", '$.name')), b.data) 
ELSE false END;

With change:

     │   CPU: 8.58s (94.13%), Scheduled: 8.93s (69.96%), Blocked: 1.89s (94.59%), Output: 3492632 rows (34.65MB)                                        

Without:

     │   CPU: 16.49s (95.06%), Scheduled: 17.24s (82.05%), Blocked: 2.81s (92.72%), Output: 3492632 rows (34.65MB)            

Additional context and related issues

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
(X) Release notes are required, with the following suggested text:

# General
* Improve performance of parsing JSON array and then extracting scalar value for each array entry

@cla-bot cla-bot bot added the cla-signed label Jun 13, 2024
@Dith3r Dith3r requested a review from sopel39 June 13, 2024 09:57
@Dith3r Dith3r changed the title Add JsonStringArrayExtract function [WIP] Add JsonStringArrayExtract function Jun 13, 2024
@Dith3r Dith3r marked this pull request as draft June 13, 2024 09:59
@Dith3r Dith3r changed the title [WIP] Add JsonStringArrayExtract function Add JsonStringArrayExtract function Jun 13, 2024
@Dith3r Dith3r changed the title Add JsonStringArrayExtract function Add JsonStringArrayExtractScalar function Jun 13, 2024
@Dith3r Dith3r force-pushed the ke/json branch 3 times, most recently from a8abebd to f314662 Compare June 14, 2024 13:02
@Dith3r Dith3r marked this pull request as ready for review June 14, 2024 13:02
continue;
}
JsonParser jsonParser = parser.readValueAsTree().traverse();
jsonParser.nextToken();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't you check token is not null?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all requierd checks are within extractor.

append(path.getScalarExtractor().extract(jsonParser), elementBuilder);
}
catch (JsonParseException e) {
append(null, elementBuilder);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does that comply with javadoc: https://trino.io/docs/current/functions/json.html#json_extract_scalar
What happens if malformed value is used? Maybe javadoc should be changed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It depends, if the malformed value is json, is scalar or array/object.

@Dith3r Dith3r force-pushed the ke/json branch 2 times, most recently from 66e1707 to 935b149 Compare June 18, 2024 08:26
@Dith3r Dith3r requested a review from sopel39 June 18, 2024 08:49
Copy link
Member

@sopel39 sopel39 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm % comments

void testEmptyArrayExtract()
{
Result result = extract("[]", "$.name");
assertThat(result.value()).isEqualTo(List.of());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isEqualTo(List.of()); -> isEmpty()

Copy link
Member Author

@Dith3r Dith3r Jun 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

result.value is Object and it will require to cast -> assertThat((List<?>) result.value()).isEmpty();

@Dith3r Dith3r force-pushed the ke/json branch 2 times, most recently from 504f718 to 6612645 Compare June 18, 2024 11:37
@Dith3r Dith3r requested a review from sopel39 June 18, 2024 11:39
@Dith3r Dith3r force-pushed the ke/json branch 2 times, most recently from 01cd6f3 to d794dd8 Compare June 18, 2024 12:11
@Dith3r Dith3r force-pushed the ke/json branch 3 times, most recently from f978d51 to 95923ec Compare June 19, 2024 13:29
Optimzie common case of transform(cast(json_parse(varchar_col) as array<json>), json -> json_extract_scalar(json, jsonPath)) case to
be single and more performant call of JsonStringArrayExtract.
@sopel39 sopel39 merged commit 04227e3 into trinodb:master Jun 20, 2024
96 checks passed
@github-actions github-actions bot added this to the 451 milestone Jun 20, 2024
@Test
void testEmptySliceExtract()
{
assertTrinoExceptionThrownBy(() -> assertions.expression("transform(cast(json_parse(a) as array<json>), x -> json_extract_scalar(\"x\", '$.name'))")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Angle brackets for parametric types are deprecated. Use parenthesis instead: array(json)

return new Result(full.type(), full.value, full.expression);
}

private static Optional<Expression> extractExpressionWithBindings(Plan plan)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the purpose of this method and the one below?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To get final expression how value was produced.

@@ -883,7 +906,7 @@ public ExpressionAssert assertThat()
.withRepresentation(ExpressionAssert.TYPE_RENDERER);
}

public record Result(Type type, Object value) {}
public record Result(Type type, Object value, Optional<Expression> expression) {}
Copy link
Member

@martint martint Jun 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the optional expression?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extracted expression from plan to check how value was produced. For example, if transformation was involved or not.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When would it be empty?

Also, this change to QueryAssertions doesn't follow the intent and spirit of that API and abstraction. It should be changed to work with AssertJ like this:

assertThat(assertions.expression("..."))
    .matches(<ir expression>)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

None yet

3 participants