Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BEAM-4461] Add Selected.flattenedSchema #10766

Merged
merged 3 commits into from
Feb 8, 2020

Conversation

reuvenlax
Copy link
Contributor

Copy link
Member

@TheNeuralBit TheNeuralBit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM overall, just some minor suggestions

Also: 💯 for your branch naming

// are being accessed. Currently Beam only supports wildcard descriptors.
// Once BEAM-4457 is fixed, fix this.
@FieldAccess("selectFields")
final FieldAccessDescriptor fieldAccess = FieldAccessDescriptor.withAllFields();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It took me a minute to understand what's going on with "selectFields" here and in process. I had no idea that the value on FieldAccess could be an id referencing a member FieldAccessDescriptor until I dug into the code. Let's document that as part of BEAM-9217 as well.


// TODO: This should be the same as resolved so that Beam knows which fields
// are being accessed. Currently Beam only supports wildcard descriptors.
// Once BEAM-4457 is fixed, fix this.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does BEAM-4457 unblock this? it doesn't mention anything about wildcard descriptors.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an old comment. I don't know that we've fully decided what to do about BEAM-4457


/**
* For nested fields, concatenate all the names separated by a _ character in the flattened
* schema.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Field names can contain an underscore, right? So in theory we could still end up with a non-unique name here if the schema had two nested fields like:

  • layer1.layer2_layer3 -> "layer1_layer2_layer3"
  • layer1.layer2.layer3 -> "layer1_layer2_layer3"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, though I think we should maybe make underscore a reserved character in fields. We also don't prevent users from putting dots in field names, but all sorts of things will break if they do.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added withFieldNameAs to allow the user to rename fields

* For nested fields, keep just the most-nested field name. Will fail if this name is not
* unique.
*/
public Flattened<T> keepMostNestedField() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe keepMostNestedFieldName for consistency? up to you

/**
* Selects every leaf-level field. This results in a a nested schema being flattened into a single
* top-level schema.
*/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably indicate it will concat field names with underscores

/**
* This is the default naming policy for naming fields. Every field name in the path to a given
* field is concated with _ characters.
*/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bike-shed: Nothing in this file actually makes this a default, seems to be a hold-over from when the function was in Unnest

/**
* This policy keeps the raw nested field name. If two differently-nested fields have the same
* name, unnesting will fail with this policy.
*/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar bike-shed: This mentions unnesting which isn't a thing after this change, maybe change to flattening?

@@ -90,4 +91,71 @@ public void describeTo(Description description) {
description.appendText(expected.toString());
}
}

public static class RowFieldMatcher extends BaseMatcher<Row> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks nifty, but maybe made it into this PR by accident? I don't see it used anywhere

* <p>Note that currently array and map values are not unnested.
*/
@Experimental(Kind.SCHEMAS)
public class Unnest {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume there's no concern with removing Unnest outright (rather than deprecating for a period and pointing to Select.flattenedSchema) since this is all Experimental?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah - should be fine I think.

.build();
return withNestedField(field, fieldAccess);
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider adding a test for this in FieldAccessDescriptorTest, and adding a docstring

@TheNeuralBit TheNeuralBit self-requested a review February 7, 2020 02:08
@reuvenlax
Copy link
Contributor Author

retest this please

2 similar comments
@reuvenlax
Copy link
Contributor Author

retest this please

@reuvenlax
Copy link
Contributor Author

retest this please

@reuvenlax reuvenlax merged commit 2663354 into apache:master Feb 8, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants