-
Notifications
You must be signed in to change notification settings - Fork 4.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BEAM-4461] Add Selected.flattenedSchema #10766
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM overall, just some minor suggestions
Also: 💯 for your branch naming
// are being accessed. Currently Beam only supports wildcard descriptors. | ||
// Once BEAM-4457 is fixed, fix this. | ||
@FieldAccess("selectFields") | ||
final FieldAccessDescriptor fieldAccess = FieldAccessDescriptor.withAllFields(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It took me a minute to understand what's going on with "selectFields" here and in process
. I had no idea that the value on FieldAccess
could be an id referencing a member FieldAccessDescriptor
until I dug into the code. Let's document that as part of BEAM-9217 as well.
|
||
// TODO: This should be the same as resolved so that Beam knows which fields | ||
// are being accessed. Currently Beam only supports wildcard descriptors. | ||
// Once BEAM-4457 is fixed, fix this. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does BEAM-4457 unblock this? it doesn't mention anything about wildcard descriptors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an old comment. I don't know that we've fully decided what to do about BEAM-4457
|
||
/** | ||
* For nested fields, concatenate all the names separated by a _ character in the flattened | ||
* schema. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Field names can contain an underscore, right? So in theory we could still end up with a non-unique name here if the schema had two nested fields like:
- layer1.layer2_layer3 -> "layer1_layer2_layer3"
- layer1.layer2.layer3 -> "layer1_layer2_layer3"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, though I think we should maybe make underscore a reserved character in fields. We also don't prevent users from putting dots in field names, but all sorts of things will break if they do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added withFieldNameAs to allow the user to rename fields
* For nested fields, keep just the most-nested field name. Will fail if this name is not | ||
* unique. | ||
*/ | ||
public Flattened<T> keepMostNestedField() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe keepMostNestedFieldName
for consistency? up to you
/** | ||
* Selects every leaf-level field. This results in a a nested schema being flattened into a single | ||
* top-level schema. | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should probably indicate it will concat field names with underscores
/** | ||
* This is the default naming policy for naming fields. Every field name in the path to a given | ||
* field is concated with _ characters. | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bike-shed: Nothing in this file actually makes this a default, seems to be a hold-over from when the function was in Unnest
/** | ||
* This policy keeps the raw nested field name. If two differently-nested fields have the same | ||
* name, unnesting will fail with this policy. | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar bike-shed: This mentions unnesting which isn't a thing after this change, maybe change to flattening?
@@ -90,4 +91,71 @@ public void describeTo(Description description) { | |||
description.appendText(expected.toString()); | |||
} | |||
} | |||
|
|||
public static class RowFieldMatcher extends BaseMatcher<Row> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks nifty, but maybe made it into this PR by accident? I don't see it used anywhere
* <p>Note that currently array and map values are not unnested. | ||
*/ | ||
@Experimental(Kind.SCHEMAS) | ||
public class Unnest { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume there's no concern with removing Unnest outright (rather than deprecating for a period and pointing to Select.flattenedSchema
) since this is all Experimental?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah - should be fine I think.
.build(); | ||
return withNestedField(field, fieldAccess); | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider adding a test for this in FieldAccessDescriptorTest
, and adding a docstring
retest this please |
2 similar comments
retest this please |
retest this please |
4618ae8
to
dca23a2
Compare
R: @TheNeuralBit