-
Notifications
You must be signed in to change notification settings - Fork 24.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Apply FLS to the contents of IgnoredSourceFieldMapper #109931
Apply FLS to the contents of IgnoredSourceFieldMapper #109931
Conversation
Hi @kkrik-es, I've created a changelog YAML for you. |
…rce_fls' into fix/synthetic-source/ignored_source_fls
Pinging @elastic/es-storage-engine (Team:StorageEngine) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left one question. Otherwise looks good.
server/src/main/java/org/elasticsearch/index/mapper/IgnoredSourceFieldMapper.java
Outdated
Show resolved
Hide resolved
// The field value contains an object or an array, reconstruct it from filtered values. | ||
visitor.binaryField(fieldInfo, IgnoredSourceFieldMapper.encodeFromMap(mappedNameValue, transformedField)); | ||
} else { | ||
// The field value contains no object, so no filtering happened. Use the original value. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand the topValue check here. If no filtering happened shouldn't transformedField
be equal to mappedNameValue.map()
? I think if a top level field is a leaf field then topValue
could be not a Map or List? (depending on which value is first pulled from the iterator)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The map by construction contains the field name at the top, pointing to the contents. If the contents don't correspond to START_ARRAY or START_OBJECT, this is a leaf field with a single value. If the map is not empty, no filtering has been applied so we can use the incoming value as is.
On the other hand, if the field contains an object or an array, there may be changes deeper in the subobjects and subfields. In this case, we use transformedField
to play it safe, reconstructing the field value from the (potentially filtered) field contents.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, but what if the original document (value
variable here) is like this:
field1: value
field2:
field3: value
The only allowed field is field1.
So transformedField
would look like this:
field1: value
Which would cause value
to be passed down. Which I think is incorrect?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding is that this will be treated as two separate calls to this function, as there will be different StoredFields for each. The first one will pass as-is, the second one will be filtered and itstransformedField
will be empty.
I'll add a yaml test to verify, though the ones with - '{"name": "A", "object":{ "secret":"mission", "public":"interest" }}'
should be covering this iiuc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Switched to checking the unfiltered map, to play it safe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, so if the field in ignored field is an object or an array, then the mappedNameValue.map()
contains one object / list. In that case transformedField
may not have all the sub keys of the original map (mappedNameValue.map()
). There for we need to serialize transformedField
.
Whereas if the field in ignored field is a value and transformedField
isn't empty, this means it is safe to use the original byte array as an optimization.
Maybe clarify this a bit in the comment? For example the fact that each map has only one top level entry, other entries or in different stored fields. And why it is safe to reuse value
incase the one entry is value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated, ptal.
|
||
// Check if the field contains a single value or an object. | ||
@SuppressWarnings("unchecked") | ||
XContentBuilder xContentBuilder = (content instanceof Map<?, ?> objectMap) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: can't you check for Map<String, ?>
and remove unchecked
and cast?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Getting:
error: Object cannot be safely cast to Map<String,?>
XContentBuilder xContentBuilder = (content instanceof Map<String, ?> objectMap)
I miss C++ templates..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Never expected to see that statement from anybody.
|
||
|
||
--- | ||
Field with ignored_malformed: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
@@ -399,6 +400,20 @@ public void binaryField(FieldInfo fieldInfo, byte[] value) throws IOException { | |||
Map<String, Object> transformedSource = filter(result.v2(), filter, 0); | |||
XContentBuilder xContentBuilder = XContentBuilder.builder(result.v1().xContent()).map(transformedSource); | |||
visitor.binaryField(fieldInfo, BytesReference.toBytes(BytesReference.bytes(xContentBuilder))); | |||
} else if (IgnoredSourceFieldMapper.NAME.equals(fieldInfo.name)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add a unit test for this logic to FieldSubsetReaderTests
? Things like _source filtering are also unit tested there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Source is easy to read and write, but _ignored_source
contents are encoded..
I think this requires either exposing parts of IgnoredSourceFieldMapper
and XContentDataHelper
as public, or making this a MapperServiceTestCase
subclass. Any preference?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think making some methods of IgnoredSourceFieldMapper
and XContentDataHelper
public for the purpose of testing is ok.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went with MapperServiceTestCase
to check more code paths, still not short..
// The field value contains an object or an array, reconstruct it from filtered values. | ||
visitor.binaryField(fieldInfo, IgnoredSourceFieldMapper.encodeFromMap(mappedNameValue, transformedField)); | ||
} else { | ||
// The field value contains no object, so no filtering happened. Use the original value. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, but what if the original document (value
variable here) is like this:
field1: value
field2:
field3: value
The only allowed field is field1.
So transformedField
would look like this:
field1: value
Which would cause value
to be passed down. Which I think is incorrect?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
This mapper stores source for other fields, so it needs to be separately checked.
This PR also adds storing the source of arrays when
ignore_dynamic_beyond_limit
applies. It's a small fix, but I can move it to a separate one if preferred.