Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Apply FLS to the contents of IgnoredSourceFieldMapper #109931

Merged

Conversation

kkrik-es
Copy link
Contributor

This mapper stores source for other fields, so it needs to be separately checked.

This PR also adds storing the source of arrays when ignore_dynamic_beyond_limit applies. It's a small fix, but I can move it to a separate one if preferred.

@elasticsearchmachine
Copy link
Collaborator

Hi @kkrik-es, I've created a changelog YAML for you.

@kkrik-es kkrik-es requested a review from martijnvg June 19, 2024 15:53
@kkrik-es kkrik-es marked this pull request as ready for review June 19, 2024 15:53
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

Copy link
Member

@martijnvg martijnvg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left one question. Otherwise looks good.

// The field value contains an object or an array, reconstruct it from filtered values.
visitor.binaryField(fieldInfo, IgnoredSourceFieldMapper.encodeFromMap(mappedNameValue, transformedField));
} else {
// The field value contains no object, so no filtering happened. Use the original value.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand the topValue check here. If no filtering happened shouldn't transformedField be equal to mappedNameValue.map()? I think if a top level field is a leaf field then topValue could be not a Map or List? (depending on which value is first pulled from the iterator)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The map by construction contains the field name at the top, pointing to the contents. If the contents don't correspond to START_ARRAY or START_OBJECT, this is a leaf field with a single value. If the map is not empty, no filtering has been applied so we can use the incoming value as is.

On the other hand, if the field contains an object or an array, there may be changes deeper in the subobjects and subfields. In this case, we use transformedField to play it safe, reconstructing the field value from the (potentially filtered) field contents.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, but what if the original document (value variable here) is like this:

field1: value
field2:
  field3: value

The only allowed field is field1.

So transformedField would look like this:

field1: value

Which would cause value to be passed down. Which I think is incorrect?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding is that this will be treated as two separate calls to this function, as there will be different StoredFields for each. The first one will pass as-is, the second one will be filtered and itstransformedField will be empty.

I'll add a yaml test to verify, though the ones with - '{"name": "A", "object":{ "secret":"mission", "public":"interest" }}' should be covering this iiuc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switched to checking the unfiltered map, to play it safe.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, so if the field in ignored field is an object or an array, then the mappedNameValue.map() contains one object / list. In that case transformedField may not have all the sub keys of the original map (mappedNameValue.map()). There for we need to serialize transformedField.

Whereas if the field in ignored field is a value and transformedField isn't empty, this means it is safe to use the original byte array as an optimization.

Maybe clarify this a bit in the comment? For example the fact that each map has only one top level entry, other entries or in different stored fields. And why it is safe to reuse value incase the one entry is value.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, ptal.


// Check if the field contains a single value or an object.
@SuppressWarnings("unchecked")
XContentBuilder xContentBuilder = (content instanceof Map<?, ?> objectMap)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can't you check for Map<String, ?> and remove unchecked and cast?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Getting:

 error: Object cannot be safely cast to Map<String,?>
        XContentBuilder xContentBuilder = (content instanceof Map<String, ?> objectMap)

I miss C++ templates..

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Never expected to see that statement from anybody.



---
Field with ignored_malformed:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

@@ -399,6 +400,20 @@ public void binaryField(FieldInfo fieldInfo, byte[] value) throws IOException {
Map<String, Object> transformedSource = filter(result.v2(), filter, 0);
XContentBuilder xContentBuilder = XContentBuilder.builder(result.v1().xContent()).map(transformedSource);
visitor.binaryField(fieldInfo, BytesReference.toBytes(BytesReference.bytes(xContentBuilder)));
} else if (IgnoredSourceFieldMapper.NAME.equals(fieldInfo.name)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add a unit test for this logic to FieldSubsetReaderTests? Things like _source filtering are also unit tested there.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Source is easy to read and write, but _ignored_source contents are encoded..

I think this requires either exposing parts of IgnoredSourceFieldMapper and XContentDataHelper as public, or making this a MapperServiceTestCase subclass. Any preference?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think making some methods of IgnoredSourceFieldMapper and XContentDataHelper public for the purpose of testing is ok.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went with MapperServiceTestCase to check more code paths, still not short..

// The field value contains an object or an array, reconstruct it from filtered values.
visitor.binaryField(fieldInfo, IgnoredSourceFieldMapper.encodeFromMap(mappedNameValue, transformedField));
} else {
// The field value contains no object, so no filtering happened. Use the original value.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, but what if the original document (value variable here) is like this:

field1: value
field2:
  field3: value

The only allowed field is field1.

So transformedField would look like this:

field1: value

Which would cause value to be passed down. Which I think is incorrect?

Copy link
Member

@martijnvg martijnvg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@kkrik-es kkrik-es merged commit 9c603a3 into elastic:main Jun 21, 2024
15 checks passed
@kkrik-es kkrik-es mentioned this pull request Jun 27, 2024
2 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants