Support synthetic source together with ignore_malformed in histogram fields #109882

Merged

lkts
Contributor

@lkts lkts commented Jun 18, 2024

Contributes to #106483.

@lkts lkts added the :StorageEngine/Mapping label (the storage-related side of mappings) Jun 18, 2024
@elasticsearchmachine
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

@elasticsearchmachine
Collaborator

Hi @lkts, I've created a changelog YAML for you.

* Typical use case is to gather field values from doc_values and append malformed values
* stored in a different field in case of ignore_malformed being enabled.
*/
public class CompositeSyntheticFieldLoader implements SourceLoader.SyntheticFieldLoader {
Contributor Author

I noticed after implementing this that this is very close to what ObjectMapper.SyntheticSourceFieldLoader does. Maybe we can unify some code later.

This is also an alternative to the current implementation of, for example, SortedNumericDocValuesSyntheticFieldLoader, where malformed value handling is implemented explicitly. That logic is repeated in multiple loaders that handle different doc values types. I didn't refactor that in this PR, but I wanted to gather some thoughts.
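
The layered idea described in this comment could be sketched roughly as follows. This is a simplified, standalone illustration and not the code from this PR: the `Layer` interface and the `CompositeLoaderSketch` class are hypothetical names, and plain Java lists stand in for doc values and stored fields.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for one source of values for a field
// (for example doc values, or separately stored malformed values).
interface Layer {
    List<Object> valuesFor(int docId);
}

// Simplified sketch: gathers values from every layer so that malformed value
// handling lives in one place instead of being repeated in each loader.
final class CompositeLoaderSketch {
    private final String fieldName;
    private final List<Layer> layers;

    CompositeLoaderSketch(String fieldName, List<Layer> layers) {
        this.fieldName = fieldName;
        this.layers = List.copyOf(layers);
    }

    String fieldName() {
        return fieldName;
    }

    // Collects the values of all layers, in layer order, for a single document.
    List<Object> load(int docId) {
        List<Object> all = new ArrayList<>();
        for (Layer layer : layers) {
            all.addAll(layer.valuesFor(docId));
        }
        return all;
    }
}
```

With one layer for well-formed values and one for malformed values, `load` returns the well-formed values first, followed by the malformed ones. Whether the real loaders share this exact shape is not something this thread confirms.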

Member

+1. I think in a follow-up we can explore how to introduce a common base class for this class and ObjectMapper.SyntheticSourceFieldLoader.

@lkts
Contributor Author

lkts commented Jun 18, 2024

@elasticmachine update branch

Member

@martijnvg martijnvg left a comment

LGTM

private List<Object> values;

public MalformedValuesLayer(String fieldName) {
    this.fieldName = fieldName + "._ignore_malformed";
Contributor

"._ignore_malformed" should be a const somewhere.

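The suggestion above could be addressed along these lines. The names `MalformedFieldNames`, `IGNORE_MALFORMED_SUFFIX`, and `malformedFieldName` are hypothetical and chosen only for illustration; this thread does not show where such a constant ended up in the codebase.

```java
// Hypothetical holder class; the class and member names are illustrative only.
final class MalformedFieldNames {
    // Suffix of the auxiliary field that stores malformed values,
    // taken from the string literal in the diff above.
    static final String IGNORE_MALFORMED_SUFFIX = "._ignore_malformed";

    private MalformedFieldNames() {}

    // Derives the auxiliary field name from the mapped field's name.
    static String malformedFieldName(String fieldName) {
        return fieldName + IGNORE_MALFORMED_SUFFIX;
    }
}
```

A constructor like the one in the diff would then assign `MalformedFieldNames.malformedFieldName(fieldName)` instead of concatenating the literal inline.
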
if (v instanceof BytesRef r) {
    XContentDataHelper.decodeAndWrite(b, r);
} else {
    b.value(v);
Contributor

What's the use case for this one? I thought malformed values are always encoded.

Contributor Author

If the value is, for example, text, some fields skip encoding. This branch is there for compatibility with existing code.
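
The two branches discussed here can be illustrated with a small standalone sketch. Everything in it is an assumption made for illustration: `byte[]` stands in for Lucene's `BytesRef`, and `decode` uses a toy UTF-8 encoding rather than the actual format handled by `XContentDataHelper`, which this thread does not describe.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class MalformedValueWriterSketch {
    private MalformedValueWriterSketch() {}

    // Toy stand-in for decoding: an "encoded" value here is just UTF-8 bytes.
    // The real encoding is different and is not shown in this thread.
    static Object decode(byte[] encoded) {
        return new String(encoded, StandardCharsets.UTF_8);
    }

    // Same shape as the branch in the diff: decode encoded values,
    // pass already plain values (for example text) through unchanged.
    static List<Object> write(List<Object> stored) {
        List<Object> out = new ArrayList<>();
        for (Object v : stored) {
            if (v instanceof byte[]) {
                out.add(decode((byte[]) v));
            } else {
                out.add(v);
            }
        }
        return out;
    }
}
```

The `else` branch is what keeps fields that store malformed values without encoding working alongside fields that do encode them.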

if (binaryValue == null) {
    return;
}
b.startObject();
Contributor

Why was this changed from b.startObject(simpleName())?

Contributor Author

@lkts lkts Jun 20, 2024

This is because the composite loader writes the field name now. There may be malformed values, in which case the field is no longer a single object but an array that contains an object.
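
A rough sketch of why the field name has to be written by the outer loader: only the outer loader knows how many values there are in total, so only it can decide between a bare value and an array. The class `FieldRenderingSketch` and its `render` method are hypothetical, strings stand in for `XContentBuilder` output, and the rule that a single value is written without an array wrapper is an assumption inferred from the comment above.

```java
import java.util.List;
import java.util.stream.Collectors;

final class FieldRenderingSketch {
    private FieldRenderingSketch() {}

    // Each element of renderedValues is an already rendered JSON value,
    // for example the histogram object or a malformed scalar.
    static String render(String fieldName, List<String> renderedValues) {
        if (renderedValues.isEmpty()) {
            return ""; // nothing to write for this document
        }
        String prefix = "\"" + fieldName + "\":";
        if (renderedValues.size() == 1) {
            // Assumed behavior: a single value is written as-is.
            return prefix + renderedValues.get(0);
        }
        // Several values (histogram plus malformed ones) become an array.
        return prefix + renderedValues.stream().collect(Collectors.joining(",", "[", "]"));
    }
}
```

Under this arrangement the inner histogram writer only emits an anonymous object, which matches the change from `b.startObject(simpleName())` to `b.startObject()`.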

        id: 2
  - match:
      _source:
        latency: [{"values": [2.0], "counts": [2]}, {"values": [1.0], "counts": [1], "hello": "world"}, 123, 456, "fox"]
Contributor

We lose the fact that [123, 456] came in as a pair. Not a biggie, but I wonder if there's an easy way to preserve the array.

Contributor Author

This is intentional; this is how it works everywhere.
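
The effect being discussed, nested arrays of values ending up as one flat list, can be shown with a small standalone sketch. `FlattenSketch` is a hypothetical name, and the input used in the usage note is assumed from the reviewer's remark, since the indexed test document is not shown in this thread.

```java
import java.util.ArrayList;
import java.util.List;

final class FlattenSketch {
    private FlattenSketch() {}

    // Recursively flattens nested lists into one list of leaf values,
    // so the original grouping of values into arrays is not preserved.
    static List<Object> flatten(List<?> input) {
        List<Object> out = new ArrayList<>();
        for (Object v : input) {
            if (v instanceof List<?>) {
                out.addAll(flatten((List<?>) v));
            } else {
                out.add(v);
            }
        }
        return out;
    }
}
```

For an input such as `[[123, 456], "fox"]`, `flatten` yields the three leaf values `123`, `456`, and `"fox"` in order, which is the shape seen in the expected `_source` of the test above.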

Contributor

@kkrik-es kkrik-es left a comment

Nice, just a few minor ones.

@lkts lkts merged commit 8bc5ecd into elastic:main Jun 20, 2024
15 checks passed
@lkts lkts deleted the feature/histogram_synthetic_source_ignore_malformed branch June 20, 2024 16:09
@felixbarny felixbarny mentioned this pull request Aug 6, 2024
50 tasks