Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[pkg/ottl] Add conflict resolution strategies for set #31808

Closed
michalpristas opened this issue Mar 18, 2024 · 29 comments
Closed

[pkg/ottl] Add conflict resolution strategies for set #31808

michalpristas opened this issue Mar 18, 2024 · 29 comments
Assignees
Labels
discussion needed Community discussion needed enhancement New feature or request pkg/ottl

Comments

@michalpristas
Copy link
Contributor

michalpristas commented Mar 18, 2024

Component(s)

pkg/ottl

Is your feature request related to a problem? Please describe.

Related to a rename work here
it would be useful to have the option to specify conflict resolution strategy for set as well.

Describe the solution you'd like

scenarios i have in mind at the moment are

  • upsert: is the current behavior and will be default, in case target value is:
    • missing: new field with new value is created
    • present: value is overwritten
  • update:
    • missing: no-op
    • present: value is updated
  • fail:
    • missing: new field with new value is created
    • present: value is not updated, exiting with error
  • insert:
    • missing: new field with new value is created
    • present: no-op

we would end up with following signature: set(target, value, [Optional] conflict_resolution_strategy)

Describe alternatives you've considered

No response

Additional context

No response

@michalpristas michalpristas added enhancement New feature or request needs triage New item requiring triage labels Mar 18, 2024
Copy link
Contributor

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@TylerHelmuth
Copy link
Member

I like this idea, although we'll need to me sure we clearly define what missing means as it would be different when setting an attribute versus a otlp field. missing for an attribute could mean "the key is not present" (attributes["missing"] returns nil), but for an otlp field like span name it will always be present on the span telemetry, but it may be empty string.

I'd like to align the names with the attribute processor's: insert, update, and upsert. Any additional feature, such as fail, can be named independently.

@michalpristas
Copy link
Contributor Author

with the PoC i have for this i was playing with idea of checking nil as you said.

for otlp fields i have a question though. empty string check is fairly easy and i don't think setting field to empty string is something done on purpose. what about other types numbers, bools. What's the behavior there? Can it happen that these are present and set to default values (0, false)? or would these also default to ""?
If default value is the case, than I'm not sure we can tell (especially with boolean) if it's false on purpose, or false because it's uninitialized.

@TylerHelmuth
Copy link
Member

If default value is the case, than I'm not sure we can tell (especially with boolean) if it's false on purpose, or false because it's uninitialized.

Yes this will definitely be a problem. There are fields like dropped_attributes_count where 0 is a meaningful value. The metric's is_monotonic is another place.

@michalpristas
Copy link
Contributor Author

i will try to play with these to figure out what the proper solution should be. but i can see that with insert mode and default value user can have mixed experience easily for these fields.

@michalpristas
Copy link
Contributor Author

Playing with different fields today and yes, for some number/boolean fields it's most likely impossible to tell whether value is valid or default.

i see empty string values are usually uninitialized (missing)

so i was thinking about handling default value for number/boolean as not missing.
handling nil value as missing
and handling "" value as missing as well and documenting it very well.

user when working with transformations, specifying a field to be set should be field aware and can make best decision.
also considering upsert default behavior (so we don't change anything by default), presence of the field does not really matter

i'd love to hear your thoughts. maybe I'm trying to make it unnecessary complicated and not user friendly

@TylerHelmuth
Copy link
Member

As you say, we'd stick with upsert as the default. With that in mind, for update or insert, I think treating any default value as "missing" probably makes the most sense as it is consistent.

@TylerHelmuth
Copy link
Member

TylerHelmuth commented Mar 28, 2024

One thing to consider: both update and insert logic can already be handled by where clauses:

  • update: set(attribute["test"], "pass") where attributes["test"] != nil
  • insert: set(attribute["test"], "pass") where attributes["test"] == nil

@evan-bradley what are your thoughts.

@TylerHelmuth
Copy link
Member

@michalpristas since the Where clause can technically handle update and insert, I think we only want to add this feature if:

  1. The ergonomics of set(attribute["test"], "pass", "update") is better than set(attribute["test"], "pass") where attributes["test"] != nil
  2. There are other conflict resolution strategies, like fail, that a Where clause can't handle.

Can you expand some more on the fail strategy? Is this a strategy for which you have an active user case?

@michalpristas
Copy link
Contributor Author

for fail i don't have a use case in my pipelines. i just wanted to cover one last use case when it comes to target exists and missing/present matrix.

otherwise i'm fine with suggestions

@kernelpanic77
Copy link
Contributor

Can I take this up @TylerHelmuth @michalpristas ?

@TylerHelmuth
Copy link
Member

@kernelpanic77 we're still discussing if we need this feature or if the Where clause is sufficient. We need to answer the questions from #31808 (comment).

For 1., I don't personally view set(attribute["test"], "pass", "update") as much better than set(attribute["test"], "pass") where attributes["test"] != nil because set(attribute["test"], "pass", "update") requires me to understand what update implies, but also I've been working with OTTL for a long time so the Where clause and != nil syntax is already known to me. If new users this is maybe not the case. What is your opinion?

For 2., I see the power in allowing future use cases, but I'd like to have some concrete use cases before basing the design on the possibility.

Normally adding an optional parameter would be no problem bc it is optional, but the missing conflict with this one makes it trickier. I dont want to add the feature unless we are sure we need it and we are sure about how we'll handle missing, since any changes to that logic would be a breaking change.

@michalpristas
Copy link
Contributor Author

i agree, for me when i looked into ottl functions i did not even know that there is a possibility to do where statements. so for first time users having explicit arguments may play out well.
some of our users are not that technical (this is also why you see me creating issues about things that can be done using regex or other way). I'd like them to have similar experience when they migrate to otel. If their config becomes longer and less readable it would be a tough sell. (sorry for me ranting about not that related stuff)

we were talking in case of missing, attributes are easy, it is nil check. the same as with where statements
we were discussing default values to be considered missing. i think this is the trickiest part. also in case of where statements it requires users to know a bit about go as a language now to be able to assess default values.

I'm just trying to have this user in mind.

@TylerHelmuth
Copy link
Member

i agree, for me when i looked into ottl functions i did not even know that there is a possibility to do where statements. so for first time users having explicit arguments may play out well.

This is not the first time I've heard this 😞 OTTL docs and discoverability need some love.

also in case of where statements it requires users to know a bit about go as a language now to be able to assess default values.

Very true, thats a really good argument for adding this feature.

@evan-bradley
Copy link
Contributor

@evan-bradley what are your thoughts.

Sorry for the long delay. In general, I'm a fan of the "one correct way to do things" approach where we keep things as consistent as possible and try to build on existing patterns to help users solve problems. I feel like it makes the language more predictable.

I think this feature is good to consider, but my gut feeling is that we should continue to rely on where clauses. I have a few points supporting this:

  1. @TylerHelmuth has already mentioned that a user would need to determine which each option means, which I think adds mental overhead when reading the statements.
  2. This may not be a real issue, but this would open a footgun for users to introduce potentially non-obvious no-op statements:
set(attribute["test"], "pass", "update") where attributes["test"] == nil # Will never do anything
  1. This would essentially introduce an embedded where clause, which would make statements less readable when one is still used:
# These statements are equivalent:
- set(attribute["test"], "pass", "update") where attributes["other"] == nil
- set(attribute["test"], "pass") where attribute["test"] != nil and attributes["other"] == nil

That said, there are a few points in favor of this feature:

  1. Not all our users have programming experience, so the less syntax is involved, the better.
  2. It's possible users really would just prefer to specify a named strategy instead of relying on nil checks in a where clause.

I'm not going to suggest closing this issue, but before we implement it, I would like to see feedback from others that this is something they would like to see. Thanks @michalpristas for the detailed issue and input on this proposal.

Copy link
Contributor

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@andrzej-stencel
Copy link
Member

As an end user, the strategy parameter would be confusing to me. This is obviously a subjective, personal opinion. What would be easy to understand in my opinion is to have separate functions implementing different strategies:

  • strategy "upsert" - existing function set
  • strategy "update" - function update
  • strategy "insert" - function insert
  • strategy "fail" - function insertOrFail

@evan-bradley
Copy link
Contributor

@michalpristas Is this issue still relevant even if #31901 is closed?

@evan-bradley
Copy link
Contributor

As an end user, the strategy parameter would be confusing to me. This is obviously a subjective, personal opinion. What would be easy to understand in my opinion is to have separate functions implementing different strategies

@andrzej-stencel Thanks for the input. How do you think separate functions compare to using OTTL's where clauses to achieve equivalent functionality?

@andrzej-stencel
Copy link
Member

andrzej-stencel commented Jul 1, 2024

As an end user, the strategy parameter would be confusing to me. This is obviously a subjective, personal opinion. What would be easy to understand in my opinion is to have separate functions implementing different strategies

@andrzej-stencel Thanks for the input. How do you think separate functions compare to using OTTL's where clauses to achieve equivalent functionality?

If I understand correctly, the following statements would be equivalent (as already mentioned above in this thread):

  • strategy

    • currently available OTTL with conditions
    • proposed strategy parameter
    • proposed new function
  • upsert

    • set(attributes["myattr"], "myvalue")
    • set(attributes["myattr"], "myvalue", "upsert")
    • set(attributes["myattr"], "myvalue")
  • update

    • set(attributes["myattr"], "myvalue") where attributes["myattr"] != nil
    • set(attributes["myattr"], "myvalue", "update")
    • update(attributes["myattr"], "myvalue")
  • insert

    • set(attributes["myattr"], "myvalue") where attributes["myattr"] == nil
    • set(attributes["myattr"], "myvalue", "insert")
    • insert(attributes["myattr"], "myvalue")
  • insertOrFail

    • not possible
    • set(attributes["myattr"], "myvalue", "insertOrFail")
    • insertOrFail(attributes["myattr"], "myvalue")

Looking at this, I definitely don't like the additional strategy parameter in the set function, as I think it's hard to read.

I think the new strategy functions update(), insert(), insertOrFail() increase the readability a bit. This might be especially noticeable in more complex scenarios. For example, I believe the following:

  • insert(attributes["service"], attributes["service.name"] where attributes["service.name"] != nil

is easier to read than:

  • set(attributes["service"], attributes["service.name"] where attributes["service.name"] != nil and attributes["service"] == nil

On the other hand, you could argue that the second option is more explicit. I suppose this goes down to a personal preference? 😄

@michalpristas
Copy link
Contributor Author

can you please assign this to me?

@evan-bradley
Copy link
Contributor

@michalpristas Assigned, thanks for volunteering. I'd like to hold off on a PR for now until we've confirmed that we would like to add this functionality.

@andrzej-stencel Thanks for the input. I'm clearly biased, but I still find the verbosity of the where clauses to be more obvious for what is going on. If I'm reading OTTL statements, it would be immediately obvious to me when set is run based on the where clause conditions, and it wouldn't be obvious that insert or update strategies may throw errors without reading the documentation. While not an official OTTL policy (and I'm open to discussing the merits of this), I generally would prefer for OTTL to have a single obvious way to do things and using where clauses for this is in line with that.

@TylerHelmuth what are your thoughts?

@michalpristas
Copy link
Contributor Author

michalpristas commented Jul 3, 2024

i like what @andrzej-stencel suggested, it allows you to use where clause if you want, but in case you're not that technical, you don't understand inner working and when field is nil or "" it provides you with very clear options

@michalpristas
Copy link
Contributor Author

michalpristas commented Jul 3, 2024

when i take a look for example pipeline we have here and produce something otel like (not working)

we can see clearly that spotting updates is much easier with function calls than with an ocean of where clause.
I care a lot about readability, and i think solution proposed by @andrzej-stencel is most readable so far

- set(attribute["user.id"], '{{{zoom.operator_id}}}') 
- set(attribute["user.email"], '{{{zoom.operator}}}') 
- set(attribute["user.id"], '{{{zoom.user.id}}}')  where ctx.zoom?.operator == null && ctx.zoom?.operator_id == null
- set(attribute["user.email"], '{{{zoom.user.email}}}')  where ctx.zoom?.operator == null && ctx.zoom?.operator_id == null
- set(attribute["user.full_name"], '{{zoom.user.first_name}} {{zoom.user.last_name}}') where ctx.zoom?.operator == null && ctx.zoom?.operator_id == null && ctx.zoom?.user?.first_name != null
- set(attribute["user.target.id"], '{{{zoom.old_values.id}}}') 
- set(attribute["user.target.id"], '{{{zoom.old_values.id}}}') 
- set(attribute["user.target.email"], '{{{zoom.old_values.email}}}') 
- set(attribute["user.target.email"], '{{{zoom.old_values.email}}}') 
- set(attribute["user.target.full_name"], '{{zoom.old_values.first_name}} {{zoom.old_values.last_name}}') where ctx.zoom?.old_values?.first_name != null
- set(attribute["user.target.id"], '{{{zoom.user.id}}}') where attribute["user.target.id"] != nil
- set(attribute["user.target.id"], '{{{zoom.user.id}}}') where attribute["user.target.id"] != nil
- set(attribute["user.target.email"], '{{{zoom.user.email}}}') where attribute["user.target.email"] != nil
- set(attribute["user.target.email"], '{{{zoom.user.email}}}') where attribute["user.target.email"] != nil
- set(attribute["user.target.full_name"], '{{zoom.user.first_name}} {{zoom.user.last_name}}') where attribute["user.target.full_name"] != nil
    where '(ctx.zoom?.old_values != null || ctx.zoom?.operator != null || ctx.zoom?.operator_id != null) && ctx.zoom?.user?.first_name != null'
- set(attribute["user.changes.id"], '{{{zoom.user.id}}}')
    where ctx.zoom?.old_values?.id != null && ctx.zoom?.old_values?.id != ctx.zoom?.user?.id
- set(attribute["user.changes.email"], '{{{zoom.user.email}}}')
    where ctx.zoom?.old_values?.email != null && ctx.zoom?.old_values?.email != ctx.zoom?.user?.email
- set(attribute["user.changes.full_name"], '{{zoom.user.first_name}} {{zoom.user.last_name}}')
    where ctx.zoom?.old_values?.first_name != null && ctx.zoom?.old_values?.last_name != null && (ctx.zoom?.old_values?.last_name != ctx.zoom?.user?.last_name || ctx.zoom?.old_values?.first_name != ctx.zoom?.user?.first_name)
- set(attribute["event.kind"], pipeline_error)
- set(attribute["user.id"], '{{{zoom.operator_id}}}') 
- set(attribute["user.email"], '{{{zoom.operator}}}') 
- set(attribute["user.id"], '{{{zoom.user.id}}}')  where ctx.zoom?.operator == null && ctx.zoom?.operator_id == null
- set(attribute["user.email"], '{{{zoom.user.email}}}')  where ctx.zoom?.operator == null && ctx.zoom?.operator_id == null
- set(attribute["user.full_name"], '{{zoom.user.first_name}} {{zoom.user.last_name}}') where ctx.zoom?.operator == null && ctx.zoom?.operator_id == null && ctx.zoom?.user?.first_name != null
- set(attribute["user.target.id"], '{{{zoom.old_values.id}}}') 
- set(attribute["user.target.id"], '{{{zoom.old_values.id}}}') 
- set(attribute["user.target.email"], '{{{zoom.old_values.email}}}') 
- set(attribute["user.target.email"], '{{{zoom.old_values.email}}}') 
- set(attribute["user.target.full_name"], '{{zoom.old_values.first_name}} {{zoom.old_values.last_name}}') where ctx.zoom?.old_values?.first_name != null
- update(attribute["user.target.id"], '{{{zoom.user.id}}}')
- update(attribute["user.target.id"], '{{{zoom.user.id}}}')
- update(attribute["user.target.email"], '{{{zoom.user.email}}}')
- update(attribute["user.target.email"], '{{{zoom.user.email}}}')
- update(attribute["user.target.full_name"], '{{zoom.user.first_name}} {{zoom.user.last_name}}')
    where '(ctx.zoom?.old_values != null || ctx.zoom?.operator != null || ctx.zoom?.operator_id != null) && ctx.zoom?.user?.first_name != null'
- set(attribute["user.changes.id"], '{{{zoom.user.id}}}')
    where ctx.zoom?.old_values?.id != null && ctx.zoom?.old_values?.id != ctx.zoom?.user?.id
- set(attribute["user.changes.email"], '{{{zoom.user.email}}}')
    where ctx.zoom?.old_values?.email != null && ctx.zoom?.old_values?.email != ctx.zoom?.user?.email
- set(attribute["user.changes.full_name"], '{{zoom.user.first_name}} {{zoom.user.last_name}}')
    where ctx.zoom?.old_values?.first_name != null && ctx.zoom?.old_values?.last_name != null && (ctx.zoom?.old_values?.last_name != ctx.zoom?.user?.last_name || ctx.zoom?.old_values?.first_name != ctx.zoom?.user?.first_name)
- set(attribute["event.kind"], pipeline_error)

@TylerHelmuth
Copy link
Member

ctx.zoom?.operator_id == null

what magic is this

@michalpristas
Copy link
Contributor Author

michalpristas commented Jul 4, 2024

no worries this is just untranslated condition, i just did a quick fulltext transform to illustrate the case
it comes from our painless script

In painless to avoid NullPointerException exceptions you can use null safe operators, such as ?., and write your scripts to be null safe

@michalpristas
Copy link
Contributor Author

@evan-bradley @TylerHelmuth

are we in an agreement here and can proceed or should we discuss during SIG or create a poll?
not sure what the process is

@evan-bradley
Copy link
Contributor

@michalpristas If you could come to a SIG meeting, I think that would be valuable. If we can't come to an agreement, a poll could also be a good option.

@michalpristas
Copy link
Contributor Author

closing this issue as discussed in SIG. we'll stick with where clauses and for missing capability of insertOrFail we could use filter, validation or extension points (to be delivered)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
discussion needed Community discussion needed enhancement New feature or request pkg/ottl
Projects
None yet
Development

No branches or pull requests

5 participants