[pkg/ottl] Add conflict resolution strategies for `set` #31808

michalpristas · 2024-03-18T11:09:02Z

Component(s)

pkg/ottl

Is your feature request related to a problem? Please describe.

Related to a rename work here
it would be useful to have the option to specify conflict resolution strategy for set as well.

Describe the solution you'd like

scenarios i have in mind at the moment are

upsert: is the current behavior and will be default, in case target value is:
- missing: new field with new value is created
- present: value is overwritten
update:
- missing: no-op
- present: value is updated
fail:
- missing: new field with new value is created
- present: value is not updated, exiting with error
insert:
- missing: new field with new value is created
- present: no-op

we would end up with following signature: set(target, value, [Optional] conflict_resolution_strategy)

Describe alternatives you've considered

No response

Additional context

No response

The text was updated successfully, but these errors were encountered:

github-actions · 2024-03-18T11:11:07Z

Pinging code owners:

pkg/ottl: @TylerHelmuth @kentquirk @bogdandrutu @evan-bradley

See Adding Labels via Comments if you do not have permissions to add labels yourself.

TylerHelmuth · 2024-03-18T17:29:26Z

I like this idea, although we'll need to me sure we clearly define what missing means as it would be different when setting an attribute versus a otlp field. missing for an attribute could mean "the key is not present" (attributes["missing"] returns nil), but for an otlp field like span name it will always be present on the span telemetry, but it may be empty string.

I'd like to align the names with the attribute processor's: insert, update, and upsert. Any additional feature, such as fail, can be named independently.

michalpristas · 2024-03-20T05:48:02Z

with the PoC i have for this i was playing with idea of checking nil as you said.

for otlp fields i have a question though. empty string check is fairly easy and i don't think setting field to empty string is something done on purpose. what about other types numbers, bools. What's the behavior there? Can it happen that these are present and set to default values (0, false)? or would these also default to ""?
If default value is the case, than I'm not sure we can tell (especially with boolean) if it's false on purpose, or false because it's uninitialized.

TylerHelmuth · 2024-03-20T16:46:50Z

If default value is the case, than I'm not sure we can tell (especially with boolean) if it's false on purpose, or false because it's uninitialized.

Yes this will definitely be a problem. There are fields like dropped_attributes_count where 0 is a meaningful value. The metric's is_monotonic is another place.

michalpristas · 2024-03-21T07:02:16Z

i will try to play with these to figure out what the proper solution should be. but i can see that with insert mode and default value user can have mixed experience easily for these fields.

michalpristas · 2024-03-26T13:11:28Z

Playing with different fields today and yes, for some number/boolean fields it's most likely impossible to tell whether value is valid or default.

i see empty string values are usually uninitialized (missing)

so i was thinking about handling default value for number/boolean as not missing.
handling nil value as missing
and handling "" value as missing as well and documenting it very well.

user when working with transformations, specifying a field to be set should be field aware and can make best decision.
also considering upsert default behavior (so we don't change anything by default), presence of the field does not really matter

i'd love to hear your thoughts. maybe I'm trying to make it unnecessary complicated and not user friendly

TylerHelmuth · 2024-03-26T21:31:40Z

As you say, we'd stick with upsert as the default. With that in mind, for update or insert, I think treating any default value as "missing" probably makes the most sense as it is consistent.

TylerHelmuth · 2024-03-28T14:03:07Z

One thing to consider: both update and insert logic can already be handled by where clauses:

update: set(attribute["test"], "pass") where attributes["test"] != nil
insert: set(attribute["test"], "pass") where attributes["test"] == nil

@evan-bradley what are your thoughts.

TylerHelmuth · 2024-04-11T17:09:42Z

@michalpristas since the Where clause can technically handle update and insert, I think we only want to add this feature if:

The ergonomics of set(attribute["test"], "pass", "update") is better than set(attribute["test"], "pass") where attributes["test"] != nil
There are other conflict resolution strategies, like fail, that a Where clause can't handle.

Can you expand some more on the fail strategy? Is this a strategy for which you have an active user case?

michalpristas · 2024-04-16T09:13:01Z

for fail i don't have a use case in my pipelines. i just wanted to cover one last use case when it comes to target exists and missing/present matrix.

otherwise i'm fine with suggestions

kernelpanic77 · 2024-04-17T17:10:04Z

Can I take this up @TylerHelmuth @michalpristas ?

TylerHelmuth · 2024-04-17T18:57:17Z

@kernelpanic77 we're still discussing if we need this feature or if the Where clause is sufficient. We need to answer the questions from #31808 (comment).

For 1., I don't personally view set(attribute["test"], "pass", "update") as much better than set(attribute["test"], "pass") where attributes["test"] != nil because set(attribute["test"], "pass", "update") requires me to understand what update implies, but also I've been working with OTTL for a long time so the Where clause and != nil syntax is already known to me. If new users this is maybe not the case. What is your opinion?

For 2., I see the power in allowing future use cases, but I'd like to have some concrete use cases before basing the design on the possibility.

Normally adding an optional parameter would be no problem bc it is optional, but the missing conflict with this one makes it trickier. I dont want to add the feature unless we are sure we need it and we are sure about how we'll handle missing, since any changes to that logic would be a breaking change.

michalpristas · 2024-04-18T07:14:00Z

i agree, for me when i looked into ottl functions i did not even know that there is a possibility to do where statements. so for first time users having explicit arguments may play out well.
some of our users are not that technical (this is also why you see me creating issues about things that can be done using regex or other way). I'd like them to have similar experience when they migrate to otel. If their config becomes longer and less readable it would be a tough sell. (sorry for me ranting about not that related stuff)

we were talking in case of missing, attributes are easy, it is nil check. the same as with where statements
we were discussing default values to be considered missing. i think this is the trickiest part. also in case of where statements it requires users to know a bit about go as a language now to be able to assess default values.

I'm just trying to have this user in mind.

TylerHelmuth · 2024-04-18T15:44:51Z

i agree, for me when i looked into ottl functions i did not even know that there is a possibility to do where statements. so for first time users having explicit arguments may play out well.

This is not the first time I've heard this 😞 OTTL docs and discoverability need some love.

also in case of where statements it requires users to know a bit about go as a language now to be able to assess default values.

Very true, thats a really good argument for adding this feature.

evan-bradley · 2024-04-18T17:16:30Z

@evan-bradley what are your thoughts.

Sorry for the long delay. In general, I'm a fan of the "one correct way to do things" approach where we keep things as consistent as possible and try to build on existing patterns to help users solve problems. I feel like it makes the language more predictable.

I think this feature is good to consider, but my gut feeling is that we should continue to rely on where clauses. I have a few points supporting this:

@TylerHelmuth has already mentioned that a user would need to determine which each option means, which I think adds mental overhead when reading the statements.
This may not be a real issue, but this would open a footgun for users to introduce potentially non-obvious no-op statements:

set(attribute["test"], "pass", "update") where attributes["test"] == nil # Will never do anything

This would essentially introduce an embedded where clause, which would make statements less readable when one is still used:

# These statements are equivalent:
- set(attribute["test"], "pass", "update") where attributes["other"] == nil
- set(attribute["test"], "pass") where attribute["test"] != nil and attributes["other"] == nil

That said, there are a few points in favor of this feature:

Not all our users have programming experience, so the less syntax is involved, the better.
It's possible users really would just prefer to specify a named strategy instead of relying on nil checks in a where clause.

I'm not going to suggest closing this issue, but before we implement it, I would like to see feedback from others that this is something they would like to see. Thanks @michalpristas for the detailed issue and input on this proposal.

github-actions · 2024-06-18T03:31:31Z

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

pkg/ottl: @TylerHelmuth @kentquirk @bogdandrutu @evan-bradley

See Adding Labels via Comments if you do not have permissions to add labels yourself.

andrzej-stencel · 2024-06-24T10:12:14Z

As an end user, the strategy parameter would be confusing to me. This is obviously a subjective, personal opinion. What would be easy to understand in my opinion is to have separate functions implementing different strategies:

strategy "upsert" - existing function set
strategy "update" - function update
strategy "insert" - function insert
strategy "fail" - function insertOrFail

evan-bradley · 2024-07-01T15:10:56Z

@michalpristas Is this issue still relevant even if #31901 is closed?

evan-bradley · 2024-07-01T15:12:23Z

As an end user, the strategy parameter would be confusing to me. This is obviously a subjective, personal opinion. What would be easy to understand in my opinion is to have separate functions implementing different strategies

@andrzej-stencel Thanks for the input. How do you think separate functions compare to using OTTL's where clauses to achieve equivalent functionality?

andrzej-stencel · 2024-07-01T16:16:52Z

As an end user, the strategy parameter would be confusing to me. This is obviously a subjective, personal opinion. What would be easy to understand in my opinion is to have separate functions implementing different strategies

@andrzej-stencel Thanks for the input. How do you think separate functions compare to using OTTL's where clauses to achieve equivalent functionality?

If I understand correctly, the following statements would be equivalent (as already mentioned above in this thread):

strategy
- currently available OTTL with conditions
- proposed strategy parameter
- proposed new function
upsert
- set(attributes["myattr"], "myvalue")
- set(attributes["myattr"], "myvalue", "upsert")
- set(attributes["myattr"], "myvalue")
update
- set(attributes["myattr"], "myvalue") where attributes["myattr"] != nil
- set(attributes["myattr"], "myvalue", "update")
- update(attributes["myattr"], "myvalue")
insert
- set(attributes["myattr"], "myvalue") where attributes["myattr"] == nil
- set(attributes["myattr"], "myvalue", "insert")
- insert(attributes["myattr"], "myvalue")
insertOrFail
- not possible
- set(attributes["myattr"], "myvalue", "insertOrFail")
- insertOrFail(attributes["myattr"], "myvalue")

Looking at this, I definitely don't like the additional strategy parameter in the set function, as I think it's hard to read.

I think the new strategy functions update(), insert(), insertOrFail() increase the readability a bit. This might be especially noticeable in more complex scenarios. For example, I believe the following:

insert(attributes["service"], attributes["service.name"] where attributes["service.name"] != nil

is easier to read than:

set(attributes["service"], attributes["service.name"] where attributes["service.name"] != nil and attributes["service"] == nil

On the other hand, you could argue that the second option is more explicit. I suppose this goes down to a personal preference? 😄

michalpristas · 2024-07-03T12:59:27Z

can you please assign this to me?

evan-bradley · 2024-07-03T14:30:02Z

@michalpristas Assigned, thanks for volunteering. I'd like to hold off on a PR for now until we've confirmed that we would like to add this functionality.

@andrzej-stencel Thanks for the input. I'm clearly biased, but I still find the verbosity of the where clauses to be more obvious for what is going on. If I'm reading OTTL statements, it would be immediately obvious to me when set is run based on the where clause conditions, and it wouldn't be obvious that insert or update strategies may throw errors without reading the documentation. While not an official OTTL policy (and I'm open to discussing the merits of this), I generally would prefer for OTTL to have a single obvious way to do things and using where clauses for this is in line with that.

@TylerHelmuth what are your thoughts?

michalpristas · 2024-07-03T14:55:05Z

i like what @andrzej-stencel suggested, it allows you to use where clause if you want, but in case you're not that technical, you don't understand inner working and when field is nil or "" it provides you with very clear options

michalpristas · 2024-07-03T15:53:19Z

when i take a look for example pipeline we have here and produce something otel like (not working)

we can see clearly that spotting updates is much easier with function calls than with an ocean of where clause.
I care a lot about readability, and i think solution proposed by @andrzej-stencel is most readable so far

- set(attribute["user.id"], '{{{zoom.operator_id}}}') 
- set(attribute["user.email"], '{{{zoom.operator}}}') 
- set(attribute["user.id"], '{{{zoom.user.id}}}')  where ctx.zoom?.operator == null && ctx.zoom?.operator_id == null
- set(attribute["user.email"], '{{{zoom.user.email}}}')  where ctx.zoom?.operator == null && ctx.zoom?.operator_id == null
- set(attribute["user.full_name"], '{{zoom.user.first_name}} {{zoom.user.last_name}}') where ctx.zoom?.operator == null && ctx.zoom?.operator_id == null && ctx.zoom?.user?.first_name != null
- set(attribute["user.target.id"], '{{{zoom.old_values.id}}}') 
- set(attribute["user.target.id"], '{{{zoom.old_values.id}}}') 
- set(attribute["user.target.email"], '{{{zoom.old_values.email}}}') 
- set(attribute["user.target.email"], '{{{zoom.old_values.email}}}') 
- set(attribute["user.target.full_name"], '{{zoom.old_values.first_name}} {{zoom.old_values.last_name}}') where ctx.zoom?.old_values?.first_name != null
- set(attribute["user.target.id"], '{{{zoom.user.id}}}') where attribute["user.target.id"] != nil
- set(attribute["user.target.id"], '{{{zoom.user.id}}}') where attribute["user.target.id"] != nil
- set(attribute["user.target.email"], '{{{zoom.user.email}}}') where attribute["user.target.email"] != nil
- set(attribute["user.target.email"], '{{{zoom.user.email}}}') where attribute["user.target.email"] != nil
- set(attribute["user.target.full_name"], '{{zoom.user.first_name}} {{zoom.user.last_name}}') where attribute["user.target.full_name"] != nil
    where '(ctx.zoom?.old_values != null || ctx.zoom?.operator != null || ctx.zoom?.operator_id != null) && ctx.zoom?.user?.first_name != null'
- set(attribute["user.changes.id"], '{{{zoom.user.id}}}')
    where ctx.zoom?.old_values?.id != null && ctx.zoom?.old_values?.id != ctx.zoom?.user?.id
- set(attribute["user.changes.email"], '{{{zoom.user.email}}}')
    where ctx.zoom?.old_values?.email != null && ctx.zoom?.old_values?.email != ctx.zoom?.user?.email
- set(attribute["user.changes.full_name"], '{{zoom.user.first_name}} {{zoom.user.last_name}}')
    where ctx.zoom?.old_values?.first_name != null && ctx.zoom?.old_values?.last_name != null && (ctx.zoom?.old_values?.last_name != ctx.zoom?.user?.last_name || ctx.zoom?.old_values?.first_name != ctx.zoom?.user?.first_name)
- set(attribute["event.kind"], pipeline_error)

- set(attribute["user.id"], '{{{zoom.operator_id}}}') 
- set(attribute["user.email"], '{{{zoom.operator}}}') 
- set(attribute["user.id"], '{{{zoom.user.id}}}')  where ctx.zoom?.operator == null && ctx.zoom?.operator_id == null
- set(attribute["user.email"], '{{{zoom.user.email}}}')  where ctx.zoom?.operator == null && ctx.zoom?.operator_id == null
- set(attribute["user.full_name"], '{{zoom.user.first_name}} {{zoom.user.last_name}}') where ctx.zoom?.operator == null && ctx.zoom?.operator_id == null && ctx.zoom?.user?.first_name != null
- set(attribute["user.target.id"], '{{{zoom.old_values.id}}}') 
- set(attribute["user.target.id"], '{{{zoom.old_values.id}}}') 
- set(attribute["user.target.email"], '{{{zoom.old_values.email}}}') 
- set(attribute["user.target.email"], '{{{zoom.old_values.email}}}') 
- set(attribute["user.target.full_name"], '{{zoom.old_values.first_name}} {{zoom.old_values.last_name}}') where ctx.zoom?.old_values?.first_name != null
- update(attribute["user.target.id"], '{{{zoom.user.id}}}')
- update(attribute["user.target.id"], '{{{zoom.user.id}}}')
- update(attribute["user.target.email"], '{{{zoom.user.email}}}')
- update(attribute["user.target.email"], '{{{zoom.user.email}}}')
- update(attribute["user.target.full_name"], '{{zoom.user.first_name}} {{zoom.user.last_name}}')
    where '(ctx.zoom?.old_values != null || ctx.zoom?.operator != null || ctx.zoom?.operator_id != null) && ctx.zoom?.user?.first_name != null'
- set(attribute["user.changes.id"], '{{{zoom.user.id}}}')
    where ctx.zoom?.old_values?.id != null && ctx.zoom?.old_values?.id != ctx.zoom?.user?.id
- set(attribute["user.changes.email"], '{{{zoom.user.email}}}')
    where ctx.zoom?.old_values?.email != null && ctx.zoom?.old_values?.email != ctx.zoom?.user?.email
- set(attribute["user.changes.full_name"], '{{zoom.user.first_name}} {{zoom.user.last_name}}')
    where ctx.zoom?.old_values?.first_name != null && ctx.zoom?.old_values?.last_name != null && (ctx.zoom?.old_values?.last_name != ctx.zoom?.user?.last_name || ctx.zoom?.old_values?.first_name != ctx.zoom?.user?.first_name)
- set(attribute["event.kind"], pipeline_error)

TylerHelmuth · 2024-07-03T21:17:42Z

ctx.zoom?.operator_id == null

what magic is this

michalpristas · 2024-07-04T07:22:47Z

no worries this is just untranslated condition, i just did a quick fulltext transform to illustrate the case
it comes from our painless script

In painless to avoid NullPointerException exceptions you can use null safe operators, such as ?., and write your scripts to be null safe

michalpristas · 2024-07-10T15:41:28Z

@evan-bradley @TylerHelmuth

are we in an agreement here and can proceed or should we discuss during SIG or create a poll?
not sure what the process is

evan-bradley · 2024-07-10T15:58:18Z

@michalpristas If you could come to a SIG meeting, I think that would be valuable. If we can't come to an agreement, a poll could also be a good option.

michalpristas · 2024-07-10T16:51:14Z

closing this issue as discussed in SIG. we'll stick with where clauses and for missing capability of insertOrFail we could use filter, validation or extension points (to be delivered)

michalpristas added enhancement New feature or request needs triage New item requiring triage labels Mar 18, 2024

github-actions bot added the pkg/ottl label Mar 18, 2024

TylerHelmuth added discussion needed Community discussion needed and removed needs triage New item requiring triage labels Mar 18, 2024

This was referenced Mar 19, 2024

Weekly Report: 2024-03-12 - 2024-03-19 #31825

Closed

Weekly Report: 2024-03-12 - 2024-03-19 asuresh4/opentelemetry-collector-contrib#11544

Open

aleksmaus mentioned this issue Mar 21, 2024

[pkg/ottl] Add rename function #31901

Closed

michalpristas mentioned this issue Mar 25, 2024

OTTL additional converters #31930

Open

26 tasks

michalpristas mentioned this issue Mar 28, 2024

[pkg/ottl] Conflict resolution strategy for set function #32020

Closed

github-actions bot added the Stale label Jun 18, 2024

TylerHelmuth removed the Stale label Jun 18, 2024

evan-bradley assigned michalpristas Jul 3, 2024

michalpristas closed this as completed Jul 10, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[pkg/ottl] Add conflict resolution strategies for `set` #31808

[pkg/ottl] Add conflict resolution strategies for `set` #31808

michalpristas commented Mar 18, 2024 •

edited

Loading

github-actions bot commented Mar 18, 2024

TylerHelmuth commented Mar 18, 2024

michalpristas commented Mar 20, 2024

TylerHelmuth commented Mar 20, 2024

michalpristas commented Mar 21, 2024

michalpristas commented Mar 26, 2024

TylerHelmuth commented Mar 26, 2024

TylerHelmuth commented Mar 28, 2024 •

edited

Loading

TylerHelmuth commented Apr 11, 2024

michalpristas commented Apr 16, 2024

kernelpanic77 commented Apr 17, 2024

TylerHelmuth commented Apr 17, 2024

michalpristas commented Apr 18, 2024

TylerHelmuth commented Apr 18, 2024

evan-bradley commented Apr 18, 2024

github-actions bot commented Jun 18, 2024

andrzej-stencel commented Jun 24, 2024

evan-bradley commented Jul 1, 2024

evan-bradley commented Jul 1, 2024

andrzej-stencel commented Jul 1, 2024 •

edited

Loading

michalpristas commented Jul 3, 2024

evan-bradley commented Jul 3, 2024

michalpristas commented Jul 3, 2024 •

edited

Loading

michalpristas commented Jul 3, 2024 •

edited

Loading

TylerHelmuth commented Jul 3, 2024

michalpristas commented Jul 4, 2024 •

edited

Loading

michalpristas commented Jul 10, 2024

evan-bradley commented Jul 10, 2024

michalpristas commented Jul 10, 2024

[pkg/ottl] Add conflict resolution strategies for set #31808

[pkg/ottl] Add conflict resolution strategies for set #31808

Comments

michalpristas commented Mar 18, 2024 • edited Loading

Component(s)

Is your feature request related to a problem? Please describe.

Describe the solution you'd like

Describe alternatives you've considered

Additional context

github-actions bot commented Mar 18, 2024

TylerHelmuth commented Mar 18, 2024

michalpristas commented Mar 20, 2024

TylerHelmuth commented Mar 20, 2024

michalpristas commented Mar 21, 2024

michalpristas commented Mar 26, 2024

TylerHelmuth commented Mar 26, 2024

TylerHelmuth commented Mar 28, 2024 • edited Loading

TylerHelmuth commented Apr 11, 2024

michalpristas commented Apr 16, 2024

kernelpanic77 commented Apr 17, 2024

TylerHelmuth commented Apr 17, 2024

michalpristas commented Apr 18, 2024

TylerHelmuth commented Apr 18, 2024

evan-bradley commented Apr 18, 2024

github-actions bot commented Jun 18, 2024

andrzej-stencel commented Jun 24, 2024

evan-bradley commented Jul 1, 2024

evan-bradley commented Jul 1, 2024

andrzej-stencel commented Jul 1, 2024 • edited Loading

michalpristas commented Jul 3, 2024

evan-bradley commented Jul 3, 2024

michalpristas commented Jul 3, 2024 • edited Loading

michalpristas commented Jul 3, 2024 • edited Loading

TylerHelmuth commented Jul 3, 2024

michalpristas commented Jul 4, 2024 • edited Loading

michalpristas commented Jul 10, 2024

evan-bradley commented Jul 10, 2024

michalpristas commented Jul 10, 2024

[pkg/ottl] Add conflict resolution strategies for `set` #31808

[pkg/ottl] Add conflict resolution strategies for `set` #31808

michalpristas commented Mar 18, 2024 •

edited

Loading

TylerHelmuth commented Mar 28, 2024 •

edited

Loading

andrzej-stencel commented Jul 1, 2024 •

edited

Loading

michalpristas commented Jul 3, 2024 •

edited

Loading

michalpristas commented Jul 3, 2024 •

edited

Loading

michalpristas commented Jul 4, 2024 •

edited

Loading