Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

enhancement(vrl): RFC for return expression #19828

Merged
merged 4 commits into from
Feb 15, 2024

Conversation

mladedav
Copy link
Contributor

@mladedav mladedav commented Feb 8, 2024

Partly based on #7496 but closer to a subset from #19766 which does not include any breaking changes such as adding a new reserved keyword to VRL.

Copy link
Contributor

@pront pront left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @mladedav, this RFC looks good. I see return as a valuable feature to have. Left one question.

rfcs/2023-02-08-7496-vrl-return.md Outdated Show resolved Hide resolved
rfcs/2023-02-08-7496-vrl-return.md Outdated Show resolved Hide resolved
@mladedav
Copy link
Contributor Author

Thanks, @pront! Sorry, but I don't see any question?

### User Experience

- A `return` expression causes the VRL program to terminate, keeping any modifications made to the event.
- The keyword is used by itself with no parameters and `.` will be sent to the output.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worth considering returning an expression, it can be . or a variable or something that resolves to a constant. What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think that is the end l goal to go for. I've tried to keep the scope small, but it can also be part of the RFC, with the current scope being probably the first implementation and having positional arguments being enhancement of that.

I opted for omitting it since it felt to me that writing

. = expr
return

is only slightly worse than

return expr

Do you think it's better to have it in this RFC or having it split as a later enhancement?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also I think even if expressions are allowed, they should be optional, allowing simple return instead of the more explicit return .

But now I see that could be a contentious decision and some people can prefer that the language wants them to be explicit, so I'd like to also know your opinion on this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would take inspiration from https://doc.rust-lang.org/std/keyword.return.html so I would like to return an expression. See here, we can use both the returned value and the target ..

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To clarify a bit more: would return foo use the value of foo as the event to emit? Or would it still use .? If the latter, what would the "returned value" be used for?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would take inspiration from https://doc.rust-lang.org/std/keyword.return.html so I would like to return an expression. See here, we can use both the returned value and the target ..

I'm not sure what you mean, would you like to have only return expr and no return without arguments?

I think there will always be a difference between Rust and VRL because the latter already implicitly uses . when returning at the end of a program while the former implicitly returns the result of the last expression at the end of a function. So in VRL, it makes much more sense to me to have a parameter-less return, returning the same thing as if it were the end of the program.

To clarify a bit more: would return foo use the value of foo as the event to emit? Or would it still use .? If the latter, what would the "returned value" be used for?

I would expect it to emit foo.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To clarify a bit more: would return foo use the value of foo as the event to emit? Or would it still use .? If the latter, what would the "returned value" be used for?

I would expect it to emit foo.

Likewise. I'd suggest we could allow return both with and without an "argument". If an "argument" is provided, then it would return that, otherwise it would return .. My thinking is that return X could be useful for VRL conditions that have to evaluate to a boolean (where you might want to return true or return false early) whereas just return would be more common in remap programs where you emit an entire event.

- New keywords that are not currently a reserved keyword can be added to the language. This would, however, constitute a breaking change.
- This feature can also be rejected as it does not add any functionality that cannot be currently expressed.

## Outstanding Questions
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What should happen if the return keyword is used in a closure like:

tally = {}
for_each(array!(.tags)) -> |index, value| {
    if index == 1 {
    return
  }
}

? My instinct is to disallow the return keyword inside of closures.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few options:

  1. Disallow return inside closures
  2. Stop the closure execution
  3. Stop the VRL program and return

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I like option 1 the best. We could always migrate to option 2 or 3 in the future.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would expect option 2 because that's what most languages do. While 3 could be useful, I think that would be too surprising for many users so I would not use that with the return keyword.

For option 2, it also does not make sense to allow argumentless returns, because there's nothing like ., so I would deny them in the first iteration, if that's what we end up on.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed that option 2 is probably the most familiar to users. I was thinking of starting with option 1 purely because it'd likely be easier to implement, but if we can implement option 2 that would be preference.

Copy link
Member

@jszwedko jszwedko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of questions, but overall this proposal makes sense to me and would be a nice usability improvement.

@mladedav
Copy link
Contributor Author

I've added return <expression> to scope and explicitly put return expressions inside closures out of scope.

Feel free to comment if I've misunderstood the discussions and you'd for example rather have the behavior inside closures defined here.

Copy link
Member

@jszwedko jszwedko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updates! I think what you have here now sounds good to me. I'd like to get @pront's review too.

@pront
Copy link
Contributor

pront commented Feb 15, 2024

Two concerns:

  1. I am not sure about implicitly returning ., can we always require an expression?
  2. Regarding making closures a special case, let's make sure we throw a compilation error if we detect a return inside the body of a closure.

@mladedav
Copy link
Contributor Author

Ok, I've changed the proposal to not include the implicit returned value. As long as you and @jszwedko agree on this, I have no problem with that. It's also possible to change it and add the implicit remove later without a breaking change if you change your mind later, though I don't think it's going to be an issue since return . vs return is very little overhead.

I've added a line about returns in closure being a compilation error in the user experience section.

@jszwedko
Copy link
Member

Thanks @mladedav !

Given allowing naked return statements in the future (that would return .) wouldn't be a breaking change, I'm ok with preceeding with simply return <expression> for now and considering it later to keep this unblocked.

@jszwedko jszwedko requested a review from pront February 15, 2024 14:49
@jszwedko jszwedko added the no-changelog Changes in this PR do not need user-facing explanations in the release changelog label Feb 15, 2024
@jszwedko
Copy link
Member

Thanks for this proposal @mladedav ! I realized I'm not sure: is this something you might be interested in implementing?

@jszwedko jszwedko added this pull request to the merge queue Feb 15, 2024
Copy link

Regression Detector Results

Run ID: 60b99b82-4463-4a6a-a973-50e9ea15f556
Baseline: 342b48c
Comparison: 2f1c785
Total CPUs: 7

Performance changes are noted in the perf column of each table:

  • ✅ = significantly better comparison variant performance
  • ❌ = significantly worse comparison variant performance
  • ➖ = no significant change in performance

No significant changes in experiment optimization goals

Confidence level: 90.00%
Effect size tolerance: |Δ mean %| ≥ 5.00%

There were no significant changes in experiment optimization goals at this confidence level and effect size tolerance.

Fine details of change detection per experiment

perf experiment goal Δ mean % Δ mean % CI
syslog_log2metric_humio_metrics ingress throughput +2.73 [+2.55, +2.91]
datadog_agent_remap_blackhole_acks ingress throughput +1.33 [+1.22, +1.45]
http_text_to_http_json ingress throughput +0.82 [+0.69, +0.96]
http_elasticsearch ingress throughput +0.74 [+0.68, +0.81]
fluent_elasticsearch ingress throughput +0.61 [+0.12, +1.09]
socket_to_socket_blackhole ingress throughput +0.35 [+0.28, +0.43]
http_to_http_noack ingress throughput +0.17 [+0.07, +0.26]
syslog_log2metric_tag_cardinality_limit_blackhole ingress throughput +0.09 [-0.03, +0.22]
syslog_splunk_hec_logs ingress throughput +0.08 [+0.03, +0.14]
http_to_http_json ingress throughput +0.06 [-0.02, +0.14]
http_to_http_acks ingress throughput +0.03 [-1.29, +1.36]
datadog_agent_remap_datadog_logs_acks ingress throughput +0.03 [-0.06, +0.12]
splunk_hec_indexer_ack_blackhole ingress throughput +0.00 [-0.14, +0.15]
splunk_hec_to_splunk_hec_logs_acks ingress throughput +0.00 [-0.16, +0.16]
enterprise_http_to_http ingress throughput -0.03 [-0.12, +0.05]
splunk_hec_to_splunk_hec_logs_noack ingress throughput -0.07 [-0.18, +0.04]
http_to_s3 ingress throughput -0.17 [-0.44, +0.11]
otlp_http_to_blackhole ingress throughput -0.52 [-0.67, -0.38]
otlp_grpc_to_blackhole ingress throughput -0.61 [-0.70, -0.53]
datadog_agent_remap_datadog_logs ingress throughput -0.72 [-0.81, -0.62]
syslog_loki ingress throughput -0.72 [-0.79, -0.65]
file_to_blackhole egress throughput -0.75 [-3.34, +1.84]
syslog_log2metric_splunk_hec_metrics ingress throughput -1.55 [-1.69, -1.42]
syslog_humio_logs ingress throughput -2.08 [-2.20, -1.97]
splunk_hec_route_s3 ingress throughput -2.15 [-2.64, -1.66]
syslog_regex_logs2metric_ddmetrics ingress throughput -2.16 [-2.29, -2.03]
datadog_agent_remap_blackhole ingress throughput -2.26 [-2.36, -2.15]

Explanation

A regression test is an A/B test of target performance in a repeatable rig, where "performance" is measured as "comparison variant minus baseline variant" for an optimization goal (e.g., ingress throughput). Due to intrinsic variability in measuring that goal, we can only estimate its mean value for each experiment; we report uncertainty in that value as a 90.00% confidence interval denoted "Δ mean % CI".

For each experiment, we decide whether a change in performance is a "regression" -- a change worth investigating further -- if all of the following criteria are true:

  1. Its estimated |Δ mean %| ≥ 5.00%, indicating the change is big enough to merit a closer look.

  2. Its 90.00% confidence interval "Δ mean % CI" does not contain zero, indicating that if our statistical model is accurate, there is at least a 90.00% chance there is a difference in performance between baseline and comparison variants.

  3. Its configuration does not mark it "erratic".

@pront
Copy link
Contributor

pront commented Feb 15, 2024

Regression Detector Results

Run ID: 60b99b82-4463-4a6a-a973-50e9ea15f556 Baseline: 342b48c Comparison: 2f1c785 Total CPUs: 7

Performance changes are noted in the perf column of each table:

  • ✅ = significantly better comparison variant performance
  • ❌ = significantly worse comparison variant performance
  • ➖ = no significant change in performance

No significant changes in experiment optimization goals

Confidence level: 90.00% Effect size tolerance: |Δ mean %| ≥ 5.00%

There were no significant changes in experiment optimization goals at this confidence level and effect size tolerance.

Fine details of change detection per experiment

Explanation

It's funny that this check ran since we only added an .md file.

Merged via the queue into vectordotdev:master with commit 2f1c785 Feb 15, 2024
50 of 52 checks passed
@mladedav
Copy link
Contributor Author

@jszwedko Yeah, I'd like to give it a shot. I'm just not exactly sure about timeframes and such .

@jszwedko
Copy link
Member

@jszwedko Yeah, I'd like to give it a shot. I'm just not exactly sure about timeframes and such .

No rush!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
domain: rfc no-changelog Changes in this PR do not need user-facing explanations in the release changelog
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants