Proposal: Adaptive Filter Processor #32841

Closed
2 tasks
ppnaik1890 opened this issue May 3, 2024 · 3 comments
Labels
closed as inactive, needs triage (New item requiring triage), Sponsor Needed (New component seeking sponsor), Stale

Comments

@ppnaik1890

The purpose and use-cases of the new component

We would like to propose a new processor for the OTel Collector that adaptively filters metric and log collection based on external inputs.

Motivation:

The Adaptive Filter Processor dynamically adjusts the volume of telemetry data to fit the available budget, prioritizing the metrics or logs that are most relevant or most indicative of critical events. By adhering to external bandwidth limits, it avoids network congestion and keeps telemetry data flowing efficiently, even in resource-constrained environments.

Example configuration for the component

processors:
  # name of the processor
  adaptivefilterprocessor:
    budget: 1GB/s
    cooloffperiod: 1m
    minthreshold: 0.8GB/s
    backofftimer: random/exponential  # either random or exponential
    defaultpriority: 2
    priorityorder:
      - priority: 1
        matchrule:
          metric:
            - 'name == "my.metric" and resource.attributes["my_label"] == "abc123"'
          logs:
            - 'IsMatch(body, ".*password.*")'
      - priority: 2
        matchrule:
          logs:
            - 'severity_number < SEVERITY_NUMBER_WARN'
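
The budget, minthreshold, and cooloffperiod values above are strings, so any implementation would have to turn them into numbers before comparing them with measured bandwidth. A minimal sketch of that step follows. The helper names parse_rate and parse_duration are hypothetical, and the proposal does not say whether GB means 10^9 or 2^30 bytes; decimal units are assumed here.

```python
import re

# Assumption: decimal (SI) multipliers. The proposal does not define the units.
_UNITS = {"B": 1, "KB": 10**3, "MB": 10**6, "GB": 10**9}
_SECONDS = {"s": 1, "m": 60, "h": 3600}


def parse_rate(text: str) -> float:
    """Parse a rate such as '0.8GB/s' into bytes per second."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([KMG]?B)/s\s*", text)
    if match is None:
        raise ValueError(f"unrecognized rate: {text!r}")
    value, unit = match.groups()
    return float(value) * _UNITS[unit]


def parse_duration(text: str) -> int:
    """Parse a duration such as '1m' into whole seconds."""
    match = re.fullmatch(r"\s*(\d+)\s*([smh])\s*", text)
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    value, unit = match.groups()
    return int(value) * _SECONDS[unit]
```

With these helpers, a sanity check such as parse_rate(minthreshold) < parse_rate(budget) could be run when the configuration is loaded.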
           

The adaptive filter processor has the following parameters.

The budget is the bandwidth allocated to telemetry data processing and transmission for the cluster. Before the filtering mechanism activates, the processor waits for the cooloffperiod in case the surge in bandwidth usage is temporary. The key parameter is priorityorder, which ranks metrics and logs by priority. Priorities are assigned by match rules evaluated against the labels present in the metrics and logs. Priority level 1 is the highest: telemetry data that falls into priority level 1 is deemed crucial and is transmitted regardless of the specified budget. Metrics and logs not covered by any rule in priorityorder are assigned the defaultpriority. If a metric or log matches multiple priority rules, the highest matching priority level is chosen.
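
The priority rules described above can be sketched as follows. This is only an illustration under stated assumptions: the Rule class, the "size" field on each item, and the select_for_send policy (fill the remaining budget in priority order) are hypothetical, since the proposal does not specify how lower-priority data is dropped. Plain Python callables stand in for the matchrule expressions.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Rule:
    priority: int                     # 1 is the highest priority
    matches: Callable[[dict], bool]   # stand-in for a matchrule expression


def assign_priority(item: dict, rules: Sequence[Rule], default_priority: int) -> int:
    """Return the highest priority (lowest number) among all matching rules."""
    matched = [rule.priority for rule in rules if rule.matches(item)]
    return min(matched) if matched else default_priority


def select_for_send(items, rules, default_priority, budget_bytes):
    """Always keep priority-1 items, then fill the remaining budget in priority order.

    Assumption: each item carries its size in bytes under the key "size".
    """
    ranked = sorted(
        ((assign_priority(item, rules, default_priority), item) for item in items),
        key=lambda pair: pair[0],  # stable sort keeps arrival order within a level
    )
    kept, used = [], 0
    for priority, item in ranked:
        if priority == 1 or used + item["size"] <= budget_bytes:
            kept.append(item)
            used += item["size"]
    return kept
```

Priority-1 items still count toward the bytes used in this sketch, so a burst of crucial data can crowd out everything else; whether that is the intended behavior is left open by the proposal.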

Because the filter is adjusted dynamically, the processor must return to the default configuration once the telemetry overload is resolved, that is, when the current bandwidth usage falls below the minthreshold. The cooloffperiod is applied again here so that a sporadic drop in bandwidth usage does not trigger the reset. If the overload lasts longer than the cooloffperiod, the backofftimer takes effect and reduces the number of checks by increasing the interval between them, either randomly or exponentially.
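
One way to read the activation and reset behavior is as a two-state machine with hysteresis, sketched below. The names FilterState, observe, and next_check_interval are hypothetical, as are the doubling factor, the jitter range, and the cap on the interval; the proposal only says that the interval grows randomly or exponentially.

```python
import random


class FilterState:
    """Tracks whether filtering is active, using cooloffperiod as a hold time."""

    def __init__(self, budget: float, min_threshold: float, cooloff: float):
        self.budget = budget
        self.min_threshold = min_threshold
        self.cooloff = cooloff
        self.active = False
        self._since = None  # when the condition that would flip the state began

    def observe(self, usage: float, now: float) -> bool:
        """Feed one bandwidth sample; return whether filtering is active."""
        if self.active:
            wants_flip = usage < self.min_threshold   # overload may be over
        else:
            wants_flip = usage > self.budget          # overload may be starting
        if not wants_flip:
            self._since = None                        # condition broken, restart the wait
        elif self._since is None:
            self._since = now
        elif now - self._since >= self.cooloff:
            self.active = not self.active             # condition held for a full cooloffperiod
            self._since = None
        return self.active


def next_check_interval(current: float, mode: str, cap: float = 3600.0, rng=random) -> float:
    """Lengthen the interval between checks while the overload persists."""
    if mode == "exponential":
        return min(current * 2, cap)                  # assumed doubling factor
    if mode == "random":
        return min(current + rng.uniform(0, current), cap)  # assumed jitter range
    raise ValueError(f"unknown backofftimer mode: {mode!r}")
```

The gap between budget and minthreshold keeps the filter from toggling when usage hovers near the budget, and the cap keeps the backoff from postponing the next check indefinitely.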

Telemetry data types supported

Metrics and logs

Is this a vendor-specific component?

  • This is a vendor-specific component
  • If this is a vendor-specific component, I am proposing to contribute and support it as a representative of the vendor.

Code Owner(s)

No response

Sponsor (optional)

No response

Additional context

No response

ppnaik1890 added the needs triage (New item requiring triage) and Sponsor Needed (New component seeking sponsor) labels May 3, 2024
@ppnaik1890

If accepted, we would like to contribute this enhancement.


github-actions bot commented Jul 3, 2024

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.


github-actions bot commented Sep 1, 2024

This issue has been closed as inactive because it has been stale for 120 days with no activity.

github-actions bot closed this as not planned Sep 1, 2024