
Metric Exporters: Specify/Support Batching per-data point on top of per-metric? #3494

Open
alxbl opened this issue May 10, 2023 · 0 comments
Labels
spec:metrics Related to the specification/metrics directory triage:deciding:community-feedback

Comments

@alxbl
Member

alxbl commented May 10, 2023

I was initially going to open this against the opentelemetry-dotnet SDK (specifically its OTLP exporter), but it could technically apply to all exporters, or to the OTLP exporter spec itself. So I'd first like to find out whether there is interest in making this part of the spec, or whether it is considered an implementation detail left to each exporter's interpretation. In the latter case, I will open this issue against the .NET SDK instead. Let me know.

What are you trying to achieve?

When a process produces a large number of data points for a single metric, I would like the OTLP exporter to batch the data points across several OTLP requests to the collector, so that no single message grows large enough to be rejected.

The problem right now is that batching happens at the metric level, so if a process reports a single metric on behalf of many different services, this can lead to extremely large batches.

What did you expect to see?

The exporter could split the data points over several messages and send them spaced over a short time interval, reducing network-bandwidth spikes while supporting large numbers of data points.
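To make the idea concrete, here is a minimal, language-agnostic sketch (in Python, not the actual .NET SDK API) of the splitting step: the data points of one metric are chunked into fixed-size slices, each of which would become its own export request. The function name and the batch-size parameter are hypothetical, chosen only for illustration.

```python
# Hypothetical sketch: split one metric's data points into fixed-size
# batches so that no single export request exceeds the collector's
# message-size limit. In a real exporter, each batch would be wrapped in
# its own OTLP request, optionally with a short delay between sends to
# smooth out the bandwidth spike at each export interval.
def batch_data_points(data_points, max_points_per_request):
    """Yield successive slices of at most max_points_per_request points."""
    for start in range(0, len(data_points), max_points_per_request):
        yield data_points[start:start + max_points_per_request]


# Example: 130,000 data points with a 50,000-point cap per request
points = list(range(130_000))
batches = list(batch_data_points(points, 50_000))
# -> 3 requests of 50,000 / 50,000 / 30,000 data points
```

The same approach generalizes to splitting across the resource/scope/metric hierarchy of an OTLP payload; the sketch only shows the innermost (per-data-point) level, which is where today's per-metric batching stops.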

Additional context.

For business reasons, we have a single process that monitors several thousand (even hundreds of thousands) of non-telemetry-aware devices and reports their up metric on their behalf (possibly a target_info as well). I was performing some scale tests with 1–5 labels per data point and ran into issues with default collector configurations at around 50,000–60,000 data points: the message becomes too large for the collector to accept. I know I could raise the collector's maximum message size, but that does not address the burst of network bandwidth that will inevitably occur at every export interval.

I experimented with adding per-data-point batching and was able to scale to 500,000 data points per process very easily (it could probably go higher).

My main reason for creating this issue is that I would like to avoid maintaining a fork of the SDK for these changes, so I want to know whether this is of interest to the OpenTelemetry maintainers, or whether it is an edge case best handled by a custom exporter or a fork of the existing OTLP exporter implementation.

I can provide a proof-of-concept patch (for the .NET SDK) if that helps. The code is currently experimental and incomplete, because I was only attempting to put numbers on the possible performance gains.

EDIT: If there is interest in this, I would be happy to contribute to the development or specification effort.
