Poor performance of EMF Exporter causes metric loss in the long run #388

Closed
bjrara opened this issue Mar 7, 2021 · 2 comments

bjrara commented Mar 7, 2021

What happened

Metrics exported by the EMF exporter were lost for 1-3 minutes during long runs, causing gaps in the dashboard. The gap was followed by a false burst, after which the collector recovered on its own.

[screenshot: dashboard graph showing a gap in exported metrics followed by a burst]

Normally metrics were reported at a constant rate, but when the collector got stuck the rate became unsteady.

[screenshot: metric reporting rate, steady normally but irregular while the collector was stuck]

After adding more debug logging, we found it took 30 seconds on average to process one chunk of metrics, e.g. the metrics from one Pod when using the prometheus receiver:

2021-03-05T22:47:12.090Z	INFO	[email protected]/emf_exporter.go:125	Start processing resource metrics	{"component_kind": "exporter", "component_type": "awsemf", "component_name": "awsemf", "labels": {"ClusterName":"otel-sample-cluster","TaskId":"kubernetes-pod-appmesh-envoy","host.name":"192.168.39.233","port":"9901","scheme":"http","service.name":"kubernetes-pod-appmesh-envoy"}}
2021-03-05T22:47:48.359Z	INFO	[email protected]/emf_exporter.go:152	Finish processing resource metrics	{"component_kind": "exporter", "component_type": "awsemf", "component_name": "awsemf", "labels": {"ClusterName":"otel-sample-cluster","TaskId":"kubernetes-pod-appmesh-envoy","host.name":"192.168.39.233","port":"9901","scheme":"http","service.name":"kubernetes-pod-appmesh-envoy"}}

At the time the metrics were lost, the same processing took 3 minutes:

2021-03-05T22:44:37.697Z	INFO	[email protected]/emf_exporter.go:125	Start processing resource metrics	{"component_kind": "exporter", "component_type": "awsemf", "component_name": "awsemf", "labels": {"ClusterName":"otel-sample-cluster","TaskId":"kubernetes-pod-appmesh-envoy","host.name":"192.168.39.233","port":"9901","scheme":"http","service.name":"kubernetes-pod-appmesh-envoy"}}
2021-03-05T22:47:11.933Z	INFO	[email protected]/emf_exporter.go:152	Finish processing resource metrics	{"component_kind": "exporter", "component_type": "awsemf", "component_name": "awsemf", "labels": {"ClusterName":"otel-sample-cluster","TaskId":"kubernetes-pod-appmesh-envoy","host.name":"192.168.39.233","port":"9901","scheme":"http","service.name":"kubernetes-pod-appmesh-envoy"}}

Root cause

Although metrics are grouped before pushing, each logEventBatch is sent one at a time, incurring high network latency per request, which drags down overall performance.

How to solve it

Push EMF logs in batches, as sketched below.
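For illustration, here is a minimal Go sketch of the batching idea, not the exporter's actual code. Events are buffered and one network call is made per batch rather than per logEventBatch. The `logEvent`, `pusher`, and `send` names are hypothetical; the limits mirror the documented CloudWatch Logs PutLogEvents constraints (at most 10,000 events and roughly 1 MiB per call, with a fixed per-event overhead).

```go
package main

import "fmt"

// logEvent is a stand-in for one EMF log event (a JSON payload plus timestamp).
type logEvent struct {
	message   string
	timestamp int64
}

const (
	maxEventsPerBatch = 10000       // CloudWatch Logs limit per PutLogEvents call
	maxBatchBytes     = 1024 * 1024 // ~1 MiB payload limit per call
	perEventOverhead  = 26          // bytes CloudWatch adds per event
)

// pusher accumulates events and sends them in one call per batch.
type pusher struct {
	events    []logEvent
	byteCount int
	send      func([]logEvent) error // wraps a single PutLogEvents call
}

// addEvent buffers an event, flushing first if the batch limits would be
// exceeded, so many events share one network round trip.
func (p *pusher) addEvent(e logEvent) error {
	size := len(e.message) + perEventOverhead
	if len(p.events) >= maxEventsPerBatch || p.byteCount+size > maxBatchBytes {
		if err := p.flush(); err != nil {
			return err
		}
	}
	p.events = append(p.events, e)
	p.byteCount += size
	return nil
}

// flush sends the accumulated batch in a single call and resets the buffer.
func (p *pusher) flush() error {
	if len(p.events) == 0 {
		return nil
	}
	err := p.send(p.events)
	p.events = p.events[:0]
	p.byteCount = 0
	return err
}

func main() {
	calls := 0
	p := &pusher{send: func(batch []logEvent) error {
		calls++
		fmt.Printf("PutLogEvents call %d with %d events\n", calls, len(batch))
		return nil
	}}
	// 25,000 events end up in 3 calls instead of 25,000.
	for i := 0; i < 25000; i++ {
		_ = p.addEvent(logEvent{message: "emf json payload", timestamp: int64(i)})
	}
	_ = p.flush()
}
```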

bjrara commented Mar 7, 2021

Performance improved from 30s to 200ms after applying the changes from open-telemetry/opentelemetry-collector-contrib#2572:

2021-03-06T06:56:27.196Z	INFO	[email protected]/emf_exporter.go:125	Start processing resource metrics	{"component_kind": "exporter", "component_type": "awsemf", "component_name": "awsemf", "labels": {"ClusterName":"otel-sample-cluster","TaskId":"kubernetes-pod-appmesh-envoy","host.name":"192.168.42.106","port":"9901","scheme":"http","service.name":"kubernetes-pod-appmesh-envoy"}}
2021-03-06T06:56:27.399Z	INFO	[email protected]/emf_exporter.go:162	Finish processing resource metrics	{"component_kind": "exporter", "component_type": "awsemf", "component_name": "awsemf", "labels": {"ClusterName":"otel-sample-cluster","TaskId":"kubernetes-pod-appmesh-envoy","host.name":"192.168.42.106","port":"9901","scheme":"http","service.name":"kubernetes-pod-appmesh-envoy"}}

bjrara commented Mar 12, 2021

Closing this issue because the fix has been merged.

bjrara closed this as completed Mar 12, 2021