Poor performance of EMF Exporter causes metric loss in the long run #388

Closed
bjrara opened this issue Mar 7, 2021 · 2 comments

bjrara commented Mar 7, 2021

What happened

Metrics exported by the EMF exporter were lost for 1-3 minutes during long runs, causing gaps in the dashboard. The gap was followed by a false burst, after which the collector recovered on its own.

[screenshot: dashboard graph showing a gap in exported metrics followed by a burst]

Normally metrics were reported at a constant rate, but when the collector got stuck the rate became unsteady.

[screenshot: metric reporting rate, steady normally but irregular while the collector was stuck]

After adding more debug logging, we found it took 30 seconds on average to process one chunk of metrics, e.g. the metrics from one Pod when using the prometheus receiver:

2021-03-05T22:47:12.090Z	INFO	[email protected]/emf_exporter.go:125	Start processing resource metrics	{"component_kind": "exporter", "component_type": "awsemf", "component_name": "awsemf", "labels": {"ClusterName":"otel-sample-cluster","TaskId":"kubernetes-pod-appmesh-envoy","host.name":"192.168.39.233","port":"9901","scheme":"http","service.name":"kubernetes-pod-appmesh-envoy"}}
2021-03-05T22:47:48.359Z	INFO	[email protected]/emf_exporter.go:152	Finish processing resource metrics	{"component_kind": "exporter", "component_type": "awsemf", "component_name": "awsemf", "labels": {"ClusterName":"otel-sample-cluster","TaskId":"kubernetes-pod-appmesh-envoy","host.name":"192.168.39.233","port":"9901","scheme":"http","service.name":"kubernetes-pod-appmesh-envoy"}}

At the time the metrics were lost, the same processing took 3 minutes:

2021-03-05T22:44:37.697Z	INFO	[email protected]/emf_exporter.go:125	Start processing resource metrics	{"component_kind": "exporter", "component_type": "awsemf", "component_name": "awsemf", "labels": {"ClusterName":"otel-sample-cluster","TaskId":"kubernetes-pod-appmesh-envoy","host.name":"192.168.39.233","port":"9901","scheme":"http","service.name":"kubernetes-pod-appmesh-envoy"}}
2021-03-05T22:47:11.933Z	INFO	[email protected]/emf_exporter.go:152	Finish processing resource metrics	{"component_kind": "exporter", "component_type": "awsemf", "component_name": "awsemf", "labels": {"ClusterName":"otel-sample-cluster","TaskId":"kubernetes-pod-appmesh-envoy","host.name":"192.168.39.233","port":"9901","scheme":"http","service.name":"kubernetes-pod-appmesh-envoy"}}

Root cause

Although metrics are grouped before pushing, each logEventBatch is sent one at a time, incurring high network latency per request, which drags down overall performance.

How to solve it

Push EMF logs in batches, as sketched below.
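For illustration, here is a minimal Go sketch of the batching idea, not the exporter's actual code. Events are buffered and one network call is made per batch rather than per logEventBatch. The `logEvent`, `pusher`, and `send` names are hypothetical; the limits mirror the documented CloudWatch Logs PutLogEvents constraints (at most 10,000 events and roughly 1 MiB per call, with a fixed per-event overhead).

```go
package main

import "fmt"

// logEvent is a stand-in for one EMF log event (a JSON payload plus timestamp).
type logEvent struct {
	message   string
	timestamp int64
}

const (
	maxEventsPerBatch = 10000       // CloudWatch Logs limit per PutLogEvents call
	maxBatchBytes     = 1024 * 1024 // ~1 MiB payload limit per call
	perEventOverhead  = 26          // bytes CloudWatch adds per event
)

// pusher accumulates events and sends them in one call per batch.
type pusher struct {
	events    []logEvent
	byteCount int
	send      func([]logEvent) error // wraps a single PutLogEvents call
}

// addEvent buffers an event, flushing first if the batch limits would be
// exceeded, so many events share one network round trip.
func (p *pusher) addEvent(e logEvent) error {
	size := len(e.message) + perEventOverhead
	if len(p.events) >= maxEventsPerBatch || p.byteCount+size > maxBatchBytes {
		if err := p.flush(); err != nil {
			return err
		}
	}
	p.events = append(p.events, e)
	p.byteCount += size
	return nil
}

// flush sends the accumulated batch in a single call and resets the buffer.
func (p *pusher) flush() error {
	if len(p.events) == 0 {
		return nil
	}
	err := p.send(p.events)
	p.events = p.events[:0]
	p.byteCount = 0
	return err
}

func main() {
	calls := 0
	p := &pusher{send: func(batch []logEvent) error {
		calls++
		fmt.Printf("PutLogEvents call %d with %d events\n", calls, len(batch))
		return nil
	}}
	// 25,000 events end up in 3 calls instead of 25,000.
	for i := 0; i < 25000; i++ {
		_ = p.addEvent(logEvent{message: "emf json payload", timestamp: int64(i)})
	}
	_ = p.flush()
}
```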

bjrara commented Mar 7, 2021

Performance improved from 30s to 200ms after applying the changes from open-telemetry/opentelemetry-collector-contrib#2572:

2021-03-06T06:56:27.196Z	INFO	[email protected]/emf_exporter.go:125	Start processing resource metrics	{"component_kind": "exporter", "component_type": "awsemf", "component_name": "awsemf", "labels": {"ClusterName":"otel-sample-cluster","TaskId":"kubernetes-pod-appmesh-envoy","host.name":"192.168.42.106","port":"9901","scheme":"http","service.name":"kubernetes-pod-appmesh-envoy"}}
2021-03-06T06:56:27.399Z	INFO	[email protected]/emf_exporter.go:162	Finish processing resource metrics	{"component_kind": "exporter", "component_type": "awsemf", "component_name": "awsemf", "labels": {"ClusterName":"otel-sample-cluster","TaskId":"kubernetes-pod-appmesh-envoy","host.name":"192.168.42.106","port":"9901","scheme":"http","service.name":"kubernetes-pod-appmesh-envoy"}}

bjrara commented Mar 12, 2021

Closing this issue because the fix has been merged.

bjrara closed this as completed Mar 12, 2021