
Make Splunk HEC exporter send profiling data separately from logs #4464

Merged

Conversation

gsmirnov-splk
Contributor

@gsmirnov-splk commented Aug 6, 2021

Description:
Splunk Profiling uses OTLP logs as the data transport at this time. To make entitlements work on the server side, we need to send this data separately, with a special HTTP header. This is achieved by keeping two separate buffers: one for regular logs and one for profiling data. The data is differentiated by looking at the InstrumentationLibraryName of log records, and the buffers are flushed separately. I did some minor refactoring, but did not go far with it yet, so @bjsignalfx @tigrannajaryan and @dmitryax, your feedback is very welcome.

Testing:
Functionality is tested by the new Test_pushLogData_ShouldAddHeadersForProfilingData and TestSubLogs tests. The bugfix is verified in the Test_pushLogData_ShouldReturnUnsentLogsOnly test. I also made a docker image from this branch and tried pointing a profiling agent at it. I could see the correct HTTP headers on the HEC ingest endpoint.

Performance-wise, I added the Benchmark_pushLogData_<A>_<B>_<C>_<D> tests, where:

  • A is the number of resource bundles
  • B is the number of profiling logs in each bundle
  • C is the number of non-profiling logs in each bundle
  • D is the size of the buffer used for batching requests before sending via HTTP.
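The parameterization behind those benchmark names could look roughly like this. It is a self-contained sketch under stated assumptions: `buildBatch`, the `record` type, and the byte-append loop are invented stand-ins for the real pdata plumbing and JSON encoding, shown only to illustrate how one shared helper fans out into the `Benchmark_pushLogData_<A>_<B>_<C>_<D>` variants.

```go
package main

import (
	"fmt"
	"testing"
)

// record is a stand-in for a log record; only the profiling flag matters.
type record struct{ profiling bool }

// buildBatch builds a synthetic batch: `resources` bundles (A), each
// with `profiling` profiling records (B) and `plain` regular records (C).
func buildBatch(resources, profiling, plain int) [][]record {
	batch := make([][]record, resources)
	for i := range batch {
		for j := 0; j < profiling; j++ {
			batch[i] = append(batch[i], record{profiling: true})
		}
		for j := 0; j < plain; j++ {
			batch[i] = append(batch[i], record{profiling: false})
		}
	}
	return batch
}

// benchPushLogData is the shared helper behind the
// Benchmark_pushLogData_<A>_<B>_<C>_<D> naming; bufSize (D) is the
// capacity of the buffer that batches requests before sending.
func benchPushLogData(b *testing.B, resources, profiling, plain, bufSize int) {
	batch := buildBatch(resources, profiling, plain)
	buf := make([]byte, 0, bufSize)
	for i := 0; i < b.N; i++ {
		buf = buf[:0]
		for _, bundle := range batch {
			for range bundle {
				buf = append(buf, 'x') // stand-in for JSON encoding
			}
		}
	}
}

func main() {
	// Equivalent of a Benchmark_pushLogData_10_10_10_1024 function.
	r := testing.Benchmark(func(b *testing.B) {
		benchPushLogData(b, 10, 10, 10, 1024)
	})
	fmt.Println(r.N > 0)
}
```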

Results from running them locally (amd64, WSL2, Intel(R) Core(TM) i9-10885H CPU @ 2.40GHz):

Before:

Benchmark_pushLogData_100_10_10_1024-16             5373           6636589 ns/op
Benchmark_pushLogData_100_10_0_1024-16             10000           3377953 ns/op
Benchmark_pushLogData_100_0_10_1024-16             10000           3405957 ns/op
Benchmark_pushLogData_1_10_0_1024-16              971089             38885 ns/op
Benchmark_pushLogData_1_0_10_1024-16              960072             39215 ns/op
Benchmark_pushLogData_10_100_100_1024-16            5359           6955722 ns/op
Benchmark_pushLogData_10_0_100_1024-16             10000           3463448 ns/op
Benchmark_pushLogData_10_100_0_1024-16             10000           3458179 ns/op
Benchmark_pushLogData_10_10_10_256-16              39250            931376 ns/op
Benchmark_pushLogData_10_10_10_1024-16             53074            691860 ns/op
Benchmark_pushLogData_10_10_10_8K-16               38308            939991 ns/op
Benchmark_pushLogData_10_10_10_1M-16               34166           1080792 ns/op
Benchmark_pushLogData_10_1_1_1024-16              522446             68905 ns/op

After:

Benchmark_pushLogData_100_10_10_1024-16             5185           6987410 ns/op
Benchmark_pushLogData_100_10_0_1024-16              9970           3565623 ns/op
Benchmark_pushLogData_100_0_10_1024-16             10000           3511145 ns/op
Benchmark_pushLogData_1_10_0_1024-16              855631             43389 ns/op
Benchmark_pushLogData_1_0_10_1024-16              862240             43327 ns/op
Benchmark_pushLogData_10_100_100_1024-16            4666           7604861 ns/op
Benchmark_pushLogData_10_0_100_1024-16              9837           3603110 ns/op
Benchmark_pushLogData_10_100_0_1024-16             10000           3736247 ns/op
Benchmark_pushLogData_10_10_10_256-16              34082           1011595 ns/op
Benchmark_pushLogData_10_10_10_1024-16             50281            714967 ns/op
Benchmark_pushLogData_10_10_10_8K-16               36055            997702 ns/op
Benchmark_pushLogData_10_10_10_1M-16               28776           1333783 ns/op
Benchmark_pushLogData_10_1_1_1024-16              509085             71227 ns/op

So, the overhead per op is 3-4% when there is no profiling data and 5-9% with mixed data. The overhead also depends heavily on the incoming batch size (measured in log records) and the outgoing buffer size (measured in bytes): I could push it up to 23% by using 1MB bodies for HTTP calls.

However, if I add a 10ms backoff to simulate HTTP call latency, the overhead, based on the CPU time reported by the Go benchmarking suite, drops to ~0%.

@gsmirnov-splk requested review from a team and djaglowski August 6, 2021 21:19
@linux-foundation-easycla

linux-foundation-easycla bot commented Aug 6, 2021

CLA Signed

The committers are authorized under a signed CLA.

@dmitryax
Member

dmitryax commented Aug 6, 2021

@gsmirnov-splk thanks for putting the benchmark tests, the overhead looks acceptable to me. Could you please move the bugfix to another PR?

@dmitryax
Member

dmitryax commented Aug 9, 2021

Also, please take a look at the linter issues.

@gsmirnov-splk
Contributor Author

> @gsmirnov-splk thanks for putting the benchmark tests, the overhead looks acceptable to me. Could you please move the bugfix to another PR?

Sure, here it is: #4467

Working on addressing the other comments now.

Contributor

@atoulme left a comment


Those changes look OK to me, with the explicit understanding that this is not the final form of profiling; it fits this use case for now.

Member

@dmitryax left a comment


LGTM

@tigrannajaryan
Member

Needs rebasing and updating the commit message / PR description (bug fix is no longer part of this PR).

@gsmirnov-splk changed the title Make Splunk HEC exporter send profiling data separately from logs, also fix a resend bug Make Splunk HEC exporter send profiling data separately from logs Aug 10, 2021
@tigrannajaryan merged commit 6c4ce1a into open-telemetry:main Aug 10, 2021