Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rfc(feature): SDK Spans Aggregator #126

Merged
merged 16 commits into from
Jan 19, 2024
Merged

Conversation

philipphofmann
Copy link
Member

@philipphofmann philipphofmann commented Nov 28, 2023

This RFC aims to find a strategy for batching spans together to avoid sending one envelope per span.

Rendered RFC

Copy link
Member

@markushi markushi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice one, I left a few comments!

text/0126-batch-span-ingestion.md Outdated Show resolved Hide resolved
text/0126-batch-span-ingestion.md Outdated Show resolved Hide resolved
@philipphofmann philipphofmann marked this pull request as ready for review December 6, 2023 11:34
@philipphofmann philipphofmann changed the title rfc(feature): Batch Span Ingestion rfc(feature): SDK Span Buffer Dec 6, 2023
@AbhiPrasad
Copy link
Member

One high level thing is if you had the chance to look at https://github.com/open-telemetry/opentelemetry-collector/blob/main/processor/batchprocessor/README.md?

Instead of span count, does it make sense to use byte size as a weight like we are doing for metrics aggregation? Should this be an implemented on the client or the transport?

@philipphofmann
Copy link
Member Author

@AbhiPrasad, thanks for the link to OTel. I should also look for existing concepts in OTel for every feature we build. The Span Buffer is similar to the Batch Processor of OTel. I added the option to set the timeout to 0 to this RFC. Do you think we should align our Span Buffer more with the OTel Batch Processor? If I'm correct, we must send span children in the same envelope as their parents. So that's a requirement the OTel Batch Processor doesn't specify. Therefore, I think having our version of Span Buffer specific to spans is okay. Furthermore, I don't think we have needs for send_batch_max_size, metadata_keys, and metadata_cardinality_limit. Or am I missing something?

Instead of span count, does it make sense to use byte size as a weight like we are doing for metrics aggregation?

How do you count the byte sizes of the objects? Do you serialize them and keep track of the byte size? I think that's a great idea because you don't care about how many spans you keep in memory, but more about how much memory they use.

Should this be an implemented on the client or the transport?

As the Span Buffer must only accept spans after sampling and any other processing that could strip out data and keeping track of the byte size is easier after serializing, I would vote for the transport. If you agree, I'm going to update the RFC.

@philipphofmann
Copy link
Member Author

@AbhiPrasad, please have a look at #126 (comment).

@AbhiPrasad
Copy link
Member

Do you think we should align our Span Buffer more with the OTel Batch Processor? If I'm correct, we must send span children in the same envelope as their parents. So that's a requirement the OTel Batch Processor doesn't specify. Therefore, I think having our version of Span Buffer specific to spans is okay

We don't necessarily have the requirement that span children are sent in the same envelope as their parents, otherwise we run into the same issue with transactions (buffering a bunch of stuff in memory).

I do think having our own specific spec for a span buffer is makes sense though, and that send_batch_max_size, metadata_keys, and metadata_cardinality_limit seem unnecessary for our use case atm.

How do you count the byte sizes of the objects? Do you serialize them and keep track of the byte size? I think that's a great idea because you don't care about how many spans you keep in memory, but more about how much memory they use.

so I was actually wrong about this, we don't keep the exact byte size, but instead an approximate weight based on type of metric/what it contains. We can try a similar heuristic as well, just weight by different properties on the span.

As the Span Buffer must only accept spans after sampling and any other processing that could strip out data and keeping track of the byte size is easier after serializing, I would vote for the transport. If you agree, I'm going to update the RFC.

Transport sounds like the correct decision to me!

@philipphofmann
Copy link
Member Author

philipphofmann commented Jan 12, 2024

We don't necessarily have the requirement that span children are sent in the same envelope as their parents,

I thought that was a requirement. @phacops, can you confirm that span children don't need to be sent in the same envelope as their parents?

we don't keep the exact byte size, but instead an approximate weight based on type of metric/what it contains. We can try a similar heuristic as well, just weight by different properties on the span.

Okay, so we could put weights on different properties and sum them. For example, a span with only simple properties has a weight of 10. Then, each tag again adds 1 weight, each data object 5, etc. I see that Python has that https://github.com/getsentry/sentry-python/blob/61a4621336fd81b69e5259af599e5779d78dce3f/sentry_sdk/metrics.py#L114-L118. I will come up with a spec.

Update 1
I decided to use the serialized dictionary to calculate the weights because we need to serialize the spans. Also, I think the client is better suited for the span buffer, as the transport is only aware of envelopes, and we already have the MetricsAggregator in the client on JS.

I think we could rename the SpanBuffer to SpansAggregator to align with metrics. Do you agree with @AbhiPrasad?

Update 2
As the SDKs need to serialize the spans and don't need to change them in the span buffer, we could also serialize them and count the serialized bytes. That would work on Cocoa, but I'm unsure about other platforms. Would that work on JS @AbhiPrasad?

@AbhiPrasad
Copy link
Member

I think we could rename the SpanBuffer to SpansAggregator to align with metrics

That matches my mental model, let's do it!

As the SDKs need to serialize the spans and don't need to change them in the span buffer, we could also serialize them and count the serialized bytes

Hmm I think this varies based on platform, not everyone can serialize, maybe we make this requirement flexible depending on sdk? So SDK can decide on weighting methodology?

Copy link
Member

@markushi markushi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice improvements, looks good to me!

@philipphofmann
Copy link
Member Author

philipphofmann commented Jan 17, 2024

Hmm I think this varies based on platform, not everyone can serialize, maybe we make this requirement flexible depending on sdk? So SDK can decide on weighting methodology?

@AbhiPrasad, this already is in the RFC:

The span aggregator uses a combination of timeout and span size, which is the serialized span size in bytes. If counting the serialized span byte size is not possible on a platform, it can implement the alternative called weight, which is an approximation of how much bytes a span allocates. A section below explains weight in more detail.

@philipphofmann philipphofmann changed the title rfc(feature): SDK Span Buffer rfc(feature): SDK Spans Aggregator Jan 17, 2024
@philipphofmann philipphofmann merged commit 8b6597e into main Jan 19, 2024
2 checks passed
@philipphofmann philipphofmann deleted the rfc/batch-span-ingestion branch January 19, 2024 08:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants