Smarter waiting for late spans in tailsamplingprocessor #31498

djluck · 2024-02-29T02:36:15Z

Component(s)

No response

Is your feature request related to a problem? Please describe.

Currently we have to deploy the tailsamplingprocessor with a reasonably large decision_wait value (2m). We do this to capture our long-tail traces, but it imposes an unwelcome cost because of how completed traces are buffered in memory:

Memory use is higher than it needs to be: once we've received the root span, waiting the full decision_wait period seems unnecessary (although it makes sense to wait a short period for any lagging spans that are a part of the trace).
Completed traces are delayed for decision_wait, even though they are ready to view. In our case, this adds two minutes of latency to every trace!

Describe the solution you'd like

I'm not entirely sure what the solution looks like here but some thoughts:

Once we receive the root span, we consider the trace complete and send it on.
For traces that start externally, it would be useful to have some mechanism to consider a span as the "root" and to trigger the above behavior.
We might maintain a list of trace ids for recently sampled traces: this would ensure that any late spans can be forwarded on appropriately.
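The first two ideas depend on recognizing a "root" span. The sketch below shows one way that check could look, assuming a trimmed-down, hypothetical `Span` type rather than the Collector's real pdata types. In OTLP a root span has an empty parent span ID. For traces that start externally, a hypothetical `treatAsRoot` predicate (here keyed on a made-up service name) lets a local entry span play the root's role.

```go
package main

import "fmt"

// TraceID and SpanID use the OTLP sizes (16 and 8 bytes).
type TraceID [16]byte
type SpanID [8]byte

// Span is a hypothetical, trimmed-down span used only for this sketch.
type Span struct {
	TraceID      TraceID
	SpanID       SpanID
	ParentSpanID SpanID
	Service      string
}

// isRoot reports whether a span should mark its trace as complete: either it
// has no parent, or the hypothetical treatAsRoot override flags it as the
// local entry point (for traces whose real root lives in an external system).
func isRoot(s Span, treatAsRoot func(Span) bool) bool {
	if s.ParentSpanID == (SpanID{}) {
		return true
	}
	return treatAsRoot != nil && treatAsRoot(s)
}

func main() {
	// Hypothetical rule: spans from "edge-gateway" are treated as roots.
	gateway := func(s Span) bool { return s.Service == "edge-gateway" }

	rootSpan := Span{}
	externalEntry := Span{ParentSpanID: SpanID{1}, Service: "edge-gateway"}
	internalChild := Span{ParentSpanID: SpanID{2}, Service: "checkout"}

	fmt.Println(isRoot(rootSpan, nil))          // true
	fmt.Println(isRoot(externalEntry, gateway)) // true
	fmt.Println(isRoot(internalChild, gateway)) // false
}
```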

Describe alternatives you've considered

I'm aware of no other alternatives.

Additional context

The services we tail sample deal with very high throughput and can process 100K+ spans/sec. Tail sampling at this volume imposes significant memory overhead, since we effectively need to buffer 12 million spans (120s × 100K spans/sec).


github-actions · 2024-02-29T19:23:26Z

Pinging code owners for processor/tailsampling: @jpkrohling. See Adding Labels via Comments if you do not have permissions to add labels yourself.

jiekun · 2024-03-01T03:45:32Z

Hello. I think this idea is great. Currently, in tail sampling, there is no check to determine whether a span is the root span. Only the time tick triggers the actual analysis.

I would like to suggest some modifications based on your idea:

Receiving the root span does not guarantee that all spans in the trace have reached the collector, but it can be assumed that most spans have been generated and are on their way to the collector. Therefore, if the decision_wait is set to 30s, upon receiving the root span, it doesn't need to wait for the entire duration but can still wait for a shorter period, such as 5s, to ensure that the remaining spans are received.
For services that generate async spans, users can stick to the current decision_wait mechanism and wait the full 30s.

So the configuration layout could be:

```yaml
processors:
  tail_sampling:
    decision_wait: 30s
    # New (proposed) option. Defaults to empty, which keeps the current behavior.
    # It could also be set to 0s to start analysis as soon as the root span is received.
    decision_wait_after_root_span: 5s
    num_traces: 100
    expected_new_traces_per_sec: 10
```

djluck · 2024-03-02T04:29:41Z

Hey @jiekun, thanks for your thoughts! You raise some good points, especially around async spans (I hadn't considered this edge case).

For the above reason, perhaps it's not enough to just use a shorter wait for any lagging or async spans? Perhaps maintaining a set of trace ids that we have sampled in the past X minutes is a better approach. It'll consume less memory than storing all spans for the full duration of decision_wait, so we can make the duration X much longer. This will allow us to instantly forward any late spans and handle the long tail of traces far more gracefully.

There's also the question of spans that arrive late for traces that we decided not to sample: what should we do with these? Do they become interesting because they arrived late?

jiekun · 2024-03-04T05:57:46Z

> Perhaps maintaining a set of trace ids that we have sampled in the past X minutes is a better approach.

I understand your perspective of reducing memory usage while maintaining sampling accuracy, which is great! However, I'm concerned that it might make the processor more complex and harder to understand. Additionally, users who don't need to worry about asynchronous spans and upgrade without adjusting the tail sampling latency will only notice higher memory consumption from storing the additional trace IDs.

I still support these new ideas, but they may need the maintainer's support. Personally, I would be more than happy to implement them in our internal collector :)

jiekun · 2024-03-04T06:02:04Z

BTW, may I ask whether you plan to submit a PR for these ideas, or only file the feature request?

I would like to split them into (at least) 2 parts:

1. Support decision_wait_after_root_span.
2. Add (or optimize) a trace_id cache, and sample async spans that arrive later.

They can be implemented independently if we have support from the maintainer.

jpkrohling · 2024-03-04T11:37:38Z

I've been talking to a few people about a decision cache, which should solve the second problem. The idea would be to have a simple map from trace ID to a boolean indicating whether the trace was sampled. Note that we want to cache both positive and negative answers: we don't want to sample spans for a trace that was rejected before, and we do want to sample spans for traces that were accepted. A limitation we need to document is that this cache isn't distributed, so a scalable tail-sampling setup will likely still have today's problems whenever a trace's spans reach a collector other than the one that made the decision, for example because of topology changes.

My original idea was to implement a ring buffer as the cache, so that we have a fixed number of decisions in memory. I also considered an LRU, but I'm not sure it brings any benefits.
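A minimal sketch of the ring-buffer decision cache described above, assuming a fixed capacity (which must be greater than zero) and a map holding both positive and negative decisions. The names are hypothetical; like the cache being discussed, it is local to one collector and, as written, not safe for concurrent use.

```go
package main

import "fmt"

type TraceID [16]byte

// decisionCache keeps the last `capacity` sampling decisions. When full,
// the oldest decision is evicted (ring buffer), so memory stays fixed.
type decisionCache struct {
	ring      []TraceID
	next      int
	full      bool
	decisions map[TraceID]bool // true = sampled, false = rejected
}

// newDecisionCache requires capacity > 0.
func newDecisionCache(capacity int) *decisionCache {
	return &decisionCache{
		ring:      make([]TraceID, capacity),
		decisions: make(map[TraceID]bool, capacity),
	}
}

// Put records a decision, evicting the oldest entry once the ring is full.
func (c *decisionCache) Put(id TraceID, sampled bool) {
	if _, ok := c.decisions[id]; ok {
		c.decisions[id] = sampled // update in place; ring position unchanged
		return
	}
	if c.full {
		delete(c.decisions, c.ring[c.next]) // evict the oldest decision
	}
	c.ring[c.next] = id
	c.decisions[id] = sampled
	c.next = (c.next + 1) % len(c.ring)
	if c.next == 0 {
		c.full = true
	}
}

// Get returns the cached decision and whether one exists.
func (c *decisionCache) Get(id TraceID) (sampled, ok bool) {
	sampled, ok = c.decisions[id]
	return
}

func main() {
	cache := newDecisionCache(2)
	cache.Put(TraceID{1}, true)
	cache.Put(TraceID{2}, false)
	cache.Put(TraceID{3}, true) // evicts TraceID{1}

	_, known := cache.Get(TraceID{1})
	sampled, known2 := cache.Get(TraceID{2})
	fmt.Println(known, sampled, known2) // false false true
}
```

A late span would then be forwarded if `Get` returns `(true, true)`, dropped on `(false, true)`, and buffered as usual when no decision is cached.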

jpkrohling · 2024-03-04T11:38:03Z

cc @kentquirk, as I believe you are interested in those components as well

djluck · 2024-03-04T20:20:53Z

Thanks for your thoughts @jpkrohling. I had a similar thought about the negative case but was thinking about a different solution: what about pairing the set of sampled trace IDs with a Bloom filter containing the non-sampled trace IDs?

This approach rests on the premise that the tail sampler is likely to reject more traces than it accepts. Storing both accepted and rejected trace IDs in a map would increase memory use further; a Bloom filter could optimize the rejected case.

EDIT: I thought about it a bit more and modelled some potential parameters. I don't think the Bloom filter would be the first implementation choice: the map would be simpler and would not require orders of magnitude more memory.

djluck · 2024-03-05T07:45:22Z

@jiekun I'm interested in submitting a PR, but if you're keen to work on this too, we could always split the work between us. Happy either way 👍

jpkrohling · 2024-03-05T10:53:29Z

> and would not require orders of magnitude more memory

Right, traceID is only 16 bytes, which means that ~10MiB is enough to store more than 650k entries in the cache, if my math is right.
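The arithmetic checks out: 10 MiB is 10,485,760 bytes, and dividing by 16 bytes per trace ID gives 655,360 entries. The sketch below counts raw key bytes only; a Go map adds per-entry overhead (buckets, the boolean value, load factor), so a real cache would fit fewer entries in the same budget. The helper name is illustrative.

```go
package main

import "fmt"

// entriesFor returns how many fixed-size entries fit in a byte budget,
// counting only the raw entry bytes (no map or allocator overhead).
func entriesFor(budgetBytes, entryBytes int) int {
	return budgetBytes / entryBytes
}

func main() {
	const traceIDBytes = 16 // a trace ID is 16 bytes
	const budget = 10 << 20 // 10 MiB = 10,485,760 bytes
	fmt.Println(entriesFor(budget, traceIDBytes)) // 655360
}
```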

jpkrohling · 2024-03-05T12:06:41Z

I recorded some of the things I had in mind here: #31580

crobert-1 · 2024-03-05T18:38:42Z

It looks like there's been a lot of good discussion here, and some action items have been noted. Removing needs triage.

djluck · 2024-03-05T21:53:27Z

I'm happy to start thinking about #31583 if that's fine with everyone. I can move future discussion around the decision cache into that issue.

EDIT: wrong issue link, was referring to decision cache only

github-actions · 2024-05-06T03:30:03Z

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

processor/tailsampling: @jpkrohling

See Adding Labels via Comments if you do not have permissions to add labels yourself.

jamesmoessis · 2024-05-21T01:05:29Z

Personally, I'm also wondering whether disk could play a role in storing longer-lived traces. Alternatively, there could be an option to compress spans in memory while they are buffered (if you are willing to take the CPU trade-off).

github-actions · 2024-07-22T03:31:44Z

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

processor/tailsampling: @jpkrohling

See Adding Labels via Comments if you do not have permissions to add labels yourself.

djluck added the enhancement (New feature or request) and needs triage (New item requiring triage) labels Feb 29, 2024

crobert-1 added the processor/tailsampling (Tail sampling processor) label Feb 29, 2024

This was referenced Mar 5, 2024:

Weekly Report: 2024-02-27 - 2024-03-05 #31560 (Closed)

Weekly Report: 2024-02-27 - 2024-03-05 asuresh4/opentelemetry-collector-contrib#11543 (Open)

crobert-1 removed the needs triage (New item requiring triage) label Mar 5, 2024

github-actions bot added the Stale label May 6, 2024

crobert-1 removed the Stale label May 6, 2024

jpkrohling mentioned this issue May 7, 2024:

Refactor tail-sampling processor #31580 (Open, 4 tasks)

github-actions bot added the Stale label Jul 22, 2024

jpkrohling removed the Stale label Aug 13, 2024
