Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merge span metrics and service graph into span metrics connector #26648

Open
jpkrohling opened this issue Sep 12, 2023 · 11 comments
Open

Merge span metrics and service graph into span metrics connector #26648

jpkrohling opened this issue Sep 12, 2023 · 11 comments
Labels
connector/servicegraph connector/spanmetrics never stale Issues marked with this label will be never staled and automatically removed processor/servicegraph Service graph processor processor/spanmetrics Span Metrics processor

Comments

@jpkrohling
Copy link
Member

Component(s)

connector/servicegraph, connector/spanmetrics, processor/servicegraph, processor/spanmetrics

Describe the issue you're reporting

I would like to combine those four components into one, given that they, at their cores, they are about converting spans to metrics in a way or another:

  • span metrics processor
  • service graph processor
  • span metrics connector
  • service graph connector

Besides, I would like the result to use the semantic conventions as much as possible. Given that the current span metrics connector is still alpha, we should probably be fine in determining the metrics to be deprecated in the next version already, changing them to their new names in N+2 already (0.87.0, if everything goes as planned).

For work to start on this, I need buy-in from the current span metrics processor/connector code owner, @albertteoh. Given that @mapno and @rlankfo worked on service graph processor and will work on the connector, it would also make sense to make them code owners.

@albertteoh, are you on board with the proposed changes?

@github-actions
Copy link
Contributor

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@albertteoh
Copy link
Contributor

Thanks for opening this discussion, @jpkrohling.

I would like to combine those four components into one, given that they, at their cores, they are about converting spans to metrics in a way or another

What are the primary motivations for consolidation?

I can definitely see benefits of less maintenance overhead by decommissioning the deprecated processor variants, assuming this is what you mean by combining.

I'm just a little less clear on the benefits of consolidating the service graph and spanmetrics connectors, though I agree, they both aggregate spans to metrics, but they serve different use cases AFAICT.

determining the metrics to be deprecated in the next version already, changing them to their new names in N+2 already (0.87.0, if everything goes as planned).

Are you able to please elaborate more on this or point me to an issue that details the proposed name changes and rationale?

@jpkrohling
Copy link
Member Author

jpkrohling commented Sep 19, 2023

What are the primary motivations for consolidation?

It really is about having two similar components being consolidated. The use-cases are somewhat different, but we've seen that they are mostly used together.

Are you able to please elaborate more on this or point me to an issue that details the proposed name changes and rationale?

The idea is to use the semantic conventions as much as possible. I haven't done a summary of changes yet, but I assume there might be things that do not conform with the conventions or things that didn't exist when the component was first created. If you are OK with the general idea, we'd move forward and compile a list of changes, which shouldn't be many.

@albertteoh
Copy link
Contributor

I'm okay with the general idea, and I'm also in favour of having more owners to the connector, which will benefit users because we'll have more resources available to handle feature/bug requests and questions.

Let's move forward on this.

Copy link
Contributor

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@github-actions github-actions bot added the Stale label Nov 22, 2023
@jpkrohling jpkrohling added never stale Issues marked with this label will be never staled and automatically removed and removed Stale labels Nov 28, 2023
@JaredTan95
Copy link
Member

JaredTan95 commented Dec 4, 2023

Good idea. Currently spanmetrics and servicegraph processor have similar configurations and functions in many places.
Like the recent ongoing requirement for servicegraph: #29628 #27881. If the two processors can be maintained together. I think it's more reasonable.

@mapno
Copy link
Contributor

mapno commented Dec 12, 2023

My concern with merging these two components is that they have different requirements. Servicegraph needs for all spans of a trace to be processed together in order to extract connections (HTTP requests, db calls, etc), whereas spanmetrics can process spans independently of each other. By merging, we'd be introducing this limitation to the spanmetrics connector/processor.

@JaredTan95
Copy link
Member

JaredTan95 commented Dec 12, 2023

My concern with merging these two components is that they have different requirements. Servicegraph needs for all spans of a trace to be processed together in order to extract connections (HTTP requests, db calls, etc), whereas spanmetrics can process spans independently of each other. By merging, we'd be introducing this limitation to the spanmetrics connector/processor.

BTW,I have a question.For now,we can using otel col LB to route span data from the same traceid to the same otel col instance. Can we use an external caching component(For example, extension the store with external cache component,maybe like redis) to make the graph processor stateless so I don't have to use LB?

@fstab
Copy link
Member

fstab commented Dec 13, 2023

spanmetrics can process spans independently of each other.

The most complex scenario here is: Tail sampling and exemplars. If spanmetrics process spans independent of each other, how to make sure that traces containing an exemplar are sampled?

@mapno
Copy link
Contributor

mapno commented Dec 14, 2023

an we use an external caching component(For example, extension the store with external cache component,maybe like redis) to make the graph processor stateless so I don't have to use LB?

Actually, yes. It's not a bad idea. What we'd need is an implementation of the store pkg that works with an external db/cache. It could even be shared by multiple collector instances if it can be made safe with transactions or similar techniques.

@JaredTan95
Copy link
Member

an we use an external caching component(For example, extension the store with external cache component,maybe like redis) to make the graph processor stateless so I don't have to use LB?

Actually, yes. It's not a bad idea. What we'd need is an implementation of the store pkg that works with an external db/cache. It could even be shared by multiple collector instances if it can be made safe with transactions or similar techniques.

Yes, we've recently tried to implement this internally, and if it goes well, we'll contribute upstream

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
connector/servicegraph connector/spanmetrics never stale Issues marked with this label will be never staled and automatically removed processor/servicegraph Service graph processor processor/spanmetrics Span Metrics processor
Projects
None yet
Development

No branches or pull requests

6 participants