Contour support for Envoy's stats per route #4637

Open
izturn opened this issue Jul 25, 2022 · 18 comments · May be fixed by #6531
Assignees
Labels
kind/feature Categorizes issue or PR as related to a new feature. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release.

Comments

@izturn
Member

izturn commented Jul 25, 2022

Please describe the problem you have
In envoyproxy/envoy#3351, @stevesloka suggested that Envoy expose metrics per vhost. This feature has now been released with Envoy v1.23 as route-stat-prefix (the PR is envoyproxy/envoy#21302). Should Contour support it too?
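
For readers unfamiliar with the feature: as far as I understand Envoy's route documentation, statistics for a route that sets a stat prefix are rooted at `vhost.<virtual host name>.route.<stat_prefix>`. The sketch below only illustrates that naming scheme. The helper `route_stat_name`, the virtual host name, the prefix, and the stat name used in the example are illustrative assumptions, not anything taken from Contour or from this issue.

```python
def route_stat_name(vhost: str, stat_prefix: str, stat: str) -> str:
    """Build a stat name rooted at vhost.<vhost>.route.<stat_prefix>.

    Hypothetical helper for illustration; Envoy builds these names itself.
    """
    return f"vhost.{vhost}.route.{stat_prefix}.{stat}"


# Example values are assumptions chosen for readability.
print(route_stat_name("example_com", "login", "upstream_rq_total"))
```

The point of the prefix is that every route sharing it reports into the same stat names, so the choice of prefix decides how fine-grained the resulting metrics are.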

@izturn izturn added kind/feature Categorizes issue or PR as related to a new feature. lifecycle/needs-triage Indicates that an issue needs to be triaged by a project contributor. labels Jul 25, 2022
@skriss skriss modified the milestone: 1.24.0 Jul 25, 2022
@wilsonwu
Member

Is there any plan to support this feature?

@sunjayBhatia
Member

Seems reasonable to support with a few considerations:

We would definitely take community contributions to help speed this up, otherwise we've got this prioritized for 1.24.0 currently

@sunjayBhatia sunjayBhatia removed the lifecycle/needs-triage Indicates that an issue needs to be triaged by a project contributor. label Aug 4, 2022
@github-actions

github-actions bot commented Oct 4, 2022

The Contour project currently lacks enough contributors to adequately respond to all Issues.

This bot triages Issues according to the following rules:

  • After 60d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, the Issue is closed

You can:

  • Mark this Issue as fresh by commenting
  • Close this Issue
  • Offer to help out with triage

Please send feedback to the #contour channel in the Kubernetes Slack

@github-actions github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 4, 2022
@wilsonwu
Member

wilsonwu commented Oct 8, 2022

We are planning to test this feature and will post an update later.

@github-actions github-actions bot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 9, 2022
@wilsonwu
Member

The load test results are in.

We ran two rounds of load tests:

  1. 10k routes without vhost metrics
  2. 10k routes with vhost metrics

Both rounds ran against a single 4C4G instance of Envoy v1.23.

Test results:

The 1st round:
Envoy at startup (screenshot): CPU 2%, memory 6% (about 250m)
After sending requests to the 10k routes at random (screenshot): CPU 400%, memory 8% (about 350m)

The 2nd round:
Envoy at startup (screenshot): CPU 2% - 3%, memory 7.5% (about 330m)
After sending requests to the 10k routes at random (screenshot): CPU 400%, memory 10% (about 450m)

I think the vhost metrics only raise Envoy's memory usage at startup; performance under load is fine.

I hope these results help you make a decision.
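
To make the comparison between the two rounds easier to read, here is a small sketch of the arithmetic. It uses only the approximate memory figures reported above, in the same "m" unit; the dictionary layout and the helper name `memory_overhead` are assumptions made for illustration.

```python
# Approximate memory readings from the two rounds, in "m" as reported above.
readings = {
    "without vhost metrics": {"startup": 250, "under_load": 350},
    "with vhost metrics": {"startup": 330, "under_load": 450},
}


def memory_overhead(baseline: dict, candidate: dict) -> dict:
    """Return the per-phase memory difference between two test rounds."""
    return {phase: candidate[phase] - baseline[phase] for phase in baseline}


baseline = readings["without vhost metrics"]
overhead = memory_overhead(baseline, readings["with vhost metrics"])

for phase, delta in overhead.items():
    # Show the absolute difference and its size relative to the baseline round.
    print(f"{phase}: +{delta}m ({delta / baseline[phase]:.0%} over baseline)")
```

With a single run per configuration these differences are only indicative; CPU was reported as 400% under load in both rounds, so the sketch leaves it out.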

@github-actions github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 25, 2022
@wilsonwu
Member

Merry Christmas, everyone. If any further discussion is needed, let's keep it going.

@github-actions github-actions bot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 26, 2022
@github-actions github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 24, 2023
@sunjayBhatia sunjayBhatia added priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Feb 27, 2023
@sunjayBhatia
Member

Sorry for the lack of responses on this one @wilsonwu will try to look at this again soon!

@wilsonwu
Member

> Sorry for the lack of responses on this one @wilsonwu will try to look at this again soon!

Thanks, Sunjay. If the test results are acceptable, we can move on to some design work.

@wilsonwu
Member

Hi all, let's keep this moving. @sunjayBhatia, is there any update on this?

@skriss
Member

skriss commented Apr 13, 2023

@wilsonwu I'm going to add this to the 1.26 milestone for now and will plan to look at it after 1.25 is released at the end of this month.

@skriss skriss added this to the 1.26.0 milestone Apr 13, 2023
@wilsonwu
Member

> @wilsonwu I'm going to add this to the 1.26 milestone for now and will plan to look at it after 1.25 is released at the end of this month.

Good to hear. We will start contributing to it.

@alibo

alibo commented Jun 26, 2023

Since this feature has not been implemented yet, is there an alternative way to monitor the aggregated traffic of an HTTPProxy/Ingress in Contour? Envoy metrics show the traffic of each backend pod, and I can't see an easy way to relate them to a specific HTTPProxy/Ingress object, especially if multiple HTTPProxy/Ingress objects point to the same service/pods.

@sunjayBhatia
Member

@wilsonwu sorry this is so late but when doing the experiment above, did you use a static stat prefix for all routes associated with a virtualhost or do something similar to what is described here: #5535 (comment) ? Naively I'm thinking a static stat prefix would have less resource impact and also not offer the granularity needed to actually differentiate the stats between different routes on a route/upstream

@wilsonwu
Member

> @wilsonwu sorry this is so late but when doing the experiment above, did you use a static stat prefix for all routes associated with a virtualhost or do something similar to what is described here: #5535 (comment) ? Naively I'm thinking a static stat prefix would have less resource impact and also not offer the granularity needed to actually differentiate the stats between different routes on a route/upstream

Sorry for the late reply. I have already commented in the PR.

@sunjayBhatia
Member

Per the ongoing discussion on the related PR, it looks like this will slip to 1.27.0.

@sunjayBhatia sunjayBhatia modified the milestones: 1.26.0, 1.27.0 Aug 9, 2023
@skriss skriss modified the milestones: 1.27.0, 1.28.0 Oct 9, 2023
@skriss skriss modified the milestones: 1.28.0, 1.29.0 Jan 29, 2024
@skriss skriss modified the milestones: 1.29.0, 1.30.0 May 2, 2024
@skriss skriss removed this from the 1.30.0 milestone May 24, 2024
@rtreffer-rddt

I am running into this issue: I want to map and filter incoming vs. outgoing traffic, potentially by route.
We are using HTTPProxy objects, which seem to be absent from #5535.

I would be up for reviving #5535, extending it to HTTPProxy, and giving it another shot.

One thing I am unsure about is how to tag the routes in the HTTPProxy, as we use quite a few match conditions. One suggestion was the index in the route array. I am wondering if a metricsTag would be acceptable as an alternative. It would give people an escape hatch for custom names (avoiding the need for a one-size-fits-all generator) and would allow logical names (e.g. metricsTag: "login") instead of generated ones. It would also allow grouping routes.
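
The two tagging options discussed above could be combined roughly as follows. This is a minimal sketch under stated assumptions, not Contour code: the `Route` class is a hypothetical stand-in for an HTTPProxy route, `metrics_tag` models the proposed metricsTag field (which does not exist in Contour as of this comment), and the `route-<index>` fallback format is invented for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Route:
    # Hypothetical stand-in for an HTTPProxy route; not Contour's real type.
    path_prefix: str
    metrics_tag: Optional[str] = None  # models the proposed metricsTag field


def stat_prefixes(routes: List[Route]) -> List[str]:
    """Use the explicit tag when present, else fall back to the route's index."""
    return [
        route.metrics_tag if route.metrics_tag else f"route-{index}"
        for index, route in enumerate(routes)
    ]


routes = [
    Route("/login", metrics_tag="login"),
    Route("/signin", metrics_tag="login"),  # same tag, so both routes are grouped
    Route("/static"),  # no tag, so the index-based name is used
]
print(stat_prefixes(routes))
```

One trade-off visible in the sketch is that index-based names change whenever routes are reordered, whereas explicit tags stay stable.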

@rtreffer-rddt rtreffer-rddt linked a pull request Jun 28, 2024 that will close this issue
Projects
Status: Todo
6 participants