[feat] add prometheus support for proxy & httpserver. #877

Merged: 6 commits, merged on Dec 21, 2022.


LokiWager
Collaborator

Background

Adds Prometheus support for Easegress objects and filters. For details, see metrics.md.

Changes

  1. Add Prometheus helper functions that:
    • check metric and label names;
    • check for duplicate metrics;
    • create the Prometheus metric and register it with DefaultRegisterer automatically.
  2. Create metrics for HttpServer:
    • collect these metrics in the pool.collectMetrics function.
| Metric | Type | Description | Labels |
|--------|------|-------------|--------|
| httpserver_health | gauge | status of the HTTP server: 1 for ready, 0 for down | clusterName, clusterRole, instanceName, name, kind |
| httpserver_total_requests | counter | the total count of HTTP requests | clusterName, clusterRole, instanceName, name, kind, routerKind, backend |
| httpserver_total_response | counter | the total count of HTTP responses | clusterName, clusterRole, instanceName, name, kind, routerKind, backend |
| httpserver_total_error_requests | counter | the total count of HTTP error requests | clusterName, clusterRole, instanceName, name, kind, routerKind, backend |
| httpserver_requests_duration | histogram | a histogram of request processing duration | clusterName, clusterRole, instanceName, name, kind, routerKind, backend |
| httpserver_requests_size_bytes | histogram | a histogram of the total size of the request, including the body | clusterName, clusterRole, instanceName, name, kind, routerKind, backend |
| httpserver_response_size_bytes | histogram | a histogram of the total size of the returned response body | clusterName, clusterRole, instanceName, name, kind, routerKind, backend |
  3. Create metrics for Proxy:
    • collect these metrics in the serveHTTP function.
| Metric | Type | Description | Labels |
|--------|------|-------------|--------|
| proxy_total_connections | counter | the total count of proxy connections | clusterName, clusterRole, instanceName, name, kind, loadBalancePolicy, filterPolicy |
| proxy_total_error_connections | counter | the total count of proxy error connections | clusterName, clusterRole, instanceName, name, kind, loadBalancePolicy, filterPolicy |
| proxy_request_body_size | histogram | a histogram of the total size of the request | clusterName, clusterRole, instanceName, name, kind, loadBalancePolicy, filterPolicy |
| proxy_response_body_size | histogram | a histogram of the total size of the response | clusterName, clusterRole, instanceName, name, kind, loadBalancePolicy, filterPolicy |
  4. Add an exporter URI:
    • GET /apis/v2/metrics
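The name checks performed by the helper functions can be sketched with the standard library alone. This is a minimal sketch, not the PR's code: the patterns follow the Prometheus data model (metric names match `[a-zA-Z_:][a-zA-Z0-9_:]*`, label names match `[a-zA-Z_][a-zA-Z0-9_]*`, and label names starting with `__` are reserved), while `checkMetric` and the other function names are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Patterns from the Prometheus data model: metric names may contain
// colons, label names may not.
var (
	metricNameRE = regexp.MustCompile(`^[a-zA-Z_:][a-zA-Z0-9_:]*$`)
	labelNameRE  = regexp.MustCompile(`^[a-zA-Z_][a-zA-Z0-9_]*$`)
)

// validateMetricName reports whether name is a legal metric name.
func validateMetricName(name string) bool {
	return metricNameRE.MatchString(name)
}

// validateLabelName reports whether label is a legal label name; names
// beginning with "__" are reserved for internal use.
func validateLabelName(label string) bool {
	return labelNameRE.MatchString(label) && !strings.HasPrefix(label, "__")
}

// checkMetric (hypothetical) validates a metric name and its label names,
// and rejects a label that is listed twice.
func checkMetric(name string, labels []string) error {
	if !validateMetricName(name) {
		return fmt.Errorf("invalid metric name %q", name)
	}
	seen := make(map[string]bool, len(labels))
	for _, l := range labels {
		if !validateLabelName(l) {
			return fmt.Errorf("invalid label name %q", l)
		}
		if seen[l] {
			return fmt.Errorf("duplicate label name %q", l)
		}
		seen[l] = true
	}
	return nil
}

func main() {
	labels := []string{"clusterName", "clusterRole", "instanceName", "name", "kind"}
	fmt.Println(checkMetric("httpserver_health", labels))
	fmt.Println(checkMetric("httpserver-health", labels))
	fmt.Println(checkMetric("proxy_total_connections", []string{"kind", "kind"}))
}
```

The three calls print `<nil>`, an "invalid metric name" error for the hyphenated name, and a "duplicate label name" error.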

Implementation

Why not use the easemonitor approach?

  1. Collecting every 5 seconds is not timely enough.
  2. It cannot take full advantage of the Prometheus ecosystem.
  3. Acquiring labels is more invasive to the original code and requires heavy customization.

codecov-commenter commented Dec 17, 2022

Codecov Report

Base: 76.03% // Head: 76.32% // Increases project coverage by +0.29% 🎉

Coverage data is based on head (c2976bd) compared to base (be4d396).
Patch coverage: 100.00% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #877      +/-   ##
==========================================
+ Coverage   76.03%   76.32%   +0.29%     
==========================================
  Files         110      110              
  Lines       12741    12867     +126     
==========================================
+ Hits         9687     9821     +134     
+ Misses       2507     2501       -6     
+ Partials      547      545       -2     
| Impacted Files | Coverage Δ |
|----------------|------------|
| pkg/filters/proxy/pool.go | 81.97% <100.00%> (+2.67%) ⬆️ |
| pkg/object/httpserver/mux.go | 80.68% <100.00%> (+1.27%) ⬆️ |
| pkg/object/httpserver/runtime.go | 66.33% <100.00%> (+7.70%) ⬆️ |
| pkg/filters/headerlookup/headerlookup.go | 84.61% <0.00%> (-1.40%) ⬇️ |
| pkg/object/autocertmanager/autocertmanager.go | 93.46% <0.00%> (-0.82%) ⬇️ |
| pkg/cluster/op.go | 66.46% <0.00%> (+1.82%) ⬆️ |
| pkg/cluster/syncer.go | 83.33% <0.00%> (+5.76%) ⬆️ |


| Metric | Type | Description | Labels |
|--------|------|-------------|--------|
| httpserver_health | gauge | show the status for the http server: 1 for ready, 0 for down | clusterName, clusterRole, instanceName, name, kind |
| httpserver_total_requests | counter | the total count of http requests | clusterName, clusterRole, instanceName, name, kind, routerKind, backend |
| httpserver_total_response | counter | the total count of http resposne | clusterName, clusterRole, instanceName, name, kind, routerKind, backend |
Collaborator

Suggested change
| httpserver_total_response | counter | the total count of http resposne | clusterName, clusterRole, instanceName, name, kind, routerKind, backend |
| httpserver_total_responses | counter | the total count of http responses | clusterName, clusterRole, instanceName, name, kind, routerKind, backend |

| httpserver_total_error_requests | counter | the total count of http error requests | clusterName, clusterRole, instanceName, name, kind, routerKind, backend |
| httpserver_requests_duration | histogram | request processing duration histogram | clusterName, clusterRole, instanceName, name, kind, routerKind, backend |
| httpserver_requests_size_bytes | histogram | a histogram of the total size of the request. Includes body | clusterName, clusterRole, instanceName, name, kind, routerKind, backend |
| httpserver_response_size_bytes | histogram | a histogram of the total size of the returned response body | clusterName, clusterRole, instanceName, name, kind, routerKind, backend |
Collaborator

Suggested change
| httpserver_response_size_bytes | histogram | a histogram of the total size of the returned response body | clusterName, clusterRole, instanceName, name, kind, routerKind, backend |
| httpserver_responses_size_bytes | histogram | a histogram of the total size of the returned response body | clusterName, clusterRole, instanceName, name, kind, routerKind, backend |

Also, the description differs from that of httpserver_requests_size_bytes; is that correct?
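Whichever definition is chosen, both size metrics are histograms, so each observed size ends up in cumulative buckets plus `_sum` and `_count` series. The stdlib-only sketch below models that behaviour; `histogram`, the bucket bounds and the `size_bytes` name are hypothetical, and client_golang's own Histogram differs internally.

```go
package main

import (
	"fmt"
	"sort"
)

// histogram is a hypothetical, minimal model of a Prometheus histogram:
// buckets are cumulative, so one observation increments every bucket
// whose upper bound (le) is >= the observed value.
type histogram struct {
	bounds []float64 // sorted upper bounds; the +Inf bucket is implicit
	counts []uint64  // counts[i] = number of observations <= bounds[i]
	sum    float64
	count  uint64 // also the value of the implicit +Inf bucket
}

func newHistogram(bounds ...float64) *histogram {
	sort.Float64s(bounds)
	return &histogram{bounds: bounds, counts: make([]uint64, len(bounds))}
}

// observe records one value, e.g. a request size in bytes.
func (h *histogram) observe(v float64) {
	for i, b := range h.bounds {
		if v <= b {
			h.counts[i]++
		}
	}
	h.sum += v
	h.count++
}

func main() {
	// Illustrative bucket bounds in bytes, not the ones used by the PR.
	h := newHistogram(100, 1000, 10000)
	for _, size := range []float64{80, 512, 512, 20000} {
		h.observe(size)
	}
	for i, b := range h.bounds {
		fmt.Printf("size_bytes_bucket{le=\"%g\"} %d\n", b, h.counts[i])
	}
	fmt.Printf("size_bytes_bucket{le=\"+Inf\"} %d\n", h.count)
	fmt.Printf("size_bytes_sum %g\nsize_bytes_count %d\n", h.sum, h.count)
}
```

With these four observations the buckets read 1, 3, 3 and 4 (+Inf), the sum is 21104 and the count is 4.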

)

// newMetrics create the ProxyMetrics.
func (p *Proxy) newMetrics(name string) *metrics {
Collaborator

I propose moving all the content of this file into pool.go and changing the receiver of this function to ServerPool.

@@ -47,6 +47,7 @@ type (
mux struct {
httpStat *httpstat.HTTPStat
topN *httpstat.TopN
metrics *metrics
Collaborator

I haven't found any usage of this field; if that's true, please remove it.

)

// newMetrics create the HttpServerMetrics.
func (r *runtime) newMetrics(name string) *metrics {
Collaborator

Please move the content of this file into runtime.go and mux.go, and remove this file.

return summaryMap[metricName]
}

func getAndValid(metricName string, labels []string) (string, error) {
Collaborator

"Valid" is an adjective.

Suggested change
func getAndValid(metricName string, labels []string) (string, error) {
func getAndValidate(metricName string, labels []string) (string, error) {

}

// ValidMetricName check if the metric name is valid
func ValidMetricName(name string) bool {
Collaborator

Ditto

}

// ValidLabelName check if the label name is valid
func ValidLabelName(label string) bool {
Collaborator

Ditto

// NewGauge create the gauge metric
func NewGauge(metric string, help string, labels []string) *prometheus.GaugeVec {

Collaborator

Suggested change

return nil
}

if m, find := counterMap[metricName]; find {
Collaborator

Will NewCounter be called concurrently? If so, there could be a race condition here.
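If NewCounter can be called from several goroutines, the usual fix is to hold one mutex across both the map lookup and the insert, so the check-then-create step is atomic. A minimal sketch, assuming a package-level map like the counterMap in the diff; `counterVec` is a hypothetical placeholder for `*prometheus.CounterVec`, and `newCounter` is illustrative rather than the PR's function.

```go
package main

import (
	"fmt"
	"sync"
)

// counterVec is a hypothetical placeholder for *prometheus.CounterVec.
type counterVec struct {
	name   string
	labels []string
}

var (
	mu         sync.Mutex
	counterMap = map[string]*counterVec{}
	created    int // how many distinct counters were actually constructed
)

// newCounter returns the existing counter for name or creates it. Holding
// mu across the lookup and the insert means two goroutines can never both
// miss the lookup and construct the same metric.
func newCounter(name string, labels []string) *counterVec {
	mu.Lock()
	defer mu.Unlock()
	if c, ok := counterMap[name]; ok {
		return c
	}
	c := &counterVec{name: name, labels: labels}
	// In a real helper, registration with the Prometheus registry would
	// also happen here, still under the lock.
	counterMap[name] = c
	created++
	return c
}

func main() {
	var wg sync.WaitGroup
	results := make([]*counterVec, 100)
	for i := range results {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			results[i] = newCounter("proxy_total_connections", []string{"name", "kind"})
		}(i)
	}
	wg.Wait()
	fmt.Println(created, results[0] == results[99]) // 1 true
}
```

A sync.Map with LoadOrStore would also make the map access safe, but the value must be built before calling it, so concurrent callers may construct throwaway metrics; that matters if construction also registers the metric.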

zhao-kun
Collaborator

@suchen-sci please review the PR ASAP

suchen-sci merged commit a374719 into easegress-io:main Dec 21, 2022
LokiWager deleted the prometheus_support_v2 branch February 20, 2023 06:32
LokiWager
Collaborator Author

#760

5 participants