Skip to content

Commit

Permalink
[connectors/spanmetrics] Drop Prometheus-specific metrics labels sani…
Browse files Browse the repository at this point in the history
…tization. (open-telemetry#18757)
  • Loading branch information
kovrus committed Feb 27, 2023
1 parent fed335d commit ca0d35c
Show file tree
Hide file tree
Showing 8 changed files with 42 additions and 150 deletions.
13 changes: 13 additions & 0 deletions .chloggen/spanmetrics-connector-drop-sanitization.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,13 @@
change_type: breaking

component: spanmetricsconnector

note: Drop Prometheus-specific metrics labels sanitization.

issues: [18678]

subtext: |
The spanmeterics connector is the OTel component, therefore, we would
like to strip Prometheus-specific parts from it. Metric names and attributes
conversion to Prometheus conventions should happen in Prometheus components,
e.g. Prometheus Remote Write exporter.
48 changes: 21 additions & 27 deletions connector/spanmetricsconnector/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -14,38 +14,29 @@

## Overview

Aggregates Request, Error and Duration (R.E.D) metrics from span data.
Aggregates Request, Error and Duration (R.E.D) OpenTelemetry metrics from span data.

**Request** counts are computed as the number of spans seen per unique set of dimensions, including Errors.
For example, the following metric shows 142 calls:
```
calls_total{http_method="GET",http_status_code="200",span_name="/Address",service_name="shippingservice",span_kind="SPAN_KIND_SERVER",status_code="STATUS_CODE_UNSET"} 142
```
Multiple metrics can be aggregated if, for instance, a user wishes to view call counts just on `service_name` and `span_name`.
**Request** counts are computed as the number of spans seen per unique set of
dimensions, including Errors. Multiple metrics can be aggregated if, for instance,
a user wishes to view call counts just on `service.name` and `span.name`.

**Error** counts are computed from the Request counts which have an "Error" Status Code metric dimension.
For example, the following metric indicates 220 errors:
```
calls_total{http_method="GET",http_status_code="503",span_name="/checkout",service_name="frontend",span_kind="SPAN_KIND_CLIENT",status_code="STATUS_CODE_ERROR"} 220
```
**Error** counts are computed from the Request counts which have an `Error` Status Code metric dimension.

**Duration** is computed from the difference between the span start and end times and inserted into the
relevant latency histogram time bucket for each unique set dimensions.
For example, the following latency buckets indicate the vast majority of spans (9K) have a 100ms latency:
```
latency_bucket{http_method="GET",http_status_code="200",label1="value1",span_name="/Address",service_name="shippingservice",span_kind="SPAN_KIND_SERVER",status_code="STATUS_CODE_UNSET",le="2"} 327
latency_bucket{http_method="GET",http_status_code="200",label1="value1",span_name="/Address",service_name="shippingservice",span_kind="SPAN_KIND_SERVER",status_code="STATUS_CODE_UNSET",le="6"} 751
latency_bucket{http_method="GET",http_status_code="200",label1="value1",span_name="/Address",service_name="shippingservice",span_kind="SPAN_KIND_SERVER",status_code="STATUS_CODE_UNSET",le="10"} 1195
latency_bucket{http_method="GET",http_status_code="200",label1="value1",span_name="/Address",service_name="shippingservice",span_kind="SPAN_KIND_SERVER",status_code="STATUS_CODE_UNSET",le="100"} 10180
latency_bucket{http_method="GET",http_status_code="200",label1="value1",span_name="/Address",service_name="shippingservice",span_kind="SPAN_KIND_SERVER",status_code="STATUS_CODE_UNSET",le="250"} 10180
...
```

Each metric will have _at least_ the following dimensions because they are common across all spans:
- Service name
- Span Name
- Span kind
- Status code
Each metric will have _at least_ the following dimensions because they are common
across all spans:

- `service.name`
- `span.name`
- `span.kind`
- `status.code`

## Configurations

If you are not already familiar with connectors, you may find it helpful to first
visit the [Connectors README].

The following settings can be optionally configured:

Expand All @@ -68,7 +59,7 @@ The following settings can be optionally configured:

## Examples

The following is a simple example usage of the spanmetrics connector.
The following is a simple example usage of the `spanmetrics` connector.

For configuration examples on other use cases, please refer to [More Examples](#more-examples).

Expand Down Expand Up @@ -106,3 +97,6 @@ service:
For more example configuration covering various other use cases, please visit the [testdata directory](../../connector/spanmetricsconnector/testdata).
[development]: https://github.com/open-telemetry/opentelemetry-collector#development
[Connectors README]:https://github.com/open-telemetry/opentelemetry-collector/blob/main/connector/README.md
[Exporter Pipeline Type]:https://github.com/open-telemetry/opentelemetry-collector/blob/main/connector/README.md#exporter-pipeline-type
[Receiver Pipeline Type]:https://github.com/open-telemetry/opentelemetry-collector/blob/main/connector/README.md#receiver-pipeline-type
30 changes: 4 additions & 26 deletions connector/spanmetricsconnector/config.go
Original file line number Diff line number Diff line change
Expand Up @@ -19,7 +19,6 @@ import (
"time"

"go.opentelemetry.io/collector/component"
"go.opentelemetry.io/collector/featuregate"
"go.opentelemetry.io/collector/pdata/pmetric"
)

Expand All @@ -28,12 +27,6 @@ const (
cumulative = "AGGREGATION_TEMPORALITY_CUMULATIVE"
)

var dropSanitizationGate = featuregate.GlobalRegistry().MustRegister(
"connector.spanmetrics.PermissiveLabelSanitization",
featuregate.StageAlpha,
featuregate.WithRegisterDescription("Controls whether to change labels starting with '_' to 'key_'"),
)

// Dimension defines the dimension name and optional default value if the Dimension is missing from a span attribute.
type Dimension struct {
Name string `mapstructure:"name"`
Expand Down Expand Up @@ -62,9 +55,6 @@ type Config struct {

AggregationTemporality string `mapstructure:"aggregation_temporality"`

// skipSanitizeLabel if enabled, labels that start with _ are not sanitized
skipSanitizeLabel bool

// MetricsEmitInterval is the time period between when metrics are flushed or emitted to the configured MetricsExporter.
MetricsFlushInterval time.Duration `mapstructure:"metrics_flush_interval"`

Expand All @@ -76,7 +66,7 @@ var _ component.ConfigValidator = (*Config)(nil)

// Validate checks if the processor configuration is valid
func (c Config) Validate() error {
err := validateDimensions(c.Dimensions, dropSanitizationGate.IsEnabled())
err := validateDimensions(c.Dimensions)
if err != nil {
return err
}
Expand All @@ -100,30 +90,18 @@ func (c Config) GetAggregationTemporality() pmetric.AggregationTemporality {
return pmetric.AggregationTemporalityCumulative
}

// validateDimensions checks duplicates for reserved dimensions and additional dimensions. Considering
// the usage of Prometheus related exporters, we also validate the dimensions after sanitization.
func validateDimensions(dimensions []Dimension, skipSanitizeLabel bool) error {
// validateDimensions checks duplicates for reserved dimensions and additional dimensions.
func validateDimensions(dimensions []Dimension) error {
labelNames := make(map[string]struct{})
for _, key := range []string{serviceNameKey, spanKindKey, statusCodeKey} {
for _, key := range []string{serviceNameKey, spanKindKey, statusCodeKey, spanNameKey} {
labelNames[key] = struct{}{}
labelNames[sanitize(key, skipSanitizeLabel)] = struct{}{}
}
labelNames[spanNameKey] = struct{}{}

for _, key := range dimensions {
if _, ok := labelNames[key.Name]; ok {
return fmt.Errorf("duplicate dimension name %s", key.Name)
}
labelNames[key.Name] = struct{}{}

sanitizedName := sanitize(key.Name, skipSanitizeLabel)
if sanitizedName == key.Name {
continue
}
if _, ok := labelNames[sanitizedName]; ok {
return fmt.Errorf("duplicate dimension name %s after sanitization", sanitizedName)
}
labelNames[sanitizedName] = struct{}{}
}

return nil
Expand Down
40 changes: 4 additions & 36 deletions connector/spanmetricsconnector/config_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -55,7 +55,6 @@ func TestLoadConfig(t *testing.T) {
&Config{
AggregationTemporality: cumulative,
DimensionsCacheSize: defaultDimensionsCacheSize,
skipSanitizeLabel: dropSanitizationGate.IsEnabled(),
MetricsFlushInterval: 15 * time.Second,
},
simpleCfg.Connectors[component.NewID(typeStr)],
Expand All @@ -73,7 +72,6 @@ func TestLoadConfig(t *testing.T) {
},
AggregationTemporality: delta,
DimensionsCacheSize: 1500,
skipSanitizeLabel: dropSanitizationGate.IsEnabled(),
MetricsFlushInterval: 30 * time.Second,
},
fullCfg.Connectors[component.NewID(typeStr)],
Expand All @@ -93,10 +91,9 @@ func TestGetAggregationTemporality(t *testing.T) {

func TestValidateDimensions(t *testing.T) {
for _, tc := range []struct {
name string
dimensions []Dimension
expectedErr string
skipSanitizeLabel bool
name string
dimensions []Dimension
expectedErr string
}{
{
name: "no additional dimensions",
Expand All @@ -116,13 +113,6 @@ func TestValidateDimensions(t *testing.T) {
},
expectedErr: "duplicate dimension name service.name",
},
{
name: "duplicate dimension with reserved labels after sanitization",
dimensions: []Dimension{
{Name: "service_name"},
},
expectedErr: "duplicate dimension name service_name",
},
{
name: "duplicate additional dimensions",
dimensions: []Dimension{
Expand All @@ -131,31 +121,9 @@ func TestValidateDimensions(t *testing.T) {
},
expectedErr: "duplicate dimension name service_name",
},
{
name: "duplicate additional dimensions after sanitization",
dimensions: []Dimension{
{Name: "http.status_code"},
{Name: "http!status_code"},
},
expectedErr: "duplicate dimension name http_status_code after sanitization",
},
{
name: "we skip the case if the dimension name is the same after sanitization",
dimensions: []Dimension{
{Name: "http_status_code"},
},
},
{
name: "duplicate dimension",
dimensions: []Dimension{
{Name: "status_code"},
},
expectedErr: "duplicate dimension name status_code",
},
} {
t.Run(tc.name, func(t *testing.T) {
tc.skipSanitizeLabel = false
err := validateDimensions(tc.dimensions, tc.skipSanitizeLabel)
err := validateDimensions(tc.dimensions)
if tc.expectedErr != "" {
assert.EqualError(t, err, tc.expectedErr)
} else {
Expand Down
37 changes: 0 additions & 37 deletions connector/spanmetricsconnector/connector.go
Original file line number Diff line number Diff line change
Expand Up @@ -18,10 +18,8 @@ import (
"bytes"
"context"
"sort"
"strings"
"sync"
"time"
"unicode"

"github.com/tilinna/clock"
"go.opentelemetry.io/collector/component"
Expand Down Expand Up @@ -434,41 +432,6 @@ func getDimensionValue(d dimension, spanAttr pcommon.Map, resourceAttr pcommon.M
return v, ok
}

// copied from prometheus-go-metric-exporter
// sanitize replaces non-alphanumeric characters with underscores in s.
func sanitize(s string, skipSanitizeLabel bool) string {
if len(s) == 0 {
return s
}

// Note: No length limit for label keys because Prometheus doesn't
// define a length limit, thus we should NOT be truncating label keys.
// See https://github.com/orijtech/prometheus-go-metrics-exporter/issues/4.
s = strings.Map(sanitizeRune, s)
if unicode.IsDigit(rune(s[0])) {
s = "key_" + s
}
// replace labels starting with _ only when skipSanitizeLabel is disabled
if !skipSanitizeLabel && strings.HasPrefix(s, "_") {
s = "key" + s
}
// labels starting with __ are reserved in prometheus
if strings.HasPrefix(s, "__") {
s = "key" + s
}
return s
}

// copied from prometheus-go-metric-exporter
// sanitizeRune converts anything that is not a letter or digit to an underscore
func sanitizeRune(r rune) rune {
if unicode.IsLetter(r) || unicode.IsDigit(r) {
return r
}
// Everything else turns into an underscore
return '_'
}

// buildMetricName builds the namespace prefix for the metric name.
func buildMetricName(namespace string, name string) string {
if namespace != "" {
Expand Down
18 changes: 0 additions & 18 deletions connector/spanmetricsconnector/connector_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -391,24 +391,6 @@ func TestBuildKeyWithDimensions(t *testing.T) {
}
}

func TestSanitize(t *testing.T) {
cfg := createDefaultConfig().(*Config)
require.Equal(t, "", sanitize("", cfg.skipSanitizeLabel), "")
require.Equal(t, "key_test", sanitize("_test", cfg.skipSanitizeLabel))
require.Equal(t, "key__test", sanitize("__test", cfg.skipSanitizeLabel))
require.Equal(t, "key_0test", sanitize("0test", cfg.skipSanitizeLabel))
require.Equal(t, "test", sanitize("test", cfg.skipSanitizeLabel))
require.Equal(t, "test__", sanitize("test_/", cfg.skipSanitizeLabel))
// testcases with skipSanitizeLabel flag turned on
cfg.skipSanitizeLabel = true
require.Equal(t, "", sanitize("", cfg.skipSanitizeLabel), "")
require.Equal(t, "_test", sanitize("_test", cfg.skipSanitizeLabel))
require.Equal(t, "key__test", sanitize("__test", cfg.skipSanitizeLabel))
require.Equal(t, "key_0test", sanitize("0test", cfg.skipSanitizeLabel))
require.Equal(t, "test", sanitize("test", cfg.skipSanitizeLabel))
require.Equal(t, "test__", sanitize("test_/", cfg.skipSanitizeLabel))
}

func TestConnectorUpdateExemplars(t *testing.T) {
// ----- conditions -------------------------------------------------------
factory := NewFactory()
Expand Down
1 change: 0 additions & 1 deletion connector/spanmetricsconnector/factory.go
Original file line number Diff line number Diff line change
Expand Up @@ -44,7 +44,6 @@ func createDefaultConfig() component.Config {
return &Config{
AggregationTemporality: "AGGREGATION_TEMPORALITY_CUMULATIVE",
DimensionsCacheSize: defaultDimensionsCacheSize,
skipSanitizeLabel: dropSanitizationGate.IsEnabled(),
MetricsFlushInterval: 15 * time.Second,
}
}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -19,16 +19,11 @@ connectors:
dimensions:
# If the span is missing http.method, the connector will insert
# the http.method dimension with value 'GET'.
# For example, in the following scenario, http.method is not present in a span and so will be added as a dimension to the metric with value "GET":
# - calls_total{http_method="GET",span_name="/Address",service_name="shippingservice",span_kind="SPAN_KIND_SERVER",status_code="STATUS_CODE_UNSET"} 1
- name: http.method
default: GET

# If a default is not provided, the http.status_code dimension will be omitted
# if the span does not contain http.status_code.
# For example, consider a scenario with two spans, one span having http.status_code=200 and another missing http.status_code. Two metrics would result with this configuration, one with the http_status_code omitted and the other included:
# - calls_total{http_status_code="200",span_name="/Address",service_name="shippingservice",span_kind="SPAN_KIND_SERVER",status_code="STATUS_CODE_UNSET"} 1
# - calls_total{span_name="/Address",service_name="shippingservice",span_kind="SPAN_KIND_SERVER",status_code="STATUS_CODE_UNSET"} 1
- name: http.status_code

# The aggregation temporality of the generated metrics.
Expand Down

0 comments on commit ca0d35c

Please sign in to comment.