Skip to content

Latest commit

 

History

History
152 lines (119 loc) · 6.54 KB

File metadata and controls

152 lines (119 loc) · 6.54 KB

Span Metrics Processor

Status
Stability deprecated: traces
Distributions contrib, observiq, splunk, sumo

Note: The spanmetrics processor is deprecated in favour of the spanmetrics connector.

Note: Currently experimental and subject to breaking changes (e.g. change from processor to exporter/translator component). See: #403.

Aggregates Request, Error and Duration (R.E.D) metrics from span data.

Request counts are computed as the number of spans seen per unique set of dimensions, including Errors. For example, the following metric shows 142 calls:

calls_total{http_method="GET",http_status_code="200",operation="/Address",service_name="shippingservice",span_kind="SPAN_KIND_SERVER",status_code="STATUS_CODE_UNSET"} 142

Multiple metrics can be aggregated if, for instance, a user wishes to view call counts just on service_name and operation.

Error counts are computed from the Request counts which have an "Error" Status Code metric dimension. For example, the following metric indicates 220 errors:

calls_total{http_method="GET",http_status_code="503",operation="/checkout",service_name="frontend",span_kind="SPAN_KIND_CLIENT",status_code="STATUS_CODE_ERROR"} 220

Duration is computed from the difference between the span start and end times and inserted into the relevant latency histogram time bucket for each unique set dimensions. For example, the following latency buckets indicate the vast majority of spans (9K) have a 100ms latency:

latency_bucket{http_method="GET",http_status_code="200",label1="value1",operation="/Address",service_name="shippingservice",span_kind="SPAN_KIND_SERVER",status_code="STATUS_CODE_UNSET",le="2"} 327
latency_bucket{http_method="GET",http_status_code="200",label1="value1",operation="/Address",service_name="shippingservice",span_kind="SPAN_KIND_SERVER",status_code="STATUS_CODE_UNSET",le="6"} 751
latency_bucket{http_method="GET",http_status_code="200",label1="value1",operation="/Address",service_name="shippingservice",span_kind="SPAN_KIND_SERVER",status_code="STATUS_CODE_UNSET",le="10"} 1195
latency_bucket{http_method="GET",http_status_code="200",label1="value1",operation="/Address",service_name="shippingservice",span_kind="SPAN_KIND_SERVER",status_code="STATUS_CODE_UNSET",le="100"} 10180
latency_bucket{http_method="GET",http_status_code="200",label1="value1",operation="/Address",service_name="shippingservice",span_kind="SPAN_KIND_SERVER",status_code="STATUS_CODE_UNSET",le="250"} 10180
...

Each metric will have at least the following dimensions because they are common across all spans:

  • Service name
  • Operation
  • Span kind
  • Status code

This processor lets traces to continue through the pipeline unmodified.

The following settings are required:

  • metrics_exporter: the name of the exporter that this processor will write metrics to. This exporter must be present in a pipeline.

The following settings can be optionally configured:

  • latency_histogram_buckets: the list of durations defining the latency histogram buckets.

    • Default: [2ms, 4ms, 6ms, 8ms, 10ms, 50ms, 100ms, 200ms, 400ms, 800ms, 1s, 1400ms, 2s, 5s, 10s, 15s]
  • dimensions: the list of dimensions to add together with the default dimensions defined above.

    Each additional dimension is defined with a name which is looked up in the span's collection of attributes or resource attributes (AKA process tags) such as ip, host.name or region.

    If the named attribute is missing in the span, the optional provided default is used.

    If no default is provided, this dimension will be omitted from the metric.

  • dimensions_cache_size: the size of cache for storing Dimensions to improve collectors memory usage.

    • Default: 1000.
  • aggregation_temporality: Defines the aggregation temporality of the generated metrics. One of either AGGREGATION_TEMPORALITY_CUMULATIVE or AGGREGATION_TEMPORALITY_DELTA.

    • Default: AGGREGATION_TEMPORALITY_CUMULATIVE
  • namespace: Defines the namespace of the generated metrics. If namespace provided, generated metric name will be added namespace. prefix.

  • metrics_flush_interval: Defines the flush interval of the generated metrics.

    • Default: 15s.

Examples

The following is a simple example usage of the spanmetrics processor.

For configuration examples on other use cases, please refer to More Examples.

The full list of settings exposed for this processor are documented here.

receivers:
  jaeger:
    protocols:
      thrift_http:
        endpoint: "0.0.0.0:14278"

  # Dummy receiver that's never used, because a pipeline is required to have one.
  otlp/spanmetrics:
    protocols:
      grpc:
        endpoint: "localhost:12345"

  otlp:
    protocols:
      grpc:
        endpoint: "localhost:55677"

processors:
  batch:
  spanmetrics:
    metrics_exporter: otlp/spanmetrics
    latency_histogram_buckets: [100us, 1ms, 2ms, 6ms, 10ms, 100ms, 250ms]
    dimensions:
      - name: http.method
        default: GET
      - name: http.status_code
    dimensions_cache_size: 1000
    aggregation_temporality: "AGGREGATION_TEMPORALITY_CUMULATIVE"     
    metrics_flush_interval: 15s

exporters:
  jaeger:
    endpoint: localhost:14250

  otlp/spanmetrics:
    endpoint: "localhost:55677"
    tls:
      insecure: true

  prometheus:
    endpoint: "0.0.0.0:8889"

service:
  pipelines:
    traces:
      receivers: [jaeger]
      processors: [spanmetrics, batch]
      exporters: [jaeger]

    # The exporter name must match the metrics_exporter name.
    # The receiver is just a dummy and never used; added to pass validation requiring at least one receiver in a pipeline.
    metrics/spanmetrics:
      receivers: [otlp/spanmetrics]
      exporters: [otlp/spanmetrics]

    metrics:
      receivers: [otlp]
      exporters: [prometheus]

More Examples

For more example configuration covering various other use cases, please visit the testdata directory.