Tanzu Observability (Wavefront) Exporter

Status
Stability: beta
Supported pipeline types: traces, metrics
Distributions: contrib

This exporter supports sending metrics and traces to Tanzu Observability.

Prerequisites

Using this exporter requires a Tanzu Observability (Wavefront) instance and a Wavefront proxy reachable from the collector, with a custom tracing listener port configured via customTracingListenerPorts (as in the configuration example below).

Configuration

Given a Wavefront proxy at 10.10.10.10 configured with customTracingListenerPorts=30001, a basic configuration of the Tanzu Observability exporter follows:

receivers:
  examplereceiver:

processors:
  batch:
    timeout: 10s

exporters:
  tanzuobservability:
    traces:
      endpoint: "http:https://10.10.10.10:30001"
    metrics:
      endpoint: "http:https://10.10.10.10:2878"

service:
  pipelines:
    traces:
      receivers: [ examplereceiver ]
      processors: [ batch ]
      exporters: [ tanzuobservability ]
    metrics:
      receivers: [ examplereceiver ]
      processors: [ batch ]
      exporters: [ tanzuobservability ]

Advanced Configuration

Resource Attributes on Metrics

Client programs using an OpenTelemetry SDK can be configured to wrap all emitted telemetry (metrics, spans, logs) with a set of global key-value pairs, called resource attributes. By default, the Tanzu Observability Exporter includes resource attributes on spans but excludes them on metrics. To include resource attributes as tags on metrics, set the flag resource_attrs_included to true as per the example below.

Note: Tanzu Observability has a 254-character limit on tag key-value pairs. If a resource attribute exceeds this limit, the metric will not show up in Tanzu Observability.
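
For instance, a minimal metrics section with this flag enabled might look like the following (the proxy address is a placeholder):

exporters:
  tanzuobservability:
    metrics:
      endpoint: "http://10.10.10.10:2878"
      resource_attrs_included: true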

Application Resource Attributes on Metrics

The Tanzu Observability Exporter will include application resource attributes on metrics (application, service.name, cluster, and shard). To exclude these resource attributes as tags on metrics, set the flag app_tags_excluded to true as per the example below.

Note: The tag service.name (if provided) becomes service on the transformed Wavefront metric. However, if both service and service.name tags are provided, the service tag is the one included.
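
A minimal sketch of the metrics section with this flag set (the proxy address is a placeholder):

exporters:
  tanzuobservability:
    metrics:
      endpoint: "http://10.10.10.10:2878"
      app_tags_excluded: true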

Queuing and Retries

This exporter uses OpenTelemetry Collector helpers to queue data and retry on failures.
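
These behaviors are controlled by the standard retry_on_failure and sending_queue exporter settings. A sketch with commonly used options follows (the values shown are illustrative, not recommendations):

exporters:
  tanzuobservability:
    traces:
      endpoint: "http://10.10.10.10:30001"
    metrics:
      endpoint: "http://10.10.10.10:2878"
    retry_on_failure:
      enabled: true
      initial_interval: 5s
      max_interval: 30s
      max_elapsed_time: 3m
    sending_queue:
      enabled: true
      num_consumers: 10
      queue_size: 10000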

Recommended Pipeline Processors

The memory_limiter processor is recommended to prevent out-of-memory situations on the collector. It performs periodic checks of memory usage; if usage exceeds the defined limits, it begins dropping data and forcing garbage collection to reduce memory consumption. See the memory_limiter processor's documentation for details and defaults.

Note: The order matters when enabling multiple processors in a pipeline (e.g. the memory limiter and batch processors in the example config below). Please refer to the processors' documentation for more information.

Example Advanced Configuration

receivers:
  examplereceiver:

processors:
  memory_limiter:
    check_interval: 1s
    limit_percentage: 50
    spike_limit_percentage: 30
  batch:
    timeout: 10s

exporters:
  tanzuobservability:
    traces:
      endpoint: "http:https://10.10.10.10:30001"
    metrics:
      endpoint: "http:https://10.10.10.10:2878"
      resource_attrs_included: true
      app_tags_excluded: true
    retry_on_failure:
      max_elapsed_time: 3m
    sending_queue:
      queue_size: 10000

service:
  pipelines:
    traces:
      receivers: [ examplereceiver ]
      processors: [ memory_limiter, batch ]
      exporters: [ tanzuobservability ]
    metrics:
      receivers: [ examplereceiver ]
      processors: [ memory_limiter, batch ]
      exporters: [ tanzuobservability ]

Attributes Required by Tanzu Observability

Source

A source field is required in Tanzu Observability spans and metrics. The source is set to the first matching OpenTelemetry Resource Attribute:

  1. source
  2. host.name
  3. hostname
  4. host.id

To reduce duplicate data, the matched attribute is excluded from the tags on the exported Tanzu Observability span or metric. If none of the above resource attributes exist, the OpenTelemetry Collector's hostname is used as a fallback for source.
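
For example (hypothetical values), a metric whose resource carries host.name=web-01 and host.id=i-abc123 but no source attribute is exported with source=web-01; the host.name tag is dropped and host.id is kept as a tag.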

Application Identity Tags on Spans

Application identity tags of application and service are required for all spans in Tanzu Observability.

  • application is set to the value of the attribute application on the OpenTelemetry Span or Resource. Default is "defaultApp".
  • service is set to the value of the attribute service or service.name on the OpenTelemetry Span or Resource. Default is "defaultService".
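
For example (hypothetical values), a span carrying only service.name=checkout and no application attribute is exported with application="defaultApp" and service="checkout".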

Data Conversion for Traces

  • Trace IDs and Span IDs are converted to UUIDs. For example, span IDs are left-padded with zeros to fit the correct size.
  • Events are converted to Span Logs.
  • Kind is converted to the span.kind tag.
  • If a Span's status code is error, a tag of error=true is added. If the status also has a description, it's set to otel.status_description.
  • TraceState is converted to the w3c.tracestate tag.

Data Conversion for Metrics

This section describes the process used by the Exporter when converting from OpenTelemetry Metrics to Tanzu Observability by Wavefront Metrics.

OpenTelemetry Metric Type                   Wavefront Metric Type   Notes
Gauge                                       Gauge
Cumulative Sum                              Cumulative Counter
Delta Sum                                   Delta Counter
Cumulative Histogram (incl. Exponential)    Cumulative Counters     Details below.
Delta Histogram (incl. Exponential)         Histogram
Summary                                     Gauges                  Details below.

Cumulative Histogram Conversion (incl. Exponential)

A cumulative histogram is converted to multiple counter metrics: one counter per bucket in the histogram. Each counter has a special "le" tag that matches the upper bound of the corresponding bucket. The value of the counter metric is the sum of the histogram's corresponding bucket and all the buckets before it.

When working with OpenTelemetry Cumulative Histograms that have been converted to Wavefront counters, WQL functions such as cumulativePercentile() (used in the example query below) are useful.

Example

Suppose a cumulative histogram named "http.response_times" has the following buckets and values:

Bucket                Value
≤ 100ms               5
> 100ms to ≤ 200ms    20
> 200ms               100

The exporter sends the following metrics to Tanzu Observability:

Name                  Tags          Value
http.response_times   le="100"      5
http.response_times   le="200"      25
http.response_times   le="+Inf"     125

Example WQL Query on a Cumulative Histogram

Using the cumulative histogram from the section above, this WQL query will produce a graph showing the 95th percentile of http response times in the last 15 minutes.

cumulativePercentile(95, mavg(15m, deriv(sum(ts(http.response_times), le))))

The sum function aggregates the http response times and groups them by the le tag. Since http.response_times has three buckets, the sum() function will graph three lines, one for each bucket. deriv() shows the per second rate of change in the three lines from sum. The mavg function averages the rates of change of the three lines over the last 15 minutes. Since the rates of change are per second, if you multiply the average rate of change for a bucket by 900, you get the number of new http requests falling into that bucket in the last 15 minutes. Finally, cumulativePercentile uses the values of the le tags, which are http response times, and linear interpolation of the bucket counts to estimate the 95th percentile of http.response_times over the last 15 minutes.

Summary Conversion

A summary is converted to multiple gauge metrics: one gauge for every quantile in the summary. A special "quantile" tag contains a value between 0 and 1 indicating the quantile to which the value belongs.
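
For example (hypothetical metric name and values), a summary named request.duration reporting the 0.5 quantile as 0.25 and the 0.95 quantile as 0.8 would be exported as:

Name               Tags              Value
request.duration   quantile="0.5"    0.25
request.duration   quantile="0.95"   0.8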