Google Cloud Pubsub Exporter

Status
Stability: beta
Supported pipeline types: traces, logs, metrics
Distributions: contrib

⚠️ This is a community-provided module. It has been developed and extensively tested at Collibra, but it is not officially supported by GCP.

This exporter sends OTLP messages to a Google Cloud Pubsub topic.

The following configuration options are supported:

  • project (Optional): The Google Cloud project of the topic.
  • topic (Required): The name of the topic to publish OTLP data to. The topic name should be a fully qualified resource name (e.g. projects/otel-project/topics/otlp).
  • compression (Optional): The payload compression. Only gzip is supported. The default is no compression.
  • watermark: Controls how the ce-time attribute is set (see the Watermark section for more information).
    • behavior (Optional): current sets the ce-time attribute to the system clock; earliest sets the attribute to the smallest timestamp of all the messages.
    • allow_drift (Optional): The maximum difference allowed between the ce-time attribute and the system clock. When the drift is set to 0, the maximum drift from the clock is allowed (only applicable to earliest).
exporters:
  googlecloudpubsub:
    project: my-project
    topic: projects/my-project/topics/otlp-traces
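
The topic option expects a fully qualified resource name. As a rough illustration of what that means, the sketch below checks a name against the projects/<project>/topics/<topic> shape before it goes into a configuration file. The pattern and the parse_topic helper are assumptions made for this example; they are not the exporter's own validation rule.

```python
import re
from typing import Tuple

# Assumed shape of a fully qualified topic name: projects/<project>/topics/<topic>.
# This pattern is illustrative only, not the exporter's actual validation.
TOPIC_PATTERN = re.compile(r"^projects/(?P<project>[^/]+)/topics/(?P<topic>[^/]+)$")


def parse_topic(name: str) -> Tuple[str, str]:
    """Return (project, topic), or raise ValueError if the name is not fully qualified."""
    match = TOPIC_PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a fully qualified topic name: {name!r}")
    return match.group("project"), match.group("topic")
```

For example, parse_topic("projects/otel-project/topics/otlp") yields the project and topic parts, while a bare name such as "otlp" raises ValueError.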

Pubsub topic

The Google Cloud Pubsub exporter doesn't automatically create topics; it expects the topic to be created up front. For security, it's best to give the collector its own service account and grant that account the Pub/Sub Publisher permission on the topic.

Messages

The messages published to the topic are CloudEvents compliant and use the binary content mode defined in the Google Cloud Pub/Sub Protocol Binding for CloudEvents.

The data field is an ExportTraceServiceRequest, ExportMetricsServiceRequest or ExportLogsServiceRequest for traces, metrics or logs respectively. Each message is accompanied by the following attributes:

Attribute         Description
ce-specversion    Follows version 1.0 of the CloudEvents spec
ce-source         The source is this exporter: /opentelemetry/collector/googlecloudpubsub/<version>
ce-id             A random UUID that uniquely identifies the message
ce-time           A watermark indicating when the events encapsulated in the OTLP message were generated. The behavior depends on the watermark setting in the configuration
ce-type           Depending on the data: org.opentelemetry.otlp.traces.v1, org.opentelemetry.otlp.metrics.v1 or org.opentelemetry.otlp.logs.v1
content-type      The content type is application/protobuf
content-encoding  Indicates that the payload is compressed. Only gzip compression is supported
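
To make the attribute set concrete, here is a minimal sketch that assembles the attributes listed above for a single message. The build_attributes helper, the "v0.0.0" version string in the usage note, and the use of an ISO 8601 / RFC 3339 string for ce-time are assumptions made for illustration; the exporter's exact formatting may differ.

```python
import uuid
from datetime import datetime, timezone
from typing import Dict, Optional

# ce-type values from the attribute list above, keyed by signal.
CE_TYPES = {
    "traces": "org.opentelemetry.otlp.traces.v1",
    "metrics": "org.opentelemetry.otlp.metrics.v1",
    "logs": "org.opentelemetry.otlp.logs.v1",
}


def build_attributes(
    signal: str,
    version: str,
    watermark: datetime,
    compression: Optional[str] = None,
) -> Dict[str, str]:
    """Build the message attributes for one OTLP payload (illustrative only)."""
    attributes = {
        "ce-specversion": "1.0",
        "ce-source": f"/opentelemetry/collector/googlecloudpubsub/{version}",
        "ce-id": str(uuid.uuid4()),
        # Assumption: ce-time is rendered as a UTC timestamp string.
        "ce-time": watermark.astimezone(timezone.utc).isoformat(),
        "ce-type": CE_TYPES[signal],
        "content-type": "application/protobuf",
    }
    # content-encoding is only present when the payload is compressed.
    if compression == "gzip":
        attributes["content-encoding"] = "gzip"
    return attributes
```

Calling build_attributes("traces", "v0.0.0", datetime.now(timezone.utc), "gzip") would return all seven attributes, while omitting the compression argument leaves content-encoding out.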

Compression

By default, the messages are not compressed. Compressing the messages can reduce the cost of Pubsub to as little as 20% of the uncompressed cost. To enable compression, set compression to gzip.

exporters:
  googlecloudpubsub:
    project: my-project
    topic: projects/my-project/topics/otlp-traces
    compression: gzip

The exporter will add the content-encoding attribute to the message. The receiver looks at this attribute to detect the compression used on the payload.

Only gzip is supported.
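
On the consuming side, the content-encoding attribute tells a subscriber whether to decompress the payload. The following is a minimal sketch of that check using only the Python standard library. The decode_payload helper is hypothetical and is not part of any Pubsub client library; it returns the raw protobuf bytes, which would still need to be parsed as the request type indicated by ce-type.

```python
import gzip
from typing import Mapping


def decode_payload(data: bytes, attributes: Mapping[str, str]) -> bytes:
    """Return the uncompressed OTLP protobuf bytes of a received message."""
    encoding = attributes.get("content-encoding")
    if encoding is None:
        # No content-encoding attribute: the payload was sent uncompressed.
        return data
    if encoding == "gzip":
        return gzip.decompress(data)
    # Only gzip is supported, so anything else is treated as an error.
    raise ValueError(f"unsupported content-encoding: {encoding!r}")
```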

Watermark

A watermark is a threshold that indicates when a stream processing framework (like Apache Beam) expects all the data in a window to have arrived. If new data arrives with a timestamp that's in the window but older than the watermark, the data is considered late data. The watermark section changes how the ce-time attribute of the message is set. If you don't use such a framework, you can ignore this section, and ce-time will be set to the current time. For more reliable watermark behaviour in streaming pipelines, it's better to set the ce-time attribute to the earliest timestamp of the messages embedded in the Pubsub message.

Setting the behaviour to earliest makes the exporter scan all the embedded messages before sending the actual Pubsub message, to figure out what the earliest timestamp is. You have to set allow_drift, the allowed maximum for the ce-time timestamp, if you want the behaviour to have any effect, as the default is 0s.

exporters:
  googlecloudpubsub:
    project: my-project
    topic: projects/my-project/topics/otlp-traces
    watermark:
      behavior: earliest
      allow_drift: 1h
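
The interaction between behavior and allow_drift can be sketched as follows. This is one plausible reading, not the exporter's actual code: it assumes that allow_drift caps how far ce-time may lag behind the system clock, which is consistent with the note that earliest has no effect while allow_drift is left at 0s. The watermark function and its parameters are hypothetical names chosen for this example.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable


def watermark(
    timestamps: Iterable[datetime],
    now: datetime,
    behavior: str = "current",
    allow_drift: timedelta = timedelta(0),
) -> datetime:
    """Pick a ce-time value for a batch of embedded timestamps (illustrative)."""
    if behavior == "current":
        return now
    if behavior != "earliest":
        raise ValueError(f"unknown behavior: {behavior!r}")
    # Smallest embedded timestamp, never later than the system clock.
    earliest = min(min(timestamps, default=now), now)
    # Assumption: ce-time may lag the clock by at most allow_drift.
    return max(earliest, now - allow_drift)
```

Under this assumption, a batch whose oldest timestamp is two hours old gets a ce-time one hour behind the clock when allow_drift is 1h, and a ce-time equal to the clock when allow_drift is 0s.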

By default, the watermark is set to the current time of the processor. This timestamp will not differ much from the timestamp that is attached to a Pubsub message. Most users who only use Pubsub as a global distribution system will not need anything else.

If you use Google Cloud Dataflow and want to rely on its advanced streaming features, you may want to change the behavior of the watermark and de-duplication. You can leverage the unique ID (ce-id) and timestamp (ce-time) attributes on the message. In Apache Beam (the framework used by Dataflow), you can set the attribute names on the Pubsub connector via the .withTimestampAttribute("ce-time") and .withIdAttribute("ce-id") methods. A good setting for this scenario is behavior: earliest with a reasonable allow_drift, such as 1h.
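
For consumers that do not run on Beam, the same two attributes can be used by hand. The sketch below drops messages whose ce-id has already been seen and parses ce-time as the event time. It is illustrative only: unique_with_event_time is a hypothetical helper, the in-memory set grows without bound, and the parsing assumes an ISO 8601 / RFC 3339 timestamp with at most microsecond precision.

```python
from datetime import datetime
from typing import Iterable, Iterator, Mapping, Set, Tuple


def unique_with_event_time(
    messages: Iterable[Mapping[str, str]],
) -> Iterator[Tuple[datetime, Mapping[str, str]]]:
    """Yield (event_time, attributes) once per distinct ce-id."""
    seen: Set[str] = set()
    for attributes in messages:
        ce_id = attributes["ce-id"]
        if ce_id in seen:
            # Duplicate delivery of a message that was already handled.
            continue
        seen.add(ce_id)
        # Older Python versions do not accept a trailing "Z" in fromisoformat.
        event_time = datetime.fromisoformat(attributes["ce-time"].replace("Z", "+00:00"))
        yield event_time, attributes
```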

Allowed behavior values are current or earliest. For allow_drift the default is 0s, so make sure to set the value.