Failover Connector

Status
Distributions: contrib
Code Owners: @akats7, @djaglowski, @fatsheep9146

Supported Pipeline Types

Exporter Pipeline Type | Receiver Pipeline Type | Stability Level
traces | traces | alpha
metrics | metrics | alpha
logs | logs | alpha

The failover connector provides health-based routing between trace, metric, and log pipelines: data is routed according to the health of the target downstream exporters.

Configuration

If you are not already familiar with connectors, you may find it helpful to first visit the Connectors README.

The following settings are available:

  • priority_levels (required): ordered list of priority levels, from level 1 (highest priority) to level n (lowest). Each level is itself a list of pipelines, so multiple pipelines can sit at a single priority level.
  • retry_interval (optional): how often the connector attempts to reestablish a connection with all higher priority levels. Default value is 10 minutes. (See the Example below for further explanation.)
  • retry_gap (optional): the amount of time between trying two separate priority levels within a single retry_interval timeframe. Default value is 30 seconds. (See the Example below for further explanation.)
  • max_retries (optional): the maximum number of retries per level. Default value is 10. Set to 0 to allow unlimited retries.

The connector takes a list of priority_levels, each of which can contain multiple pipelines. If any pipeline at the current stable level fails, that level is considered unhealthy, and the connector moves down one priority level and routes all data to the new level (assuming it is stable).
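The routing rule above can be modeled in a few lines of Python. This is only an illustrative sketch, not the connector's actual Go implementation: `pick_level` and the `healthy` mapping are hypothetical names introduced here. It also assumes that when the next level down is unhealthy as well, the connector keeps moving down until it finds a healthy one.

```python
from typing import Optional, Sequence


def pick_level(
    priority_levels: Sequence[Sequence[str]],
    healthy: dict[str, bool],
    start: int = 0,
) -> Optional[int]:
    """Return the index of the first level at or below `start` whose
    pipelines are all healthy, or None if no such level exists.

    Hypothetical helper: names and signature are assumptions, not connector API.
    """
    for index in range(start, len(priority_levels)):
        # A level is usable only if every pipeline in it is healthy.
        if all(healthy.get(name, False) for name in priority_levels[index]):
            return index
    return None


levels = [["traces/first", "traces/also_first"], ["traces/second"], ["traces/third"]]
health = {
    "traces/first": True,
    "traces/also_first": False,  # one failing pipeline marks the whole level unhealthy
    "traces/second": True,
    "traces/third": True,
}
active = pick_level(levels, health)  # index 1, i.e. priority level 2
```

Here the first level is skipped even though traces/first is healthy, because a single failing pipeline makes the whole level unhealthy.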

The connector periodically tries to reestablish a stable connection with the higher priority levels. retry_interval is how often the connector iterates through all unhealthy higher priority levels, while retry_gap is how long it waits after a failed retry at one level before retrying the next level. For example, if retry_gap is 2m, then after a failed attempt to reestablish level 1 it waits 2m before trying level 2. The connector retries at most one unhealthy level at a time before returning to the current stable level. The max_retries setting tracks how many retries have occurred at each level; once the maximum is hit, the connector no longer retries that priority level.
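The max_retries bookkeeping described above might look roughly like the following. This is a minimal sketch under stated assumptions: `RetryTracker`, `can_retry`, and `record_retry` are hypothetical names, and the real connector may count retries differently.

```python
from dataclasses import dataclass, field


@dataclass
class RetryTracker:
    """Per-level retry counter (hypothetical; not the connector's API)."""

    max_retries: int = 10  # 0 means unlimited retries, per the settings list
    counts: dict[int, int] = field(default_factory=dict)

    def can_retry(self, level: int) -> bool:
        # With max_retries == 0 a level is never excluded from retries.
        if self.max_retries == 0:
            return True
        return self.counts.get(level, 0) < self.max_retries

    def record_retry(self, level: int) -> None:
        # Count one retry attempt against the given priority level.
        self.counts[level] = self.counts.get(level, 0) + 1


tracker = RetryTracker(max_retries=2)
tracker.record_retry(1)
tracker.record_retry(1)
level_1_allowed = tracker.can_retry(1)  # False: level 1 has used both retries
level_2_allowed = tracker.can_retry(2)  # True: level 2 has not been retried yet
```

Keeping the counter per level means that exhausting the retries for one level does not stop the connector from retrying the others.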

Configuration Example:

connectors:
  failover:
    priority_levels:
      - [traces/first, traces/also_first]
      - [traces/second]
      - [traces/third]
    retry_interval: 5m
    retry_gap: 1m
    max_retries: 10

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [failover]
    traces/first:
      receivers: [failover]
      exporters: [otlp/first]
    traces/second:
      receivers: [failover]
      exporters: [otlp/second]
    traces/third:
      receivers: [failover]
      exporters: [otlp/third]
    traces/also_first:
      receivers: [failover]
      exporters: [otlp/fourth]

Example with Explanation:

connectors:
  failover:
    priority_levels:
      - [traces/first]
      - [traces/second]
      - [traces/third]
      - [traces/fourth]
    retry_interval: 5m
    retry_gap: 1m
    max_retries: 10

Assume the current stable level is level 4 (traces/fourth) in the priority_levels list. At the start of the retry_interval, the connector tries to reestablish the pipeline on level 1 (traces/first). If that fails, the connector returns to level 4 (traces/fourth) and waits 1m, the retry_gap. When that 1m has passed, it retries level 2 (traces/second); if that fails, it again returns to level 4 and waits another 1m before trying level 3 (traces/third). If level 3 also fails, it returns to level 4 and waits for the 5m retry_interval before repeating the process. If a retry succeeds, the retried level becomes the stable level, and the connector continues to retry any higher priority levels that have not exceeded max_retries.
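To make the timing concrete, the sketch below computes when each higher priority level would be tried within a single retry pass. It assumes that every higher level is unhealthy and still under max_retries, that every retry fails, and that a retry attempt itself takes negligible time. `retry_pass_offsets` is a hypothetical helper, not part of the connector.

```python
from datetime import timedelta


def retry_pass_offsets(stable_level: int, retry_gap: timedelta) -> list[tuple[int, timedelta]]:
    """Return (level, offset) pairs for one retry pass.

    `offset` is measured from the start of the pass; levels are 1-based and
    only levels above the current stable level are retried.
    """
    return [(level, retry_gap * (level - 1)) for level in range(1, stable_level)]


# Values from the example: stable level 4, retry_gap of 1m.
schedule = retry_pass_offsets(stable_level=4, retry_gap=timedelta(minutes=1))
for level, offset in schedule:
    print(f"level {level} retried at +{offset}")
# level 1 retried at +0:00:00
# level 2 retried at +0:01:00
# level 3 retried at +0:02:00
```

After the last failed attempt, the connector stays on level 4 until the next pass, which is governed by retry_interval (5m in this configuration).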