Configuration capabilities to retry for loading config via URL #8854

sjwang90 · 2021-02-12T23:45:43Z

Feature Request

Related: #7338

Proposal:

Users should be able to designate the interval and number of retries for loading their config from a URL when the endpoint is down.

Current behavior:

Right now, Telegraf retries three times at 10s intervals when it receives an error while loading config from a URL because the remote endpoint is down. The current solution offers no environment variables or flags to change these settings (based on #8803).

Desired behavior:

Users need some way to configure the retry interval and the number of retries that govern loading the config from a URL.

Use case:

From @schmorgs:
Planning to use Telegraf in production across a large number of servers across the globe, and there are many points where breakages could happen, especially in countries where there is very low bandwidth and old infrastructure. Along with that comes many standards and versions of OS, etc, hence our approach to manage config centrally so that we don't have to navigate the variety of ways of reaching an endpoint.

So if Telegraf starts up and there happened to be a breakage somewhere (NW connectivity, Web Server down, etc), the agent will die. On RHEL7/8 and Windows, we can utilise systemd/SCM to configure infinite retries on the agent so that even if it does die, it will be restarted.
But RHEL6 doesn't have systemd and so we would end up writing some sort of watcher daemon as well which seems a bit overkill if the agent could handle (at least) this condition.

This matters because Telegraf will be our primary monitoring agent, so we want to make it as available and robust as possible. We would still implement external controls such as systemd restarts to provide an extra layer of resilience, but the more the agent can do in this area, the better.

In some cases, the window in which the agent could fail to get its config would be fairly small, as the agent only pulls config on startup. But we want the agent to periodically pull its config down so that it can be configured centrally and pick up changes automatically. I understand this is part of a longer-term strategy for Telegraf, but in the meantime we HUP the agent periodically as a workaround, so the agent now has a constant reliance on the HTTP endpoint and is therefore more likely to encounter a problem.

Whether a switch, environment variable, config file on the server, etc, I'm happy to see whichever approach works best.

powersj · 2022-03-31T20:20:45Z

next step: investigate design and implications

nkcfan · 2024-05-05T05:59:25Z

This is quite a normal requirement when you consider a power outage at home. The modem and router need time to connect to the Internet, while the telegraf service with a URL config quickly tries several times and then fails completely.

This introduces a new cli option to allow the user to set the number of retry attempts to something other than 3. It also allows the user to set the attempt count to -1 to infinitely retry. fixes: influxdata#8854
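The "-1 means retry forever" convention described above can be sketched in Go as follows. This is an illustrative assumption about how such a limit might be checked, not the code from the linked change; `shouldRetry` and its parameters are hypothetical names.

```go
package main

import "fmt"

// shouldRetry reports whether another attempt is allowed after `done`
// attempts have already failed. A negative limit means "retry forever".
// Hypothetical helper for illustration only.
func shouldRetry(limit, done int) bool {
	if limit < 0 {
		return true
	}
	return done < limit
}

func main() {
	fmt.Println(shouldRetry(3, 2))     // one attempt left out of three
	fmt.Println(shouldRetry(3, 3))     // limit reached
	fmt.Println(shouldRetry(-1, 1000)) // infinite retries
}
```

An infinite setting suits the power-outage case in the comment above, where the network may take an unknown amount of time to come back.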

sjwang90 added the feature request and area/configuration labels Feb 12, 2021

sjwang90 mentioned this issue Feb 12, 2021

Agent won't start if remote HTTP config endpoint is down #7338

Closed

sjwang90 mentioned this issue Mar 2, 2021

feature-7338 Retry failed remote HTTP Config on startup #7349

Closed

powersj mentioned this issue Jun 16, 2022

Rewrite the Telegraf CLI #11316

Closed

powersj mentioned this issue Nov 9, 2022

Git repository as config directory #12209

Closed

powersj mentioned this issue Feb 15, 2024

Cache remote config localy and use it if remote server not working #5501

Closed

powersj self-assigned this Apr 29, 2024

powersj mentioned this issue May 17, 2024

feat(agent): Introduce CLI option to set config URL retry attempts #15377

Merged

srebhan closed this as completed in #15377 May 21, 2024
