Don't execute otel collector if configuration is "noop" #33680

cforce · 2024-06-20T18:27:56Z

Component(s)

cmd/opampsupervisor

Is your feature request related to a problem? Please describe.

In large fleets, the default state is often "wait and listen for commands", with no telemetry being sent. To reduce the overall runtime footprint in that state, the Supervisor should not execute the Collector.

Describe the solution you'd like

Imagine a scenario where the supervisor is installed as a basic part of a host (container or device), broadcasting DNS and searching for an OpAMP Backend until connected.
There is no local "non-default" config for the collector setup, just the default "noop" config, which would not send any telemetry other than the health of the collector.

The supervisor is just waiting to get connected to the OpAMP Backend, and afterwards waiting for a remote configuration update for the collector.

To reduce overhead until the collector receives a "job", the supervisor shall not execute the collector at all. Only once a config is sent that overrides the noop default shall execution (as a daemon) be started.

Describe alternatives you've considered

No response

Additional context

No response

github-actions · 2024-06-20T18:28:11Z

Pinging code owners:

cmd/opampsupervisor: @evan-bradley @atoulme @tigrannajaryan

See Adding Labels via Comments if you do not have permissions to add labels yourself.

BinaryFissionGames · 2024-06-20T18:46:23Z

This would be a nice improvement in more resource-restricted environments.

I also think, beyond this being an initial state, it would be nice if the opamp server could also send an empty config (e.g. an empty configmap) to stop running the collector until it gets another config.
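The two cases raised so far (no remote config received yet, and an explicitly empty config sent by the server) can be pictured as a single predicate over the received config map. The sketch below is only an illustration under assumptions: it models the OpAMP config map as a plain `dict` from file name to body bytes, and `should_run_collector` is a hypothetical helper, not a function that exists in the Supervisor.

```python
from typing import Mapping, Optional


def should_run_collector(config_map: Optional[Mapping[str, bytes]]) -> bool:
    """Return True only if at least one non-blank config body was supplied.

    Hypothetical helper: `config_map` stands in for the remote config map,
    modelled here as {file name: body bytes}.
    """
    if config_map is None:
        # Nothing received yet: stay idle and keep listening.
        return False
    # An empty map, or a map whose bodies are all blank, means "do not run".
    return any(body.strip() for body in config_map.values())
```

With this shape, the initial "waiting" state and a later "stop" request from the server are handled by the same check.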

cforce · 2024-06-30T06:43:22Z

Currently, bootstrapping operates as described below (excerpt from the documentation):

Bootstrapping

To obtain the remote configuration from the OpAMP Backend, the Supervisor must send an AgentDescription to the Backend. Initially, the Supervisor doesn't have this information because the AgentDescription becomes available only after the Collector process is started and the AgentDescription is sent from the opamp extension to the Supervisor. However, it's impossible to start the Collector without a configuration.

To address this issue, the Supervisor starts the Collector with a "noop" configuration that doesn't collect any data but allows the opamp extension to start. The "noop" configuration consists of a single pipeline with an OTLP receiver listening on a random port, a debug exporter, and the opamp extension. The purpose of this "noop" configuration is to ensure that the Collector starts and the opamp extension communicates with the Supervisor.

Once the initial Collector launch is successful and the Supervisor receives the remote configuration, the Supervisor restarts the Collector with the new configuration. The new configuration is also cached by the Supervisor in a local file. This caching means subsequent restarts no longer need to use the "noop" configuration. It also allows the Supervisor to start the Collector without waiting for the OpAMP Backend to provide the remote configuration, mitigating any OpAMP Backend unavailability.
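As a rough illustration of the excerpt above, the sketch below builds a "noop" configuration and picks between it and a cached remote configuration at startup. It is written under assumptions: the function names and the cache file are hypothetical, the pipeline is assumed to be a traces pipeline, the field names approximate the Collector's configuration format, and the cache is stored as JSON (a subset of YAML) so that only the standard library is needed.

```python
import json
import socket
from pathlib import Path
from typing import Optional, Tuple


def pick_free_port() -> int:
    """Ask the OS for an unused TCP port on localhost (the "random port")."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind(("127.0.0.1", 0))
        return sock.getsockname()[1]


def noop_config(otlp_port: int, supervisor_port: int) -> dict:
    """One pipeline: OTLP receiver -> debug exporter, plus the opamp extension."""
    return {
        "receivers": {
            "otlp": {"protocols": {"http": {"endpoint": f"localhost:{otlp_port}"}}}
        },
        "exporters": {"debug": {}},
        "extensions": {
            # Assumed shape: the extension talks to the Supervisor locally.
            "opamp": {
                "server": {
                    "ws": {"endpoint": f"ws://127.0.0.1:{supervisor_port}/v1/opamp"}
                }
            }
        },
        "service": {
            "extensions": ["opamp"],
            "pipelines": {
                "traces": {"receivers": ["otlp"], "exporters": ["debug"]}
            },
        },
    }


def read_cached_config(cache_file: Path) -> Optional[dict]:
    """Return the cached remote config, or None if nothing was cached yet."""
    if not cache_file.is_file():
        return None
    return json.loads(cache_file.read_text())


def startup_config(
    cache_file: Path, otlp_port: int, supervisor_port: int
) -> Tuple[str, dict]:
    """Prefer the cached remote config; fall back to noop on first start."""
    cached = read_cached_config(cache_file)
    if cached is not None:
        return "cached", cached
    return "noop", noop_config(otlp_port, supervisor_port)
```

A first start would call `startup_config(cache_file, pick_free_port(), supervisor_port)` and get the "noop" branch; once a remote configuration has been written to the cache file, later starts take the "cached" branch without waiting for the Backend.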

I don't understand why the AgentDescription needs to be managed specifically by the Collector, requiring the Collectors to start in order to connect the Supervisor to an OpAMP Backend. This dependency seems to have only disadvantages, especially if the Supervisor needs to manage multiple Collectors. This will be a particular limitation if the Supervisor has to manage many Collectors simultaneously (see issue #33682).

Why is the Collector considered the agent in a setup where the Supervisor is used? It would make more sense for the Supervisor to be the OpAMP agent, with any connected Collector being transparent to the OpAMP Backend. Collectors should represent "any" host system as part of a subsystem registered via the Supervisor. In IoT systems, the Supervisor would act as a hub, serving as a gateway to the OpAMP Backend for all connected Collectors that cannot connect to it directly for various reasons.

tigrannajaryan · 2024-07-02T21:05:28Z

> Why is the Collector considered the agent in a setup where the Supervisor is used? It would make more sense for the Supervisor to be the agent of OpAMP, with any connected Collector being transparent to the OpAMP Backend. Collectors should represent "any" host system as part of a subsystem registered via the Supervisor. In IoT systems, the Supervisor would act as a hub, which serves as a gateway for all connected Collectors to connect to an OpAMP Backend that they cannot connect to directly for various reasons.

The assumption is that users want to manage their Collector, not their Supervisor. The Supervisor is just the means to do it. The OpAMP server needs to know what Collector it is managing so that, for example, it can supply the right configuration. Knowing what Collector it is requires receiving an AgentDescription that correctly describes the Collector (e.g. the Collector's version number). The Supervisor does not have this knowledge and uses the bootstrapping process to get that information from the Collector.

cforce · 2024-07-03T05:57:28Z

The supervisor must always be aware of the presence of a collector. However, certain registration details that are set initially and then persisted, such as the agent ID, should not change, because the supervisor manages the collector.

Bootstrapping can be done without any prerequisites besides the supervisor. This means the collector is downloaded and installed the first time, and the supervisor has information about its capabilities (processors, extensions, receivers, exporters, etc.) through descriptive metadata (e.g., ocb build.yaml). Thus, the supervisor doesn't need to execute the collector to understand its characteristics.

Alternatively, if the collector is already installed (managed by a third-party update) and the "opamp update feature" is off, descriptive metadata is used; it is available without execution but requires maintenance or a persisted state file. This metadata is established after the initial setup, similar to the agent ID. Somebody could even create the file by hand and thereby skip generating it by running the collector at least once to describe itself. Also, the metadata could be delivered together with the executable download by the OpAMP backend.

The supervisor should cache this metadata for each agent. Permanent execution of collectors is not mandatory; instead, the supervisor initializes essential groundwork and can start the collector when necessary, optionally based on configuration changes (e.g., when cfg!=noop).
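The caching idea in this comment could be sketched roughly as follows. Everything here is hypothetical and describes the proposal, not the current Supervisor: `MetadataCache`, `preload`, and the `describe_by_running` callback are made-up names, and the metadata is modelled as a flat string dictionary.

```python
from typing import Callable, Dict

AgentMetadata = Dict[str, str]


class MetadataCache:
    """Remembers descriptive metadata per agent (hypothetical sketch).

    The Collector is executed to describe itself only when no metadata
    is already known for that agent.
    """

    def __init__(self, describe_by_running: Callable[[str], AgentMetadata]) -> None:
        self._describe_by_running = describe_by_running
        self._known: Dict[str, AgentMetadata] = {}

    def preload(self, agent_id: str, metadata: AgentMetadata) -> None:
        """Register metadata that came from a file or with the download."""
        self._known[agent_id] = dict(metadata)

    def get(self, agent_id: str) -> AgentMetadata:
        """Return cached metadata, running the Collector once if needed."""
        if agent_id not in self._known:
            self._known[agent_id] = self._describe_by_running(agent_id)
        return self._known[agent_id]
```

An agent whose metadata was preloaded never triggers the callback; any other agent triggers it once, after which the result is reused.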

This approach also supports future scenarios where one supervisor may manage multiple collectors, e.g. the new profiling eBPF client donated by Elastic.

tigrannajaryan · 2024-07-03T15:03:18Z

The implementation of the Supervisor currently follows this design.

What you are describing appears to be a different design. If you would like to propose an alternate design please post a complete design document so that it can be considered by Supervisor maintainers. (Please note: I do not know if the alternate design will be considered and whether it will be accepted, it may be worth attending a Collector SIG to gauge the interest first).

cforce · 2024-07-04T08:16:00Z

The idea about "design changes" only came up because I have no idea how else to implement "do not run the collector until a config change arrives which is != noop". Do you?

BinaryFissionGames · 2024-07-17T13:31:48Z

> The idea about "design changes" only came up because I have no idea how else to implement "do not run the collector until a config change arrives which is != noop". Do you?

I think the idea is we keep the bootstrapping logic to get the agent description (this is a very quick, less than a second run of the collector on startup of the supervisor), then we would simply not start the long-running collector process if we don't have a config.
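The flow suggested here can be sketched as a small state holder. This is an assumption-laden illustration, not Supervisor code: `SupervisorSketch` and its methods are hypothetical, the bootstrap run is simulated by log entries, and the agent description is a placeholder dictionary.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SupervisorSketch:
    """Hypothetical model: bootstrap briefly, then run only with a config."""

    log: List[str] = field(default_factory=list)
    agent_description: Optional[Dict[str, str]] = None
    collector_running: bool = False

    def start(self, cached_config: Optional[str] = None) -> None:
        # Short-lived run that only exists to obtain the agent description.
        self.log.append("bootstrap: run collector with noop config")
        self.agent_description = {"service.name": "placeholder"}
        self.log.append("bootstrap: stop collector")
        if cached_config and cached_config.strip():
            self._run_collector(cached_config)
        else:
            # The proposed change: no config means no long-running process.
            self.log.append("idle: no config, collector not started")

    def on_remote_config(self, config: str) -> None:
        """Start the long-running Collector once a real config arrives."""
        if config.strip():
            self._run_collector(config)

    def _run_collector(self, config: str) -> None:
        self.collector_running = True
        self.log.append("run: long-running collector started")
```

Calling `start()` without a cached config leaves `collector_running` as `False` while still filling in `agent_description`; a later `on_remote_config(...)` with a non-blank body flips it to `True`.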

Does that make sense?

cforce · 2024-07-31T13:40:35Z

According to @evan-bradley (#32959 (comment)):
"The Supervisor will only restart the Collector when it receives new configuration from the OpAMP server; changes to files on disk will not restart the Collector."
Restarts are handled in the stateful "bootstrapped" state.

Bootstrapping:

The Supervisor will start the Collector when there is no persisted agent ID.
The Supervisor will start the Collector when there is a persisted agent ID, even if there might be no subscription for a config change (similar to a restart, but with different behaviour).
The Supervisor will start the Collector when it receives a != nop configuration.
-> This won't work without an agent ID; bootstrapping means initial ID creation.
-> Agent ID bootstrapping seems to require connecting to the OpAMP server in real time and subscribing to config changes for at least a second. If this is not successful, the reconnect will fail because the supervisor does not retry connecting to the OpAMP server forever (#33408).
-> Only the collector is currently capable of creating the UUID used to register at the OpAMP backend through the supervisor. Remark: if the supervisor were able to do that itself, it would not be necessary to start the collector at all (once per lifetime, to create this ID).
-> What happens if the OpAMP backend is not reachable during bootstrapping? There is a bootstrap runtime dependency on being able to bootstrap through the supervisor relayed to the OpAMP backend.
The Supervisor will not stop the Collector when it receives a nop configuration. -> This means two processes need to run continuously even if there is no need (nop config).

related #32554

BinaryFissionGames · 2024-07-31T14:25:10Z

Bootstrapping does not require any connection to an outside OpAMP server. It connects to an OpAMP server that is internal to the supervisor; the communication during bootstrapping is only between the collector and the supervisor.

Bootstrapping is also not meant to generate an agent ID (the supervisor actually generates the UUID), but rather to obtain the AgentDescription message, which contains metadata about the agent (e.g. the "name" of the agent, the version of the agent) that the supervisor doesn't necessarily know without somehow executing the collector.

Bootstrapping is only concerned with getting this AgentDescription message, so once the message is received, the supervisor can (and currently does) stop the collector.

Edit to add:
Bootstrapping like this is useful because it allows the collector, which will be easily updatable through remote updates, to control the AgentDescription message. That means if a useful piece of metadata is added to the AgentDescription later, it won't require re-installing a new supervisor everywhere, just pushing a remote update to the collector.
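To make the split of responsibilities concrete: the ID comes from the supervisor, while the description comes from the collector. The sketch below only covers the ID side and rests on assumptions: the function name and state file are hypothetical, and a random version 4 UUID is used purely for illustration, which may differ from the UUID flavour the Supervisor actually generates.

```python
import uuid
from pathlib import Path


def load_or_create_instance_uid(state_file: Path) -> str:
    """Return the persisted instance UID, creating it on first use.

    Hypothetical helper: no Collector run is needed for this step.
    """
    if state_file.is_file():
        return state_file.read_text().strip()
    uid = str(uuid.uuid4())  # Assumption: UUID flavour chosen for the sketch.
    state_file.write_text(uid)
    return uid
```

Repeated calls with the same state file return the same value, so the ID stays stable across supervisor restarts.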

cforce · 2024-07-31T22:00:25Z

Thanks for the clarification.

cforce added the enhancement (New feature or request) and needs triage (New item requiring triage) labels Jun 20, 2024

github-actions bot added the cmd/opampsupervisor label Jun 20, 2024

cforce mentioned this issue Jul 2, 2024

[cmd/opampsupervisor] Supervisor fails healthcheck with bootstrap config #31897

Closed
