[k8sclusterreceiver] k8s.node.condition metric not aggregatable in current form #33760

Open
sirianni opened this issue Jun 25, 2024 · 3 comments
Labels
bug, needs triage, receiver/k8scluster

Comments

@sirianni
Contributor

sirianni commented Jun 25, 2024

Component(s)

receiver/k8scluster

What happened?

Description

The use of -1 for ConditionUnknown greatly hinders usability of the k8s.node.condition metric.

For example, it's not possible to get a simple count of ready nodes in a k8s cluster, since each -1 subtracts from the sum. Such a count would be useful for writing an alert that compares k8s.daemonset.ready_nodes to sum(k8s.node.condition{condition="ready"}).
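To make the arithmetic concrete, here is a minimal sketch of what a backend computes when it sums the metric across nodes. The node names and values are hypothetical; the 1 / 0 / -1 encoding is the one described in this issue.

```python
# Hypothetical per-node values of k8s.node.condition{condition="ready"}.
# Encoding as described in this issue: 1 = true, 0 = false, -1 = unknown.
ready_condition = {
    "node-a": 1,
    "node-b": 1,
    "node-c": 1,
    "node-d": -1,  # status unknown
    "node-e": 0,   # not ready
}

# What sum(k8s.node.condition{condition="ready"}) evaluates to.
naive_sum = sum(ready_condition.values())

# The number an alert actually needs: nodes whose condition is true.
actual_ready = sum(1 for v in ready_condition.values() if v == 1)

print(naive_sum)     # 2
print(actual_ready)  # 3
```

The single unknown node cancels out one ready node, so comparing the sum against k8s.daemonset.ready_nodes would report a mismatch that does not exist.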

This is another example of the Splunk team continuing to push the antipattern of using the metric value to encode enumerations. While this may be usable in the Splunk backend, it simply doesn't work well in most other metric systems (Datadog, New Relic, Prometheus, etc.).

This metric should instead be modeled like the kube_node_status_condition metric from kube-state-metrics, which includes status as an attribute, following the OpenMetrics StateSet pattern. This allows queries of the form:

sum by(condition) (kube_node_status_condition{condition="ready", status="true"})
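As a rough illustration of the StateSet idea (not the receiver's actual implementation), each node condition can be expanded into one 0/1 data point per possible status, so that a plain sum per status yields a node count. The helper `to_stateset` and the attribute names below are assumptions made for this sketch.

```python
from collections import Counter

# Possible values of a Kubernetes node condition status.
STATUSES = ("true", "false", "unknown")


def to_stateset(node, condition, status):
    """Expand one node condition into one 0/1 point per possible status."""
    return [
        ({"k8s.node.name": node, "condition": condition, "status": s},
         1 if s == status else 0)
        for s in STATUSES
    ]


# Hypothetical observed "ready" status per node.
observed = {"node-a": "true", "node-b": "true", "node-c": "unknown", "node-d": "false"}

points = [
    point
    for node, status in observed.items()
    for point in to_stateset(node, "ready", status)
]

# Rough equivalent of: sum by (status) (metric{condition="ready"})
totals = Counter()
for attrs, value in points:
    totals[attrs["status"]] += value

print(dict(totals))
```

Because every value is 0 or 1, the sum for status="true" is the number of ready nodes, and unknown nodes are counted in their own series instead of subtracting from it.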

Collector version

v0.103.0

Environment information

No response

OpenTelemetry Collector configuration

No response

Log output

No response

Additional context

No response

@sirianni added the bug and needs triage labels on Jun 25, 2024

@TylerHelmuth
Member

For example, it's not possible to get a simple count of ready nodes in a k8s cluster

Isn't it possible if you filter by the condition attribute or by value == 1? I agree that the -1 makes aggregation across all dimensions inaccurate.

@sirianni
Contributor Author

sirianni commented Jul 1, 2024

For example, it's not possible to get a simple count of ready nodes in a k8s cluster

Isn't it possible if you filter by the condition attribute or by value == 1?

It's not possible to "filter by value" in many systems (e.g. Datadog). You can only aggregate values (sum, min, max, avg).
I suspect one reason is that values can be pre-aggregated over time and space, so you lose the ability to filter at the native ingestion granularity. For example, if I pre-aggregate away the k8s.node.name label, then what happens? What if I'm viewing this metric at hourly granularity and the query is therefore served from a pre-aggregated rollup table?
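The pre-aggregation concern can be sketched as follows, assuming a rollup that drops the k8s.node.name label and stores only the sum. The cluster contents and function names are hypothetical.

```python
# Value encoding described in this issue.
ENCODING = {"true": 1, "false": 0, "unknown": -1}
STATUSES = ("true", "false", "unknown")


def rollup_value_encoded(statuses):
    """Pre-aggregate away the node label: only the summed value survives."""
    return sum(ENCODING[s] for s in statuses)


def rollup_stateset(statuses):
    """One series per status; each rolled-up value is a node count."""
    return {s: sum(1 for x in statuses if x == s) for s in STATUSES}


# Two hypothetical clusters with different numbers of ready nodes.
cluster_a = ["true", "true", "unknown"]   # 2 ready
cluster_b = ["true", "false", "false"]    # 1 ready

# With the value encoding, both rollups collapse to the same number.
print(rollup_value_encoded(cluster_a), rollup_value_encoded(cluster_b))

# With a status attribute, the ready count survives the rollup.
print(rollup_stateset(cluster_a)["true"], rollup_stateset(cluster_b)["true"])
```

Once the per-node values are summed, there is nothing left to filter on: the two clusters are indistinguishable in the value-encoded rollup, whereas the per-status series still report 2 and 1 ready nodes.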

Filtering by the condition attribute doesn't work because the same condition: ready applies to both true (1) and unknown (-1).
