Status | |
---|---|
Stability | beta: logs, metrics, traces |
Distributions | contrib, observiq, redhat, splunk, sumo |
The Kubernetes attributes processor allows automatic setting of span, metric and log resource attributes with k8s metadata.
The processor automatically discovers k8s resources (pods), extracts metadata from them and adds the extracted metadata to the relevant spans, metrics and logs as resource attributes. The processor uses the kubernetes API to discover all pods running in a cluster and keeps a record of their IP addresses, pod UIDs and interesting metadata. The rules for associating the data passing through the processor (spans, metrics and logs) with specific pod metadata are configured via the `pod_association` key. It represents a list of associations that are tried in the specified order until the first one matches.
The processor stores the list of running pods and the associated metadata. When it sees a datapoint (log, trace or metric), it tries to associate the datapoint with the pod it originated from, so that the relevant pod metadata can be added to the datapoint. By default, it associates the incoming connection IP with the pod IP. For cases where this approach doesn't work (sending through a proxy, etc.), a custom association rule can be specified.
Each association is specified as a list of sources of associations. A source is a rule that matches metadata from the datapoint to pod metadata. In order to get an association applied, all the sources specified need to match.
Each source rule is specified as a pair of `from` (representing the rule type) and `name` (representing the attribute name if `from` is set to `resource_attribute`).
The following rule types are available:
- `from: "connection"`: takes the IP attribute from the connection context (if available).
- `from: "resource_attribute"`: specifies the attribute name to look up in the list of attributes of the received Resource. Semantic conventions should be used for naming.
Example pod association configuration:

    pod_association:
      # below association takes a look at the datapoint's k8s.pod.ip resource attribute and tries to match it with
      # the pod having the same attribute.
      - sources:
          - from: resource_attribute
            name: k8s.pod.ip
      # below association matches for pair `k8s.pod.name` and `k8s.namespace.name`
      - sources:
          - from: resource_attribute
            name: k8s.pod.name
          - from: resource_attribute
            name: k8s.namespace.name

If pod association rules are not configured, resources are associated with metadata only by the connection's IP address.
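As a minimal sketch, the default behaviour described above can also be written out explicitly. This assumes that a single `connection` source behaves the same as leaving `pod_association` unset, which follows from the text but is not stated as a guarantee.

```yaml
pod_association:
  # single rule: match the IP of the incoming connection against known pod IPs
  - sources:
      - from: connection
```

Writing the rule explicitly can make it easier to later add further associations ahead of it.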
Which metadata to collect is determined by the `metadata` configuration, which defines the list of resource attributes to be added. Items in the list are named exactly the same as the resource attributes that will be added.
The following attributes are added by default:
- k8s.namespace.name
- k8s.pod.name
- k8s.pod.uid
- k8s.pod.start_time
- k8s.deployment.name
- k8s.node.name
You can change this list with the `metadata` configuration.
Not all the attributes are guaranteed to be added. Only attribute names from `metadata` should be used for `pod_association`'s `resource_attribute`, because empty or non-existing values will be ignored.
Additional container-level attributes can be extracted, provided that certain resource attributes are present:
- If the `container.id` resource attribute is provided, the following additional attributes will be available:
  - k8s.container.name
  - container.image.name
  - container.image.tag
- If the `k8s.container.name` resource attribute is provided, the following additional attributes will be available:
  - container.image.name
  - container.image.tag
- If the `k8s.container.restart_count` resource attribute is provided, it can be used to associate with a particular container instance. If it's not set, the latest container instance will be used:
  - container.id (not added by default, has to be specified in `metadata`)
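
The following sketch shows how the container-level attributes might be requested through `metadata`. It assumes they are listed under `extract.metadata` by the same names as the resulting resource attributes, as described above for the default attributes; the top-level `processors` key is the usual collector config layout and is not shown elsewhere in this document.

```yaml
processors:
  k8sattributes:
    extract:
      metadata:
        - k8s.namespace.name
        - k8s.pod.name
        - k8s.pod.uid
        # container-level attributes: only populated when the incoming resource
        # already carries container.id or k8s.container.name
        - k8s.container.name
        - container.image.name
        - container.image.tag
        # not added by default, so it has to be listed explicitly
        - container.id
```

If the incoming telemetry carries neither `container.id` nor `k8s.container.name`, the container-level entries should simply not be added.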
The k8sattributesprocessor can also set resource attributes from k8s labels and annotations of pods and namespaces. Which pod or namespace annotations and labels are attached to the data passing through the processor (spans, metrics and logs) is configured via the `annotations` and `labels` keys. Each key holds a list of annotations or labels that are extracted from pods or namespaces and added to spans, metrics and logs. Each item is specified as a config of `tag_name` (the name of the tag to add), `key` (the key used to extract the value) and `from` (the kubernetes object to extract the value from). The `from` field has only two possible values, `pod` and `namespace`, and defaults to `pod` if none is specified.
A few examples of this config are as follows:

    annotations:
      - tag_name: a1 # extracts value of annotation from pods with key `annotation-one` and inserts it as a tag with key `a1`
        key: annotation-one
        from: pod
      - tag_name: a2 # extracts value of annotation from namespaces with key `annotation-two` with regexp and inserts it as a tag with key `a2`
        key: annotation-two
        regex: field=(?P<value>.+)
        from: namespace

    labels:
      - tag_name: l1 # extracts value of label from namespaces with key `label1` and inserts it as a tag with key `l1`
        key: label1
        from: namespace
      - tag_name: l2 # extracts value of label from pods with key `label2` with regexp and inserts it as a tag with key `l2`
        key: label2
        regex: field=(?P<value>.+)
        from: pod

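To make the extraction rules above concrete, here is a hypothetical Namespace manifest that they would apply to. The namespace name and the annotation and label values are made up for illustration.

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: shop                          # hypothetical namespace
  annotations:
    annotation-two: "field=checkout"  # matched by regex field=(?P<value>.+)
  labels:
    label1: payments                  # copied as-is, no regex configured
```

For telemetry from pods in this namespace, the rules above should add `a2` with the captured value `checkout` and `l1` with the value `payments`. The `a1` and `l2` rules read from the pod itself, so they would only apply if the pod carries `annotation-one` or `label2`.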
A complete processor configuration example:

    k8sattributes:
    k8sattributes/2:
      auth_type: "serviceAccount"
      passthrough: false
      filter:
        node_from_env_var: KUBE_NODE_NAME
      extract:
        metadata:
          - k8s.pod.name
          - k8s.pod.uid
          - k8s.deployment.name
          - k8s.namespace.name
          - k8s.node.name
          - k8s.pod.start_time
      pod_association:
        - sources:
            - from: resource_attribute
              name: k8s.pod.ip
        - sources:
            - from: resource_attribute
              name: k8s.pod.uid
        - sources:
            - from: connection

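A processor only takes effect once it is referenced from a pipeline. The sketch below shows one way the processor might be wired into a collector configuration. The OTLP receiver and exporter and the `backend.example.com:4317` endpoint are assumptions chosen for illustration and are not part of this document.

```yaml
receivers:
  otlp:
    protocols:
      grpc:

processors:
  k8sattributes:
    auth_type: "serviceAccount"

exporters:
  otlp:
    endpoint: backend.example.com:4317  # hypothetical backend endpoint

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [k8sattributes]
      exporters: [otlp]
    metrics:
      receivers: [otlp]
      processors: [k8sattributes]
      exporters: [otlp]
    logs:
      receivers: [otlp]
      processors: [k8sattributes]
      exporters: [otlp]
```

The same processor instance is listed in all three pipelines because it supports logs, metrics and traces.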
The k8sattributesprocessor needs `get`, `watch` and `list` permissions on both `pods` and `namespaces` resources, for all namespaces and pods included in the configured filters. Additionally, when using `k8s.deployment.uid` or `k8s.deployment.name`, the processor also needs `get`, `watch` and `list` permissions for `replicasets` resources.
Here is an example of a `ClusterRole` to give a `ServiceAccount` the necessary permissions for all pods and namespaces in the cluster (replace `<OTEL_COL_NAMESPACE>` with the namespace where the collector is deployed):

    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: collector
      namespace: <OTEL_COL_NAMESPACE>
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRole
    metadata:
      name: otel-collector
    rules:
    - apiGroups: [""]
      resources: ["pods", "namespaces"]
      verbs: ["get", "watch", "list"]
    - apiGroups: ["apps"]
      resources: ["replicasets"]
      verbs: ["get", "list", "watch"]
    - apiGroups: ["extensions"]
      resources: ["replicasets"]
      verbs: ["get", "list", "watch"]
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRoleBinding
    metadata:
      name: otel-collector
    subjects:
    - kind: ServiceAccount
      name: collector
      namespace: <OTEL_COL_NAMESPACE>
    roleRef:
      kind: ClusterRole
      name: otel-collector
      apiGroup: rbac.authorization.k8s.io

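The permissions above only apply if the collector pods actually run under the `collector` ServiceAccount. A minimal sketch of a workload that does so is shown below. The DaemonSet name, labels and container image are assumptions made for illustration.

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: otel-collector-agent            # hypothetical name
  namespace: <OTEL_COL_NAMESPACE>
spec:
  selector:
    matchLabels:
      app: otel-collector-agent
  template:
    metadata:
      labels:
        app: otel-collector-agent
    spec:
      # must match the ServiceAccount bound by the ClusterRoleBinding
      serviceAccountName: collector
      containers:
        - name: otel-collector
          image: otel/opentelemetry-collector-contrib  # assumed image; pin a version in practice
```

The same `serviceAccountName` setting would apply to a Deployment when the collector runs as a gateway.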
The processor can be used in collectors deployed either as an agent (Kubernetes DaemonSet) or as a gateway (Kubernetes Deployment).
When running as an agent, the processor detects the IP addresses of pods sending spans, metrics or logs to the agent and uses this information to extract metadata from pods. In this mode it is important to apply a discovery filter so that the processor only discovers pods from the same host that it is running on. Not using such a filter can result in unnecessary resource usage, especially on very large clusters. Once the filter is applied, each processor will only query the k8s API for pods running on its own node.
A node filter can be applied by setting the `filter.node` config option to the name of a k8s node. While this works as expected, in most cases it cannot be used to automatically filter pods by the node that the processor is running on, as it is not known beforehand which node a pod will be scheduled on. Luckily, kubernetes has a solution for this called the downward API. To automatically filter pods by the node the processor is running on, you'll need to complete the following steps:
1. Use the downward API to inject the node name as an environment variable. Add the following snippet under the pod env section of the OpenTelemetry container.

       spec:
         containers:
         - env:
           - name: KUBE_NODE_NAME
             valueFrom:
               fieldRef:
                 apiVersion: v1
                 fieldPath: spec.nodeName

   This will inject a new environment variable into the OpenTelemetry container with the value as the name of the node the pod was scheduled to run on.

2. Set `filter.node_from_env_var` to the name of the environment variable holding the node name.

       k8sattributes:
         filter:
           node_from_env_var: KUBE_NODE_NAME # this should be same as the var name used in previous step

This will restrict each OpenTelemetry agent to query only pods running on the same node, dramatically reducing resource requirements for very large clusters.
When running as a gateway, the processor cannot correctly detect the IP address of the pods generating the telemetry data when it receives the data from an agent instead of directly from the pods, unless one of the well-known IP attributes is present. To work around this issue, agents deployed with the k8sattributes processor can be configured to detect the IP addresses and forward them along with the telemetry data resources. The collector can then match this IP address with k8s pods and enrich the records with the metadata. To set this up, you'll need to complete the following steps:
1. Set up agents in passthrough mode. Configure the agents' k8sattributes processors to run in passthrough mode.

       # k8sattributes config for agent
       k8sattributes:
         passthrough: true

   This will ensure that the agents detect the IP address and add it as an attribute to all telemetry resources. Agents will not make any k8s API calls, do any discovery of pods or extract any metadata.

2. Configure the collector as usual. No special configuration changes are needed on the collector. It'll automatically detect the IP address of spans, logs and metrics sent by the agents as well as directly by other services/pods.
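
Although no special configuration is required on the gateway, the association order can be spelled out explicitly. The sketch below assumes that the agents forward the detected address in the `k8s.pod.ip` resource attribute; check the attribute name your agents actually emit before relying on it.

```yaml
# k8sattributes config for the gateway collector
k8sattributes:
  passthrough: false
  pod_association:
    # first try the pod IP forwarded by an agent running in passthrough mode
    - sources:
        - from: resource_attribute
          name: k8s.pod.ip
    # fall back to the connection IP for pods that send to the gateway directly
    - sources:
        - from: connection
```

Listing the resource attribute first matters here, because for data relayed by an agent the connection IP would be the agent's address rather than the originating pod's.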
There are some edge-cases and scenarios where k8sattributes will not work properly.
The processor cannot correctly identify pods running in host network mode, so enriching telemetry data generated by such pods is not supported at the moment unless the association rule is not based on an IP attribute.
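As a sketch of an association rule that does not depend on an IP attribute, the pod UID can be used instead. This assumes the telemetry already carries the `k8s.pod.uid` resource attribute, for example because the workload sets it itself.

```yaml
k8sattributes:
  pod_association:
    # match on the pod UID instead of any IP-based attribute
    - sources:
        - from: resource_attribute
          name: k8s.pod.uid
```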
The processor does not support detecting containers from the same pods when running as a sidecar. While this can be done, we think it is simpler to just use the kubernetes downward API to inject environment variables into the pods and directly use their values as tags.
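A minimal sketch of the downward API approach mentioned above is shown here. The container name and the `K8S_*` variable names are hypothetical, and passing the values on through `OTEL_RESOURCE_ATTRIBUTES` assumes the application uses an OpenTelemetry SDK that reads that variable.

```yaml
spec:
  containers:
    - name: app                        # hypothetical application container
      env:
        - name: K8S_POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: K8S_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        - name: K8S_POD_UID
          valueFrom:
            fieldRef:
              fieldPath: metadata.uid
        # $(VAR) references are expanded from the variables defined earlier in this list
        - name: OTEL_RESOURCE_ATTRIBUTES
          value: k8s.pod.name=$(K8S_POD_NAME),k8s.namespace.name=$(K8S_NAMESPACE),k8s.pod.uid=$(K8S_POD_UID)
```

With this in place the telemetry arrives already tagged, so the processor does not need to identify the originating pod at all.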