kubernetes = null #1691

Closed
mflococo opened this issue Oct 29, 2019 · 22 comments

mflococo commented Oct 29, 2019

Bug Report

Describe the bug

Intermittently, across multiple instances of our AWS EKS clusters, logs arrive with the Kubernetes enrichment and annotations missing; the entire kubernetes key is null.

We have tried restarting the fluent-bit pods, deleting the DaemonSet, and creating new clusters. Sometimes it works (logs arrive with the Kubernetes enrichment and annotations) and other times it does not.

To Reproduce

  • Example log message if applicable:
   @i: 33b65867
   @m: Waiting 1499 seconds to renew Vault token.
   @t: 2019-10-29T14:25:31.9918803Z
   delaySeconds: 1499
   environment: Union
   kubernetes: null
   stream: stdout
   time: 2019-10-29T14:25:31.992070193Z
   version: 0.2.2215.24190
  • Steps to reproduce the problem:

We haven't been able to pinpoint the issue or reproduce it consistently.

Expected behavior

When we define the input, filter, and parser to extract Kubernetes metadata and annotations from our logs and the API, the metadata is added to every log record.

Screenshots

Your Environment

spec:
      priorityClassName: system-cluster-critical
      serviceAccountName: fluent-bit
      containers:
      - name: aws-for-fluent-bit
        image: amazon/aws-for-fluent-bit:1.3.2
        volumeMounts:
        - name: varlog
          mountPath: /var/log
        - name: varlibdockercontainers
          mountPath: /var/lib/docker/containers
          readOnly: true
        - name: fluent-bit-config
          mountPath: /fluent-bit/etc/
        - name: mnt
          mountPath: /mnt
          readOnly: true
        resources:
          limits:
            memory: 500Mi
          requests:
            cpu: 500m
            memory: 100Mi
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
      - name: varlibdockercontainers
        hostPath:
          path: /var/lib/docker/containers
      - name: fluent-bit-config
        configMap:
          name: fluent-bit-config
      - name: mnt
        hostPath:
          path: /mnt
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluent-bit-config
  namespace: kube-system
  labels:
    app.kubernetes.io/name: fluentbit
data:
  fluent-bit.conf: |
    [SERVICE]
        Parsers_File  parsers.conf
        HTTP_Server  On
        HTTP_Listen  0.0.0.0
        HTTP_PORT    2020
    [INPUT]
        Name              tail
        # namespace_name, container_name, pod_name, docker_id must be present in the tag
        # as they are required by the kubernetes filter
        # This regex cribbed from https://github.com/fluent/fluent-bit/blob/v1.0.2/plugins/filter_kubernetes/kube_regex.h#L25
        Tag_Regex         (?<pod_name>[a-z0-9](?:[-a-z0-9]*[a-z0-9])?(?:\\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*)_(?<namespace_name>[^_]+)_(?<container_name>.+)-(?<docker_id>[a-z0-9]{64})\.log$
        Tag               k8s_containers.<namespace_name>.<container_name>.<pod_name>.<docker_id>-
        Path              /var/log/containers/*.log
        Parser            docker
        DB                /var/log/flb_kube.db
        DB.Sync           Normal
        Buffer_Chunk_Size 1M
        Buffer_Max_Size   2M
        Mem_Buf_Limit     256MB
        Skip_Long_Lines   On
        Refresh_Interval  2
        Docker_Mode       On
    [FILTER]
        Name           kubernetes
        Match          *
        Kube_URL       https://kubernetes.default.svc:443
        Merge_Log      On
        Keep_Log       Off
        Annotations    On
        tls.verify     Off
        Regex_Parser   k8s-custom-tag
        Kube_Tag_Prefix k8s_containers.

    # Add logging to cloudwatch by looping through Namespaces
    [OUTPUT]
        Alias kube-system
        Name cloudwatch
        Match k8s_containers.kube-system*
        region us-east-1
        log_group_name /eks/test/kube-system
        log_stream_prefix kube-system-
        auto_create_group true

  parsers.conf: |
    [PARSER]
        Name        docker
        Format      json
        Time_Key    time
        Time_Format %Y-%m-%dT%H:%M:%S.%Z
        Time_Keep    On
        # Command      |  Decoder | Field | Optional Action
        # =============|==================|=================
        # Decode_Field_As   escaped_utf8    log    do_next
        Decode_Field_As   json       log
    [PARSER]
        # Kubernetes filter must be able to parse these fields out of the tag to work
        Name    k8s-custom-tag
        Format  regex
        Regex   (?<namespace_name>[^_]+)\.(?<container_name>.+)\.(?<pod_name>[a-z0-9](?:[-a-z0-9]*[a-z0-9])?(?:\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*)\.(?<docker_id>[a-z0-9]{64})-$
---
# Service added for ServiceMonitor (Prometheus)
apiVersion: v1
kind: Service
metadata:
  name: fluentbit
  namespace: kube-system
  labels:
    app: fluentbit
spec:
  selector:
    name: fluentbit
  ports:
  - port: 2020
    protocol: TCP
    targetPort: 2020
    name: metrics

  • Environment name and version (e.g. Kubernetes? What version?): AWS EKS Kubernetes version 1.13
  • Server type and version: AWS EKS ec2 t3.medium
  • Operating System and version: amazon-eks-node-1.13-v20190927 (ami-0990970b056c619eb)
  • Filters and plugins: kubernetes

Additional context

Software development is severely impacted because we cannot trace where in the system things are failing. It also affects analytics, since values we rely on in our queries are missing.

@computeracer

Here is another example when outputting fluent-bit logs to a file:

k8s_containers.kube-system.aws-for-fluent-bit.fluentbit-7g649.c311243e64905c4f193c7bca600399fa8c7f8b48abfe15f4a41dfe11e196ea45-: [1572373624.454250, {"log":"2019-10-29T18:27:04.441400237Z stderr F [2019/10/29 18:27:04] [debug] [filter_kube] API Server (ns=kube-system, pod=kube-proxy-lvt7v) http_do=0, HTTP Status: 200","kubernetes":null}]

computeracer commented Oct 30, 2019

Interestingly, when I manually give the DaemonSet AWS credentials and run it in a kind cluster, I still get intermittent "kubernetes":null. I also tried changing the output to a file, and again saw intermittent failures.
Any thoughts @kantica and @edsiper ?

@computeracer

Today we greatly simplified the config and deployed it (kubectl apply -f) in a local kind cluster.

We seem to consistently get no Kubernetes metadata ("kubernetes": null) for the fluent-bit pod itself, but the other pods have all the data in the fluent-bit output. Hope this helps get to the root of the issue.

apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: pod-log-reader
rules:
- apiGroups: [""]
  resources:
  - namespaces
  - pods
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: pod-log-crb
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: pod-log-reader
subjects:
- kind: ServiceAccount
  name: fluent-bit
  namespace: kube-system
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: fluent-bit
  namespace: kube-system
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentbit
  namespace: kube-system
  labels:
    app.kubernetes.io/name: fluentbit
spec:
  selector:
    matchLabels:
      name: fluentbit
  template:
    metadata:
      labels:
        name: fluentbit
    spec:
      priorityClassName: system-cluster-critical
      serviceAccountName: fluent-bit
      containers:
      - name: aws-for-fluent-bit
        image: amazon/aws-for-fluent-bit:1.3.2
        env:
        - name: FLB_LOG_LEVEL
          value: debug
        volumeMounts:
        - name: varlog
          mountPath: /var/log
        - name: varlibdockercontainers
          mountPath: /var/lib/docker/containers
          readOnly: true
        - name: fluent-bit-config
          mountPath: /fluent-bit/etc/
        - name: mnt
          mountPath: /mnt
          readOnly: true
        - name: cache
          mountPath: /fluent-bit/cache
        resources:
          limits:
            memory: 500Mi
          requests:
            cpu: 500m
            memory: 100Mi
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
      - name: varlibdockercontainers
        hostPath:
          path: /var/lib/docker/containers
      - name: fluent-bit-config
        configMap:
          name: fluent-bit-config
      - name: mnt
        hostPath:
          path: /mnt
      - name: cache
        emptyDir: {}
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluent-bit-config
  namespace: kube-system
  labels:
    app.kubernetes.io/name: fluentbit
data:
  fluent-bit.conf: |
    [SERVICE]
        Parsers_File  parsers.conf
        HTTP_Server  On
        HTTP_Listen  0.0.0.0
        HTTP_PORT    2020
    [INPUT]
        Name              tail
        Tag               kube.*
        Path              /var/log/containers/*.log
        Parser            docker
        DB                /var/log/flb_kube.db
        Mem_Buf_Limit     256MB
    [FILTER]
        Name kubernetes
        Match kube.*
        Kube_URL https://kubernetes.default.svc:443
        Kube_CA_File /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        Kube_Token_File /var/run/secrets/kubernetes.io/serviceaccount/token
        Kube_Tag_Prefix kube.var.log.containers.
        Merge_Log On
        Merge_Log_Key log_processed
        Kube_meta_preload_cache_dir /fluent-bit/cache

    [OUTPUT]
        Alias kube-system
        Name file
        Match kube.*
        Path /var/log/kube-system.log

  parsers.conf: |
    [PARSER]
        Name        docker
        Format      json
        Time_Key    time
        Time_Format %Y-%m-%dT%H:%M:%S.%L
        Time_Keep   On

@jhuntwork
Contributor

This issue has been extremely difficult to consistently reproduce. Only sometimes does the request to the API seem to produce null annotations. And because fluent-bit caches the metadata for a pod/namespace combination, it can take a while to encounter it.

Here's a method to eventually trigger the issue:

  1. Create a kind cluster
kind create cluster
export KUBECONFIG="$(kind get kubeconfig-path --name="kind")"
  2. Check out the fluent-bit source and make one small adjustment so that metadata is always fetched from the API instead of just using the cached result:
diff --git a/plugins/filter_kubernetes/kube_meta.c b/plugins/filter_kubernetes/kube_meta.c
index 0d3df7e7..e2a59529 100644
--- a/plugins/filter_kubernetes/kube_meta.c
+++ b/plugins/filter_kubernetes/kube_meta.c
@@ -974,7 +974,7 @@ int flb_kube_meta_get(struct flb_kube *ctx,
     ret = flb_hash_get(ctx->hash_table,
                        meta->cache_key, meta->cache_key_len,
                        &hash_meta_buf, &hash_meta_size);
-    if (ret == -1) {
+    if (-1 == -1) {
         /* Retrieve API server meta and merge with local meta */
         ret = get_and_merge_meta(ctx, meta,
                                  &tmp_hash_meta_buf, &hash_meta_size);
  3. Build a Docker image with the above change, and load it into kind:
docker build -t fluent-bit:test .
kind load docker-image fluent-bit:test
  4. Launch fluent-bit via kubectl. The following spec should represent a pretty minimal setup:
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: pod-log-reader
rules:
- apiGroups: [""]
  resources:
  - namespaces
  - pods
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: pod-log-crb
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: pod-log-reader
subjects:
- kind: ServiceAccount
  name: fluent-bit
  namespace: kube-system
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: fluent-bit
  namespace: kube-system
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentbit
  namespace: kube-system
  labels:
    app.kubernetes.io/name: fluentbit
spec:
  selector:
    matchLabels:
      name: fluentbit
  template:
    metadata:
      labels:
        name: fluentbit
    spec:
      serviceAccountName: fluent-bit
      containers:
      - name: fluent-bit
        image: fluent-bit:test
        volumeMounts:
        - name: varlog
          mountPath: /var/log
        - name: varlibdockercontainers
          mountPath: /var/lib/docker/containers
          readOnly: true
        - name: fluent-bit-config
          mountPath: /fluent-bit/etc/
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
      - name: varlibdockercontainers
        hostPath:
          path: /var/lib/docker/containers
      - name: fluent-bit-config
        configMap:
          name: fluent-bit-config
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluent-bit-config
  namespace: kube-system
  labels:
    app.kubernetes.io/name: fluentbit
data:
  fluent-bit.conf: |
    [SERVICE]
        Parsers_File  parsers.conf
    [INPUT]
        Name              tail
        Tag               kube.*
        Path              /var/log/containers/*.log
        Parser            docker
        DB                /var/log/flb_kube.db
        Refresh_Interval  2
    [FILTER]
        Name kubernetes
        Kube_Tag_Prefix kube.var.log.containers.
        Match kube.*
        Merge_Log On
    [OUTPUT]
        Alias kube-system
        Name file
        Match kube.*
        Path /var/log/fluent-bit-output.log
  parsers.conf: |
    [PARSER]
        Name        docker
        Format      json
        Time_Key    time
        Time_Format %Y-%m-%dT%H:%M:%S.%L
        Time_Keep   On
  5. Get shell access into the kind control-plane container and tail the output log.
# First grab the container id of the control plane
$ docker ps
CONTAINER ID        IMAGE                  COMMAND                  CREATED             STATUS              PORTS                                  NAMES
cc590a19909f        kindest/node:v1.15.3   "/usr/local/bin/entr…"   15 minutes ago      Up 15 minutes       64283/tcp, 127.0.0.1:64283->6443/tcp   kind-control-plane
# Then docker exec
$ docker exec -it cc590a19909f /bin/bash
# Inside the control-plane container, tail the log
tail -f /var/log/fluent-bit-output.log | grep -iE 'null|filter'
  6. In another terminal session on the same host system, continually loop through creating a pod with some basic output and deleting it.
export KUBECONFIG="$(kind get kubeconfig-path --name="kind")"
while true; do \
    kubectl run -it --rm --generator=run-pod/v1 --image=alpine alpine -- /bin/sh -c 'for i in 1 2 3 4 5; do date; sleep 1; done'; \
    sleep 1; \
done

Eventually, you will see logs from the output being tailed that look like this:

kube.var.log.containers.alpine_default_alpine-c40b7545867c821b3dfe0b398983e636132c44a98ab9ea8463d83b76206d77cd.log: [1573048310.008641, {"log":"2019-11-06T13:51:49.0348335Z stdout F Wed Nov  6 13:51:49 UTC 2019","kubernetes":null}]
kube.var.log.containers.alpine_default_alpine-c40b7545867c821b3dfe0b398983e636132c44a98ab9ea8463d83b76206d77cd.log: [1573048310.036044, {"log":"2019-11-06T13:51:50.0358045Z stdout F Wed Nov  6 13:51:50 UTC 2019","kubernetes":null}]

@gageorsburn

I seem to be hitting this as well. Not sure what causes it.

kantica (Contributor) commented Nov 8, 2019

I have found what is causing this, and it's inside kube_meta.c: extract_container_hash().

Basically, the part of extract_container_hash() that looks at "containerID" and "imageID" does not work correctly on, for example, a kind cluster. As a result, the Kubernetes msgpack metadata map gets corrupted.

What is going on?
Inside kube_meta.c, the following is defined:

#define FLB_KUBE_META_CONTAINER_STATUSES_KEY "containerStatuses"
#define FLB_KUBE_META_CONTAINER_STATUSES_KEY_LEN \
    (sizeof(FLB_KUBE_META_CONTAINER_STATUSES_KEY) - 1)
#define FLB_KUBE_META_INIT_CONTAINER_STATUSES_KEY "initContainerStatuses"
#define FLB_KUBE_META_INIT_CONTAINER_STATUSES_KEY_LEN \
    (sizeof(FLB_KUBE_META_INIT_CONTAINER_STATUSES_KEY) - 1)
#define FLB_KUBE_META_CONTAINER_ID_PREFIX "docker:https://"
#define FLB_KUBE_META_CONTAINER_ID_PREFIX_LEN \
    (sizeof(FLB_KUBE_META_CONTAINER_ID_PREFIX) - 1)
#define FLB_KUBE_META_IMAGE_ID_PREFIX "docker-pullable:https://"
#define FLB_KUBE_META_IMAGE_ID_PREFIX_LEN \
    (sizeof(FLB_KUBE_META_IMAGE_ID_PREFIX) - 1)

CONTAINER_ID_PREFIX is defined as "docker:https://"
IMAGE_ID_PREFIX is defined as "docker-pullable:https://"

Looking at (kind) cluster metadata from the kube-apiserver, the following can be observed:

# curl -s -k -H "Authorization: Bearer $token" https://kubernetes.default.svc.cluster.local/api/v1/namespaces/default/pods | jq '.items[].status.containerStatuses[] | "\(.containerID) \(.imageID)"'
"containerd:https://1d13ee4a35ca70f86cef420871d4a552ad7b14e251265dcf23595add22508247 docker.io/library/alpine@sha256:c19173c5ada610a5989151111163d28a67368362762534d8a8121ce95cf2bd5a"
"containerd:https://9d53fd9649c4ff7e67f73e26f7c1e13fc14caa2417bfc3694f567645f5338620 docker.io/library/alpine@sha256:c19173c5ada610a5989151111163d28a67368362762534d8a8121ce95cf2bd5a"
# curl -s -k -H "Authorization: Bearer $token" https://kubernetes.default.svc.cluster.local/api/v1/namespaces/default/pods | egrep '(containerID|imageID)'
            "imageID": ""
            "imageID": "docker.io/library/alpine@sha256:c19173c5ada610a5989151111163d28a67368362762534d8a8121ce95cf2bd5a",
            "containerID": "containerd:https://1ed452c3d02d8114e5739375e1d67405fa40c2f78797ec2d8e5bd13740424273"
                "containerID": "containerd:https://1efbaa7537602817fbf902570011db53097fed4b45301017f4c2b5691bb63d47"
            "imageID": "docker.io/library/alpine@sha256:c19173c5ada610a5989151111163d28a67368362762534d8a8121ce95cf2bd5a",
            "containerID": "containerd:https://1efbaa7537602817fbf902570011db53097fed4b45301017f4c2b5691bb63d47"

Clearly the actual image ID prefix is not "docker-pullable:https://", and the actual container ID prefix is "containerd:https://" instead of the defined "docker:https://".

The relevant code in kube_meta.c is https://github.com/fluent/fluent-bit/blob/master/plugins/filter_kubernetes/kube_meta.c#L385-L400

If you comment out those two else-if blocks, the issue should be gone.
IMHO the IMAGE_ID and CONTAINER_ID prefixes should be configurable with sane defaults, and the relevant code should be rewritten to be more robust.
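
To make the failure mode concrete, here is a minimal standalone sketch. This is an illustration only, not the kube_meta.c source; the variable names and integer types are assumptions. When a fixed prefix length is subtracted from a value that is shorter than the prefix (for example the empty imageID shown in the curl output above), the unsigned length wraps around, and the pointer/length pair used to build the msgpack metadata map no longer describes a valid string.

#include <stdio.h>
#include <string.h>

#define IMAGE_ID_PREFIX     "docker-pullable:https://"
#define IMAGE_ID_PREFIX_LEN (sizeof(IMAGE_ID_PREFIX) - 1)

int main(void)
{
    /* imageID values as actually returned by the API on a kind cluster */
    const char *ids[] = {
        "",
        "docker.io/library/alpine@sha256:c19173c5ada610a5989151111163d28a67368362762534d8a8121ce95cf2bd5a"
    };

    for (size_t i = 0; i < sizeof(ids) / sizeof(ids[0]); i++) {
        size_t size = strlen(ids[i]);

        /* unguarded strip, as in the buggy code path: wraps around for "" */
        size_t unguarded_len = size - IMAGE_ID_PREFIX_LEN;

        /* guarded strip, as in the patch in the next comment */
        size_t guarded_len = 0;
        if (size >= IMAGE_ID_PREFIX_LEN) {
            guarded_len = size - IMAGE_ID_PREFIX_LEN;
        }

        printf("value_len=%zu unguarded_len=%zu guarded_len=%zu\n",
               size, unguarded_len, guarded_len);

        if (size >= IMAGE_ID_PREFIX_LEN) {
            /* even when the length stays positive, the offset is wrong when
               the value never had the "docker-pullable:https://" prefix */
            printf("  stripped value: %s\n", ids[i] + IMAGE_ID_PREFIX_LEN);
        }
    }
    return 0;
}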

kantica (Contributor) commented Nov 8, 2019

Here is a quick-and-dirty patch which should keep the metadata msgpack map from being corrupted :)

diff --git a/plugins/filter_kubernetes/kube_meta.c b/plugins/filter_kubernetes/kube_meta.c
index 0d3df7e7..790c53ca 100644
--- a/plugins/filter_kubernetes/kube_meta.c
+++ b/plugins/filter_kubernetes/kube_meta.c
@@ -395,8 +395,10 @@ static void extract_container_hash(struct flb_kube_meta *meta,
                                        "imageID",
                                        k2.via.str.size)) {
                         /* Strip "docker-pullable:https://" prefix */
-                        container_hash = v2.ptr + FLB_KUBE_META_IMAGE_ID_PREFIX_LEN;
-                        container_hash_len = v2.size - FLB_KUBE_META_IMAGE_ID_PREFIX_LEN;
+                        if (v2.size >= FLB_KUBE_META_IMAGE_ID_PREFIX_LEN) {
+                            container_hash = v2.ptr + FLB_KUBE_META_IMAGE_ID_PREFIX_LEN;
+                            container_hash_len = v2.size - FLB_KUBE_META_IMAGE_ID_PREFIX_LEN;
+                        }
                     }
                 }
                 if (name_found) {

edsiper (Member) commented Nov 8, 2019

Thanks everyone! What teamwork :) Based on the feedback and the solution provided by @kantica, I have pushed the following test image:

edsiper/fluent-bit-1.3.3-next:3

Please give it a try; if it looks good we can do a release TODAY. Please test it!

computeracer commented Nov 8, 2019

Tested with edsiper/fluent-bit-1.3.3-next:3. Could not get a null. Well done.

I deleted and applied my deployment referenced earlier several times with the new image and could not reproduce a null in the logs of my kind cluster. Thank you!

edsiper added a commit that referenced this issue Nov 8, 2019
The following patch fixes the container hash lookup; there are certain
deployments where the container prefix is not docker:https:// or
docker-pullable:https://, for example when 'kind' is used.

The fix works as a failover mechanism, skipping the prefix and consuming
the string after the ':https://' separator.

This fix is based on the solution provided by @kantica.

Note: kudos to all the people involved in this issue who helped
troubleshoot, find the root cause of the problem, and provide
workarounds.

Signed-off-by: Eduardo Silva <[email protected]>

ivanfoo commented Nov 11, 2019

We are testing the patch too; so far we have not seen any nulls.

kantica added a commit to kantica/fluent-bit that referenced this issue Nov 12, 2019
This fixes the containerID and imageID hash lookup.
There was an error in extracting the hash from containerID or
imageID when the strings do not begin with docker:https:// or docker-pullable:https://.
The consequence is that the Kubernetes metadata structure is
"null" in the output.

In some cases, imageID also comes back as an empty string from the Kubernetes API.

Example cases:
* "imageID": ""
* "imageID": "sha256:eb516548c180f8a6e0235034ccee2428027896af16a509786da13022fe95fe8c"
* "imageID": "docker:https://sha256:2b424891d78a4704aa9d763b274f1be6752766d24463b9e3b5411a7bac2207ca"
* "imageID": "docker.io/library/debian@sha256:41f76363fd83982e14f7644486e1fb04812b3894aa4e396137c3435eaf05de88"
* "imageID": "docker-pullable:https://redis@sha256:4be7fdb131e76a6c6231e820c60b8b12938cf1ff3d437da4871b9b2440f4e385"
* "imageID": "docker-pullable:https://k8s.gcr.io/event-exporter@sha256:16ca66e2b5dc7a1ce6a5aafcb21d0885828b75cdfc08135430480f7ad2364adc"
* "imageID": "docker-pullable:https://kubedb/operator@sha256:063a92a47de7c4b25e3bc382d878513564c65c5313b347bdbcfc60959d082b22"
* "imageID": "docker-pullable:https://quay.io/jetstack/cert-manager-controller@sha256:66eb65ac8ff3505310e850e0486c4dd82b26e6cfdc7afd7456e0fab753b57855"
* "containerID": "docker:https://e8d482336de6b610b9084e9e74b0213c5cc7d399053016448319a4970a16324e"
* "containerID": "containerd:https://1ed452c3d02d8114e5739375e1d67405fa40c2f78797ec2d8e5bd13740424273"
* "containerID": "cri-o:https://fc9f90e123ca14e4d000c9d4c854bb54d43ce96b1624d51bb4b97afd1c728229"

Obviously, the value of containerID depends on the container runtime used by
the Kubernetes cluster.

This patch finds the positions of the last ":" and "/" relative to the
end of the string, so only the hash is extracted and everything
preceding it is discarded.

Signed-off-by: kantica <[email protected]>
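
The lookup this commit message describes can be sketched roughly as follows. This is a compilable illustration, not the fluent-bit source, and the helper name extract_hash is made up: find the last ':' and '/' in the value and keep only what follows, so the hash survives whatever runtime prefix precedes it, and an empty value simply yields an empty hash.

#define _GNU_SOURCE   /* for memrchr() */
#include <stdio.h>
#include <string.h>

/* Return a pointer to the trailing hash inside 'id' and store its length
 * in 'out_len'; everything up to the last ':' or '/' is discarded. */
static const char *extract_hash(const char *id, size_t len, size_t *out_len)
{
    const char *colon = memrchr(id, ':', len);
    const char *slash = memrchr(id, '/', len);
    const char *sep;

    if (colon && slash) {
        sep = (colon > slash) ? colon : slash;
    }
    else {
        sep = colon ? colon : slash;
    }

    if (sep == NULL) {          /* no separator at all: the value is the hash */
        *out_len = len;
        return id;
    }
    *out_len = len - (size_t)(sep + 1 - id);
    return sep + 1;
}

int main(void)
{
    /* containerID / imageID shapes taken from the list above */
    const char *ids[] = {
        "containerd:https://1ed452c3d02d8114e5739375e1d67405fa40c2f78797ec2d8e5bd13740424273",
        "docker-pullable:https://redis@sha256:4be7fdb131e76a6c6231e820c60b8b12938cf1ff3d437da4871b9b2440f4e385",
        "cri-o:https://fc9f90e123ca14e4d000c9d4c854bb54d43ce96b1624d51bb4b97afd1c728229",
        ""
    };

    for (size_t i = 0; i < sizeof(ids) / sizeof(ids[0]); i++) {
        size_t n;
        const char *hash = extract_hash(ids[i], strlen(ids[i]), &n);
        printf("%.*s\n", (int)n, hash);
    }
    return 0;
}

Running it prints only the trailing hashes (and an empty line for the empty value), which matches the hash-only container_hash behavior discussed further down in this thread.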

edsiper added a commit that referenced this issue Nov 12, 2019
kantica added a commit to kantica/fluent-bit that referenced this issue Nov 12, 2019
kantica added a commit to kantica/fluent-bit that referenced this issue Nov 12, 2019
This commit reworks the last patch for this issue and rebases it onto
current master together with the patch from @edsiper for the same issue.
The rework avoids calling memrchr() twice on the same string when
matching "/" (from 5844b9f).

Signed-off-by: kantica <[email protected]>
edsiper pushed a commit that referenced this issue Nov 13, 2019
kantica added a commit to kantica/fluent-bit that referenced this issue Nov 13, 2019
This fixes the tests for the container_hash Kubernetes metadata.
The tests cover the fixes made in fluent#1691 and PR fluent#1731.

Signed-off-by: kantica <[email protected]>
edsiper pushed a commit that referenced this issue Nov 13, 2019
@jhuntwork
Contributor

When we applied just d62bd88 to our custom image, we still eventually saw some nulls. But with current master (now at cc6b7bb) we haven't seen any in the last 2 hours.

edsiper added a commit that referenced this issue Nov 20, 2019
edsiper pushed a commit that referenced this issue Nov 20, 2019
edsiper pushed a commit that referenced this issue Nov 20, 2019

edsiper (Member) commented Nov 20, 2019

The new version is coming out today. Packages are already out, and Docker images will take a few more hours to be ready. Thanks for your patience.

amkartashov (Contributor) commented Nov 21, 2019

@edsiper

With this fix, container_hash contains only the hash, so imageID: docker-pullable:https://bitnami/mongodb@sha256:a93f75fec367ea5393f46f6050dea9747053f694a2659da44a9e9e06cf2cd039 leads to container_hash: "a93f75fec367ea5393f46f6050dea9747053f694a2659da44a9e9e06cf2cd039", while I expected it to strip only docker-pullable:https://.

I think memrchr should be replaced with memchr here https://github.com/fluent/fluent-bit/blob/master/plugins/filter_kubernetes/kube_meta.c#L325 (so that it searches for the leftmost colon).

EDIT: please ignore; @kantica explained that keeping only the hash is the desired change.

edsiper (Member) commented Nov 23, 2019

@kantica can you submit a new PR to fix the use case mentioned above?

It would be good to include this as part of the next v1.3.4 release

kantica (Contributor) commented Nov 25, 2019

Ok, I'll try to do it today.

kantica added a commit to kantica/fluent-bit that referenced this issue Nov 25, 2019
This commit reworks the hash lookup as reported by @gorilych
in fluent#1691 (comment).

Test cases are also updated accordingly.

Signed-off-by: kantica <[email protected]>
kantica added a commit to kantica/fluent-bit that referenced this issue Nov 26, 2019
edsiper pushed a commit that referenced this issue Nov 26, 2019

beckjim commented Feb 3, 2020

As far as I understand, this is not yet in the 1.3 branch. Are there any plans to backport it?
Thanks for working on this!

@eddycharly

Any plan to release a version including the fix?

edsiper (Member) commented Feb 19, 2020

@kantica I tried to rebase on top of 1.3 but there are some conflicts. Is it possible to backport this?

kantica (Contributor) commented Feb 19, 2020

OK, I'll rebase/patch this fix onto the 1.3 branch. I'll get back as soon as I have it there.

@eddycharly

@edsiper is there a nightly build or something we can use until a release including the fix is available?

kantica added a commit to kantica/fluent-bit that referenced this issue Mar 19, 2020

kantica (Contributor) commented Mar 19, 2020

Sorry for the long wait; I could not get to it earlier.

edsiper pushed a commit that referenced this issue Mar 19, 2020
This is a backport of master (51ac9f5).

Signed-off-by: kantica <[email protected]>

edsiper (Member) commented Mar 19, 2020

Thanks @kantica! Just in time for the v1.3.11 release today :)
