kubernetes = null #1691
Here is another example when outputting Fluent Bit logs to a file:
k8s_containers.kube-system.aws-for-fluent-bit.fluentbit-7g649.c311243e64905c4f193c7bca600399fa8c7f8b48abfe15f4a41dfe11e196ea45-: [1572373624.454250, {"log":"2019-10-29T18:27:04.441400237Z stderr F [2019/10/29 18:27:04] [debug] [filter_kube] API Server (ns=kube-system, pod=kube-proxy-lvt7v) http_do=0, HTTP Status: 200","kubernetes":null}]
Today we greatly simplified the config and deployed it (kubectl apply -f) in a local kind cluster. We consistently fail to get the k8s annotations ("kubernetes": null) on the fluent-bit pod itself, but the other pods' logs carry all the metadata. Hope this helps get to the root of the issue.
This issue has been extremely difficult to reproduce consistently. Only sometimes does the request to the API server seem to produce null annotations, and because fluent-bit caches the metadata for a pod/namespace combination, it can take a while to encounter it. Here's a method to eventually trigger the issue (the if (-1 == -1) patch below deliberately forces a cache miss so that every lookup hits the API server):
kind create cluster
export KUBECONFIG="$(kind get kubeconfig-path --name="kind")"
diff --git a/plugins/filter_kubernetes/kube_meta.c b/plugins/filter_kubernetes/kube_meta.c
index 0d3df7e7..e2a59529 100644
--- a/plugins/filter_kubernetes/kube_meta.c
+++ b/plugins/filter_kubernetes/kube_meta.c
@@ -974,7 +974,7 @@ int flb_kube_meta_get(struct flb_kube *ctx,
     ret = flb_hash_get(ctx->hash_table,
                        meta->cache_key, meta->cache_key_len,
                        &hash_meta_buf, &hash_meta_size);
-    if (ret == -1) {
+    if (-1 == -1) {
         /* Retrieve API server meta and merge with local meta */
         ret = get_and_merge_meta(ctx, meta,
                                  &tmp_hash_meta_buf, &hash_meta_size);
docker build -t fluent-bit:test .
kind load docker-image fluent-bit:test
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: pod-log-reader
rules:
- apiGroups: [""]
  resources:
  - namespaces
  - pods
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: pod-log-crb
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: pod-log-reader
subjects:
- kind: ServiceAccount
  name: fluent-bit
  namespace: kube-system
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: fluent-bit
  namespace: kube-system
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentbit
  namespace: kube-system
  labels:
    app.kubernetes.io/name: fluentbit
spec:
  selector:
    matchLabels:
      name: fluentbit
  template:
    metadata:
      labels:
        name: fluentbit
    spec:
      serviceAccountName: fluent-bit
      containers:
      - name: fluent-bit
        image: fluent-bit:test
        volumeMounts:
        - name: varlog
          mountPath: /var/log
        - name: varlibdockercontainers
          mountPath: /var/lib/docker/containers
          readOnly: true
        - name: fluent-bit-config
          mountPath: /fluent-bit/etc/
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
      - name: varlibdockercontainers
        hostPath:
          path: /var/lib/docker/containers
      - name: fluent-bit-config
        configMap:
          name: fluent-bit-config
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluent-bit-config
  namespace: kube-system
  labels:
    app.kubernetes.io/name: fluentbit
data:
  fluent-bit.conf: |
    [SERVICE]
        Parsers_File parsers.conf
    [INPUT]
        Name              tail
        Tag               kube.*
        Path              /var/log/containers/*.log
        Parser            docker
        DB                /var/log/flb_kube.db
        Refresh_Interval  2
    [FILTER]
        Name              kubernetes
        Kube_Tag_Prefix   kube.var.log.containers.
        Match             kube.*
        Merge_Log         On
    [OUTPUT]
        Alias             kube-system
        Name              file
        Match             kube.*
        Path              /var/log/fluent-bit-output.log
  parsers.conf: |
    [PARSER]
        Name        docker
        Format      json
        Time_Key    time
        Time_Format %Y-%m-%dT%H:%M:%S.%L
        Time_Keep   On
# First grab the container id of the control plane
$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
cc590a19909f kindest/node:v1.15.3 "/usr/local/bin/entr…" 15 minutes ago Up 15 minutes 64283/tcp, 127.0.0.1:64283->6443/tcp kind-control-plane
# Then docker exec
$ docker exec -it cc590a19909f /bin/bash
# Inside the control-plane container, tail the log
tail -f /var/log/fluent-bit-output.log | grep -iE 'null|filter'
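# Back on the host, in a separate terminal, generate short-lived pods so
# fluent-bit keeps performing fresh metadata lookups: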
export KUBECONFIG="$(kind get kubeconfig-path --name="kind")"
while true; do \
kubectl run -it --rm --generator=run-pod/v1 --image=alpine alpine -- /bin/sh -c 'for i in 1 2 3 4 5; do date; sleep 1; done'; \
sleep 1; \
done

Eventually, you will see logs from the output being tailed that look like this:
kube.var.log.containers.alpine_default_alpine-c40b7545867c821b3dfe0b398983e636132c44a98ab9ea8463d83b76206d77cd.log: [1573048310.008641, {"log":"2019-11-06T13:51:49.0348335Z stdout F Wed Nov 6 13:51:49 UTC 2019","kubernetes":null}]
kube.var.log.containers.alpine_default_alpine-c40b7545867c821b3dfe0b398983e636132c44a98ab9ea8463d83b76206d77cd.log: [1573048310.036044, {"log":"2019-11-06T13:51:50.0358045Z stdout F Wed Nov 6 13:51:50 UTC 2019","kubernetes":null}]
I seem to be hitting this as well. Not sure what causes it.
I have found what is causing this, and it's inside kube_meta.c: extract_container_hash(). Basically, the part of extract_container_hash() which looks at "containerID" and "imageID" does not work correctly on, for example, a kind cluster, so the kubernetes messagepack metadata map ends up corrupted. What is going on?
CONTAINER_ID_PREFIX is defined as "docker://". Looking at the (kind) cluster metadata from the kube-api server, the following can be observed: containerID values are prefixed with "containerd://", and imageID values carry no "docker-pullable://" prefix at all (e.g. "docker.io/library/debian@sha256:…").
Clearly the actual imageID prefix is not the expected IMAGE_ID_PREFIX "docker-pullable://", and the actual containerID prefix is "containerd://" instead of the defined "docker://". The relevant code is in kube_meta.c: https://github.com/fluent/fluent-bit/blob/master/plugins/filter_kubernetes/kube_meta.c#L385-L400 If you comment out those two else-if blocks, the issue should be gone.
Here is a quick-and-dirty patch which should probably work, so that the metadata msgpack map is not corrupted :)
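The patch itself was attached to the comment and is not reproduced here. As a rough standalone illustration of the approach described in the commit messages further down (find the last ':' or '/' separator and keep only the hash after it), a minimal C sketch might look like the following; the function name, driver, and file name are mine, not fluent-bit's:

/* hash_suffix.c -- illustrative sketch only; not the actual fluent-bit patch */
#define _GNU_SOURCE          /* memrchr() is a GNU extension */
#include <stdio.h>
#include <string.h>

/* Return a pointer to the bare hash inside a containerID/imageID value:
   everything after the last ':' or '/' separator, or the whole string
   when no separator is present (already a bare hash, or empty). */
static const char *hash_suffix(const char *id, size_t len, size_t *out_len)
{
    const char *colon = memrchr(id, ':', len);
    const char *slash = memrchr(id, '/', len);
    const char *sep = colon;

    if (slash != NULL && (sep == NULL || slash > sep)) {
        sep = slash;
    }
    if (sep == NULL) {
        *out_len = len;
        return id;
    }
    *out_len = len - (size_t) (sep - id) - 1;
    return sep + 1;
}

int main(void)
{
    /* Example IDs taken from the commit message further down */
    const char *samples[] = {
        "docker://e8d482336de6b610b9084e9e74b0213c5cc7d399053016448319a4970a16324e",
        "containerd://1ed452c3d02d8114e5739375e1d67405fa40c2f78797ec2d8e5bd13740424273",
        "cri-o://fc9f90e123ca14e4d000c9d4c854bb54d43ce96b1624d51bb4b97afd1c728229",
        "docker.io/library/debian@sha256:41f76363fd83982e14f7644486e1fb04812b3894aa4e396137c3435eaf05de88",
        "sha256:eb516548c180f8a6e0235034ccee2428027896af16a509786da13022fe95fe8c",
        "",
    };

    for (size_t i = 0; i < sizeof(samples) / sizeof(samples[0]); i++) {
        size_t n;
        const char *h = hash_suffix(samples[i], strlen(samples[i]), &n);
        printf("%.*s\n", (int) n, h);
    }
    return 0;
}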
thanks everyone! what teamwork :) Based on the feedback and the solution provided by @kantica, I have pushed the following test image:
please give it a try; if it looks good we can do a release TODAY. please test it!
Tested with edsiper/fluent-bit-1.3.3-next:3. Could not get a null. Well done. I deleted and applied my deployment referenced earlier several times with the new image and could not reproduce a null in the logs of my kind cluster. Thank you!
The following patch fixes the container hash lookup. There are certain deployments where the container prefixes are not docker:// or docker-pullable://, such as when 'kind' is used. The fix works as a failover mechanism, skipping the prefix and consuming the string after the '://' separator. This fix is based on the solution provided by @kantica. Note: kudos to all the people involved in this issue who helped troubleshoot, find the root cause of the problem, and provide workarounds. Signed-off-by: Eduardo Silva <[email protected]>
We are testing the patch too; it seems we cannot get any null (so far).
This fixes the containerID and imageID hash lookup. There was an error in processing/extracting the hash from containerID or imageID if the strings do not begin with docker:// or docker-pullable://. The implication is that the kubernetes metadata structure is "null" at output. In some cases imageID also comes as an empty string from the kubernetes api. Example cases:
* "imageID": ""
* "imageID": "sha256:eb516548c180f8a6e0235034ccee2428027896af16a509786da13022fe95fe8c"
* "imageID": "docker://sha256:2b424891d78a4704aa9d763b274f1be6752766d24463b9e3b5411a7bac2207ca"
* "imageID": "docker.io/library/debian@sha256:41f76363fd83982e14f7644486e1fb04812b3894aa4e396137c3435eaf05de88"
* "imageID": "docker-pullable://redis@sha256:4be7fdb131e76a6c6231e820c60b8b12938cf1ff3d437da4871b9b2440f4e385"
* "imageID": "docker-pullable://k8s.gcr.io/event-exporter@sha256:16ca66e2b5dc7a1ce6a5aafcb21d0885828b75cdfc08135430480f7ad2364adc"
* "imageID": "docker-pullable://kubedb/operator@sha256:063a92a47de7c4b25e3bc382d878513564c65c5313b347bdbcfc60959d082b22"
* "imageID": "docker-pullable://quay.io/jetstack/cert-manager-controller@sha256:66eb65ac8ff3505310e850e0486c4dd82b26e6cfdc7afd7456e0fab753b57855"
* "containerID": "docker://e8d482336de6b610b9084e9e74b0213c5cc7d399053016448319a4970a16324e"
* "containerID": "containerd://1ed452c3d02d8114e5739375e1d67405fa40c2f78797ec2d8e5bd13740424273"
* "containerID": "cri-o://fc9f90e123ca14e4d000c9d4c854bb54d43ce96b1624d51bb4b97afd1c728229"
Obviously the value of containerID depends on the container runtime used by the kubernetes cluster. This patch finds the positions of the last ":" and "/" relative to the end of the string, so that only the hash is extracted and everything preceding it is discarded. Signed-off-by: kantica <[email protected]>
This commit reworks the last patch for this issue and rebases onto current master with the patch from @edsiper for the same issue. The rework avoids calling memrchr() twice on the same string when matching "/", from 5844b9f. Signed-off-by: kantica <[email protected]>
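As a sketch of what such a single-pass rework could look like (my own function name, not the actual commit): scan backwards once, stopping at the first ':' or '/' from the end, instead of calling memrchr() twice as in the sketch above.

#include <stddef.h>

/* Drop-in alternative to the hash_suffix() sketch above: one backward
   scan that stops at the first ':' or '/' encountered from the end. */
static const char *hash_suffix_onepass(const char *id, size_t len,
                                       size_t *out_len)
{
    for (size_t i = len; i > 0; i--) {
        char c = id[i - 1];
        if (c == ':' || c == '/') {
            *out_len = len - i;
            return id + i;
        }
    }
    *out_len = len;   /* no separator: already a bare hash (or empty) */
    return id;
}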
This fixes the tests for the container_hash kubernetes meta. The tests cover the fixes made in #1691 and PR #1731. Signed-off-by: kantica <[email protected]>
The new version is coming out today; packages are already out, and the docker images will take a few more hours to be ready. Thanks for your patience.
With this fix, I think… EDIT: please ignore, @kantica explained that this is a desired change, to keep only the hash.
@kantica can you submit a new PR to fix the use case mentioned above? It would be good to include this as part of the next v1.3.4 release.
Ok, I'll try to do it today.
This commit reworks the hash lookup as reported by @gorilych in #1691 (comment). Based on current master 8e071e3. Test cases are also updated accordingly. Signed-off-by: kantica <[email protected]>
To my understanding, this is not yet in the 1.3 branch. Are there any plans to do this?
Any plans to release a version including the fix?
@kantica I tried to rebase on top of 1.3 but there are some conflicts. Is it possible to backport this?
Ok, I'll rebase/patch this issue onto the 1.3 branch. I'll get back as soon as I have it there.
@edsiper is there a nightly build or something we can use until a release including the fix is available?
This is a backport of master (51ac9f5) Signed-off-by: kantica <[email protected]>
Sorry for the long wait; I could not make it earlier.
thanks @kantica! just in time for the v1.3.11 release today :)
Bug Report
Describe the bug
Intermittently, across multiple instances of our AWS EKS clusters, logs arrive with the Kubernetes enrichment and annotations missing; the entire kubernetes key is null.
We have tried restarting the fluent-bit pods, deleting the daemonset, and creating new clusters. Sometimes it works (logs arrive with the k8s enrichment and annotations); other times it does not.
To Reproduce
We haven't been able to pinpoint the issue or reproduce it consistently.
Expected behavior
When we define the input, filter, and parser to extract Kubernetes metadata and annotations from our logs and the API, that metadata is added to our log output.
Screenshots
Your Environment
Additional context
Software development is severely impacted by not being able to trace where in the system things are failing. It also affects analytics, as values we defined in our queries are missing.