Minikube Unrecoverable Failure Accrual - potential State Issue? #1136
Comments
Updated the ConfigMap; I forgot I had renamed it to 'test-linkerd-config' to avoid interrupting my other work on Linkerd.
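One way to confirm the DaemonSet is actually mounting the renamed ConfigMap (a quick sketch, assuming the DaemonSet is named l5d as in the linkerd-examples manifests):

# List the ConfigMaps referenced by the l5d DaemonSet's volumes
kubectl get ds l5d -o jsonpath='{.spec.template.spec.volumes[*].configMap.name}'

# Inspect the renamed ConfigMap itself
kubectl get configmap test-linkerd-config -o yaml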
I am seeing a similar issue in minikube. Not sure if it's related, but the steps to reproduce are simpler. Full output of the k8s endpoints API and the linkerd debug log are at: https://gist.github.com/siggy/19c049a62bc8e9b65cac041c2921346b

Steps to repro:

Deploy linkerd and app:

kubectl apply -f https://raw.githubusercontent.com/linkerd/linkerd-examples/master/k8s-daemonset/k8s/linkerd.yml
kubectl apply -f https://raw.githubusercontent.com/linkerd/linkerd-examples/master/k8s-daemonset/k8s/hello-world-legacy.yml

Verify routing works:

OUTGOING_PORT=$(kubectl get svc l5d -o jsonpath='{.spec.ports[?(@.name=="outgoing")].nodePort}')
L5D_INGRESS_LB=http://$(minikube ip):$OUTGOING_PORT
http_proxy=$L5D_INGRESS_LB curl -s http://world

world (172.17.0.9)!

Redeploy app:

kubectl delete -f https://raw.githubusercontent.com/linkerd/linkerd-examples/master/k8s-daemonset/k8s/hello-world-legacy.yml
kubectl apply -f https://raw.githubusercontent.com/linkerd/linkerd-examples/master/k8s-daemonset/k8s/hello-world-legacy.yml

Observe routing failure:

$ http_proxy=$L5D_INGRESS_LB curl -s http://world
No hosts are available for /svc/world, Dtab.base=[/srv=>/#/io.l5d.k8s/default/http;/host=>/srv;/svc=>/host;/host/world=>/srv/world-v1], Dtab.local=[]. Remote Info: Not Available

Observe delegator API returns healthy endpoints:

$ ADMIN_PORT=$(kubectl get svc l5d -o jsonpath='{.spec.ports[?(@.name=="admin")].nodePort}')
$ curl -H "Content-Type: application/json" -X POST -d '{"namespace":"incoming","dtab":"/srv=>/#/io.l5d.k8s/default/http;/host=>/srv;/svc=>/host;/host/world=>/srv/world-v1","path":"/svc/world"}' http://$(minikube ip):$ADMIN_PORT/delegator.json
{"type":"delegate","path":"/svc/world","delegate":{"type":"alt","path":"/host/world","dentry":{"prefix":"/svc","dst":"/host"},"alt":[{"type":"delegate","path":"/srv/world-v1","dentry":{"prefix":"/host/world","dst":"/srv/world-v1"},"delegate":{"type":"transformation","path":"/#/io.l5d.k8s/default/http/world-v1","name":"SubnetLocalTransformer","bound":{"addr":{"type":"bound","addrs":[{"ip":"172.17.0.13","port":7778,"meta":{"nodeName":"minikube"}},{"ip":"172.17.0.11","port":7778,"meta":{"nodeName":"minikube"}},{"ip":"172.17.0.12","port":7778,"meta":{"nodeName":"minikube"}}],"meta":{}},"id":"/#/io.l5d.k8s/default/http/world-v1","path":"/"},"tree":{"type":"leaf","path":"/%/io.l5d.k8s.localnode/172.17.0.3/#/io.l5d.k8s/default/http/world-v1","dentry":{"prefix":"/srv","dst":"/#/io.l5d.k8s/default/http"},"bound":{"addr":{"type":"bound","addrs":[{"ip":"172.17.0.13","port":7778,"meta":{"nodeName":"minikube"}},{"ip":"172.17.0.11","port":7778,"meta":{"nodeName":"minikube"}},{"ip":"172.17.0.12","port":7778,"meta":{"nodeName":"minikube"}}],"meta":{}},"id":"/%/io.l5d.k8s.localnode/172.17.0.3/#/io.l5d.k8s/default/http/world-v1","path":"/"}}}},{"type":"delegate","path":"/srv/world","dentry":{"prefix":"/host","dst":"/srv"},"delegate":{"type":"neg","path":"/#/io.l5d.k8s/default/http/world","dentry":{"prefix":"/srv","dst":"/#/io.l5d.k8s/default/http"}}}]}}

Observe successful curl to world service from inside the l5d container:

$ kubectl exec -it l5d-r6fv5 -c l5d curl 172.17.0.11:7778
world (172.17.0.11)!
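To compare what Kubernetes reports against what linkerd resolves after the redeploy, one rough approach (assuming the default namespace and the world-v1 service from the example app) is to dump the Endpoints object alongside the delegator output above:

# Pod IPs Kubernetes currently lists for world-v1
kubectl get endpoints world-v1 -o jsonpath='{.subsets[*].addresses[*].ip}'

# The same data via the raw endpoints API, as captured in the gist
kubectl get --raw /api/v1/namespaces/default/endpoints/world-v1

If the pod IPs here differ from the addrs in the delegator response, that would suggest linkerd is working from a stale view of the endpoints.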
Can you share linkerd's metrics?
Updated the gist with metrics.json: https://gist.github.com/siggy/19c049a62bc8e9b65cac041c2921346b
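For anyone collecting the same data, a rough way to pull metrics.json in this setup (assuming the l5d Service exposes an admin NodePort as in the repro above, and that the admin server serves metrics at /admin/metrics.json):

ADMIN_PORT=$(kubectl get svc l5d -o jsonpath='{.spec.ports[?(@.name=="admin")].nodePort}')
curl -s http://$(minikube ip):$ADMIN_PORT/admin/metrics.json > metrics.json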
While discussing Linkerd in Slack with @olix0r, I ran into what he identified as failure accrual kicking in; however, even after multiple restarts of the service, Linkerd never recovered the affected service from the failed state.
This also occurs when a service/deployment is scheduled, then later deleted and rescheduled on a 'broken' port while the container itself still exposes a working port. Linkerd continues to route to the working port, even though the Kubernetes Service now explicitly points elsewhere.
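A rough way to check for that mismatch (a sketch; substitute the affected Service's name for world-v1 and adjust the namespace as needed) is to compare the port the Service targets with the ports Kubernetes has resolved into its Endpoints:

# Port the Service claims to target on the pods
kubectl get svc world-v1 -o jsonpath='{.spec.ports[*].targetPort}'

# Ports actually resolved in the Endpoints object
kubectl get endpoints world-v1 -o jsonpath='{.subsets[*].ports[*].port}'

If linkerd keeps sending traffic to a port that no longer appears in the Endpoints output, that matches the behavior described above.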
Maybe related to #1114? Seems similar in nature...
Environment
Kubernetes YAML files
ConfigMap for Linkerd Configuration:
Linkerd Deployment:
Working service/deployment:
Broken service/deployment:
Reproducing the Issue
The directions below walk through reproducing the issue: deploy a working service, test it, then delete it and deploy a non-working service in its place.
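A minimal sketch of that sequence, using hypothetical manifest names working-service.yml and broken-service.yml in place of the files listed above, and reusing the $L5D_INGRESS_LB proxy variable and a placeholder hello service name from the minikube repro earlier in this thread:

# Deploy the known-good service/deployment and test it through linkerd
kubectl apply -f working-service.yml
http_proxy=$L5D_INGRESS_LB curl -s http://hello

# Swap in the variant whose Service points at the broken port, then test again
kubectl delete -f working-service.yml
kubectl apply -f broken-service.yml
http_proxy=$L5D_INGRESS_LB curl -s http://hello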
Logs from Linkerd: