
CouchDB pods perpetually crashing under OpenShift #13

Open
blsaws opened this issue Nov 15, 2019 · 9 comments

blsaws commented Nov 15, 2019

Describe the bug
CouchDB pods are continuously crashing under OpenShift.

Version of Helm and Kubernetes:
Helm
$ helm version
Client: &version.Version{SemVer:"v2.12.3", GitCommit:"eecf22f77df5f65c823aacd2dbd30ae6c65f186e", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.12.3", GitCommit:"eecf22f77df5f65c823aacd2dbd30ae6c65f186e", GitTreeState:"clean"}

OpenShift
$ oc version
oc v3.11.0+0cbc58b
kubernetes v1.11.0+d4cacc0
features: Basic-Auth GSSAPI Kerberos SPNEGO
Server https://127.0.0.1:8443
kubernetes v1.11.0+d4cacc0

What happened:
Deployed the CouchDB Helm chart, and the pods are continually crashing.
Deployment commands:
helm repo add couchdb https://apache.github.io/couchdb-helm
helm install --name acumos-couchdb --namespace acumos \
  --set service.type=NodePort --set allowAdminParty=true couchdb/couchdb
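(Note: these commands use Helm 2 syntax. With Helm 3, which drops the --name flag, a rough equivalent would be:

helm repo add couchdb https://apache.github.io/couchdb-helm
helm install acumos-couchdb couchdb/couchdb --namespace acumos --create-namespace \
  --set service.type=NodePort --set allowAdminParty=true

where --create-namespace is only needed if the namespace does not already exist.)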

What you expected to happen:
CouchDB pods should become ready, as they do under generic Kubernetes.

How to reproduce it (as minimally and precisely as possible):

  1. Install OpenShift Origin 3.11
  2. Set up other cluster/namespace prerequisites, e.g. create the namespace used in the example above.
  3. Install the CouchDB Helm chart, as above.

Anything else we need to know:

@willholley (Member) commented:

I don't think this chart has been tested under OpenShift; it's difficult to speculate on the cause of the problem without more detail from the pod logs.

That said, I'd recommend using the CouchDB Operator instead of the Helm chart for OpenShift / OKD deployments.


blsaws commented Nov 21, 2019

Here are the logs from the init-copy containers (they are crashing), and the output of describe pods:
couchdb-openshift-crash.txt

My goal is, where possible, to use a consistent set of upstream tools to deploy supplemental components (e.g. MariaDB, Nexus, ELK, JupyterHub, NiFi, Jenkins, ...). This reduces maintenance effort and UX variation across k8s environments. But I will take a look at the Operator. In the meantime, if you have any suggestions on the reason for the crash, I would appreciate it, since the logs really don't tell me anything.

@willholley (Member) commented:

@blsaws those logs look to be from the init-copy container, which succeeded. Can you get the logs from the couchdb container: oc logs acumos-couchdb-couchdb-0 -c couchdb?


blsaws commented Nov 25, 2019

Nothing is returned from the logs:
root@77f48ec29783:/# oc logs acumos-couchdb-couchdb-0 -c couchdb
root@77f48ec29783:/#

@willholley (Member) commented:

@blsaws you might need to use the --previous flag to get the logs of the crashed container. See https://kubernetes.io/docs/tasks/debug-application-cluster/debug-pod-replication-controller/#my-pod-is-crashing-or-otherwise-unhealthy. At the moment, I'm afraid I don't have enough information to provide any guidance as to why it might be failing.
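For example, using the pod name from earlier in the thread:

oc logs acumos-couchdb-couchdb-0 -c couchdb --previous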

@alwinmark commented:

No, it's just silently failing (exit code 1) on Rancher with PSPs enabled as well.
I guess this chart or the default container does not work without certain privileges or permissions.
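For reference, the container status below is read straight from the pod object; something like the following should reproduce it (pod name and namespace are inferred from the status message):

kubectl get pod couchdb-tischi-test-couchdb-0 -n connect -o yaml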

  - containerID: docker://41e114505ff6963276d07ae001be4cb4794e1b79532930c1aec8b51107304263
    image: couchdb:2.3.1
    imageID: docker-pullable://couchdb@sha256:da2d31cc06455d6fc12767c4947c6b58e97e8cda419ecbe054cc89ab48420afa
    lastState:
      terminated:
        containerID: docker://41e114505ff6963276d07ae001be4cb4794e1b79532930c1aec8b51107304263
        exitCode: 1
        finishedAt: 2020-01-30T12:09:42Z
        reason: Error
        startedAt: 2020-01-30T12:09:41Z
    name: couchdb
    ready: false
    restartCount: 2
    started: false
    state:
      waiting:
        message: back-off 20s restarting failed container=couchdb pod=couchdb-tischi-test-couchdb-0_connect(7af5e9ca-38b1-493b-9170-5a58da8c4b5c)
        reason: CrashLoopBackOff
  hostIP: 172.21.1.113
  initContainerStatuses:
  - containerID: docker://3be2b192ab8e92628082527f39aa7db417708c55fac2cb0cdf1823078a0e0988
    image: busybox:latest
    imageID: docker-pullable://busybox@sha256:6915be4043561d64e0ab0f8f098dc2ac48e077fe23f488ac24b665166898115a
    lastState: {}
    name: init-copy
    ready: true
    restartCount: 0
    state:
      terminated:
        containerID: docker://3be2b192ab8e92628082527f39aa7db417708c55fac2cb0cdf1823078a0e0988
        exitCode: 0
        finishedAt: 2020-01-30T12:09:29Z
        reason: Completed
        startedAt: 2020-01-30T12:09:29Z

Logs are empty even with --previous.

In order to reproduce, run a K8s cluster with the following PSP:

apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: restricted-psp
spec:
  allowPrivilegeEscalation: false
  fsGroup:
    ranges:
    - max: 65535
      min: 1
    rule: MustRunAs
  requiredDropCapabilities:
  - ALL
  runAsUser:
    rule: RunAsAny
  seLinux:
    rule: RunAsAny
  supplementalGroups:
    ranges:
    - max: 65535
      min: 1
    rule: MustRunAs
  volumes:
  - configMap
  - emptyDir
  - projected
  - secret
  - downwardAPI
  - persistentVolumeClaim

This PSP is the default in Rancher, and similar to what OKD applies when PSPs/SecurityContextConstraints are enabled.
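For pods to be admitted under such a policy, the pod's service account also needs RBAC permission to "use" the PSP. A minimal sketch of the required objects (names and namespace are illustrative; this is essentially what the fix referenced below adds):

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: couchdb-psp-use
rules:
- apiGroups:
  - policy
  resources:
  - podsecuritypolicies
  resourceNames:
  - restricted-psp
  verbs:
  - use
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: couchdb-psp-use
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: couchdb-psp-use
subjects:
- kind: ServiceAccount
  name: default
  namespace: couchdb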

@bondar-pavel commented:

Looks like I have the same issue; pods cannot be created because of the PSP:

$ sudo kubectl describe statefulset -n couchdb
...
Volume Claims:  <none>
Events:
  Type     Reason        Age                   From                    Message
  ----     ------        ----                  ----                    -------
  Warning  FailedCreate  8m23s (x19 over 30m)  statefulset-controller  create Pod vociferous-garfish-couchdb-0 in StatefulSet vociferous-garfish-couchdb failed error: pods "vociferous-garfish-couchdb-0" is forbidden: unable to validate against any pod security policy: []
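A quick way to check whether a given service account may use a PSP (the service account name is assumed; restricted-psp is the example policy above):

kubectl auth can-i use podsecuritypolicy/restricted-psp \
  --as=system:serviceaccount:couchdb:default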

bondar-pavel added a commit to bondar-pavel/couchdb-helm that referenced this issue Jun 4, 2020
Some environments enforce PodSecurityPolicy checks,
and deployment fails if PodSecurityPolicy, ClusterRole, and ClusterRoleBinding objects are not declared.

This commit adds PodSecurityPolicy, ClusterRole, and ClusterRoleBinding
objects, and adds a new configuration option, podSecurityPolicy, which is
disabled by default.

Related to apache#13
bondar-pavel mentioned this issue Jun 4, 2020
@bondar-pavel commented:

PR #30 resolves my issues with pod security policies:

create Pod vociferous-garfish-couchdb-0 in StatefulSet vociferous-garfish-couchdb failed error: pods "vociferous-garfish-couchdb-0" is forbidden: unable to validate against any pod security policy: []

@blsaws Could you please check if it resolves your issue as well?
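Assuming the option is exposed as a values key along the lines of podSecurityPolicy.create (a hypothetical name here; the actual key is whatever PR #30 defines), enabling it would look something like:

helm install --name acumos-couchdb --namespace acumos \
  --set podSecurityPolicy.create=true \
  --set service.type=NodePort --set allowAdminParty=true couchdb/couchdb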

@bondar-pavel commented:

Looks like my issue is different from the original one, since in my case the pods were not even created because they did not satisfy the policies on the cluster.
