
CouchDB pods perpetually crashing under OpenShift #13

Open
blsaws opened this issue Nov 15, 2019 · 9 comments

blsaws commented Nov 15, 2019

Describe the bug
CouchDB pods are continuously crashing under OpenShift.

Version of Helm and Kubernetes:
Helm
$ helm version
Client: &version.Version{SemVer:"v2.12.3", GitCommit:"eecf22f77df5f65c823aacd2dbd30ae6c65f186e", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.12.3", GitCommit:"eecf22f77df5f65c823aacd2dbd30ae6c65f186e", GitTreeState:"clean"}

OpenShift
$ oc version
oc v3.11.0+0cbc58b
kubernetes v1.11.0+d4cacc0
features: Basic-Auth GSSAPI Kerberos SPNEGO
Server https://127.0.0.1:8443
kubernetes v1.11.0+d4cacc0

What happened:
Deployed the CouchDB Helm chart, and the pods are continually crashing.
Deployment commands:
helm repo add couchdb https://apache.github.io/couchdb-helm
helm install --name acumos-couchdb --namespace acumos \
  --set service.type=NodePort --set allowAdminParty=true couchdb/couchdb
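(Note: these commands use Helm 2 syntax. With Helm 3, which drops the --name flag, a rough equivalent would be:

helm repo add couchdb https://apache.github.io/couchdb-helm
helm install acumos-couchdb couchdb/couchdb --namespace acumos --create-namespace \
  --set service.type=NodePort --set allowAdminParty=true

where --create-namespace is only needed if the namespace does not already exist.)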

What you expected to happen:
CouchDB pods should become ready, as they do under generic Kubernetes.

How to reproduce it (as minimally and precisely as possible):

  1. Install OpenShift Origin 3.11
  2. Set up other cluster/namespace prerequisites, e.g. create the namespace used in the example above.
  3. Install the CouchDB Helm chart, as above.

Anything else we need to know:

@willholley (Member) commented:

I don't think this chart has been tested under OpenShift; it's difficult to speculate on the cause of the problem without more detail from the pod logs.

That said, I'd recommend using the CouchDB Operator instead of the Helm chart for OpenShift / OKD deployments.


blsaws commented Nov 21, 2019

Here are the logs from the init-copy containers (they are crashing), and the output of describe pods:
couchdb-openshift-crash.txt

My goal is, where possible, to use a consistent set of upstream tools to deploy supplemental components (e.g. MariaDB, Nexus, ELK, JupyterHub, NiFi, Jenkins, ...). This reduces maintenance effort and UX variation across k8s environments. But I will take a look at the Operator. In the meantime, if you have any suggestions on the reason for the crash, I would appreciate it, since the logs really don't tell me anything.

@willholley (Member) commented:

@blsaws those logs look to be from the init-copy container, which succeeded. Can you get the logs from the couchdb container: oc logs acumos-couchdb-couchdb-0 -c couchdb?


blsaws commented Nov 25, 2019

Nothing is returned from the logs:
root@77f48ec29783:/# oc logs acumos-couchdb-couchdb-0 -c couchdb
root@77f48ec29783:/#

@willholley (Member) commented:

@blsaws you might need to use the --previous flag to get the logs of the crashed container. See https://kubernetes.io/docs/tasks/debug-application-cluster/debug-pod-replication-controller/#my-pod-is-crashing-or-otherwise-unhealthy. At the moment, I'm afraid I don't have enough information to provide any guidance as to why it might be failing.
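For example, using the pod name from earlier in the thread:

oc logs acumos-couchdb-couchdb-0 -c couchdb --previous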

@alwinmark commented:

No, it's just silently failing (exit code 1) on Rancher with PSPs enabled as well.
I guess this chart or the default container does not work without certain privileges or permissions.
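For reference, the container status below is read straight from the pod object; something like the following should reproduce it (pod name and namespace are inferred from the status message):

kubectl get pod couchdb-tischi-test-couchdb-0 -n connect -o yaml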

  - containerID: docker://41e114505ff6963276d07ae001be4cb4794e1b79532930c1aec8b51107304263
    image: couchdb:2.3.1
    imageID: docker-pullable://couchdb@sha256:da2d31cc06455d6fc12767c4947c6b58e97e8cda419ecbe054cc89ab48420afa
    lastState:
      terminated:
        containerID: docker://41e114505ff6963276d07ae001be4cb4794e1b79532930c1aec8b51107304263
        exitCode: 1
        finishedAt: 2020-01-30T12:09:42Z
        reason: Error
        startedAt: 2020-01-30T12:09:41Z
    name: couchdb
    ready: false
    restartCount: 2
    started: false
    state:
      waiting:
        message: back-off 20s restarting failed container=couchdb pod=couchdb-tischi-test-couchdb-0_connect(7af5e9ca-38b1-493b-9170-5a58da8c4b5c)
        reason: CrashLoopBackOff
  hostIP: 172.21.1.113
  initContainerStatuses:
  - containerID: docker://3be2b192ab8e92628082527f39aa7db417708c55fac2cb0cdf1823078a0e0988
    image: busybox:latest
    imageID: docker-pullable://busybox@sha256:6915be4043561d64e0ab0f8f098dc2ac48e077fe23f488ac24b665166898115a
    lastState: {}
    name: init-copy
    ready: true
    restartCount: 0
    state:
      terminated:
        containerID: docker://3be2b192ab8e92628082527f39aa7db417708c55fac2cb0cdf1823078a0e0988
        exitCode: 0
        finishedAt: 2020-01-30T12:09:29Z
        reason: Completed
        startedAt: 2020-01-30T12:09:29Z

Logs are empty even with --previous.

In order to reproduce, run a K8s cluster with the following PSP:

apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: restricted-psp
spec:
  allowPrivilegeEscalation: false
  fsGroup:
    ranges:
    - max: 65535
      min: 1
    rule: MustRunAs
  requiredDropCapabilities:
  - ALL
  runAsUser:
    rule: RunAsAny
  seLinux:
    rule: RunAsAny
  supplementalGroups:
    ranges:
    - max: 65535
      min: 1
    rule: MustRunAs
  volumes:
  - configMap
  - emptyDir
  - projected
  - secret
  - downwardAPI
  - persistentVolumeClaim

This PSP is the default in Rancher, and similar to what OKD applies when PSPs/SecurityContextConstraints are enabled.
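For pods to be admitted under such a policy, the pod's service account also needs RBAC permission to "use" the PSP. A minimal sketch of the required objects (names and namespace are illustrative; this is essentially what the fix referenced below adds):

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: couchdb-psp-use
rules:
- apiGroups:
  - policy
  resources:
  - podsecuritypolicies
  resourceNames:
  - restricted-psp
  verbs:
  - use
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: couchdb-psp-use
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: couchdb-psp-use
subjects:
- kind: ServiceAccount
  name: default
  namespace: couchdb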

@bondar-pavel commented:

Looks like I have the same issue; pods cannot be created because of the PSP:

$ sudo kubectl describe statefulset -n couchdb
...
Volume Claims:  <none>
Events:
  Type     Reason        Age                   From                    Message
  ----     ------        ----                  ----                    -------
  Warning  FailedCreate  8m23s (x19 over 30m)  statefulset-controller  create Pod vociferous-garfish-couchdb-0 in StatefulSet vociferous-garfish-couchdb failed error: pods "vociferous-garfish-couchdb-0" is forbidden: unable to validate against any pod security policy: []
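A quick way to check whether a given service account may use a PSP (the service account name is assumed; restricted-psp is the example policy above):

kubectl auth can-i use podsecuritypolicy/restricted-psp \
  --as=system:serviceaccount:couchdb:default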

bondar-pavel added a commit to bondar-pavel/couchdb-helm that referenced this issue Jun 4, 2020
Some environments enforce PodSecurityPolicy checks,
and deployment fails if PodSecurityPolicy, ClusterRole, and ClusterRoleBinding objects are not declared.

This commit adds PodSecurityPolicy, ClusterRole, and ClusterRoleBinding
objects, and adds a new configuration option, podSecurityPolicy, which is
disabled by default.

Related to apache#13
bondar-pavel mentioned this issue Jun 4, 2020
@bondar-pavel commented:

PR #30 resolves my issues with pod security policies:

create Pod vociferous-garfish-couchdb-0 in StatefulSet vociferous-garfish-couchdb failed error: pods "vociferous-garfish-couchdb-0" is forbidden: unable to validate against any pod security policy: []

@blsaws Could you please check if it resolves your issue as well?
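Assuming the option is exposed as a values key along the lines of podSecurityPolicy.create (a hypothetical name here; the actual key is whatever PR #30 defines), enabling it would look something like:

helm install --name acumos-couchdb --namespace acumos \
  --set podSecurityPolicy.create=true \
  --set service.type=NodePort --set allowAdminParty=true couchdb/couchdb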

@bondar-pavel commented:

Looks like my issue is different from the original one, since in my case the pods were not even created because they did not satisfy the policies on the cluster.
