Deploy job fails due to invalid metadata.labels being added to k8s resource
Summary
When creating a CRD instance from a GitLab deploy job (for a GitLab-managed k8s cluster), a label with an invalid value is added, which causes the CRD instance creation to fail. Note that this only happens when creating Argo CRD instances, but the issue is related to the GitLab environment rather than to Argo (I've filed a bug report with Argo here).
Steps to reproduce
- Add a managed k8s cluster to a GitLab.com group, and use this cluster in a GitLab.com project of that group.
- Install Argo on the cluster:

  ```shell
  kubectl apply -f https://raw.githubusercontent.com/argoproj/argo/master/manifests/install.yaml
  ```

- Give the GitLab service account for your project that will run the deploy job the RBAC permissions for creating Argo CRDs, e.g.

  ```shell
  kubectl create clusterrolebinding argo-server-cluster-role-myproject-gitlab --clusterrole=argo-server-cluster-role --serviceaccount=${KUBE_NAMESPACE}:${KUBE_NAMESPACE}-service-account
  ```

  where `KUBE_NAMESPACE` looks like `myproject-21001231-staging` for a GitLab.com project `myproject` with project id `21001231` and an environment named `staging`.
- Add a deploy job to `.gitlab-ci.yml` that creates a CRD instance via the Argo CLI tool (with an environment for the k8s cluster), for example:
  ```yaml
  deploy-staging-environment:
    stage: deploy
    script:
      - kubectl version
      - kubectl config get-contexts
      - kubectl config view
      - argo submit --watch https://raw.githubusercontent.com/argoproj/argo/master/examples/hello-world.yaml
    environment:
      name: staging
  ```
- Execute the deploy job
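Before running the deploy job, the RBAC binding from the steps above can be sanity-checked with `kubectl auth can-i`. This is only a sketch: the namespace and service-account names below are assumptions following GitLab's `<project>-<projectid>-<environment>` naming convention, so substitute your own values.

```shell
# Illustrative names following GitLab's naming convention; substitute your own.
NS=myproject-21001231-staging
SA="system:serviceaccount:${NS}:${NS}-service-account"
echo "checking ${SA}"

# Only attempt the live check when kubectl is available and configured;
# it prints "yes" when the clusterrolebinding grants the permission.
if command -v kubectl >/dev/null 2>&1; then
  kubectl auth can-i create workflows.argoproj.io --as="${SA}" -n "${NS}" || true
fi
```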
Example Project
I encountered this issue in a private project on GitLab.com; unfortunately I cannot share it.
What is the current bug behavior?
The deploy job fails because an invalid `metadata.labels` value is added. This only happens when the CRD instance is created from the GitLab environment service account; the failure has not occurred with other service accounts.
This is the error from the deploy job:
```shell
$ argo submit --watch https://raw.githubusercontent.com/argoproj/argo/master/examples/hello-world.yaml
time="2020-09-24T13:54:56+02:00" level=error msg="Create request is failed. Error: Workflow.argoproj.io \"hello-world-df5b8\" is invalid: metadata.labels: Invalid value: \"-21007791-staging-imagemanager-21007791-staging-service-account\": a valid label must be an empty string or consist of alphanumeric characters, '-', '_' or '.', and must start and end with an alphanumeric character (e.g. 'MyValue', or 'my_value', or '12345', regex used for validation is '(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?')"
2020/09/24 13:54:56 Failed to submit workflow: rpc error: code = InvalidArgument desc = Workflow.argoproj.io "hello-world-df5b8" is invalid: metadata.labels: Invalid value: "-21007791-staging-imagemanager-21007791-staging-service-account": a valid label must be an empty string or consist of alphanumeric characters, '-', '_' or '.', and must start and end with an alphanumeric character (e.g. 'MyValue', or 'my_value', or '12345', regex used for validation is '(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?')
ERROR: Job failed: exit status 1
```
I did not redact the project name, project id, and environment from my private GitLab.com project, as they are part of the invalid label value applied when creating this CRD from the GitLab service account. It looks as if the GitLab service account is missing a piece of information for the label value, specifically the project name (`imagemanager` in this case): the value starts with a `-` where the project name should be. Could this be because the cluster in use is a group-level cluster?
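The rejection itself can be reproduced locally by checking the label value from the error message against the validation regex quoted in the same message. This sketch uses only strings copied from the error output; it shows the value is rejected because it starts with `-` rather than an alphanumeric character:

```shell
# Label value and validation regex, both copied from the error message above.
value='-21007791-staging-imagemanager-21007791-staging-service-account'
regex='^(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?$'

# A valid label value must start and end with an alphanumeric character;
# this one starts with '-', so the check fails.
if printf '%s\n' "$value" | grep -Eq "$regex"; then
  echo "valid"
else
  echo "invalid"
fi
# Prints: invalid
```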
What is the expected correct behavior?
The service account applies a correctly formatted label and submitting the workflow succeeds. For comparison, this is from the same cluster, but using the default service account with my user's kubeconfig:
```
Name:                hello-world-tvrvx
Namespace:           default
ServiceAccount:      default
Status:              Succeeded
Conditions:
 Completed           True
Created:             Thu Sep 24 13:56:58 +0200 (1 minute ago)
Started:             Thu Sep 24 13:56:58 +0200 (1 minute ago)
Finished:            Thu Sep 24 13:57:04 +0200 (1 minute ago)
Duration:            6 seconds
ResourcesDuration:   2s*(1 cpu),2s*(100Mi memory)

STEP                  TEMPLATE  PODNAME            DURATION  MESSAGE
 ✔ hello-world-tvrvx  whalesay  hello-world-tvrvx  4s
```
Relevant logs and/or screenshots
```shell
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.5", GitCommit:"e6503f8d8f769ace2f338794c914a96fc335df0f", GitTreeState:"archive", BuildDate:"2020-09-03T15:34:56Z", GoVersion:"go1.15.1", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"17+", GitVersion:"v1.17.9-eks-4c6976", GitCommit:"4c6976793196d70bc5cd29d56ce5440c9473648e", GitTreeState:"clean", BuildDate:"2020-07-17T18:46:04Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}

$ kubectl config get-contexts
CURRENT   NAME            CLUSTER         AUTHINFO        NAMESPACE
*         gitlab-deploy   gitlab-deploy   gitlab-deploy   imagemanager-21007791-staging

$ kubectl config view
apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: DATA+OMITTED
    server: https://xxx.eu-west-1.eks.amazonaws.com
  name: gitlab-deploy
contexts:
- context:
    cluster: gitlab-deploy
    namespace: imagemanager-21007791-staging
    user: gitlab-deploy
  name: gitlab-deploy
current-context: gitlab-deploy
kind: Config
preferences: {}
users:
- name: gitlab-deploy
  user:
    token: [MASKED]
```
Output of checks
This bug happens on GitLab.com
Results of GitLab environment info
This bug happens on GitLab.com
Results of GitLab application Check
This bug happens on GitLab.com