
Unwanted (and invalid) metadata.label when creating wftmpl via argo template create #4058

Closed
fvdnabee opened this issue Sep 17, 2020 · 12 comments · Fixed by #4643
@fvdnabee (Contributor)

Summary

When creating templates via argo template create, a label that is not in the YAML file is added to the WorkflowTemplate. Because the label is in an invalid format, creation of the WorkflowTemplate fails. Using kubectl apply -f does succeed in creating the templates. I have only noticed this issue inside a gitlab-runner process (via a GitLab deploy job), so it is likely due to the combination of the gitlab-runner environment and argo template create.

GitLab sets up a specific k8s environment in the gitlab-runner and (used to) use labels for tracking the k8s resources that it creates. Likely the label is added to track that these templates were created via a GitLab deployment. Still, I find it odd that in the same environment kubectl is able to create the templates whereas argo is not. It is not clear to me how and where these labels are being added. Is this configured on the client side or on the server, and how could I check? (I realize this is a GitLab-specific question, not an Argo one.) Any pointers on how to debug this issue further are welcome.

Diagnostics

$ argo template list -n default
NAME
$ kubectl apply -f https://raw.githubusercontent.com/argoproj/argo/master/examples/workflow-template/templates.yaml -n default
workflowtemplate.argoproj.io/workflow-template-whalesay-template created
workflowtemplate.argoproj.io/workflow-template-random-fail-template created
workflowtemplate.argoproj.io/workflow-template-inner-steps created
workflowtemplate.argoproj.io/workflow-template-inner-dag created
workflowtemplate.argoproj.io/workflow-template-submittable created
$ argo template delete --all -n default
WorkflowTemplate 'workflow-template-inner-dag' deleted
WorkflowTemplate 'workflow-template-inner-steps' deleted
WorkflowTemplate 'workflow-template-random-fail-template' deleted
WorkflowTemplate 'workflow-template-submittable' deleted
WorkflowTemplate 'workflow-template-whalesay-template' deleted
$ argo template create https://raw.githubusercontent.com/argoproj/argo/master/examples/workflow-template/templates.yaml -n default -v
time="2020-09-17T14:05:18+02:00" level=debug msg="CLI version" version="{v2.10.1 2020-09-02T22:53:49Z 854444e47ac00d146cb83d174049bfbb2066bfb2 v2.10.1 clean go1.13.4 gc linux/amd64}"
time="2020-09-17T14:05:18+02:00" level=debug msg="Client options" opts="{{ false false}  0x17084a0 <nil> 0x17084f0}"
time="2020-09-17T14:05:19+02:00" level=debug msg="Getting the template by name"
time="2020-09-17T14:05:19+02:00" level=debug msg="Resolving the template" base="*v1alpha1.Workflow (namespace=,name=)" depth=0 tmpl="*v1alpha1.WorkflowStep (whalesay-template)"
time="2020-09-17T14:05:19+02:00" level=debug msg="Getting the template" base="*v1alpha1.Workflow (namespace=,name=)" depth=0 tmpl="*v1alpha1.WorkflowStep (whalesay-template)"
time="2020-09-17T14:05:19+02:00" level=debug msg="Getting the template by name" base="*v1alpha1.Workflow (namespace=,name=)" depth=0 tmpl="*v1alpha1.WorkflowStep (whalesay-template)"
time="2020-09-17T14:05:19+02:00" level=debug msg="Getting the template by name" base="*v1alpha1.Workflow (namespace=,name=)" depth=0 tmpl="*v1alpha1.WorkflowStep (whalesay-template)"
time="2020-09-17T14:05:19+02:00" level=debug msg="Resolving the template" base="*v1alpha1.Workflow (namespace=,name=)" depth=0 tmpl="*v1alpha1.WorkflowStep (whalesay-template)"
time="2020-09-17T14:05:19+02:00" level=debug msg="Getting the template" base="*v1alpha1.Workflow (namespace=,name=)" depth=0 tmpl="*v1alpha1.WorkflowStep (whalesay-template)"
time="2020-09-17T14:05:19+02:00" level=debug msg="Getting the template by name" base="*v1alpha1.Workflow (namespace=,name=)" depth=0 tmpl="*v1alpha1.WorkflowStep (whalesay-template)"
2020/09/17 14:05:19 Failed to create workflow template: rpc error: code = InvalidArgument desc = WorkflowTemplate.argoproj.io "workflow-template-whalesay-template" is invalid: metadata.labels: Invalid value: "-21007791-staging-imagemanager-21007791-staging-service-account": a valid label must be an empty string or consist of alphanumeric characters, '-', '_' or '.', and must start and end with an alphanumeric character (e.g. 'MyValue',  or 'my_value',  or '12345', regex used for validation is '(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?')
ERROR: Job failed: exit status 1
FATAL: exit status 1
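For reference, the rule the API server is enforcing here can be checked locally. A minimal Go sketch: the regex is copied verbatim from the error message above, and the 63-character limit is the Kubernetes label value length limit (the helper names are mine, for illustration only):

```go
package main

import (
	"fmt"
	"regexp"
)

// labelValueRe is the validation regex quoted in the error message:
// empty, or alphanumerics with '-', '_' and '.' allowed in the middle,
// starting and ending with an alphanumeric character.
var labelValueRe = regexp.MustCompile(`^(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?$`)

// isValidLabelValue reports whether v would pass Kubernetes label
// value validation (syntax plus the 63-character length limit).
func isValidLabelValue(v string) bool {
	return len(v) <= 63 && labelValueRe.MatchString(v)
}

func main() {
	// The rejected value starts with '-', so it fails validation.
	fmt.Println(isValidLabelValue("-21007791-staging-imagemanager-21007791-staging-service-account")) // false
	fmt.Println(isValidLabelValue("my_value"))                                                        // true
}
```

This confirms the failure is purely syntactic: the injected value begins with a '-', which is not allowed at either end of a label value.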
❯ argo version
argo: v2.10.1
  BuildDate: 2020-09-02T22:53:49Z
  GitCommit: 854444e47ac00d146cb83d174049bfbb2066bfb2
  GitTreeState: clean
  GitTag: v2.10.1
  GoVersion: go1.13.4
  Compiler: gc
  Platform: linux/amd64
Paste the workflow here, including status:
Just using templates from https://raw.githubusercontent.com/argoproj/argo/master/examples/workflow-template/templates.yaml
Paste the logs from the workflow controller:
Nothing relevant in status messages, as creation fails already in k8s.

Message from the maintainers:

Impacted by this bug? Give it a 👍. We prioritise the issues with the most 👍.

@fvdnabee fvdnabee changed the title argo cli inserts unwanted (and invalid) metadata.label when creating wftmpl Unwanted (and invalid) metadata.label when creating wftmpl via argo template create Sep 17, 2020
@simster7 (Member)

When creating the templates with the CLI, it could either create them through an API request to the argo-server instance that you have running or directly from the CLI binary itself. Do you know if you have your CLI configured to use the argo-server? (i.e., by having the ARGO_SERVER env variable set or running commands with the -s flag)

@fvdnabee (Author)

@simster7 there are no environment variables prefixed with ARGO_ in the environment calling argo, and the -s flag is not being passed when invoking the argo CLI binary. The argo-server pod log doesn't mention anything interesting either (just Go GC stats being logged). My guess would be it's not calling argo-server. I looked at strace output but just saw TLS traffic going over what I assume to be a proxy. What would be the alternative way that argo creates WorkflowTemplates? By creating the CRD instances directly? Can I find a (debug) log of these calls somewhere, perhaps in k8s itself? My gut tells me GitLab has set env variables (or maybe configured the k8s cluster?) to automatically add labels to newly created resources, though why they are added only via argo template create and not with any kubectl apply -f call, I can't say.

@jessesuen (Member)

Though why they are being added only via argo template create and not with any kubectl apply -f call I can't say.

-21007791-staging-imagemanager-21007791-staging-service-account must be because of some type of admission webhook.

The difference is that kubectl apply does a PATCH (if the resource already exists) whereas argo template create does a POST (Create). These could take different code paths through a mutating webhook, which I think explains the behavior difference.

In any case, we strongly believe this to be an environment specific issue.


fvdnabee commented Sep 23, 2020

I have managed to narrow this down to the service account being used when calling argo template create: specifying a bearer token for the serviceaccount managed by GitLab triggers the issue. Using kubectl create to submit workflows (which also does a POST: https://xxx.eks.amazonaws.com/apis/argoproj.io/v1alpha1/namespaces/imagemanager-21007791-staging/workflows) does not lead to the label being added. argo submit with that specific service account also leads to an invalid label being added and the workflow failing to be created.

I'm trying to figure out how that specific service account was set up to add metadata.labels to workflow definitions. It's a service account that is managed by GitLab (we use the GitLab k8s deployment integration). I've read that you can extend the spec of pods created by a service account by editing the SA spec. However, there is nothing suspicious in either the SA or the secret that it uses:

❯ kubectl get sa -n imagemanager-21007791-staging imagemanager-21007791-staging-service-account -o  yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  creationTimestamp: "2020-09-16T12:34:08Z"
  name: imagemanager-21007791-staging-service-account
  namespace: imagemanager-21007791-staging
  resourceVersion: "266806"
  selfLink: /api/v1/namespaces/imagemanager-21007791-staging/serviceaccounts/imagemanager-21007791-staging-service-account
  uid: 9a5413e6-9a31-4b78-b03d-e0f12c9fab02
secrets:
- name: imagemanager-21007791-staging-service-account-token-vfn7w
❯ kubectl get secrets -n imagemanager-21007791-staging imagemanager-21007791-staging-service-account-token-vfn7w -o yaml
apiVersion: v1
data:
  ca.crt: masked
  namespace: aW1hZ2VtYW5hZ2VyLTIxMDA3NzkxLXN0YWdpbmc=
  token: masked
kind: Secret
metadata:
  annotations:
    kubernetes.io/service-account.name: imagemanager-21007791-staging-service-account
    kubernetes.io/service-account.uid: 9a5413e6-9a31-4b78-b03d-e0f12c9fab02
  creationTimestamp: "2020-09-16T12:34:08Z"
  name: imagemanager-21007791-staging-service-account-token-vfn7w
  namespace: imagemanager-21007791-staging
  resourceVersion: "266805"
  selfLink: /api/v1/namespaces/imagemanager-21007791-staging/secrets/imagemanager-21007791-staging-service-account-token-vfn7w
  uid: e6fb73de-1f7f-49c8-9d93-a29fcd886d95
type: kubernetes.io/service-account-token

The token in the secret has an arcane format: parts separated by dots, which include (apart from what I assume to be the actual token) some metadata. I think this metadata links the token to the SA. It looks like this (after base64-decoding parts of the token):

{"alg":"RS256","kid":"xxx"}
{"iss":"kubernetes/serviceaccount","kubernetes.io/serviceaccount/namespace":"imagemanager-21007791-staging","kubernetes.io/serviceaccount/secret.name":"imagemanager-21007791-staging-service-account-token-vfn7w","kubernetes.io/serviceaccount/service-account.name":"imagemanager-21007791-staging-service-account","kubernetes.io/serviceaccount/service-account.uid":"9a5413e6-9a31-4b78-b03d-e0f12c9fab02","sub":"system:serviceaccount:imagemanager-21007791-staging:imagemanager-21007791-staging-service-account"}
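That dot-separated format is a JSON Web Token (JWT): header.payload.signature, with each segment base64url-encoded without padding. A small Go sketch for decoding the first two segments of such a token (the function name is mine, and the token built in main is fabricated for illustration; a real one comes from the secret's token field):

```go
package main

import (
	"encoding/base64"
	"fmt"
	"strings"
)

// decodeJWTSegments splits a JWT into its three dot-separated segments
// and base64url-decodes the header and payload (the signature is left
// encoded, since it is binary data).
func decodeJWTSegments(token string) (header, payload string, err error) {
	parts := strings.Split(token, ".")
	if len(parts) != 3 {
		return "", "", fmt.Errorf("expected 3 segments, got %d", len(parts))
	}
	h, err := base64.RawURLEncoding.DecodeString(parts[0])
	if err != nil {
		return "", "", fmt.Errorf("decoding header: %w", err)
	}
	p, err := base64.RawURLEncoding.DecodeString(parts[1])
	if err != nil {
		return "", "", fmt.Errorf("decoding payload: %w", err)
	}
	return string(h), string(p), nil
}

func main() {
	// Fabricated token for illustration only.
	enc := base64.RawURLEncoding.EncodeToString
	tok := enc([]byte(`{"alg":"RS256"}`)) + "." + enc([]byte(`{"iss":"kubernetes/serviceaccount"}`)) + ".sig"
	h, p, err := decodeJWTSegments(tok)
	fmt.Println(h, p, err)
}
```

Note the RawURLEncoding: JWT segments use the URL-safe alphabet and drop the '=' padding, so plain base64 -d can choke on them.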

From all of this I haven't figured out how the labels are added for the imagemanager-21007791-staging-service-account via the argo CLI. Any helpful pointers or input are welcome.

@jessesuen I looked into the web hooks, but to me it appears that there are only the web hooks from AWS:

❯ kubectl get mutatingwebhookconfigurations
NAME                            CREATED AT
pod-identity-webhook            2020-09-15T15:50:45Z
vpc-resource-mutating-webhook   2020-09-15T15:50:47Z
❯ kubectl get validatingwebhookconfigurations
NAME                              CREATED AT
vpc-resource-validating-webhook   2020-09-15T15:50:47Z

@fvdnabee (Author)

I noticed the same error when creating the default workflow (named wonderful-python in the screenshot here) from the Argo UI (via Submit New Workflow) when using the GitLab-managed service account. Login was done via client authentication using argo --token=xxx auth token for the GitLab SA:
(screenshot: the same invalid metadata.labels error shown in the UI)

Submitting the same workflow from my aws eks user works fine.

@alexec alexec added the invalid label Sep 23, 2020

alexec commented Sep 23, 2020

Marking as invalid as this does not appear to be caused by Argo Workflows.

@fvdnabee (Author)

Note I filed a bug report on gitlab.com's EE tracker here.


stale bot commented Nov 23, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix label Nov 23, 2020
@stale stale bot closed this as completed Nov 30, 2020

gturbat commented Dec 3, 2020

Hi, we have the same error when doing argo submit from a pod within the cluster. It happens when the label workflows.argoproj.io/creator is longer than 63 characters and the truncation leaves it starting with a '-'.

To reproduce it, you can use a Namespace example-of-wrong-length-submit and a ServiceAccount workflow-manager.

When submitting any template from within a Pod using this service account, you should get the error:

Failed to create workflow template: rpc error: code = InvalidArgument desc = WorkflowTemplate.argoproj.io "workflow-template-whalesay-template" is invalid: metadata.labels: Invalid value: "-serviceaccount-example-of-wrong-length-submit-workflow-manager": a valid label must be an empty string or consist of alphanumeric characters, '-', '_' or '.', and must start and end with an alphanumeric character (e.g. 'MyValue', or 'my_value', or '12345', regex used for validation is '(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?')

It would be great to truncate the creator label into a valid string before Kubernetes rejects it, or to give us the option to disable this label.
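One possible shape for such a fix, sketched in Go (a hypothetical helper, not Argo's actual implementation): truncate to 63 characters, then strip any leading or trailing characters that are not allowed at a label value's edges.

```go
package main

import (
	"fmt"
	"regexp"
)

// invalidEdges matches runs of characters that are not allowed at the
// start or end of a Kubernetes label value (only alphanumerics may
// appear there).
var invalidEdges = regexp.MustCompile(`^[^A-Za-z0-9]+|[^A-Za-z0-9]+$`)

// sanitizeLabelValue truncates v to the 63-character label limit and
// trims invalid edge characters so the result passes validation.
// Hypothetical helper, not Argo's actual fix.
func sanitizeLabelValue(v string) string {
	if len(v) > 63 {
		v = v[:63]
	}
	return invalidEdges.ReplaceAllString(v, "")
}

func main() {
	// The truncated creator value from the error above becomes valid
	// once the leading '-' is trimmed.
	fmt.Println(sanitizeLabelValue("-serviceaccount-example-of-wrong-length-submit-workflow-manager"))
}
```

This only handles edge characters; a full fix would also need to replace any characters that are invalid in the middle of a label value.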


alexec commented Dec 3, 2020

This bug was fixed in v2.11.


gturbat commented Dec 3, 2020

@alexec we are having this bug on v2.11.7


alexec commented Dec 3, 2020

Would you be interested in submitting a PR? Should be easy to do:

https://github.com/argoproj/argo/blob/f7e85f04b11fd65e45b9408d5413be3bbb95e5cb/workflow/creator/creator.go#L19

@alexec alexec reopened this Dec 3, 2020
@stale stale bot removed the wontfix label Dec 3, 2020
alexec added a commit to alexec/argo-workflows that referenced this issue Dec 3, 2020
alexec added a commit that referenced this issue Dec 4, 2020
alexec added a commit that referenced this issue Dec 9, 2020