This repository has been archived by the owner on Jul 26, 2022. It is now read-only.

Question: pod not using the role defined on the ServiceAccount (IRSA) when fetching secrets (defaults to the EC2 node's role) #452

Closed
Bhashit opened this issue Jul 25, 2020 · 28 comments
Labels
aws, irsa, question (Further information is requested)

Comments

@Bhashit

Bhashit commented Jul 25, 2020

This is more of a question than an issue.

I am trying to integrate the kubernetes-external-secrets (KES) into EKS based kubernetes cluster. I was trying to use IRSA (IAM Roles For Service Accounts) to add authentication.

I have added the role-arn (that's attached to a policy that provides access to the secrets-manager) to the ServiceAccount for KES. I have verified that assuming this role using the following command allows the secrets to be fetched (I ran another container with AWS-CLI in it and had it attached to the same ServiceAccount I am using for KES):

aws sts assume-role-with-web-identity \
--role-arn $AWS_ROLE_ARN \
--role-session-name mh9test \
--web-identity-token file://$AWS_WEB_IDENTITY_TOKEN_FILE \
--duration-seconds 1000

This works correctly, and allows me to fetch secrets correctly (in that other container I ran).

However, the KES pod doesn't assume the correct role. It assumes the role that's attached to the EC2 instance on which it is running, even though AWS_ROLE_ARN is set as an env var.

My main question is: why am I required to put the roleArn within the ExternalSecret if I have already specified the role-arn as an annotation on the service-account (which does get added to the pod as the AWS_ROLE_ARN env var)? Shouldn't it fall back to using the role that's specified for the ServiceAccount as a default?
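For reference, this is roughly how I confirmed that the IRSA webhook injects those variables into the KES pod (the deployment name below is just a placeholder for whatever your KES release is called):

kubectl exec deploy/kubernetes-external-secrets -- printenv | grep '^AWS_'
# Expect AWS_ROLE_ARN and AWS_WEB_IDENTITY_TOKEN_FILE to be listed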

It's certainly possible I'm misunderstanding things though. Also, if it turns out that what I was expecting should be the default behavior, I'd be happy to submit changes to the code.

@Bhashit Bhashit changed the title Pod not using IRSA when fetching secrets Pod not using the role defined on the ServiceAccount (IRSA) when fetching secrets (defaults to the EC2 node's role) Jul 25, 2020
@Bhashit Bhashit changed the title Pod not using the role defined on the ServiceAccount (IRSA) when fetching secrets (defaults to the EC2 node's role) Question: pod not using the role defined on the ServiceAccount (IRSA) when fetching secrets (defaults to the EC2 node's role) Jul 25, 2020
@Flydiverny Flydiverny added the aws and question labels Jul 26, 2020
@Flydiverny
Member

Currently there are some issues with IRSA usage due to #442, but this sounds like a separate problem.

Your assumption is correct: the KES pod's role should be used, and there shouldn't be a need to specify a roleArn, unless of course one wants to assume another role for that specific secret.

@Bhashit
Author

Bhashit commented Jul 26, 2020

Looking at https://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/TokenFileWebIdentityCredentials.html, it seems like it could be either an AWS SDK version issue (I haven't checked the versions yet) or a bug in the AWS SDK:

it [sdk] will also read IAM role to be assumed from AWS_ROLE_ARN environment variable

EDIT

I ran the following commands in an interactive console on the KES pod (as outlined in aws/aws-sdk-js#3090 (comment)):

const aws = require('aws-sdk');
const sts = new aws.STS({region: "us-east-2"});
sts.getCallerIdentity({}, function (error, data) { console.log(data); });

And it assumes the correct identity.

@riccardomc
Member

I believe that in your test you are running your pod as the root user, while the kubernetes-external-secrets pod runs as non-root by default. This is an important difference, because in the "Pod Configuration" section of the IRSA AWS documentation you can read:

By default, only containers that run as root have the proper file system permissions to read the web identity token file. You can provide these permissions by having your containers run as root, or by providing the following security context for the containers in your manifest. The fsGroup ID is arbitrary, and you can choose any valid group ID.

Therefore you should try setting securityContext.fsGroup for your pod. The helm chart of this project already supports this.
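For example, with the helm chart it would look something like this (the release and chart names here are just the usual ones from the install instructions; adjust to your setup):

helm upgrade --install external-secrets external-secrets/kubernetes-external-secrets \
  --set securityContext.fsGroup=65534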

As a side note... I believe this is a rather obscure and confusing mechanism. People in AWS seem to think the same, see this comment.

@Bhashit
Author

Bhashit commented Jul 27, 2020

The pod's already running with securityContext.fsGroup set to 65534.

Also, the test pod's not running as the root user.

@tymokvo

tymokvo commented Aug 3, 2020

Also experiencing this with kubernetes-external-secrets-5.1.0 when applied with kubectl instead of the Helm chart.

The env of the Pod shows the expected role ARN from the ServiceAccount, but using the commands from this comment in a REPL in the external-secrets container returns the role of the EKS node on which the Pod is running. process.env.AWS_ROLE_ARN also returns the IAM role of the ServiceAccount. securityContext.fsGroup is already set to 65534.

@Flydiverny
Member

@tymokvo just to confirm, you are saying the REPL commands do not return your expected role (i.e. the service account role)?

@tymokvo

tymokvo commented Aug 3, 2020

Yes that's correct. It returns the role of the node.
(edit)
And, just to reiterate, I deployed this by exporting the helm chart from the repo into individual descriptors before following the directions for this in this repo's readme, if that matters.

@tbondarchuk

Had the same issue; fixed it by adding this to the values:

securityContext:
  runAsNonRoot: true
  fsGroup: 65534

Worth adding a note to the readme perhaps, under "2. IAM roles for service accounts"? The comment in values.yaml is quite easy to miss, especially for "still learning" k8s noobs like me :)

@ordonezf

ordonezf commented Aug 6, 2020

I'm having the same issue as @tymokvo (also installing with kubectl instead of helm install): the service account has a role set and I can see it when I run env | grep AWS inside the pod, but if I run the REPL commands I get the role from the EKS node. Tried it on 4.2.0 and 5.1.0, same results.

Already tried adding the securityContext like @aliusmiles said but that didn't work.

This is part of the error KES throws:
"message":"User: arn:aws:sts::ACCOUNT-ID:assumed-role/EKS-NODE-ROLE is not authorized to perform: secretsmanager:GetSecretValue on resource: arn:aws:secretsmanager:us-east-1:ACCOUNT-ID:secret:MY-SECRET"

@Flydiverny
Member

Please provide more details on your environment, EKS version at least :)

When running in the pod, please verify that you can read the file pointed to by the AWS_WEB_IDENTITY_TOKEN_FILE env var.
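For example, something like this (the pod name is a placeholder, and it assumes the image ships a shell):

kubectl exec -it <kes-pod> -- sh -c 'ls -l $AWS_WEB_IDENTITY_TOKEN_FILE && head -c 40 $AWS_WEB_IDENTITY_TOKEN_FILE'
# A "Permission denied" here usually means securityContext.fsGroup is not set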

@tymokvo

tymokvo commented Aug 6, 2020

This is our EKS cluster version:

GitVersion:"v1.16.8-eks-e16311", GitCommit:"e163110a04dcb2f39c3325af96d019b4925419eb"

Could this be a problem with the node's IAM role? I also see cannot perform sts:AssumeRoleWithWebIdentity if I try to set a roleArn for the ExternalSecret resource.

@ordonezf

ordonezf commented Aug 6, 2020

@Flydiverny My bad!
EKS version: v1.17.6-eks-4e7f64

Followed this to create the policy and the role for the service account: https://docs.aws.amazon.com/eks/latest/userguide/create-service-account-iam-policy-and-role.html

When running in the pod please verify if you can read the file set in the AWS env

I can read it if securityContext.fsGroup: 65534 is enabled in the deployment; otherwise it says Permission denied.

@Flydiverny
Member

Could this be a problem with the node's IAM role? I also see cannot perform sts:AssumeRoleWithWebIdentity if I try to set a roleArn for the ExternalSecret resource.

For the AssumeRoleWithWebIdentity issue, we made a mistake with a change introduced in 4.1.0; the revert is merged but not released (I cannot cut releases, only godaddy guys can 🤷). So for that case please stick to 4.0.0 if you can for now.

I can read it if SecurityContext: fsGroup: 65534 is enabled in the deployment, otherwise it says Permission denied.

Aight, yeah, the fsGroup needs to be set to fix the file permissions. This should allow external-secrets to pick up the token file, and then the AWS SDK's default credential chain should detect the IAM-roles-for-service-accounts token. The EC2 node role that gets used instead is the last fallback in the credential chain.

Will make some tests on my EKS cluster.
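In the meantime, a quick way to sanity-check which role the service account actually resolves to, independent of KES (the image and service account name below are assumptions; adjust to yours):

kubectl run irsa-check --rm -it --restart=Never \
  --image=amazon/aws-cli \
  --overrides='{"spec":{"serviceAccountName":"external-secrets-kubernetes-external-secrets"}}' \
  -- sts get-caller-identity
# If this prints the node role instead of the annotated role, the annotation or trust policy is off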

@Flydiverny
Member

I get the same results as I got on Fargate #442 (comment)

My helm value overrides:

env:
  AWS_REGION: eu-west-1        
  AWS_DEFAULT_REGION: eu-west-1
  DISABLE_POLLING: true        
  LOG_LEVEL: debug

serviceAccount:
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::111111111111:role/eksctl-fargate-test-kes-iamserviceaccount-role

image:
  tag: 4.0.0

securityContext:
  fsGroup: 65534

Installing using helm 3 (using chart version 5.0.0):

helm install external-secrets external-secrets/kubernetes-external-secrets -f values.yaml

Printing AWS envs in container

❯ k exec -it external-secrets-kubernetes-external-secrets-db59955b-gc2fj printenv | grep AWS_

AWS_DEFAULT_REGION=eu-west-1
AWS_ROLE_ARN=arn:aws:iam::111111111111:role/eksctl-fargate-test-kes-iamserviceaccount-role
AWS_WEB_IDENTITY_TOKEN_FILE=/var/run/secrets/eks.amazonaws.com/serviceaccount/token
AWS_REGION=eu-west-1
❯ kubectl version --short
Client Version: v1.18.6
Server Version: v1.16.8-eks-fd1ea7
ExternalSecret used for testing:

apiVersion: 'kubernetes-client.io/v1'
kind: ExternalSecret
metadata:
  name: hello-service-with-role
spec:
  backendType: secretsManager
  roleArn: arn:aws:iam::111111111111:role/direct-role
  data:
    - key: hello-service/password
      name: password

@tymokvo

tymokvo commented Aug 6, 2020

Ok, thank you for the feedback, I will try rolling back to 4.0.0 and test.

@ordonezf

@Flydiverny A few updates from our side:
Following this tutorial all the way doesn't work; we had to stop at step 9 of "Create an IAM role" to make it work.

Now we have a role tied to our service account and it can fetch secrets without issues.

apiVersion: v1
kind: ServiceAccount
metadata:
  name: secrets-kubernetes-external-secrets
  namespace: default
  annotations: 
    eks.amazonaws.com/role-arn: arn:aws:iam::111111111111:role/test-eks-secrets
  labels:
    app: kubernetes-external-secrets

After that I tried setting a roleArn on an external secret so we could limit the access a bit. This role has a policy attached so that it can only read mysecret.

apiVersion: 'kubernetes-client.io/v1'
kind: ExternalSecret
metadata:
  name: external-secret-test
spec:
  backendType: secretsManager
  roleArn: arn:aws:iam::111111111111:role/eks-test-mysecret-getter
  dataFrom:
    - path/to/mysecret

So apparently this works on 5.1.0 🤷

But I could only make it work if the service account has a role attached to it. The role doesn't need to have any policies attached; I'm using one without, and that does the trick, I guess.

@Flydiverny
Member

Flydiverny commented Aug 11, 2020

Yes, the service account needs to have a role set, otherwise you wouldn't get the AWS envs set, right?
While you can get it working on 5.1.0, it is not following the intended behavior.

I take it you got this working by having a trust relationship for arn:aws:iam::111111111111:role/eks-test-mysecret-getter looking something like

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::111111111111:oidc-provider/oidc.eks.eu-west-1.amazonaws.com/id/EXAMPLED539D4633E53DE1B716D3041E"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringLike": {
          "oidc.eks.region-code.amazonaws.com/id/EXAMPLED539D4633E53DE1B716D3041E:sub": "system:serviceaccount:SERVICE_ACCOUNT_NAMESPACE:SERVICE_ACCOUNT_NAME"
        }
      }
    }
  ]
}

This isn't the intended behavior as this bypasses the role used by the external-secrets service.

The intended setup is to have a trust relationship from arn:aws:iam::111111111111:role/test-eks-secrets to assume role arn:aws:iam::111111111111:role/eks-test-mysecret-getter, e.g. something like:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::111111111111:role/test-eks-secrets"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
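If it helps, this is roughly how such a secondary role could be created with the CLI (the trust-policy file name and the policy ARN below are placeholders):

aws iam create-role \
  --role-name eks-test-mysecret-getter \
  --assume-role-policy-document file://trust-from-kes-role.json
aws iam attach-role-policy \
  --role-name eks-test-mysecret-getter \
  --policy-arn arn:aws:iam::111111111111:policy/mysecret-read-only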

@larsrnielsen

larsrnielsen commented Aug 21, 2020

I faced the same issue today; I am using version 5.2.0. My external secret's SA has the annotation 'eks.amazonaws.com/role-arn: arn:aws:iam::something:role/secrets_manager', which has the policy that allows it to get secret values. However, when I do 'kubectl get externalsecret -n my-ns' I see an error which complains about the node not being allowed to perform 'secretsmanager:GetSecretValue'. Is there a fix for this?

ERROR, User: arn:aws:sts::something:assumed-role/node/i-something is not authorized to perform: secretsmanager:GetSecretValue on resource: arn:aws:secretsmanager:eu-central-1:something:secret:mykey-something

My ExternalSecret and ServiceAccount look like:

apiVersion: v1
items:
- apiVersion: kubernetes-client.io/v1
  kind: ExternalSecret
  metadata:
    annotations:
    name: something
    namespace: my-app 
  spec:
    backendType: secretsManager
    data:
    - key: mykey
      name: .dockerconfigjson
    template:
      type: kubernetes.io/dockerconfigjson
---
apiVersion: v1
kind: ServiceAccount
metadata:
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::something:role/secrets_manager
  labels:
    app.kubernetes.io/instance: kubernetes-external-secrets
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: kubernetes-external-secrets
    helm.sh/chart: kubernetes-external-secrets-5.2.0
  name: kubernetes-external-secrets
  namespace: external-secrets
secrets:
- name: kubernetes-external-secrets-token-vs44n

@amitkarpe

@larsrnielsen
Can you please share the trust relationship for your role "arn:aws:iam::something:role/secrets_manager"?
The 'secretsmanager:GetSecretValue' error gives the impression that the ExternalSecret is not able to use IRSA, so it is falling back to the EKS node, i.e. it is using the "node/i-something" role.
You must have a correct trust relationship, and the role must be annotated on the correct ServiceAccount.
Please refer to the gist, where you can find an example trust.json.

cat <<EOF  > trust.json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::${AWS_ACCOUNT_ID}:oidc-provider/${OIDC_PROVIDER}"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringLike": {
          "${OIDC_PROVIDER}:sub": "system:serviceaccount:*"
        }
      }
    }
  ]
}
EOF
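Then apply the trust policy to the existing role and make sure the service account is annotated with it, roughly like this (role, namespace and service account names are taken from the manifests above; adjust as needed):

aws iam update-assume-role-policy \
  --role-name secrets_manager \
  --policy-document file://trust.json
kubectl annotate serviceaccount kubernetes-external-secrets \
  -n external-secrets --overwrite \
  eks.amazonaws.com/role-arn=arn:aws:iam::${AWS_ACCOUNT_ID}:role/secrets_manager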

@larsrnielsen

@larsrnielsen
Can you please share "Trust Relationship" for your role "arn:aws:iam::something:role/secrets_manager"?

You were right - I had a typo. Thanks, Lars

@babadofar

I also had a typo in my policy config; fixing it improved the situation.
However, the error about no access due to assuming the node's role still happens. I can fix it by killing the external-secrets pod. Something seems to be a little fishy. This is on 5.2.

@dfboyd

dfboyd commented Sep 24, 2020

I have set up a serviceaccount with an AWS role, and the correct AssumeRoleWithWebIdentity policy on it. I run kubernetes-external-secrets under the serviceaccount, and here's what I have:
  • With version 5.1.0, I'm able to access secrets using the role attached to the serviceaccount.
  • With version 5.2.0, the kubernetes-external-secrets pod ignores the serviceaccount role and uses the worker-node role instead.

@iAnomaly
Contributor

It looks like the IRSA logic has been removed completely in 5.2 so stick with 5.1.

I was also hitting EACCES errors on 5.1 when running with the Helm chart's default securityContext (non-root):

"stack":"Error: EACCES: permission denied, open '/var/run/secrets/eks.amazonaws.com/serviceaccount/token'\n at Object.openSync (fs.js:458:3)\n at Object.readFileSync (fs.js:360:35)\n at loadServiceToken (/app/config/aws-config.js:35:13)\n at /app/config/aws-config.js:55:81\n at new Promise ()\n at SecretsManagerBackend.assumeRole [as _assumeRole] (/app/config/aws-config.js:54:14)\n at SecretsManagerBackend._get (/app/lib/backends/secrets-manager-backend.js:34:30)\n at /app/lib/backends/kv-backend.js:63:32\n at Array.map ()\n at SecretsManagerBackend._fetchDataFromValues (/app/lib/backends/kv-backend.js:62:33)"

The solution is described above by @riccardomc and you can read more details here and here.

@Flydiverny
Member

Flydiverny commented Sep 25, 2020

It looks like the IRSA logic has been removed completely in 5.2 so stick with 5.1.

This is simply incorrect.

  • If you're having issues check your configuration.
  • If it works with 5.1.0 and not 5.2.0 you are most likely not using the correct trust relationship configuration.
  • If the token file can't be read, set the fsGroup, as mentioned by several people and in the README and values file.

Create a service account and a trust relationship for that service account (or annotate the one created by the helm chart), and make sure KES uses it.
https://docs.aws.amazon.com/eks/latest/userguide/create-service-account-iam-policy-and-role.html

serviceAccount:
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::111111111111:role/eksctl-fargate-test-kes-iamserviceaccount-role

securityContext:
  fsGroup: 65534

Trust relationship for arn:aws:iam::111111111111:role/eksctl-fargate-test-kes-iamserviceaccount-role:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::111111111111:oidc-provider/oidc.eks.eu-west-1.amazonaws.com/id/F5C083E55DB8AE8A685E5F11E3DDCAB8"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringLike": {
          "oidc.eks.eu-west-1.amazonaws.com/id/F5C083E55DB8AE8A685E5F11E3DDCAB8:sub": "system:serviceaccount:*"
        }
      }
    }
  ]
}

^ You should really have something more specific here, so that not all service accounts can assume this role (e.g. scope the sub condition to system:serviceaccount:NAMESPACE:SERVICE_ACCOUNT_NAME).

Either attach permissions to this role so it can fetch your secrets.

Or create a secondary IAM role arn:aws:iam::111111111111:role/my-externalsecret-specific-role
with a trust relationship to the role used by the service account:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::111111111111:role/eksctl-fargate-test-kes-iamserviceaccount-role"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

and use this role in your external secrets by setting spec.roleArn: arn:aws:iam::111111111111:role/my-externalsecret-specific-role

Notice the different Action specified in these trust relationships.

Either you get:

  • Service Account -> sts:AssumeRoleWithWebIdentity (pod role!!) -> SECRETS
  • Service Account -> sts:AssumeRoleWithWebIdentity (pod role!!) -> sts:AssumeRole (secret role!!) -> SECRETS

@patalwell

patalwell commented Mar 13, 2021

For me it was not having AWS CLI version 2 installed on my Jenkins slave. In other words, even though I had set up a service account for my slaves and configured that service account to assume a role with the OIDC provider in my EKS cluster, my AWS client was still using the previous credential chain, i.e. the node role. The same is true of the AWS Java SDK: one needs to use version 2 in order to use the web identity credential chain for AWS.

+ aws sts get-caller-identity
{
    "UserId": "AROATEC3FQQGVDRCHDPEM:botocore-session-1615588710",
    "Account": "214939567117",
    "Arn": "arn:aws:sts::214939567117:assumed-role/gpfm-nonprod-jenkins-role/botocore-session-1615588710"
}

Changing the file permissions and container user to root also helped to prevent further errors with authorization and authentication.
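A quick sanity check on the agent, assuming the aws binary is on PATH:

aws --version                 # per the above, this should report aws-cli/2.x
aws sts get-caller-identity   # should now show the IRSA role rather than the node role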

@sandile-shongwe

I had a similar issue. The problem was my apps that were running within the pods were using much older versions of the AWS SDK.

https://docs.aws.amazon.com/eks/latest/userguide/iam-roles-for-service-accounts-minimum-sdk.html
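For a Node.js app, a rough way to check which SDK version the container is actually bundling (the pod name is a placeholder, and it assumes aws-sdk is resolvable from the container's working directory):

kubectl exec <app-pod> -- node -p "require('aws-sdk/package.json').version"
# compare against the minimum versions in the AWS doc above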

@tdeheurles

Hey, I see the same issue with:

implementation platform('software.amazon.awssdk:bom:2.15.0')
implementation 'software.amazon.awssdk:ssm'
implementation 'software.amazon.awssdk:sts'

I wanted to debug by using stsClient.getCallerIdentity(), but that itself changed the role...
It's really strange: the call to getCallerIdentity() has a "side effect" that switches the role to the expected service account one.
I have seen this behaviour on 2 of my services.

To summarize:

  • when I use sts.getCallerIdentity() once, the role is the expected serviceAccount one
  • without the call to sts.getCallerIdentity(), the role is the node one.

@cp1408

cp1408 commented Jan 8, 2022

Faced the same issue: the controller was using the node instance role instead of the IRSA role set on the ServiceAccount.
Using the below configuration in the helm chart values.yaml quickly fixed my issue.

securityContext:
  runAsNonRoot: true
  fsGroup: 65534
