
feat: Native Google Cloud Storage support for artifact. Closes argopr…
whynowy committed Mar 25, 2020
1 parent 999b1e1 commit 06cfc12
Showing 19 changed files with 1,722 additions and 458 deletions.
31 changes: 31 additions & 0 deletions api/openapi-spec/swagger.json
@@ -1495,6 +1495,10 @@
          "title": "Artifactory contains artifactory artifact location details",
          "$ref": "#/definitions/io.argoproj.workflow.v1alpha1.ArtifactoryArtifact"
        },
        "gcs": {
          "title": "GCS contains GCS artifact location details",
          "$ref": "#/definitions/io.argoproj.workflow.v1alpha1.GCSArtifact"
        },
        "git": {
          "title": "Git contains git artifact location details",
          "$ref": "#/definitions/io.argoproj.workflow.v1alpha1.GitArtifact"
@@ -1790,6 +1794,33 @@
        }
      }
    },
    "io.argoproj.workflow.v1alpha1.GCSArtifact": {
      "type": "object",
      "title": "GCSArtifact is the location of a GCS artifact",
      "properties": {
        "gCSBucket": {
          "$ref": "#/definitions/io.argoproj.workflow.v1alpha1.GCSBucket"
        },
        "key": {
          "type": "string",
          "title": "Key is the path in the bucket where the artifact resides"
        }
      }
    },
    "io.argoproj.workflow.v1alpha1.GCSBucket": {
      "type": "object",
      "title": "GCSBucket contains the access information for interfacing with a GCS bucket",
      "properties": {
        "bucket": {
          "type": "string",
          "title": "Bucket is the name of the bucket"
        },
        "serviceAccountKeySecret": {
          "title": "ServiceAccountKeySecret is the secret selector to the bucket's service account key",
          "$ref": "#/definitions/io.k8s.api.core.v1.SecretKeySelector"
        }
      }
    },
    "io.argoproj.workflow.v1alpha1.Gauge": {
      "type": "object",
      "title": "Gauge is a Gauge prometheus metric",
10 changes: 10 additions & 0 deletions config/config.go
@@ -116,6 +116,8 @@ type ArtifactRepository struct {
    HDFS *HDFSArtifactRepository `json:"hdfs,omitempty"`
    // OSS stores artifact in a OSS-compliant object store
    OSS *OSSArtifactRepository `json:"oss,omitempty"`
    // GCS stores artifact in a GCS object store
    GCS *GCSArtifactRepository `json:"gcs,omitempty"`
}

func (a *ArtifactRepository) IsArchiveLogs() bool {
@@ -184,6 +186,14 @@ type OSSArtifactRepository struct {
    KeyFormat string `json:"keyFormat,omitempty"`
}

// GCSArtifactRepository defines the controller configuration for a GCS artifact repository
type GCSArtifactRepository struct {
    wfv1.GCSBucket `json:",inline"`

    // KeyFormat defines the format of how to store keys. Can reference workflow variables
    KeyFormat string `json:"keyFormat,omitempty"`
}

// ArtifactoryArtifactRepository defines the controller configuration for an artifactory artifact repository
type ArtifactoryArtifactRepository struct {
    wfv1.ArtifactoryAuth `json:",inline"`
170 changes: 143 additions & 27 deletions docs/configure-artifact-repository.md
@@ -1,8 +1,9 @@
# Configuring Your Artifact Repository

To run Argo workflows that use artifacts, you must configure and use an artifact repository.
Argo supports any S3 compatible artifact repository such as AWS, GCS and Minio.
This section shows how to configure the artifact repository. Subsequent sections will show how to use it.
To run Argo workflows that use artifacts, you must configure and use an artifact
repository. Argo supports any S3 compatible artifact repository such as AWS, GCS
and Minio. This section shows how to configure the artifact repository.
Subsequent sections will show how to use it.

## Configuring Minio

@@ -13,27 +14,35 @@ $ helm repo update
$ helm install argo-artifacts stable/minio --set service.type=LoadBalancer --set fullnameOverride=argo-artifacts
```

Login to the Minio UI using a web browser (port 9000) after obtaining the external IP using `kubectl`.
Login to the Minio UI using a web browser (port 9000) after obtaining the
external IP using `kubectl`.

```
$ kubectl get service argo-artifacts
```

On Minikube:

```
$ minikube service --url argo-artifacts
```

NOTE: When minio is installed via Helm, it uses the following hard-wired default credentials,
which you will use to login to the UI:
* AccessKey: AKIAIOSFODNN7EXAMPLE
* SecretKey: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
NOTE: When minio is installed via Helm, it uses the following hard-wired default
credentials, which you will use to login to the UI:

Create a bucket named `my-bucket` from the Minio UI.
- AccessKey: AKIAIOSFODNN7EXAMPLE
- SecretKey: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY

Create a bucket named `my-bucket` from the Minio UI.

## Configuring AWS S3

Create your bucket and access keys for the bucket. AWS access keys have the same permissions as the user they are associated with. In particular, you cannot create access keys with reduced scope. If you want to limit the permissions for an access key, you will need to create a user with just the permissions you want to associate with the access key. Otherwise, you can just create an access key using your existing user account.
Create your bucket and access keys for the bucket. AWS access keys have the same
permissions as the user they are associated with. In particular, you cannot
create access keys with reduced scope. If you want to limit the permissions for
an access key, you will need to create a user with just the permissions you want
to associate with the access key. Otherwise, you can just create an access key
using your existing user account.

```
$ export mybucket=bucket249
@@ -58,8 +67,9 @@ $ aws iam put-user-policy --user-name $mybucket-user --policy-name $mybucket-pol
$ aws iam create-access-key --user-name $mybucket-user > access-key.json
```


NOTE: if you want argo to figure out which region your buckets belong in, you must additionally set the following statement policy. Otherwise, you must specify a bucket region in your workflow configuration.
NOTE: if you want argo to figure out which region your buckets belong in, you
must additionally set the following statement policy. Otherwise, you must
specify a bucket region in your workflow configuration.

```
...
@@ -74,38 +84,117 @@ NOTE: if you want argo to figure out which region your buckets belong in, you mu
```

## Configuring GCS (Google Cloud Storage)
Create a bucket from the GCP Console (https://console.cloud.google.com/storage/browser).

Enable S3 compatible access and create an access key.
Note that S3 compatible access is on a per project rather than per bucket basis.
- Navigate to Storage > Settings (https://console.cloud.google.com/storage/settings).
Create a bucket from the GCP Console
(https://console.cloud.google.com/storage/browser).

There are two ways to configure Google Cloud Storage.

### Through Native GCS APIs

- Create and download a Google Cloud service account key.
- Create a Kubernetes secret to store the key.
- Configure the `gcs` artifact as in the following YAML.

```yaml
artifacts:
  - name: message
    path: /tmp/message
    gcs:
      bucket: my-bucket-name
      key: path/in/bucket
      # serviceAccountKeySecret is a secret selector.
      # It references the k8s secret named 'my-gcs-credentials'.
      # This secret is expected to have the key 'serviceAccountKey',
      # containing the base64 encoded credentials
      # to the bucket.
      #
      # If it's running on GKE and Workload Identity is used,
      # serviceAccountKeySecret is not needed.
      serviceAccountKeySecret:
        name: my-gcs-credentials
        key: serviceAccountKey
```

If it's a GKE cluster and Workload Identity is configured, there's no need to
create the service account key and store it as a K8s secret, and
`serviceAccountKeySecret` is not needed in this case. Please follow this link
to configure Workload Identity:
https://cloud.google.com/kubernetes-engine/docs/how-to/workload-identity.
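
With Workload Identity configured, the artifact definition simply drops the
secret reference. A minimal sketch (bucket and key are illustrative):

```yaml
artifacts:
  - name: message
    path: /tmp/message
    gcs:
      bucket: my-bucket-name
      key: path/in/bucket
      # No serviceAccountKeySecret: credentials are resolved through
      # Workload Identity from the Kubernetes service account bound to the pod.
```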

### Use S3 APIs

Enable S3 compatible access and create an access key. Note that S3 compatible
access is on a per project rather than per bucket basis.

- Navigate to Storage > Settings
(https://console.cloud.google.com/storage/settings).
- Enable interoperability access if needed.
- Create a new key if needed.
- Configure the `s3` artifact as in the following example.

```yaml
artifacts:
  - name: my-output-artifact
    path: /my-output-artifact
    s3:
      endpoint: storage.googleapis.com
      bucket: my-gcs-bucket-name
      # NOTE that all output artifacts are automatically tarred and
      # gzipped before saving. So as a best practice, .tgz or .tar.gz
      # should be incorporated into the key name so the resulting file
      # has an accurate file extension.
      key: path/in/bucket/my-output-artifact.tgz
      accessKeySecret:
        name: my-gcs-s3-credentials
        key: accessKey
      secretKeySecret:
        name: my-gcs-s3-credentials
        key: secretKey
```

# Configure the Default Artifact Repository

In order for Argo to use your artifact repository, you must configure it as the default repository.
Edit the workflow-controller config map with the correct endpoint and access/secret keys for your repository.
In order for Argo to use your artifact repository, you can configure it as the
default repository. Edit the workflow-controller config map with the correct
endpoint and access/secret keys for your repository.

## S3 compatible artifact repository bucket (such as AWS, GCS and Minio)

Use the `endpoint` corresponding to your S3 provider:

- AWS: s3.amazonaws.com
- GCS: storage.googleapis.com
- Minio: my-minio-endpoint.default:9000

The `key` is name of the object in the `bucket` The `accessKeySecret` and `secretKeySecret` are secret selectors that reference the specified kubernetes secret. The secret is expected to have the keys 'accessKey' and 'secretKey', containing the base64 encoded credentials to the bucket.
The `key` is the name of the object in the `bucket`. The `accessKeySecret` and
`secretKeySecret` are secret selectors that reference the specified Kubernetes
secret. The secret is expected to have the keys 'accessKey' and 'secretKey',
containing the base64 encoded credentials to the bucket.
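
Such a secret could be created with `kubectl create secret generic` or declared
as a manifest. A sketch, assuming an illustrative secret name
`my-s3-credentials` (`stringData` lets you supply plain values and have
Kubernetes handle the base64 encoding):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: my-s3-credentials   # illustrative; must match the secret selectors
type: Opaque
stringData:
  accessKey: <your-access-key-id>
  secretKey: <your-secret-access-key>
```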

For AWS, the `accessKeySecret` and `secretKeySecret` correspond to AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY respectively.
For AWS, the `accessKeySecret` and `secretKeySecret` correspond to
AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY respectively.

EC2 provides a metadata API via which applications using the AWS SDK may assume IAM roles associated with the instance. If you are running argo on EC2 and the instance role allows access to your S3 bucket, you can configure the workflow step pods to assume the role. To do so, simply omit the `accessKeySecret` and `secretKeySecret` fields.
EC2 provides a metadata API via which applications using the AWS SDK may assume
IAM roles associated with the instance. If you are running argo on EC2 and the
instance role allows access to your S3 bucket, you can configure the workflow
step pods to assume the role. To do so, simply omit the `accessKeySecret` and
`secretKeySecret` fields.
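
With an instance role in place, the repository entry might then contain only the
bucket details. A sketch (values are illustrative):

```yaml
artifactRepository:
  s3:
    bucket: my-bucket
    endpoint: s3.amazonaws.com
    # accessKeySecret and secretKeySecret are omitted; the AWS SDK falls back
    # to the IAM role exposed through the EC2 instance metadata API.
```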

For GCS, the `accessKeySecret` and `secretKeySecret` for S3 compatible access can be obtained from the GCP Console. Note that S3 compatible access is on a per project rather than per bucket basis.
- Navigate to Storage > Settings (https://console.cloud.google.com/storage/settings).
For GCS, the `accessKeySecret` and `secretKeySecret` for S3 compatible access
can be obtained from the GCP Console. Note that S3 compatible access is on a per
project rather than per bucket basis.

- Navigate to Storage > Settings
(https://console.cloud.google.com/storage/settings).
- Enable interoperability access if needed.
- Create a new key if needed.
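
Putting such an interoperability key into the default repository configuration
might look like this (a sketch; bucket and secret names are illustrative):

```yaml
artifactRepository:
  s3:
    endpoint: storage.googleapis.com
    bucket: my-gcs-bucket-name
    accessKeySecret:
      name: my-gcs-s3-credentials
      key: accessKey
    secretKeySecret:
      name: my-gcs-s3-credentials
      key: secretKey
```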

For Minio, the `accessKeySecret` and `secretKeySecret` naturally correspond the AccessKey and SecretKey.
For Minio, the `accessKeySecret` and `secretKeySecret` naturally correspond to
the AccessKey and SecretKey.

Example:

```
$ kubectl edit configmap workflow-controller-configmap -n argo # assumes argo was installed in the argo namespace
...
@@ -124,13 +213,40 @@ data:
key: secretkey
useSDKCreds: true #tells argo to use AWS SDK's default provider chain, enable for things like IRSA support
```
The secrets are retrieved from the namespace you use to run your workflows. Note that you can specify a `keyPrefix`.

The secrets are retrieved from the namespace you use to run your workflows. Note
that you can specify a `keyPrefix`.
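
For example, a `keyPrefix` entry could sit alongside the bucket settings so that
all artifact keys are stored under a common path (a sketch; the prefix value is
illustrative):

```yaml
artifactRepository:
  s3:
    bucket: my-bucket
    keyPrefix: my-workflows/   # optional; artifact keys are placed under this prefix
```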

## Google Cloud Storage (GCS)

Argo can also use native GCS APIs to access a Google Cloud Storage bucket.

`serviceAccountKeySecret` references a Kubernetes secret which stores a Google
Cloud service account key used to access the bucket.

Example:

```
$ kubectl edit configmap workflow-controller-configmap -n argo # assumes argo was installed in the argo namespace
...
data:
  config: |
    artifactRepository:
      gcs:
        bucket: my-bucket
        keyFormat: prefix/in/bucket #optional, it could reference workflow variables, such as "{{workflow.name}}/{{pod.name}}"
        serviceAccountKeySecret:
          name: my-gcs-credentials
          key: serviceAccountKey
```
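
The `my-gcs-credentials` secret referenced above can be created from a
downloaded service account JSON key, e.g.
`kubectl create secret generic my-gcs-credentials --from-file=serviceAccountKey=<YOUR-SERVICE-ACCOUNT-KEY-file>`.
Expressed as a manifest, it might look like this (a sketch):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: my-gcs-credentials
type: Opaque
data:
  # base64 encoded contents of the downloaded service account JSON key file
  serviceAccountKey: <base64-encoded-service-account-key-json>
```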

# Accessing Non-Default Artifact Repositories

This section shows how to access artifacts from non-default artifact repositories.
This section shows how to access artifacts from non-default artifact
repositories.

The `endpoint`, `accessKeySecret` and `secretKeySecret` are the same as for configuring the default artifact repository described previously.
The `endpoint`, `accessKeySecret` and `secretKeySecret` are the same as for
configuring the default artifact repository described previously.

```
templates:
4 changes: 2 additions & 2 deletions examples/README.md
@@ -1196,7 +1196,7 @@ In the above example, we create a sidecar container that runs nginx as a simple

## Hardwired Artifacts

With Argo, you can use any container image that you like to generate any kind of artifact. In practice, however, we find certain types of artifacts are very common, so there is built-in support for git, http, and s3 artifacts.
With Argo, you can use any container image that you like to generate any kind of artifact. In practice, however, we find certain types of artifacts are very common, so there is built-in support for git, http, gcs and s3 artifacts.

```yaml
apiVersion: argoproj.io/v1alpha1
@@ -1222,7 +1222,7 @@ spec:
mode: 0755
http:
url: https://storage.googleapis.com/kubernetes-release/release/v1.8.0/bin/linux/amd64/kubectl
# Copy an s3 bucket and place it at /s3
# Copy an s3 compatible artifact repository bucket (such as AWS, GCS and Minio) and place it at /s3
- name: objects
path: /s3
s3:
38 changes: 38 additions & 0 deletions examples/input-artifact-gcs.yaml
@@ -0,0 +1,38 @@
# This example demonstrates the loading of a hard-wired input artifact from Google Cloud Storage.
#
# It uses a GCP Service Account Key stored as a regular Kubernetes secret to access GCP storage.
# To create the secret required for this example, first run the following command:
#
# $ kubectl create secret generic my-gcs-credentials --from-file=serviceAccountKey=<YOUR-SERVICE-ACCOUNT-KEY-file>
#
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: input-artifact-gcs-
spec:
  entrypoint: input-artifact-gcs-example
  templates:
  - name: input-artifact-gcs-example
    inputs:
      artifacts:
      - name: my-art
        path: /my-artifact
        gcs:
          bucket: my-bucket-name
          # key could be either a file or a directory.
          key: path/in/bucket
          # serviceAccountKeySecret is a secret selector.
          # It references the k8s secret named 'my-gcs-credentials'.
          # This secret is expected to have the key 'serviceAccountKey',
          # containing the base64 encoded Google Cloud Service Account Key (json)
          # to the bucket.
          #
          # If it's running on GKE, and Workload Identity is used,
          # serviceAccountKeySecret is not needed.
          serviceAccountKeySecret:
            name: my-gcs-credentials
            key: serviceAccountKey
    container:
      image: debian:latest
      command: [sh, -c]
      args: ["ls -l /my-artifact"]
15 changes: 8 additions & 7 deletions go.mod
@@ -3,7 +3,8 @@ module github.com/argoproj/argo
go 1.13

require (
cloud.google.com/go v0.51.0 // indirect
cloud.google.com/go v0.55.0 // indirect
cloud.google.com/go/storage v1.6.0
github.com/Knetic/govaluate v3.0.1-0.20171022003610-9aa49832a739+incompatible
github.com/ajg/form v1.5.1 // indirect
github.com/aliyun/aliyun-oss-go-sdk v2.0.6+incompatible
@@ -73,12 +74,12 @@
github.com/yudai/golcs v0.0.0-20170316035057-ecda9a501e82 // indirect
github.com/yudai/pp v2.0.1+incompatible // indirect
golang.org/x/crypto v0.0.0-20200128174031-69ecbb4d6d5d
golang.org/x/net v0.0.0-20200226121028-0de0cce0169b
golang.org/x/oauth2 v0.0.0-20200107190931-bf48bf16ab8d // indirect
golang.org/x/sys v0.0.0-20200124204421-9fbb57f87de9 // indirect
golang.org/x/time v0.0.0-20191024005414-555d28b269f0 // indirect
google.golang.org/genproto v0.0.0-20191230161307-f3c370f40bfb
google.golang.org/grpc v1.26.0
golang.org/x/net v0.0.0-20200301022130-244492dfa37a
golang.org/x/oauth2 v0.0.0-20200107190931-bf48bf16ab8d
golang.org/x/tools v0.0.0-20200323210725-ef1313dc6d0a // indirect
google.golang.org/api v0.20.0
google.golang.org/genproto v0.0.0-20200317114155-1f3552e48f24
google.golang.org/grpc v1.28.0
gopkg.in/gavv/httpexpect.v2 v2.0.0
gopkg.in/ini.v1 v1.52.0 // indirect
gopkg.in/jcmturner/goidentity.v2 v2.0.0 // indirect