This guide provides step-by-step instructions for deploying the AWS Lambda functions for the secure spreadsheet upload process using Docker and Amazon Elastic Container Registry (ECR).
The Lambda functions dcp-secure-spreadsheet-upload/notify_upload and dcp-secure-spreadsheet-upload/ingest_upload handle the secure upload of spreadsheet files and the subsequent notification and ingestion steps:
- dcp-secure-spreadsheet-upload/notify_upload: Detects uploads to the designated S3 bucket, extracts the project UUID from the folder tags, and sends a notification.
- dcp-secure-spreadsheet-upload/ingest_upload: Uploads the spreadsheet to the ingest system through an API call. It is currently triggered manually by wranglers.
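The first notify_upload steps in the flowchart (extract bucket and key, extract folder UUID and spreadsheet name, retrieve project UUID) can be sketched in Python as follows. This is a minimal illustration, not the function's actual code: it assumes uploads land at `<folder-uuid>/<spreadsheet-name>` in the bucket, and the function names and the tag key `project_uuid` are hypothetical placeholders.

```python
from typing import List, NamedTuple, Optional
from urllib.parse import unquote_plus


class UploadLocation(NamedTuple):
    bucket: str
    key: str
    folder_uuid: str
    spreadsheet_name: str


def parse_upload_event(event: dict) -> UploadLocation:
    """Extract bucket, key, folder UUID and file name from an S3 event notification."""
    record = event["Records"][0]
    bucket = record["s3"]["bucket"]["name"]
    # Object keys in S3 event payloads are URL-encoded (e.g. spaces arrive as '+').
    key = unquote_plus(record["s3"]["object"]["key"])
    # ASSUMPTION: the spreadsheet sits under "<folder-uuid>/"; split on the first '/'.
    folder_uuid, sep, spreadsheet_name = key.partition("/")
    if not sep or not spreadsheet_name:
        raise ValueError(f"Expected a '<folder-uuid>/<file>' key, got {key!r}")
    return UploadLocation(bucket, key, folder_uuid, spreadsheet_name)


def project_uuid_from_tags(tag_set: List[dict], tag_key: str = "project_uuid") -> Optional[str]:
    """Return the project UUID from an S3 TagSet, or None if the tag is absent.

    tag_set has the shape used by S3 tagging APIs: [{"Key": ..., "Value": ...}].
    HYPOTHETICAL: "project_uuid" is a placeholder tag key. Returning None mirrors
    the flowchart, where the "project UUID not found" branch is not currently supported.
    """
    for tag in tag_set:
        if tag.get("Key") == tag_key:
            return tag.get("Value")
    return None
```

In a real handler, the TagSet would come from a boto3 call such as `s3.get_object_tagging(Bucket=..., Key=...)["TagSet"]`; which object or prefix actually carries the tags depends on how hca-util writes them.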
Users (wranglers) can create and tag upload areas using the hca-util tool. For more details on how to use hca-util to create upload areas, see the hca-util repository.
For more details, see also the following documents:
- Managed access dataset - Data and metadata review and export SOP
- Managed access dataset - Data Transfer SOP
The token authentication system uses the hca_ingest client to generate tokens. Google service account credentials are read from the GOOGLE_APPLICATION_CREDENTIALS environment variable and loaded into the client. For more details, see the hca_ingest repository.
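Reading the credentials from the environment before handing them to the client might look like the sketch below. It is an assumption-laden illustration: the source does not say whether GOOGLE_APPLICATION_CREDENTIALS holds the JSON document itself or a path to a key file, so both forms are accepted. The function name is hypothetical, and the hca_ingest call that consumes the result is deliberately not shown.

```python
import json
import os

# Fields that every Google service account key file contains.
REQUIRED_FIELDS = ("type", "client_email", "private_key")


def load_service_account_info(env_var: str = "GOOGLE_APPLICATION_CREDENTIALS") -> dict:
    """Read and sanity-check service account credentials from an environment variable.

    ASSUMPTION: the variable holds either inline JSON or a path to a JSON key file.
    """
    raw = os.environ.get(env_var, "").strip()
    if not raw:
        raise RuntimeError(f"{env_var} is not set")
    if raw.startswith("{"):
        info = json.loads(raw)  # inline JSON document
    else:
        with open(raw, encoding="utf-8") as fh:  # path to a key file
            info = json.load(fh)
    missing = [field for field in REQUIRED_FIELDS if field not in info]
    if missing:
        raise ValueError(f"Credentials are missing fields: {missing}")
    if info["type"] != "service_account":
        raise ValueError(f"Expected a service account key, got type {info['type']!r}")
    return info
```

Failing fast here gives a clearer error than a token-generation failure further down the flow.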
flowchart TD
A[Wranglers create and tag upload area in S3 with project UUID] --> B[Data contributors upload spreadsheet to S3]
B --> C[S3 Event Trigger]
C --> D[Extract Bucket and Key]
D --> E[Extract Folder UUID and Spreadsheet Name]
E --> F[Retrieve Project UUID]
F --> |If project UUID not found| G[Raise Error-not currently supported]
F --> H[Get Object Metadata]
H --> I[Send Notification]
I --> J[Wranglers edit spreadsheet using AWS Workspaces]
J --> K[Manually trigger upload to ingest]
K --> L[Load Configuration]
L --> M[Generate Audience URL]
M --> |If environment is prod| N[Use Production URL]
M --> |Else| O[Use Development/Staging URL]
N --> P[Retrieve Credentials]
O --> P[Retrieve Credentials]
P --> Q[Generate Token using hca_ingest]
Q --> |If token generation fails| R[Raise Error]
Q --> S[Upload Spreadsheet]
S --> |If upload fails| T[Raise Error]
S --> U[Send Notification]
U --> V[End]
G --> V
R --> V
T --> V
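The ingest_upload branch of the flowchart (generate audience URL, retrieve credentials, generate token, upload, notify) can be sketched with the side-effecting steps injected as callables, which keeps the control flow testable without AWS or the ingest API. The audience URLs and all function names below are hypothetical placeholders; the real values are not given in this guide.

```python
from typing import Callable

# HYPOTHETICAL placeholder audience URLs.
PROD_AUDIENCE = "https://ingest.example.org/"
NON_PROD_AUDIENCE = "https://{env}.ingest.example.org/"


def audience_url(environment: str) -> str:
    """Use the production URL for 'prod', otherwise the development/staging URL."""
    env = environment.strip().lower()
    if env == "prod":
        return PROD_AUDIENCE
    if env in ("dev", "staging"):
        return NON_PROD_AUDIENCE.format(env=env)
    raise ValueError(f"Unknown environment: {environment!r}")


def run_ingest_upload(
    environment: str,
    get_credentials: Callable[[], dict],
    generate_token: Callable[[dict, str], str],
    upload: Callable[[str], None],
    notify: Callable[[str], None],
) -> None:
    """Mirror the flowchart: audience -> credentials -> token -> upload -> notify."""
    audience = audience_url(environment)
    credentials = get_credentials()
    token = generate_token(credentials, audience)
    if not token:
        raise RuntimeError("Token generation failed")
    upload(token)  # expected to raise if the upload fails
    notify("Spreadsheet uploaded to ingest")
```

Each injected callable stands in for one flowchart node, so a failure in any of them ends the run with an error, as the R and T branches do.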
Before you begin, ensure you have the following:
- AWS CLI: Installed and configured with appropriate permissions.
- Docker: Installed on your local machine.
- IAM Role: A role with the permissions required for Lambda execution, S3 access, and SNS. Attach the following AWS managed policies to the role:
  - AmazonS3ReadOnlyAccess
  - AmazonSNSFullAccess
  - AWSLambdaBasicExecutionRole
- Amazon ECR: Repository set up to store the Docker image.
- AWS Account: With access to Lambda, IAM, and ECR services.
- ECR Repository: An ECR repository named secure-spreadsheet-upload-repo has already been created.
- Lambda Functions: The Lambda functions notify-spreadsheet-upload-function and dcp-secure-spreadsheet-upload-auth already exist.
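To check that the execution role has the three managed policies listed above, a small helper can compare the role's attached policies against their ARNs. This is a sketch: `check_role` needs boto3 and AWS credentials, and using `lambda-execution-role` as the role name is an assumption based on the role ARN in the create-function command.

```python
from typing import Iterable, List, Optional

# ARNs of the AWS managed policies listed in the prerequisites.
REQUIRED_POLICY_ARNS = frozenset({
    "arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess",
    "arn:aws:iam::aws:policy/AmazonSNSFullAccess",
    "arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole",
})


def missing_policies(attached_policies: Iterable[dict]) -> List[str]:
    """Return required policy ARNs absent from an IAM AttachedPolicies list."""
    attached = {policy["PolicyArn"] for policy in attached_policies}
    return sorted(REQUIRED_POLICY_ARNS - attached)


def check_role(role_name: str, profile: Optional[str] = None) -> List[str]:
    """Query IAM for the role's managed policies and report any that are missing."""
    import boto3  # imported lazily so missing_policies() works without boto3

    iam = boto3.Session(profile_name=profile).client("iam")
    paginator = iam.get_paginator("list_attached_role_policies")
    attached: List[dict] = []
    for page in paginator.paginate(RoleName=role_name):
        attached.extend(page["AttachedPolicies"])
    return missing_policies(attached)
```

For example, `check_role("lambda-execution-role", profile="your-aws-profile")` returns an empty list when all three policies are attached.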
Navigate to Project Directory:
cd /path/to/dcp-secure-spreadsheet-upload
To deploy the notify_upload function, use the provided deploy_notify_upload.sh script:
- Ensure the script is executable:
chmod +x deploy_notify_upload.sh
- Run the deployment script:
./deploy_notify_upload.sh
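To deploy the ingest_upload function, build its Docker image, push it to ECR, and point the Lambda function at the new image:
- Build the Docker image: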
docker build -t secure-spreadsheet-upload -f ingest_upload/Dockerfile .
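- Tag the Docker image. This and the following commands use AWS account ID 871979166454 and region us-east-1; replace them with your own account ID and region if they differ: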
docker tag secure-spreadsheet-upload:latest 871979166454.dkr.ecr.us-east-1.amazonaws.com/secure-spreadsheet-upload-repo:latest
- Authenticate Docker to ECR:
aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin 871979166454.dkr.ecr.us-east-1.amazonaws.com
- Push the Docker image to ECR:
docker push 871979166454.dkr.ecr.us-east-1.amazonaws.com/secure-spreadsheet-upload-repo:latest
- Create or update the Lambda function. To create a new function:
aws lambda create-function \
  --function-name dcp-secure-spreadsheet-upload-auth \
  --package-type Image \
  --code ImageUri=871979166454.dkr.ecr.us-east-1.amazonaws.com/secure-spreadsheet-upload-repo:latest \
  --role arn:aws:iam::871979166454:role/lambda-execution-role \
  --profile your-aws-profile
To update an existing function:
aws lambda update-function-code \
  --function-name dcp-secure-spreadsheet-upload-auth \
  --image-uri 871979166454.dkr.ecr.us-east-1.amazonaws.com/secure-spreadsheet-upload-repo:latest \
  --profile your-aws-profile
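The image URI and the update step can also be driven from Python, for example in a CI job. This is a minimal sketch using boto3 rather than the deployment method this guide prescribes; the helper names are hypothetical, and `update_lambda_image` requires boto3 and AWS credentials.

```python
from typing import Optional


def ecr_image_uri(account_id: str, region: str, repository: str, tag: str = "latest") -> str:
    """Build an ECR image URI in the same form as the commands above."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"


def update_lambda_image(function_name: str, image_uri: str, profile: Optional[str] = None) -> None:
    """Point an existing image-based Lambda function at a new image and wait for it."""
    import boto3  # imported lazily so ecr_image_uri() works without boto3

    client = boto3.Session(profile_name=profile).client("lambda")
    client.update_function_code(FunctionName=function_name, ImageUri=image_uri)
    # The 'function_updated' waiter polls until the function's LastUpdateStatus is Successful.
    client.get_waiter("function_updated").wait(FunctionName=function_name)
```

Usage mirroring the update-function-code command: `update_lambda_image("dcp-secure-spreadsheet-upload-auth", ecr_image_uri("871979166454", "us-east-1", "secure-spreadsheet-upload-repo"), profile="your-aws-profile")`. Waiting for the update avoids invoking the function while it still runs the previous image.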
To run the tests locally:
- Install the required packages:
pip install -r requirements.txt
- Run the tests using pytest:
pytest tests/
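Tests can exercise AWS-facing code without real credentials by passing in a mock client. The sketch below shows the pattern with `unittest.mock`; `publish_upload_notification` is a hypothetical stand-in for the notify_upload SNS step, not a function from this repository.

```python
from unittest import mock


def publish_upload_notification(sns_client, topic_arn: str, bucket: str, key: str) -> dict:
    """HYPOTHETICAL helper standing in for the notify_upload SNS step."""
    return sns_client.publish(
        TopicArn=topic_arn,
        Subject="New spreadsheet upload",
        Message=f"New spreadsheet uploaded to s3://{bucket}/{key}",
    )


def test_publish_upload_notification_targets_topic():
    sns = mock.Mock()  # stands in for boto3.client("sns")
    sns.publish.return_value = {"MessageId": "msg-1"}
    topic_arn = "arn:aws:sns:us-east-1:123456789012:example-topic"

    result = publish_upload_notification(sns, topic_arn, "upload-bucket", "uuid/sheet.xlsx")

    sns.publish.assert_called_once()
    kwargs = sns.publish.call_args.kwargs
    assert kwargs["TopicArn"] == topic_arn
    assert "s3://upload-bucket/uuid/sheet.xlsx" in kwargs["Message"]
    assert result["MessageId"] == "msg-1"
```

Placed in a file under tests/, pytest collects the `test_` function automatically.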
The following environment variables must be set:
- TOPIC_NAME: The name of the SNS topic.
- MY_AWS_REGION: The AWS region where the SNS topic is located.
- GOOGLE_APPLICATION_CREDENTIALS: The Google service account JSON credentials required to authenticate and generate tokens with the hca_ingest client.
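A function can read these variables once at start-up and fail fast if any is missing. The sketch below is illustrative and the names are hypothetical. The variable is presumably called MY_AWS_REGION rather than AWS_REGION because Lambda reserves AWS_REGION. Building the topic ARN also needs the account ID, which could be obtained, for example, from `boto3.client("sts").get_caller_identity()["Account"]`.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    topic_name: str
    region: str
    google_credentials: str


def load_settings(environ: Optional[Mapping[str, str]] = None) -> Settings:
    """Read the required environment variables, failing fast if any are missing or empty."""
    env = os.environ if environ is None else environ
    names = ("TOPIC_NAME", "MY_AWS_REGION", "GOOGLE_APPLICATION_CREDENTIALS")
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return Settings(env["TOPIC_NAME"], env["MY_AWS_REGION"], env["GOOGLE_APPLICATION_CREDENTIALS"])


def sns_topic_arn(settings: Settings, account_id: str) -> str:
    """Build the SNS topic ARN: arn:aws:sns:<region>:<account-id>:<topic-name>."""
    return f"arn:aws:sns:{settings.region}:{account_id}:{settings.topic_name}"
```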