
AWS Lambda function for securely uploading spreadsheet files from Data Contributors to the ingest service, triggered by S3 events.


ebi-ait/dcp-secure-spreadsheet-upload


Secure Spreadsheet Upload Lambda Deployment

This guide provides step-by-step instructions to deploy the AWS Lambda function for the secure spreadsheet upload process using Docker and Amazon Elastic Container Registry (ECR).

Table of Contents

  1. General Description
  2. Prerequisites
  3. Existing AWS Resources
  4. Deployment
  5. Running Tests

General Description

The Lambda functions dcp-secure-spreadsheet-upload/notify_upload and dcp-secure-spreadsheet-upload/ingest_upload handle the secure upload of spreadsheet files and the notification and ingestion steps that follow.

  • dcp-secure-spreadsheet-upload/notify_upload: Detects uploads to the designated S3 bucket, extracts the project UUID from the folder tags, and sends a notification.
  • dcp-secure-spreadsheet-upload/ingest_upload: Uploads the spreadsheet to the ingest system through an API call. It is currently triggered manually by wranglers.
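
As an illustration of the first step in notify_upload, the sketch below pulls the bucket, object key, folder UUID, and spreadsheet name out of an S3 event. It is not the repository's implementation: the function name `parse_s3_event` is hypothetical, and it assumes object keys have the form `<folder-uuid>/<spreadsheet-name>`.

```python
from urllib.parse import unquote_plus


def parse_s3_event(event: dict) -> list:
    """Return one dict per S3 record with bucket, key, folder UUID and file name."""
    uploads = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # S3 URL-encodes object keys in event payloads (a space arrives as "+").
        key = unquote_plus(record["s3"]["object"]["key"])
        # Assumption: the first path segment is the upload-area folder UUID.
        folder_uuid, _, spreadsheet_name = key.partition("/")
        uploads.append(
            {
                "bucket": bucket,
                "key": key,
                "folder_uuid": folder_uuid,
                "spreadsheet_name": spreadsheet_name,
            }
        )
    return uploads
```

Looking up the project UUID from the folder tags would follow this step; that lookup is left out here because the README does not describe how the tags are stored.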

Users (wranglers) can create and tag upload areas using the hca-util tool. For more details on how to use hca-util to create upload areas, see the hca-util repository.

For more details, see also the following documents:

Authentication System

The token authentication system uses the hca_ingest client to generate tokens. Google service account credentials are read from the GOOGLE_APPLICATION_CREDENTIALS environment variable and loaded into the client. For more details, see the hca_ingest repository.

Flowchart

flowchart TD
   A[Wranglers create and tag upload area in S3 with project UUID] --> B[Data contributors upload spreadsheet to S3]
   B --> C[S3 Event Trigger]
   C --> D[Extract Bucket and Key]
   D --> E[Extract Folder UUID and Spreadsheet Name]
   E --> F[Retrieve Project UUID]
   F --> |If project UUID not found| G[Raise Error-not currently supported]
   F --> H[Get Object Metadata]
   H --> I[Send Notification]
   I --> J[Wranglers edit spreadsheet using AWS Workspaces]
   J --> K[Manually trigger upload to ingest]
   K --> L[Load Configuration]
   L --> M[Generate Audience URL]
   M --> |If environment is prod| N[Use Production URL]
   M --> |Else| O[Use Development/Staging URL]
   N --> P[Retrieve Credentials]
   O --> P[Retrieve Credentials]
   P --> Q[Generate Token using hca_ingest]
   Q --> |If token generation fails| R[Raise Error]
   Q --> S[Upload Spreadsheet]
   S --> |If upload fails| T[Raise Error]
   S --> U[Send Notification]
   U --> V[End]
   G --> V
   R --> V
   T --> V
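
The "Generate Audience URL" branch in the flowchart can be sketched as a small function. The URLs below are hypothetical placeholders (the README does not state the real audience values), and the function name `audience_url` is an assumption.

```python
# Hypothetical placeholder URLs; the real audience values are not given in this README.
PROD_AUDIENCE = "https://prod.ingest.example.org/"
NON_PROD_AUDIENCE_TEMPLATE = "https://{env}.ingest.example.org/"


def audience_url(environment: str) -> str:
    """Pick the token audience: production gets its own URL, everything else shares a pattern."""
    env = environment.strip().lower()
    if env == "prod":
        return PROD_AUDIENCE
    # Development and staging follow the same template in this sketch.
    return NON_PROD_AUDIENCE_TEMPLATE.format(env=env)
```

For example, `audience_url("staging")` returns the staging placeholder, while `audience_url("prod")` returns the production one.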


Prerequisites

Before you begin, ensure you have the following:

  • AWS CLI: Installed and configured with appropriate permissions.
  • Docker: Installed on your local machine.
  • IAM Role: A role with the permissions needed for Lambda execution, S3 access, and SNS. Attach the following AWS managed policies to the role:
    • AmazonS3ReadOnlyAccess
    • AmazonSNSFullAccess
    • AWSLambdaBasicExecutionRole
  • Amazon ECR: Repository set up to store the Docker image.
  • AWS Account: With access to Lambda, IAM, and ECR services.
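
If you prefer to attach the managed policies from the IAM Role item programmatically, a sketch along these lines may help. It assumes a boto3 IAM client is passed in (for example `boto3.client("iam")`); the helper name `attach_managed_policies` is hypothetical.

```python
# ARNs of the AWS managed policies named in the prerequisites.
MANAGED_POLICY_ARNS = [
    "arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess",
    "arn:aws:iam::aws:policy/AmazonSNSFullAccess",
    "arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole",
]


def attach_managed_policies(iam_client, role_name: str) -> list:
    """Attach each managed policy to the role and return the ARNs that were attached."""
    attached = []
    for policy_arn in MANAGED_POLICY_ARNS:
        # attach_role_policy is idempotent for a policy that is already attached.
        iam_client.attach_role_policy(RoleName=role_name, PolicyArn=policy_arn)
        attached.append(policy_arn)
    return attached
```

Usage would look like `attach_managed_policies(boto3.client("iam"), "lambda-execution-role")`, where the role name is whichever execution role you created.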

Existing AWS Resources

  • ECR Repository: An ECR repository named secure-spreadsheet-upload-repo has already been created.
  • Lambda Functions: Two Lambda functions already exist: notify-spreadsheet-upload-function and dcp-secure-spreadsheet-upload-auth.

Deployment

Navigate to Project Directory:

cd /path/to/dcp-secure-spreadsheet-upload

For notify-spreadsheet-upload-function

Use the provided deploy_notify_upload.sh script to deploy the function.

  1. Ensure the script is executable:
    chmod +x deploy_notify_upload.sh
    
  2. Run the deployment script:
    ./deploy_notify_upload.sh
    

For dcp-secure-spreadsheet-upload-auth

  1. Build the Docker Image:

    docker build -t secure-spreadsheet-upload -f ingest_upload/Dockerfile .
    
  2. Tag the Docker Image: In this and the following commands, replace the account ID (871979166454) and region (us-east-1) with your own AWS account ID and region.

    docker tag secure-spreadsheet-upload:latest 871979166454.dkr.ecr.us-east-1.amazonaws.com/secure-spreadsheet-upload-repo:latest
    
  3. Authenticate Docker to ECR:

    aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin 871979166454.dkr.ecr.us-east-1.amazonaws.com
    
  4. Push the Docker Image to ECR:

    docker push 871979166454.dkr.ecr.us-east-1.amazonaws.com/secure-spreadsheet-upload-repo:latest

  5. Create or Update the Lambda Function: If creating a new function:

    aws lambda create-function \
    --function-name dcp-secure-spreadsheet-upload-auth \
    --package-type Image \
    --code ImageUri=871979166454.dkr.ecr.us-east-1.amazonaws.com/secure-spreadsheet-upload-repo:latest \
    --role arn:aws:iam::871979166454:role/lambda-execution-role \
    --profile your-aws-profile

    If updating an existing function:

    aws lambda update-function-code \
    --function-name dcp-secure-spreadsheet-upload-auth \
    --image-uri 871979166454.dkr.ecr.us-east-1.amazonaws.com/secure-spreadsheet-upload-repo:latest \
    --profile your-aws-profile
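
The same image URI appears in the tag, push, create, and update commands above. As a sketch, it can be assembled from its parts so that the account ID and region are only set once; the helper name `ecr_image_uri` is hypothetical.

```python
def ecr_image_uri(account_id: str, region: str, repository: str, tag: str = "latest") -> str:
    """Build the ECR image URI used by docker tag/push and the Lambda CLI commands."""
    # ECR registries follow the pattern <account-id>.dkr.ecr.<region>.amazonaws.com
    registry = f"{account_id}.dkr.ecr.{region}.amazonaws.com"
    return f"{registry}/{repository}:{tag}"


if __name__ == "__main__":
    # Values taken from the commands in this guide; substitute your own.
    print(ecr_image_uri("871979166454", "us-east-1", "secure-spreadsheet-upload-repo"))
```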

Running Tests

Setting Up the Test Environment

  1. Install the required packages:
    pip install -r requirements.txt
    
  2. Run the tests using pytest:
    pytest tests/
    

Environment Variables for Lambda

  • TOPIC_NAME: The name of the SNS topic.
  • MY_AWS_REGION: The AWS region where your SNS topic is located.
  • GOOGLE_APPLICATION_CREDENTIALS: The Google service account JSON credentials file, required for authenticating and generating tokens with the hca_ingest client.
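
One way to read these variables at startup and fail early when one is missing is sketched below. The names `LambdaSettings` and `load_settings` are hypothetical, and the sketch assumes a single function needs all three variables, which may not match how the two Lambdas are actually configured.

```python
import os
from dataclasses import dataclass

REQUIRED_VARIABLES = ("TOPIC_NAME", "MY_AWS_REGION", "GOOGLE_APPLICATION_CREDENTIALS")


@dataclass(frozen=True)
class LambdaSettings:
    topic_name: str
    aws_region: str
    google_credentials: str


def load_settings(environ=None) -> LambdaSettings:
    """Read the documented environment variables, raising if any is unset or empty."""
    env = os.environ if environ is None else environ
    missing = [name for name in REQUIRED_VARIABLES if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return LambdaSettings(
        topic_name=env["TOPIC_NAME"],
        aws_region=env["MY_AWS_REGION"],
        google_credentials=env["GOOGLE_APPLICATION_CREDENTIALS"],
    )
```

Passing a plain dict as `environ` makes the loader easy to exercise in pytest without touching the real environment.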
