diff --git a/.gitignore b/.gitignore new file mode 100644 index 0000000..776fa6d --- /dev/null +++ b/.gitignore @@ -0,0 +1,183 @@ +# Created by https://www.gitignore.io/api/linux,macos,python,windows +# Edit at https://www.gitignore.io/?templates=linux,macos,python,windows + +### Linux ### +*~ + +# temporary files which can be created if a process still has a handle open of a deleted file +.fuse_hidden* + +# KDE directory preferences +.directory + +# Linux trash folder which might appear on any partition or disk +.Trash-* + +# .nfs files are created when an open file is removed but is still being accessed +.nfs* + +### macOS ### +# General +.DS_Store +.AppleDouble +.LSOverride + +# Icon must end with two \r +Icon + +# Thumbnails +._* + +# Files that might appear in the root of a volume +.DocumentRevisions-V100 +.fseventsd +.Spotlight-V100 +.TemporaryItems +.Trashes +.VolumeIcon.icns +.com.apple.timemachine.donotpresent + +# Directories potentially created on remote AFP share +.AppleDB +.AppleDesktop +Network Trash Folder +Temporary Items +.apdisk + +### Python ### +# Byte-compiled / optimized / DLL files +__pycache__/ +*.py[cod] +*$py.class + +# C extensions +*.so + +# Distribution / packaging +.Python +build/ +develop-eggs/ +dist/ +downloads/ +eggs/ +.eggs/ +lib/ +lib64/ +parts/ +sdist/ +var/ +wheels/ +pip-wheel-metadata/ +share/python-wheels/ +*.egg-info/ +.installed.cfg +*.egg +MANIFEST + +# PyInstaller +# Usually these files are written by a python script from a template +# before PyInstaller builds the exe, so as to inject date/other infos into it. 
+*.manifest +*.spec + +# Installer logs +pip-log.txt +pip-delete-this-directory.txt + +# Unit test / coverage reports +htmlcov/ +.tox/ +.nox/ +.coverage +.coverage.* +.cache +nosetests.xml +coverage.xml +*.cover +.hypothesis/ +.pytest_cache/ + +# Translations +*.mo +*.pot + +# Scrapy stuff: +.scrapy + +# Sphinx documentation +docs/_build/ + +# PyBuilder +target/ + +# pyenv +.python-version + +# pipenv +# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control. +# However, in case of collaboration, if having platform-specific dependencies or dependencies +# having no cross-platform support, pipenv may install dependencies that don't work, or not +# install all needed dependencies. +#Pipfile.lock + +# celery beat schedule file +celerybeat-schedule + +# SageMath parsed files +*.sage.py + +# Spyder project settings +.spyderproject +.spyproject + +# Rope project settings +.ropeproject + +# Mr Developer +.mr.developer.cfg +.project +.pydevproject + +# mkdocs documentation +/site + +# mypy +.mypy_cache/ +.dmypy.json +dmypy.json + +# Pyre type checker +.pyre/ + +### Windows ### +# Windows thumbnail cache files +Thumbs.db +Thumbs.db:encryptable +ehthumbs.db +ehthumbs_vista.db + +# Dump file +*.stackdump + +# Folder config file +[Dd]esktop.ini + +# Recycle Bin used on file shares +$RECYCLE.BIN/ + +# Windows Installer files +*.cab +*.msi +*.msix +*.msm +*.msp + +# Windows shortcuts +*.lnk + +# End of https://www.gitignore.io/api/linux,macos,python,windows + +**/pre-processing-code/*/ +**/pre-processing-code.zip + +response.json diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index e1761ad..3ef7a9f 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -37,7 +37,7 @@ For more details on the technologies used in our ADX products, please visit [Get ## How can I contribute? 
#### Report an Issue/Bug or Submit an Improvement/Suggestion -If you have feedback specific to the ADX product featured in this repository, the best way to contact us would be through [opening a GitHub issue](https://github.com/rearc-data/ordering-referring-medicare/issues) in this repository. Before opening an issue please review the existing suggestions to see if your idea is already there. If already present, please comment on the existing issue instead of making a new one. +If you have feedback specific to the ADX product featured in this repository, the best way to contact us would be through [opening a GitHub issue](https://github.com/rearc-data/nppes-npi-registry-data-cms/issues) in this repository. Before opening an issue please review the existing suggestions to see if your idea is already there. If already present, please comment on the existing issue instead of making a new one. When opening an issue please **be as descriptive as possible**. If relevant please **provide information regarding your use-case, development configuration and environment**. The more specific you can be the easier it will be for us to identify and address the situation. diff --git a/README.md b/README.md new file mode 100644 index 0000000..597cd6f --- /dev/null +++ b/README.md @@ -0,0 +1,63 @@ + +Rearc Logo + + + + +NPPES NPI Registry Data | CMS +========================= +You can subscribe to the AWS Data Exchange product utilizing the automation featured in this repository by visiting + +Main Overview +------------- + +This release contains an archived copy of the National Plan and Provider Enumeration System (NPPES)'s National Provider Identifier (NPI) registry data. NPI numbers are used throughout the United States medical system to identify individual healthcare providers. + + +The Centers for Medicare & Medicaid Services (CMS) provide substantial amounts of data to +the public about their programs, various health care topics, and care +settings.
If you are interested in learning more about datasets released +by the CMS, please visit the [CMS Data homepage](https://data.cms.gov/). + +#### Data Source + +The provided data files are presented in CSV format, while the associated documentation files are presented in PDF format. For details regarding the included files visit [CMS's webpage for NPI Data Dissemination](https://www.cms.gov/Regulations-and-Guidance/Administrative-Simplification/NationalProvIdentStand/DataDissemination). + + + +More Information +---------------- + +- Source - [Centers for Medicare & Medicaid Services + (CMS)](https://www.cms.gov/Regulations-and-Guidance/Administrative-Simplification/NationalProvIdentStand/DataDissemination) +- [CMS Homepage](https://www.cms.gov/) +- [Terms of Use](https://www.usa.gov/government-works) +- Frequency: Weekly +- Formats: CSV, PDF + +Contact Details +--------------- + +- If you find any issues with or have enhancements for this product, open + up a GitHub + [issue](https://github.com/rearc-data/Data-Dissmenination-CMS/issues) + and we will gladly take a look at it. Better yet, submit a pull + request. Any contributions you make are greatly appreciated :heart:. +- If you are interested in any other open datasets, please create a + request on our project board + [here](https://github.com/rearc-data/covid-datasets-aws-data-exchange/projects/1). +- If you have questions about this source data, please send the + Centers for Medicare & Medicaid Services an email at customerservice@npienumerator.com. +- If you have any other questions or feedback, send us an email at + data@rearc.io. + +About Rearc +----------- + +Rearc is a cloud, software and services company. We believe that +empowering engineers drives innovation. Cloud-native architectures, +modern software and data practices, and the ability to safely experiment +can enable engineers to realize their full potential. We have partnered +with several enterprises and startups to help them achieve agility.
Our +approach is simple — empower engineers with the best tools possible to +make an impact within their industry. diff --git a/dataset-description.md b/dataset-description.md new file mode 100644 index 0000000..ad71f6e --- /dev/null +++ b/dataset-description.md @@ -0,0 +1 @@ +This release contains an archived copy of the National Plan and Provider Enumeration System (NPPES)'s National Provider Identifier (NPI) registry data. NPI numbers are used throughout the United States medical system to identify individual healthcare providers. diff --git a/init.sh b/init.sh new file mode 100644 index 0000000..bb5a18f --- /dev/null +++ b/init.sh @@ -0,0 +1,155 @@ +#!/usr/bin/env bash + +# Exit on error. Append "|| true" if you expect an error. +set -o errexit +# Exit on error inside any functions or subshells. +set -o errtrace +# Do not allow use of undefined vars. Use ${VAR:-} to use an undefined VAR +#set -o nounset +# Catch the error in case mysqldump fails (but gzip succeeds) in `mysqldump |gzip` +set -o pipefail +# Turn on traces, useful while debugging but commented out by default +# set -o xtrace + +# Sets profile variable to an empty value by default, reassigns in while loop below if it was included as a parameter +PROFILE="" + +while [[ $# -gt 0 ]]; do + opt="${1}" + shift; + current_arg="$1" + case ${opt} in + "-s"|"--s3-bucket") export S3_BUCKET="$1"; shift;; + "-d"|"--dataset-name") export DATASET_NAME="$1"; shift;; + "-p"|"--product-name") export PRODUCT_NAME="$1"; shift;; + "-i"|"--product-id") export PRODUCT_ID="$1"; shift;; + "-r"|"--region") export REGION="$1"; shift;; + "-f"|"--profile") PROFILE=" --profile $1"; shift;; + *) echo "ERROR: Invalid option: \""$opt"\"" >&2; exit 1;; + esac +done + +while [[ ${#DATASET_NAME} -gt 53 ]]; do + echo "dataset-name must be 53 characters or fewer in length, enter a shorter name:" + read -p "New dataset-name: " DATASET_NAME + case ${#DATASET_NAME} in + [1-9]|[1-4][0-9]|5[0-3]) break;; + * )
echo "Enter a shorter dataset-name";; + esac +done + +while [[ ${#PRODUCT_NAME} -gt 72 ]]; do + echo "product-name must be 72 characters or fewer in length, enter a shorter name:" + read -p "New product-name: " PRODUCT_NAME + case ${#PRODUCT_NAME} in + [1-9]|[1-6][0-9]|7[0-2]) break;; + * ) echo "Enter a shorter product-name";; + esac +done + +#creating a pre-processing zip package, these commands may need to be adjusted depending on folder structure and dependencies +(cd pre-processing/pre-processing-code && zip -r pre-processing-code.zip . -x "*.dist-info/*" -x "bin/*" -x "**/__pycache__/*") + +#upload pre-processing-code.zip to s3 +echo "uploading pre-processing-code.zip to s3" +aws s3 cp pre-processing/pre-processing-code/pre-processing-code.zip s3://$S3_BUCKET/$DATASET_NAME/automation/pre-processing-code.zip --region $REGION$PROFILE + +#creating dataset on ADX +echo "creating dataset on ADX" +DATASET_COMMAND="aws dataexchange create-data-set --asset-type "S3_SNAPSHOT" --description file://dataset-description.md --name \"${PRODUCT_NAME}\" --region $REGION --output json$PROFILE" +DATASET_OUTPUT=$(eval $DATASET_COMMAND) +DATASET_ARN=$(echo $DATASET_OUTPUT | tr '\r\n' ' ' | jq -r '.Arn') +DATASET_ID=$(echo $DATASET_OUTPUT | tr '\r\n' ' ' | jq -r '.Id') + +#creating pre-processing cloudformation stack +echo "creating pre-processing cloudformation stack" +CFN_STACK_NAME="producer-${DATASET_NAME}-preprocessing" +aws cloudformation create-stack --stack-name $CFN_STACK_NAME --template-body file://pre-processing/pre-processing-cfn.yaml --parameters ParameterKey=S3Bucket,ParameterValue=$S3_BUCKET ParameterKey=DataSetName,ParameterValue=$DATASET_NAME ParameterKey=DataSetArn,ParameterValue=$DATASET_ARN ParameterKey=ProductId,ParameterValue=$PRODUCT_ID ParameterKey=Region,ParameterValue=$REGION --region $REGION --capabilities "CAPABILITY_AUTO_EXPAND" "CAPABILITY_NAMED_IAM" "CAPABILITY_IAM"$PROFILE + +echo "waiting for cloudformation stack to complete" +# with `set -o errexit` a separate `$?` check never runs on failure, so test the wait command directly +if ! aws cloudformation wait stack-create-complete --stack-name $CFN_STACK_NAME --region $REGION$PROFILE +then + echo "Cloudformation stack creation failed" + exit 1 +fi + +#invoking the pre-processing lambda function to create first dataset revision +echo "invoking the pre-processing lambda function to create first dataset revision" +LAMBDA_FUNCTION_NAME="source-for-${DATASET_NAME}" +# AWS CLI version 2 changes require explicitly declaring `--cli-binary-format raw-in-base64-out` for the format of the `--payload` +LAMBDA_FUNCTION_STATUS_CODE=$(aws lambda invoke --function-name $LAMBDA_FUNCTION_NAME --invocation-type "RequestResponse" --payload '{ "test": "event" }' response.json --cli-binary-format raw-in-base64-out --region $REGION --query 'StatusCode' --output text$PROFILE) + +#grabbing dataset revision status +echo "grabbing dataset revision status" +DATASET_REVISION_STATUS=$(aws dataexchange list-data-set-revisions --data-set-id $DATASET_ID --region $REGION --query "sort_by(Revisions, &CreatedAt)[-1].Finalized"$PROFILE) + +update () { + echo "" + echo "Manually create the ADX product and enter in the Product ID below:" + read -p "Product ID: " NEW_PRODUCT_ID + + # Cloudformation stack update + echo "updating pre-processing cloudformation stack" + aws cloudformation update-stack --stack-name $CFN_STACK_NAME --use-previous-template --parameters ParameterKey=S3Bucket,ParameterValue=$S3_BUCKET ParameterKey=DataSetName,ParameterValue=$DATASET_NAME ParameterKey=DataSetArn,ParameterValue=$DATASET_ARN ParameterKey=ProductId,ParameterValue=$NEW_PRODUCT_ID ParameterKey=Region,ParameterValue=$REGION --region $REGION --capabilities "CAPABILITY_AUTO_EXPAND" "CAPABILITY_NAMED_IAM" "CAPABILITY_IAM"$PROFILE + + echo "waiting for cloudformation stack update to complete" + # test the wait command directly; `break` is not valid outside a loop, so use `return` on failure + if ! aws cloudformation wait stack-update-complete --stack-name $CFN_STACK_NAME --region $REGION$PROFILE + then + echo "Cloudformation stack update failed" + return 1 + fi + echo "cloudformation stack update completed" +} + +delete () { + echo "Destroying the CloudFormation stack" + aws cloudformation delete-stack --stack-name $CFN_STACK_NAME --region $REGION$PROFILE + + #check status of cloudformation stack delete action + if aws cloudformation wait stack-delete-complete --stack-name $CFN_STACK_NAME --region $REGION$PROFILE + then + # Cloudformation stack deleted + echo "CloudFormation stack successfully deleted" + else + # Cloudformation stack deletion failed + echo "Cloudformation stack deletion failed" + exit 1 + fi +} + +if [[ $DATASET_REVISION_STATUS == "true" ]] +then + echo "Dataset revision completed successfully" + echo "" + + while true; do + echo "Do you want to use this script to update the CloudFormation stack? If you enter 'n' your CloudFormation stack will be destroyed:" + read -p "('y' to update / 'n' to destroy): " Y_N + case $Y_N in + [Yy]* ) update; exit;; + [Nn]* ) delete; break;; + * ) echo "Enter 'y' or 'n'.";; + esac + done + + echo "Manually create the ADX product and manually re-run the pre-processing CloudFormation template using the following params:" + echo "" + echo "S3Bucket: $S3_BUCKET" + echo "DataSetName: $DATASET_NAME" + echo "DataSetArn: $DATASET_ARN" + echo "Region: $REGION" + echo "" + echo "For the ProductId param use the Product ID of the ADX product" + +else + echo "Dataset revision failed" + cat response.json +fi diff --git a/product-description.md b/product-description.md new file mode 100644 index 0000000..66814eb --- /dev/null +++ b/product-description.md @@ -0,0 +1,59 @@ + + +NPPES NPI Registry Data | CMS +========================= +You can subscribe to the AWS Data Exchange product utilizing the automation featured in this repository by visiting + +Main Overview +------------- + +This release contains an archived copy of the National Plan and Provider Enumeration
System (NPPES)'s National Provider Identifier (NPI) registry data. NPI numbers are used throughout the United States medical system to identify individual healthcare providers. + + +The Centers for Medicare & Medicaid Services (CMS) provide substantial amounts of data to +the public about their programs, various health care topics, and care +settings. If you are interested in learning more about datasets released +by the CMS, please visit the [CMS Data homepage](https://data.cms.gov/). + +#### Data Source + +The provided data files are presented in CSV format, while the associated documentation files are presented in PDF format. For details regarding the included files visit [CMS's webpage for NPI Data Dissemination](https://www.cms.gov/Regulations-and-Guidance/Administrative-Simplification/NationalProvIdentStand/DataDissemination). + + + +More Information +---------------- + +- Source - [Centers for Medicare & Medicaid Services + (CMS)](https://www.cms.gov/Regulations-and-Guidance/Administrative-Simplification/NationalProvIdentStand/DataDissemination) +- [CMS Homepage](https://www.cms.gov/) +- [Terms of Use](https://www.usa.gov/government-works) +- Frequency: Weekly +- Formats: CSV, PDF + +Contact Details +--------------- + +- If you find any issues with or have enhancements for this product, open + up a GitHub + [issue](https://github.com/rearc-data/Data-Dissmenination-CMS/issues) + and we will gladly take a look at it. Better yet, submit a pull + request. Any contributions you make are greatly appreciated :heart:. +- If you are interested in any other open datasets, please create a + request on our project board + [here](https://github.com/rearc-data/covid-datasets-aws-data-exchange/projects/1). +- If you have questions about this source data, please send the + Centers for Medicare & Medicaid Services an email at customerservice@npienumerator.com. +- If you have any other questions or feedback, send us an email at + data@rearc.io.
+ +About Rearc +----------- + +Rearc is a cloud, software and services company. We believe that +empowering engineers drives innovation. Cloud-native architectures, +modern software and data practices, and the ability to safely experiment +can enable engineers to realize their full potential. We have partnered +with several enterprises and startups to help them achieve agility. Our +approach is simple — empower engineers with the best tools possible to +make an impact within their industry. diff --git a/rearc_logo_rgb.png b/rearc_logo_rgb.png new file mode 100644 index 0000000..787b2fc Binary files /dev/null and b/rearc_logo_rgb.png differ
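A note on the interactive length checks in `init.sh` above: the `case ${#DATASET_NAME}` digit-glob patterns map string *lengths* onto numeric ranges. A minimal standalone sketch of that pattern (the helper name `fits_max_53` is illustrative, not part of the script):

```shell
#!/usr/bin/env bash
# Succeeds when the argument is 1-53 characters long, using the
# same digit-glob ranges as init.sh's dataset-name check:
#   [1-9]       -> lengths 1-9
#   [1-4][0-9]  -> lengths 10-49
#   5[0-3]      -> lengths 50-53
fits_max_53() {
  case ${#1} in
    [1-9]|[1-4][0-9]|5[0-3]) return 0;;
    *) return 1;;
  esac
}

fits_max_53 "short-name" && echo "short-name fits"
fits_max_53 "$(printf 'x%.0s' {1..60})" || echo "60 chars is too long"
```

An arithmetic test such as `[[ ${#DATASET_NAME} -le 53 ]]` expresses the same bound more directly; the glob form is shown only because it mirrors the script.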