###Refresh OMOP Vocabularies In A Single Script###
###Latest updates (recent first):###
2022-09-09
- MIMIC 2.0 is issued: run ETL on MIMIC 2.0.
- bq_run_script.py is updated to fit current BQ requirement of single line queries
- load_row_id is commented out in custom_vocabularies.sql to match standard vocabs loaded outside vocabulary_refresh
- @bq_target_project and @bq_target_dataset are moved to "variables" section in vocabulary_refresh.conf
- known issue: if a custom mapping csv is renamed, the old copy should be deleted from google storage bucket manually. Otherwise, there will be two mappings under old name and under the new.
2020-09-11
- Put it into MIMIC IV to OMOP project
###Usage:###
#####To refresh standard vocabularies#####
- Download and unzip files from Athena (to a path known to vocabulary_refresh.py)
-
set Athena local and gs path, target dataset and project name in vocabulary_refresh.conf
- run
python vocabulary_refresh.py -s10
- run
-
alternatively, run step by step: 2. Copy files to GCP bucket * gsutil cp /*.csv gs:https://the_project_bucket/vocabulary_YYYY_MM_DD/ 3. Load files to intermediate BQ tables * load_to_bq_vocab.py (uncomment content of voc list) 4. Populate target vocabulary tables from intermediate tables * create_voc_from_tmp.sql
#####To refresh or add new custom mapping#####
-
set custom mapping CSV local and gs path, target dataset and project name in vocabulary_refresh.conf
- run
python vocabulary_refresh.py -s20
- run
-
alternatively, run step by step:
- Copy custom mapping files to custom_mapping_csv/ folder, and update custom_mapping_list.tsv
- Copy custom mapping files to GCP bucket
- gsutil cp custom_mapping_csv/*.csv gs:https://some_path/CUSTOM_MAPPING/
- Load files to the intermediate BQ table (tmp_custom_mapping)
- load_to_bq_vocab.py (uncomment value of custom_mapping_table variable)
- check_custom_loaded.sql
- Add custom concepts to vocabulary tables from the intermediate table
- custom_vocabularies.sql
#####Verify the result#####
-
set target dataset and project name in vocabulary_refresh.conf
- run
python vocabulary_refresh.py -s30
- run
-
alternatively, run step by step: 5. Verify target tables * vocabulary_check_bq.sql 6. Clean up temporary tables from the previous step * vocabulary_cleanup_bq.sql
#####Comments#####
- custom_mapping_list.tsv is populated manually, and is helpful to track custom vocabularies and custom concepts ranges in use.
#####Tne End#####