Repository containing scripts for importing OpenAlex snapshots into BigQuery

naustica/openalex

Workflow for Processing and Loading OpenAlex data into Google BigQuery

This repository contains instructions on how to extract and transform OpenAlex data for data analysis with Google BigQuery.

Requirements

Download Snapshot

$ aws s3 sync 's3://openalex' 'openalex-snapshot' --no-sign-request
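After the sync finishes, it can help to confirm that the works files arrived before starting the transformation. The following is a minimal sketch, assuming the snapshot keeps a layout of `data/works/updated_date=*/part_*.gz` under the download directory. The function name `count_work_parts` is hypothetical and not part of this repository.

```shell
# Count the gzipped part files under data/works in a downloaded snapshot.
# Assumption: files are laid out as data/works/updated_date=*/part_*.gz.
count_work_parts() {
    snapshot_dir="$1"
    # tr strips the padding that some wc implementations add.
    find "$snapshot_dir/data/works" -type f -name 'part_*.gz' | wc -l | tr -d ' '
}

# Usage: count_work_parts openalex-snapshot
```

A count of zero suggests that the sync was interrupted or that the directory layout differs from the assumed one.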

Data transformation

$ sbatch openalex_hpc.sh

Uploading Files to Google Bucket

$ gsutil -m cp -r /scratch/users/haupka/works gs://bigschol

Creating a BigQuery Table

$ bq load --ignore_unknown_values --source_format=NEWLINE_DELIMITED_JSON subugoe-collaborative:openalex.works 'gs://bigschol/works/*.gz' schema_openalex_work.json
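When loading into a different project, dataset, or bucket, it may be convenient to print the full command for review before running it. A minimal sketch: the helper `print_bq_load` is hypothetical and only assembles the same flags used above. It does not call `bq` itself.

```shell
# Print the bq load command for review instead of running it.
# Arguments: destination table, source URI, schema file.
# The URI is wrapped in single quotes so the local shell does not expand the wildcard.
print_bq_load() {
    printf "bq load --ignore_unknown_values --source_format=NEWLINE_DELIMITED_JSON %s '%s' %s\n" "$1" "$2" "$3"
}

# Usage:
# print_bq_load subugoe-collaborative:openalex.works 'gs://bigschol/works/*.gz' schema_openalex_work.json
```

The printed line can be copied into the terminal once the table name, URI, and schema path look right.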
