Instead of using the one script (workflow.py
) that executes the Luigi Workflow Orchestration as shown in
Step 2 of the main README.md,
this documentation will break down the tasks in the script so that it can be run individually and manually.
- Execute python script to extract airport data from website link and load into database:
# In the root of project directory: $ python ./extract_load/airports.py
- Install the dbt dependencies:
Specifically, we will be using macros from dbt_util in our custom macros that are ultimately used in the SQL queries to forms our final analytics tables.
# If not in the dbt directory: $ cd ./dbt/ # In the ./dbt directory: $ dbt deps
-
Use dbt to easily seed CSV files stored locally:
# In the ./dbt directory: $ dbt seed --profiles-dir ./
This alternative to using SQL Alchemy in Python scripts to upload to the databases.
In this step, dbt will upload the
./dbt/data/raw_airports.csv
to the database.
- Run dbt that cleans the Airport data to be used in later steps to scrape arrival data (using airport iata/icao code):
In this step, dbt will compile and execute the SQL Query to create:
# In the ./dbt directory: $ dbt run --profiles-dir ./ --model tag:cleaned_airports
- base_airports table.
- Execute python script to extract arrival data from website link and load into database:
The script in this step will query the table in the database created in Step 2.4 above to obtain the airport codes (iata/icao) and use them to loop through the arrival data website.
# In the root of project directory: $ python ./extract_load/arrivals.py
Note:
- Use dbt to easily seed CSV files stored locally:
In this step, dbt will upload the
# In the ./dbt directory: $ dbt seed --profiles-dir ./
./dbt/data/raw_arrivals.csv
to the database.
- Run dbt that cleans the Arrival data and Transform the tables to its Factual Tables for analytics:
In this step, dbt will compile and execute the SQL Query to create:
# In the ./dbt directory: $ dbt run --profiles-dir ./ --exclude tag:cleaned_airports
- base_arrivals__malaysia table.
- stg_airports__malaysia_distances table.
- fct_airports__malaysia_distances_km table.
- fct_arrivals__malaysia_summary table.