Running tasks manually

Context

Step 2 of the main README.md runs the whole Luigi workflow orchestration through a single script (workflow.py). This document instead breaks that script down into its individual tasks so that each one can be run manually, one at a time.
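To make the task order concrete, here is a minimal sketch in plain Python (not Luigi) that lists the commands given in the steps below and, optionally, runs them in sequence. The `Step` tuple and the `build_steps` and `run_steps` helpers are hypothetical names introduced for illustration; they are not part of workflow.py. Only steps that list a command in this document are included, so DbtDeps and DbtRunAnalysis are left out.

```python
import subprocess
from typing import List, NamedTuple


class Step(NamedTuple):
    name: str        # Luigi task name used in this guide
    argv: List[str]  # command as listed under that step
    cwd: str         # directory to run in, relative to the project root


def build_steps() -> List[Step]:
    """Return the manual steps that list a command in this guide, in order."""
    return [
        Step("ExtractLoadAirportData", ["python", "./extract_load/airports.py"], "."),
        Step("DbtSeedAirports", ["dbt", "seed", "--profiles-dir", "./"], "./dbt"),
        Step("DbtRunAirports",
             ["dbt", "run", "--profiles-dir", "./", "--model", "tag:cleaned_airports"],
             "./dbt"),
        Step("ScrapeLoadArrivalData", ["python", "./extract_load/arrivals.py"], "."),
        Step("DbtSeedArrivals", ["dbt", "seed", "--profiles-dir", "./"], "./dbt"),
    ]


def run_steps(steps: List[Step], dry_run: bool = True) -> List[str]:
    """Print each command; execute it only when dry_run is False."""
    lines = []
    for step in steps:
        line = f"[{step.name}] (cd {step.cwd}) $ {' '.join(step.argv)}"
        print(line)
        lines.append(line)
        if not dry_run:
            # check=True stops the sequence at the first failing command.
            subprocess.run(step.argv, cwd=step.cwd, check=True)
    return lines


if __name__ == "__main__":
    run_steps(build_steps())  # dry run: only prints the commands
```

With `dry_run=True` (the default) nothing is executed, which makes it easy to check the order before running anything against the database.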

[Figure: workflow-visualisation-luigi]

Running Luigi tasks manually

Step 1. ExtractLoadAirportData

  • Execute the Python script that extracts airport data from the source website and loads it into the database:
    # In the root of project directory:
    $ python ./extract_load/airports.py

Step 2. DbtDeps

Step 3. DbtSeedAirports

  • Use dbt to easily seed CSV files stored locally:

    # In the ./dbt directory:  
    $ dbt seed --profiles-dir ./

    This is an alternative to using SQLAlchemy in Python scripts to upload data to the database.

    In this step, dbt will upload the ./dbt/data/raw_airports.csv to the database.
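For comparison, the following is a rough sketch of the kind of hand-written upload code that `dbt seed` replaces. It uses the standard-library `sqlite3` module as a stand-in for the project's database and SQLAlchemy code, which are not shown in this document. The `load_csv` helper, the sample column names, and the sample rows are all hypothetical, and every column is stored as text for simplicity.

```python
import csv
import io
import sqlite3


def load_csv(conn: sqlite3.Connection, table: str, csv_file) -> int:
    """Create `table` from the CSV header, insert every row, return the row count."""
    reader = csv.reader(csv_file)
    header = next(reader)
    columns = ", ".join(f'"{name}" TEXT' for name in header)
    placeholders = ", ".join("?" for _ in header)
    # Recreate the table on each load so repeated runs do not duplicate rows.
    conn.execute(f'DROP TABLE IF EXISTS "{table}"')
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    rows = list(reader)
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
    conn.commit()
    return len(rows)


if __name__ == "__main__":
    # Hypothetical sample data; the real file is ./dbt/data/raw_airports.csv.
    sample = io.StringIO(
        "name,iata,icao\nExample One,AAA,AAAA\nExample Two,BBB,BBBB\n"
    )
    conn = sqlite3.connect(":memory:")
    print(load_csv(conn, "raw_airports", sample))
```

With dbt, this table creation and insert logic does not need to be written or maintained by hand; placing the CSV in the seed directory and running the command above is enough.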

Step 4. DbtRunAirports

  • Run the dbt models that clean the airport data. Later steps use the cleaned data (specifically each airport's IATA/ICAO code) to scrape arrival data:
    # In the ./dbt directory:  
    $ dbt run --profiles-dir ./ --model tag:cleaned_airports
    In this step, dbt will compile and execute the SQL Query to create:

Step 5. ScrapeLoadArrivalData

  • Execute the Python script that extracts arrival data from the source website and loads it into the database:
    # In the root of project directory:  
    $ python ./extract_load/arrivals.py
    The script in this step queries the table created in Step 4 above to obtain the airport codes (IATA/ICAO), then loops over those codes to request each airport's page on the arrival data website.
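The lookup-and-loop pattern described here can be sketched as follows. This is not the code in arrivals.py: the table name `cleaned_airports`, the column names `iata` and `icao`, the URL template, and the choice to prefer the IATA code over the ICAO code are all assumptions made for illustration, and `sqlite3` stands in for the project's database.

```python
import sqlite3
from typing import Iterator, List


def airport_codes(conn: sqlite3.Connection,
                  table: str = "cleaned_airports") -> List[str]:
    """Return one code per airport, using IATA when present, else ICAO."""
    rows = conn.execute(f'SELECT iata, icao FROM "{table}"').fetchall()
    codes = []
    for iata, icao in rows:
        code = iata or icao
        if code:  # airports with neither code are skipped
            codes.append(code)
    return codes


def arrival_urls(codes: List[str],
                 template: str = "https://example.com/arrivals/{code}") -> Iterator[str]:
    """Yield one arrivals-page URL per airport code (hypothetical URL template)."""
    for code in codes:
        yield template.format(code=code)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE cleaned_airports (iata TEXT, icao TEXT)")
    conn.executemany(
        "INSERT INTO cleaned_airports VALUES (?, ?)",
        [("AAA", "AAAA"), (None, "BBBB")],
    )
    for url in arrival_urls(airport_codes(conn)):
        print(url)  # a real scraper would fetch and parse this page
```

Keeping the database lookup separate from the URL loop means the list of airports can be checked on its own before any requests are sent.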

    Note:

Step 6. DbtSeedArrivals

  • Use dbt to easily seed CSV files stored locally:
    # In the ./dbt directory:  
    $ dbt seed --profiles-dir ./
    In this step, dbt will upload the ./dbt/data/raw_arrivals.csv to the database.

Step 7. DbtRunAnalysis