alswaina/gcam_data_extraction

DESCRIPTION

Code to extract query results from one or more BaseX databases to CSV files. The code has the following features:

  1. Choosing the queries and countries (regions) to extract
  2. Aggregating results per region or per query
  3. Logging execution

Part 1: Script

SYNTAX:

Rscript <file_script.R> (-d <DB_PATH> | -f <DBs_FOLDER>)

OPTIONS:

-d for SINGLE DB run followed by the path to the database

-f for FOLDER that contains multiple dbs followed by the path to the folder

Example of a single database run:

Rscript data_extractor.R -d ./output

The line above runs data_extractor.R on the single database in the output folder, which is located in the same directory as the script.

Example of a multiple databases run:

Rscript data_extractor.R -f ./set_of_dbs

The line above runs data_extractor.R on all databases in the set_of_dbs folder, which is located in the same directory as the script.

Note: a database name must start with "database_" in order to be recognized. For example: database_5p4_nze01
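
The naming rule above can be illustrated with a minimal R sketch. The function name find_databases is hypothetical and is not taken from data_extractor.R; the sketch only assumes that each database is a directory directly inside the folder passed to -f.

```r
# Hypothetical helper (not part of data_extractor.R): list the database
# directories in a folder, keeping only names that start with "database_".
find_databases <- function(folder, prefix = "database_") {
  # Immediate subdirectories only, returned as bare names
  dirs <- list.dirs(folder, full.names = FALSE, recursive = FALSE)
  dirs[startsWith(dirs, prefix)]
}

# Demo: a temporary folder with two valid databases and one ignored directory
demo_folder <- file.path(tempdir(), "set_of_dbs")
for (name in c("database_5p4_nze01", "database_02", "scratch")) {
  dir.create(file.path(demo_folder, name), recursive = TRUE, showWarnings = FALSE)
}
found <- find_databases(demo_folder)
print(sort(found))
```

A directory such as scratch is skipped because its name lacks the required prefix.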

Part 2: Configuration File

QUERIES:

Query titles and their XML are stored in the list queries_xml, which has the following structure:

queries_xml <- list( "" = list (<query.title> = <query.xml>), ...)

A query can be resolved in two ways. With the title, the script looks up the matching XML query in the main queries file (MAIN.QUERY). Alternatively, the XML query can be written directly in the list. The former method is preferred because it is cleaner; the latter is more prone to human error. To choose between the title and the XML when constructing the query, configure Query.BY.

Query.BY <- {"title", "xml"}

  • "title": (default) use the title provided in the queries_xml list to extract the query's XML from Main_queries.xml.
  • "xml": use the actual XML query provided in the queries_xml list to run the query.

MAIN.QUERY is the path to the main queries XML file from which queries are read. Use the forward slash ("/") as the path separator. Note: this variable is mandatory.

MAIN.QUERY <- "/path/to/Main_queries.xml"
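
Paths copied on Windows often contain backslashes. The sketch below shows one way to convert such a path to the forward-slash form before assigning it to MAIN.QUERY. The helper name to_forward_slashes and the Windows path are hypothetical and used only for illustration.

```r
# Hypothetical helper: replace every backslash with a forward slash.
to_forward_slashes <- function(path) {
  gsub("\\", "/", path, fixed = TRUE)
}

# Illustrative Windows-style path (the R string literal escapes each backslash)
windows_path <- "C:\\gcam\\Main_queries.xml"
MAIN.QUERY <- to_forward_slashes(windows_path)
print(MAIN.QUERY)
```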

To select which queries to run from the list queries_xml, set the vector SELECTED.QUERIES to the query numbers. Assigning c() or NULL includes all queries in the list (default).

Example: SELECTED.QUERIES <- c(7, 9) to run queries 7 and 9

Example: SELECTED.QUERIES <- c(1:18) for a range of numbers

To select a subset of the databases in the folder, list the names of the databases, or leave the vector empty to run all databases:

Example: SELECTED.DBs <- c("database_02", ..) runs the queries on the database named database_02

Example: SELECTED.DBs <- c() or NULL for all dbs in the folder (default)
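
Both SELECTED.QUERIES and SELECTED.DBs treat an empty value as "run everything". In R, c() evaluates to NULL, so the two spellings are equivalent. The sketch below illustrates that convention for databases. select_or_all is a hypothetical helper, and the script itself may implement the check differently.

```r
# Hypothetical helper: an empty selection (NULL or c()) keeps every item;
# otherwise only the items named in `selected` are kept.
select_or_all <- function(available, selected = NULL) {
  if (length(selected) == 0) {
    return(available)
  }
  available[available %in% selected]
}

# Illustrative database names
available_dbs <- c("database_01", "database_02", "database_03")
all_dbs <- select_or_all(available_dbs, c())                 # empty: keep all
some_dbs <- select_or_all(available_dbs, c("database_02"))   # keep one
print(some_dbs)
```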

To specify the regions to query, list the regions (including Global), or leave the vector empty (c() or NULL) to run all regions:

Example: REGIONS <- c('USA', 'Canada') for USA and Canada regions.

Example: REGIONS <- c('Global') for Global region (default)
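
Taken together, the options in this part might be set as in the sketch below. It reuses only example values shown above. The queries_xml list is omitted, and the path is a placeholder to be replaced with the real location of the main queries file.

```r
# Illustrative configuration assembled from the examples in this part
Query.BY <- "title"                          # resolve queries by title (default)
MAIN.QUERY <- "/path/to/Main_queries.xml"    # mandatory; placeholder path
SELECTED.QUERIES <- c(7, 9)                  # run queries 7 and 9
SELECTED.DBs <- NULL                         # all databases in the folder
REGIONS <- c("USA", "Canada")                # query the USA and Canada regions
```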

Part 3: Logging

Coming soon..

Part 4: Output

Coming soon..
