Code to extract query results from one or more BaseX databases into CSV files. The code has the following features:
- Choosing queries and countries (regions)
- Aggregating results per region or query
- Logging execution
SYNTAX:
Rscript <file_script.R> (-d <DB_PATH> | -f <DBs_FOLDER>)
OPTIONS:
-d
run on a SINGLE database; followed by the path to the database
-f
run on a FOLDER that contains multiple databases; followed by the path to the folder
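The option handling described above could be sketched as follows. This is only an illustration, not the script's actual code: the helper name parse_cli_args and the returned field names (mode, path) are hypothetical.

```r
# Hypothetical helper: interpret the -d / -f options described above.
# `args` is a character vector such as c("-d", "./output").
parse_cli_args <- function(args) {
  if (length(args) != 2 || !(args[1] %in% c("-d", "-f"))) {
    stop("Usage: Rscript <file_script.R> (-d <DB_PATH> | -f <DBs_FOLDER>)")
  }
  list(
    mode = if (args[1] == "-d") "single" else "folder",  # -d: one database, -f: folder of databases
    path = args[2]
  )
}

# When run through Rscript, the arguments would typically come from:
#   cli <- parse_cli_args(commandArgs(trailingOnly = TRUE))
cli <- parse_cli_args(c("-d", "./output"))
```

Rejecting anything other than exactly one flag plus one path keeps the two run modes mutually exclusive, as the SYNTAX line implies.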
Example of a single database run:
Rscript data_extractor.R -d ./output
The line above will run data_extractor.R on a single database inside the output folder located in the script's current directory
Example of a multiple databases run:
Rscript data_extractor.R -f ./set_of_dbs
The line above will run data_extractor.R on all databases inside the set_of_dbs folder located in the script's current directory
Note: a database name must start with "database_" in order to be recognized. For example: database_5p4_nze01
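As a rough sketch of the naming rule above, the databases in a folder could be discovered like this. The helper name find_databases is hypothetical, and the sketch assumes each database is a sub-folder of the given folder, which may not match how the script actually scans it.

```r
# Hypothetical helper: return the sub-folders of `folder` whose names
# start with "database_" (assumes each database is a directory).
find_databases <- function(folder) {
  dirs <- list.dirs(folder, full.names = TRUE, recursive = FALSE)
  dirs[startsWith(basename(dirs), "database_")]
}

# Demonstration on a throwaway folder under tempdir()
demo_root <- file.path(tempdir(), "set_of_dbs")
dir.create(file.path(demo_root, "database_5p4_nze01"),
           recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(demo_root, "scratch"), showWarnings = FALSE)

found <- basename(find_databases(demo_root))
```

Here the "scratch" folder is skipped because its name lacks the required prefix.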
QUERIES:
Query titles and XML are listed in the list queries_xml with the following structure:
queries_xml <- list( "" = list (<query.title> = <query.xml>), ...)
A query can be resolved in two ways: by its title, which is used to fetch the XML query from the main queries file (MAIN.QUERY), or by the XML query given directly in the list. The former method is preferred as it is cleaner; the latter is more prone to human error. To choose between the title and the XML when constructing the query, configure Query.BY.
Query.BY <- {"title", "xml"}
- "title": (default) use the title provided in queries_xml list to extract the query's xml from Main_queries.xml.
- "xml": use the actual xml query provided in queries_xml list to run the query
MAIN.QUERY is the path to the main queries XML file to read the query from. It uses the forward-slash path separator, i.e. "/". Note: this variable is mandatory.
MAIN.QUERY <- "/path/to/Main_queries.xml"
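The two settings above could be sanity-checked before a run along these lines. This is a minimal sketch under stated assumptions: check_query_config is a hypothetical helper, and the real script may validate these values differently or not at all.

```r
# Illustrative configuration values; the path is the placeholder from the text.
Query.BY   <- "title"
MAIN.QUERY <- "/path/to/Main_queries.xml"

# Hypothetical check (not the script's actual code).
check_query_config <- function(query_by, main_query) {
  # Query.BY must be one of the two documented modes
  query_by <- match.arg(query_by, c("title", "xml"))
  # MAIN.QUERY is mandatory
  if (!is.character(main_query) || length(main_query) != 1 || !nzchar(main_query)) {
    stop("MAIN.QUERY is mandatory and must be a single path")
  }
  # Only forward slashes are accepted as the path separator
  if (grepl("\\", main_query, fixed = TRUE)) {
    stop("MAIN.QUERY must use '/' as the path separator")
  }
  query_by
}

mode <- check_query_config(Query.BY, MAIN.QUERY)
```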
To select the queries to run from the list queries_xml, configure the vector SELECTED.QUERIES with the query numbers. Assigning c() or NULL includes all queries in the list (default).
Example: SELECTED.QUERIES <- c(7, 9)
to run queries 7 and 9
Example: SELECTED.QUERIES <- c(1:18)
for a range of numbers
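The "empty means all" rule for SELECTED.QUERIES might look like the sketch below. The list queries_demo, its titles and XML strings, and the helper select_queries are all hypothetical; keying the outer list by query number is an assumption made only for this illustration.

```r
# Hypothetical stand-in for queries_xml, keyed by query number (an assumption).
queries_demo <- list(
  "1" = list("Example title A" = "<queryA/>"),
  "2" = list("Example title B" = "<queryB/>"),
  "3" = list("Example title C" = "<queryC/>")
)

# Hypothetical helper: c() or NULL selects every query; otherwise keep
# only the requested query numbers, in the order given.
select_queries <- function(queries, selected = NULL) {
  if (length(selected) == 0) {
    return(queries)
  }
  wanted <- as.character(selected)
  unknown <- setdiff(wanted, names(queries))
  if (length(unknown) > 0) {
    stop("Unknown query number(s): ", paste(unknown, collapse = ", "))
  }
  queries[wanted]
}

subset_run <- select_queries(queries_demo, c(2, 3))
full_run   <- select_queries(queries_demo, NULL)
```

Failing loudly on an unknown number is a design choice of this sketch; silently skipping it would be an equally plausible behavior.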
To select a subset of the databases in the folder, list the names of the databases, or leave empty to run all databases:
Example: SELECTED.DBs <- c("database_02", ..)
runs the queries on the database named database_02
Example: SELECTED.DBs <- c() or NULL
for all dbs in the folder (default)
To specify the regions to query, list the regions (including Global), or leave empty (c() or NULL) to run all regions:
Example: REGIONS <- c('USA', 'Canada')
for USA and Canada regions.
Example: REGIONS <- c('Global')
for Global region (default)
Coming soon..