
Tracking Land-Use Change

This is an open, reproducible, computational research project on land-use change in the UK.

The code is developed using a combination of the targets and workflowr packages in R.

The background to the method for the estimation of land-use change is described in this paper.

Project dependencies

This is an open, shareable, reproducible, computational research project.

  • All the computational work and document preparation are done with the R statistical computing environment.

  • The research project is contained in a single directory, with the exception that some data sets are too large to store on GitHub.

  • We use the renv package to manage the R package versions used by the project.

  • We are using the targets package to structure the project so that the work is computationally reproducible.

  • The project code and documents are shared publicly on GitHub at https://github.com/NERC-CEH/luct

  • The main report is produced using bookdown and shared publicly on GitHub at https://nerc-ceh.github.io/luct/

  • We are exploring the workflowr package to structure the project so that all the materials and outputs are available via an openly accessible, automatically generated website. However, GitHub cannot currently show both the bookdown and workflowr website documents simultaneously, so this is still under investigation.

Workflow management

The project uses the R targets package to structure and manage the workflow and to make it reproducible. Central to this is the idea of the workflow as a "pipeline" - a defined list of functions which transform data. Here, the core pipeline contains the computational steps that read, reformat and process the input data (time series and maps of land use and land-use change), run the data assimilation steps that estimate the matrices of land-use change, and produce the maps of past land use. Potentially there can be multiple pipelines which produce other analyses, reports, or publications, in addition to the core process. These are used to generate documentation in the form of web pages with the workflowr package, but are not discussed further here.

The core pipeline is defined in the file _targets.R as a list of "targets".
The targets represent the steps in the series of computations which make up the pipeline. A target is defined with the syntax tar_target(target_name, function_name(inputs)). The target is thus a named R data object which is the outcome of a named function with specified inputs. The one exception to this is that the target may simply be a file for input or output. In the current project, the core pipeline is a list of 87 targets which specify the input files, the reformatting and transformation of these data, and subsequent calculations which make up the data assimilation algorithm.
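
As an illustration of this syntax, a minimal _targets.R might look like the sketch below. This is not the project's actual pipeline: the file path and the functions read_landuse() and estimate_luc_matrix() are invented for illustration.

    # _targets.R -- a minimal, hypothetical pipeline sketch
    library(targets)
    tar_source("R")  # load the project's functions from the R directory

    list(
      # a file target: targets tracks the contents of the file itself
      tar_target(raw_landuse_file, "data-raw/landuse_timeseries.csv",
                 format = "file"),
      # each subsequent target is a named R object returned by a named function
      tar_target(landuse_data, read_landuse(raw_landuse_file)),
      tar_target(luc_matrix, estimate_luc_matrix(landuse_data))
    )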

The pipeline is managed using a "Make"-like procedure, which analyses the dependencies between the different steps in the pipeline. If there have been no changes to the code in the target functions or to the input data since the last run, it identifies that everything is up to date and no further computation is needed. If the source code of any target function or the content of any data file has changed, it identifies which parts of the pipeline are affected, and these targets and all their downstream dependencies are recomputed. This has several advantages: forcing the workflow to be declared at a higher level of abstraction; only running the necessary computation, saving run-time for tasks that are already up to date; and, most importantly, providing tangible evidence that the results match the underlying code and data, confirming that the computation is reproducible. To detect changes, each target is represented by its hash value, stored in the _targets directory.
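
In practice the pipeline is run and checked with the standard functions of the targets package, for example:

    targets::tar_make()        # (re)compute only the targets that are out of date
    targets::tar_outdated()    # list the targets invalidated by code or data changes
    targets::tar_visnetwork()  # visualise the dependency graph of the pipeline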

Project directory structure

_targets directory

This directory is managed by the targets package. It contains the metadata describing the status of the computational pipelines and the cached results of those computations.
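
The cached results and metadata can be inspected with standard targets functions; the target name below is hypothetical:

    targets::tar_meta()            # metadata on the status of each target
    targets::tar_read(luc_matrix)  # load a cached target into the R session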

analysis directory

workflowr creates a set of standard directories. See the package documentation for details on how these directories are used. The analysis directory contains R Markdown notebooks which document the workflow. These are still in development.
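
The notebooks are built into the project website with standard workflowr functions, for example:

    workflowr::wflow_build()   # render the R Markdown notebooks in analysis/ into docs/
    workflowr::wflow_status()  # report which notebooks are up to date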

R directory

This contains the bulk of the R source code for the functions used in the project.

data-raw directory

This contains the raw data files for the project, in their original form as far as possible. To avoid duplication, this is a symbolic link to an earlier iteration of the project. However, many of these files are too large to share via GitHub, and would need to be shared by another mechanism (e.g. as binary assets).

data directory

This contains the processed data files resulting from transformations of the raw data. This typically involves reprojection, reclassification, filtering and unit conversions. Again, many of these are too large to share via GitHub.
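
As a hypothetical illustration of the kind of transformation involved (the project's actual processing functions, file names and coordinate reference system may differ), a raster land-use map could be reprojected with the terra package:

    library(terra)
    r <- rast("data-raw/landuse_map.tif")  # hypothetical raw input
    r_bng <- project(r, "EPSG:27700")      # reproject to the British National Grid
    writeRaster(r_bng, "data/landuse_map_bng.tif", overwrite = TRUE)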

docs directory

This contains the HTML web pages generated from the R Markdown files by workflowr or bookdown.

output directory

This contains output files from the project: the results of the data assimilation.

slurm directory

This contains files for the steps which require high-performance computing, run via Slurm, the widely used job scheduling system for HPC systems. These are generic enough to run on any HPC machine with Slurm, and have been run on both JASMIN and POLAR, although the queue names, numbers of processors and memory limits will be system-specific.

manuscripts directory

The report is prepared and formatted using bookdown in a subdirectory of manuscripts that contains all the necessary infrastructure files (templates, bibliographies, etc.).
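
The report can be rebuilt with the standard bookdown command, run from within that subdirectory (the exact input file is project-specific):

    bookdown::render_book("index.Rmd")  # build the report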

renv directory

The renv package keeps track of the R packages (and their versions) used by the project. It allows anyone to reinstate the same packages and versions in their local copy of the project.

The renv directory contains the information needed by renv to reinstate the local package environment.

.gitignore

The .gitignore file in the R project root directory contains all manually added rules, so that they are kept in one place. Packages, such as renv, may create their own .gitignore files in subdirectories that they manage.

Installation

Assuming you already have a current version of R installed, clone the project repository https://github.com/NERC-CEH/luct from GitHub.

When you open the project, you may get warning messages about packages not being installed. This is because you need to use the renv package to reinstate the packages that are used by the project.

  1. Install renv in the project if it is not already installed (see the command below).

  2. Use renv::restore() to install all the needed packages in the project-specific library:

    renv::restore()
    
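For step 1, if renv is not already available, it can be installed from CRAN in the usual way:

    install.packages("renv")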

Get data

Any files in data, output and _targets that are more than trivially small are not shared via Git and GitHub. They will be shared via a separate, yet to be determined, mechanism (e.g. Zenodo).

renv collaboration

The renv package is used to keep track of the installed packages and their versions. See the renv collaboration guide for the recommended workflow for synchronising package environments between collaborators.
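
The basic cycle uses standard renv functions: one collaborator records the project's package versions and commits the updated renv.lock, and the others reinstate those versions locally:

    # collaborator A: record the current package versions in renv.lock
    renv::snapshot()

    # collaborator B: after pulling the updated renv.lock, reinstall those versions
    renv::restore()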

Still to do

  • More detailed setup instructions and notes should go in this project-level README.md file.
  • The README.md files in the subdirectories are currently generic, but should describe the purpose of each subdirectory and the files in that directory.

Acknowledgements

The website is based on a template by Ross W. Gayler.
