Contributing to the sits R package

We welcome all contributors to the sits package! Please submit questions, bug reports, and requests in the issues tracker. If you plan to contribute code, go ahead! Fork the repo and submit a pull request. A few notes:

  • This package is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.
  • If you have large changes, please open an issue first to discuss.
  • We will include contributors as authors in the DESCRIPTION file (with their permission) for contributions that go beyond small typos in code or documentation.
  • This package generally uses the rOpenSci packaging guidelines for style and structure.
  • Documentation is generated by roxygen2. Please write documentation in code files and let it auto-generate documentation files.
  • For more substantial contributions, consider adding a new section to one of the chapters of the SITS book (https://e-sensing.github.io/sitsbook/), which has been written in R markdown and whose source is available in the sitsbook repository.
  • We aim for testing that has high coverage and is robust. Include tests with any major contribution to code.
  • We particularly welcome additions in two areas: new STAC-based image repositories and new raster machine learning/deep learning algorithms. Please see more details below.

General structure of sits code.

New functions that build on the sits API should follow the general principles below.

API design

  • The target audience for sits is the community of remote sensing experts with Earth Sciences background who want to use state-of-the-art data analysis methods with minimal investment in programming skills. The design of the sits API considers the typical workflow for land classification using satellite image time series and thus provides a clear and direct set of functions, which are easy to learn and master.

  • For this reason, we welcome contributors that provide useful additions to the existing API, such as new ML/DL classification algorithms. In case of a new API function, before making a pull request please raise an issue stating your rationale for a new function.

R programming models

  • Most functions in sits use the S3 programming model, with a strong emphasis on generic methods which are specialized depending on the input data type. See, for example, the implementation of the sits_bands() function.

  • Please do not include contributed code using the S4 programming model. Doing so would break the structure and the logic of existing code. Convert your code from S4 to S3.

  • Use generic functions as much as possible, as they improve modularity and maintenance. If your code has decision points using if-else clauses (for example, if A, do X; else do Y), consider using generic functions instead.

  • Functions that use the torch package follow the R6 model to be compatible with that package. See, for example, the code in sits_tempcnn.R and api_torch.R. Converting PyTorch code to R and including it in sits is straightforward; please see the Technical Annex of the sits on-line book.
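
As a minimal sketch of the advice on generic functions, the example below replaces an if-else on the input type with S3 dispatch. The generic describe() and its methods are hypothetical names chosen for illustration; they are not part of the sits API.

```r
# Hypothetical generic: describe() is NOT a sits function.
describe <- function(x, ...) {
  UseMethod("describe")
}

# Selected when the object has class "raster_cube".
describe.raster_cube <- function(x, ...) {
  "raster data cube"
}

# Selected when the object has class "sits" (time series tibble).
describe.sits <- function(x, ...) {
  "time series tibble"
}

# Fallback for any other input type.
describe.default <- function(x, ...) {
  stop("unsupported input type: ", class(x)[[1]])
}

# Empty placeholder objects that only carry a class attribute.
cube <- structure(list(), class = "raster_cube")
samples <- structure(list(), class = "sits")

describe(cube)
describe(samples)
```

Supporting a new input type then means adding one method, without touching the existing branches.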

Adherence to the tidyverse, sf and terra

The sits code relies on the packages of the tidyverse to work with tables and lists. We use dplyr and tidyr for data selection and wrangling, purrr and slider for loops over lists and tables, and lubridate to handle dates and times.

Adherence to the sits data types

  • The sits package is built on top of three data types: time series tibbles, data cubes and models. Most sits functions take one or more of these types as inputs and return one of them.

  • The time series tibble contains data and metadata. The first six columns contain the metadata: spatial and temporal information, the label assigned to the sample, and the data cube from where the data has been extracted. The time_series column contains the time series data for each spatiotemporal location. All time series tibbles are objects of class sits.

  • The cube data type is designed to store metadata about image files. In principle, images which are part of a data cube share the same geographical region, have the same bands, and have been regularized to fit into a pre-defined temporal interval. Data cubes in sits are organized by tiles. A tile is an element of a satellite's mission reference system, for example MGRS for Sentinel-2 and WRS2 for Landsat. A cube is a tibble where each row contains information about data covering one tile. Each row of the cube tibble contains a column named file_info; this column contains a list that stores a tibble.

  • The cube data type is specialised in raster_cube (ARD images), vector_cube (ARD cube with segmentation vectors), probs_cube (probabilities produced by classification algorithms on raster data), probs_vector_cube (probabilities generated by vector classification of segments), uncertainty_cube (cubes with uncertainty information), and class_cube (labelled maps). See the code in sits_plot.R as an example of specialisation of plot to handle different classes of raster data.

  • All ML/DL models in sits which are the result of sits_train belong to the ml_model class. In addition, models are assigned a second class, which is unique to each ML model (e.g., rfor_model, svm_model) and generic for all torch-based DL models (torch_model). The class information is used for plotting models and for establishing whether a model can run on GPUs.
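
The layout of a time series tibble can be sketched as follows. This is an illustration only: the column names, coordinates, band names and values are assumptions made for the example, and in practice such tibbles are produced by sits functions rather than built by hand.

```r
library(tibble)

# Time series for one location: two dates, two bands (illustrative values).
ts <- tibble(
  Index = as.Date(c("2020-01-01", "2020-01-17")),
  NDVI  = c(0.61, 0.58),
  EVI   = c(0.42, 0.40)
)

# Six metadata columns followed by the nested time_series column.
samples <- tibble(
  longitude   = -55.2,
  latitude    = -11.5,
  start_date  = as.Date("2020-01-01"),
  end_date    = as.Date("2020-01-17"),
  label       = "Forest",
  cube        = "example_cube",
  time_series = list(ts)
)

# Prepend the "sits" class so that S3 generics can dispatch on it.
class(samples) <- c("sits", class(samples))
```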

Literal values, error messages and colors

  • The internal sits code contains no literal values; they are all stored in the YAML configuration files ./inst/extdata/config.yml and ./inst/extdata/config_internals.yml. The first file contains configuration parameters that are relevant to users, related to visualisation and plotting; the second contains parameters that are relevant only for developers. These values are accessible using the .conf function. For example, the value of the default size for leaflet objects (64 MB) is accessed using the command .conf("view", "leaflet_megabytes").

  • Error messages are also stored outside of the code in the YAML configuration file ./inst/extdata/config_messages.yml. These values are accessible using the .conf function. For example, the error associated with an invalid NA value for an input parameter is accessed using the call .conf("messages", ".check_na_parameter").

  • Color handling in sits is described in the Technical Annex section "How colors work in sits". The legends and colors available by default are described in the YAML file ./inst/extdata/config_colors.yml.
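
The idea of keeping literal values in YAML and reading them by key can be sketched as below. The accessor conf_get() is a hypothetical stand-in written for this example, not the internal .conf function, and the YAML text is a tiny made-up fragment (only the 64 MB leaflet value comes from the description above).

```r
library(yaml)

# Made-up configuration fragment; the real files are much larger.
config_text <- "
view:
  leaflet_megabytes: 64
messages:
  example_check: 'invalid value for input parameter'
"
config <- yaml.load(config_text)

# Hypothetical accessor: walks the nested list one key at a time.
conf_get <- function(config, ...) {
  keys <- c(...)
  value <- config
  for (key in keys) {
    if (!is.list(value) || is.null(value[[key]])) {
      stop("configuration key not found: ", paste(keys, collapse = " > "))
    }
    value <- value[[key]]
  }
  value
}

conf_get(config, "view", "leaflet_megabytes")
conf_get(config, "messages", "example_check")
```

With this pattern, changing a default or rewording a message requires editing only the YAML file.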

Supporting new STAC-based image catalogues

  • If you want to include a STAC-based catalogue not yet supported by sits, we encourage you to look at existing implementations of catalogues such as Microsoft Planetary Computer (MPC), Digital Earth Africa (DEA) and AWS.

  • STAC-based catalogues in sits are associated with YAML description files, which are available in the directory ./inst/extdata/sources. For example, the YAML file config_source_mpc.yml describes the contents of the MPC collections supported by sits. Please first provide a YAML file which lists the detailed contents of the new catalogue you wish to include. Follow the examples provided.

  • After writing the YAML file, you need to consider how to access and query the new catalogue. The entry point for access to all catalogues is the sits_cube.stac_cube() function, which in turn calls a sequence of functions described in the generic interface api_source.R. Most calls of this API are handled by the functions of api_source_stac.R, which provides an interface to the rstac package and handles STAC queries.

  • Each STAC catalogue is different. The STAC specification allows providers to implement their data descriptions with specific information. For this reason, the generic API described in api_source.R needs to be specialized for each provider. Whenever a provider needs specific implementations of parts of the STAC protocol, we include them in separate files. For example, api_source_mpc.R implements specific quirks of the MPC platform. Similarly, specific support for CDSE (Copernicus Data Space Environment) is available in api_source_cdse.R.
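
One way to picture provider-specific specialisation is an S3 class hierarchy in which a provider method refines the generic STAC behaviour through NextMethod(). The sketch below is hypothetical throughout: item_url(), the classes stac_source and example_source, and the token scheme are invented for illustration and do not reproduce the functions in api_source.R.

```r
# Hypothetical generic, not part of the sits API.
item_url <- function(cube_source, href) {
  UseMethod("item_url")
}

# Generic STAC behaviour: use the href exactly as published.
item_url.stac_source <- function(cube_source, href) {
  href
}

# Hypothetical provider that needs a token appended to each asset URL.
item_url.example_source <- function(cube_source, href) {
  base <- NextMethod()  # falls back to the stac_source method
  paste0(base, "?token=", cube_source$token)
}

plain <- structure(list(), class = "stac_source")
custom <- structure(
  list(token = "abc"),
  class = c("example_source", "stac_source")
)

item_url(plain, "https://example.org/B04.tif")
item_url(custom, "https://example.org/B04.tif")
```

Only the steps where a provider deviates need their own method; everything else is inherited from the generic STAC implementation.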

Supporting new Machine Learning and Deep Learning algorithms

  • In general terms, ML/DL algorithms in sits are encapsulated as closures which are the output of the sits_train() function. In line with the established practices in R, each closure contains a function that classifies input values, as well as information on the samples used to train the model.

  • Please read the Technical Annex to the sits book. It describes how to include a new ML method, in this case the lightGBM algorithm. Follow those guidelines to include a new ML algorithm.

  • If you aim to include a torch-based deep learning method, in addition to understanding the concepts presented in the Technical Annex, please study carefully the implementation of sits_tempcnn() and sits_lighttae().

  • Bear in mind that your only task is to provide a new function that is compatible with the requirements of ML/DL methods in sits. Once the function has been correctly implemented, you will be able to use it in connection with the rest of sits.
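
The closure pattern described above can be sketched with a toy nearest-centroid classifier. The function train_centroid() and the class centroid_model are hypothetical and far simpler than a real sits method; the point is only that the returned function carries both the prediction logic and the training samples.

```r
# Hypothetical training function: returns a closure that classifies new values.
train_centroid <- function(samples, labels) {
  # Mean feature vector per label, computed once at training time.
  groups <- split(as.data.frame(samples), labels)
  centroids <- do.call(rbind, lapply(groups, colMeans))

  # The closure keeps samples, labels and centroids in its environment.
  predict_fun <- function(values) {
    values <- as.matrix(values)
    unname(apply(values, 1, function(row) {
      # Squared Euclidean distance from this row to each centroid.
      dist <- rowSums(sweep(centroids, 2, row)^2)
      names(which.min(dist))
    }))
  }
  # Tag the closure with a method-specific class and the common ml_model class.
  class(predict_fun) <- c("centroid_model", "ml_model", class(predict_fun))
  predict_fun
}

train_x <- rbind(c(0.1, 0.2), c(0.2, 0.1), c(0.9, 0.8), c(0.8, 0.9))
train_y <- c("Pasture", "Pasture", "Forest", "Forest")

model <- train_centroid(train_x, train_y)
model(rbind(c(0.15, 0.15), c(0.85, 0.85)))

# The training data remain reachable through the closure's environment.
environment(model)$labels
```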

Roadmap

  • The roadmap for sits is included as part of the issues tracker. Issues created by the developers are assigned to milestones. Each milestone corresponds to an expected new version of sits to be released on CRAN.