OmicsFold

Maturity level: Prototype

Multi-omics data normalisation, model fitting, and visualisation.

Overview

This is a utility R package containing custom code and scripts developed to establish a working approach for integration of multi-omics data.

The package provides a unified toolkit for the analysis and integration of multi-omic high-throughput data. It relies upon the mixOmics toolkit to provide implementations of many of the underlying projection to latent structures (PLS) methods used to analyse high-dimensional data. In addition to this, it includes custom implementations of data pre-processing, normalisation, collation, model validation, visualisation & output functions.

The scripts, originally separate, have been collected into a formal package that should be installable and usable within an analyst's R environment without further configuration. The package is fully documented at the function level.

Getting Started

This package and its analyses require R v3.6 or above. It is largely built upon the mixOmics integration framework. The dependencies come from several different sources, so an installation script is provided to make satisfying them as simple as possible. mixOmics also installs its own dependencies. Note that mixOmics is installed from its GitHub repository, as that version is more up to date than the one on Bioconductor and includes a number of required bug fixes.

Notable dependencies that will be installed if they are not already:

  • mixOmics
  • WGCNA
  • ggplot2
  • dplyr & magrittr
  • reshape2

See the DESCRIPTION file for a complete dependency list.
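Before running the installer, it can be useful to see which of the notable dependencies are already present. The following is a minimal sketch using only base R; it covers only the packages named in the list above (not the full DESCRIPTION list), and the variable names are illustrative rather than part of OmicsFold.

```r
# Notable dependencies named in this README (not the complete list).
notable <- c("mixOmics", "WGCNA", "ggplot2", "dplyr", "magrittr", "reshape2")

# The README states that R v3.6 or above is required.
r_ok <- getRversion() >= "3.6.0"

# requireNamespace() returns TRUE or FALSE without attaching the package.
installed <- vapply(notable, requireNamespace, logical(1), quietly = TRUE)

message("R version OK: ", r_ok)
message("Missing packages: ",
        paste(names(installed)[!installed], collapse = ", "))
```

Anything reported as missing should be picked up by the installation script described below.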

Installation

Due to the number of dependencies and the number of places those dependencies come from, there is an installation script available. This can be run by opening up an R session in your preferred environment, ensuring your working directory is the OmicsFold directory, then issuing the following commands:

source('install.R')
install.omicsfold()

This should install all the dependencies and then finally the OmicsFold package itself. If there are any issues due to versions changing or changes in which repository maintains the active version of a package, you may have to update the script.

If you are having issues installing OmicsFold in a conda environment, please try the following steps:

First, create the conda environment:

conda create --name OmicsFold 
source activate OmicsFold
conda install r=3.6.0
conda install -c conda-forge boost-cpp

Second, launch R in the conda environment (or in a local instance of R, if you are installing directly) and manually install the following packages:

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("metagenomeSeq")
BiocManager::install("org.Mm.eg.db")
install.packages("XML", repos = "https://www.omegahat.net/R")
source("https://labs.genetics.ucla.edu/horvath/htdocs/CoexpressionNetwork/GeneAnnotation/installAnRichment.R")
installAnRichment()
source('install.R')
install.omicsfold()

For installation using Nextflow (https://www.nextflow.io/docs/latest/getstarted.html), please see https://github.com/AstraZeneca/Omicsfold/tree/master/OmicsFold/nextflow_pipeline

Usage

Load the OmicsFold and mixOmics packages in R and you're ready to go. Some functions also require dplyr to be loaded, so it's a good idea to load it anyway. Certain plotting functions may also require ggplot2 to be loaded.

library(OmicsFold)
library(mixOmics)
library(dplyr)
library(ggplot2)  # optional

Data Normalisation

A number of normalisation functions are provided. Each has documentation which can be read in the usual way in R. For example, the help for the function normalise.tss can be viewed by calling ?normalise.tss. A brief description of the usage of each function can be found in the Getting Started with Normalisation document, which also includes example code for a few key functions.

  • low.count.removal()
  • normalise.tss()
  • normalise.css()
  • normalise.logit()
  • normalise.logit.empirical()
  • normalise.clr()
  • normalise.clr.within.features()
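To give a feel for what some of these transformations do, here is a minimal base R sketch of low-count filtering, total sum scaling (TSS) and the centred log-ratio (CLR). These are generic textbook versions, not the OmicsFold implementations: the function names, the `percent` and `offset` parameters, and the samples-in-rows orientation are all assumptions made for illustration. Consult `?normalise.tss` and the related help pages for the actual arguments and behaviour.

```r
# Toy count matrix: samples in rows, features in columns (assumed layout).
counts <- matrix(c(10, 0, 30,
                    5, 5, 10),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("s1", "s2"), c("f1", "f2", "f3")))

# Hypothetical helper: keep features whose share of all counts exceeds `percent`.
low_count_filter_sketch <- function(x, percent = 10) {
  keep <- colSums(x) * 100 / sum(x) > percent
  x[, keep, drop = FALSE]
}

# Hypothetical helper: TSS divides each sample's counts by that sample's total.
tss_sketch <- function(x) x / rowSums(x)

# Hypothetical helper: CLR subtracts each sample's mean log value.
# The offset (a pseudocount) is an assumption used to avoid log(0).
clr_sketch <- function(x, offset = 1) {
  logged <- log(x + offset)
  logged - rowMeans(logged)
}

filtered <- low_count_filter_sketch(counts)
tss <- tss_sketch(counts)
clr <- clr_sketch(counts)
```

After TSS every sample sums to one, and after CLR every sample's values sum to zero, which makes both easy to sanity-check on real data.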

Analysis of mixOmics Output

Once a mixOmics model has been fitted, OmicsFold can be used to perform a number of visualisation and data extraction functions. Below is a brief list of the functionality provided. While these are well documented in the R help system, descriptions of how to use each function can also be found in the Getting Started with Model Analysis document.

  • Model variance analysis - functions are provided to extract the percentage contributions of each component to the model variance and the centroids of variance across the blocks of a DIABLO model.
  • Feature analysis for sPLS-DA models - feature loadings on the fitted single-omics model can be exported as a sorted table, and feature stability across many sparse model fits can also be exported. As there may be many components to export stability for, another function combines these into a single table, and a plotting function visualises the stability of the selected features.
  • Feature analysis for DIABLO models - as with the single-omics models above, multi-omics models can also have feature loadings and stability exported. Associated correlations between features of different blocks can be exported as a matrix, which can then be converted to a CSV file suitable for importing into Cytoscape to form a network graph.
  • Model predictivity - we provide a function to plot the predictivity of a model from a confusion matrix.
  • Utility functions - a helper is provided to truncate long feature names passed to plots so that they display cleanly.
  • BlockRank - implements a novel approach to analysing feature importance between blocks of data.
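The matrix-to-CSV step mentioned for DIABLO models can be pictured with a short base R sketch. It reshapes a small, made-up correlation matrix between two blocks into an edge list of the kind Cytoscape can import as a network table. This does not use the OmicsFold export functions: the feature names, the 0.7 threshold and the column names are all hypothetical.

```r
# Made-up correlations between features of two blocks (rows vs columns).
cor_mat <- matrix(c(0.9, -0.2,
                    0.1, -0.8),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(c("gene_A", "gene_B"),
                                  c("metab_X", "metab_Y")))

# Reshape to long format: one row per feature pair.
edges <- as.data.frame(as.table(cor_mat), stringsAsFactors = FALSE)
names(edges) <- c("source", "target", "correlation")

# Keep only strong associations (threshold chosen arbitrarily here).
edges <- edges[abs(edges$correlation) >= 0.7, ]

# Write an edge list that can be imported as a network table.
out_file <- tempfile(fileext = ".csv")
write.csv(edges, out_file, row.names = FALSE)
```

In Cytoscape, the `source` and `target` columns would be mapped to the interaction endpoints and `correlation` to an edge attribute.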

Other Information

To contact the maintainers or project director, please refer to the AUTHORS file. If you are thinking of contributing to OmicsFold, all the information you will need is in the CONTRIBUTING file.

OmicsFold is licensed under the Apache-2.0 software licence as documented in the LICENCE file. Separately installed dependencies of OmicsFold may be licensed under different licence agreements. If you plan to create derivative works from OmicsFold or use OmicsFold for commercial or profitable enterprises, please ensure you adhere to all the expectations of these dependencies and seek legal advice if you are unsure.