PhyloProcessR

An R package for processing high-throughput sequencing data from targeted sequence capture, taking many samples from raw reads to alignments for use in phylogenomic/phylogenetic analyses.

The R package and pipeline do the following:

Organize raw read data
Remove adaptor contamination and merge paired-end reads
Decontaminate reads from other organisms
Assemble cleaned reads into contigs
Use a sample-based iterative mapping approach to call SNPs and export for popular programs
Match contigs to design targets for sequence capture
Align and trim contigs from samples
Concatenate all targets or only targets from the same gene

PhyloProcessR prerequisites

PhyloProcessR relies on several R packages and external programs for certain functions. All of them can be installed at once with the Anaconda environment file described below.

R packages (R version 4.2.2 tested)

From CRAN: devtools, ape, stringr, data.table, seqinr, foreach, doParallel, rdrop2, biomartr
From Bioconductor: Rsamtools, GenomicRanges, Biostrings

Standalone programs

fastp: adaptor trimming and paired-end read merging
ORNA: read normalization
bwa: read mapping
hisat2: alternative mapper
spades: assembly
BLAST: matching assembled contigs to targets, other utilities
mafft: creating alignments
trimal: trimming alignments
IQ-TREE: gene trees and concatenated trees
GATK4: variant calling functions
SAMtools: variant calling and read mapping tools

Quick installation instructions

First, clone this repository to your computer to obtain the setup files. Alternatively, click the green "Code" button at the top right of this repository and select "Download ZIP".

git clone https://github.com/chutter/PhyloProcessR.git

Second, change your working directory in the terminal to the downloaded repository. The key file here is the "environment.yml" anaconda environment file, which must be present in the working directory being used.

cd PhyloProcessR/setup-files/

The R packages and external programs can be installed manually, or more easily through the provided Anaconda environment file (version numbers are listed in the environment file for reporting and exact replication). To install with the environment file, first install the Anaconda package manager, which can be downloaded for different operating systems from https://anaconda.org. Miniconda is recommended as it has a smaller footprint (smaller size and fewer files). Once it is installed, create a new environment for PhyloProcessR with:

conda env create -f environment.yml -n PhyloProcessR

WARNING: The environment file may occasionally fail to install; however, it was tested on Linux and MacOS on April 3 2023 and installed without problems. On MacOS, you must use the x86_64 (Intel, not M1) version of Anaconda, because most packages are not available for M1 but can be run through x86_64 emulation. If something breaks, manual installation methods are described in the Wiki (the first tutorial).

Finally, the cloned GitHub directory may be deleted once the prerequisites have been installed through the conda environment file. It contains some useful example files (also available in the tutorial), which you may want to save first.

Before the environment can be used, it must be activated in your current terminal session, or the activation command must be placed in your cluster job script.

conda activate PhyloProcessR
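
In a cluster job script it can help to fail early when the environment did not activate. The following is a minimal sketch, not part of PhyloProcessR: require_conda_env is a hypothetical helper name, and it relies only on the CONDA_DEFAULT_ENV variable that conda sets on activation.

```shell
# Hypothetical helper: succeed only if the named conda environment is active.
# `conda activate` exports CONDA_DEFAULT_ENV with the active environment name.
require_conda_env() {
  wanted=$1
  if [ "${CONDA_DEFAULT_ENV:-}" != "$wanted" ]; then
    echo "conda environment '$wanted' is not active (current: '${CONDA_DEFAULT_ENV:-none}')" >&2
    return 1
  fi
}

# Possible use at the top of a non-interactive job script (commented out so
# the snippet has no side effects until you opt in):
#   source "$(conda info --base)/etc/profile.d/conda.sh"
#   conda activate PhyloProcessR
#   require_conda_env PhyloProcessR || exit 1
```

The source line is there because non-interactive shells often have not loaded conda's shell functions; whether it is needed depends on how your cluster is configured.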

Installation of R package

The main functions of PhyloProcessR are contained in an R package that has been tested on R version 4.0.2 and uses the programs listed above along with custom scripts. To install PhyloProcessR from GitHub, you can use the R package devtools, which is included in the environment above. When running in a cluster environment, the installation code shown here should be placed at the top of your R script, before your selected PhyloProcessR functions. Here are step-by-step instructions for installation:

Install PhyloProcessR by typing in your R console:

devtools::install_github("chutter/PhyloProcessR", upgrade = "never", dependencies = FALSE)

The upgrade = "never" flag ensures that packages already installed via the Anaconda environment are not changed, because changing them will often break things. Additionally, dependencies = FALSE is set for the same reason.

devtools should finish and report that the package installed properly with no errors. Load the package in your R script with:

library(PhyloProcessR)

And installation should be done! All the functions for PhyloProcessR should be ready to go! It is recommended to keep the install line above in your R script, as the package is frequently updated with bug fixes and new features.

A function to test whether PhyloProcessR can find the dependencies is coming soon.
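
Until that function is available, a quick manual check from the terminal can confirm that the external programs are on your PATH. The following is a minimal sketch, not part of PhyloProcessR: check_tools is a hypothetical helper, and the executable names passed to it are assumptions that may differ between versions (for example, IQ-TREE may be installed as iqtree or iqtree2).

```shell
# Hypothetical helper: print each name that cannot be found on PATH and
# return non-zero if at least one is missing.
check_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}

# Executable names are assumptions; adjust them to match your installation.
check_tools fastp bwa hisat2 spades.py blastn mafft trimal iqtree gatk samtools Rscript \
  || echo "Some programs were not found; is the PhyloProcessR environment active?"
```

Run it after conda activate PhyloProcessR; no output means every listed name was found.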

PhyloProcessR pipeline tutorials

Installation: detailed installation instructions and troubleshooting

Tutorial 1: PhyloProcessR configuration

Tutorial 2: PhyloProcessR pipeline workflows

Tutorial 3: Assess sequence capture results

Tutorial 4: Combine legacy GenBank data with sequence capture
