Reproducing the results

Prerequisites

We compare RawHash2 with UNCALLED, Sigmap, and RawHash. We specify the versions we use for each tool below.

We list the links to download and compile each tool for comparison below:

We use minimap2 (v2.24) to generate the ground truth mapping information by mapping basecalled sequences to their corresponding reference genomes.

We use various tools to process and analyze the data that each tool generates. These tools must also be installed on your machine. We suggest using conda to install them with their specified versions, as almost all of them are available in the conda repositories.

Please make sure that all of these tools are in your PATH.
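A quick way to confirm the PATH requirement is to ask the shell where it finds each executable. The sketch below is only an illustration: `missing_tools` is a hypothetical helper, and the executable names passed to it are assumptions taken from the build commands in this document (`uncalled` is assumed to be the command that `pip3 install .` provides for UNCALLED). Adjust the list to match your setup.

```shell
# Print the name of every given tool that cannot be found in PATH.
# Hypothetical helper, not part of the RawHash2 repository.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Assumed executable names; edit as needed.
missing=$(missing_tools rawhash2 rawhash sigmap uncalled minimap2 seqkit)
if [ -n "$missing" ]; then
  printf 'Not found in PATH:\n%s\n' "$missing"
else
  echo "All tools found in PATH."
fi
```

If any name is reported, re-check the corresponding installation step and your PATH before starting the evaluations.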

Creating a Virtual Environment (Optional)

You can create a virtual environment with conda to make sure all the prerequisites are met. The following example replicates the virtual environment we use in our evaluations.

#We will use the rawhash2-env directory to download and compile the tools from their repositories
mkdir -p rawhash2-env/bin && cd rawhash2-env

#Important: If you have already completed Step 0 and Step 1 as described, you can skip them and only add the binaries to your PATH again (Step 2)
#Re-adding the binaries to your PATH is necessary every time you start a new shell session.

#Optional Step 0 Creating a conda environment (we highly recommend using conda for easy installation of dependencies).
#If not using conda, the packages with the versions specified below (e.g., python=3.6.15) must be installed manually in your environment
conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge
conda config --set channel_priority strict

conda create -n rawhash2-env python=3.6.15 pip=21.3.1 ont_vbz_hdf_plugin=1.0.1 seqkit=2.5.1
conda activate rawhash2-env

#Installing SRA Toolkit
wget -qO- https://ftp-trace.ncbi.nlm.nih.gov/sra/sdk/3.0.1/sratoolkit.3.0.1-ubuntu64.tar.gz | tar xzv; cp -r ./sratoolkit.3.0.1-ubuntu64/bin/* bin/; rm -rf sratoolkit.3.0.1-ubuntu64

#Step 1 Compiling the tools
#Cloning and compiling RawHash2
git clone --recursive https://github.com/CMU-SAFARI/RawHash.git rawhash2 && cd rawhash2 && make && cp ./bin/rawhash2 ../bin/ && cd ..

#Downloading and compiling RawHash v1.0
wget -qO- https://github.com/CMU-SAFARI/RawHash/releases/download/v1.0/RawHash-1.0.tar.gz | tar -xzv && cd rawhash && make && cp ./bin/rawhash ../bin/ && cd ..

#Cloning and compiling Sigmap v0.1 commit c9a40483264c9514587a36555b5af48d3f054f6f
git clone --recursive https://github.com/haowenz/sigmap.git sigmap && cd sigmap && git checkout c9a40483264c9514587a36555b5af48d3f054f6f && make && cp sigmap ../bin/ && cd ..

#Cloning and compiling UNCALLED v2.1 commit 58dbec69f625e0343739d821788d536b578ea41d
git clone --recursive https://github.com/skovaka/UNCALLED.git uncalled && cd uncalled && git checkout 58dbec69f625e0343739d821788d536b578ea41d && cd submods/bwa && git pull origin master && cd ../../ && pip3 install . && cd ..

#Downloading and compiling minimap2 v2.24
wget https://github.com/lh3/minimap2/releases/download/v2.24/minimap2-2.24.tar.bz2; tar -xf minimap2-2.24.tar.bz2; rm minimap2-2.24.tar.bz2; mv minimap2-2.24 minimap2; cd minimap2 && make && cp minimap2 ../bin/ && cd ..

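#Step 2 Adding binaries to PATH
#If you are skipping Step 0 and Step 1, uncomment the following line and execute it:
# conda activate rawhash2-env
#Run this from the rawhash2-env directory; quoting keeps the command safe if the path contains spaces
export PATH="$PWD/bin:$PATH"
cd rawhash2/test/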
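Because Step 2 has to be repeated in every new shell session, it can be convenient to make the PATH update idempotent, so that running it twice does not add the same directory twice. The following is a minimal sketch under that assumption. `prepend_once` is a hypothetical helper and not part of the repository, and it must be run from the rawhash2-env directory (that is, before `cd rawhash2/test/`).

```shell
# Print the PATH-style string $2 with directory $1 prepended,
# or unchanged if $1 is already one of its entries.
# Hypothetical helper, not part of the RawHash2 repository.
prepend_once() {
  case ":$2:" in
    *":$1:"*) printf '%s\n' "$2" ;;
    *)        printf '%s\n' "$1:$2" ;;
  esac
}

# Assumes the current directory is rawhash2-env.
PATH=$(prepend_once "$PWD/bin" "$PATH")
export PATH
```

Wrapping both sides in colons before matching ensures that only whole PATH entries are compared, so a directory such as `/opt/bin` does not falsely match `/opt/bin2`.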

Datasets

The data directory contains the scripts and a README for downloading the datasets.

You can start the evaluation steps after downloading the datasets as described in that README.

To start downloading the datasets, enter the data directory (the paths in this document are relative to rawhash2/test):

cd ./data

Evaluation

Read Mapping

The read mapping directory contains the scripts and a README for performing read mapping with UNCALLED, Sigmap, RawHash, and RawHash2.

To start the read mapping evaluations, enter ./evaluation/read_mapping:

cd ./evaluation/read_mapping

Relative Abundance Estimation

The relative abundance directory contains the scripts and a README for performing relative abundance estimation with UNCALLED, Sigmap, RawHash, and RawHash2.

To start the relative abundance estimation evaluations, enter ./evaluation/relative_abundance:

cd ./evaluation/relative_abundance

Contamination Analysis

The contamination directory contains the scripts and a README for performing the contamination analysis with UNCALLED, Sigmap, RawHash, and RawHash2.

To start the contamination analysis evaluations, enter ./evaluation/contamination:

cd ./evaluation/contamination/

Sequence Until

The sequence until directory contains the scripts and a README for evaluating the benefits of the Sequence Until mechanism with UNCALLED and RawHash2.

To start evaluating Sequence Until, enter ./evaluation/relative_abundance/sequenceuntil:

cd ./evaluation/relative_abundance/sequenceuntil
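All of the `cd` commands above use relative paths, so each one only works from the directory that contains data and evaluation. As a small sanity check before starting, a sketch like the one below lists any of the expected directories that are missing. `missing_dirs` is a hypothetical helper, and the directory names are the ones given in this document.

```shell
# Print every directory (relative to root $1) that does not exist.
# Hypothetical helper, not part of the RawHash2 repository.
missing_dirs() {
  root=$1
  shift
  for d in "$@"; do
    [ -d "$root/$d" ] || printf '%s\n' "$d"
  done
}

# Run from the directory that contains data and evaluation.
missing_dirs . \
  data \
  evaluation/read_mapping \
  evaluation/relative_abundance \
  evaluation/contamination \
  evaluation/relative_abundance/sequenceuntil
```

No output means every directory was found. Any printed path suggests the command was run from a different directory or the checkout is incomplete.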