Improving the scripts for reproducibility
1 parent 836fa99, commit 49a5983
Showing 66 changed files with 50,862 additions and 50,462 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,122 @@ | ||
# Reproducing the results | ||
|
||
## Prerequisites | ||
|
||
We compare RawHash with [UNCALLED](https://github.com/skovaka/UNCALLED) and [Sigmap](https://github.com/haowenz/sigmap). We specify the versions we use for each tool below. | ||
|
||
We list the links to download and compile each tool for comparison below: | ||
|
||
* [UNCALLED](https://github.com/skovaka/UNCALLED/tree/74a5d4e5b5d02fb31d6e88926e8a0896dc3475cb) | ||
* [Sigmap](https://github.com/haowenz/sigmap/commit/c9a40483264c9514587a36555b5af48d3f054f6f) | ||
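
A plain `git clone` fetches a repository's default branch, which may have moved past the commits linked above. The following is a minimal sketch of moving an existing clone to one of those commits. The helper name `pin_commit` and the clone directory names `uncalled` and `sigmap` are assumptions for illustration and are not part of the original scripts.

```bash
# Move an existing clone to a fixed commit and sync its submodules.
# $1 = path to the clone, $2 = commit hash to check out
# (pin_commit is a hypothetical helper, not part of the repository scripts)
pin_commit() {
    git -C "$1" checkout --quiet "$2" &&
    git -C "$1" submodule --quiet update --init --recursive
}

# Commit hashes are the ones linked in the list above.
# The directory names are assumed; adjust them to wherever you cloned each tool.
if [ -d uncalled ]; then
    pin_commit uncalled 74a5d4e5b5d02fb31d6e88926e8a0896dc3475cb
fi
if [ -d sigmap ]; then
    pin_commit sigmap c9a40483264c9514587a36555b5af48d3f054f6f
fi
```

If you use this, run it after cloning and before building each tool, so that the compiled binaries correspond to the pinned sources.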
|
||
We use minimap2 to generate the ground truth mapping information by mapping basecalled sequences to their corresponding reference genomes. We use the following minimap2 version: | ||
|
||
* [Minimap2 v2.24](https://github.com/lh3/minimap2/releases/tag/v2.24) | ||
|
||
We use various tools to process and analyze the data we generate using each tool. The following tools must also be installed on your machine. We suggest using conda to install these tools with their specified versions, as almost all of them are included in the conda repository. | ||
|
||
* [Python v3.6.15](https://www.python.org/downloads/release/python-3615/) | ||
* [pip v20.2.3 -- via conda](https://anaconda.org/conda-forge/pip/20.2.3/download/noarch/pip-20.2.3-py_0.tar.bz2) | ||
* [ont_vbz_hdf_plugin v1.0.1 -- via conda](https://anaconda.org/bioconda/ont_vbz_hdf_plugin/files?version=1.0.1) | ||
|
||
Please make sure that all of these tools are in your `PATH`. | ||
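
One way to confirm this is to ask the shell where it finds each command. The sketch below is only an illustration: `check_tools` is a hypothetical helper, and the binary names passed to it (`rawhash`, `sigmap`, `uncalled`, `minimap2`) are assumptions based on the build steps in this README.

```bash
# Report every given command that cannot be found on PATH.
# Returns 0 if all commands are found, 1 otherwise.
# (check_tools is a hypothetical helper, not part of the repository scripts)
check_tools() {
    ct_missing=0
    for ct_tool in "$@"; do
        if ! command -v "$ct_tool" >/dev/null 2>&1; then
            echo "missing: $ct_tool" >&2
            ct_missing=1
        fi
    done
    return "$ct_missing"
}

# Assumed binary names; change them if your installation differs.
check_tools rawhash sigmap uncalled minimap2 python3 pip3 \
    || echo "Some tools are not on PATH yet."
```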
|
||
## Creating a Virtual Environment (Optional) | ||
|
||
You can create a virtual environment using conda to make sure all the prerequisites are met. The following example replicates the virtual environment we use in our evaluations. | ||
|
||
```bash | ||
#We will use rawhash-env directory to download and compile the tools from their repositories | ||
mkdir -p rawhash-env/bin && cd rawhash-env | ||
|
||
#Important: If you have already completed Step 0 and Step 1 as described, you can skip these steps and add the binaries to your PATH again | ||
#Re-adding the binaries to your PATH is necessary after you start a new shell session. | ||
|
||
#Optional Step 0 Creating a conda environment (Note we highly recommend using conda for easy installation of dependencies). | ||
#If not using conda, the packages with the specified versions below (e.g., python=3.6.15) must be installed manually in your environment | ||
conda create -n rawhash-env python=3.6.15 pip=20.2.3 ont_vbz_hdf_plugin=1.0.1 | ||
conda activate rawhash-env | ||
|
||
#Step 1 Compiling the tools | ||
#Cloning and compiling RawHash | ||
git clone --recursive https://github.com/CMU-SAFARI/RawHash.git rawhash && cd rawhash && make && cp ./bin/rawhash ../bin/ && cd .. | ||
|
||
#Cloning and compiling Sigmap | ||
git clone --recursive https://github.com/haowenz/sigmap.git sigmap && cd sigmap && make && cp sigmap ../bin/ && cd .. | ||
|
||
#Cloning and compiling UNCALLED | ||
git clone --recursive https://github.com/skovaka/UNCALLED.git uncalled && cd uncalled/submods/bwa && git pull origin master && cd ../../ && pip3 install . && cd .. | ||
|
||
#Downloading and compiling minimap2 | ||
wget https://github.com/lh3/minimap2/releases/download/v2.24/minimap2-2.24.tar.bz2; tar -xf minimap2-2.24.tar.bz2; rm minimap2-2.24.tar.bz2; mv minimap2-2.24 minimap2; cd minimap2 && make && cp minimap2 ../bin/ && cd .. | ||
|
||
#Step 2 Adding binaries to PATH | ||
#If you are skipping Step 0 and Step 1, uncomment the following line and execute: | ||
# conda activate rawhash-env | ||
export PATH=$PWD/bin:$PATH | ||
cd rawhash/test/ | ||
``` | ||
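
Because Step 2 has to be repeated in every new shell session, re-running the `export` line in the same session adds `bin` to `PATH` more than once. This is harmless, but a guarded version keeps `PATH` tidy. The sketch below is one possible approach; `prepend_path` is a hypothetical helper name, not part of the original scripts.

```bash
# Prepend a directory to PATH only if it is not already listed.
# (prepend_path is a hypothetical helper, not part of the repository scripts)
prepend_path() {
    case ":$PATH:" in
        *":$1:"*) ;;              # already present, nothing to do
        *) PATH="$1:$PATH" ;;     # not present, put it first
    esac
    export PATH
}

# Run from the rawhash-env directory created in the script above.
prepend_path "$PWD/bin"
```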
|
||
# Datasets | ||
|
||
The [data directory](./data/) contains the scripts and a [README](./data/README.md) for downloading the datasets. | ||
|
||
You can start the evaluation step after downloading the datasets as described in [README](./data/README.md). | ||
|
||
To start downloading the datasets, enter [data](./data/) | ||
|
||
```bash | ||
cd ./data | ||
``` | ||
|
||
# Evaluation | ||
|
||
## Read Mapping | ||
|
||
The [read mapping directory](./evaluation/read_mapping/) contains the scripts and a [README](./evaluation/read_mapping/README.md) for performing read mapping using UNCALLED, Sigmap, and RawHash. | ||
|
||
To start performing the read mapping evaluations, enter [./evaluation/read_mapping](./evaluation/read_mapping/) | ||
|
||
```bash | ||
cd ./evaluation/read_mapping | ||
``` | ||
|
||
## Relative Abundance Estimation | ||
|
||
The [relative abundance directory](./evaluation/relative_abundance/) contains the scripts and a [README](./evaluation/relative_abundance/README.md) for performing the relative abundance estimation using UNCALLED, Sigmap, and RawHash. | ||
|
||
To start performing the relative abundance estimation evaluations, enter [./evaluation/relative_abundance](./evaluation/relative_abundance/) | ||
|
||
```bash | ||
cd ./evaluation/relative_abundance | ||
``` | ||
|
||
## Contamination Analysis | ||
|
||
The [contamination directory](./evaluation/contamination/) contains the scripts and a [README](./evaluation/contamination/README.md) for performing the contamination analysis using UNCALLED, Sigmap, and RawHash. | ||
|
||
To start performing the contamination analysis evaluations, enter [./evaluation/contamination](./evaluation/contamination/) | ||
|
||
```bash | ||
cd ./evaluation/contamination/ | ||
``` | ||
|
||
## Sequence Until | ||
|
||
The [sequence until directory](./evaluation/relative_abundance/sequenceuntil/) contains the scripts and a [README](./evaluation/relative_abundance/sequenceuntil/README.md) for evaluating the benefits of the Sequence Until mechanism using UNCALLED and RawHash. | ||
|
||
To start evaluating Sequence Until, enter [./evaluation/relative_abundance/sequenceuntil](./evaluation/relative_abundance/sequenceuntil/) | ||
|
||
```bash | ||
cd ./evaluation/relative_abundance/sequenceuntil | ||
``` | ||
|
||
# Reproducing the Manuscript | ||
|
||
## Generating the Figures | ||
|
||
We will provide the scripts to generate the figures we show in the preprint. | ||
|
||
## Generating the Manuscript | ||
|
||
We will provide the TeX source to generate the preprint PDF based on the results and figures you generate. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,22 @@ | ||
# Datasets | ||
|
||
We assume your current working directory is this directory ([`data`](./)). | ||
|
||
Download and process each dataset so that they are fully ready to be used in the evaluation step: | ||
|
||
```bash | ||
bash download_d1_sars-cov-2_r94.sh #Download Dataset D1 SARS-CoV-2 | ||
bash download_d2_ecoli_r94.sh #Download Dataset D2 E. coli | ||
bash download_d3_yeast.sh #Download Dataset D3 Yeast | ||
bash download_d4_green_algae.sh #Download Dataset D4 Green Algae | ||
bash download_d5_human_na12878.sh #Download Dataset D5 Human HG001 | ||
|
||
#Prepare the contamination dataset | ||
cd contamination && bash generate_fast5_files.sh && cd .. | ||
|
||
#Prepare the relative abundance dataset | ||
cd relative_abundance && bash generate_fast5_files.sh && bash generate_ref.sh && cd .. | ||
|
||
#Create a random community of randomly ordered reads from D1-D5 reads read_ids.txt | ||
cd random_community && bash 0_generate_random_ids.sh && bash 1_symbolic_links.sh && cd .. | ||
``` |
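
The commands above keep going even if an earlier download fails, so a later preparation step can run on incomplete data. A minimal sketch of a fail-fast wrapper for the five download scripts follows. `run_all` is a hypothetical helper name; the script names are taken from the list above, and the remaining preparation steps are not covered.

```bash
# Run each script in order and stop at the first one that fails.
# (run_all is a hypothetical helper, not part of the repository scripts)
run_all() {
    for ra_script in "$@"; do
        echo "Running $ra_script"
        bash "$ra_script" || { echo "Failed: $ra_script" >&2; return 1; }
    done
}

# Only start when run from the data directory, where the scripts live.
if [ -f download_d1_sars-cov-2_r94.sh ]; then
    run_all download_d1_sars-cov-2_r94.sh \
            download_d2_ecoli_r94.sh \
            download_d3_yeast.sh \
            download_d4_green_algae.sh \
            download_d5_human_na12878.sh
fi
```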