ricardoi/virDB

vinaDB - Virome Network Analysis DataBase

A program to retrieve the Reference Sequence datasets for plant, insect and fungal viruses.
Keep in mind that this is a beta version, so bugs are expected. The program has been tested only on macOS and Linux.


Installation

First, install R (version >4.0) if you don't already have it.
On macOS you can install R with brew, and on Linux with apt-get.
Then install the required R libraries by running sh installRpkg.sh install, which installs the packages needed to run virDB-launcher.sh.
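
Before running the installer it can help to confirm that the R on your PATH is recent enough. The sketch below is not part of virDB. The helper names version_ge and r_version are hypothetical, "version >4.0" is read here as 4.0.0 or newer (an assumption), and it relies on a sort that supports -V (GNU sort and recent macOS/BSD sort do).

```shell
# Succeed if version $1 is greater than or equal to version $2.
# Relies on "sort -V" (version sort), which is not POSIX but is widely available.
version_ge() {
    # After a version sort, the first line is the smaller version;
    # if that is $2, then $1 >= $2.
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Print the installed R version (for example 4.3.1).
# Fails without output if Rscript is not on the PATH.
r_version() {
    command -v Rscript >/dev/null 2>&1 || return 1
    Rscript -e 'cat(as.character(getRversion()))'
}
```

A possible use: version_ge "$(r_version)" 4.0.0 && sh installRpkg.sh install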

Running

Run virDB-launcher.sh from the command line with a stable internet connection. Otherwise, the NCBI fetcher will be interrupted and the process will need to start again.

#You have 5 options (select a number)
#1: plants
#2: land plants
#3: invertebrates,land plants
#4: invertebrates
#5: fungi

This is an example to run virDB-launcher:

sh virDB-launcher.sh 1 # to retrieve option 1: plants
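
Because an interrupted download has to be started again, a small retry wrapper may save some typing. This is only a sketch: retry is a hypothetical helper, and it assumes that virDB-launcher.sh exits with a non-zero status when the fetch fails, which this README does not confirm.

```shell
# Run a command up to $1 times, stopping at the first success.
# Usage: retry MAX_ATTEMPTS COMMAND [ARGS...]
retry() {
    max_attempts=$1
    shift
    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        # Stop as soon as the command succeeds.
        "$@" && return 0
        echo "attempt $attempt of $max_attempts failed: $*" >&2
        attempt=$((attempt + 1))
    done
    # Every attempt failed.
    return 1
}
```

For example, to fetch the three plant-related sets (options 1, 2 and 3) with up to three attempts each: for opt in 1 2 3; do retry 3 sh virDB-launcher.sh "$opt" || break; done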

BLAST

Preparing BLAST databases

makeblastdb -in Viral_nucleic_acid_database -out virDBnt -dbtype nucl -hash_index

BLASTn and BLASTx

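# BLASTn: nucleotide queries against the nucleotide database (-outfmt 6 writes tab-separated output)
blastn -query contigs.fasta -db virDBnt -out contigs_blastn.tsv -outfmt 6 -num_threads 4
# BLASTx: translated nucleotide queries against the protein database
blastx -query contigs.fasta -db virDBaa -out contigs_blastx.tsv -outfmt 6 -num_threads 4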
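
Once the searches finish, a common next step is to keep only the best hit per contig. The sketch below assumes the results are in the default 12-column tabular layout produced by -outfmt 6 (query ID in column 1, e-value in column 11, bit score in column 12). best_hits is a hypothetical helper, and the 1e-5 default cutoff is an illustrative value, not a virDB recommendation.

```shell
# Print the highest-bit-score hit per query from BLAST tabular output.
# Usage: best_hits HITS_FILE [MAX_EVALUE]
best_hits() {
    hits_file=$1
    max_evalue=${2:-1e-5}   # illustrative default cutoff, adjust as needed
    awk -F '\t' -v max="$max_evalue" '
        # Skip hits whose e-value (column 11) is above the cutoff.
        $11 + 0 > max + 0 { next }
        # Remember the order in which queries first appear.
        !($1 in score) { order[++n] = $1 }
        # Keep the line with the largest bit score (column 12) for each query.
        !($1 in score) || $12 + 0 > score[$1] { score[$1] = $12 + 0; best[$1] = $0 }
        END { for (i = 1; i <= n; i++) print best[order[i]] }
    ' "$hits_file"
}
```

For example: best_hits contigs_blastn.tsv 1e-10 > contigs_blastn_best.tsv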

Note:

Because of how NCBI indexes its entries, the plant-host entries are split into three sets: plants, land plants, and invertebrates,landplants (viruses that infect plants and are vectored by insects). To have the FULL plants database, you need to download all three. After downloading them, you can merge your FASTA files with cat plants_722.fasta land_plants_722.fasta invertebrates,landplants_722.fasta > merged.fasta (the output file name is only an example).
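
When merging, it is worth checking that no record was lost, for example because one file does not end with a newline and the next header gets glued to the previous sequence line. The following is a minimal sketch. merge_fasta is a hypothetical helper, and the output file must not be one of the inputs.

```shell
# Concatenate FASTA files and verify that the record count is preserved.
# Usage: merge_fasta OUTPUT INPUT [INPUT...]
merge_fasta() {
    out=$1
    shift
    # Count the records in each input; every record starts with a ">" header line.
    expected=0
    for f in "$@"; do
        n=$(grep -c '^>' "$f")
        expected=$((expected + n))
    done
    cat "$@" > "$out" || return 1
    merged=$(grep -c '^>' "$out")
    echo "merged $merged of $expected sequences into $out"
    # Fail if headers were lost during concatenation.
    [ "$merged" -eq "$expected" ]
}
```

For example: merge_fasta merged.fasta plants_722.fasta land_plants_722.fasta invertebrates,landplants_722.fasta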
