Viral Open Reading Frame assembly (VORF)

Pandemics caused by novel pathogens are an imminent threat, yet few methods exist for characterizing them quickly. Current assemblers fall short of identifying novel sequences from environmental samples in an organism-agnostic manner.

VORF is an RNA-seq-based open reading frame assembler capable of detecting amino acid sequences from clinical samples without a reference genome. Our paper describing the methods and results is available here.

Datasets

We applied VORF and the IVA assembler to nine samples spanning three viruses: SARS-CoV-2 (COVID-19), HIV and Lassa virus. All three are RNA viruses, and the samples were downloaded from the data accompanying the VGEA manuscript. An example COVID-19 dataset is available in our repository here. A summary of all nine samples is available here.

Processing of raw reads

The data we downloaded from VGEA consisted of BAM files containing unaligned reads. We first split each BAM file into forward and reverse fastq files. We then ran quality control on the reads, aligned them to the human reference genome and kept only the reads that did not align. This processing is summarized in the following script: full_processing_pipeline.sh. The processing scripts were run one step at a time, with each step applied to all BAM files in parallel.
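The three steps above can be sketched as command lines. This is only an illustration: the tools named here (samtools, fastp, bowtie2), the output file names and the helper `build_processing_commands` are assumptions and are not taken from full_processing_pipeline.sh, which remains the authoritative version. The function only builds the commands and does not run them.

```python
from pathlib import Path


def build_processing_commands(bam, human_index, out_dir="."):
    """Return three command lines (as argument lists) for one unaligned BAM file.

    Tool choice and file naming are assumptions made for illustration.
    """
    sample = Path(bam).stem
    out = Path(out_dir)
    raw1, raw2 = out / f"{sample}_1.fastq", out / f"{sample}_2.fastq"
    qc1, qc2 = out / f"{sample}_qc_1.fastq", out / f"{sample}_qc_2.fastq"

    # 1. Split the BAM into forward (-1) and reverse (-2) reads;
    #    singletons and unflagged reads are discarded.
    split = ["samtools", "fastq", "-1", str(raw1), "-2", str(raw2),
             "-0", "/dev/null", "-s", "/dev/null", "-n", str(bam)]

    # 2. Quality control and adapter trimming on the read pair.
    qc = ["fastp", "-i", str(raw1), "-I", str(raw2),
          "-o", str(qc1), "-O", str(qc2)]

    # 3. Align to the human reference; --un-conc writes the pairs that did
    #    not align concordantly, with "%" replaced by the mate number.
    host = ["bowtie2", "-x", str(human_index), "-1", str(qc1), "-2", str(qc2),
            "--un-conc", str(out / f"{sample}_nonhuman_%.fastq"),
            "-S", "/dev/null"]

    return [split, qc, host]


if __name__ == "__main__":
    # "GRCh38" is a hypothetical bowtie2 index prefix.
    for cmd in build_processing_commands("CV167.bam", "GRCh38"):
        print(" ".join(cmd))
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` once the tools and a human index are available.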

Assembly algorithm

The assembly algorithm is written in Python and is run through the main.py file. The methods used for assembly are defined in the utils.py file.

To run VORF, install the following packages in a Python 3 environment (this is easily done with conda): scikit-learn, biopython and levenshtein.
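Before running VORF, it can help to confirm that the three packages are importable. The check below is a minimal sketch; the helper name `missing_packages` is hypothetical, and the mapping reflects the usual import names of these packages (`sklearn`, `Bio`, `Levenshtein`).

```python
from importlib.util import find_spec

# Package name as installed -> module name used in an import statement.
REQUIRED = {
    "scikit-learn": "sklearn",
    "biopython": "Bio",
    "levenshtein": "Levenshtein",
}


def missing_packages(required=REQUIRED):
    """Return the package names whose module cannot be found."""
    return [pkg for pkg, module in required.items() if find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All VORF dependencies were found.")
```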

# paths to the paired-end fastq input files (forward and reverse reads)
r1=/home/keren/ANALYSIS/VORF/data/CV167_1.fastq
r2=/home/keren/ANALYSIS/VORF/data/CV167_2.fastq

# run VORF on the read pair
script=/home/keren/ANALYSIS/VORF/main.py
python "$script" "$r1" "$r2"
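To run several samples, the same call can be generated for every read pair. The sketch below assumes that files follow the `<sample>_1.fastq` / `<sample>_2.fastq` naming seen in the example; the helpers `pair_fastqs` and `vorf_commands` are hypothetical and not part of VORF.

```python
import re
from pathlib import Path

# Matches e.g. "CV167_1.fastq" -> sample "CV167", mate "1".
PAIR_RE = re.compile(r"^(?P<sample>.+)_(?P<mate>[12])\.fastq$")


def pair_fastqs(paths):
    """Group fastq paths into {sample: (forward, reverse)}, skipping unpaired files."""
    mates = {}
    for p in paths:
        m = PAIR_RE.match(Path(p).name)
        if m:
            mates.setdefault(m["sample"], {})[m["mate"]] = str(p)
    return {
        sample: (files["1"], files["2"])
        for sample, files in sorted(mates.items())
        if "1" in files and "2" in files
    }


def vorf_commands(paths, script="main.py", python="python"):
    """Build one 'python main.py r1 r2' command per complete read pair."""
    return [[python, script, r1, r2] for r1, r2 in pair_fastqs(paths).values()]


if __name__ == "__main__":
    for cmd in vorf_commands(sorted(Path("data").glob("*.fastq"))):
        print(" ".join(cmd))
```

The printed commands can be inspected first and then executed with `subprocess.run(cmd, check=True)`.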

An excerpt of the output for the CV167 sample is shown below. The output is in FASTA format, with one assembled amino acid sequence per record.

>0 <unknown description>
MVFLVCHGLVSCNNPLAITVLYPHQCLARGTT
>10 <unknown description>
MARGLLQLTNPWQTKNTIKWFLGRQTAGANVRCQEGNNPDRQLRSQMID
>11 <unknown description>
MVWILLLSDMDLSTHALTPWILLNVHSEFVWT
>12 <unknown description>
MASVTSKNTTKARKRSTLLTINVPVSSETNEYISSYSSACAYKGTLVVVVGSS
>13 <unknown description>
MHSTATPRIPPTSSTLKPASINGNLGVKLLDFTADLTGRLRTL
>14 <unknown description>
MATSSFITVLTKKNLAVSHWYFAHMRDKDTKCLTTTTVLNAFEFCYQLNHNMTMRCSSSI
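Output in this form can be read with a few lines of standard-library Python, for example to rank the assembled sequences by length. This is a minimal sketch that assumes the output has been saved to a file or captured as text; `parse_fasta` is a hypothetical helper, and Biopython's `SeqIO.parse` would serve equally well.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (record_id, sequence) tuples."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The record id is the first word after ">"; the rest is the description.
            record_id = (line[1:].split() or [""])[0]
            records.append((record_id, []))
        elif records:
            # Sequence lines belong to the most recent header.
            records[-1][1].append(line)
    return [(record_id, "".join(parts)) for record_id, parts in records]


if __name__ == "__main__":
    example = (
        ">0 <unknown description>\n"
        "MVFLVCHGLVSCNNPLAITVLYPHQCLARGTT\n"
        ">13 <unknown description>\n"
        "MHSTATPRIPPTSSTLKPASINGNLGVKLLDFTADLTGRLRTL\n"
    )
    # Print records from longest to shortest.
    for record_id, seq in sorted(parse_fasta(example), key=lambda r: -len(r[1])):
        print(record_id, len(seq))
```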
