fzyan/MetaCSST
###############################################################################################
## Package: Metagenomic Complex Sequence Scanning Tool (MetaCSST) ##
## Developer: Fazhe Yan ##
## Email: [email protected] ; [email protected] ##
## Department: Department of Bioinformatics and Biostatistics, Shanghai Jiao Tong University ##
###############################################################################################

##################
## Introduction ##
##################

Metagenomic Complex Sequence Scanning Tool (MetaCSST) is a tool to predict
diversity-generating retroelements (DGRs) in sequenced genomes as well as in
metagenomic datasets. It is based on a Generalized Hidden Markov Model (GHMM)
and uses motif patterns to identify the elements in DGRs.

###############
## Copyright ##
###############

This software is free for personal, academic and non-profit use and is available
from https://github.com/fzyan/MetaCSST (GitHub).
Commercial users, please contact <[email protected]>.

#########################
## System requirements ##
#########################

Linux operating system; 2 GB of memory to use multiple threads.
Perl 5.8.5 or later and gcc 4.1.2 or later.

###########
## Usage ##
###########

1> Identify sub structures (TR, VR or RT) in DGRs:

    ./MetaCSSTsub -build TR.config -in $fa [-out $out_dir] [-thread $thread]
or
    ./MetaCSSTsub -build VR.config -in $fa [-out $out_dir] [-thread $thread]
or
    ./MetaCSSTsub -build RT.config -in $fa [-out $out_dir] [-thread $thread]

    # $fa      : input file in FASTA format (preprocessing may be needed: ./src/chomp.pl $input)
    # $out_dir : output directory. If not given, the default output directory is "out_metacsst"
    # $thread  : number of threads, default 1

2> DGR prediction

    Step1: ./MetaCSSTmain -build arg.config -in $fa [-out $out_dir1] [-thread $thread]
           # identify the sub structures using the GHMM
    Step2: perl src/callVR.pl $out_dir1/raw.gtf $fa $out-tmp
           # call VRs according to the identified TRs
    Step3: perl src/removeRepeat.pl $out-tmp $out-DGR
           # remove identical TR-VR pairs generated by callVR.pl

###############
## OUT files ##
###############

1> Identify sub structures (TR, VR or RT) in DGRs:

    out_dir/out.txt : identified sub structures
        # Each input FASTA sequence is followed by the elements found in that sequence.
        # File format example:
        >gi|377805758|gb|JQ680349.1| ##ID
        CCCACAGTGCGTGTATGAT......GATTAATACAGAATTACTACG ##sequence
        Score:6.57 + matchSeq(31631-31680):CTATCTTTGGGATATTCTATAGTTCTAGCTATAACATCAATTCCACCAAC ##element1
        Score:62.73 - matchSeq(39481-39544):AACAACAGCTGGAACGTGAACTTTAGTAATGGCAACTTCAACAACAACAACAAGTACAACAGTA ##element2
        # Format of each identified element:
        # Score:($score) $strand matchSeq(start-end):sequence of this element
    out_dir/align.txt : count matrix for each position, used to build PWMs
    out_dir/score.txt : PWMs (scoring matrices)

2> DGR prediction

    Step1:
        out_tmp1/raw.gtf : TRs and RTs identified
            #ID element score strand start end sequence
        out_tmp1/align.txt : count matrix for each position, used to build PWMs
        out_tmp1/score.txt : PWMs (scoring matrices)
    Step2:
        out_tmp2.txt : TRs are followed by their paired VRs
            #ID TR strand original_start original_end start end A-to-N-substitutions Non-A-to-N-substitutions sequence
            #ID VR strand * * start end A-to-N-substitutions Non-A-to-N-substitutions sequence
            #ID RT strand start end sequence
    Step3:
        out-DGR.gtf : the file format is the same as out_tmp2.txt

###########
## Files ##
###########

|-MetaCSSTmain               executable program to predict DGRs
|-MetaCSSTsub                executable program to identify TRs, VRs or RTs
|-arg(/TR/VR/RT).config      config files for the GHMM
|-align/*align               alignment matrices used to develop the GHMM
|-main.cpp                   source code to build MetaCSSTmain
|-sub.cpp                    source code to build MetaCSSTsub
|-ghmm.h & fun.h             some functions, structures and objects
|-callVR.pl                  searches for VRs according to the raw GTF file generated by MetaCSSTmain
|-removeRepeat.pl            removes identical TR-VR pairs generated by callVR.pl
|-chomp.pl                   preprocesses the input file in FASTA format
|-addition/merged*           collected DGRs
|-addition/training          training set
|-addition/test              test set
|-addition/classify          classification of TRs/VRs/RTs, generated by MUSCLE
|-example and example.sh     an example to identify DGRs
|-callORF.pl                 script to call Open Reading Frames
|-coden.txt                  codon table used to call ORFs

##################
## Installation ##
##################

MetaCSSTmain and MetaCSSTsub are executable programs.
If you want to modify the code and recompile:

    g++ -lpthread src/main.cpp -o MetaCSSTmain_new
    g++ -lpthread src/sub.cpp -o MetaCSSTsub_new

#############
## Contact ##
#############

If you have any questions, feel free to contact us:
[email protected]
[email protected]
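
The element lines in out.txt (described under "OUT files") follow the pattern
"Score:<score> <+ or -> matchSeq(<start>-<end>):<sequence>". The Python snippet
below is a minimal sketch of how such a line could be parsed in downstream
scripts. It is not part of MetaCSST: the names Element and parse_element are
hypothetical, and the regular expression is inferred only from the example
lines shown in this README, so real output may need adjustments.

```python
import re
from typing import NamedTuple, Optional


class Element(NamedTuple):
    """One element line from out.txt (hypothetical helper type)."""
    score: float
    strand: str      # "+" or "-", as in the README example lines
    start: int
    end: int
    sequence: str


# Pattern inferred from the README example:
#   Score:6.57 + matchSeq(31631-31680):CTATCTTT...
# Anything after the sequence letters (e.g. trailing annotations) is ignored.
_ELEMENT_RE = re.compile(
    r"^Score:(-?\d+(?:\.\d+)?)\s+([+-])\s+matchSeq\((\d+)-(\d+)\):([A-Za-z]+)"
)


def parse_element(line: str) -> Optional[Element]:
    """Return an Element for an element line, or None for any other line."""
    match = _ELEMENT_RE.match(line.strip())
    if match is None:
        return None
    score, strand, start, end, sequence = match.groups()
    return Element(float(score), strand, int(start), int(end), sequence)
```

For example, calling parse_element on each line of out_metacsst/out.txt and
skipping the None results would collect the elements while ignoring the FASTA
ID and sequence lines.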