wham/utils at master · zeeev/wham

README

This README describes the classify_WHAM_vcf.py script.

Developed by EJ Osborne and Z. Kronenberg

	  [email protected]; [email protected]

##############
USAGE: 
##############

For help:

    python classify_WHAM_vcf.py -h


Runs a RandomForest classifier on WHAM output VCF files to classify structural
variant type. Appends WC and WP flags so the user can explore the structural
variant calls. The output is a VCF file written to standard out. The optional
--filter flag selects results tuned for higher sensitivity or higher
specificity. Leaving the option out returns SV calls for all of the data.

positional arguments:
  VCF              User supplied VCF with WHAM variants; VCF needs AT field
                   data
  training_matrix  training dataset for classifier derived from simulated read
                   dataset

optional arguments:
  -h, --help       show this help message and exit
  --filter FILTER  optional arg for filtering type, one of: ['sensitive',
                   'specific']; defaults to outputting all data if the
                   argument is not supplied.

Typical usage:

  python classify_WHAM_vcf.py [VCF file] [training dataset] --filter [sensitive/specific]
         # By default, the new VCF file with SV calls is written to
         # standard out
  python classify_WHAM_vcf.py [VCF file] [training dataset] > [output VCF]
         # standard out can be redirected to an output VCF file with the
         # '>' sign, as above
  python classify_WHAM_vcf.py X.vcf WHAM_training_data.txt --filter sensitive > out.sensitive.vcf
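
The same invocation can be scripted. The following is a minimal sketch, not
part of WHAM itself: the helper names build_command and classify_to_file are
hypothetical, and it assumes that "python" on the PATH is the interpreter
that has scikit-learn installed. Only the arguments documented above are used.

```python
import subprocess

VALID_FILTERS = ("sensitive", "specific")


def build_command(vcf, training_matrix, filter_mode=None,
                  script="classify_WHAM_vcf.py"):
    """Return the argument list for one classifier run (hypothetical helper)."""
    if filter_mode is not None and filter_mode not in VALID_FILTERS:
        raise ValueError("filter_mode must be one of %r" % (VALID_FILTERS,))
    command = ["python", script, vcf, training_matrix]
    if filter_mode is not None:
        command += ["--filter", filter_mode]
    return command


def classify_to_file(vcf, training_matrix, out_path, filter_mode=None):
    """Run the classifier and redirect its standard out to out_path."""
    with open(out_path, "w") as out:
        subprocess.check_call(
            build_command(vcf, training_matrix, filter_mode), stdout=out)
```

For example, classify_to_file("X.vcf", "WHAM_training_data.txt",
"out.sensitive.vcf", "sensitive") would be equivalent to the last command
above.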

#######
OUTPUT:
#######

The output VCF file has two new fields appended, WC and WP, which stand for:

WC - "WHAM CALL" : SV TYPE
WP - "WHAM PROBABILITY" : PROBABILITIES FROM RANDOM FOREST MODEL FOR EACH
     IMPLEMENTED SV TYPE. CHECK DOCS FOR DETAILS ON WHICH TYPES ARE CURRENTLY
     IMPLEMENTED.
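
To pull the two fields back out of the classified VCF, something like the
sketch below may help. It rests on assumptions this README does not confirm:
that WC and WP are written as key=value entries in the INFO column (column 8),
and that WP holds comma-separated numbers. The function names and the example
record are made up for illustration.

```python
def parse_info(info_field):
    """Split a VCF INFO string into a dict; bare flags map to True."""
    info = {}
    for entry in info_field.split(";"):
        if not entry:
            continue
        key, sep, value = entry.partition("=")
        info[key] = value if sep else True
    return info


def wham_call(vcf_line):
    """Return (WC, [WP values]) for one tab-separated VCF data line.

    Assumes WC and WP live in the INFO column and that WP is a
    comma-separated list of probabilities (not confirmed by the README).
    """
    columns = vcf_line.rstrip("\n").split("\t")
    info = parse_info(columns[7])
    call = info.get("WC")
    raw = info.get("WP")
    probs = [float(p) for p in raw.split(",")] if isinstance(raw, str) else []
    return call, probs


# Made-up record, for illustration only.
example = "chr1\t100\t.\tN\t<DEL>\t.\t.\tSVTYPE=DEL;WC=DEL;WP=0.9,0.1"
call, probs = wham_call(example)
```

Header lines (those starting with '#') should be skipped before calling
wham_call.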


##############
DEPENDENCIES:
##############

Your Python distribution must include scikit-learn for the RandomForest
modeling. Information can be obtained here:

https://scikit-learn.org/stable/

All code was developed on the Python 2.7 Anaconda distribution:

https://continuum.io/downloads
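
A quick way to confirm that the dependency is importable before running the
classifier is sketched below. The helper name has_module is hypothetical;
"sklearn" is the import name of the scikit-learn package.

```python
import importlib


def has_module(name):
    """Return True if the named module can be imported in this interpreter."""
    try:
        importlib.import_module(name)
    except ImportError:
        return False
    return True


if not has_module("sklearn"):
    print("scikit-learn is not installed for this Python interpreter")
```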


##############
TRAINING DATA:
##############
We supply a training dataset derived from simulated read data. The file and
its md5sum are:

     cb30db2b8dc0c6b2693a6a1595855272  WHAM_training_data.txt
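
The checksum can be verified with md5sum on the command line, or from Python
as in the sketch below. The function names are hypothetical; the expected
digest is the one listed above.

```python
import hashlib

# md5sum listed in this README for WHAM_training_data.txt
EXPECTED_MD5 = "cb30db2b8dc0c6b2693a6a1595855272"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def training_data_is_intact(path, expected=EXPECTED_MD5):
    """Return True if the file at path matches the expected md5sum."""
    return md5_of_file(path) == expected
```

Calling training_data_is_intact("WHAM_training_data.txt") returns False if
the download is truncated or the file has been modified.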
