CNV analysis based on the depth of coverage of gene panels and exomes, using data from MiSeq Reporter, Local Run Manager or samtools (see below)
MobiCNV is a simple Python script that runs on Python > 2.5 or Python 3 (tested on 2.7 and on 3.5 and above)
$ cd /where/you/want/to/install
$ git clone https://github.com/mobidic/MobiCNV
$ cd MobiCNV
$ pip install -r requirements.txt
If you successfully installed the requirements file (see above), you can skip this step; otherwise:
MobiCNV requires the xlsxwriter, numpy and PyVCF Python modules:
$ pip install xlsxwriter numpy PyVCF
MobiCNV takes as primary input the csv/tsv files generated by the DNA enrichment workflows of MiSeq Reporter (MSR) or Local Run Manager (LRM). You can also generate input files with a slightly modified samtools command (see below).
In addition, to reduce the false positive rate, MobiCNV can use VCF files: it searches for heterozygous variants in regions suspected to harbor heterozygous deletions. On chromosome X, the same rule applies to female samples.
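The heterozygous-variant rule can be sketched as a small predicate. This is a minimal illustration under stated assumptions, not MobiCNV's implementation: the function names, the (chrom, start, end) region tuple with 0-based BED coordinates, the 1-based VCF positions and the 'M'/'F' sex labels are all hypothetical. A heterozygous call means two different alleles were observed at that position, which is hard to reconcile with the loss of one copy; on chromosome X this only holds for females, since males carry a single X.

```python
from typing import Iterable, Tuple

# Hypothetical region layout: (chromosome, BED start (0-based), BED end).
Region = Tuple[str, int, int]


def _strip_chr(chrom: str) -> str:
    """Treat 'chr1' and '1' as the same chromosome."""
    return chrom[3:] if chrom.startswith("chr") else chrom


def deletion_is_contradicted(region: Region,
                             het_calls: Iterable[Tuple[str, int]],
                             sex: str) -> bool:
    """Return True if a heterozygous call lies inside a suspected deletion.

    het_calls holds (chromosome, 1-based VCF position) pairs.
    sex is 'M' or 'F' (assumed labels): on chrX the rule is only applied to
    females; chrY is skipped altogether (an assumption of this sketch).
    """
    chrom = _strip_chr(region[0])
    start, end = region[1], region[2]
    if chrom == "Y" or (chrom == "X" and sex != "F"):
        return False
    for call_chrom, pos in het_calls:
        # A 0-based half-open BED interval [start, end) covers the
        # 1-based positions start + 1 .. end.
        if _strip_chr(call_chrom) == chrom and start < pos <= end:
            return True
    return False
```

A deletion call for which this returns True would be a candidate false positive.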
With MiSeq Reporter (MSR), your input directory is the Data/Intensities/BaseCalls/AlignmentX directory of your run of interest, X being empty for the first analysis, or 2, 3... if you relaunched the analysis. The actual targets are the Sample.coverage.csv files.
With Local Run Manager (LRM), a directory called 'Alignment_X' is created inside the run root, X being 1, 2, 3... depending on the number of times you ran the analysis. Inside Alignment_X, you will find a subfolder which actually contains the data: this subfolder is your input directory. As for MSR, the actual targets are the Sample.coverage.csv files.
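For MSR runs, picking the right AlignmentX directory can be scripted. The sketch below is a hypothetical helper, not part of MobiCNV: it treats `Alignment` as the first analysis and `Alignment2`, `Alignment3`... as relaunches, returns the highest-numbered one, and lists the `*.coverage.csv` files it contains. The function names are assumptions, and the LRM layout (with its extra subfolder) is not handled.

```python
import re
from pathlib import Path
from typing import List, Optional


def latest_msr_alignment(basecalls_dir: str) -> Optional[Path]:
    """Return the most recent AlignmentX directory under BaseCalls, or None.

    'Alignment' (no suffix) counts as analysis 1, 'Alignment2' as 2, etc.
    """
    best: Optional[Path] = None
    best_rank = 0
    for entry in Path(basecalls_dir).iterdir():
        match = re.fullmatch(r"Alignment(\d*)", entry.name)
        if entry.is_dir() and match:
            rank = int(match.group(1)) if match.group(1) else 1
            if rank > best_rank:
                best, best_rank = entry, rank
    return best


def coverage_files(input_dir: Path) -> List[Path]:
    """List the Sample.coverage.csv target files, sorted by name."""
    return sorted(input_dir.glob("*.coverage.csv"))
```

For example, `latest_msr_alignment("MyRun/Data/Intensities/BaseCalls")` would return the directory to pass to MobiCNV, `MyRun` being a placeholder for your run folder.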
Nenufaar & MobiDL panelCapture workflows
Nenufaar and MobiDL panelCapture workflows produce tsv files that can be used directly by MobiCNV. Your input directory is the MobiCNVtsvs directory located in the output directory of your Nenufaar/MobiDL analysis.
If you use another commercial pipeline or, better, a custom pipeline, you only need the BAM files to produce the coverage files, with this command:
$ samtools bedcov -Q 30 Intervals.bed sample.bam | sort -k1,1 -k2,2n -k3,3n | awk 'BEGIN {OFS="\t"}{a=($3-$2+1);b=($7/a);print $1,$2,$3,$4,b,"+","+"}' > sample_coverage.tsv
Note that the awk expression b=($7/a) in the command above must be adapted to your original BED file: samtools bedcov appends the summed depth as a new column right after the BED columns, so the field to divide is the number of BED columns plus one (counting from 1). As written, the command assumes a 6-column BED; with a 4-column BED, replace $7 with $5.
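If you prefer not to edit the awk field by hand, the same conversion can be sketched in Python with the BED column count as a parameter. This is a hypothetical helper (`bedcov_to_mobicnv` is not part of MobiCNV); it assumes a BED file with at least 4 columns and sorted `samtools bedcov` output as produced by the first two commands of the pipeline, and it reproduces the awk arithmetic (summed depth divided by end - start + 1) and output columns.

```python
from typing import Iterable, Iterator


def bedcov_to_mobicnv(lines: Iterable[str], bed_columns: int) -> Iterator[str]:
    """Turn sorted `samtools bedcov` output into MobiCNV coverage lines.

    bedcov appends the summed per-base depth right after the BED columns,
    so it sits at 0-based index `bed_columns` (awk field bed_columns + 1).
    """
    if bed_columns < 4:
        # The awk command copies $4 as the ROI name, so a name column is assumed.
        raise ValueError("expected a BED file with at least 4 columns")
    for line in lines:
        if not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        start, end = int(fields[1]), int(fields[2])
        # Same arithmetic as the awk command: a = end - start + 1, b = sum / a.
        mean_depth = float(fields[bed_columns]) / (end - start + 1)
        yield "\t".join([fields[0], fields[1], fields[2], fields[3],
                         format(mean_depth, ".6g"), "+", "+"])
```

For example, `for row in bedcov_to_mobicnv(open("bedcov_sorted.txt"), 6): print(row)` would print the same columns as the awk command for a 6-column BED, `bedcov_sorted.txt` being a placeholder for the sorted bedcov output. Mean depths are formatted with `.6g`, which matches awk's default output format for non-integer numbers (OFMT, `%.6g`).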
Intervals.bed is your regions of interest (ROI) file. We strongly recommend annotating your ROIs with gene names and exons (4th column), as this makes your data much easier to interpret; this annotation is also mandatory for the gene panel option. MobiBedAnnotator can help with this task.
Put all your coverage files in a single directory, which you will then provide as input to MobiCNV.
The coverage files must be named following this template:
sampleID[._]coverage.(tsv|csv)
Examples: RS2412.coverage.csv or RS2412_coverage.tsv
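The naming template can be checked with a regular expression. A minimal sketch, assuming the pattern below faithfully mirrors the template; MobiCNV's own matching logic may differ, and `sample_id` is a hypothetical name. Everything before the `.coverage`/`_coverage` suffix is taken as the sampleID.

```python
import re
from typing import Optional

# Hypothetical pattern mirroring the template sampleID[._]coverage.(tsv|csv).
COVERAGE_NAME = re.compile(r"^(?P<sample>.+)[._]coverage\.(?P<ext>tsv|csv)$")


def sample_id(filename: str) -> Optional[str]:
    """Return the sampleID part of a coverage file name, or None if it does not match."""
    match = COVERAGE_NAME.match(filename)
    return match.group("sample") if match else None
```

Files for which `sample_id` returns None would not follow the template and should be renamed.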
The sampleID matters, as it is the string MobiCNV uses to match samples in the VCF files, if provided (recommended).
Gene panel option (-p): if your Intervals BED file or Illumina manifest is annotated with at least the gene names, you can use the gene panel option, which generates a supplementary worksheet focusing on the genes of interest. Simply provide a txt file with the gene names (one per line).
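Selecting the panel genes can be sketched as below. This assumes, hypothetically, that the 4th column of each coverage row holds either a gene symbol or a symbol followed by an underscore-separated suffix such as `BRCA1_exon2`; adapt the split to your own annotation. `read_gene_panel` and `panel_rows` are illustrative names, not MobiCNV functions.

```python
from typing import Iterable, List, Set


def read_gene_panel(lines: Iterable[str]) -> Set[str]:
    """Read a gene panel file: one gene name per line, blank lines ignored."""
    return {line.strip() for line in lines if line.strip()}


def panel_rows(rows: Iterable[List[str]], genes: Set[str]) -> List[List[str]]:
    """Keep coverage rows whose 4th-column gene symbol is in the panel.

    Assumes (hypothetically) a 4th column such as "GENE" or "GENE_exon3".
    """
    kept = []
    for row in rows:
        if len(row) >= 4 and row[3].split("_", 1)[0] in genes:
            kept.append(row)
    return kept
```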
Input file type (-t): tsv or csv.
VCF option (-v): you can optionally provide the path to a directory containing your samples' VCF files. These can be one VCF per sample, a merged VCF for all samples, or a mix of both, and they can be bgzipped or not. You do not need to provide a VCF for every sample. MobiCNV checks the VCF files until it finds the current sample, then stores all its heterozygous PASS calls in order to reduce the false positive rate of heterozygous deletions and duplications. This great idea comes from AnnotSV.
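The VCF lookup can be sketched with the standard library only (MobiCNV itself depends on PyVCF). This is an illustration under assumptions, not MobiCNV's code: the helper names are hypothetical, files are recognized by their `.vcf`/`.vcf.gz` extension, and a genotype counts as heterozygous when it has two called, different alleles (0/1, 0|1, 1/2...). Bgzipped files can be read with Python's `gzip` module because BGZF is a series of gzip members.

```python
import gzip
import os
from typing import Iterable, List, Optional, Tuple

Call = Tuple[str, int]  # (chromosome, 1-based VCF position)


def open_vcf(path: str):
    """Open a plain or bgzipped VCF as text; BGZF is readable by gzip."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def het_pass_calls(lines: Iterable[str], sample: str) -> Optional[List[Call]]:
    """Heterozygous PASS calls of `sample`, or None if it is not in this VCF."""
    calls: List[Call] = []
    sample_idx = None
    for line in lines:
        if line.startswith("##"):
            continue
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#CHROM"):
            if sample not in fields[9:]:
                return None  # not this file: let the caller try the next one
            sample_idx = fields.index(sample, 9)
            continue
        if sample_idx is None or fields[6] != "PASS":
            continue
        # The VCF specification puts GT first in FORMAT when present.
        if fields[8].split(":")[0] != "GT":
            continue
        genotype = fields[sample_idx].split(":")[0]
        alleles = genotype.replace("|", "/").split("/")
        if len(alleles) == 2 and "." not in alleles and alleles[0] != alleles[1]:
            calls.append((fields[0], int(fields[1])))
    return calls if sample_idx is not None else None


def find_het_calls(vcf_dir: str, sample: str) -> Optional[List[Call]]:
    """Scan VCFs in name order until one contains `sample`."""
    for name in sorted(os.listdir(vcf_dir)):
        if not name.endswith((".vcf", ".vcf.gz")):
            continue
        with open_vcf(os.path.join(vcf_dir, name)) as handle:
            calls = het_pass_calls(handle, sample)
        if calls is not None:
            return calls
    return None
```

Returning None (rather than an empty list) when a sample is absent lets the scan distinguish "not in this file" from "present but without heterozygous PASS calls".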
$ python MobiCNV.py -i path/to/coverage/files/directory/ -t [tsv|csv] [-p path/to/gene/panel/file.txt] [-o output_file.xlsx] [-v path/to/vcfs/directory/]
You will then get an Excel spreadsheet with a summary worksheet showing remarkable events and gender predictions (if the ROIs include chromosome X), a worksheet for autosomes, and optional supplementary worksheets for sex chromosomes and gene panels. In addition, calls in low-coverage regions are stored in a separate worksheet. The fifth column of each worksheet shows the mean coverage of the considered ROI across all samples, colored according to the following three-color traffic-light code:
- green: mean >= 100X
- orange: 50X <= mean < 100X
- red: mean < 50X
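The traffic-light thresholds translate directly into a small function, which can be handy to flag ROIs in your own downstream scripts. A minimal sketch; `coverage_color` is a hypothetical name, and the boundaries follow the list above (exactly 100X counts as green, exactly 50X as orange).

```python
def coverage_color(mean_coverage: float) -> str:
    """Map a ROI's mean coverage (X) to its traffic-light color."""
    if mean_coverage >= 100:
        return "green"
    if mean_coverage >= 50:
        return "orange"
    return "red"
```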
MobiCNV works with hg19 and hg38 data, and in theory with hg18 as well.
Montpellier Bioinformatique pour le Diagnostique Clinique (MoBiDiC)
CHU de Montpellier
France