NAME

coverm contig - Calculate read coverage per-contig (version 0.7.0)

SYNOPSIS

coverm contig <MAPPING_INPUT> ..

DESCRIPTION

coverm contig calculates the coverage of a set of reads on a set of contigs.

This process can be undertaken in several ways, for instance by specifying BAM files or raw reads as input, using different mapping programs, thresholding read alignments, using different methods of calculating coverage and printing the calculated coverage in various formats.

The source code for CoverM is available at https://github.com/wwood/CoverM

READ MAPPING PARAMETERS

-1 PATH ..

Forward FASTA/Q file(s) for mapping. These may be gzipped or not.

-2 PATH ..

Reverse FASTA/Q file(s) for mapping. These may be gzipped or not.

-c, --coupled PATH ..

One or more pairs of forward and reverse FASTA/Q files for mapping, optionally gzipped, given in the order <sample1_R1.fq.gz> <sample1_R2.fq.gz> <sample2_R1.fq.gz> <sample2_R2.fq.gz> ..
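
The ordering described for --coupled can be produced mechanically. The following is a minimal sketch, assuming each sample's files follow the <sample>_R1.fq.gz / <sample>_R2.fq.gz naming shown in the option description. The helper name coupled_args is hypothetical and not part of CoverM.

```shell
# coupled_args (hypothetical helper, not part of CoverM): print forward and
# reverse file names for each sample in the order --coupled expects.
# Assumes files are named <sample>_R1.fq.gz and <sample>_R2.fq.gz.
coupled_args() {
    out=""
    for sample in "$@"; do
        out="$out ${sample}_R1.fq.gz ${sample}_R2.fq.gz"
    done
    # Drop the leading space before printing
    printf '%s\n' "${out# }"
}

coupled_args sample1 sample2
# Possible use (word splitting is intended, so file names must not contain spaces):
#   coverm contig --coupled $(coupled_args sample1 sample2) --reference assembly.fna
```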

--interleaved PATH ..

Interleaved FASTA/Q file(s) for mapping. These may be gzipped or not.

--single PATH ..

Unpaired FASTA/Q file(s) for mapping. These may be gzipped or not.

-b, --bam-files PATH

Path to BAM file(s). These must be reference sorted (e.g. with samtools sort) unless --sharded is specified, in which case they must be read name sorted (e.g. with samtools sort -n). When BAM files are specified, no read mapping is performed.
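
The two sort orders can be summarised in a small sketch. The helper bam_sort_cmd and the file names are hypothetical. It only prints the samtools command for each case, so nothing is run.

```shell
# bam_sort_cmd MODE INPUT OUTPUT (hypothetical helper): print the samtools
# command that produces the sort order required for the given mode.
# MODE is "sharded" for --sharded input; anything else means the default.
bam_sort_cmd() {
    mode=$1 input=$2 output=$3
    if [ "$mode" = "sharded" ]; then
        # --sharded input must be sorted by read name
        printf 'samtools sort -n -o %s %s\n' "$output" "$input"
    else
        # default input must be sorted by reference position
        printf 'samtools sort -o %s %s\n' "$output" "$input"
    fi
}

bam_sort_cmd default unsorted.bam reference_sorted.bam
bam_sort_cmd sharded unsorted.bam name_sorted.bam
```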

REFERENCE

-r, --reference PATH

FASTA file of contigs e.g. concatenated genomes or metagenome assembly, or minimap2 index (with --minimap2-reference-is-index), strobealign index (with --strobealign-use-index), or BWA index stem (with -p bwa-mem/bwa-mem2). If multiple reference FASTA files are provided and --sharded is specified, then reads will be mapped to references separately as sharded BAMs. [required unless -b/--bam-files is specified]

SHARDING

--sharded

If -b/--bam-files was used: Input BAM files are read-sorted alignments of a set of reads mapped to multiple reference contig sets. Choose the best hit for each read pair. Otherwise if mapping was carried out: Map reads to each reference, choosing the best hit for each pair. [default: not set]

MAPPING ALGORITHM OPTIONS

-p, --mapper NAME

Underlying mapping software used [default: minimap2-sr]. One of:
name                 description
minimap2-sr          minimap2 with '-x sr' option
bwa-mem              bwa mem using default parameters
bwa-mem2             bwa-mem2 using default parameters
minimap2-ont         minimap2 with '-x map-ont' option
minimap2-pb          minimap2 with '-x map-pb' option
minimap2-hifi        minimap2 with '-x map-hifi' option
minimap2-no-preset   minimap2 with no '-x' option

--minimap2-params PARAMS

Extra parameters to provide to minimap2, for both the indexing command (if used) and mapping. Note that usage of this parameter has security implications if untrusted input is specified. '-a' is always specified to minimap2. [default: none]

--minimap2-reference-is-index

Treat reference as a minimap2 database, not as a FASTA file. [default: not set]

--bwa-params PARAMS

Extra parameters to provide to BWA or BWA-MEM2. Note that usage of this parameter has security implications if untrusted input is specified. [default: none]

--strobealign-params PARAMS

Extra parameters to provide to strobealign. Note that usage of this parameter has security implications if untrusted input is specified. [default: none]

--strobealign-use-index

Use a pregenerated index (one that has been created with 'strobealign --create-index'). The --reference option should be specified as the original FASTA file i.e. 'ref.fna' not 'ref.fna.r100.sti' [default: not set]

ALIGNMENT THRESHOLDING

--min-read-aligned-length INT

Exclude reads with smaller numbers of aligned bases. [default: 0]

--min-read-percent-identity FLOAT

Exclude reads by overall percent identity e.g. 95 for 95%. [default: 0]

--min-read-aligned-percent FLOAT

Exclude reads by percent aligned bases e.g. 95 means 95% of the read's bases must be aligned. [default: 0]
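
As an illustration of the aligned-percent threshold, the sketch below checks a single read. The helper name and the example numbers are hypothetical, and this is not CoverM's implementation.

```shell
# passes_aligned_percent ALIGNED_BASES READ_LENGTH MIN_PERCENT
# Illustrative only: prints "keep" when at least MIN_PERCENT of the read's
# bases are aligned, else "exclude". Cross-multiplying avoids a division.
passes_aligned_percent() {
    awk -v a="$1" -v len="$2" -v min="$3" \
        'BEGIN { if (a * 100 >= min * len) print "keep"; else print "exclude" }'
}

passes_aligned_percent 143 150 95   # 143/150 is about 95.3%, prints keep
passes_aligned_percent 140 150 95   # 140/150 is about 93.3%, prints exclude
```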

--min-read-aligned-length-pair INT

Exclude pairs with smaller numbers of aligned bases. Implies --proper-pairs-only. [default: 0]

--min-read-percent-identity-pair FLOAT

Exclude pairs by overall percent identity e.g. 95 for 95%. Implies --proper-pairs-only. [default: 0]

--min-read-aligned-percent-pair FLOAT

Exclude pairs by percent aligned bases e.g. 95 means 95% of the pair's bases must be aligned. Implies --proper-pairs-only. [default: 0]

--proper-pairs-only

Require reads to be mapped as proper pairs. [default: not set]

--exclude-supplementary

Exclude supplementary alignments. [default: not set]

--include-secondary

Include secondary alignments. [default: not set]

COVERAGE CALCULATION OPTIONS

-m, --methods METHOD

Method(s) for calculating coverage [default: mean]. A more thorough description of the different methods is available at https://github.com/wwood/CoverM#calculation-methods but briefly:

method               description
mean                 (default) Average number of aligned reads overlapping each position on the contig
trimmed_mean         Average number of aligned reads overlapping each position after removing the most deeply and shallowly covered positions. See --trim-min/--trim-max to adjust.
coverage_histogram   Histogram of coverage depths
covered_bases        Number of bases covered by 1 or more reads
variance             Variance of coverage depths
length               Length of each contig in base pairs
count                Number of reads aligned to each contig. Note that supplementary alignments are not counted.
metabat              ("MetaBAT adjusted coverage") Coverage as defined in Kang et al 2015 https://doi.org/10.7717/peerj.1165
reads_per_base       Number of reads aligned divided by the length of the contig
rpkm                 Reads mapped per kilobase of contig, per million mapped reads
tpm                  Transcripts Per Million as described in Li et al 2010 https://doi.org/10.1093/bioinformatics/btp692

--min-covered-fraction FRACTION

Contigs with a smaller fraction of covered bases than this are reported as having zero coverage. [default: 0]
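
The interaction between mean coverage and --min-covered-fraction can be sketched as follows. This is an illustration rather than CoverM's code: it assumes per-position depths are already available one per line, and it ignores --contig-end-exclusion and all read filtering. The helper name mean_coverage is hypothetical.

```shell
# mean_coverage MIN_COVERED_FRACTION
# Reads one depth per line (one line per contig position) on stdin.
# Illustrative only: ignores --contig-end-exclusion and all read filtering.
mean_coverage() {
    awk -v min="$1" '
        { sum += $1; if ($1 > 0) covered++ }
        END {
            # Report zero when too small a fraction of positions is covered
            if (NR == 0 || covered / NR < min) print 0
            else print sum / NR
        }'
}

printf '0\n0\n4\n4\n' | mean_coverage 0.10   # half the positions are covered
printf '0\n0\n4\n4\n' | mean_coverage 0.75   # covered fraction is below the threshold
```

With these inputs the first call reports a mean of 2 and the second reports 0, because only half of the positions are covered.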

--contig-end-exclusion INT

Exclude this many bases at each end of reference sequences from the calculation. [default: 75]

--trim-min FRACTION

Remove this smallest fraction of positions when calculating trimmed_mean [default: 5]

--trim-max FRACTION

Remove positions above this fraction (the most deeply covered) when calculating trimmed_mean. [default: 95]
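
A trimmed mean with the default cut points can be sketched as below. The helper name trimmed_mean is hypothetical, and the way the cut points are rounded here (truncation) is an assumption, so results may differ slightly from CoverM's own calculation.

```shell
# trimmed_mean TRIM_MIN TRIM_MAX
# Reads one depth per line on stdin. Illustrative only: the rounding of the
# cut points used here (truncation) is an assumption of this sketch.
trimmed_mean() {
    sort -n | awk -v lo_pct="$1" -v hi_pct="$2" '
        { depth[NR] = $1 }
        END {
            lo = int(NR * lo_pct / 100)   # positions dropped from the shallow end
            hi = int(NR * hi_pct / 100)   # last position kept before the deep end
            for (i = lo + 1; i <= hi; i++) { sum += depth[i]; kept++ }
            if (kept > 0) print sum / kept; else print 0
        }'
}

# 19 positions at depths 1..19 plus one repeat-inflated position at depth 1000
{ awk 'BEGIN { for (i = 1; i <= 19; i++) print i }'; echo 1000; } | trimmed_mean 5 95
```

In this example the shallowest position (depth 1) and the deepest (depth 1000) are dropped, giving 10.5, whereas the untrimmed mean would be 59.5.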

OUTPUT

-o, --output-file FILE

Output coverage values to this file, or '-' for STDOUT. [default: output to STDOUT]

--output-format FORMAT

Shape of output: 'sparse' for long format, 'dense' for species-by-site. [default: dense]
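
The difference between the two shapes can be illustrated by reshaping a wide table into a long one. This sketch is generic: the column names are made up and CoverM's actual headers may differ. The helper name dense_to_long is hypothetical.

```shell
# dense_to_long: illustrative reshaping of a wide, tab-separated table
# (one row per contig, one column per sample) into long format
# (one row per sample/contig pair). The column names used below are made up.
dense_to_long() {
    awk -F '\t' -v OFS='\t' '
        NR == 1 { for (i = 2; i <= NF; i++) name[i] = $i; next }
        { for (i = 2; i <= NF; i++) print name[i], $1, $i }'
}

printf 'contig\tsampleA\tsampleB\ncontig1\t1.5\t0\n' | dense_to_long
```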

--no-zeros

Omit printing of contigs that have zero coverage. [default: not set]

--bam-file-cache-directory DIRECTORY

Output BAM files generated during alignment to this directory. The directory may or may not exist. Note that BAM files in this directory contain all mappings, including those that later are excluded by alignment thresholding (e.g. --min-read-percent-identity) or contig-wise thresholding (e.g. --min-covered-fraction). [default: not used]

--discard-unmapped

Exclude unmapped reads from cached BAM files. [default: not set]

GENERAL OPTIONS

-t, --threads INT

Number of threads for mapping, sorting and reading. [default: 1]

-h, --help

Output a short usage message. [default: not set]

--full-help

Output a full help message and display in 'man'. [default: not set]

--full-help-roff

Output a full help message in raw ROFF format for conversion to other formats. [default: not set]

-v, --verbose

Print extra debugging information. [default: not set]

-q, --quiet

Unless there is an error, do not print log messages. [default: not set]

FREQUENTLY ASKED QUESTIONS (FAQ)

Can the temporary directory used be changed?

CoverM makes use of the system temporary directory (often /tmp) to store intermediate files. This can cause problems if the amount of storage available there is small or used by many programs. To fix this, set the TMPDIR environment variable, e.g. to use the current directory: TMPDIR=. coverm genome <etc>
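
For a longer session it may be more convenient to export the variable once. A minimal sketch, where the directory name coverm_tmp is an arbitrary choice:

```shell
# Use a scratch directory under the current directory for temporary files.
# The directory name coverm_tmp is an arbitrary choice.
scratch="$PWD/coverm_tmp"
mkdir -p "$scratch"
export TMPDIR="$scratch"

# Any coverm command started from this shell now inherits TMPDIR, e.g.:
#   coverm contig --coupled read1.fastq.gz read2.fastq.gz --reference assembly.fna
```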

For thresholding arguments e.g. --dereplication-ani and --min-read-percent-identity, should a percentage (e.g. 97%) or fraction (e.g. 0.97) be specified?

Either is fine. CoverM determines which is being used by checking whether the value is less than or greater than 1.
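
The rule can be sketched as a small normalising helper. The name as_percent is hypothetical, and treating a value of exactly 1 as a fraction (100%) is an assumption of this sketch rather than documented behaviour.

```shell
# as_percent VALUE (hypothetical helper): interpret VALUE as a fraction when
# it is at most 1 and as a percentage otherwise. Treating exactly 1 as a
# fraction (100%) is an assumption of this sketch.
as_percent() {
    awk -v v="$1" 'BEGIN { if (v <= 1) v *= 100; printf "%g\n", v }'
}

as_percent 0.97   # fraction form
as_percent 97     # percentage form
```

Both calls print 97, so the two ways of writing the threshold are equivalent.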

EXIT STATUS

0

Successful program execution.

1

Unsuccessful program execution.

101

The program panicked.
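
In a wrapper script the three documented codes can be told apart with a case statement. A minimal sketch, where the helper name describe_exit and the message strings are hypothetical:

```shell
# describe_exit CODE (hypothetical helper): map an exit status to the meanings
# listed above. Any other code is reported as unknown.
describe_exit() {
    case "$1" in
        0)   echo "success" ;;
        1)   echo "failure" ;;
        101) echo "panic" ;;
        *)   echo "unknown" ;;
    esac
}

# Typical use directly after a run:
#   coverm contig ... ; describe_exit "$?"
describe_exit 101
```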

EXAMPLES

Calculate mean coverage from reads and assembly

$ coverm contig --coupled read1.fastq.gz read2.fastq.gz --reference assembly.fna

Calculate MetaBAT adjusted coverage from a sorted BAM file, saving the unfiltered BAM files in the saved_bam_files folder

$ coverm contig --methods metabat --bam-files my.bam --bam-file-cache-directory saved_bam_files

AUTHOR

Ben J. Woodcroft, Centre for Microbiome Research, School of Biomedical Sciences, Faculty of Health, Queensland University of Technology <benjwoodcroft near gmail.com>