Skip to content

Map Pacbio methylation data to genome graphs.

Notifications You must be signed in to change notification settings

cgroza/panmethyl

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

29 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Panmethyl

Panmethyl maps methylation data from long-reads to pangenomes.

Inputs

It takes the following inputs:

  • --out - directory in which panmethyl will write the output files.
  • --bams - CSV file listing the BAM files to be mapped to the pangenome.

The format of this CSV file is:

sample,path
name1,path/to/bam1
name2,path/to/bam2

These BAM files must be annotated with the appropriate methylation information. For example, the location of modified bases must be encoded in the MM tag and the likelihoods of methylation must be encoded in the ML tag.

  • --graph - Pangenome in rGFA format. Currently, only graphs created with minigraph are supported, but can be extended to all graphs by replacing minigraph with other graph aligners.

Outputs

Panmethyl outputs a .graphMethylaion plain text file for each entry in --bams. This file is a CSV file listing the graph node, the position of the modified base, its strand, the coverage on the modified base, and the average methylation level, encoded on a scale from 0 to 255 (as in the ML tag).

Geeneral steps in the pipeline

  1. Index the position of every CpG dinucleotide in the input graph (bin/index_cpg.py).
  2. Convert BAM file to FASTQ (samtools).
  3. Annotate reads in FASTQ with methylation information from the BAM (tagtobed).
  4. Map the FASTQ to the pangenome with minigraph.
  5. Lift the methylation annotation from the reads to the graph (bin/lift_5mC.py).
  6. Count the average methylation level of CpGs (bin/nodes_methylation.py).

About

Map Pacbio methylation data to genome graphs.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published