EpiJinn

Work in progress!

EpiJinn is a Python package for working with modified (methylated) nucleotides.

  • Create a readable report from bedMethyl files created by modkit.
  • Annotate prokaryotic DNA methylase enzyme recognition sites in a Biopython SeqRecord.
  • Check whether recognition sites of prokaryotic DNA methylase enzymes overlap with a recognition site of a restriction enzyme, in a DNA sequence. Methylation of restriction site nucleotides blocks recognition/restriction and thus DNA assembly.

The software is geared towards working with plasmids. Planned features include comparing methylation status with expected methylation levels, and methylase site recognition.

Install

pip install git+https://github.com/Edinburgh-Genome-Foundry/EpiJinn.git

See the additional install instructions for the PDF Reports dependency and its WeasyPrint dependency.

Usage

bedMethyl files

import epijinn

bedmethylitemgroup = epijinn.read_sample_sheet(
    sample_sheet="sample_sheet.csv",
    genbank_dir="genbank",
    bedmethyl_dir="bedmethyl",
    parameter_sheet="param_sheet.csv",
)
bedmethylitemgroup.perform_all_analysis_in_bedmethylitemgroup()
epijinn.write_bedmethylitemgroup_report(
    bedmethylitemgroup=bedmethylitemgroup,
    pdf_file="report.pdf",
    html_file="report.html",
)

Both pdf_file and html_file are optional; specify None to exclude either of them. An example sample sheet and an example parameter sheet are included in the examples directory. Note that multiple methylase enzymes (separated by spaces) can be specified in the parameter sheet.
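For orientation, the sketch below reads one bedMethyl record using only the standard library. It does not use EpiJinn and is not how EpiJinn parses these files. The column positions follow the bedMethyl layout described in the modkit documentation as I understand it (position 10 is the valid coverage, 11 the percent modified, 12 the modified-call count); treat them as an assumption and check them against your modkit version. The field names and the sample line are made up for illustration.

```python
from collections import namedtuple

# Hypothetical record type: field names are illustrative, not EpiJinn's.
BedMethylRow = namedtuple(
    "BedMethylRow",
    "chrom start end mod_code strand valid_cov percent_modified n_mod",
)


def parse_bedmethyl_line(line):
    """Parse one bedMethyl line into a BedMethylRow.

    Splits on any whitespace, because some modkit versions separate the
    trailing count columns with spaces rather than tabs.
    """
    fields = line.split()
    return BedMethylRow(
        chrom=fields[0],
        start=int(fields[1]),          # 0-based start, as in BED
        end=int(fields[2]),
        mod_code=fields[3],            # e.g. "m" for 5mC, "a" for 6mA
        strand=fields[5],
        valid_cov=int(fields[9]),      # assumed: column 10, valid coverage
        percent_modified=float(fields[10]),  # assumed: column 11
        n_mod=int(fields[11]),         # assumed: column 12
    )


# Made-up example line, for illustration only.
example = "plasmid1\t120\t121\tm\t50\t+\t120\t121\t255,0,0\t50\t80.00\t40\t10\t0\t0\t0\t0\t0"
row = parse_bedmethyl_line(example)
print(row.chrom, row.start, row.mod_code, row.percent_modified)
```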

Recognition site overlap

The module contains the Methylator class, which stores a sequence, methylase enzymes and a restriction enzyme recognition site. It has a method for finding overlaps and uses DNA Chisel to find sequence matches.

An example overlap:

...ccgcatgaagggcgcgccaggtctcaccctgaattcgcg...
                      ggtctc    : BsaI restriction site
                   CCAGGTCTCACC : Match in positive strand
                   CCWGG        : Dcm methylation site
                    *           : methylated cytosine
                      *         : methylated cytosine (on other strand)

For information on the effect of DNA methylation on each enzyme, see the Restriction Enzyme Database.

import epijinn
# sequence: the DNA sequence to check; site_BsaI: the BsaI recognition site
methylator = epijinn.Methylator(sequence=str(sequence), site=site_BsaI)
methylator.find_methylation_sites_in_pattern()
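The Dcm/BsaI overlap in the diagram above can be reproduced with a minimal standalone sketch. It is not EpiJinn's implementation (which relies on DNA Chisel); it simply translates IUPAC site strings to regular expressions and reports overlapping matches on the given strand. The helper names `iupac_to_regex`, `find_sites` and `overlapping_pairs` are hypothetical.

```python
import re

# IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def iupac_to_regex(site):
    """Translate an IUPAC site such as 'CCWGG' into a regex string."""
    return "".join(IUPAC[base] for base in site.upper())


def find_sites(sequence, site):
    """Return 0-based, end-exclusive (start, end) spans of every match.

    A lookahead is used so that overlapping occurrences are also found.
    Only the given strand is searched.
    """
    pattern = re.compile("(?=(%s))" % iupac_to_regex(site))
    return [(m.start(), m.start() + len(site))
            for m in pattern.finditer(sequence.upper())]


def overlapping_pairs(sequence, methylase_site, restriction_site):
    """Return (methylase_span, restriction_span) pairs that share a base."""
    pairs = []
    for m_span in find_sites(sequence, methylase_site):
        for r_span in find_sites(sequence, restriction_site):
            if m_span[0] < r_span[1] and r_span[0] < m_span[1]:
                pairs.append((m_span, r_span))
    return pairs


# The sequence from the diagram, without the leading and trailing dots.
sequence = "ccgcatgaagggcgcgccaggtctcaccctgaattcgcg"
pairs = overlapping_pairs(sequence, "CCWGG", "GGTCTC")
print(pairs)  # [((16, 21), (19, 25))]
```

The Dcm site (CCWGG) and the BsaI site (GGTCTC) share the two G bases at positions 19 and 20, which is the overlap drawn in the diagram.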

Example

import epijinn
import Bio.Restriction.Restriction_Dictionary  # a plain `import Bio` does not load this submodule

sequence = 'ATGTCCCCATGCCTAC' + 'AGCAAGGC' + 'CGTCTC' + 'A' + 'GGCCCCCCCCCCCCA'  # seq + EcoBI (+ BsmBI +) EcoBI + seq

rest_dict = Bio.Restriction.Restriction_Dictionary.rest_dict
site_BsmBI = rest_dict['BsmBI']['site']

epijinn.EcoBI.sequence
# 'TGANNNNNNNNTGCT'
methylator = epijinn.Methylator(sequence, site=site_BsmBI)
methylator.find_methylation_sites_in_pattern()
print(methylator.report)

Result:

Matches against methylase enzyme sites:

EcoKDam
=======
Region: 22-32(+)
Positive strand: -
Negative strand: -


EcoKDcm
=======
Region: 21-33(+)
Positive strand: -
Negative strand: -


EcoBI
=====
Region: 13-42(+)
Positive strand: -
Match in negative strand: TACAGCAAATCCGTCTCAGGCCCCCCCCC


EcoKI
=====
Region: 14-41(+)
Positive strand: -
Negative strand: -
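In the report above, EcoBI is the only enzyme with a match, and it matches on the negative strand only. The following standalone sketch (again not EpiJinn's code; the helper names are hypothetical) shows why for the input sequence of this example: the EcoBI site does not occur in the sequence as written, but its reverse complement does, and that occurrence spans the BsmBI site. The coordinates printed are those of this sketch (0-based, end-exclusive), not EpiJinn's report format.

```python
import re

# Complement table for the bases used here (N stays N).
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(site):
    """Reverse complement of a site containing only A, C, G, T and N."""
    return site.translate(COMPLEMENT)[::-1]


def to_regex(site):
    """Turn a site with N wildcards into a regex string."""
    return site.replace("N", "[ACGT]")


sequence = "ATGTCCCCATGCCTAC" + "AGCAAGGC" + "CGTCTC" + "A" + "GGCCCCCCCCCCCCA"
ecobi = "TGANNNNNNNNTGCT"
ecobi_rc = reverse_complement(ecobi)  # 'AGCANNNNNNNNTCA'

# Searching the sequence for the site itself tests the positive strand;
# searching for the reverse complement tests the negative strand.
forward_hit = re.search(to_regex(ecobi), sequence)
reverse_hit = re.search(to_regex(ecobi_rc), sequence)

print(forward_hit)                             # None
print(reverse_hit.span(), reverse_hit.group())  # (16, 31) AGCAAGGCCGTCTCA

# The BsmBI site CGTCTC lies inside the negative-strand EcoBI match.
bsmbi_start = sequence.find("CGTCTC")
print(reverse_hit.start() <= bsmbi_start < reverse_hit.end())  # True
```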

DNA sulfur modification

The same approach can be used for finding enzyme site overlaps with other epigenetic modifications. For example, in DNA phosphorothioation, an oxygen on the DNA backbone is replaced with sulfur.

thio = epijinn.Methylator(sequence, site=site_BsmBI, methylases=epijinn.DND)
thio.find_methylation_sites_in_pattern()

This returns an overlap with a putative dnd target site of Streptomyces lividans 1326 with conserved sequence GGCC:

Dnd_Sli1326
===========
Region: 21-33(+)
Match in positive strand: GGCCGTCTCAGG
Match in negative strand: GGCCGTCTCAGG

Versioning

EpiJinn uses the semantic versioning scheme.

License = GPLv3+

Copyright 2024 Edinburgh Genome Foundry, University of Edinburgh

EpiJinn was written at the Edinburgh Genome Foundry by Peter Vegh, and is released under the GPLv3 license.