Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Are all CpG reported ? #192

Open
OceaneMion opened this issue May 28, 2024 · 7 comments
Open

Are all CpG reported ? #192

OceaneMion opened this issue May 28, 2024 · 7 comments
Labels
question Further information is requested

Comments

@OceaneMion
Copy link

Hi I would like to know if all CpG are reported in the modkit pileup if we specify no-filtering ? Even those that are not methylated ?

@ArtRand
Copy link
Contributor

ArtRand commented May 28, 2024

Hello @OceaneMion,

Positions with $N_{\text{valid}}$ will not get bedMethyl records, so if you have set the --no-filtering flag, CpGs without any read coverage will not be reported, as documented. One thing to keep in mind depending on your experiment, assuming that it is modified or canonical when it does not have any read coverage will probably lead to incorrect results. In fact, off the top of my head I'm having a hard time coming up with a situation where this kind of imputation would make sense.

However if you want to make a BED table with all CpGs, you can use modkit motif-bed and bedtools.

$ modkit pileup ${mod_bam} ${bedmethyl_cpg} --cpg --ref ${ref} --no-filtering 
$ modkit modif-bed ${ref} CG 0 > cpgs.bed
$ bedtools intersect -a cpgs.bed -b ${bedmethyl_cpg} -loj -wb

@ArtRand ArtRand added the question Further information is requested label May 28, 2024
@OceaneMion
Copy link
Author

Thanks a lot !
I think it is Indeed better to keep only the cpg with coverage.
What I would like to know also is how to get a table with the number of potentially methylated site with 5hmC, then the one with 5mC, and the total number of cpg that have coverage, at the level of the assembly and not reads ? Because I only see it at the level of reads

@ArtRand
Copy link
Contributor

ArtRand commented May 28, 2024

Hello @OceaneMion,

What I would like to know also is how to get a table with the number of potentially methylated site with 5hmC, then the one with 5mC, and the total number of cpg that have coverage, at the level of the assembly and not reads?

These are the steps

# step 1: create a pileup
$ modkit pileup ${mod_bam} ${bedmethyl_cpg} --cpg --ref ${ref}`

# step 2: separate 5mC from 5hmC
$ awk '$4=="m"' ${bedmethyl_cpg} > ${pileup_5mC}
$ awk '$4=="h"' ${bedmethyl_cpg} > ${pileup_5hmC}

# alternate step 2, if you want only sites with >0% modification
$ awk '($4=="m") && ($11>0.0)' ${bedmethyl_cpg} > ${pileup_5mC_only_mod}
$ awk '($4=="h") && ($11>0.0)' ${bedmethyl_cpg} > ${pileup_5hmC_only_mod}

# step 3: calculate total number of sites with coverage
$ awk '$4=="m"' ${bedmethyl_cpg} | wc -l 
# this should be the same number as
$ awk '$4=="h"' ${bedmethyl_cpg} | wc -l 

@OceaneMion
Copy link
Author

Thanks for your awnser yes if I do not put a threshold therefore I will have the same number of 5hmC and 5mC here, which is then more accurate to use reads for the global methylation level and not the assembly.
I have another question, what kind of statistical test will you use to see if methylation at CpG sites is random or not, do you think Kolmogorov-Smirnov is appropriated ?

@ArtRand
Copy link
Contributor

ArtRand commented Jun 5, 2024

@OceaneMion

have another question, what kind of statistical test will you use to see if methylation at CpG sites is random or not, do you think Kolmogorov-Smirnov is appropriated ?

Do you mean you want a test to see if the methylation at a CpG is not uniform over the potential classes? I.e. P(canonical) = P(5hmC) = P(5mC) = 1/3?

@selmapichot
Copy link

Hi,
I have a kind of similar question, on the bedmethyl file, I have reads that have both MC and hMC at 0. So, are all CpG (even not methylated ones) reported ? I am using nextflow human_variation workflow with the -mod flag.
Many thanks.

@ArtRand
Copy link
Contributor

ArtRand commented Jun 25, 2024

Hello @selmapichot,

Sorry I missed this question. All CpGs that have modified base information (including canonical calls) will be included in the bedMethyl as long as they pass the pass threshold. If all of the calls fail to pass this threshold there won't be a bedMethyl record. You can bypass filtering with --no-filtering however the accuracy will be lower.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question Further information is requested
Projects
None yet
Development

No branches or pull requests

3 participants