Clarifications on DMR Analysis #191
Also, is there a threshold on the MAP-based p-value or effect size for a DMR to be considered significant? |
Hello @Proy321, The score and MAP-based p-value are calculated using the
If you are looking for generally differentially modified sites, I would use the MAP-based p-value. You can decide on a threshold depending on your experiment. I would sanity check your results with the effect size. If you want to discover changes in methylation accounting for changes in type (i.e. 5mC vs 5hmC), I would use the score. The score is quite good at ranking regions and sites by how different they are. The details of the calculations are in the documentation. If you want to find differentially methylated regions while accounting for different modification types, I'd recommend using the
No, |
Thanks for your response. Thanks&Regards |
Hello @Proy321,
Yes, as documented, you can pass Currently there isn't a way to home in on 5mC<>5hmC differential methylation, but I'm planning on adding this in a future release. As I mentioned in the previous comment, the easiest way to find sites where you're seeing changes in 5hmC will be to parse the
You don't need to provide a BED file; a BED file will be produced with the segment labels. You should analyze this file the same way as the single-site one: find the "different" regions that have large changes in 5hmC.
A negative effect size only means that the change is a reduction in modification levels with respect to the |
Thank you so much for your response. Thanks |
Hello @ArtRand Thanks |
Hello @ArtRand I believe understanding the relationship between the 'state-name' and 'score' columns is crucial for accurately interpreting the results of my analysis. It would be nice to have your inputs for the same. Thanks |
Hello @ArtRand Thanks |
Hello @Proy321, To answer your first question regarding the interpretation of the DMR output. Both positions (8441 and 8443) are the same. In sample The effect size is then the percent modified in The MAP-based p-value is below the often used 0.05 number, however in my experiments at least, I don't use any positions without at least 5 reads of valid coverage for both samples. To your second question, the score is, unfortunately, somewhat correlated with the number of potentially modified positions (CpGs in this case, I believe) in the region. So large regions of "same" will have higher scores. All of the effect sizes in the table you've shown are very small, so I would not interpret a high score as evidence of any substantial change in methylation. If you want to find regions where there is evidence for differential 5hmC, I would filter to just the "different" records and look for large changes in percent 5hmC (which should also have a high score). In the next release the "score" column in the DMR output will be replaced with a probability of being in that state and there will be a "between modification" effect size that will make this analysis a little easier. |
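The filtering described in the comment above could be sketched roughly as follows. This is only an illustration: the dictionary keys (`state`, `cov_a`, `cov_b`, `pct_h_a`, `pct_h_b`) and the 20-point cutoff are hypothetical placeholders, not the actual column names or a recommended threshold from modkit, so they would need to be mapped onto the real DMR output columns.

```python
from typing import Dict, List

MIN_VALID_COV = 5      # minimum valid coverage per sample, as mentioned above
MIN_DELTA_PCT = 20.0   # hypothetical cutoff for a "large" change in percent 5hmC


def candidate_5hmc_records(records: List[Dict],
                           min_cov: int = MIN_VALID_COV,
                           min_delta: float = MIN_DELTA_PCT) -> List[Dict]:
    """Keep "different" records with enough coverage and a large 5hmC change.

    All keys used here are hypothetical stand-ins for the real output columns.
    """
    hits = []
    for rec in records:
        if rec["state"] != "different":
            continue  # only records labelled "different" are of interest
        if rec["cov_a"] < min_cov or rec["cov_b"] < min_cov:
            continue  # too few valid reads in at least one sample
        delta = rec["pct_h_b"] - rec["pct_h_a"]
        if abs(delta) >= min_delta:
            hits.append({**rec, "delta_pct_h": delta})
    # Largest absolute change in percent 5hmC first
    return sorted(hits, key=lambda r: abs(r["delta_pct_h"]), reverse=True)


example = [
    {"state": "different", "cov_a": 12, "cov_b": 9, "pct_h_a": 5.0, "pct_h_b": 40.0},
    {"state": "same", "cov_a": 30, "cov_b": 25, "pct_h_a": 2.0, "pct_h_b": 60.0},
    {"state": "different", "cov_a": 3, "cov_b": 20, "pct_h_a": 0.0, "pct_h_b": 80.0},
    {"state": "different", "cov_a": 15, "cov_b": 15, "pct_h_a": 10.0, "pct_h_b": 12.0},
]
for rec in candidate_5hmc_records(example):
    print(rec)
```

Ranking by the absolute change rather than by score follows the point made above that the score tends to grow with the number of potentially modified positions in a region.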
Thank you so much for your response @ArtRand Column 'a' with 5 reads showing 0% modification Thank you for your assistance and clarification on this matter. Thanks |
Hello @PRIYANKA-22091995, If you have a site with 0 of 5 reads reporting modification (all 5 report canonical) in one condition and 7 of 7 reporting methylation (of any type) in another condition, the MAP-based p-value will be 0.00024, meaning the (posterior) probability of zero effect divided by the observed effect ( The next release will have a simple function that will calculate the MAP-based p-value for a given number of modified/canonical reads so you can explore what posterior probabilities look like. |
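Until such a function is available, the general idea of a MAP-based p-value (posterior density at zero effect divided by the posterior density at its maximum) can be explored with a rough sketch like the one below. It is not modkit's implementation: it assumes independent Beta posteriors with a uniform Beta(1, 1) prior and a simple grid search for the maximum, so it should not be expected to reproduce the 0.00024 figure quoted above. All function names here are made up for illustration.

```python
import math


def beta_pdf(x: float, a: float, b: float) -> float:
    """Density of Beta(a, b) at x (zero outside the open unit interval)."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))


def diff_density(d: float, post_a, post_b, steps: int = 1000) -> float:
    """Density of (p_b - p_a) at d, using a midpoint-rule convolution."""
    lo, hi = max(0.0, -d), min(1.0, 1.0 - d)
    if hi <= lo:
        return 0.0
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * h  # x plays the role of p_a, x + d that of p_b
        total += beta_pdf(x, *post_a) * beta_pdf(x + d, *post_b)
    return total * h


def map_based_p_value(mod_a: int, canon_a: int, mod_b: int, canon_b: int,
                      prior: float = 1.0, grid: int = 200) -> float:
    """Posterior density at zero effect divided by the density at its maximum.

    The uniform prior (prior=1.0) is an assumption made for this sketch.
    """
    post_a = (mod_a + prior, canon_a + prior)
    post_b = (mod_b + prior, canon_b + prior)
    at_zero = diff_density(0.0, post_a, post_b)
    at_map = max(diff_density(-1.0 + 2.0 * k / grid, post_a, post_b)
                 for k in range(grid + 1))
    return at_zero / at_map


# 0 of 5 reads modified in one condition, 7 of 7 in the other
print(map_based_p_value(0, 5, 7, 0))
```

A value close to 1 means that zero effect is about as plausible as the most likely effect, while a value close to 0 means zero effect is very unlikely under the posterior.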
Thank you for your response. Thanks |
I'm also looking for 5hmC-associated DMRs. In this regard, the modbam2bed tool offers an option (-m) to target specific modified base types. For example, if my focus is on detecting 5hmC-related DMRs, can I reliably use the following code as recommended in the ONT community documentation (https://community.nanoporetech.com/docs/prepare/library_prep_protocols/ligation-sequencing-gdna-rrms/v/rrms_9164_v110_revd_30may2022/downstream-analysis?devices=gridion#faq-tab=)? It extracts 5hmC data (or any modified base), which can then be analyzed further using tools such as DSS or similar packages. Below is the code:
While the code mentioned is intended for processing RRMS data, it seems adaptable to other forms of Nanopore data. I understand the differences between using modbam2bed and modkit's pileup function, as highlighted in discussions (issues #14 and #2). However, working with the 5hmC data appears straightforward using this, assuming accuracy. I compared both BED file outputs and found that most columns contain identical values, except for the read coverage from modbam2bed and Nvalidcov from modkit. This difference is expected since each tool uses a different calculation method. My question is whether it's possible to exclusively extract a modified base of interest. If not, is it feasible to incorporate this functionality into modkit dmr or during the creation of the bedmethyl file using pileup? Thanks! |
Hello @Manoswini-02, If you only want bedmethyl rows with 5hmC records, you can filter the output through
$ modkit pileup ${mod_bam} stdout [options] | awk '$4=="h"' > 5hmC.bedmethyl
However, if you do this the output will not be suitable for |
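For anyone who prefers to do the same filtering in a script, a minimal Python sketch of the awk step might look like this. It relies only on what the command above shows, namely that the fourth whitespace-separated column holds the modification code; the function name and the example rows are made up for illustration.

```python
from typing import Iterable, Iterator


def keep_mod_code(lines: Iterable[str], code: str = "h") -> Iterator[str]:
    """Yield only the rows whose 4th whitespace-separated field equals `code`."""
    for line in lines:
        fields = line.split()
        if len(fields) >= 4 and fields[3] == code:
            yield line


# Made-up rows, truncated to the first few columns for brevity
rows = [
    "chr1\t100\t101\th\t10\t+\n",
    "chr1\t100\t101\tm\t10\t+\n",
    "chr1\t205\t206\th\t8\t-\n",
]
for row in keep_mod_code(rows):
    print(row, end="")
```

In practice the input would be the lines of a pileup output file or stream rather than an in-memory list.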
Hello @ArtRand
I would like to confirm that the scores obtained in the DMR are indeed calculated based on the last two columns, i.e., pct_a_samples and pct_b_samples.
Specifically, I would also like to understand the roles of the map-p value and effect size in this process. Could you please elaborate on the significance of these parameters and advise on whether we should prioritize the score, map-p value, or effect size when filtering out the most significant DMR regions?
Also, I want to specifically identify DMRs in 5hmC. During pileup, should I include the flag "--ignore C" to focus solely on identifying DMR genes associated with 5hmC?
Your insights and guidance on these matters would be greatly appreciated as I strive to conduct a thorough and accurate analysis of our DMR data.
Looking forward to your response.
Thanks