Comparison between modkit and modbam2bed #14
Next we tested one human sample; I restricted the analysis to chr1 to save time. However, here there is a 6% difference between the methylation levels estimated by the two methods.
I am not sure what is causing such differences, as I would expect that using the same BAM file with the same settings should give similar results. What do you think is the possible reason for these differences, and for the inconsistency between the two datasets? Many thanks.
@qiangfu-eu could you tell me which versions of modkit and modbam2bed you're using? Also let me know if you see anything unusual in the modkit log. Also, if you could run
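A quick way to capture both versions (assuming modbam2bed also exposes a `--version` flag, which is an assumption here):

```bash
# Print tool versions; the modbam2bed flag is assumed.
modkit --version
modbam2bed --version
```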
The logs for the pUC19 sample. Note that
The log for the human sample, and the row counts of the BED files.
I actually did not expect the row counts to be so different between the two methods on human data, as I am focusing on CpG only and using the same BAM file.
I rechecked
@qiangfu-eu one more quick check for me, if you would: run the pUC19 BAM with modkit using
@ArtRand thanks for pointing out my errors in the
There is a 3% difference in the ratios, with more canonical calls. Here is the flagstat of the pUC19 BAM. It is a mixture of lambda and pUC19, hence only a small fraction of the reads are from pUC19.
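The flagstat summary can be regenerated from the BAM directly; the file name is a placeholder:

```bash
# Summarize read categories (mapped, unmapped, secondary, duplicate, ...).
samtools flagstat puc19.bam
```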
Maybe more interesting is This is not the case for the human dataset I tried above.
Hello @qiangfu-eu I've run some tests on my side directly comparing the two programs on larger modBAMs. So far I have not been able to reproduce the difference in valid calls that you're seeing. In fact, in my experiments modkit usually has a few more valid calls, by about 0.1%. The differences in methylation counts and canonical counts end up at 0.7% and 0.1%, respectively. Similar to you, I'm invoking modkit and modbam2bed like this:

```bash
filter_threshold="0.6074219"

modkit pileup \
    ${bam} \
    ${modkit_results_combine} \
    --threads 20 --region chr20 \
    --filter-threshold ${filter_threshold} \
    --combine-strands \
    --cpg --ref ${reference} \
    --log-filepath modkit.log

modbam2bed -t 20 -m 5mC -e --aggregate --cpg \
    -p test_modbam2bed_06fthresh \
    -f ${filter_threshold} ${reference} ${bam} > ${mb2b_results_stranded} 2> modbam2bed.log
```

To validate, I'm also using awk
I did find a bug where the N-fail (a.k.a. N-filtered) calls are not aggregated across strands when one of the strands has 0 N-valid. This doesn't affect the methylation statistics, because those reference positions have 0 methylation and 0 canonical calls by definition. I'll be sure to fix it in this week's release. If you would be willing to share the pUC19 data you have, I'd be keen to take a look and see where the difference is coming from.
Thanks @ArtRand for checking this in detail. I will see whether I can share the BAM file directly tomorrow. I did test on the human dataset, but as shown in the second post of this thread, there are also bigger differences there.
Hello, @ArtRand. You can retrieve it at https://drive.google.com/file/d/1QGUJgNZYWj1h79SLOCzsYQhwVRHckx6B/view?usp=share_link. Let me know when you have downloaded it. I just want to stress that, due to short-fragment sequencing, only half of the bases in the reads can be reliably aligned to the genome. That is what you will observe in this BAM file. Many thanks.
Exactly.
Hello @qiangfu-eu so I've spent some time working on this. I was able to reproduce the results you've shown on the pUC19 reads with modkit and with pysam. Both modkit and pysam use the pileup functionality in htslib. What I've discovered is that the behavior is unpredictable with reads that map to the ends of the plasmid sequence (presumably because they "wrap around") at the depth you have (12,000x) in this BAM. I have implemented a version of modkit that doesn't leverage the pileup API, but it will take a little longer to get it fully tested. It probably won't be ready for the next release, but likely the one following.
Hi @ArtRand, thanks for the efforts. Indeed, the circular nature of the plasmid could be causing issues for the tool. So you mean that
However, as given in post 2 of this discussion, a bigger difference is observed between the tools when run on one of the human samples, in which, although a more or less similar amount of
Thanks again for your help and time.
@qiangfu-eu As for the difference you're seeing in the human data, it could be due to one of a few things. Firstly, you may somehow be seeing the same unpredictable behavior in pileup. The logic around the threshold may also be just different enough; you may try
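One way to rule out threshold-estimation differences is to pin the same fixed threshold for both tools; a sketch using the flags already shown above, with placeholder file names:

```bash
# Run both tools with an identical, explicit pass threshold so that
# neither tool estimates its own cutoff.
threshold=0.66
modkit pileup ${bam} modkit.fixed.bed \
    --cpg --ref ${reference} --filter-threshold ${threshold}
modbam2bed -m 5mC --cpg -f ${threshold} ${reference} ${bam} > mb2b.fixed.bed
```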
@ArtRand I tried the latest version. The differences are still there. One extra difference that I had overlooked previously is that
On all my datasets, even human, the estimated threshold is not very strict, mostly around 0.66; I wonder whether that might also have some influence. Should one rather choose a stricter threshold and consider only the most reliable calls?
@qiangfu-eu I haven't updated the pileup engine yet, so I wouldn't expect the results to change with the more recent releases. I'll let you know when those changes go in; it's on the roadmap, but it's a large refactor. I agree that the counts between the two tools should be similar. I've narrowed it down to something in the way the alignments are culled when performing the pileup on the data you sent me previously; however, I have not been able to reproduce this difference with any internal datasets. What threshold does
@ArtRand I finally made some progress on the possible culprit for the difference between
I did a test today by first removing all unmapped reads (which does have some influence on the pUC19 sample, as the majority of the reads are from lambda and hence not mapped). Then I removed duplicates using Picard (just as a test, not to say it is the right tool). Then I ran both
The real difference is in the #nocall counts, which can explain the observed difference in #valid calls and #canonical, whereas now the #filtered and #5mC calls are the same. Could you kindly check whether the cutoff for filtering canonical calls is set differently? Also, clearly
For our human dataset, I will have a look when I have more time, although I am not sure about the duplication levels there; for a normal nanopore protocol, there should be almost no duplicates?
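A sketch of the pre-filtering steps described above, with placeholder file names (Picard used only as a test, as noted):

```bash
# 1) Drop unmapped reads; -F 4 excludes alignments with the "unmapped" flag.
samtools view -b -F 4 input.bam > mapped.bam
samtools index mapped.bam

# 2) Mark and remove duplicates with Picard.
picard MarkDuplicates \
    I=mapped.bam \
    O=dedup.bam \
    M=dup_metrics.txt \
    REMOVE_DUPLICATES=true
samtools index dedup.bam
```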
@qiangfu-eu yes, I can check on the differences in
Would you be able to share the BAM you're using for this test (before and after marking duplicates)?
Sure. Here are the links
I also noticed that there are still secondary alignments in the deduped BAM. Could those cause problems?
Secondary, duplicate, and unmapped reads shouldn't have any effect. Thanks for the BAMs; I'll take a look at these and get back to you.
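To double-check this on one's own data, the secondary and supplementary records can be stripped and the pileup re-run; a sketch with a placeholder file name:

```bash
# Drop secondary (0x100) and supplementary (0x800) alignments, keeping
# only primary records, to confirm they have no effect on the pileup.
samtools view -b -F 0x900 dedup.bam > primary_only.bam
samtools index primary_only.bam
```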
In our case, we never thought to check for duplicate reads. As we do low-pass sequencing for human, duplicates are not really an issue in such a dataset. Hence those reads, although duplicates, are neither marked as such nor removed, and remain in the BAM as normally aligned reads. This then causes the issue.
@qiangfu-eu
Thanks @ArtRand for checking this. Now the only issue still remaining is the cutoff for canonical calls, as this remains the only difference between the two tools. I tested on my cfDNA samples, and they show the same pattern: the 5mC and 5hmC calls are identical, but the two tools identify different numbers of canonical calls. I assume they do not handle the cutoff in the same manner. If you have time, could you look into this? Many thanks.
@qiangfu-eu Sorry, could you show me an example (result from the
@qiangfu-eu release 0.1.11 has the
@ArtRand Thanks for the swift update. Using the files I provided to you last time, you can reproduce the differences in canonical calls. In the post just above the one with the files, I demonstrated the differences in canonical calls on those files.
@qiangfu-eu I have not forgotten about this issue; I've just been busy with some other things. I should be able to get to a conclusion this week.
Hi @ArtRand, I am also trying to use modkit to replace modbam2bed, and found this issue. May I ask whether there has been any progress on it? It has been half a year now. Thanks.
Hello @Wensu-Liu, the differences in modbam2bed compared to modkit have to do with the hierarchical threshold algorithm used in modkit. We've seen that this thresholding mechanism produces the best results. At this point we've moved to modkit for
Thank you @ArtRand |
Following the discussion already had in post #2, I did some tests on two of our samples. However, I observe different results between them. Note that, for simplicity,
On the pUC19 control sample, both methods detected a comparable methylation ratio; however, the count statistics are very different. There is a 50% difference in the number of valid calls, with more filtered calls and more nocalls from modbam2bed.
For modbam2bed, I ran the analysis and summary as follows. Note that the 'cpg.acc' BED file is used, as it is the one that contains the aggregated results.
For modkit, I ran the analysis and summary as follows.
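The commands presumably resembled the invocations quoted later in the thread; a hypothetical sketch, where the output names and the -p prefix are assumptions:

```bash
# Hypothetical reconstruction; paths and prefixes are placeholders.
modbam2bed -t 20 -m 5mC -e --aggregate --cpg \
    -p sample ${reference} ${bam} > sample.cpg.bed 2> modbam2bed.log
# --aggregate additionally writes the strand-aggregated 'cpg.acc' BED file
# referenced above.

modkit pileup ${bam} sample.modkit.bed \
    --threads 20 --cpg --ref ${reference} \
    --combine-strands --log-filepath modkit.log
```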