
Different threshold from dorado models #918

Open
Macdot3 opened this issue Jun 19, 2024 · 3 comments


Macdot3 commented Jun 19, 2024

Hi everyone,
I ran some samples with an updated Dorado model, and Modkit reports a different filter threshold for the same sample: the newer model yields a noticeably lower calling threshold than the same sample processed with an older model and version. What could be causing this, what impact might it have on my downstream analysis, and what do you recommend in this case? I've included the outputs below.

with dorado v0.5.3 - model [email protected]
modkit pileup --ref rCRS_16426.fasta --cpg --combine-strands ../PCR_MT10288M_final_filtered.bam ../PCR_MT10288M.bed
> calculated chunk size: 6, interval size 100000, processing 600000 positions concurrently
> filtering to only CpG motifs
> attempting to sample 10042 reads
> Using filter threshold 0.90234375 for C.
> Done, processed 858 rows. Processed ~432 reads and skipped zero reads

with dorado v0.7.1 - model [email protected]
modkit pileup --ref rCRS_16426.fasta --cpg --combine-strands ../PCR_MT10288M_final_filtered_new.bam ../PCR_MT10288M_new.bed
> calculated chunk size: 6, interval size 100000, processing 600000 positions concurrently
> filtering to only CpG motifs
> attempting to sample 10042 reads
> Threshold of 0.50390625 for base C is very low. Consider increasing the filter-percentile or specifying a higher threshold.
> Done, processed 856 rows. Processed ~434 reads and skipped zero reads
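For context on why the reported threshold moves between models: as I understand modkit's defaults, the pass threshold is estimated from the data itself, taken as a low percentile (the filter-percentile, 10% by default) of the sampled calls' top-class probabilities. A model whose canonical calls are less confident therefore yields a lower auto-threshold, even on identical reads. A minimal sketch of that percentile logic, with illustrative values only (this is not modkit's actual implementation):

```python
import random

def estimate_filter_threshold(call_probs, filter_percentile=0.10):
    """Percentile-based pass threshold: the least-confident
    `filter_percentile` fraction of calls falls below it and gets filtered."""
    ranked = sorted(call_probs)
    return ranked[int(filter_percentile * len(ranked))]

rng = random.Random(0)
# A model whose calls are sharply confident...
sharp = [rng.uniform(0.9, 1.0) for _ in range(900)] + \
        [rng.uniform(0.5, 0.9) for _ in range(100)]
# ...versus one whose canonical calls are spread lower.
soft = [rng.uniform(0.5, 1.0) for _ in range(1000)]

print(round(estimate_filter_threshold(sharp), 3))  # higher auto-threshold
print(round(estimate_filter_threshold(soft), 3))   # lower auto-threshold
```

So a drop from ~0.90 to ~0.50 in the reported threshold is consistent with the newer model simply emitting lower canonical-call probabilities, not necessarily with anything being wrong in the pipeline.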

Thank you very much for your help.


ArtRand commented Jun 19, 2024

Hello @Macdot3,

Could you do two things for me to help diagnose this issue?

  1. Tell me the exact dorado basecalling command you used, so I know which basecalling model and which modified-bases model were involved.
  2. Run modkit sample-probs ../PCR_MT10288M_final_filtered.bam --hist ./probability_histograms and send me the contents of that directory.

Thanks


Macdot3 commented Jun 20, 2024

Hi @ArtRand,
Here is the folder: PCR_MT10288_filter.zip. Compared to what I wrote above, the threshold is now around 0.86, because I had forgotten to apply a filter with samtools.
Regarding the Dorado commands, for the file PCR_MT10288M_final_filtered, I have these:

cd ../dorado-0.5.3-linux-x64/bin
./dorado basecaller /Model/[email protected]/ /home/Nanopore/Dorado/POD5/POD5_barcode12_PCR/ --modified-bases 5mCG_5hmCG --device cpu > /home/CALLS/PCR_MT10288.bam

This was followed by alignment with dorado aligner. I subsequently ran the same sample with these new versions:

cd ../dorado-0.7.1-linux-x64/bin
./dorado basecaller /Model/[email protected]/ /home/Nanopore/Dorado/POD5/POD5_barcode12_PCR/ --modified-bases 5mCG_5hmCG --device cpu > /home/CALLS/PCR_MT10288_new_model.bam

@marcus1487

We have been able to reproduce a similar result and are looking into it. The v5 5mC+5hmC model does appear to produce lower-confidence canonical calls than the v4.3 5mC+5hmC model, but overall accuracy is improved with the v5 model. We will dig into this result further and aim to produce a more robust modified-base model in future releases. In the meantime you can use these results with confidence: the accuracy of the v5 model is higher even though the canonical probabilities have dropped a bit.

One point that may help is setting the threshold manually. We will be releasing a ground-truth analysis blog post in the coming months with this information, but for the v5 model we find that a threshold of about 0.76 works best on a set of balanced C/5mC/5hmC calls. This will filter canonical calls a bit more heavily than the previous v4.3 model, but should produce more accurate results overall. Note that this threshold will change for different basecalling and modified-base models; the blog post will outline how the threshold is determined and let users estimate a new one for new models/conditions.
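If you prefer to pin the threshold rather than rely on the auto-estimate, modkit accepts a user-supplied value on the command line (check modkit pileup --help for the exact flag; --filter-threshold is the one I believe applies). The effect of a fixed cutoff like 0.76 is simple to sketch; the probabilities below are illustrative only, not real model output:

```python
def apply_fixed_threshold(call_probs, threshold=0.76):
    """Keep only calls whose top-class probability clears `threshold`;
    return (kept calls, fraction filtered out)."""
    kept = [p for p in call_probs if p >= threshold]
    return kept, 1 - len(kept) / len(call_probs)

# Hypothetical per-call confidences with a tail of low-confidence
# canonical calls, as described for the v5 model.
probs = [0.55, 0.62, 0.70, 0.78, 0.81, 0.88, 0.93, 0.97, 0.99, 0.99]
kept, frac = apply_fixed_threshold(probs)
print(len(kept), round(frac, 2))  # 7 calls pass, 30% filtered
```

With a fixed threshold the filtered fraction floats with the model's probability distribution, which is the intended behavior here: the less-confident canonical calls from v5 are filtered more aggressively while the thresholded calls stay comparable across runs.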

@iiSeymour iiSeymour transferred this issue from nanoporetech/modkit Jul 1, 2024