Different threshold from dorado models #918
Hello @Macdot3, Could you do two things for me to help diagnose this issue?
Thanks
Hi @ArtRand,
This was followed by alignment with dorado aligner. I subsequently ran the same sample with these new versions:
We have been able to reproduce a similar result and are looking into it. The v5 5mC+5hmC model does produce lower-confidence canonical calls than the v4.3 5mC+5hmC model, but its overall accuracy is improved. We will dig into this further and aim to produce a more robust modified-base model in future releases.

In the meantime, you can use the v5 results with confidence: accuracy has increased even though the canonical probabilities are somewhat lower. One thing that may help is setting the threshold manually. We will be releasing a ground-truth analysis blog post in the coming months with this information, but for the v5 model we find that a threshold of about 0.76 works best on a balanced set of C/5mC/5hmC calls. This will filter canonical calls a bit more heavily than the previous v4.3 model, but should produce more accurate results overall.

Note that this threshold will change for different basecalling and modified-base models. The blog post will outline how the threshold is determined so that users can estimate a new threshold for new models and conditions.
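To make the effect of a manual pass threshold concrete, here is a minimal sketch of how such a cutoff partitions per-call confidences into pass/fail sets. The function name and the tuple layout are illustrative assumptions, not modkit's actual API:

```python
def classify_calls(probs, threshold=0.76):
    """Split per-base modification-call confidences into pass/fail sets.

    `probs` is a list of (position, max_call_probability) tuples
    (a hypothetical layout for illustration). Calls whose highest
    class probability falls below `threshold` are filtered out,
    mirroring how a pass threshold removes low-confidence calls.
    """
    passed = [(pos, p) for pos, p in probs if p >= threshold]
    failed = [(pos, p) for pos, p in probs if p < threshold]
    return passed, failed

# With the suggested v5 threshold of 0.76, a call at 0.75 is filtered
# while a call at 0.80 passes.
calls = [(100, 0.95), (101, 0.60), (102, 0.80), (103, 0.75)]
passed, failed = classify_calls(calls)
```

Raising the threshold trades coverage for accuracy: fewer calls survive, but those that do are higher confidence.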
Hi everyone,
I ran some samples using an updated Dorado model and I'm getting different thresholds from Modkit for the same sample. In my case, the calling threshold is lower than when the same sample is processed with an older model and Dorado version. What could be causing this? What impact might it have on my downstream analysis, and what do you recommend I do in this case? Below, I've included the outputs.
Thank you very much for your help.
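For context on why the estimated threshold can shift between models: modkit derives its pass threshold from a low percentile of sampled call confidences, so a basecalling model that emits lower probabilities for the same reads yields a lower estimated threshold. The sketch below illustrates that percentile rule; the percentile value and the simulated confidence distributions are illustrative assumptions, not modkit's exact implementation:

```python
import random

def estimate_threshold(confidences, filter_percentile=0.10):
    """Estimate a pass threshold as a low percentile of sampled call
    confidences. A simplified sketch of percentile-based filtering;
    the real tool's sampling and per-base handling are more involved.
    """
    s = sorted(confidences)
    idx = int(filter_percentile * len(s))
    return s[min(idx, len(s) - 1)]

random.seed(0)
# Simulated call confidences from two models: the newer model emits
# somewhat lower canonical-call probabilities overall (assumed ranges).
old_model = [random.uniform(0.80, 1.00) for _ in range(1000)]
new_model = [random.uniform(0.70, 1.00) for _ in range(1000)]
# The same percentile rule then yields a lower threshold for the
# newer model, even though nothing about the sample changed.
```

So a lower auto-estimated threshold by itself is not a sign that the data are worse; it reflects the shape of the new model's probability distribution.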