Automated filter threshold set to 1 for pileup and questions about thresholding #198
Comments
Hello @kylepalos, What's likely happening is that when you switched to the "all-context" model in dorado 0.7.0, bases that have <5% probability of being modified are clamped to 0% probability of modification. These calls then become the 10% most confident ones, which pushes the estimated threshold to 1 (the modbase model will probably never predict 100% chance of modification). A couple of things you could do:
Let me know if this helps.
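
The clamping effect described above can be illustrated with a toy sketch. This is not modkit's implementation: the helper names (`confidence`, `percentile_threshold`, `clamp_low`, `kept_mod_calls`) are hypothetical, the probability values are made up, and the filter is assumed to keep calls whose confidence is at or above the value found at the requested percentile. The only number taken from the thread is the 5% clamp.

```python
def confidence(p_mod):
    """Confidence of a two-way call: probability of the more likely class."""
    return max(p_mod, 1.0 - p_mod)


def percentile_threshold(confidences, p):
    """Nearest-rank estimate of the confidence value at percentile p (0..1)."""
    ordered = sorted(confidences)
    idx = min(int(p * len(ordered)), len(ordered) - 1)
    return ordered[idx]


def clamp_low(p_mod, floor=0.05):
    """Assumed clamp: modification probabilities below `floor` become 0."""
    return 0.0 if p_mod < floor else p_mod


def evenly_spaced(lo, hi, n):
    """n evenly spaced values from lo to hi inclusive."""
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)]


def kept_mod_calls(p_mods, threshold):
    """Count calls that are 'modified' (p >= 0.5) and pass the threshold."""
    return sum(1 for p in p_mods if p >= 0.5 and confidence(p) >= threshold)


# Made-up data: 700 mostly-canonical calls and 300 mostly-modified calls.
# The model never reports 100% modification, so the top value is 0.98.
raw = evenly_spaced(0.01, 0.20, 700) + evenly_spaced(0.70, 0.98, 300)
clamped = [clamp_low(p) for p in raw]

# Threshold at the 90th percentile, i.e. keep the 10% most confident calls.
thr_raw = percentile_threshold([confidence(p) for p in raw], 0.9)
thr_clamped = percentile_threshold([confidence(p) for p in clamped], 0.9)

kept_raw = kept_mod_calls(raw, thr_raw)
kept_clamped = kept_mod_calls(clamped, thr_clamped)

print("threshold without clamping:", thr_raw, "modified calls kept:", kept_raw)
print("threshold with clamping:", thr_clamped, "modified calls kept:", kept_clamped)
```

In this toy data more than 10% of the calls are clamped to a modification probability of 0, so they all have a confidence of exactly 1 and occupy the whole top decile. The threshold then becomes 1, and no modified call can pass because none reaches 100%.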
Hi @ArtRand, thank you for your help with this. I performed step 1 as you suggested, and my thresholds for my mRNA m6A BAM file were: (image) The thresholds are nearly identical for this (mRNA) sample and my IVT sample. I attached 2 probability files for mRNA and IVT, where they vary quite dramatically. Given the differences in probabilities but the similarity in thresholds, would the following values presumably retain reasonably confident modifications?
Hello @kylepalos, The probability distributions in the IVT samples are going to look "strange" compared to native mRNA because there aren't any m6A residues, so all of the calls should (and appear to be) low confidence. These distributions look like what I would expect. These settings:
Look reasonable to me, you may consider
Excellent, thanks again for your clarification.
Hi,
Thanks for the great toolkit.
I am interested in calling m6A and pseudoU on direct RNA-sequencing data.
I used Dorado v0.7.0 as follows:
Then I converted this to fastq and mapped it with minimap2. The MM/ML/MN tags are retained in the BAM file.
In the past (dorado v0.5), I have run modkit loosely as follows:
And this would produce a pileup file with thousands of lines that have a percent modified > 0.
However, with the updated version of Dorado, every line in the `pileup` output has a percent modified = 0, and it seems to be because the default filter threshold gets set to 1.
Similarly, when trying to map all the pseudoU sites using:
I get:
Is a `-p` value of .9 not sensible? My understanding is that it only keeps the 10% highest confidence Nmod counts in the pileup output. Is that correct? I am using such a high value because we have in-vitro transcribed libraries that seem to have a lot of called "modifications", which begin to decrease at higher `-p` values.
Similarly, based on my understanding of the two-way base modification calls manual page, could I get the desired effect of the top 10% highest confidence m6A calls by setting:
?
Sorry for the long and basic questions, please let me know if you need any additional information.