Probability threshold #80
A low threshold on the older model makes sense. The best recommendation here would be to upgrade to the latest kits, where the accuracy (and the confidence in each call) is much higher. To get higher accuracy from these older models, you may have some success increasing the filtering threshold, but this will obviously drop some data as well.
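The accuracy-versus-data trade-off can be made concrete with a few numbers. This is a minimal sketch, assuming each call carries a single confidence value and that calls below the threshold are dropped; the `confidences` list and `retained_fraction` helper are hypothetical and not modkit output.

```python
# Hypothetical per-call confidences; illustrative numbers, not real modkit output.
confidences = [0.52, 0.55, 0.61, 0.66, 0.70, 0.74, 0.81, 0.88, 0.93, 0.97]


def retained_fraction(values, threshold):
    """Fraction of calls kept when calls below `threshold` are dropped."""
    kept = sum(1 for v in values if v >= threshold)
    return kept / len(values)


# Raising the threshold keeps only the more confident calls, at the cost of data.
for threshold in (0.5, 0.6, 0.7, 0.8, 0.9):
    print(f"threshold={threshold:.1f} retained={retained_fraction(confidences, threshold):.0%}")
```

With these made-up values, moving the threshold from 0.5 to 0.9 takes the retained fraction from 100% down to 20%.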
Thanks for the advice! Hopefully we can move on to the newer kits soon; we just have a bunch of older flow cells the boss wants me to use up :) While I understand your point that it's not feasible to provide a global threshold (always the way with bioinformatics...), the software's warnings about a low threshold seem to indicate that certain thresholds are not advisable. This makes me think I should be raising it manually, at least a little.
@NickNCL One thing you could do is run
Then inspect probabilities.txt and probabilities.tsv and set the threshold manually with
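Once the histogram is in hand, one way to pick a threshold by hand is to walk the bins from low to high confidence until the desired fraction of calls lies below the current bin edge. This is a minimal sketch, assuming the histogram has already been read into (bin lower edge, count) pairs; the column layout of probabilities.tsv isn't shown in this thread, so `threshold_from_histogram` and the bin values are hypothetical.

```python
def threshold_from_histogram(bins, percentile):
    """Pick a threshold from a confidence histogram.

    `bins` is a list of (bin_lower_edge, count) pairs sorted by edge.
    Returns the first bin edge at which the fraction of calls in the bins
    below it reaches `percentile`; because of binning, the fraction actually
    filtered can overshoot the target.
    """
    total = sum(count for _, count in bins)
    if total == 0:
        raise ValueError("histogram is empty")
    below = 0  # number of calls in bins strictly below the current edge
    for edge, count in bins:
        if below / total >= percentile:
            return edge
        below += count
    # Target not reached at any edge: fall back to the highest edge
    # (this filters less than requested).
    return bins[-1][0]


# Hypothetical histogram: (lower edge of confidence bin, number of calls).
bins = [(0.5, 5), (0.6, 10), (0.7, 20), (0.8, 30), (0.9, 35)]
print(threshold_from_histogram(bins, 0.10))  # 0.7; 15% of calls fall below it
```

Asking for 10% here returns 0.7, which actually filters 15% of calls; finer bins would bring the two closer.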
@ArtRand would it make sense to concatenate one's samples into a single BAM and run
Hello @Ge0rges, you can; there is a one-liner below. Do you want to know the distribution of probabilities when the samples are combined? Using a combination of
If you're trying to estimate a pass threshold, you wouldn't combine the samples unless you're planning to operate on them all together. The estimator in
# the sort by read name is a way to pseudo-randomize them
samtools merge ${bam1} ${bam2} -o - | samtools sort -n | ${modkit} sample-probs - --hist -o probs
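Why sorting by read name helps: a merged, coordinate-sorted stream presents reads clustered by genomic position, so any sample drawn from the start of the stream covers only a small region. Nanopore read IDs are UUIDs, so name order is effectively random with respect to position. This is a minimal sketch with synthetic reads, assuming (not confirmed here) that reads arriving on stdin are taken roughly in stream order; `position_span` and the read list are hypothetical.

```python
import random
import uuid

rng = random.Random(0)  # fixed seed so the example is reproducible

# Hypothetical reads in coordinate order: (read_id, alignment_start).
# Nanopore read IDs are UUIDs, so sorting by name scrambles genomic order.
reads = [
    (str(uuid.UUID(int=rng.getrandbits(128), version=4)), pos)
    for pos in range(0, 1_000_000, 100)
]


def position_span(sample):
    """Distance between the left-most and right-most read in a sample."""
    positions = [pos for _, pos in sample]
    return max(positions) - min(positions)


n = 500
first_by_coord = reads[:n]
first_by_name = sorted(reads, key=lambda read: read[0])[:n]

print("first reads, coordinate order, span:", position_span(first_by_coord))
print("first reads, name order, span:", position_span(first_by_name))
```

The first 500 reads in coordinate order span only the first ~50 kb, while the first 500 in name order are spread across almost the whole synthetic region.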
Hi,
I wanted to ask about the rationale for determining the filtering threshold empirically by default. Surely it would make more sense for it to be fixed, e.g. across different replicates/conditions? Would data be comparable between replicates if the filtering threshold varies?
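The two options hold different things constant. This is a minimal sketch, assuming the default threshold is a low percentile of the call confidences (as the warning's mention of filter-percentile suggests) and using a simple nearest-rank percentile; the replicate values and both helpers are hypothetical. A percentile rule filters the same fraction of calls in each replicate but lands on different thresholds, whereas a fixed threshold applies the same bar but keeps a different fraction of each replicate.

```python
def percentile_threshold(confidences, percentile):
    """Nearest-rank estimate: roughly `percentile` of calls fall below it."""
    ordered = sorted(confidences)
    index = min(int(percentile * len(ordered)), len(ordered) - 1)
    return ordered[index]


def retained_fraction(confidences, threshold):
    """Fraction of calls at or above `threshold`."""
    return sum(1 for c in confidences if c >= threshold) / len(confidences)


# Hypothetical replicates; replicate B has generally lower confidences.
rep_a = [0.60, 0.72, 0.80, 0.85, 0.88, 0.90, 0.92, 0.94, 0.96, 0.98]
rep_b = [0.50, 0.55, 0.62, 0.70, 0.75, 0.80, 0.84, 0.88, 0.91, 0.95]

for name, rep in (("A", rep_a), ("B", rep_b)):
    t = percentile_threshold(rep, 0.1)
    print(f"rep {name}: percentile threshold={t:.2f} "
          f"(keeps {retained_fraction(rep, t):.0%}), "
          f"fixed 0.70 keeps {retained_fraction(rep, 0.70):.0%}")
```

Here both replicates keep 90% of calls under the percentile rule (thresholds 0.72 vs 0.55), while a fixed 0.70 keeps 90% of A but only 70% of B.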
I'm getting a warning that the default threshold is low:
Threshold of 0.5722656 for base A is very low. Consider increasing the filter-percentile or specifying a higher threshold.
Threshold of 0.6074219 for base C is low. Consider increasing the filter-percentile or specifying a higher threshold.
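The two warnings report separate values for A and C, which suggests the threshold is estimated per canonical base. This is a minimal sketch of that idea, assuming the same nearest-rank low-percentile rule applied to each base's calls separately; `per_base_thresholds`, the percentile of 0.1, and the call values are illustrative assumptions, and the cutoffs behind "low" vs "very low" are not reproduced because they are not stated here.

```python
from collections import defaultdict


def per_base_thresholds(calls, percentile=0.1):
    """Nearest-rank percentile threshold computed separately per canonical base.

    `calls` is an iterable of (canonical_base, confidence) pairs.
    """
    by_base = defaultdict(list)
    for base, confidence in calls:
        by_base[base].append(confidence)
    thresholds = {}
    for base, values in by_base.items():
        ordered = sorted(values)
        index = min(int(percentile * len(ordered)), len(ordered) - 1)
        thresholds[base] = ordered[index]
    return thresholds


# Hypothetical calls: (canonical base, confidence of the call).
calls = [("A", c) for c in (0.48, 0.56, 0.63, 0.70, 0.78, 0.82, 0.86, 0.90, 0.94, 0.97)]
calls += [("C", c) for c in (0.58, 0.64, 0.69, 0.74, 0.80, 0.85, 0.89, 0.92, 0.95, 0.99)]

print(per_base_thresholds(calls))  # {'A': 0.56, 'C': 0.64}
```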
I'm presuming this is because I have generally low probability scores, maybe due to the older nanopore sequencing chemistry I'm using (V10) and possibly also the basecalling model I used (all_context modification model).
I'm hoping that high sequencing depth might, to a certain extent, compensate for a higher error rate.
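Whether depth compensates depends on the kind of error. Under a deliberately simple model (an assumption, not modkit's own error model) in which each call is independently mislabelled with probability e, a site with true modified fraction p has an expected observed fraction of p(1 - e) + (1 - p)e. More depth shrinks the random spread around that value but does not move it back to p, which is one reason filtering low-confidence calls can still matter at high coverage. The helper names and the values p = 0.8, e = 0.1 below are hypothetical.

```python
import math


def expected_observed_fraction(true_fraction, error_rate):
    """Expected fraction of 'modified' calls when each call is independently
    mislabelled with probability `error_rate` (symmetric errors assumed)."""
    return true_fraction * (1 - error_rate) + (1 - true_fraction) * error_rate


def standard_error(fraction, depth):
    """Binomial standard error of an observed fraction over `depth` calls."""
    return math.sqrt(fraction * (1 - fraction) / depth)


true_fraction, error_rate = 0.8, 0.1  # hypothetical values
observed = expected_observed_fraction(true_fraction, error_rate)

for depth in (10, 100, 1000):
    # More depth narrows the spread, but the expected value stays at `observed`,
    # not at `true_fraction`: the misclassification bias does not average out.
    print(f"depth={depth:4d} expected observed={observed:.2f} "
          f"+/- {standard_error(observed, depth):.3f} (true={true_fraction})")
```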
Is there any guidance on a generally acceptable threshold?