uninformative error when using -s 0 #96
Comments
Thanks! Good point. I'll clean this up later today.
I tried to map a dataset today that actually has an SD of 0 (all reads are 94 bp long, https://trace.ncbi.nlm.nih.gov/Traces/sra/?run=SRR7609031) and encountered the same error with kallisto version 0.46.0. I am a little confused though: Why does kallisto calculate these values itself for paired-end read sequences but not for single-end ones? Is there some biological or methodological caveat that I'm overlooking?
@Thyra: As far as I understand (I haven't looked at the paper or code in ages), this boils down to the way length normalization is done in kallisto: For PE data, each fragment carries information about its actual length (via the distance between the mates). So I assume the effective count for each fragment can be derived from that and then summed up per feature. Edit: Also, on second read, please double-check that you aren't confusing read length and fragment length. While all your reads may be exactly 94 bp long, the fragments they were derived from could very well have been (and likely were) longer than that and varying in size.
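To make the read length vs. fragment length point concrete, here is a minimal sketch of how a fragment length mean and SD could be estimated from paired-end data. It is not kallisto's code: the input format (`left_start`, `right_end` per mate pair) and both function names are assumptions made up for illustration. With single-end reads there is no second mate, so there is nothing to measure the distance to.

```python
from statistics import mean, stdev


def fragment_lengths_from_pairs(pairs):
    """Turn mate-pair coordinates into fragment lengths.

    `pairs` is a hypothetical input format: (left_start, right_end) for each
    mapped pair, 0-based with an exclusive end, on the same transcript.
    """
    return [right_end - left_start for left_start, right_end in pairs]


def fragment_length_stats(lengths):
    """Return (mean, sample SD) of the observed fragment lengths."""
    if len(lengths) < 2:
        raise ValueError("need at least two fragments to estimate an SD")
    return mean(lengths), stdev(lengths)


# Made-up example: every read could be 94 bp long, yet the fragments differ.
pairs = [(100, 280), (50, 260), (300, 495), (10, 250), (400, 575)]
lengths = fragment_lengths_from_pairs(pairs)
mu, sd = fragment_length_stats(lengths)
print(f"fragment lengths: {lengths}, mean: {mu}, sd: {sd:.1f}")
```

The SD here is non-zero even though the read length is constant, which is the distinction made above.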
@mschilli87 Oh, I was totally unaware of the difference between fragment length and read length, thanks for pointing that out! (Sorry, I'm a complete noob when it comes to transcriptomics.) Do you have a suggestion on how to choose the mean and SD of the fragment length for single-end SRA data then? From what I've understood, there isn't really a way to calculate or estimate these parameters from single-end reads unless you have access to the raw data, and not everybody publishes these values in their manuscripts either (at least not the SD)?
@Thyra: Sorry for the late reply. I usually have access to Bioanalyzer profiles. You could contact the authors; they might have more data than what is available online. Or you could take an educated guess and hope for the best. I found that most of my final conclusions from DGE analyses do not depend too much on those parameters: Even if I change them quite a bit from what I believe to be 'the best' guess, most genes are typically unaffected. Just be aware that you might get some bias, especially for shorter transcripts. Not much you can do AFAICT.
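A rough way to see why shorter transcripts are more sensitive to the guess: the sketch below uses a common simplification of effective length (transcript length minus mean fragment length plus one). This is an assumption for illustration and not necessarily what kallisto computes; the function names and the example numbers are made up.

```python
def effective_length(transcript_length, mean_fragment_length):
    """Simplified effective length: the number of positions at which a
    fragment of average length could start. Floored at 1 to avoid
    non-positive values for very short transcripts."""
    return max(transcript_length - mean_fragment_length + 1, 1)


def relative_change(transcript_length, guess_a, guess_b):
    """Relative change in effective length when the assumed mean fragment
    length moves from guess_a to guess_b."""
    a = effective_length(transcript_length, guess_a)
    b = effective_length(transcript_length, guess_b)
    return abs(a - b) / a


# Compare a short and a long transcript for two guesses of the mean.
for length in (400, 5000):
    change = relative_change(length, 200, 250)
    print(f"transcript of {length} bp: {change:.1%} change in effective length")
```

Under this simplification, moving the guess from 200 to 250 shifts the effective length of the 400 bp transcript by roughly a quarter, but that of the 5000 bp transcript by about one percent.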
@mschilli87 OK, that sounds like a reasonable strategy. THANK YOU! :-)
After updating from a version without the `--sd`/`-s` option to one with that parameter, I first tried to reproduce some old data using the same `-l` value as before and `-s 0`. I thought that theoretically this should correspond to the fixed-length behaviour applied before.

Obviously this is not supported, and I also don't really care because the actual estimate of the SD will never be zero. However, the error message I got was quite confusing.

Given that my call contained `-l 300 -s 0`, it was hard to understand why it was failing.

I had to inspect the code to find out that `0.0` is used as the initial value that is tested against to check whether the parameter was set or not.

If there really is no way to support `--sd=0` (initializing to a negative value?), the error/help messages could be adjusted to tell the user that `-s` (and `-l`) have to be greater than zero.
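The sentinel problem described in this issue can be sketched as follows. This is an illustration in Python, not kallisto's actual C++ code: both function names and all message strings are hypothetical. The first function uses `0.0` as the "not set" marker, so an explicit `-s 0` cannot be told apart from a missing option; the second uses `None` for "not set" and reports non-positive values separately.

```python
def check_with_zero_sentinel(fragment_length=0.0, sd=0.0):
    """0.0 doubles as 'not set', so an explicit 0 looks like a missing option."""
    if fragment_length == 0.0 or sd == 0.0:
        return "error: fragment length and sd must be supplied"
    return "ok"


def check_with_explicit_unset(fragment_length=None, sd=None):
    """None means 'not set'; non-positive values get their own message."""
    if fragment_length is None or sd is None:
        return "error: -l and -s must both be supplied"
    if fragment_length <= 0 or sd <= 0:
        return "error: -l and -s must be greater than zero"
    return "ok"


# The call from the issue, -l 300 -s 0, under both schemes.
print(check_with_zero_sentinel(300.0, 0.0))
print(check_with_explicit_unset(300.0, 0.0))
```

With the second scheme, the user who passed `-s 0` is told that the value must be greater than zero instead of being told that the option is missing.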