Use apparent transcript length rather than actual transcript length (feature request) #8

sjackman · 2015-07-14T16:30:06Z

When calculating TPM, it may be worth using the length of the portion of the transcript that has reads mapped to it rather than the FASTA length of the transcript. It may be difficult to define "the length of the transcript that has reads mapped to it", or doing so may require choosing arbitrary thresholds to decide what portion of the transcript is transcribed.

See https://twitter.com/sjackman/status/620984740150030336
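
To make the request concrete, here is a minimal Python sketch of the idea, not salmon's code. The names `apparent_length` and `min_depth` are hypothetical, and the sketch assumes a per-base coverage vector is available for each transcript. It counts the positions covered at or above a threshold, which is exactly the kind of arbitrary choice the request worries about.

```python
def apparent_length(coverage, min_depth=1):
    """Count positions whose read depth is at least min_depth.

    `coverage` is a per-base depth vector for one transcript. `min_depth`
    is a hypothetical, arbitrary threshold.
    """
    return sum(1 for depth in coverage if depth >= min_depth)


def tpm(read_counts, lengths):
    """Standard TPM: per-base read rates rescaled to sum to one million."""
    rates = [n / length for n, length in zip(read_counts, lengths)]
    total = sum(rates)
    return [1e6 * rate / total for rate in rates]


# Toy transcript: 1000 bases in the FASTA, reads on the first 100 only.
coverage = [5] * 100 + [0] * 900
fasta_length = len(coverage)
mapped_length = apparent_length(coverage)
```

Passing `mapped_length` instead of `fasta_length` to `tpm` raises that transcript's per-base rate, and the result changes with `min_depth`.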

mdshw5 · 2016-08-05T16:12:04Z

Maybe I'm missing something, but it seems like the current code for calculating TPM does incorporate the EffectiveLength measure. Does EffectiveLength not take into account the portion of the transcript that is mapped to?

rob-p · 2016-08-05T16:19:51Z

I think that @sjackman is referring to something even more "subtle" than effective length. The effective length accounts for the ability of all locations on a transcript to generate fragments (according to e.g. the fragment length distribution, and, when modeled, different biases). However, I think what @sjackman is referring to is more akin to simultaneous abundance estimation and "transcript modification". For example, suppose I have a transcript sequence in my FASTA file that is 5kb long and highly expressed, but I never see any reads mapping to the last 1kb. In this case, perhaps what is actually expressed is a variant of that transcript, rather than the sequence in the FASTA file. You could also imagine situations like this coming up in de novo assemblies, where portions of the assembled contigs are not covered while others have high coverage, leading a human observer to posit that perhaps there's a mis-assembly. Could something like this be taken into account? Perhaps, but you can imagine why this might become very tricky.
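
For readers unfamiliar with the term, the sketch below shows a simplified, bias-free notion of effective length: the expected number of positions at which a fragment could start, averaged over a fragment length distribution truncated at the transcript length. This is an illustrative assumption, not salmon's exact computation; the function name and the toy distribution are hypothetical.

```python
def effective_length(transcript_length, frag_len_probs):
    """Expected number of start positions for a fragment on the transcript.

    frag_len_probs maps fragment length -> probability. Fragments longer
    than the transcript cannot arise from it, so the distribution is
    truncated at transcript_length and renormalised.
    """
    usable = {f: p for f, p in frag_len_probs.items() if f <= transcript_length}
    mass = sum(usable.values())
    if mass == 0:
        return 0.0
    return sum(p / mass * (transcript_length - f + 1) for f, p in usable.items())


# Hypothetical fragment length distribution, for illustration only.
frag_len_probs = {150: 0.25, 200: 0.5, 250: 0.25}

# A 5 kb reference sequence gets the same effective length whether reads
# cover all of it or only the first 4 kb: coverage is not an input.
eff_5kb = effective_length(5000, frag_len_probs)
```

The only inputs are the reference length and the fragment length distribution, which is why a transcript with an uncovered final 1kb is treated the same as a fully covered one.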

sjackman · 2016-08-05T16:45:16Z

My particular use case was a gene that in reality has two exons and one intron, where the intron was 90% of the length of the gene. The annotated transcript missed the intron, so the transcript appeared 10x longer than it really was.
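
The arithmetic behind this case can be sketched as follows. Only the proportions (an intron making up 90% of the gene, a 10x length inflation) come from the comment above; the absolute lengths and the read count are invented for illustration.

```python
# Hypothetical numbers matching the proportions described above.
exon_length = 1_000        # bases actually present in the mature transcript
intron_length = 9_000      # 90% of the gene, wrongly kept in the annotation
reads = 500                # reads assigned to the transcript

annotated_length = exon_length + intron_length
length_inflation = annotated_length / exon_length

true_rate = reads / exon_length            # reads per transcribed base
annotated_rate = reads / annotated_length  # reads per annotated base
```

Because TPM is built from reads per base, dividing by the annotated length shrinks this transcript's rate by the same factor as `length_inflation` before the final rescaling.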

mdshw5 · 2016-08-05T17:09:43Z

@rob-p Thanks for that clarification. I actually thought the EffectiveLength measure accounted for this. I guess the situation does become tricky, but maybe the position-specific start distribution data could be helpful in constructing a "baseline" profile of transcript coverage and then comparing each transcript's coverage vector against this would give you a scaling factor to incorporate in the EffectiveLength calculation?
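
One possible reading of this proposal is sketched below. It is a guess at what such a scaling factor could look like, not anything salmon implements: `bin_coverage`, `coverage_scaling_factor`, `min_depth`, and the flat baseline are all hypothetical. The factor is the share of the baseline profile's mass that falls in positional bins where the transcript actually has coverage.

```python
def bin_coverage(coverage, n_bins):
    """Sum per-base depth into n_bins equal-width positional bins."""
    binned = [0.0] * n_bins
    length = len(coverage)
    for pos, depth in enumerate(coverage):
        binned[pos * n_bins // length] += depth
    return binned


def coverage_scaling_factor(coverage, baseline_profile, min_depth=1.0):
    """Share of the baseline profile's mass in bins this transcript covers.

    baseline_profile is a list of non-negative weights summing to 1, one
    per positional bin. A bin counts as covered when its summed depth
    reaches min_depth, which is an arbitrary threshold.
    """
    binned = bin_coverage(coverage, len(baseline_profile))
    return sum(w for w, depth in zip(baseline_profile, binned) if depth >= min_depth)


baseline = [0.1] * 10                  # hypothetical flat baseline, 10 bins
coverage = [5] * 100 + [0] * 900       # reads on the first 10% only
factor = coverage_scaling_factor(coverage, baseline)
scaled_effective_length = factor * 801.0   # 801.0 is a made-up effective length
```

With a flat baseline and reads on the first tenth of the transcript, `factor` comes out near 0.1. A non-flat baseline would weight the covered bins differently.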

rob-p · 2016-08-05T18:45:01Z

@mdshw5, I certainly think that this information could be useful (and bias terms are taken into account when computing the effective length, when bias modeling is enabled). The problem is that the position-specific start distribution is learned globally (well, conditioned on a few different length classes), rather than being transcript-specific. So it's not clear how much it would help in Shaun's case, since this is one particular transcript where a splicing variation is causing a huge portion of the transcript to have no mapped reads. Unless this happens in many transcripts (globally), this particular transcript's contribution to the global position-specific start distribution will likely be rather small.
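
The dilution argument can be illustrated with a toy pooling calculation. This is a deliberately crude assumption: it averages per-transcript start-position profiles with equal weight and ignores the length classes mentioned above, so it does not reflect how salmon actually learns the distribution. All names and numbers are hypothetical.

```python
def pooled_profile(profiles):
    """Average per-transcript start-position profiles into one global profile."""
    n_bins = len(profiles[0])
    return [sum(p[i] for p in profiles) / len(profiles) for i in range(n_bins)]


n_bins = 10
typical = [1.0 / n_bins] * n_bins          # starts spread evenly along the transcript
truncated = [1.0] + [0.0] * (n_bins - 1)   # all starts in the first 10%

# Hypothetical transcriptome: 999 typical transcripts plus the one odd case.
global_profile = pooled_profile([typical] * 999 + [truncated])
max_shift = max(abs(g - t) for g, t in zip(global_profile, typical))
```

Here `max_shift` is on the order of 0.001, so the pooled profile stays essentially flat and carries almost no information about the one unusual transcript.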
