Ambiguous Novel Transcript Locations and Merging Transcript Models #233

BenneyMRArgue · 2024-09-04T17:34:26Z

Hi,

Thanks so much for developing this tool, I'm excited to be able to use this in my long-read isoform analysis!

I have had a couple questions come up since looking through the output from some IsoQuant runs (v3.4.1). When looking through the transcript model output (transcript_model_grouped_counts.tsv) I noticed that novel transcripts appear to be labeled "transcript####.chr##". I need to merge the outputs from the individual IsoQuant runs (I have a large sc-RNAseq dataset which I have had to split by sample because of resource limitations for each job), so realizing that these transcripts appear to be listed in the order that IsoQuant processed them during the individual experiment raised concern. Is it possible to merge novel transcripts which have the same genomic coordinates but have been assigned different numbers in their respective runs?

I also noticed that some novel transcripts are listed multiple times in the tsv, connected to different chromosomes. For instance, I looked at one in IGV which was placed both in chromosome 7 and 10:

Do you have any insight on why this occurs and how to identify which location is correct?

Thanks,
-Benney

andrewprzh · 2024-09-12T23:00:17Z

Dear @BenneyMRArgue

Thanks for the feedback!

I would recommend trying the latest IsoQuant version (3.5.2). It uses far less RAM than 3.4.1: a major problem was fixed in version 3.4.2, resulting in a ~10-30x decrease in RAM usage on the datasets we tested. You would probably be able to process your dataset in a single run.

Regarding duplicated transcripts: IsoQuant assigns transcript IDs sequentially, so independent runs will not give the same novel transcript the same ID. Unfortunately, this makes it impossible to track novel transcripts between different runs by their IDs. Moreover, the chromosome name is part of the transcript ID, so it's OK to have transcript58.chr7.nic and transcript58.chr10.nic -- these are two completely different transcript IDs.
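To illustrate why transcript58.chr7.nic and transcript58.chr10.nic are unrelated, here is a minimal Python sketch that splits a novel transcript ID into its parts. It assumes the ID layout seen in this thread (`transcript<number>.<chromosome>`, optionally followed by a type suffix such as `nic` or `nnic`); the names `parse_novel_id` and `NovelId` are hypothetical helpers, not part of IsoQuant.

```python
import re
from typing import NamedTuple, Optional

# Assumed layout: transcript<number>.<chromosome>[.<type>], where <type> is
# "nic" or "nnic". The chromosome name may itself contain dots.
ID_PATTERN = re.compile(r"^transcript(\d+)\.(.+?)(?:\.(nic|nnic))?$")


class NovelId(NamedTuple):
    number: int          # sequential counter, only meaningful within one run
    chrom: str           # chromosome name embedded in the ID
    kind: Optional[str]  # type suffix if present, else None


def parse_novel_id(transcript_id: str) -> Optional[NovelId]:
    """Return the parts of a novel transcript ID, or None if it does not match."""
    m = ID_PATTERN.match(transcript_id)
    if m is None:
        return None
    return NovelId(int(m.group(1)), m.group(2), m.group(3))


a = parse_novel_id("transcript58.chr7.nic")
b = parse_novel_id("transcript58.chr10.nic")
# Same counter, different chromosome: two distinct transcripts.
print(a.number == b.number, a.chrom == b.chrom)
```

Because the counter restarts independently in every run, only the full ID within a single run is meaningful; the number alone identifies nothing.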

If you would still like to merge different GTFs, I'd suggest using the gffcompare tool.
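For a rough idea of what coordinate-based matching between runs involves, below is a small Python sketch that keys each transcript by chromosome, strand, and intron chain instead of by ID. This is not gffcompare's algorithm and not an IsoQuant feature; it is only an illustration under simplifying assumptions (standard 9-column GTF, `exon` records carrying a `transcript_id` attribute, transcript ends ignored for multi-exon transcripts). The function names `structure_keys` and `match_runs` are hypothetical.

```python
import re
from collections import defaultdict

TID_RE = re.compile(r'transcript_id "([^"]+)"')


def structure_keys(gtf_lines):
    """Map each transcript_id to a coordinate-based key built from its exons."""
    exons = defaultdict(list)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue
        m = TID_RE.search(fields[8])
        if m:
            exons[m.group(1)].append(
                (fields[0], fields[6], int(fields[3]), int(fields[4]))
            )

    keys = {}
    for tid, recs in exons.items():
        chrom, strand = recs[0][0], recs[0][1]
        spans = sorted((start, end) for _, _, start, end in recs)
        # Introns are the gaps between consecutive exons (1-based, inclusive).
        introns = tuple((a[1] + 1, b[0] - 1) for a, b in zip(spans, spans[1:]))
        if introns:
            keys[tid] = (chrom, strand, "multi", introns)
        else:
            # Mono-exon transcripts have no introns; fall back to exact exon span.
            keys[tid] = (chrom, strand, "mono", spans[0])
    return keys


def match_runs(run_a_lines, run_b_lines):
    """For each transcript in run A, list run B transcripts with the same key."""
    by_key = defaultdict(list)
    for tid, key in structure_keys(run_b_lines).items():
        by_key[key].append(tid)
    return {
        tid: by_key.get(key, [])
        for tid, key in structure_keys(run_a_lines).items()
    }
```

With two GTFs opened as files, `match_runs(open("run1.gtf"), open("run2.gtf"))` (the file names are placeholders) would return a dictionary from run 1 IDs to the structurally identical run 2 IDs. For real analyses gffcompare remains the suggested route, since it handles details this sketch ignores.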

Best
Andrey

BenneyMRArgue · 2024-09-16T17:12:05Z

Hi Andrey,

Thanks for the input! It's good to know that the novel transcripts can't be compared between runs. I will try running everything together with the newest version first.

Also thanks so much for clarifying that point about the transcript ids and chromosome assignments, it's a relief to find they are not supposed to be the same!

Best,
-Benney

andrewprzh added the question Further information is requested label Sep 12, 2024
