Output file interpretation #162

molinfzlvvv · 2024-05-21T07:58:38Z

Hello! I've successfully executed the TOGA pipeline, but I'm still unclear on how to interpret the output file.

My objective is to identify missing or inactivated genes. I noticed a file named loss_summ_data.tsv, which sorts its entries into eight categories. Should I focus primarily on the UL (uncertain loss) and L (clearly lost) categories and disregard the others? There is also an inact_mut_data.txt file for visualization. What exactly do these two files contain, and how should I use them to correctly identify gene loss events and inactivated genes?

In fact, I aligned the pika genome to hg38 with lastal and converted the MAF output to a chain file (which was faster). After running TOGA, the loss_summ_data.tsv file had only 19 non-redundant results, and the inact_mut_data.txt file had 319 results. However, when I generated the assembly quality statistics, I got the results shown in the attached figure. Is this reasonable, and what might be the cause?

toga_statsplot.pdf

I'm sorry for disturbing you so many times. I really look forward to your reply, which is very important to me.

Best regards!

MichaelHiller · 2024-05-23T12:42:26Z

Hi,

Your stats plot shows that almost all genes are classified as missing. I assume this is because the chains you use are very incomplete.
Yes, extracting L and UL genes is what I would do. For UL, you may want to run RELAX in addition, to check whether the gene evolves under relaxed selection, which would be stronger evidence that the gene (and not only 1 exon) is lost.
If you have a highly complete genome, you can also extract M genes. E.g. in Rhie et al., Nature (the VGP paper) there are genes lost between rearranged genomic regions in bats (Fig 5). TOGA would classify them as missing, which is the correct classification if the assembly is not very complete. But for several Bat1K-quality genomes, M then likely indicates a true loss.
For what is in the files, please look into https://genome.senckenberg.de/download/TOGA/README.txt
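
A minimal sketch of the extraction step described above, in Python. It assumes that loss_summ_data.tsv has three tab-separated columns (level, identifier, class) and that the level column takes values such as GENE, TRANSCRIPT and PROJECTION; please check this against the README before relying on it. The function name `genes_by_class` and the sample identifiers are hypothetical and only serve as illustration.

```python
import csv
from collections import defaultdict


def genes_by_class(lines, wanted=("L", "UL"), level="GENE"):
    """Group identifiers by loss class for one level of loss_summ_data.tsv.

    Assumed layout (not confirmed in this thread): level <TAB> id <TAB> class.
    """
    hits = defaultdict(list)
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) != 3:
            # Skip blank or malformed lines instead of failing.
            continue
        row_level, ident, status = row
        if row_level == level and status in wanted:
            hits[status].append(ident)
    return dict(hits)


# Made-up sample rows; with a real run, pass an open file handle instead:
#   with open("loss_summ_data.tsv", newline="") as fh:
#       result = genes_by_class(fh)
sample = [
    "GENE\tGENE_A\tL",
    "GENE\tGENE_B\tM",
    "GENE\tGENE_C\tUL",
    "TRANSCRIPT\tTX_A1\tL",
    "PROJECTION\tTX_A1.12\tL",
]

result = genes_by_class(sample)
print(result)  # {'L': ['GENE_A'], 'UL': ['GENE_C']}
```

For a highly complete assembly, calling it with `wanted=("L", "UL", "M")` would also collect the M genes mentioned above.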
