Smart-Seq2 kallisto analysis pipeline #235

zolotarovgl · 2019-11-08T14:33:20Z

Dear kallisto team,

I have Smart-Seq2 data which I would like to analyse in the kallisto | bustools framework, even though it lacks UMIs.
I tried to search for related issues but didn't find any.
Is there any pipeline you can recommend?


Zha0rong · 2019-11-19T15:02:35Z

Hi zolotarovgl:
I am not part of the team, but I think there is a paragraph in the manual that says the following:

Additionally kallisto bus will accept a string specifying a new technology in the format of bc:umi:seq where each of bc,umi and seq are a triplet of integers separated by a comma, denoting the file index, start and stop of the sequence used. For example to specify the 10xV2 technology we would use 0,0,16:0,16,26:1,0,0. The first part bc is 0,0,16 indicating it is in the 0-th file (also known as the first file in plain english), the barcode starts at the 0-th bp and ends at the 16-th bp in the sequence (i.e. 16bp barcode), the UMI is similarly in the same file, right after the barcode in position 16-26 (a 10bp UMI), finally the sequence is in a separate file, starts at 0 and ends at 0 (in this case stopping at 0 means there is no limit, we use the entire sequence).
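
To make the bc:umi:seq format described above concrete, here is a minimal Python sketch that parses such a string and applies it to a pair of reads. It follows only the description in the quoted paragraph; it is not kallisto's own parser, and the names `Span`, `parse_technology` and `extract` are hypothetical helpers made up for illustration.

```python
from typing import Dict, List, NamedTuple


class Span(NamedTuple):
    """One triplet from the technology string (hypothetical helper type)."""
    file_index: int  # 0-based index of the FASTQ file
    start: int       # 0-based start position in the read
    stop: int        # end position (exclusive); 0 means "to the end of the read"


def parse_technology(spec: str) -> Dict[str, Span]:
    """Parse a 'bc:umi:seq' string such as '0,0,16:0,16,26:1,0,0'."""
    parts = spec.split(":")
    if len(parts) != 3:
        raise ValueError("expected three colon-separated triplets (bc:umi:seq)")
    result = {}
    for name, part in zip(("bc", "umi", "seq"), parts):
        fields = part.split(",")
        if len(fields) != 3:
            raise ValueError(f"{name} must be 'file,start,stop', got {part!r}")
        result[name] = Span(*(int(f) for f in fields))
    return result


def extract(span: Span, reads: List[str]) -> str:
    """Cut the described region out of the read from the indicated file."""
    read = reads[span.file_index]
    # A stop of 0 is interpreted as "no limit", as in the quoted paragraph.
    return read[span.start:] if span.stop == 0 else read[span.start:span.stop]


if __name__ == "__main__":
    tech = parse_technology("0,0,16:0,16,26:1,0,0")  # the 10xV2 example
    reads = ["A" * 16 + "C" * 10, "GATTACA"]         # made-up reads: file 0, file 1
    for name, span in tech.items():
        print(name, extract(span, reads))
```

With the 10xV2 string, the barcode is the first 16 bp of the read from file 0, the UMI is the next 10 bp of the same read, and the sequence is the whole read from file 1.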

lakigigar · 2019-12-13T00:01:16Z

SMART-seq2 does not have UMIs, so while one could possibly wrangle the pseudoalignments into BUS format, the resulting BUS files wouldn't have useful information beyond what would be produced by running kallisto pseudo. SMART-seq3 does include UMIs, and we do plan to provide a kallisto | bustools workflow for that.

Zha0rong · 2019-12-13T00:09:19Z

Hi @lakigigar,
Thanks for the help! I found that for this dataset, https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM4194802, the researchers write that read 2 contains the UMI and cell barcode:

SMART-seq2

TSO primer sequence: AAGCAGTGGTATCAACGCAGAGTACATrGrG+G
Structure of the raw data:
Read2: Barcode(8bp) +UMI(8bp)+PolyT+(mRNA)
Read1: TSO+ mRNA +PolyA.

In this case, maybe we can use read 2 with 0,0,8 and 0,8,16 to specify the cell barcode and UMI?
Thanks again!
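
As a rough illustration of that idea, the sketch below splits a read 2 sequence according to the layout given in the GEO record (8 bp barcode, then 8 bp UMI) and builds the matching bc:umi:seq string. It assumes read 2 is passed as the first file (index 0) and read 1 as the second file (index 1) supplying the cDNA; that ordering is my assumption, not something stated in the dataset description. The function names are hypothetical, and whether kallisto bus handles this library correctly is untested here.

```python
BARCODE_LEN = 8  # from the GEO description: Barcode(8bp)
UMI_LEN = 8      # from the GEO description: UMI(8bp)


def split_read2(read2: str):
    """Split read 2 into (barcode, umi, remainder) per the described layout."""
    if len(read2) < BARCODE_LEN + UMI_LEN:
        raise ValueError("read 2 is shorter than barcode + UMI")
    barcode = read2[:BARCODE_LEN]
    umi = read2[BARCODE_LEN:BARCODE_LEN + UMI_LEN]
    remainder = read2[BARCODE_LEN + UMI_LEN:]  # polyT followed by mRNA
    return barcode, umi, remainder


def technology_string(barcode_file: int = 0, cdna_file: int = 1) -> str:
    """Build the bc:umi:seq string; the file order is an assumption."""
    bc = f"{barcode_file},0,{BARCODE_LEN}"
    umi = f"{barcode_file},{BARCODE_LEN},{BARCODE_LEN + UMI_LEN}"
    seq = f"{cdna_file},0,0"  # stop of 0: use the entire read
    return f"{bc}:{umi}:{seq}"


if __name__ == "__main__":
    # Made-up read: 8 bp barcode, 8 bp UMI, then a short polyT stretch.
    print(split_read2("AAAAAAAA" + "CCCCCCCC" + "TTTTTT"))
    print(technology_string())
```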

sbooeshaghi · 2020-04-20T07:54:29Z

We have used kallisto to process SmartSeq2 data for our recent manuscript, "Isoform cell type specificity in the mouse primary motor cortex".

You can find the relevant code here: https://github.com/pachterlab/BYVSTZP_2020

daria-dc · 2020-07-21T08:53:59Z

Dear kallisto team,

I have two questions regarding the workflow with kallisto pseudo.

1. Can users also specify samples that span multiple fastq files in the batch file?
2. Is the genes x cells count matrix generated by adding the --quant flag to the command?

lakigigar closed this as completed Dec 13, 2019

winni2k mentioned this issue Jun 10, 2020

Smart-seq3 support in kallisto bus #267

Open

sanxiongliu mentioned this issue Sep 11, 2020

kallisto | bustools workflow for smart-seq3 #281

Open

This was referenced Nov 19, 2020

downstream analysis in SMART-SEQ2 #274

Closed

How to generate count matrix after generating output from kallisto pseudo #275

Closed

alex-d13 mentioned this issue Aug 5, 2021

Simulation Implementation omnideconv/SimBu#5

Closed
