Smart-Seq2 kallisto analysis pipeline #235

Closed
zolotarovgl opened this issue Nov 8, 2019 · 5 comments

@zolotarovgl

Dear kallisto team,

I have Smart-Seq2 data that I would like to analyse in the kallisto | bustools framework, even though it lacks UMIs.
I searched for related issues but didn't find any.
Is there any pipeline you can recommend?

@Zha0rong

Hi zolotarovgl,
I am not part of the team, but I think there is a paragraph in the manual that says the following:

Additionally kallisto bus will accept a string specifying a new technology in the format of bc:umi:seq where each of bc,umi and seq are a triplet of integers separated by a comma, denoting the file index, start and stop of the sequence used. For example to specify the 10xV2 technology we would use 0,0,16:0,16,26:1,0,0. The first part bc is 0,0,16 indicating it is in the 0-th file (also known as the first file in plain english), the barcode starts at the 0-th bp and ends at the 16-th bp in the sequence (i.e. 16bp barcode), the UMI is similarly in the same file, right after the barcode in position 16-26 (a 10bp UMI), finally the sequence is in a separate file, starts at 0 and ends at 0 (in this case stopping at 0 means there is no limit, we use the entire sequence).
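To make the bc:umi:seq format in the quoted paragraph concrete, here is a minimal Python sketch that parses such a string and slices a pair of reads accordingly. The names `Region`, `parse_technology`, and `extract` are hypothetical helpers written for illustration; they are not part of kallisto, and the sketch only mirrors the description above (a stop of 0 meaning "use the rest of the read").

```python
from typing import Dict, List, NamedTuple


class Region(NamedTuple):
    """One file,start,stop triplet from a technology string."""
    file_index: int  # 0-based index of the FASTQ file
    start: int       # 0-based start position in the read
    stop: int        # end position; 0 means "to the end of the read"


def parse_technology(spec: str) -> Dict[str, Region]:
    """Parse a 'bc:umi:seq' string such as '0,0,16:0,16,26:1,0,0'."""
    parts = spec.split(":")
    if len(parts) != 3:
        raise ValueError("expected three ':'-separated parts (bc:umi:seq)")
    regions = {}
    for name, part in zip(("bc", "umi", "seq"), parts):
        fields = part.split(",")
        if len(fields) != 3:
            raise ValueError(f"{name} must be 'file,start,stop', got {part!r}")
        regions[name] = Region(*(int(f) for f in fields))
    return regions


def extract(region: Region, reads: List[str]) -> str:
    """Slice the region out of the read that comes from its file."""
    read = reads[region.file_index]
    end = region.stop if region.stop > 0 else len(read)
    return read[region.start:end]


# The 10xV2 example from the quoted paragraph.
spec = parse_technology("0,0,16:0,16,26:1,0,0")

# Toy reads: file 0 holds barcode + UMI, file 1 holds the cDNA sequence.
reads = ["A" * 16 + "C" * 10, "GATTACA"]
barcode = extract(spec["bc"], reads)
umi = extract(spec["umi"], reads)
sequence = extract(spec["seq"], reads)
```

With the toy reads, `barcode` is the first 16 bases of the first read, `umi` is the following 10 bases, and `sequence` is the whole second read.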

@lakigigar
Contributor

SMART-seq2 does not have UMIs, so while one could possibly wrangle the pseudoalignments into BUS format, the resulting BUS files wouldn't have useful information beyond what would be produced by running kallisto pseudo. SMART-seq3 does include UMIs, and we do plan to provide a kallisto | bustools workflow for that.

@Zha0rong

Hi @lakigigar,
Thanks for the help! I found that for this dataset: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM4194802 the researchers write that read 2 contains the UMI and cell barcode:

SMART-seq2

  1. TSO primer sequence: AAGCAGTGGTATCAACGCAGAGTACATrGrG+G
  2. Structure of the raw data:
    Read2: Barcode(8bp) +UMI(8bp)+PolyT+(mRNA)
    Read1: TSO+ mRNA +PolyA.

In this case, maybe we can use read 2 with 0,0,8 and 0,8,16 to specify the cell barcode and UMI?
Thanks again!
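As a sketch of what the full string might look like for the layout above, the snippet below builds a bc:umi:seq string from the barcode and UMI lengths. `technology_string` is a hypothetical helper, and the file order is an assumption: the Read2 FASTQ is taken to be the first file (index 0) and the Read1 FASTQ the second (index 1). It does not trim the polyT or TSO portions of the reads.

```python
from typing import Tuple

Triplet = Tuple[int, int, int]  # (file index, start, stop)


def technology_string(bc: Triplet, umi: Triplet, seq: Triplet) -> str:
    """Join three file,start,stop triplets into a 'bc:umi:seq' string."""
    return ":".join(",".join(str(v) for v in triplet) for triplet in (bc, umi, seq))


barcode_len = 8  # Barcode(8bp) at the start of Read2
umi_len = 8      # UMI(8bp) directly after the barcode

custom = technology_string(
    bc=(0, 0, barcode_len),                       # Read2, bases 0-8
    umi=(0, barcode_len, barcode_len + umi_len),  # Read2, bases 8-16
    seq=(1, 0, 0),                                # Read1, whole read (stop 0 = no limit)
)
print(custom)  # 0,0,8:0,8,16:1,0,0
```

If that assumption about file order holds, the resulting string would be what gets passed as the technology string to kallisto bus, with the Read2 file listed before the Read1 file.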

@sbooeshaghi
Collaborator

We have used kallisto to process SmartSeq2 data for our recent manuscript, "Isoform cell type specificity in the mouse primary motor cortex".

You can find the relevant code here: https://github.com/pachterlab/BYVSTZP_2020

@daria-dc

Dear kallisto team,

I have two questions regarding the workflow with kallisto pseudo.

  1. Can users also specify samples that span multiple fastq files in the batch file?
  2. Is the genes x cells count matrix generated by adding the --quant flag to the command?
