Aligning long reads with STAR (e.g. nanopore) #88

Open
acesnik opened this issue May 7, 2018 · 2 comments
@acesnik
Collaborator

acesnik commented May 7, 2018

We need to check the average read length; if it's above 250 bases or so, we'll need to use STARlong instead of STAR.
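
A minimal sketch of that check, assuming uncompressed or gzipped FASTQ with standard four-line records. The function names, the sampling cap, and the exact cutoff of 250 are illustrative assumptions taken from the "250 or so" above, not anything defined by STAR.

```python
import gzip
from pathlib import Path

# Assumption: "250 or so" from the note above, not a documented STAR limit.
LONG_READ_THRESHOLD = 250


def mean_read_length(fastq_path, max_reads=100_000):
    """Return the mean sequence length over the first max_reads FASTQ records."""
    path = Path(fastq_path)
    opener = gzip.open if path.suffix == ".gz" else open
    total = 0
    count = 0
    with opener(path, "rt") as handle:
        for line_number, line in enumerate(handle):
            # The second line of each four-line record holds the sequence.
            if line_number % 4 == 1:
                total += len(line.strip())
                count += 1
                if count >= max_reads:
                    break
    return total / count if count else 0.0


def choose_aligner(fastq_path, threshold=LONG_READ_THRESHOLD):
    """Pick the STAR binary name from the sampled mean read length."""
    return "STARlong" if mean_read_length(fastq_path) > threshold else "STAR"
```

Sampling only the first reads keeps the check fast on large runs, at the cost of assuming the start of the file is representative.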

To do this, we need to run make clean in the STAR/source directory, and then run make STARlong to build STARlong. STARlong doesn't build unless the previous STAR build has been cleaned out first.
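
As a rough sketch, the two build steps could be wrapped like this. The function name, the default directory, and the dry_run switch are hypothetical; only the two make targets come from the note above.

```python
import subprocess


def build_starlong(source_dir="STAR/source", dry_run=False):
    """Run `make clean` and then `make STARlong` in the STAR source directory.

    Returns the commands in the order they are (or would be) run.
    """
    commands = [["make", "clean"], ["make", "STARlong"]]
    if not dry_run:
        for command in commands:
            # check=True stops at the first failing step instead of continuing.
            subprocess.run(command, cwd=source_dir, check=True)
    return commands
```

Calling build_starlong(dry_run=True) only returns the command list, which is handy for logging what would be executed.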

We also need less stringent parameters and more seeds per read.

@acesnik acesnik changed the title Aligning long reads (e.g. nanopore) Aligning long reads with STAR (e.g. nanopore) May 7, 2018
@chemello7

STAR/source/STARlong \
  --genomeDir /mnt/e/Chemello/Spritz0.0.2/Homo_sapiens.GRCh38.dna.primary_assembly \
  --genomeLoad LoadAndKeep \
  --readFilesIn /mnt/e/Chemello/NanoporeX1/nanopore_exp1-trimmed.fastq \
  --limitBAMsortRAM 88156000000 \
  --outFileNamePrefix /mnt/e/Chemello/NanoporeX1/nanopore_exp1-trimmed \
  --outSAMtype BAM SortedByCoordinate \
  --outFilterScoreMinOverLread 0 \
  --outFilterMatchNminOverLread 0 \
  --outFilterMismatchNmax 100 \
  --seedSearchLmax 30 \
  --seedPerReadNmax 100000 \
  --seedPerWindowNmax 100 \
  --alignTranscriptsPerReadNmax 100000 \
  --alignTranscriptsPerWindowNmax 10000
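
One way to keep the relaxed filters and raised seed limits from the command above in a single place is to assemble the argument list programmatically. This is only a sketch: the option names and values are copied from that command, while the function name, the dictionary name, and the placeholder paths in the usage example are hypothetical. --genomeLoad and --limitBAMsortRAM are left out of the defaults and can be passed through extra.

```python
import shlex

# Values copied from the command above: looser output filters, more seeds per read.
RELAXED_LONG_READ_OPTIONS = {
    "--outFilterScoreMinOverLread": "0",
    "--outFilterMatchNminOverLread": "0",
    "--outFilterMismatchNmax": "100",
    "--seedSearchLmax": "30",
    "--seedPerReadNmax": "100000",
    "--seedPerWindowNmax": "100",
    "--alignTranscriptsPerReadNmax": "100000",
    "--alignTranscriptsPerWindowNmax": "10000",
}


def starlong_command(genome_dir, reads, out_prefix,
                     binary="STAR/source/STARlong", extra=None):
    """Build the STARlong argument list; entries in `extra` override the defaults."""
    command = [
        binary,
        "--genomeDir", genome_dir,
        "--readFilesIn", reads,
        "--outFileNamePrefix", out_prefix,
        "--outSAMtype", "BAM", "SortedByCoordinate",
    ]
    options = dict(RELAXED_LONG_READ_OPTIONS)
    options.update(extra or {})
    for flag, value in options.items():
        command += [flag, str(value)]
    return command


if __name__ == "__main__":
    # Placeholder paths, not the ones used in this thread.
    print(shlex.join(starlong_command("genome_dir", "reads.fastq", "out_prefix_")))
```

The returned list can be handed straight to subprocess.run, which avoids shell quoting problems with paths.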

@acesnik
Collaborator Author

acesnik commented Nov 13, 2018

I gave this a couple days of work.

Some notes on nanopore sequencing data:

  • Base calling nanopore data is fairly straightforward with albacore. I used the Docker container for it, started with docker run --rm -i -t -v G:/data:/mnt genomicpariscenter/albacore, and then ran read_fast5_basecaller.py -t 11 -i /mnt/data/reads -s /mnt/data/AlbacoreResults/ -r -f FLO-MIN107 -k SQK-LSK108 inside it, where the last two options (-f and -k) give the flow cell and kit chemistries we used for the run.
  • After basecalling, I tried aligning with STARlong and minimap2. I got more spliced reads with the latter, but they were mostly junk.
  • I tried correcting the indels with a tool called canu, but that didn't improve things much.
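
For comparing the two aligners on spliced reads, a small counter over SAM text is one option. This is a sketch under assumptions: it reads SAM lines (for example from samtools view -h on the BAM output), treats any primary alignment whose CIGAR contains an N operation as spliced, and uses a hypothetical function name. It does not judge whether a junction is real, so it cannot by itself separate good spliced reads from junk.

```python
import re

# One CIGAR element: a length followed by an operation code from the SAM spec.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def count_spliced(sam_lines):
    """Return (spliced, aligned) counts over primary alignments in SAM text."""
    spliced = 0
    aligned = 0
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        flag = int(fields[1])
        cigar = fields[5]
        if flag & (UNMAPPED | SECONDARY | SUPPLEMENTARY) or cigar == "*":
            continue
        aligned += 1
        # An N operation marks a skipped reference region, i.e. a splice junction.
        if any(op == "N" for _, op in CIGAR_OP.findall(cigar)):
            spliced += 1
    return spliced, aligned
```

Running it once per aligner on the same reads gives comparable spliced fractions; checking the junctions against an annotation would be a separate step.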

Some notes on PacBio:

  • The best tool for generating gene models based off of PacBio runs is called quiver, but this only works with PacBio BAM files. The ones generated from STARlong with FASTA read input didn't work.
  • The PacBio tools are available from bioconda, and downloading them was pretty straightforward: conda install genomicconsensus

@acesnik acesnik added this to the New projects milestone Nov 14, 2018