kallisto index freezes on Ensembl database #210

clydeandforth · 2019-05-22T14:00:41Z

Hi all,

I downloaded cDNA fasta files and gtf files of the fungal database in Ensembl. I concatenated these into single fasta and gtf files and then tried to index them. However, I get the following error when I run the index command, even on a 382G high memory node:

kallisto index -i test.gtf.gz test.fa.gz

[build] loading fasta file test.fa.gz
[build] k-mer length: 31
[build] warning: clipped off poly-A tail (longer than 10) from 4479 target sequences
[build] warning: replaced 2455576 non-ACGUT characters in the input sequence with pseudorandom nucleotides
[build] counting k-mers ... terminate called after throwing an instance of 'std::bad_alloc'
  what(): std::bad_alloc
/var/log/slurm/spool_slurmd/job6981046/slurm_script: line 20: 136148 Aborted kallisto index -i test.fa.gz

The test data completes with the kallisto index command. Is this a memory issue? Do I need more than a 382G node with 2 CPUs and 40 cores? Here is my system information:

kallisto 0.45.1
(GNU libc) 2.17
Linux RedHatEnterpriseServer
Red Hat Enterprise Linux Server release 7.4 (Maipo)

Input files:

database size:
test.gtf.gz 1.3G
test.fa.gz 4.3G
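
For anyone trying to reproduce this, the compressed size says little about how much sequence is actually being indexed. Below is a minimal Python sketch (not part of kallisto) that streams the gzipped FASTA and reports the number of records, the total number of bases, and the number of characters outside A/C/G/T/U. The function name `fasta_stats` is hypothetical, and treating lowercase letters as valid is an assumption, so the last count may not match the figure kallisto prints. It does not predict kallisto's memory use; it only sizes the input.

```python
import gzip


def fasta_stats(path):
    """Stream a FASTA file (gzipped or plain) and summarise its size."""
    # Pick the opener from the file extension; both are read in text mode.
    opener = gzip.open if path.endswith(".gz") else open
    # Assumption: upper- and lowercase A/C/G/T/U all count as valid.
    valid = set("ACGTUacgtu")
    n_records = 0
    n_bases = 0
    n_other = 0
    with opener(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                n_records += 1  # header line starts a new record
            else:
                n_bases += len(line)
                n_other += sum(1 for ch in line if ch not in valid)
    return {"records": n_records, "bases": n_bases, "non_acgtu": n_other}
```

Calling `fasta_stats("test.fa.gz")` returns a small dictionary with the three counts. The file is read line by line, so the script itself needs very little memory even for a multi-gigabyte input.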

I removed most of the header information from the fasta, here is a chunk of the file:

test.fa.gz

>SAM02534
ATGCCTTCCCTGTCCCGAGTGATTAACCATCCTCTGTTTAACGTTGTCTTCTTTTTGCTG
GCTCGACAAGTAACCAAGGTCCTCCCATTAGAAGACGGGTCTTACTTATGGGGCCTTCGT
GCTCTTTACTATGGCGCTCAAGCTGCGATTATGTTACTAAATCTTTACATTATCCAGATC
ATTGAAAAGAAAAACGATCAGACTGTTTTGCGCTACGTGGAACCGGCGAAACAAACCTGG
GACGGGACCACTACAAAGGATACATTGGTGGTGACCAACTTTGCCGATTACGACAAGAGT
GAAGTCTTGAAGGGGTTGAAACAATCGGGGATTGGGCTGGCCATGGTGACCTTCTTGCAC
TTCAAATTTGGATATGTACAGCCTTTGATCATCCAAGCAATCCTTGGTTTCAAGACCTTC
TTCACGACCAAAGAAGCAAGAATCCACCTATTCAACCAATCCACCAGCAGCGGTGATCTG
AAACGACCTTTCCGGGTGGATTCTCCTTTTGGAATGAACTCACTCAACCCTCAACCCAAG
ACCGACAAGGCATCCATCAAAAAGGCGGAACGTGCTATGAAGGCGGATTAG
>SAM02535
ATGAAAGACGGCTTCAAGTCCATTACGATCGAACCGTTTAATGGGTATCTCGACTTTCAG
GGACCTATCAACGCACAGCAGTCCACCGGCAACATGGTTCTCAAAGGCGACATTCACCTG
GAGCTCACCAAAGCGGTCAATGTCAAGAAGGCCACCCTCAGGTTTATTGGGTCTAGTCGT
GTCTGCCACCACAACACCCTCGATACCGTCGATATCAGCACTCCGATCCTGCCGAAACTC
AAGACACATCTCTTCTCTTCCACTACAACACTTGGTCCTGGCGAGGTGATCTTACCGTGG
GAAATGGAAATCCTCAACATATATCCGTGCAGCGTCATGATCAAACGGGTCACCGTCTCA

I removed comment lines from the gtf file which contained information about each fungal species, here is a chunk of the file:

test.gtf.gz

scf_12295       ena     CDS     1080    1208    .       +       0       gene_id "SAM05242"; transcript_id "SAM05242"; exon_number "1"; gene_name "ABSGL_11117.1 scaffold 12295"; gene_source "ena"; gene_biotype "protein_coding"; transcript_name "ABSGL_11117.1 scaffold 12295-1"; transcript_source "ena"; transcript_biotype "protein_coding"; protein_id "SAM05242";
scf_12295       ena     start_codon     1080    1082    .       +       0       gene_id "SAM05242"; transcript_id "SAM05242"; exon_number "1"; gene_name "ABSGL_11117.1 scaffold 12295"; gene_source "ena"; gene_biotype "protein_coding"; transcript_name "ABSGL_11117.1 scaffold 12295-1"; transcript_source "ena"; transcript_biotype "protein_coding";
scf_12295       ena     exon    1293    1366    .       +       .       gene_id "SAM05242"; transcript_id "SAM05242"; exon_number "2"; gene_name "ABSGL_11117.1 scaffold 12295"; gene_source "ena"; gene_biotype "protein_coding"; transcript_name "ABSGL_11117.1 scaffold 12295-1"; transcript_source "ena"; transcript_biotype "protein_coding"; exon_id "SAM05242-2";
scf_12295       ena     CDS     1293    1366    .       +       0       gene_id "SAM05242"; transcript_id "SAM05242"; exon_number "2"; gene_name "ABSGL_11117.1 scaffold 12295"; gene_source "ena"; gene_biotype "protein_coding"; transcript_name "ABSGL_11117.1 scaffold 12295-1"; transcript_source "ena"; transcript_biotype "protein_coding"; protein_id "SAM05242";

Thanks,

James
