FATAL ERROR in input read file: the total length of barcode sequence is 151 not equal to expected 28, and also not mapped reads #2202

lopezCascales · 2024-08-22T17:00:11Z

Hi Alex, I have some issues with data that I received from collaborators. I have normally worked with Smart-seq plate-based protocols, but these samples are 10x, and I don't have much information about them. I have followed the workflows from some other issues, but I don't understand why I can't solve the problem. I'm not sure I'm doing this correctly, so I'm going to copy, more or less, the steps that I'm following. From the FASTQ files, I can see the sequencer-specific instrument ID @kxxxx (HiSeq 3000(?)/4000) and a flowcell ID ending in BBXX(?), which suggests a HiSeq 3000/4000 run.

Example for one sample:
gunzip *.fastq.gz

cat file1.fastq file2.fastq > bigfile.fastq

cat file.fastq | head -n40

@K00360:651:HHKHYBBXY:1:1101:3640:1086 1:N:0:NCTCGTTT

NCTCGTTT

+

#####################################################
The only information I have about the sample:
hashtag_oligo | well10X | RunID
TGTCTTTCCTGCCAG | 3 | 20200811-SCS47-2

From that, I assumed that I need to use this barcode whitelist: 3M-february-2018.txt
I built a human genome index of 100 (I don't know if that is enough).
#####################################################
Based on different answers on forums, I used this command for STAR:

STAR --genomeDir ./indexHuman100 --readFilesIn 20200811-SCS47-2-HT_S4_R2.fastq 20200811-SCS47-2-HT_S4_R1.fastq --outFileNamePrefix scRNA20200811-SCS47-2-HT --outFilterType BySJout --outFilterMultimapNmax 20 --alignIntronMax 100000 --outFilterMismatchNmax 4 --outFilterMatchNminOverLread 0.3 --outFilterScoreMinOverLread 0.3 --outFilterScoreMin 30 --alignEndsType Local --soloType CB_UMI_Simple --soloCBstart 1 --soloCBlen 16 --soloUMIstart 17 --soloUMIlen 12 --soloCBmatchWLtype 1MM_multi_Nbase_pseudocounts --soloUMIfiltering MultiGeneUMI_CR --soloUMIdedup 1MM_CR --runThreadN 128 --clipAdapterType CellRanger4 --outSAMtype BAM SortedByCoordinate --outSAMattributes CR UR CY UY CB UB NH HI GX GN --soloFeatures Gene --soloCBwhitelist 3M-february-2018.txt

EXITING because of FATAL ERROR in input read file: the total length of barcode sequence is 151 not equal to expected 28
Read ID=@K00360:651:HHKHYBBXY:1:1101:3640:1086 ; Sequence=NGGTACATCGGTAATTCCCTTTCGAGGTTTGCTAGGACCGGCNGTANAGNCCGANGGCTNNACATCTGGCAACCGNANTTCATNANANCNGAAGAGNANACGNCTGAACTCCAGTCACTCTCGTTTATCTCGTATGCCGTCTTCTGCTTGA
SOLUTION: check the formatting of input read files.
If UMI+CB length is not equal to the barcode read length, specify barcode read length with --soloBarcodeReadLength
To avoid checking of barcode read length, specify --soloBarcodeReadLength 0
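
The length check behind this error can be reproduced outside STAR. Below is a minimal Python sketch, not part of STAR: it tallies the sequence lengths in a FASTQ file and compares them with CB length + UMI length (16 + 12 = 28 in the command above). The function names `read_length_counts` and `matches_barcode_length` are hypothetical, and the sketch assumes a standard FASTQ with exactly four lines per record and no blank lines.

```python
from collections import Counter
from typing import Iterable


def read_length_counts(fastq_lines: Iterable[str], max_reads: int = 10000) -> Counter:
    """Count sequence lengths over the first max_reads records.

    Assumption: each record is exactly 4 lines (header, sequence, '+', quality).
    """
    counts: Counter = Counter()
    for i, line in enumerate(fastq_lines):
        if i // 4 >= max_reads:
            break
        if i % 4 == 1:  # the second line of each record is the sequence
            counts[len(line.strip())] += 1
    return counts


def matches_barcode_length(counts: Counter, cb_len: int = 16, umi_len: int = 12) -> bool:
    """True only if every observed read length equals cb_len + umi_len."""
    expected = cb_len + umi_len
    return bool(counts) and all(length == expected for length in counts)


# Small in-memory demo: one 28-base record and one 151-base record.
demo = [
    "@read1", "A" * 28, "+", "#" * 28,
    "@read2", "C" * 151, "+", "#" * 151,
]
demo_counts = read_length_counts(demo)
print(dict(demo_counts))
print(matches_barcode_length(demo_counts))
```

On a real file, the same function could be fed an open file handle, for example `read_length_counts(open("20200811-SCS47-2-HT_S4_R1.fastq"))`. If the tally shows a length other than 28, that would be consistent with the error above.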

########################################################
--soloBarcodeReadLength 150
--soloBarcodeReadLength 151
I tried these two options. The first one did not work; the second one worked:
Aug 22 18:19:57 ..... started STAR run
Aug 22 18:19:58 ..... loading genome
Aug 22 18:20:42 ..... started mapping
Aug 22 18:26:19 ..... finished mapping
Aug 22 18:26:20 ..... started Solo counting
Aug 22 18:26:36 ..... finished Solo counting
Aug 22 18:26:36 ..... started sorting BAM
Aug 22 18:26:39 ..... finished successfully

########################################################
But this is the log output file:

                             Started job on |	Aug 22 18:19:57
                         Started mapping on |	Aug 22 18:20:42
                                Finished on |	Aug 22 18:26:39
   Mapping speed, Million of reads per hour |	513.01

                      Number of input reads |	50873627
                  Average input read length |	150
                                UNIQUE READS:
               Uniquely mapped reads number |	148635
                    Uniquely mapped reads % |	0.29%
                      Average mapped length |	116.95
                   Number of splices: Total |	26365
        Number of splices: Annotated (sjdb) |	26216
                   Number of splices: GT/AG |	26276
                   Number of splices: GC/AG |	75
                   Number of splices: AT/AC |	9
           Number of splices: Non-canonical |	5
                  Mismatch rate per base, % |	1.93%
                     Deletion rate per base |	0.01%
                    Deletion average length |	1.77
                    Insertion rate per base |	0.02%
                   Insertion average length |	1.62
                         MULTI-MAPPING READS:
    Number of reads mapped to multiple loci |	19370
         % of reads mapped to multiple loci |	0.04%
    Number of reads mapped to too many loci |	77
         % of reads mapped to too many loci |	0.00%
                              UNMAPPED READS:
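
To put the numbers above in proportion, here is a small Python sketch that parses `key | value` lines like the ones in this log and recomputes the mapped percentages. The helper names `parse_log_stats` and `percent` are hypothetical, and the excerpt contains only three lines copied from the log above.

```python
def parse_log_stats(text: str) -> dict:
    """Parse 'key | value' lines into a dict of stripped strings."""
    stats = {}
    for line in text.splitlines():
        if "|" not in line:
            continue  # section headers such as 'UNIQUE READS:' have no value
        key, value = line.split("|", 1)
        stats[key.strip()] = value.strip()
    return stats


def percent(part: int, total: int) -> float:
    """Return part/total as a percentage rounded to two decimals."""
    return round(100.0 * part / total, 2)


LOG_EXCERPT = """
                      Number of input reads | 50873627
               Uniquely mapped reads number | 148635
    Number of reads mapped to multiple loci | 19370
"""

stats = parse_log_stats(LOG_EXCERPT)
total = int(stats["Number of input reads"])
unique = int(stats["Uniquely mapped reads number"])
multi = int(stats["Number of reads mapped to multiple loci"])

print("unique %:", percent(unique, total))
print("multi %:", percent(multi, total))
print("unique + multi %:", percent(unique + multi, total))
```

The recomputed values agree with the percentages reported in the log, so well under 1% of the input reads are mapped at all.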

For the solo.out:
noNoAdapter 0
noNoUMI 0
noNoCB 0
noNinCB 0
noNinUMI 4566
noUMIhomopolymer 1216
noNoWLmatch 316720
noTooManyMM 0
noTooManyWLmatches 0
yesWLmatchExact 49288951
yesOneWLmatchWithMM 476773
yesMultWLmatchWithMM 785401

And the matrix.txt:

%%MatrixMarket matrix coordinate integer general
%
62710 6794880 81035
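
For reference, the size line of a MatrixMarket coordinate file lists rows, columns, and the number of stored (non-zero) entries. The Python sketch below reads that line from the header shown above. The helper name `parse_mtx_size` is hypothetical, and interpreting rows as features and columns as barcodes is an assumption about how this matrix was written.

```python
from typing import Iterable, Tuple


def parse_mtx_size(lines: Iterable[str]) -> Tuple[int, int, int]:
    """Return (rows, cols, nonzeros) from the first non-comment line."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("%"):
            continue  # skip the banner and comment lines
        rows, cols, nnz = (int(field) for field in line.split())
        return rows, cols, nnz
    raise ValueError("no size line found")


header = [
    "%%MatrixMarket matrix coordinate integer general",
    "%",
    "62710 6794880 81035",
]

rows, cols, nnz = parse_mtx_size(header)
# Assumption: rows are features (genes) and columns are whitelist barcodes.
print("features:", rows)
print("barcodes:", cols)
print("non-zero entries:", nnz)
print("non-zero entries per input read:", nnz / 50873627)
```

With roughly 81 thousand non-zero entries from about 50.9 million input reads, very few reads end up contributing to the gene counts, which matches the low mapping rate in the log.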

#########################################################

Almost no reads are mapped. Do you have any suggestions? Thank you in advance for your help.
Have a nice day.
Mayte
