FATAL ERROR in input read file: the total length of barcode sequence is 151 not equal to expected 28, and also not mapped reads #2202

Open
lopezCascales opened this issue Aug 22, 2024 · 1 comment

Comments

@lopezCascales

Hi Alex, I have some issues with data that I received from collaborators. I normally work with Smart-seq plate-based protocols, but these samples are 10x, and I don't have much information about them. I have followed the workflows from some other issues, but I don't understand why I can't solve the problem. I don't know if I'm doing this correctly, so I'm going to copy, more or less, the steps I'm following. From the FASTQ headers I can see an instrument ID of the form @Kxxxx, which suggests a HiSeq 3000(?)/4000, and a flowcell ID ending in BBXX(?), which also suggests a HiSeq 3000/4000 run.

Example for one sample:
gunzip *.fastq.gz

cat file1.fastq file2.fastq > bigfile.fastq

cat file.fastq | head -n40

@K00360:651:HHKHYBBXY:1:1101:3640:1086 1:N:0:NCTCGTTT

NCTCGTTT

+

#####################################################
The only information I have about the sample:
hashtag_oligo | well10X | RunID
TGTCTTTCCTGCCAG | 3 | 20200811-SCS47-2

Given that, I assumed I need to use the 3M-february-2018.txt whitelist for the barcodes.
I built a human genome index of 100 (I don't know if that is enough).
#####################################################
Based on different answers in forums, I ran STAR with this command:

STAR --genomeDir ./indexHuman100 --readFilesIn 20200811-SCS47-2-HT_S4_R2.fastq 20200811-SCS47-2-HT_S4_R1.fastq --outFileNamePrefix scRNA20200811-SCS47-2-HT --outFilterType BySJout --outFilterMultimapNmax 20 --alignIntronMax 100000 --outFilterMismatchNmax 4 --outFilterMatchNminOverLread 0.3 --outFilterScoreMinOverLread 0.3 --outFilterScoreMin 30 --alignEndsType Local --soloType CB_UMI_Simple --soloCBstart 1 --soloCBlen 16 --soloUMIstart 17 --soloUMIlen 12 --soloCBmatchWLtype 1MM_multi_Nbase_pseudocounts --soloUMIfiltering MultiGeneUMI_CR --soloUMIdedup 1MM_CR --runThreadN 128 --clipAdapterType CellRanger4 --outSAMtype BAM SortedByCoordinate --outSAMattributes CR UR CY UY CB UB NH HI GX GN --soloFeatures Gene --soloCBwhitelist 3M-february-2018.txt

EXITING because of FATAL ERROR in input read file: the total length of barcode sequence is 151 not equal to expected 28
Read ID=@K00360:651:HHKHYBBXY:1:1101:3640:1086 ; Sequence=NGGTACATCGGTAATTCCCTTTCGAGGTTTGCTAGGACCGGCNGTANAGNCCGANGGCTNNACATCTGGCAACCGNANTTCATNANANCNGAAGAGNANACGNCTGAACTCCAGTCACTCTCGTTTATCTCGTATGCCGTCTTCTGCTTGA
SOLUTION: check the formatting of input read files.
If UMI+CB length is not equal to the barcode read length, specify barcode read length with --soloBarcodeReadLength
To avoid checking of barcode read length, specify --soloBarcodeReadLength 0
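Before changing --soloBarcodeReadLength, it may help to confirm how long the reads in each FASTQ file actually are. The sketch below is only an illustration, not part of STAR: the helper names `read_length_counts` and `fastq_length_counts` are hypothetical, and it assumes the standard 4-line FASTQ layout. It tallies sequence lengths over the first records of a file.

```python
import gzip
from collections import Counter
from itertools import islice


def read_length_counts(lines, max_records=10000):
    """Count sequence lengths in the first max_records FASTQ records.

    `lines` is any iterable of text lines in 4-line FASTQ layout
    (header, sequence, '+', quality).
    """
    counts = Counter()
    for i, line in enumerate(islice(lines, 4 * max_records)):
        if i % 4 == 1:  # the second line of each record is the sequence
            counts[len(line.strip())] += 1
    return counts


def fastq_length_counts(path, max_records=10000):
    """Same tally for a FASTQ file on disk, plain or gzip-compressed."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return read_length_counts(handle, max_records)


# Tiny in-memory example with two records of length 4 and 6.
sample = ["@r1\n", "ACGT\n", "+\n", "IIII\n",
          "@r2\n", "ACGTAC\n", "+\n", "IIIIII\n"]
print(read_length_counts(sample))
```

Calling `fastq_length_counts` on each of the two files passed to --readFilesIn shows which lengths are present. With --soloCBlen 16 and --soloUMIlen 12 as in the command above, STAR expects 28 bases in the barcode read, which is the file listed last in --readFilesIn.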

########################################################
I then tried adding each of these two options:
--soloBarcodeReadLength 150
--soloBarcodeReadLength 151
The first did not work; the second ran to completion:
Aug 22 18:19:57 ..... started STAR run
Aug 22 18:19:58 ..... loading genome
Aug 22 18:20:42 ..... started mapping
Aug 22 18:26:19 ..... finished mapping
Aug 22 18:26:20 ..... started Solo counting
Aug 22 18:26:36 ..... finished Solo counting
Aug 22 18:26:36 ..... started sorting BAM
Aug 22 18:26:39 ..... finished successfully

########################################################
But this is the final log output:

                             Started job on |	Aug 22 18:19:57
                         Started mapping on |	Aug 22 18:20:42
                                Finished on |	Aug 22 18:26:39
   Mapping speed, Million of reads per hour |	513.01

                      Number of input reads |	50873627
                  Average input read length |	150
                                UNIQUE READS:
               Uniquely mapped reads number |	148635
                    Uniquely mapped reads % |	0.29%
                      Average mapped length |	116.95
                   Number of splices: Total |	26365
        Number of splices: Annotated (sjdb) |	26216
                   Number of splices: GT/AG |	26276
                   Number of splices: GC/AG |	75
                   Number of splices: AT/AC |	9
           Number of splices: Non-canonical |	5
                  Mismatch rate per base, % |	1.93%
                     Deletion rate per base |	0.01%
                    Deletion average length |	1.77
                    Insertion rate per base |	0.02%
                   Insertion average length |	1.62
                         MULTI-MAPPING READS:
    Number of reads mapped to multiple loci |	19370
         % of reads mapped to multiple loci |	0.04%
    Number of reads mapped to too many loci |	77
         % of reads mapped to too many loci |	0.00%
                              UNMAPPED READS:
Number of reads unmapped: too many mismatches |	0
   % of reads unmapped: too many mismatches |	0.00%
        Number of reads unmapped: too short |	50691944
             % of reads unmapped: too short |	99.64%
            Number of reads unmapped: other |	13601
                 % of reads unmapped: other |	0.03%
                              CHIMERIC READS:
                   Number of chimeric reads |	0
                        % of chimeric reads |	0.00%

From the Solo.out output:
noNoAdapter 0
noNoUMI 0
noNoCB 0
noNinCB 0
noNinUMI 4566
noUMIhomopolymer 1216
noNoWLmatch 316720
noTooManyMM 0
noTooManyWLmatches 0
yesWLmatchExact 49288951
yesOneWLmatchWithMM 476773
yesMultWLmatchWithMM 785401
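To put these counts next to the mapping numbers, here is a small sketch that turns the non-zero lines above into fractions of the input reads. The helper `parse_stats` is hypothetical, and reading the `yes...` entries as accepted barcodes and the `no...` entries as rejected ones is an assumption based on the names only.

```python
def parse_stats(text):
    """Parse 'name value' lines into a dict of integer counts."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    return stats


# Non-zero entries copied from the statistics above.
barcode_stats = parse_stats("""\
noNinUMI 4566
noUMIhomopolymer 1216
noNoWLmatch 316720
yesWLmatchExact 49288951
yesOneWLmatchWithMM 476773
yesMultWLmatchWithMM 785401
""")
input_reads = 50873627  # "Number of input reads" from the log above

# Assumption: entries starting with "yes" are reads whose barcode was accepted.
accepted = sum(v for k, v in barcode_stats.items() if k.startswith("yes"))
exact_fraction = barcode_stats["yesWLmatchExact"] / input_reads
accepted_fraction = accepted / input_reads

print(f"exact whitelist matches: {exact_fraction:.2%}")
print(f"all accepted barcodes:   {accepted_fraction:.2%}")
```

In this run the listed categories add up exactly to the number of input reads, so each read falls into one of them. Most reads match the whitelist, while only 0.29% map uniquely.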

and the matrix.txt

%%MatrixMarket matrix coordinate integer general
%
62710 6794880 81035
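For reference, the size line of a MatrixMarket coordinate file lists the number of rows, the number of columns, and the number of stored (non-zero) entries. The sketch below reads it with a hypothetical helper, `mtx_dimensions`. Treating rows as features (genes) and columns as barcodes is an assumption about this output, not something stated in the file itself.

```python
def mtx_dimensions(lines):
    """Return (rows, columns, nonzero) from MatrixMarket coordinate-format lines."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("%"):
            continue  # skip the banner and comment lines
        rows, cols, nonzero = (int(x) for x in line.split())
        return rows, cols, nonzero
    raise ValueError("no size line found")


# Header lines copied from the matrix file above.
header = [
    "%%MatrixMarket matrix coordinate integer general",
    "%",
    "62710 6794880 81035",
]
rows, cols, nonzero = mtx_dimensions(header)
print(rows, cols, nonzero)
```

Under that assumption the matrix has 62710 features and 6794880 barcode columns, with only 81035 non-zero entries in total.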

#########################################################

Almost no reads are mapped (99.64% are unmapped as too short). Do you have any suggestions? Thank you in advance for your help.
Have a nice day.
Mayte

@lopezCascales

Sorry, I didn't copy the unmapped reads:
UNMAPPED READS:
Number of reads unmapped: too many mismatches | 0
% of reads unmapped: too many mismatches | 0.00%
Number of reads unmapped: too short | 50691944
% of reads unmapped: too short | 99.64%
Number of reads unmapped: other | 13601
% of reads unmapped: other | 0.03%
CHIMERIC READS:
Number of chimeric reads | 0
% of chimeric reads | 0.00%
