Hi Alex, I have some issues with data that I received from collaborators. I have normally worked with Smart-seq plate-based protocols, but these samples are 10x, and I don't have much information about them. I have followed the workflows from some other issues, but I don't understand why I can't solve the problem. I'm not sure I'm doing this correctly, so I'm going to copy, more or less, the steps I'm following. From the FASTQ files, I can see sequencer-specific read IDs of the form @Kxxxx (HiSeq 3000(?)/4000) and flowcell IDs ending in BBXX(?), which suggests a HiSeq 3000/4000 run.
#####################################################
The only information I have about the sample:
hashtag_oligo | well10X | RunID
TGTCTTTCCTGCCAG | 3 | 20200811-SCS47-2
From that, I assumed that I need to use the 3M-february-2018.txt whitelist for the barcodes.
I built a human genome index with 100 (I don't know if that is enough).
#####################################################
Based on different answers on forums, I ran STAR (the full command is given further down) and got this error:
EXITING because of FATAL ERROR in input read file: the total length of barcode sequence is 151 not equal to expected 28
Read ID=@K00360:651:HHKHYBBXY:1:1101:3640:1086 ; Sequence=NGGTACATCGGTAATTCCCTTTCGAGGTTTGCTAGGACCGGCNGTANAGNCCGANGGCTNNACATCTGGCAACCGNANTTCATNANANCNGAAGAGNANACGNCTGAACTCCAGTCACTCTCGTTTATCTCGTATGCCGTCTTCTGCTTGA
SOLUTION: check the formatting of input read files.
If UMI+CB length is not equal to the barcode read length, specify barcode read length with --soloBarcodeReadLength
To avoid checking of barcode read length, specify --soloBarcodeReadLength 0
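The error compares the length of the barcode read (151) with the expected cell barcode plus UMI length (16 + 12 = 28, from --soloCBlen and --soloUMIlen in the STAR command used in this post). A minimal sketch for checking which read lengths a FASTQ file actually contains is shown here; the helper names `read_length_counts` and `open_fastq` are hypothetical, and the code assumes plain four-line FASTQ records.

```python
import gzip
import io
from collections import Counter


def read_length_counts(handle, max_reads=100_000):
    """Count sequence lengths in the first max_reads records.

    Assumes standard four-line FASTQ records (no wrapped sequences).
    """
    counts = Counter()
    for i, line in enumerate(handle):
        if i // 4 >= max_reads:
            break
        if i % 4 == 1:  # second line of each record is the sequence
            counts[len(line.rstrip("\n"))] += 1
    return counts


def open_fastq(path):
    """Open a plain or gzip-compressed FASTQ file in text mode."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


# Tiny in-memory demonstration: one 28-base read and one 10-base read.
demo_lines = ["@r1", "A" * 28, "+", "I" * 28, "@r2", "C" * 10, "+", "I" * 10]
demo = io.StringIO("\n".join(demo_lines) + "\n")
demo_counts = read_length_counts(demo)
print(dict(demo_counts))
```

On real data one would call something like `read_length_counts(open_fastq("reads_R1.fastq"))` (a hypothetical file name) for each of the two read files and compare the dominant lengths with the expected 28.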
########################################################
--soloBarcodeReadLength 150
--soloBarcodeReadLength 151
I tried adding each of these two options. The first did not work; the second one worked:
Aug 22 18:19:57 ..... started STAR run
Aug 22 18:19:58 ..... loading genome
Aug 22 18:20:42 ..... started mapping
Aug 22 18:26:19 ..... finished mapping
Aug 22 18:26:20 ..... started Solo counting
Aug 22 18:26:36 ..... finished Solo counting
Aug 22 18:26:36 ..... started sorting BAM
Aug 22 18:26:39 ..... finished successfully
########################################################
But this is the Log.final.out file:
Started job on | Aug 22 18:19:57
Started mapping on | Aug 22 18:20:42
Finished on | Aug 22 18:26:39
Mapping speed, Million of reads per hour | 513.01
Number of input reads | 50873627
Average input read length | 150
UNIQUE READS:
Uniquely mapped reads number | 148635
Uniquely mapped reads % | 0.29%
Average mapped length | 116.95
Number of splices: Total | 26365
Number of splices: Annotated (sjdb) | 26216
Number of splices: GT/AG | 26276
Number of splices: GC/AG | 75
Number of splices: AT/AC | 9
Number of splices: Non-canonical | 5
Mismatch rate per base, % | 1.93%
Deletion rate per base | 0.01%
Deletion average length | 1.77
Insertion rate per base | 0.02%
Insertion average length | 1.62
MULTI-MAPPING READS:
Number of reads mapped to multiple loci | 19370
% of reads mapped to multiple loci | 0.04%
Number of reads mapped to too many loci | 77
% of reads mapped to too many loci | 0.00%
UNMAPPED READS:
Number of reads unmapped: too many mismatches | 0
% of reads unmapped: too many mismatches | 0.00%
Number of reads unmapped: too short | 50691944
% of reads unmapped: too short | 99.64%
Number of reads unmapped: other | 13601
% of reads unmapped: other | 0.03%
CHIMERIC READS:
Number of chimeric reads | 0
% of chimeric reads | 0.00%
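As a sanity check on the log above, the read categories can be summed and converted to percentages of the input reads. This is only a sketch using the numbers copied from the log; the dictionary labels are shorthand chosen here, not STAR field names.

```python
# Counts copied from the log above.
input_reads = 50873627
counts = {
    "uniquely mapped": 148635,
    "multi-mapped": 19370,
    "mapped to too many loci": 77,
    "unmapped: too many mismatches": 0,
    "unmapped: too short": 50691944,
    "unmapped: other": 13601,
    "chimeric": 0,
}

# Percentage of input reads in each category, rounded like the log.
percentages = {name: round(100 * n / input_reads, 2) for name, n in counts.items()}
accounted = sum(counts.values())

for name, pct in percentages.items():
    print(f"{name:32s} {pct:6.2f}%")
print("all input reads accounted for:", accounted == input_reads)
```

The categories add up exactly to the number of input reads, so essentially the whole library falls into the "too short" class.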
For reference, this is how I prepared and inspected the FASTQ files for one sample:
gunzip *.fastq.gz
cat file1.fastq file2.fastq > bigfile.fastq
head -n 40 file.fastq
@K00360:651:HHKHYBBXY:1:1101:3640:1086 1:N:0:NCTCGTTT
NCTCGTTT
+
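The header line above follows the usual Illumina layout: instrument, run, flowcell, lane, tile, x and y separated by colons, then a space and read number, filter flag, control bits and index sequence. A small sketch for splitting such a header into named fields follows; the function name `parse_illumina_header` and the field labels are hypothetical choices made for illustration.

```python
def parse_illumina_header(header):
    """Split an Illumina-style FASTQ header into named fields."""
    read_id, _, comment = header.lstrip("@").partition(" ")
    keys = ("instrument", "run", "flowcell", "lane", "tile", "x", "y")
    fields = dict(zip(keys, read_id.split(":")))
    if comment:
        # Part after the space: read number, filter flag, control bits, index.
        extra = ("read", "filtered", "control", "index")
        fields.update(zip(extra, comment.split(":")))
    return fields


hdr = parse_illumina_header(
    "@K00360:651:HHKHYBBXY:1:1101:3640:1086 1:N:0:NCTCGTTT"
)
print(hdr["instrument"], hdr["flowcell"], hdr["read"], hdr["index"])
```

This makes it easy to confirm which file holds read 1 and which holds read 2, and which flowcell and index a file came from.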
The full STAR command I ran:
STAR --genomeDir ./indexHuman100 \
  --readFilesIn 20200811-SCS47-2-HT_S4_R2.fastq 20200811-SCS47-2-HT_S4_R1.fastq \
  --outFileNamePrefix scRNA20200811-SCS47-2-HT \
  --outFilterType BySJout --outFilterMultimapNmax 20 --alignIntronMax 100000 \
  --outFilterMismatchNmax 4 --outFilterMatchNminOverLread 0.3 \
  --outFilterScoreMinOverLread 0.3 --outFilterScoreMin 30 --alignEndsType Local \
  --soloType CB_UMI_Simple --soloCBstart 1 --soloCBlen 16 --soloUMIstart 17 --soloUMIlen 12 \
  --soloCBmatchWLtype 1MM_multi_Nbase_pseudocounts --soloUMIfiltering MultiGeneUMI_CR \
  --soloUMIdedup 1MM_CR --runThreadN 128 --clipAdapterType CellRanger4 \
  --outSAMtype BAM SortedByCoordinate --outSAMattributes CR UR CY UY CB UB NH HI GX GN \
  --soloFeatures Gene --soloCBwhitelist 3M-february-2018.txt
From the Solo.out output:
noNoAdapter 0
noNoUMI 0
noNoCB 0
noNinCB 0
noNinUMI 4566
noUMIhomopolymer 1216
noNoWLmatch 316720
noTooManyMM 0
noTooManyWLmatches 0
yesWLmatchExact 49288951
yesOneWLmatchWithMM 476773
yesMultWLmatchWithMM 785401
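To put the barcode statistics above in perspective, the "yes" categories can be added up and compared with the total. This sketch assumes that entries starting with "yes" are reads whose barcode was matched to the whitelist and that entries starting with "no" are reads rejected at the barcode stage; the numbers are copied from the listing above.

```python
# Barcode statistics copied from the listing above.
barcode_stats = {
    "noNoAdapter": 0,
    "noNoUMI": 0,
    "noNoCB": 0,
    "noNinCB": 0,
    "noNinUMI": 4566,
    "noUMIhomopolymer": 1216,
    "noNoWLmatch": 316720,
    "noTooManyMM": 0,
    "noTooManyWLmatches": 0,
    "yesWLmatchExact": 49288951,
    "yesOneWLmatchWithMM": 476773,
    "yesMultWLmatchWithMM": 785401,
}

# Assumption: "yes*" entries are reads with an accepted whitelist barcode.
with_barcode = sum(n for name, n in barcode_stats.items() if name.startswith("yes"))
total = sum(barcode_stats.values())

print(f"reads with a whitelist barcode: {with_barcode} of {total} "
      f"({100 * with_barcode / total:.2f}%)")
```

The total equals the number of input reads in the mapping log, so under this assumption nearly all reads carry a usable barcode and the loss happens at the mapping step rather than at barcode matching.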
And the beginning of the matrix file (matrix.txt):
%%MatrixMarket matrix coordinate integer general
%
62710 6794880 81035
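In the Matrix Market coordinate format, the first non-comment line gives the number of rows, the number of columns and the number of stored entries. The sketch below reads those three values from the lines shown above; `parse_mtx_size` is a hypothetical helper, and interpreting rows as features and columns as barcodes is an assumption about how this matrix was written.

```python
def parse_mtx_size(lines):
    """Return (rows, columns, entries) from Matrix Market header lines."""
    for line in lines:
        if line.startswith("%"):
            continue  # banner and comment lines start with '%'
        rows, cols, entries = (int(x) for x in line.split())
        return rows, cols, entries
    raise ValueError("no size line found")


header = [
    "%%MatrixMarket matrix coordinate integer general",
    "%",
    "62710 6794880 81035",
]
rows, cols, entries = parse_mtx_size(header)
print(f"rows={rows} columns={cols} stored entries={entries}")
print(f"fraction of non-zero cells: {entries / (rows * cols):.2e}")
```

With only 81035 stored entries, the matrix is almost entirely empty, which is consistent with the very low mapping rate in the log.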
#########################################################
Almost none of the reads are mapped. Do you have any suggestions? Thank you in advance for your help.
Have a nice day.
Mayte