Detailed Description
The embodiments of the present invention are described below by way of specific examples, and those skilled in the art can readily understand other advantages and effects of the invention from the disclosure of this specification. The invention can also be implemented or applied through other, different embodiments, and the details in this specification can be modified or changed in various respects without departing from the spirit and scope of the present invention.
Before the present embodiments are further described, it is to be understood that the scope of the invention is not limited to the particular embodiments described below; it is also to be understood that the terminology used in the examples is for the purpose of describing particular embodiments, and is not intended to limit the scope of the present invention; in the description and claims of the present application, the singular forms "a", "an" and "the" include plural referents unless the context clearly dictates otherwise.
When numerical ranges are given in the examples, it is understood that both endpoints of each range and any value between them can be selected, unless otherwise indicated in the invention. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. In addition to the specific methods, devices, and materials used in the examples, any methods, devices, and materials similar or equivalent to those described in the examples can be used in the practice of the invention, based on the knowledge of one skilled in the art and the description of the invention.
Unless otherwise indicated, the experimental methods, detection methods, and preparation methods disclosed herein all employ techniques conventional in the art of molecular biology, biochemistry, chromatin structure and analysis, analytical chemistry, cell culture, recombinant DNA technology, and related arts.
The detection method for tag sequence contamination of a sequencing platform comprises at least the following steps:
(1) linking each tag sequence to be tested to a known sequence to obtain a tag sequence-known sequence, wherein the types of tag sequences of the sequencing platform are fixed and known;
(2) sequencing the tag sequence-known sequence obtained in step (1) to obtain a sequencing result, wherein the sequencing result comprises the base sequences and the number of sequences;
(3) splitting the sequencing result by tag sequence type; if a known sequence rm appears not only in the class of its corresponding tag sequence Tm but also in the class of another tag sequence Tn, the tag sequence Tm is contaminated by the other tag sequence Tn; wherein m and n are natural numbers and m is not equal to n.
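Step (3) can be illustrated with a minimal sketch. It assumes that each read has already been reduced to a pair (tag label, known-sequence label); the function names and the toy read list are hypothetical and are not part of the described method. The sketch only reports where a known sequence turns up outside the class of its own tag.

```python
from collections import Counter, defaultdict

def bin_reads_by_tag(reads):
    """Group reads by tag and count each known sequence within every tag class."""
    bins = defaultdict(Counter)
    for tag, known in reads:
        bins[tag][known] += 1
    return bins

def find_cross_bin_hits(bins, expected):
    """Return (tag_m, tag_n, known_m, count) for each known sequence r_m
    that is found in the class of a tag T_n other than its own tag T_m."""
    owner = {known: tag for tag, known in expected.items()}
    hits = []
    for tag_n, counts in bins.items():
        for known, count in counts.items():
            tag_m = owner.get(known)
            if tag_m is not None and tag_m != tag_n:
                hits.append((tag_m, tag_n, known, count))
    return sorted(hits)

# Hypothetical pairing and toy reads (illustration only, not measured data).
expected = {"T1": "r1", "T2": "r2", "T3": "r3"}
reads = [("T1", "r1")] * 5 + [("T2", "r2")] * 4 + [("T3", "r3")] * 6 + [("T2", "r1")]

bins = bin_reads_by_tag(reads)
hits = find_cross_bin_hits(bins, expected)
print(hits)  # one read of r1 was found in the T2 class
```

An empty result would mean that every known sequence was found only in the class of its own tag.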
The tag sequence refers to a nucleic acid fragment used for distinguishing different samples during mixed sequencing.
The known sequence refers to a nucleic acid fragment having a known base sequence.
The number of sequences refers to the number of nucleic acid fragments.
Further, when detecting whether a tag sequence of the sequencing platform is contaminated, steps (1)-(3) can be used to test the tag sequence directly.
There may be one or more other known sequences rn. When there are several other known sequences rn, the tag sequence Tm contaminates several tag sequences. For example, if the class of the tag sequence T1 contains the other known sequences r2 and r5 in addition to its own corresponding known sequence r1, the tag sequence T1 corresponding to the known sequence r1 contaminates the tag sequence T2 corresponding to the known sequence r2 and the tag sequence T5 corresponding to the known sequence r5.
Further, in step (1), the known sequence is derived from the genome of any species.
In a further embodiment, in step (1), the known sequence is derived from an archaeal or phage genome. Using archaeal or bacteriophage genomes as the source of the known sequences is convenient and avoids sequences that are present in multiple copies.
Preferably, in a single assay the known sequences are all from the same organism or the same individual organism. This ensures, to the greatest extent, that each known sequence is different.
In one embodiment, the bacteriophage is selected from lambda bacteriophage.
In one embodiment, the archaea is selected from the phyla Korarchaeota, Nanoarchaeota, Thaumarchaeota or Euryarchaeota, for example Thermococcus, Pyrodictium or Halobacterium (Halobacterium salinarum).
The tag sequence may be a single-ended or double-ended barcode.
The invention can detect whether one tag sequence or a plurality of tag sequences are contaminated.
In step (1), when more than one type of tag sequence is to be tested, the known sequences linked to different tag sequences are different. That is, in step (1) the tag sequences and the known sequences must be linked in one-to-one correspondence: each tag sequence to be tested is linked to exactly one known sequence, and each known sequence to exactly one tag sequence.
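The one-to-one requirement can be checked before library preparation with a short sketch such as the following. The helper name and the pair lists are hypothetical; the labels T1-T15 and r1-r15 follow the naming used in Example 1.

```python
def check_one_to_one(pairs):
    """Return (duplicated_tags, duplicated_knowns) for a list of (tag, known) pairs.
    Both lists are empty when the pairing is one-to-one."""
    tags = [tag for tag, _ in pairs]
    knowns = [known for _, known in pairs]
    dup_tags = sorted({t for t in tags if tags.count(t) > 1})
    dup_knowns = sorted({k for k in knowns if knowns.count(k) > 1})
    return dup_tags, dup_knowns

# Pairing in the style of Example 1: T1-r1 ... T15-r15.
good_pairs = [(f"T{i}", f"r{i}") for i in range(1, 16)]
# A deliberately wrong pairing in which r1 is linked to two tags.
bad_pairs = [("T1", "r1"), ("T2", "r1"), ("T3", "r3")]

print(check_one_to_one(good_pairs))
print(check_one_to_one(bad_pairs))
```

If a known sequence were shared by two tags, its appearance in either class could no longer be attributed to a single tag, which is why the pairing has to be unique in both directions.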
Further, the detection method also comprises the following step: calculating the contamination ratio between the tag sequence Tm and the tag sequence Tn as follows:
contamination ratio = (number of sequences of the known sequence rm in the class of the tag sequence Tn) / (number of sequences of the known sequence rm in the class of the tag sequence Tm) × 100%.
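The ratio can be computed for every pair of tag classes at once. The sketch below assumes the per-class counts are available as a nested dictionary; the function name and the counts are hypothetical and serve only to illustrate the formula.

```python
def contamination_ratios(bins, expected):
    """bins: {tag: {known: count}}; expected: {tag: known}.
    Returns {(tag_m, tag_n): percent}, where r_m is the known sequence of T_m
    and percent = count(r_m in T_n) / count(r_m in T_m) * 100."""
    ratios = {}
    for tag_m, known_m in expected.items():
        own = bins.get(tag_m, {}).get(known_m, 0)
        if own == 0:
            continue  # the ratio is undefined without reads in the home class
        for tag_n, counts in bins.items():
            if tag_n == tag_m:
                continue
            stray = counts.get(known_m, 0)
            if stray:
                ratios[(tag_m, tag_n)] = stray / own * 100.0
    return ratios

# Hypothetical counts (illustration only, not measured data).
bins = {"T1": {"r1": 2000}, "T2": {"r2": 1800, "r1": 5}}
expected = {"T1": "r1", "T2": "r2"}

ratios = contamination_ratios(bins, expected)
for (tag_m, tag_n), pct in ratios.items():
    print(f"r of {tag_m} found in {tag_n}: {pct:.2f}%")
```

Pairs with no stray reads are omitted from the result, so an empty dictionary corresponds to no detectable contamination.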
Preferably, the length of the known sequence is 100-250 bp.
The tag sequence to be tested and the known sequence can be linked by PCR or by a ligase. The tag sequence may be linked to the known sequence directly or indirectly. When linked indirectly, the linkage may be made through a primer sequence. Here the primer sequence refers to a linker commonly used in sequencing, such as a short Y-linker or a long Y-linker.
In one embodiment, in the sequencing mode of multiplex pooling, the tag sequence can be linked to a known sequence by PCR methods.
In one embodiment, in the sequencing mode of the whole genome library, if a short Y-linker is used, the tag sequence is first ligated to the short Y-linker and the known sequence is then ligated to the short Y-linker by PCR to obtain the tag sequence-short Y-linker-known sequence.
In one embodiment, if a long Y-linker that already contains the tag sequence is used, the known sequence can be indirectly linked to the tag sequence by ligating the long Y-linker to the known sequence with a ligase.
Optionally, the sequencing platform is an Illumina next generation sequencing platform.
The detection method for tag sequence contamination of the sequencing platform can be used for gene sequencing.
Example 1
1.1 Reagents used: 2× Taq Plus Master Mix (Dye Plus), Ligation Module (Yeasen), DNA purification magnetic beads (Yeasen), 2× KAPA Enzyme Mix, NextSeq 500/550 Mid Output Reagent Cartridge v2, NextSeq Accessory Box v2, NextSeq 500/550 Mid Output Flow Cell Cartridge v2, NextSeq 500/550 Buffer Cartridge v2.
1.2 Fragment amplification: DNA was amplified using 2× Taq Plus Master Mix (Dye Plus), with lambda phage DNA as the template and the sequences listed in Table 1 as primers, using the reaction system in Table 2 and the reaction conditions in Table 3.
TABLE 1 primer List
TABLE 2 reaction System
PCR component | Volume (μL)
λDNA, 50 ng/μL | 1
Forward primer, 10 μM | 1
Reverse primer, 10 μM | 1
2× Taq Plus Master Mix | 25
Water | 22
Total | 50
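As a quick arithmetic check on the reaction system of Table 2, the per-reaction volumes can be summed and scaled to the 15 fragments r1-r15. This is only a sketch: the dictionary keys and the helper name are hypothetical, and the scaled figures are total reagent consumption, since the primers differ for each fragment.

```python
# Per-reaction volumes in microlitres, copied from Table 2.
PER_REACTION_UL = {
    "lambda DNA, 50 ng/uL": 1,
    "Forward primer, 10 uM": 1,
    "Reverse primer, 10 uM": 1,
    "2x Taq Plus Master Mix": 25,
    "Water": 22,
}

def scale_recipe(recipe, n_reactions):
    """Multiply every component volume by the number of reactions."""
    if n_reactions < 1:
        raise ValueError("n_reactions must be at least 1")
    return {name: volume * n_reactions for name, volume in recipe.items()}

total_per_reaction = sum(PER_REACTION_UL.values())
print(total_per_reaction)  # 50, matching the Total row of Table 2

# Total consumption for the 15 amplifications that produce r1-r15.
batch = scale_recipe(PER_REACTION_UL, 15)
for name, volume in batch.items():
    print(f"{name}: {volume} uL")
```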
TABLE 3 reaction conditions
The PCR product was detected by agarose gel electrophoresis, and as can be seen from FIG. 1, the size of the detected amplified fragment was consistent with the size of the expected target fragment.
1.3 Purification of PCR products: the PCR products were purified with 1.4× sample volume of Oxin DNA purification beads, washed twice with 80% ethanol, and eluted with 50 μL TE, and the concentration was determined. The known sequences r1-r15 were thereby obtained.
1.4 Ligation: a ligation reaction system was prepared as shown in Table 5, and each tag sequence to be tested was linked to a known sequence through a primer sequence under the reaction conditions shown in Table 6. The nucleotide sequence of the primer sequence is:
CAAGCAGAAGACGGCATACGAGATNNNNNNGTGACTGGAGTTCCTTGGCACCCGAGAATTC (SEQ ID NO: 46), wherein NNNNNN represents the tag sequence; the specific tag sequences are shown in Table 4. The primer sequences were each ligated to a known sequence, that is, the known sequences were linked to the tag sequences to be tested T1-T15 to obtain the tag sequence-known sequences, specifically T1-r1, T2-r2, T3-r3, T4-r4, T5-r5, T6-r6, T7-r7, T8-r8, T9-r9, T10-r10, T11-r11, T12-r12, T13-r13, T14-r14 and T15-r15. The tag sequences to be tested are barcode sequences.
TABLE 4 tag sequences
Tag number | Tag sequence | SEQ ID
T1 | TTATAT | NO:31
T2 | ACCAAC | NO:32
T3 | CTATGC | NO:33
T4 | ATTCCT | NO:34
T5 | CAACTC | NO:35
T6 | TTAGGC | NO:36
T7 | AGGATC | NO:37
T8 | CAGCAA | NO:38
T9 | AAGTAG | NO:39
T10 | ACAGTG | NO:40
T11 | GGTCCA | NO:41
T12 | GATCAG | NO:42
T13 | ATTATG | NO:43
T14 | GGCTAC | NO:44
T15 | GAACCT | NO:45
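The 15 tag-bearing primer sequences can be written out by substituting each tag of Table 4 for the NNNNNN placeholder of SEQ ID NO: 46. The sketch below assumes that the tag is inserted exactly as listed in Table 4 (not as its reverse complement), which the text does not state explicitly; the function and variable names are hypothetical.

```python
# Primer template from SEQ ID NO: 46; NNNNNN marks the tag position.
PRIMER_TEMPLATE = "CAAGCAGAAGACGGCATACGAGATNNNNNNGTGACTGGAGTTCCTTGGCACCCGAGAATTC"

# Tag sequences copied from Table 4.
TAGS = {
    "T1": "TTATAT", "T2": "ACCAAC", "T3": "CTATGC", "T4": "ATTCCT",
    "T5": "CAACTC", "T6": "TTAGGC", "T7": "AGGATC", "T8": "CAGCAA",
    "T9": "AAGTAG", "T10": "ACAGTG", "T11": "GGTCCA", "T12": "GATCAG",
    "T13": "ATTATG", "T14": "GGCTAC", "T15": "GAACCT",
}

def build_primer(tag, template=PRIMER_TEMPLATE, placeholder="NNNNNN"):
    """Substitute a tag for the placeholder (assumption: the tag is used as listed)."""
    if len(tag) != len(placeholder) or set(tag) - set("ACGT"):
        raise ValueError(f"invalid tag: {tag!r}")
    if template.count(placeholder) != 1:
        raise ValueError("template must contain the placeholder exactly once")
    return template.replace(placeholder, tag)

primers = {name: build_primer(seq) for name, seq in TAGS.items()}
print(primers["T1"])
print(len(set(TAGS.values())) == len(TAGS))  # True when all tags are distinct
```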
TABLE 5 Ligation system
Reagent | Volume (μL)
Yeasen ligation buffer | 10
Ligase | 2.5
DNA (10 ng/μL) | 3
Tag sequence to be tested (barcode sequence) | 2
Water | 32.5
Total | 50
TABLE 6 reaction conditions
Reaction temperature | Reaction time
4℃ | Hold
22℃ | 60 min
4℃ | Hold
After ligation was complete, the ligation product was purified with 1.0× sample volume of Yeasen DNA purification magnetic beads, washed twice with 80% ethanol, and eluted with 12 μL TE;
1.5 Library amplification: all tag sequence-known sequences were mixed, a ligation reaction system was prepared as shown in Table 7, and linker (P5/P7) ligation was performed under the reaction conditions shown in Table 8.
TABLE 7 Ligation system
Reagent | Volume (μL)
KAPA HiFi Mix | 12.5
P5/P7 linker | 2
DNA (tag sequence-known sequence) | 10.5
Total | 25
TABLE 8 reaction conditions
After amplification was complete, the product was purified with 1.2× sample volume of Yeasen DNA purification magnetic beads, washed twice with 80% ethanol, and eluted with 30 μL TE to obtain the DNA library to be tested;
1.6 quality control
Measuring the DNA concentration and fragment size in the amplified sequencing library by using a Qubit dye and 1.5% agarose gel electrophoresis;
1.7 dilution of library: diluting the constructed libraries to 10nM and mixing in an amount that yields 1M data per library;
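Diluting a library to 10 nM requires converting the measured mass concentration and fragment size into a molar concentration. The sketch below uses the common approximation of 660 g/mol per base pair of double-stranded DNA; the example concentration, fragment length and volume are hypothetical, since the measured values are not given here.

```python
def ng_per_ul_to_nm(conc_ng_per_ul, mean_fragment_bp, da_per_bp=660.0):
    """Convert a dsDNA mass concentration (ng/uL) to molarity (nM)."""
    if mean_fragment_bp <= 0:
        raise ValueError("mean_fragment_bp must be positive")
    return conc_ng_per_ul * 1e6 / (da_per_bp * mean_fragment_bp)

def diluent_volume_ul(stock_nm, stock_volume_ul, target_nm=10.0):
    """Volume of diluent to add so that the stock reaches the target molarity."""
    if stock_nm < target_nm:
        raise ValueError("stock is already below the target concentration")
    return stock_volume_ul * (stock_nm / target_nm - 1.0)

# Hypothetical library: 5 ng/uL by fluorometry, 300 bp mean fragment size.
stock_nm = ng_per_ul_to_nm(5.0, 300)
to_add = diluent_volume_ul(stock_nm, stock_volume_ul=2.0)
print(f"{stock_nm:.2f} nM")  # 25.25 nM
print(f"{to_add:.2f} uL")    # 3.05 uL
```

Once every library is at the same molarity, pooling equal volumes gives each library roughly the same share of the run.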
1.8 Sequencing: the samples were sequenced on an Illumina CN500 second-generation sequencing platform; after sequencing was complete, the data were split by tag sequence, and the results are shown in Tables 9, 10 and 11;
TABLE 9 Credit results-1
TABLE 10 Credit results-2
TABLE 11 Credit results-3
The data were analyzed as follows. As shown in Table 9, when the data were split by the tag sequence T2 corresponding to r2, no known sequence other than r2 was found under the tag sequence T2, indicating that the remaining 14 barcodes are not contaminated by the barcode corresponding to r2. As shown in Table 10, when the data were split by the tag sequence T14 corresponding to r14, r11 was found under the tag sequence T14 in addition to r14, indicating that the barcode corresponding to r14 contaminates the barcode corresponding to r11. According to Table 11, when the data were split by the tag sequence T11 corresponding to r11, the number of r11 sequences was 5612, so the contamination ratio was 4/5612, i.e. 0.07%.
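The 0.07% figure can be reproduced with a few lines. The sketch assumes that 4 is the number of r11 sequences found under the tag sequence T14 (Table 10) and that 5612 is the number of r11 sequences under T11 (Table 11); the variable names are illustrative.

```python
stray_r11_reads = 4     # r11 sequences under T14 (assumed reading of Table 10)
home_r11_reads = 5612   # r11 sequences under T11 (Table 11)

ratio_percent = stray_r11_reads / home_r11_reads * 100
print(f"{ratio_percent:.2f}%")  # 0.07%
```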
The above examples are intended to illustrate the disclosed embodiments of the invention and are not to be construed as limiting the invention. In addition, various modifications of the methods and compositions set forth herein, as well as variations of the methods and compositions of the present invention, will be apparent to those skilled in the art without departing from the scope and spirit of the invention. While the invention has been specifically described in connection with various specific preferred embodiments thereof, it should be understood that the invention should not be unduly limited to such specific embodiments. Indeed, various modifications of the above-described embodiments which are obvious to those skilled in the art to which the invention pertains are intended to be covered by the scope of the present invention.
Sequence listing
<110> Shanghai Wehn biomedical science and technology, Inc
<120> detection method for sequencing platform tag sequence contamination
<160>46
<170>SIPOSequenceListing 1.0
<210>1
<211>15
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>1
gctgacattt tcggt 15
<210>2
<211>15
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>2
tggcctgccg cagtt 15
<210>3
<211>15
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>3
cagccaggaa ctatt 15
<210>4
<211>16
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>4
gttttccagt tccgga 16
<210>5
<211>15
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>5
atccgtgagg tgaat 15
<210>6
<211>15
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>6
cagcgacgga atatc 15
<210>7
<211>15
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>7
gatattgaac aggaa 15
<210>8
<211>15
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>8
taagatactg ctcct 15
<210>9
<211>15
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>9
gtcatccgcc agcag 15
<210>10
<211>15
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>10
agtctttgac aatct 15
<210>11
<211>15
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>11
tatcgactcc cagct 15
<210>12
<211>15
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>12
catttctgca ccatt 15
<210>13
<211>15
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>13
tccgtctacg gaaag 15
<210>14
<211>15
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>14
tcgggaagtg aacgg 15
<210>15
<211>15
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>15
gacgcaatga ggcac 15
<210>16
<211>15
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>16
tcatcctctc cggat 15
<210>17
<211>15
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>17
atgacctgat gacag 15
<210>18
<211>15
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>18
atacataaaa tcctg 15
<210>19
<211>16
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>19
gaatatgccg gttatc 16
<210>20
<211>16
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>20
cctgatgcag ctggat 16
<210>21
<211>16
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>21
gaagcggcat ggaaag 16
<210>22
<211>16
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>22
ctgaccatcc ggaact 16
<210>23
<211>15
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>23
tattacgtca gcgag 15
<210>24
<211>16
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>24
tgcccgtcct ccacgg 16
<210>25
<211>15
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>25
cagcgtgatg gagca 15
<210>26
<211>16
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>26
ccaatccagc cggtca 16
<210>27
<211>16
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>27
tgcagacggc tcagga 16
<210>28
<211>16
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>28
aaagtacgcc cacgac 16
<210>29
<211>15
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>29
gaaagaagtt cagga 15
<210>30
<211>15
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>30
gattcaaatg ctgca 15
<210>31
<211>6
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>31
ttatat 6
<210>32
<211>6
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>32
accaac 6
<210>33
<211>6
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>33
ctatgc 6
<210>34
<211>6
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>34
attcct 6
<210>35
<211>6
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>35
caactc 6
<210>36
<211>6
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>36
ttaggc 6
<210>37
<211>6
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>37
aggatc 6
<210>38
<211>6
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>38
cagcaa 6
<210>39
<211>6
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>39
aagtag 6
<210>40
<211>6
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>40
acagtg 6
<210>41
<211>6
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>41
ggtcca 6
<210>42
<211>6
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>42
gatcag 6
<210>43
<211>6
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>43
attatg 6
<210>44
<211>6
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>44
ggctac 6
<210>45
<211>6
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>45
gaacct 6
<210>46
<211>61
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>46
caagcagaag acggcatacg agatnnnnnn gtgactggag ttccttggca cccgagaatt 60
c 61