CN111304309A - Detection method for sequencing platform tag sequence pollution - Google Patents

Publication number
CN111304309A
Authority
CN
China
Prior art keywords
sequence
tag
sequencing
sequences
sequencing platform
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010152944.6A
Other languages
Chinese (zh)
Inventor
林健
杨敬敏
覃振东
唐嘉婕
朱学萍
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Wickham Biomedical Technology Co ltd
Original Assignee
Shanghai Wickham Biomedical Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Wickham Biomedical Technology Co ltd
Priority to CN202010152944.6A
Publication of CN111304309A

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6869 Methods for sequencing
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G16B30/10 Sequence alignment; Homology search

Landscapes

  • Life Sciences & Earth Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Engineering & Computer Science (AREA)
  • Organic Chemistry (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Wood Science & Technology (AREA)
  • Zoology (AREA)
  • Biotechnology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biophysics (AREA)
  • Analytical Chemistry (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Theoretical Computer Science (AREA)
  • Immunology (AREA)
  • Microbiology (AREA)
  • Molecular Biology (AREA)
  • Medical Informatics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biochemistry (AREA)
  • General Engineering & Computer Science (AREA)
  • Genetics & Genomics (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention provides a method for detecting tag sequence contamination on a sequencing platform, comprising at least the following steps: (1) linking each tag sequence to be tested to a known sequence to obtain a tag sequence-known sequence construct, wherein the types of tag sequence used on the sequencing platform are fixed and known; (2) sequencing the constructs obtained in step (1) to obtain a sequencing result, wherein the sequencing result comprises the base sequence of each read and the number of reads of each sequence; (3) splitting the sequencing result by tag sequence type; if a known sequence rm appears not only in the class of its corresponding tag sequence Tm but also in the class of another tag sequence Tn, the tag sequence Tm is contaminated by the other tag sequence Tn; wherein m and n are natural numbers and m is not equal to n. By verifying the uniqueness of each tag sequence, the method ensures the credibility of the subsequent sequencing output data.

Description

Detection method for sequencing platform tag sequence pollution
Technical Field
The invention relates to the field of bioinformatics and biotechnology, in particular to a detection method for sequencing platform tag sequence pollution.
Background
In recent years, as the technology has developed and sequencing costs have fallen, high-throughput sequencing has spread from scientific research into everyday life. At present the technology is mainly applied to genome sequencing, RNA sequencing, DNA methylation analysis and the like, and more broadly to tumors, genetic diseases, metagenomes and the like. High-throughput sequencing data are assigned to their corresponding samples by tag sequence. If a tag (index) sequence is not homogeneous, or other contamination is introduced during handling, the tag sequences are no longer pure, and errors occur in subsequent sample assignment and biological analysis.
Disclosure of Invention
In view of the above-mentioned shortcomings of the prior art, the present invention aims to provide a method for detecting tag sequence contamination of a sequencing platform. The method at least comprises the following steps:
(1) linking each tag sequence to be tested to a known sequence to obtain a tag sequence-known sequence construct, wherein the types of tag sequence used on the sequencing platform are fixed and known;
(2) sequencing the constructs obtained in step (1) to obtain a sequencing result, wherein the sequencing result comprises the base sequence of each read and the number of reads of each sequence;
(3) splitting the sequencing result by tag sequence type; if a known sequence rm appears not only in the class of its corresponding tag sequence Tm but also in the class of another tag sequence Tn, the tag sequence Tm is contaminated by the other tag sequence Tn; wherein m and n are natural numbers and m is not equal to n.
The invention also provides use of the method for detecting sequencing platform tag sequence contamination in gene sequencing.
As mentioned above, the method for detecting the tag sequence contamination of the sequencing platform has the following beneficial effects:
The invention builds a library from single DNA sequences, sequences it on the instrument, and checks each tag sequence against the known sequences in the output data, thereby confirming that the tag sequence is free from contamination by other tag sequences. Verifying the uniqueness of each tag sequence ensures the credibility of the subsequent sequencing output data.
Drawings
FIG. 1 shows the electrophoresis results of PCR amplification of known sequences.
Detailed Description
The embodiments of the present invention are described below with reference to specific embodiments, and other advantages and effects of the present invention will be easily understood by those skilled in the art from the disclosure of the present specification. The invention is capable of other and different embodiments and of being practiced or of being carried out in various ways, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present invention.
Before the present embodiments are further described, it is to be understood that the scope of the invention is not limited to the particular embodiments described below; it is also to be understood that the terminology used in the examples is for the purpose of describing particular embodiments, and is not intended to limit the scope of the present invention; in the description and claims of the present application, the singular forms "a", "an" and "the" include plural referents unless the context clearly dictates otherwise.
When numerical ranges are given in the examples, it is understood that both endpoints of each of the numerical ranges and any value therebetween can be selected unless the invention otherwise indicated. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. In addition to the specific methods, devices, and materials used in the examples, any methods, devices, and materials similar or equivalent to those described in the examples herein can be used in the practice of the invention, as would be known to one skilled in the art and the description of the invention.
Unless otherwise indicated, the experimental methods, detection methods, and preparation methods disclosed herein all employ techniques conventional in the art of molecular biology, biochemistry, chromatin structure and analysis, analytical chemistry, cell culture, recombinant DNA technology, and related arts.
The detection method for the tag sequence pollution of the sequencing platform at least comprises the following steps:
(1) linking each tag sequence to be tested to a known sequence to obtain a tag sequence-known sequence construct, wherein the types of tag sequence used on the sequencing platform are fixed and known;
(2) sequencing the constructs obtained in step (1) to obtain a sequencing result, wherein the sequencing result comprises the base sequence of each read and the number of reads of each sequence;
(3) splitting the sequencing result by tag sequence type; if a known sequence rm appears not only in the class of its corresponding tag sequence Tm but also in the class of another tag sequence Tn, the tag sequence Tm is contaminated by the other tag sequence Tn; wherein m and n are natural numbers and m is not equal to n.
The tag sequence refers to a nucleic acid fragment used for distinguishing different samples during mixed sequencing.
The known sequence refers to a nucleic acid fragment having a known base sequence.
The number of sequences refers to the number of nucleic acid fragments.
Further, when detecting whether the sequencing platform tag sequence is contaminated, the steps (1) - (3) can be adopted to directly detect the tag sequence.
There may be one or more other known sequences rn. When there are several, this indicates that the tag sequence Tm contaminates several tag sequences. For example, if the class of tag sequence T1 contains the other known sequences r2 and r5 in addition to its own known sequence r1, this indicates that the tag sequence T1 corresponding to r1 contaminates the tag sequence T2 corresponding to r2 and the tag sequence T5 corresponding to r5.
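The check described above can be sketched in Python. This is only an illustration under stated assumptions: the function name `find_cross_hits`, the nested-dictionary layout of `counts`, and the read counts themselves are hypothetical and do not come from the patent. The sketch reports which known sequences turn up in a tag class they were not linked to, and leaves the interpretation of those hits to the rules given in the text.

```python
from typing import Dict, List, Tuple


def find_cross_hits(
    counts: Dict[str, Dict[str, int]],
    pairing: Dict[str, str],
) -> List[Tuple[str, str, int]]:
    """Return (tag_class, unexpected_known_sequence, read_count) triples.

    counts[tag][known] is the number of reads of `known` found in the class
    of `tag` after splitting; pairing[tag] is the known sequence that was
    linked to `tag` in step (1).
    """
    hits = []
    for tag, per_known in counts.items():
        expected = pairing[tag]
        for known, n_reads in per_known.items():
            if known != expected and n_reads > 0:
                hits.append((tag, known, n_reads))
    return sorted(hits)


# One-to-one pairing T1-r1 ... T15-r15, as in Example 1.
pairing = {f"T{i}": f"r{i}" for i in range(1, 16)}

# Hypothetical counts for illustration only (not data from the patent).
counts = {
    "T1": {"r1": 1000, "r2": 3, "r5": 1},
    "T2": {"r2": 900},
}

print(find_cross_hits(counts, pairing))
```

With these made-up counts the sketch flags r2 and r5 in the class of T1, which corresponds to the T1/r2/r5 example in the text.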
Further, in step (1), the known sequence is derived from the genome of any species.
In a further embodiment, in step (1), the known sequence is derived from an archaeal or phage genome. Using archaeal or phage genomes as the source of the known sequences is convenient and avoids sequences that are present in multiple copies.
Preferably, the known sequences are all from the same organism or the same individual organism in a single assay. The method can ensure that each known sequence is different to the maximum extent.
In one embodiment, the bacteriophage is selected from lambda bacteriophage.
In one embodiment, the archaea are selected from the phyla Korarchaeota, Nanoarchaeota, Thaumarchaeota or Euryarchaeota, for example Thermococcus, Pyrodictium or Halobacterium salinarum.
The tag sequence may be a single-ended or double-ended barcode.
The invention can detect whether one label sequence or a plurality of label sequences are polluted or not.
In the step (1), when the types of the tag sequences to be detected are more than 1, the known sequences connected with different tag sequences are different. That is, in step 1, the tag sequences and the known sequences need to be linked in a one-to-one correspondence, i.e., a tag sequence to be tested is only linked to a known sequence in a corresponding manner.
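The one-to-one requirement can be checked before library construction with a short Python sketch. The function name `validate_pairing` and the list-of-pairs input format are assumptions made for illustration; the patent does not prescribe any software for this step.

```python
from collections import Counter
from typing import List, Tuple


def validate_pairing(pairs: List[Tuple[str, str]]) -> List[str]:
    """Return a list of problems; an empty list means the pairing is one-to-one.

    Each element of `pairs` is (tag_name, known_sequence_name).
    """
    tag_counts = Counter(tag for tag, _ in pairs)
    known_counts = Counter(known for _, known in pairs)
    problems = [
        f"tag {tag} used {n} times" for tag, n in tag_counts.items() if n > 1
    ]
    problems += [
        f"known sequence {known} used {n} times"
        for known, n in known_counts.items()
        if n > 1
    ]
    return problems


# The pairing used in Example 1 (T1-r1 ... T15-r15) passes the check.
example_pairs = [(f"T{i}", f"r{i}") for i in range(1, 16)]
print(validate_pairing(example_pairs))
```

A pairing in which two tags share one known sequence would be reported, because reads of that known sequence could then no longer be attributed to a single tag.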
Further, the detection method further comprises the following steps: calculating the pollution ratio of the label sequence Tm to the label sequence Tn by adopting the following method:
contamination ratio = (number of reads of the known sequence rm in the class of tag sequence Tn) / (number of reads of the known sequence rm in the class of tag sequence Tm) × 100%.
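As a hedged illustration, the ratio can be computed as follows. The function name `contamination_ratio`, the nested-dictionary layout, and the read counts are assumptions for the sketch and are not taken from the patent.

```python
from typing import Dict


def contamination_ratio(
    counts: Dict[str, Dict[str, int]],
    tag_m: str,
    tag_n: str,
    known_m: str,
) -> float:
    """Percentage: reads of known_m in the class of tag_n, divided by
    reads of known_m in the class of its own tag tag_m, times 100."""
    own = counts.get(tag_m, {}).get(known_m, 0)
    if own == 0:
        raise ValueError(f"no reads of {known_m} in the class of {tag_m}")
    foreign = counts.get(tag_n, {}).get(known_m, 0)
    return 100.0 * foreign / own


# Hypothetical counts for illustration only.
counts = {
    "T1": {"r1": 2000},
    "T2": {"r2": 1800, "r1": 5},
}

ratio = contamination_ratio(counts, tag_m="T1", tag_n="T2", known_m="r1")
print(f"{ratio:.2f}%")
```

The denominator is checked explicitly because a known sequence with no reads in its own class would make the ratio undefined.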
Preferably, the length of the known sequence is 100-250 bp.
The tag sequence to be tested and the known sequence can be linked by PCR or by a ligase. The tag sequence may be linked to the known sequence directly or indirectly. When the linkage is indirect, it may be made through a primer sequence. Here the primer sequence refers to a linker commonly used in sequencing, such as a short Y-linker or a long Y-linker.
In one embodiment, in the sequencing mode of multiplex pooling, the tag sequence can be linked to a known sequence by PCR methods.
In one embodiment, in the sequencing mode of the whole genome library, if a short Y-linker is used, the tag sequence is first ligated to the short Y-linker and the known sequence is then ligated to the short Y-linker by PCR to obtain the tag sequence-short Y-linker-known sequence.
In one embodiment, if a long Y-linker that already contains the tag sequence is used, the long Y-linker is ligated to the known sequence by a ligase, which links the known sequence to the tag sequence indirectly.
Optionally, the sequencing platform is an Illumina next generation sequencing platform.
The detection method for tag sequence contamination of the sequencing platform can be used for gene sequencing.
Example 1
1.1 Reagents used: 2× Taq Plus Master Mix (Dye Plus), Ligation Module (Yeasen), DNA purification magnetic beads (Yeasen), 2× KAPA Enzyme Mix, NextSeq 500/550 Mid Output Reagent Cartridge v2, NextSeq Accessory Box v2, NextSeq 500/550 Mid Output Flow Cell Cartridge v2, NextSeq 500/550 Buffer Cartridge v2
1.2 Fragment amplification: DNA was amplified with 2× Taq Plus Master Mix (Dye Plus), using lambda phage DNA as the template, the sequences listed in Table 1 as primers, the reaction system in Table 2, and the reaction conditions in Table 3.
TABLE 1 primer List
[Table 1 is an image in the original publication and is not reproduced here.]
TABLE 2 reaction System
PCR component                 Volume (μL)
λDNA, 50 ng/μL                1
Forward primer, 10 μM         1
Reverse primer, 10 μM         1
2× Taq Plus Master Mix        25
Water                         22
Total                         50
TABLE 3 reaction conditions
[Table 3 is an image in the original publication and is not reproduced here.]
The PCR product was detected by agarose gel electrophoresis, and as can be seen from FIG. 1, the size of the detected amplified fragment was consistent with the size of the expected target fragment.
1.3 Purification of PCR products: the PCR products were purified with 1.4× sample volume of the DNA purification magnetic beads, washed twice with 80% ethanol, and eluted with 50 μL TE, and the concentration was measured. This yielded the known sequences r1-r15.
1.4 Ligation: a ligation reaction system was prepared as shown in Table 5, and each tag sequence to be tested was linked to a known sequence through a primer sequence under the reaction conditions shown in Table 6. The nucleotide sequence of the primer sequence is as follows:
CAAGCAGAAGACGGCATACGAGATNNNNNNGTGACTGGAGTTCCTTGGCACCCGAGAATTC (SEQ ID NO: 46), wherein NNNNNN represents the tag sequence; the specific tag sequences are shown in Table 4. Each primer sequence was ligated to its known sequence, that is, the known sequences were linked to the tag sequences to be tested T1-T15 to obtain the tag sequence-known sequence constructs. Specifically, the constructs are T1-r1, T2-r2, T3-r3, T4-r4, T5-r5, T6-r6, T7-r7, T8-r8, T9-r9, T10-r10, T11-r11, T12-r12, T13-r13, T14-r14 and T15-r15. The tag sequences to be tested are barcode sequences.
TABLE 4 tag sequences
Tag number    Tag sequence    SEQ ID NO
T1            TTATAT          31
T2            ACCAAC          32
T3            CTATGC          33
T4            ATTCCT          34
T5            CAACTC          35
T6            TTAGGC          36
T7            AGGATC          37
T8            CAGCAA          38
T9            AAGTAG          39
T10           ACAGTG          40
T11           GGTCCA          41
T12           GATCAG          42
T13           ATTATG          43
T14           GGCTAC          44
T15           GAACCT          45
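The relationship between SEQ ID NO: 46 and the tags in Table 4 can be illustrated with a short Python sketch that substitutes each 6-base tag for the NNNNNN placeholder. The names `TEMPLATE`, `TAGS` and `tagged_primer` are assumptions made for illustration, and the sketch performs plain text substitution only; it says nothing about the orientation in which a sequencer reports the index.

```python
# Template from SEQ ID NO: 46; NNNNNN marks the tag position.
TEMPLATE = "CAAGCAGAAGACGGCATACGAGATNNNNNNGTGACTGGAGTTCCTTGGCACCCGAGAATTC"

# Tag sequences copied from Table 4.
TAGS = {
    "T1": "TTATAT", "T2": "ACCAAC", "T3": "CTATGC", "T4": "ATTCCT",
    "T5": "CAACTC", "T6": "TTAGGC", "T7": "AGGATC", "T8": "CAGCAA",
    "T9": "AAGTAG", "T10": "ACAGTG", "T11": "GGTCCA", "T12": "GATCAG",
    "T13": "ATTATG", "T14": "GGCTAC", "T15": "GAACCT",
}


def tagged_primer(tag: str, template: str = TEMPLATE) -> str:
    """Replace the single NNNNNN placeholder in the template with a 6-base tag."""
    tag = tag.upper()
    if len(tag) != 6 or set(tag) - set("ACGT"):
        raise ValueError(f"tag must be 6 bases of A/C/G/T, got {tag!r}")
    if template.count("NNNNNN") != 1:
        raise ValueError("template must contain exactly one NNNNNN placeholder")
    return template.replace("NNNNNN", tag)


primers = {name: tagged_primer(seq) for name, seq in TAGS.items()}
for name, primer in primers.items():
    print(name, primer)
```

Each resulting primer has the same 61-base length as the template, since a 6-base placeholder is replaced by a 6-base tag.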
TABLE 5 Ligation system
Reagent                               Volume (μL)
Yeasen ligation buffer                10
Ligase                                2.5
DNA (10 ng/μL)                        3
Test tag sequence (barcode sequence)  2
Water                                 32.5
Total                                 50
TABLE 6 reaction conditions
Reaction temperature    Reaction time
4 °C                    Hold
22 °C                   60 min
4 °C                    Hold
After ligation was complete, the ligation product was purified with 1.0× sample volume of the DNA purification magnetic beads, washed twice with 80% ethanol, and eluted with 12 μL TE;
1.5 Library amplification: all tag sequence-known sequence constructs were mixed, a reaction system was prepared as shown in Table 7, and the P5/P7 linkers were added under the reaction conditions shown in Table 8.
TABLE 7 Reaction system
Reagent                               Volume (μL)
KAPA HiFi Mix                         12.5
P5/P7 linker                          2
DNA (tag sequence-known sequence)     10.5
Total                                 25
TABLE 8 reaction conditions
[Table 8 is an image in the original publication and is not reproduced here.]
After amplification was complete, the product was purified with 1.2× sample volume of the DNA purification magnetic beads, washed twice with 80% ethanol, and eluted with 30 μL TE to obtain the DNA library to be tested;
1.6 quality control
Measuring the DNA concentration and fragment size in the amplified sequencing library by using a Qubit dye and 1.5% agarose gel electrophoresis;
1.7 dilution of library: diluting the constructed libraries to 10nM and mixing in an amount that yields 1M data per library;
1.8 Sequencing: the samples were sequenced on an Illumina CN500 second-generation sequencing platform. After sequencing was complete, the data were split by tag sequence; the results are shown in Tables 9, 10 and 11.
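The splitting step can be sketched in Python as follows. This is a simplified illustration, not the software used in the example: it assumes each read is already available as an (index sequence, insert sequence) pair, that tags are matched exactly, and that each known sequence can be recognised by a short prefix. The function name `split_by_tag`, the prefixes and the reads are all hypothetical; only the two tag sequences are taken from Table 4.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def split_by_tag(
    reads: Iterable[Tuple[str, str]],
    tag_names: Dict[str, str],
    known_prefixes: Dict[str, str],
) -> Dict[str, Dict[str, int]]:
    """Count known sequences per tag class.

    reads          : (index_sequence, insert_sequence) pairs
    tag_names      : tag sequence -> tag name, exact match only
    known_prefixes : known sequence name -> prefix identifying it (hypothetical)
    """
    counts = defaultdict(lambda: defaultdict(int))
    for index_seq, insert_seq in reads:
        tag = tag_names.get(index_seq)
        if tag is None:
            continue  # index does not match any expected tag; skip the read
        known = next(
            (name for name, prefix in known_prefixes.items()
             if insert_seq.startswith(prefix)),
            "unassigned",
        )
        counts[tag][known] += 1
    return {tag: dict(per_known) for tag, per_known in counts.items()}


# Tag sequences for T11 and T14 from Table 4.
TAG_NAMES = {"GGTCCA": "T11", "GGCTAC": "T14"}

# Toy prefixes and reads, invented for illustration only.
KNOWN_PREFIXES = {"r11": "AAAAC", "r14": "CCCCG"}
toy_reads = [
    ("GGCTAC", "CCCCGTTT"),
    ("GGCTAC", "AAAACGGG"),
    ("GGTCCA", "AAAACTTT"),
    ("NNNNNN", "AAAACTTT"),
]

result = split_by_tag(toy_reads, TAG_NAMES, KNOWN_PREFIXES)
print(result)
```

The nested dictionary produced here has the same shape as a per-tag table of known-sequence counts, so it can be inspected for known sequences that appear outside their own tag class.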
TABLE 9 Splitting results-1
[Table 9 is an image in the original publication and is not reproduced here.]
TABLE 10 Splitting results-2
[Table 10 is an image in the original publication and is not reproduced here.]
TABLE 11 Splitting results-3
[Table 11 is an image in the original publication and is not reproduced here.]
The data were analyzed as follows. As shown in Table 9, when the data are split by the tag sequence T2 corresponding to r2, no known sequence other than r2 is found under T2, indicating that the remaining 14 barcodes are not contaminated by the barcode corresponding to r2. As shown in Table 10, when the data are split by the tag sequence T14 corresponding to r14, r11 is found under T14 in addition to r14, indicating that the barcode corresponding to r14 is contaminated by the barcode corresponding to r11. According to Table 11, when the data are split by the tag sequence T11 corresponding to r11, the number of r11 reads is 5612, so the contamination ratio is 4/5612 = 0.07%.
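The arithmetic behind the reported ratio can be reproduced in a few lines of Python. Reading the numerator 4 as the number of r11 reads found in the T14 class is an assumption based on the ratio formula given in the description, since Table 10 is not reproduced here; the variable names are illustrative.

```python
# Values quoted in the analysis of Example 1.
r11_reads_in_t14_class = 4     # assumed meaning of the numerator (see lead-in)
r11_reads_in_t11_class = 5612  # number of r11 reads when splitting by T11

ratio_percent = 100 * r11_reads_in_t14_class / r11_reads_in_t11_class
print(f"{ratio_percent:.2f}%")  # 0.07%
```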
The above examples are intended to illustrate the disclosed embodiments of the invention and are not to be construed as limiting the invention. In addition, various modifications of the methods and compositions set forth herein, as well as variations of the methods and compositions of the present invention, will be apparent to those skilled in the art without departing from the scope and spirit of the invention. While the invention has been specifically described in connection with various specific preferred embodiments thereof, it should be understood that the invention should not be unduly limited to such specific embodiments. Indeed, various modifications of the above-described embodiments which are obvious to those skilled in the art to which the invention pertains are intended to be covered by the scope of the present invention.
Sequence listing
<110> Shanghai Wehn biomedical science and technology, Inc
<120> detection method for sequencing platform tag sequence pollution
<160>46
<170>SIPOSequenceListing 1.0
<210>1
<211>15
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>1
gctgacattt tcggt 15
<210>2
<211>15
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>2
tggcctgccg cagtt 15
<210>3
<211>15
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>3
cagccaggaa ctatt 15
<210>4
<211>16
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>4
gttttccagt tccgga 16
<210>5
<211>15
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>5
atccgtgagg tgaat 15
<210>6
<211>15
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>6
cagcgacgga atatc 15
<210>7
<211>15
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>7
gatattgaac aggaa 15
<210>8
<211>15
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>8
taagatactg ctcct 15
<210>9
<211>15
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>9
gtcatccgcc agcag 15
<210>10
<211>15
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>10
agtctttgac aatct 15
<210>11
<211>15
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>11
tatcgactcc cagct 15
<210>12
<211>15
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>12
catttctgca ccatt 15
<210>13
<211>15
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>13
tccgtctacg gaaag 15
<210>14
<211>15
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>14
tcgggaagtg aacgg 15
<210>15
<211>15
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>15
gacgcaatga ggcac 15
<210>16
<211>15
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>16
tcatcctctc cggat 15
<210>17
<211>15
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>17
atgacctgat gacag 15
<210>18
<211>15
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>18
atacataaaa tcctg 15
<210>19
<211>16
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>19
gaatatgccg gttatc 16
<210>20
<211>16
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>20
cctgatgcag ctggat 16
<210>21
<211>16
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>21
gaagcggcat ggaaag 16
<210>22
<211>16
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>22
ctgaccatcc ggaact 16
<210>23
<211>15
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>23
tattacgtca gcgag 15
<210>24
<211>16
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>24
tgcccgtcct ccacgg 16
<210>25
<211>15
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>25
cagcgtgatg gagca 15
<210>26
<211>16
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>26
ccaatccagc cggtca 16
<210>27
<211>16
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>27
tgcagacggc tcagga 16
<210>28
<211>16
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>28
aaagtacgcc cacgac 16
<210>29
<211>15
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>29
gaaagaagtt cagga 15
<210>30
<211>15
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>30
gattcaaatg ctgca 15
<210>31
<211>6
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>31
ttatat 6
<210>32
<211>6
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>32
accaac 6
<210>33
<211>6
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>33
ctatgc 6
<210>34
<211>6
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>34
attcct 6
<210>35
<211>6
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>35
caactc 6
<210>36
<211>6
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>36
ttaggc 6
<210>37
<211>6
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>37
aggatc 6
<210>38
<211>6
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>38
cagcaa 6
<210>39
<211>6
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>39
aagtag 6
<210>40
<211>6
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>40
acagtg 6
<210>41
<211>6
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>41
ggtcca 6
<210>42
<211>6
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>42
gatcag 6
<210>43
<211>6
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>43
attatg 6
<210>44
<211>6
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>44
ggctac 6
<210>45
<211>6
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>45
gaacct 6
<210>46
<211>61
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>46
caagcagaag acggcatacg agatnnnnnn gtgactggag ttccttggca cccgagaatt 60
c 61

Claims (10)

1. A method for detecting tag sequence contamination of a sequencing platform, comprising at least the following steps:
(1) linking each tag sequence to be tested to a known sequence to obtain a tag sequence-known sequence construct, wherein the types of tag sequence used on the sequencing platform are fixed and known;
(2) sequencing the constructs obtained in step (1) to obtain a sequencing result, wherein the sequencing result comprises the base sequence of each read and the number of reads of each sequence;
(3) splitting the sequencing result by tag sequence type; if a known sequence rm appears not only in the class of its corresponding tag sequence Tm but also in the class of another tag sequence Tn, the tag sequence Tm is contaminated by the other tag sequence Tn; wherein m and n are natural numbers and m is not equal to n.
2. The method for detecting tag sequence contamination of a sequencing platform of claim 1, wherein in step (1), the known sequence is derived from a genome of any species.
3. The method for detecting tag sequence contamination of a sequencing platform of claim 2, wherein in step (1), the known sequence is derived from archaea or phage genome.
4. The method for detecting contamination of sequencing platform tag sequences according to claim 3, wherein the bacteriophage is selected from lambda bacteriophages.
5. The method of detecting sequencing platform tag sequence contamination of claim 1, further comprising one or more of the following features:
1) the tag sequence is selected from single-ended barcode and double-ended barcode;
2) in the step (1), when the types of the tag sequences to be detected are more than 1, the known sequences connected with different tag sequences are different.
6. The method for detecting tag sequence contamination of a sequencing platform of claim 1, further comprising the steps of: calculating the pollution ratio of the label sequence Tm to the label sequence Tn by adopting the following method:
contamination ratio = (number of reads of the known sequence rm in the class of tag sequence Tn) / (number of reads of the known sequence rm in the class of tag sequence Tm) × 100%.
7. The method for detecting tag sequence contamination of a sequencing platform of claim 1, wherein the known sequence has a length of 100-250 bp.
8. The method for detecting the contamination of the sequencing platform tag sequence according to claim 1, wherein the tag sequence to be detected is linked to the known sequence by means of PCR or ligase.
9. The method for detecting tag sequence contamination of a sequencing platform of claim 1, wherein the sequencing platform is an Illumina next generation sequencing platform.
10. Use of the method for detecting tag sequence contamination of a sequencing platform according to any one of claims 1 to 9 in gene sequencing.
CN202010152944.6A 2020-03-06 2020-03-06 Detection method for sequencing platform tag sequence pollution Pending CN111304309A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010152944.6A CN111304309A (en) 2020-03-06 2020-03-06 Detection method for sequencing platform tag sequence pollution

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010152944.6A CN111304309A (en) 2020-03-06 2020-03-06 Detection method for sequencing platform tag sequence pollution

Publications (1)

Publication Number Publication Date
CN111304309A true CN111304309A (en) 2020-06-19

Family

ID=71152274

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010152944.6A Pending CN111304309A (en) 2020-03-06 2020-03-06 Detection method for sequencing platform tag sequence pollution

Country Status (1)

Country Link
CN (1) CN111304309A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111944806A (en) * 2020-07-30 2020-11-17 上海韦翰斯生物医药科技有限公司 Molecular tag group for high-throughput sequencing pollution detection and application thereof
CN111944807A (en) * 2020-08-26 2020-11-17 天津诺禾医学检验所有限公司 Human sequencing sample tracking marker, and monitoring method and monitoring device for human sequencing sample cross contamination

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104024426A (en) * 2012-11-21 2014-09-03 北京贝瑞和康生物技术有限公司 Tracing method and detection kit for test sample in next-generation sequencing technique of dna
WO2015002908A1 (en) * 2013-07-01 2015-01-08 Sequenta, Inc. Large-scale biomolecular analysis with sequence tags
CN104395481A (en) * 2012-04-13 2015-03-04 赛昆塔公司 Detection and quantitation of sample contamination in immune repertoire analysis
CN106555008A (en) * 2016-12-11 2017-04-05 天津福德信泰生物科技有限公司 Detection and identification method and system for microorganisms
CN107164464A (en) * 2017-04-27 2017-09-15 武汉华大医学检验所有限公司 A kind of method and primer for detecting the pollution of microarray dataset index sequence
WO2018107481A1 (en) * 2016-12-16 2018-06-21 深圳华大基因股份有限公司 Gene tag for nucleic acid sample identification, kit, and application thereof
CN108932401A (en) * 2018-06-07 2018-12-04 江西海普洛斯生物科技有限公司 It is a kind of be sequenced sample identification method and its application
CN109706219A (en) * 2018-12-20 2019-05-03 臻和(北京)科技有限公司 Construct the method for splitting of the method for sequencing library, kit, upper machine method and sequencing data

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104395481A (en) * 2012-04-13 2015-03-04 赛昆塔公司 Detection and quantitation of sample contamination in immune repertoire analysis
CN104024426A (en) * 2012-11-21 2014-09-03 北京贝瑞和康生物技术有限公司 Tracing method and detection kit for test sample in next-generation sequencing technique of dna
US20150252359A1 (en) * 2012-11-21 2015-09-10 Berry Genomics Co., Ltd Method for tracking test sample by second-generation DNA sequencing technology and detection kit
WO2015002908A1 (en) * 2013-07-01 2015-01-08 Sequenta, Inc. Large-scale biomolecular analysis with sequence tags
CN105658812A (en) * 2013-07-01 2016-06-08 适应生物技术公司 Large-scale biomolecular analysis with sequence tags
CN106555008A (en) * 2016-12-11 2017-04-05 天津福德信泰生物科技有限公司 Detection and identification method and system for microorganisms
WO2018107481A1 (en) * 2016-12-16 2018-06-21 深圳华大基因股份有限公司 Gene tag for nucleic acid sample identification, kit, and application thereof
CN107164464A (en) * 2017-04-27 2017-09-15 武汉华大医学检验所有限公司 A kind of method and primer for detecting the pollution of microarray dataset index sequence
CN108932401A (en) * 2018-06-07 2018-12-04 江西海普洛斯生物科技有限公司 It is a kind of be sequenced sample identification method and its application
CN109706219A (en) * 2018-12-20 2019-05-03 臻和(北京)科技有限公司 Construct the method for splitting of the method for sequencing library, kit, upper machine method and sequencing data

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
董燕 等: "单细胞测序技术研究进展", vol. 32, no. 01, pages 71 - 78 *

Similar Documents

Publication Publication Date Title
AU2017381296B2 (en) Reagents and methods for the analysis of linked nucleic acids
CN107829146B (en) Primer group for constructing 16SrRNA gene amplicon sequencing library and construction method
CN111073961A (en) High-throughput detection method for gene rare mutation
CN106845155B (en) Device for detecting internal series repetition
CN117551746B (en) Method for detecting target nucleic acid and adjacent region nucleic acid sequence thereof
CN110863056A (en) Method, reagent and application for accurately typing human DNA
CN111304309A (en) Detection method for sequencing platform tag sequence pollution
CN116287357A (en) Respiratory tract pathogenic bacteria detection kit based on targeted amplicon sequencing
CN112795654A (en) Method and kit for organism fusion gene detection and fusion abundance quantification
CN111944806A (en) Molecular tag group for high-throughput sequencing pollution detection and application thereof
CN111549109A (en) High-throughput pathogen microorganism gene detection screening method
CN115948607B (en) Method and kit for simultaneously detecting multiple pathogen genes
CN114807302B (en) Amplicon library construction method and kit for thalassemia mutant and deletion type gene detection
CN114277114B (en) Method for adding unique identifier in amplicon sequencing and application
CN114277096B (en) Method and kit for identifying thalassemia alpha anti4.2 heterozygotes and HK alpha heterozygotes
CN116463408A (en) ABO gene amplification primer, amplification system, amplification method, sequencing library construction method and sequencing method
CN112266963B (en) Detection kit for combined detection of chronic granulocytic leukemia
CN115074422A (en) Detection method of unknown fusion gene
CN111793623A (en) Typing genetic marker composition, kit, identification system and typing method of 62 multi-allelic SNP-NGS
CN112708664B (en) Construction method and kit of multi-gene mutation sequencing library of lung cancer driving gene
CN111172159A (en) Bovine mitochondrial genome capture probe kit
WO2023201487A1 (en) Adapter, adapter ligation reagent, kit, and library construction method
CN116445478B (en) Primer combination for constructing IGHV gene library and application thereof
CN109609694B (en) Kit and method for detecting hepatitis B typing and multiple drug-resistant sites based on Illumina sequencing technology
Urmanov et al. ANALYSIS OF THE EVOLUTION OF TECHNOLOGIES FOR DETERMINING THE NUCLEOTIDE SEQUENCE OF A DNA MOLECULE

Legal Events

Date Code Title Description
PB01 Publication
CB03 Change of inventor or designer information

Inventor after: Lin Jian

Inventor after: Yang Jingmin

Inventor after: Gao Pengfei

Inventor after: Qin Zhendong

Inventor after: Tang Jiajie

Inventor after: Zhu Xueping

Inventor before: Lin Jian

Inventor before: Yang Jingmin

Inventor before: Qin Zhendong

Inventor before: Tang Jiajie

Inventor before: Zhu Xueping

SE01 Entry into force of request for substantive examination