CN103710336B - Transcript enrichment method from RNA sample and applications thereof - Google Patents

Transcript enrichment method from RNA sample and applications thereof

Publication number
CN103710336B
Authority
CN
China
Prior art keywords
sequencing
sequence
rna
transcript
tss
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201210379402.8A
Other languages
Chinese (zh)
Other versions
CN103710336A (en)
Inventor
祝珍珍
黄文潘
章文蔚
陈茂山
张艳艳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
BGI Technology Solutions Co Ltd
Original Assignee
BGI Technology Solutions Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by BGI Technology Solutions Co Ltd
Priority to CN201210379402.8A
Priority to PCT/CN2013/081581 (WO2014048185A1)
Publication of CN103710336A
Application granted
Publication of CN103710336B
Legal status: Active
Anticipated expiration

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6806: Preparing nucleic acids for analysis, e.g. for polymerase chain reaction [PCR] assay
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N: MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00: Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09: Recombinant DNA-technology
    • C12N15/10: Processes for the isolation, preparation or purification of DNA or RNA
    • C12N15/1096: Processes for the isolation, preparation or purification of DNA or RNA; cDNA synthesis; Subtracted cDNA library construction, e.g. RT, RT-PCR

Landscapes

  • Chemical & Material Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Organic Chemistry (AREA)
  • Genetics & Genomics (AREA)
  • Engineering & Computer Science (AREA)
  • Zoology (AREA)
  • Wood Science & Technology (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biotechnology (AREA)
  • Molecular Biology (AREA)
  • Biochemistry (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Biomedical Technology (AREA)
  • Microbiology (AREA)
  • Physics & Mathematics (AREA)
  • Analytical Chemistry (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Crystallography & Structural Chemistry (AREA)
  • Chemical Kinetics & Catalysis (AREA)
  • Plant Pathology (AREA)
  • Immunology (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention provides a method for enriching transcripts from an RNA sample and applications thereof. The method includes treating the RNA sample with an enrichment reagent so as to enrich the transcripts, wherein the enrichment reagent has 5'-monophosphate exonuclease activity, and the transcripts are RNA molecules whose 5' end carries a cap structure or a triphosphate group. Transcripts can be effectively enriched using this method.

Description

Method for enriching transcripts from an RNA sample and applications thereof
Technical field
The present invention relates to the field of biotechnology. Specifically, it relates to a method for enriching transcripts from an RNA sample and uses thereof. More specifically, it relates to a method for enriching transcripts from an RNA sample, a method for constructing a sequencing library, a sequencing library, a method for sequencing a nucleic acid sample, a method for determining a transcription start site (TSS), an enrichment reagent for enriching transcripts from an RNA sample, a device for constructing a sequencing library, a nucleic acid sample sequencing apparatus, and a system for determining a TSS.
Background
Transcription of a gene begins when RNA polymerase binds to the promoter region of the DNA template. Elongation then proceeds from the transcription start site (hereinafter: TSS) until a complete RNA is formed. Because every RNA molecule in an organism begins at a TSS, studying TSSs by high-throughput sequencing helps infer the positions and structures of promoters across the whole genome, and thus gives a global view of gene transcription regulatory networks. TSS research also helps correct existing gene annotations and discover new genes.
However, current methods for studying TSSs still need improvement.
Summary of the invention
The present invention aims to solve at least one of the above technical problems to some extent, or at least to provide a useful commercial alternative. To this end, one object of the invention is to propose a means of effectively enriching transcripts and thereby effectively determining TSSs.
The present invention was completed based on the following findings of the inventors:
Current high-throughput sequencing methods for studying TSSs generally target RNA with a cap structure and capture the 5' end of RNA molecules by CAGE or RACE. Common methods include deepCAGE, PEAT, deep-RACE, nanoCAGE and CAGEscan. Of these, deepCAGE, PEAT, deep-RACE and CAGEscan require cumbersome operations such as enzymatic digestion, demand a large amount of RNA, and produce short sequencing reads (about 20 nt). They apply only to RNA with a cap structure and cannot be used to study the TSSs of prokaryotic RNA, which has no cap structure. nanoCAGE is simpler to perform and requires less RNA, but it too applies only to capped RNA, and the data it produces contain relatively many false positives. The inventors found that a 5'-monophosphate exonuclease specifically degrades RNA bearing a 5' monophosphate while retaining intact RNA molecules bearing a 5' cap or a 5' triphosphate. Such an enzyme can therefore be used to enrich transcripts effectively, which makes the approach applicable to high-throughput TSS sequencing of both eukaryotic and prokaryotic RNA, with the advantages of simple operation, high accuracy and low cost.
In a first aspect, the present invention proposes a method for enriching transcripts from an RNA sample. According to embodiments of the invention, the method includes treating the RNA sample with an enrichment reagent so as to enrich transcripts, wherein the enrichment reagent has 5'-monophosphate exonuclease activity and the transcripts are RNA molecules having a cap structure or a 5' triphosphate at their 5' end. A 5'-monophosphate exonuclease specifically degrades RNA bearing a 5' monophosphate and does not degrade intact RNA molecules bearing a 5' cap or a 5' triphosphate. An enrichment reagent with this activity can therefore enrich transcripts effectively, so the method is applicable to high-throughput TSS sequencing of both eukaryotic and prokaryotic RNA, with the advantages of simple operation, high accuracy and low cost.
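The selectivity of the enrichment step can be illustrated with a toy model. This is only a sketch under simplifying assumptions: the `Rna` record, the end-type labels and the function `enrich_transcripts` are hypothetical names introduced here, and digestion is treated as all-or-nothing, which a real enzymatic reaction is not.

```python
from dataclasses import dataclass

# Hypothetical labels for the chemical group at the 5' end of an RNA molecule.
CAP = "cap"
TRIPHOSPHATE = "triphosphate"
MONOPHOSPHATE = "monophosphate"


@dataclass(frozen=True)
class Rna:
    name: str
    five_prime_end: str


def enrich_transcripts(sample):
    """Idealized 5'-monophosphate exonuclease treatment.

    Molecules with a 5' monophosphate are treated as fully degraded;
    capped and 5'-triphosphate molecules are kept unchanged.
    """
    return [rna for rna in sample if rna.five_prime_end != MONOPHOSPHATE]


sample = [
    Rna("eukaryotic mRNA", CAP),
    Rna("prokaryotic primary transcript", TRIPHOSPHATE),
    Rna("degradation fragment", MONOPHOSPHATE),
]
kept = enrich_transcripts(sample)
print([rna.name for rna in kept])
```

In this model the capped and triphosphorylated molecules survive, which is why one treatment can serve both eukaryotic and prokaryotic samples.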
In a second aspect, the present invention proposes a method for constructing a sequencing library. According to embodiments of the invention, the method includes: enriching transcripts from an RNA sample according to the method described above; removing the 5' cap structure or 5' triphosphate of the transcripts to obtain transcripts lacking the 5' cap structure or 5' triphosphate; ligating an RNA adapter to the 5' end of those transcripts to obtain adapter-ligated transcripts; reverse transcribing the adapter-ligated transcripts to obtain cDNA corresponding to the transcripts; amplifying the cDNA to obtain an amplification product; and constructing a sequencing library based on the amplification product. This method effectively constructs a sequencing library from the transcripts enriched from a nucleic acid sample, so it is applicable to high-throughput TSS sequencing of both eukaryotic and prokaryotic RNA, with the advantages of simple operation, high accuracy and low cost.
In a third aspect, the present invention proposes a sequencing library, characterized in that it is constructed by the method described above. With this sequencing library, RNA transcripts can be sequenced effectively, and the library is applicable to high-throughput TSS sequencing of both eukaryotic and prokaryotic RNA, with the advantages of simple operation, high accuracy and low cost.
In a fourth aspect, the present invention proposes a method for sequencing a nucleic acid sample. According to embodiments of the invention, the method includes: constructing a sequencing library according to the method described above; and sequencing the library to obtain a sequencing result. With this method, RNA transcripts can be sequenced effectively, and the method is applicable to high-throughput TSS sequencing of both eukaryotic and prokaryotic RNA, with the advantages of simple operation, high accuracy and low cost.
In a fifth aspect, the present invention proposes a method for determining a TSS. According to embodiments of the invention, the method includes: extracting an RNA sample from a host; obtaining, by the method described above, a sequencing result consisting of multiple sequencing reads; and determining the TSS based on the sequencing result. With this method, transcription start sites can be determined effectively, and the method is applicable to high-throughput TSS sequencing of both eukaryotic and prokaryotic RNA, with the advantages of simple operation, high accuracy and low cost.
In a sixth aspect, the present invention proposes an enrichment reagent for enriching transcripts from an RNA sample. According to embodiments of the invention, the enrichment reagent has 5'-monophosphate exonuclease activity. With this reagent, transcripts can be enriched effectively, so it is applicable to high-throughput TSS sequencing of both eukaryotic and prokaryotic RNA, with the advantages of simple operation, high accuracy and low cost.
In a seventh aspect, the present invention proposes a device for constructing a sequencing library. According to embodiments of the invention, the device includes: a transcript enrichment unit, in which the enrichment reagent described above is provided so as to enrich transcripts from an RNA sample; an end trimming unit, connected to the transcript enrichment unit and adapted to remove the 5' cap structure or 5' triphosphate of the transcripts to obtain transcripts lacking the 5' cap structure or 5' triphosphate; an RNA adapter ligation unit, connected to the end trimming unit and adapted to ligate an RNA adapter to the 5' end of those transcripts to obtain adapter-ligated transcripts; a reverse transcription unit, connected to the RNA adapter ligation unit and adapted to reverse transcribe the adapter-ligated transcripts to obtain cDNA corresponding to the transcripts; an amplification unit, connected to the reverse transcription unit and adapted to amplify the cDNA to obtain an amplification product; and a library construction unit, connected to the amplification unit and adapted to construct a sequencing library based on the amplification product. This device effectively constructs a sequencing library from the transcripts enriched from a nucleic acid sample, so it is applicable to high-throughput TSS sequencing of both eukaryotic and prokaryotic RNA, with the advantages of simple operation, high accuracy and low cost.
In an eighth aspect, the present invention proposes a nucleic acid sample sequencing apparatus, characterized in that it includes: a library construction device, which is the device described above, for constructing a sequencing library from a nucleic acid sample; and a sequencing device, connected to the library construction device and adapted to sequence the library to obtain a sequencing result. With this apparatus, RNA transcripts can be sequenced effectively, and the apparatus is applicable to high-throughput TSS sequencing of both eukaryotic and prokaryotic RNA, with the advantages of simple operation, high accuracy and low cost.
In a ninth aspect, the present invention proposes a system for determining a TSS. According to embodiments of the invention, the system includes: sample extraction equipment for extracting an RNA sample from a host; nucleic acid sample sequencing equipment, connected to the sample extraction equipment, which is the nucleic acid sample sequencing apparatus described above and sequences the RNA sample to obtain a sequencing result consisting of multiple sequencing reads; and TSS determination equipment, connected to the sequencing equipment and adapted to determine the TSS based on the sequencing result. According to embodiments of the invention, this system can effectively determine the TSSs in a nucleic acid sample.
Additional aspects and advantages of the invention will be set forth in part in the following description, and in part will become apparent from that description or be learned through practice of the invention.
Brief description of the drawings
The above and/or additional aspects and advantages of the invention will become apparent and easy to understand from the following description of embodiments with reference to the accompanying drawings, in which:
Fig. 1 shows a schematic flow chart of the method for constructing a sequencing library according to an embodiment of the invention;
Fig. 2 shows a schematic flow chart of the bioinformatic analysis for determining TSS sequences according to an embodiment of the invention;
Fig. 3 shows a schematic diagram of the system for determining a TSS according to an embodiment of the invention;
Fig. 4 shows a schematic diagram of the nucleic acid sample sequencing apparatus according to an embodiment of the invention;
Fig. 5 shows a schematic diagram of the device for constructing a sequencing library according to an embodiment of the invention;
Fig. 6 shows a schematic diagram of the equipment for determining a TSS according to an embodiment of the invention;
Fig. 7 shows, according to an embodiment of the invention, the distribution of the screened TSSs on the genome. The upper and lower panels are the TSS distributions of the human RNA sample and the E. coli RNA sample, respectively. Position 0 is the start of the gene coding region, and transcription start sites lie upstream of it; most sequences fall upstream of the gene coding region;
Fig. 8 shows, according to an embodiment of the invention, the TSS maps of 8 human RNA samples, from which the distribution of TSSs in the different samples can be seen;
Fig. 9 shows, according to an embodiment of the invention, the base distribution upstream of the TSS, where position 1 on the abscissa corresponds to the TSS, which is predominantly a purine (A/G). The upper panel is the base distribution upstream of the TSS for the human RNA sample, which shows a clear GC-rich region, a main feature of eukaryotic promoters and enhancers; the lower panel is the base distribution upstream of the TSS for the E. coli RNA sample, where a typical TATA box can be found in the upstream -10 region;
Fig. 10 shows, according to an embodiment of the invention, the length distribution of the 5' UTR, that is, the distance from the TSS to the coding region. The upper panel is the 5' UTR length distribution of the human RNA sample, and the lower panel that of the E. coli RNA sample;
Fig. 11 shows a correlation analysis, which provides an assessment of the reliability of the experimental results and the stability of the procedure. The upper panel shows two replicates of the human RNA sample, and the lower panel two replicates of the E. coli RNA sample; and
Fig. 12 shows a schematic diagram of gene prediction results according to an embodiment of the invention. The upper panel is the TSS distribution of two human genes, NM_018997 and NM_031901, both of which undergo alternative splicing. Red vertical lines represent the screened TSSs, black vertical lines the sequences obtained before filtering, blue horizontal lines the exons of the gene, and yellow horizontal lines the introns. The lower panel is the TSS distribution of one E. coli operon. Prokaryotes have no introns, so only blue horizontal lines representing genes are shown; the 4 genes of this operon share one TSS.
Detailed description of embodiments
Embodiments of the invention are described in detail below, and examples of the embodiments are shown in the drawings, in which the same or similar reference numerals throughout denote the same or similar elements or elements having the same or similar functions. The embodiments described below with reference to the drawings are exemplary; they are intended to explain the invention and are not to be construed as limiting it.
In the present invention, unless otherwise expressly specified and limited, terms such as "installed", "connected", "coupled" and "fixed" are to be understood broadly. For example, a connection may be fixed, detachable or integral; it may be mechanical or electrical; it may be direct or indirect through an intermediary; and it may be an internal connection between two elements. Those of ordinary skill in the art can understand the specific meaning of these terms in the present invention according to the particular circumstances. In addition, the terms "upstream" and "downstream" as used herein are defined according to the 5' to 3' direction.
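Because "upstream" is defined along the 5' to 3' direction, its meaning in genome coordinates depends on the strand. The sketch below makes that explicit. The function name `is_upstream` and the convention that coordinates are given on the forward strand are assumptions introduced for illustration.

```python
def is_upstream(position, reference, strand):
    """Return True if `position` lies upstream (5' side) of `reference`.

    Coordinates are positions on the forward strand of the genome.
    On the "+" strand the 5' direction is toward smaller coordinates;
    on the "-" strand it is toward larger coordinates.
    """
    if strand == "+":
        return position < reference
    if strand == "-":
        return position > reference
    raise ValueError("strand must be '+' or '-'")


print(is_upstream(90, 100, "+"))
print(is_upstream(90, 100, "-"))
```

The same coordinate pair gives opposite answers on the two strands, so any analysis that places a TSS upstream of a coding region needs the strand of the gene.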
In a first aspect, the present invention proposes a method for enriching transcripts from an RNA sample. According to embodiments of the invention, the method includes treating the RNA sample with an enrichment reagent so as to enrich transcripts, wherein the enrichment reagent has 5'-monophosphate exonuclease activity and the transcripts are RNA molecules having a cap structure or a triphosphate at their 5' end. According to embodiments of the invention, examples of enzymes having 5'-monophosphate exonuclease activity include exoribonuclease XRN-1, Terminator™ 5'-Phosphate-Dependent Exonuclease, and TAKARA™ Alkaline Phosphatase. A 5'-monophosphate exonuclease specifically degrades RNA bearing a 5' monophosphate and does not degrade intact RNA molecules bearing a 5' cap or a 5' triphosphate. An enrichment reagent with this activity can therefore enrich transcripts effectively, so the method is applicable to high-throughput TSS sequencing of both eukaryotic and prokaryotic RNA, with the advantages of simple operation, high accuracy and low cost.
According to embodiments of the invention, the method may use any enrichment reagent having 5'-monophosphate exonuclease activity. According to one embodiment, the enrichment reagent contains DNase I, which can further improve the specificity and efficiency of degrading 5'-monophosphate RNA and thus the efficiency of the enrichment method. According to one embodiment, the enrichment reagent may further contain a buffer and a soluble salt to further increase the enzymatic activity of DNase I. According to one embodiment, the pH of the enrichment reagent is 8.0. According to one embodiment, the buffer is Tris-HCl and the soluble salt is at least one selected from sodium chloride and magnesium chloride. According to one embodiment, the RNA sample is treated with the enrichment reagent at 30 degrees Celsius. These conditions can further improve the efficiency of enriching transcripts with the enrichment reagent according to embodiments of the invention.
In a second aspect, the present invention proposes a method for constructing a sequencing library. With reference to Fig. 1, according to embodiments of the invention, the method includes:
S100 (transcript enrichment): Transcripts are enriched from an RNA sample according to the method described above. This step has already been described in detail and is not repeated here.
S200 (end trimming): The 5' cap structure or 5' triphosphate of the transcripts is removed to obtain transcripts lacking the 5' cap structure or 5' triphosphate. According to one embodiment, the 5' cap structure or 5' triphosphate is removed with an end trimming reagent that has tobacco acid pyrophosphatase activity. According to one embodiment, the trimming reagent comprises tobacco acid pyrophosphatase, a soluble salt, EDTA, β-mercaptoethanol and Triton X-100. According to one embodiment, the soluble salt is sodium acetate. According to one embodiment, the pH of the trimming reagent is 7.5. These conditions can further improve the end trimming of the RNA and effectively remove the 5' cap structure or 5' triphosphate of the transcripts, which improves the efficiency of library construction.
S300 (adapter ligation): An RNA adapter is ligated to the 5' end of the transcripts lacking the 5' cap structure or 5' triphosphate to obtain adapter-ligated transcripts. According to one embodiment, the RNA adapter is ligated with a ligation reagent that has T4 RNA ligase activity. According to one embodiment, the ligation reagent comprises T4 RNA ligase, a buffer, a soluble salt and dithiothreitol. According to one embodiment, the pH of the ligation reagent is 7.5. According to one embodiment, the buffer is Tris-HCl. According to one embodiment, the soluble salt is magnesium chloride. According to one embodiment, the ligation is carried out at 30 degrees Celsius. These conditions can improve the efficiency of adapter ligation and thus of library construction.
S400 (reverse transcription): The adapter-ligated transcripts are reverse transcribed to obtain cDNA corresponding to the transcripts. According to embodiments of the invention, the reverse transcription primer carries at its end a sequence corresponding to the RNA adapter, so the resulting cDNA also carries an adapter at its end, which facilitates subsequent library construction and sequencing. The term "corresponding to the RNA adapter" as used herein means that the sequence contained in the reverse transcription primer can match the RNA adapter and support an amplification reaction, so that cDNA with adapters at both ends is obtained. For example, of the two reverse transcription primers used, one contains a sequence identical to one RNA adapter and the other contains a sequence complementary to the other RNA adapter. According to one embodiment, an oligonucleotide having the sequence shown in SEQ ID NO: 1 is used as the reverse transcription primer. According to one embodiment, at least one N in the reverse transcription primer (SEQ ID NO: 1) is thio-modified, which prevents the primer from being degraded by nucleases. According to one embodiment, the penultimate N in the reverse transcription primer (SEQ ID NO: 1) is thio-modified.
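As a small illustration of the primer modification described in S400, the sketch below marks the penultimate N of a primer string. Several things here are assumptions: the function name `mark_penultimate_n` is hypothetical, the asterisk is used only as a plain-text marker for the thio-modified position (not notation taken from the patent), and the example sequence is a placeholder rather than SEQ ID NO: 1.

```python
def mark_penultimate_n(primer):
    """Insert '*' after the second-to-last 'N' in a primer string.

    The '*' is used here as a plain-text marker for a thio-modified position.
    """
    n_positions = [i for i, base in enumerate(primer) if base == "N"]
    if len(n_positions) < 2:
        raise ValueError("primer needs at least two N positions")
    target = n_positions[-2]
    return primer[: target + 1] + "*" + primer[target + 1 :]


# Placeholder sequence for illustration only; it is NOT SEQ ID NO: 1.
example_primer = "ACGTACGTNNNNNN"
print(mark_penultimate_n(example_primer))
```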
S500 (amplification): The cDNA is amplified to obtain an amplification product. Those skilled in the art can amplify by any known method, for example conventional PCR; it is only necessary to design the corresponding primers from the adapter sequences.
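Designing a primer from an adapter sequence can be sketched as follows. The adapter sequence is a made-up placeholder, and the helper names are hypothetical. The sketch relies only on the general fact that a primer with the DNA form of the 5' RNA adapter anneals to the reverse complement of that adapter, which sits at the 3' end of the first-strand cDNA.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def rna_to_dna(seq):
    """Write an RNA sequence in DNA letters (U becomes T)."""
    return seq.upper().replace("U", "T")


def reverse_complement(dna):
    """Return the reverse complement of a DNA sequence."""
    return dna.upper().translate(COMPLEMENT)[::-1]


# Hypothetical 5' RNA adapter, for illustration only.
five_prime_rna_adapter = "ACGGUCAUCGGAUUCAGC"

# Forward PCR primer: same sequence as the adapter, in DNA form.
forward_primer = rna_to_dna(five_prime_rna_adapter)

# Site on the first-strand cDNA where the forward primer anneals.
binding_site_on_cdna = reverse_complement(forward_primer)

print(forward_primer)
print(binding_site_on_cdna)
```

The second primer would be derived in the same way from the adapter sequence carried by the reverse transcription primer, which this sketch does not model.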
S600 (library construction): A sequencing library is constructed based on the amplification product. Those skilled in the art can construct the library from the amplification product according to the sequencing method they intend to use, with reference to the operating instructions provided by the manufacturer, which are not repeated here. It should be noted that the amplification product obtained by the method of the invention can be used with Illumina Hiseq2000, Genome Analyzer, the SOLiD sequencing system, Ion Torrent, Ion Proton, 454, the PacBio RS sequencing system, Helicos tSMS technology and nanopore sequencing technology, so that high-throughput sequencing can be achieved.
This method effectively constructs a sequencing library from the transcripts enriched from a nucleic acid sample, so it is applicable to high-throughput TSS sequencing of both eukaryotic and prokaryotic RNA, with the advantages of simple operation, high accuracy and low cost. It should also be noted that a product purification step may optionally be included between the steps above. According to embodiments of the invention, RNA may be purified by phenol/chloroform/isoamyl alcohol extraction (volume ratio 25:24:1) followed by ethanol precipitation, which removes the enzymes from the reaction mixture so that they do not interfere with the next reaction. Ethanol precipitation also retains some small transcripts, such as microRNA, so that TSS information for this portion of non-coding RNA is obtained, which helps in understanding the state of transcriptional regulation.
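The order of steps S100 to S600, with the optional purification between them, can be written down as a simple ordered protocol. This is a sketch only: the `STEPS` table paraphrases the step names from the text, `build_protocol` is a hypothetical helper, and inserting one generic purification step between every pair of steps is a simplifying assumption rather than the patent's prescription.

```python
# Step codes and short names, paraphrased from the description.
STEPS = [
    ("S100", "enrich transcripts"),
    ("S200", "trim 5' cap or 5' triphosphate"),
    ("S300", "ligate RNA adapter"),
    ("S400", "reverse transcription"),
    ("S500", "amplification"),
    ("S600", "library construction"),
]


def build_protocol(purify_between=True):
    """Return the ordered step labels.

    When `purify_between` is True, a generic purification step is placed
    between consecutive steps (but not after the last one).
    """
    protocol = []
    for index, (code, name) in enumerate(STEPS):
        protocol.append(f"{code}: {name}")
        if purify_between and index < len(STEPS) - 1:
            protocol.append("purify product")
    return protocol


for line in build_protocol():
    print(line)
```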
In a third aspect, the present invention provides a sequencing library, characterized in that it is built by the method described above. With this sequencing library, RNA transcripts can be sequenced effectively, and the library can be applied to high-throughput sequencing of the TSSs of both eukaryotic and prokaryotic RNA, with the advantages of simple operation, high accuracy, and low cost.
In a fourth aspect, the present invention provides a method for sequencing a nucleic acid sample. According to embodiments of the present invention, the method includes: building a sequencing library according to the method described above; and sequencing the sequencing library to obtain a sequencing result. With this method, RNA transcripts can be sequenced effectively, and the method can be applied to high-throughput sequencing of the TSSs of both eukaryotic and prokaryotic RNA, with the advantages of simple operation, high accuracy, and low cost. According to embodiments of the present invention, the sequencing is performed with at least one of Illumina HiSeq 2000, Genome Analyzer, the SOLiD sequencing system, Ion Torrent, Ion Proton, 454, the PacBio RS sequencing system, Helicos tSMS technology, and nanopore sequencing technology. The high-throughput, deep-sequencing capability of these platforms can thereby be used to further improve sequencing efficiency. In one embodiment of the present invention, the sequencing is performed with Illumina HiSeq 2000.
In a fifth aspect, the present invention provides a method for determining TSSs. According to embodiments of the present invention, the method includes: extracting an RNA sample from a host; obtaining, by the method described above, a sequencing result consisting of multiple sequencing reads; and determining the TSSs based on the sequencing result. With this method, the TSSs in a nucleic acid sample can be determined effectively.
According to one embodiment of the present invention, the RNA sample is at least part of the total RNA of the host. According to embodiments of the present invention, the host may be a eukaryote, such as human, or a prokaryote, such as Escherichia coli.
According to one embodiment of the present invention, determining the TSS based on the sequencing result further includes: aligning the sequencing data to a reference sequence; and determining the transcription start site based on the alignment result, wherein the reference sequence contains at least part of the 5'-UTR sequence of a predetermined gene, the sequencing read that aligns to the reference sequence at the most upstream position of the reference sequence is selected as the positive sequence, and the first base of the positive sequence is determined to be the transcription start site. The term "predetermined gene" as used here refers to a set of genes whose possible range on the reference genome is defined in advance; these genes may be known, or unknown and inferred by bioinformatics. According to embodiments of the present invention, the length of the reference sequence is not particularly limited; the reference sequence includes at least the translation start site of the predetermined gene and a sequence of predetermined length upstream of it. Because the transcription start site lies upstream of the translation start site, the length of the reference sequence can be chosen so that the transcription start site falls within it. For example, according to embodiments of the present invention, for a prokaryotic host the reference sequence comprises the nucleotide sequence between the translation start site of the predetermined gene and the site 700 bp upstream of that translation start site, and for a eukaryotic host the reference sequence comprises the nucleotide sequence between the translation start site of the predetermined gene and the site 5000 bp upstream of that translation start site.
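The selection of the positive sequence can be illustrated with a minimal Python sketch. It assumes that the alignment step has already produced, for one predetermined gene, the reference coordinates of the first base of each aligned read; the function name `call_tss` and the input format are hypothetical and are not part of the described method.

```python
from collections import Counter

def call_tss(read_starts, strand="+"):
    """Pick the most upstream aligned read start as the candidate TSS.

    read_starts: reference coordinates of the first base of each read that
    aligned to the reference segment of one gene (hypothetical input format).
    Returns (position, number_of_reads_starting_there), or None if no reads.
    """
    if not read_starts:
        return None
    counts = Counter(read_starts)
    # On the plus strand, "most upstream" is the smallest coordinate;
    # on the minus strand it is the largest.
    pos = min(counts) if strand == "+" else max(counts)
    return pos, counts[pos]

# Example call with made-up coordinates.
candidate = call_tss([120, 120, 135, 300, 120])
```

The read count returned alongside the position is what a later screening step would compare against the read depth inside the gene.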
According to one embodiment of the present invention, the alignment may be performed with SOAPAlignment. In the present invention, the clean reads obtained by high-throughput sequencing are aligned to the reference genome and to the reference gene sequences, respectively, using the short-read mapping program SOAPalignment v2.2, with no base mismatches allowed. The reference genome sequence and the reference gene sequences can be obtained from public databases.
According to one embodiment of the present invention, the method further includes screening the positive sequences, wherein the screening criterion is that the number of positive sequences is at least N times the average number of sequencing reads aligned inside the predetermined gene, where N is a real number greater than 1, preferably a real number of at least 10. According to embodiments of the present invention, the alignment result may first be screened after alignment to obtain reliable TSS information. The screening works as follows. The first position at which clean reads align to a gene (the sequence segment corresponding to the predetermined gene) is taken as a raw TSS; however, some of these reads may have aligned inside the gene and would give false-positive TSSs, so further filtering is needed. Because the method enriches the obtained reads at the 5' end of the gene, the number of reads at a true TSS is higher than the average number of reads falling inside the gene. A fold factor N is therefore introduced to filter the TSSs: a candidate TSS is regarded as a true TSS only if its read count is at least N times the average read count inside the corresponding gene. According to embodiments of the present invention, N may be a real number of at least 10.
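As a rough illustration of this screening rule, the following Python sketch compares the read count at a candidate TSS with N times the mean read count inside the gene. The text does not specify how the internal average is computed; taking a per-position mean is an assumption made here, and the function name `passes_fold_filter` is hypothetical.

```python
def passes_fold_filter(tss_reads, internal_read_counts, n=10.0):
    """Return True if the candidate TSS passes the N-fold screening rule.

    tss_reads: number of reads starting at the candidate TSS.
    internal_read_counts: read counts at positions inside the gene
        (assumption: the "average" in the text is a per-position mean).
    n: fold factor N; must be greater than 1, and 10 is the preferred minimum.
    """
    if n <= 1:
        raise ValueError("N must be a real number greater than 1")
    if internal_read_counts:
        mean_internal = sum(internal_read_counts) / len(internal_read_counts)
    else:
        mean_internal = 0.0  # assumption: no internal reads means nothing to exceed
    return tss_reads >= n * mean_internal
```

With an internal mean of 2 reads per position and N = 10, a candidate TSS needs at least 20 reads to be kept.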
According to one embodiment of the present invention, the method further includes performing a chi-square test on the screening result. According to one embodiment of the present invention, the test value of the chi-square test is greater than 3.84, that is, the confidence level is greater than 95%. According to embodiments of the present invention, after filtering, the reliability of the filtered result is verified with a chi-square test. Specifically, building on the previous embodiment, the mean of the fold values corresponding to all TSSs and their standard deviation are calculated, and after standardization the chi-square value is calculated with a formula [formula not reproduced in the source text]. According to the chi-square table, the chi-square value at a confidence level of 0.95 is 3.84, so to obtain TSSs with a reliability above 95%, the chi-square value calculated from the formula must be greater than 3.84.
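The 3.84 threshold can be checked numerically. The sketch below assumes one degree of freedom, which is the case in which the standard 0.95 critical value of the chi-square distribution is about 3.84; the text itself does not state the degrees of freedom, and the test statistic is not computed here because its formula is not reproduced in the source. For one degree of freedom the cumulative distribution function reduces to an error function, so only the standard library is needed.

```python
import math

def chi2_cdf_df1(x):
    """CDF of the chi-square distribution with 1 degree of freedom.

    For df = 1, P(X <= x) = erf(sqrt(x / 2)).
    """
    if x <= 0:
        return 0.0
    return math.erf(math.sqrt(x / 2.0))

def is_reliable(chi2_value, critical=3.84):
    """Apply the threshold from the text: keep results with chi-square > 3.84."""
    return chi2_value > critical

# Cumulative probability at the threshold; it should be close to 0.95.
p_at_threshold = chi2_cdf_df1(3.84)
```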
In addition, according to embodiments of the present invention, after the sequencing result is obtained, the method may further include a step of removing unqualified sequences from the sequencing reads to obtain clean reads. According to embodiments of the present invention, unqualified sequences include:
Reads in which the number of bases with a sequencing quality below a certain threshold exceeds 50% of the total number of bases in the read. The low-quality threshold depends on the specific sequencing technology and sequencing environment;
Reads in which the number of undetermined bases (such as N in Illumina HiSeq 2000 sequencing results) exceeds 10% of the total number of bases in the read;
Reads that contain an exogenous sequence. Reads are compared with the exogenous sequences introduced by the experiment other than the sample adapter sequence, such as the various adapter sequences, and any read containing such an exogenous sequence is considered unqualified.
The sequence data obtained after removing unqualified sequences from the raw sequence data are called clean reads. They serve as the basis for subsequent analysis, which improves the effectiveness of that analysis.
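The three filtering criteria above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the quality threshold of 20 is a placeholder (the text leaves the threshold to the sequencing technology and environment), exogenous sequences are detected by simple substring matching rather than by alignment, and the function name `is_clean_read` is hypothetical.

```python
def is_clean_read(seq, quals, exogenous=(), min_qual=20,
                  max_low_frac=0.5, max_n_frac=0.1):
    """Return True if a read passes the three filters described in the text.

    seq: read sequence as a string.
    quals: per-base quality scores, same length as seq.
    exogenous: known exogenous sequences (for example adapter sequences).
    min_qual: low-quality threshold (placeholder value, an assumption).
    """
    n = len(seq)
    if n == 0 or len(quals) != n:
        return False
    seq = seq.upper()
    # Criterion 1: more than 50% of bases below the quality threshold.
    low_quality = sum(1 for q in quals if q < min_qual)
    if low_quality > max_low_frac * n:
        return False
    # Criterion 2: more than 10% undetermined bases (N).
    if seq.count("N") > max_n_frac * n:
        return False
    # Criterion 3: read contains an exogenous sequence (substring match here).
    if any(x and x.upper() in seq for x in exogenous):
        return False
    return True
```

A read set would then be reduced to clean reads with a list comprehension such as `[r for r in reads if is_clean_read(r.seq, r.quals, adapters)]`, where `reads` and `adapters` are whatever containers the surrounding pipeline provides.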
In addition, after the chi-square test, a series of bioinformatic analyses are carried out on the verified, reliable results, such as:
1) TSS (transcription start site) classification: According to embodiments of the present invention, the screened TSSs can be divided into two classes. One class consists of TSSs that align to the genome and have a corresponding gene annotation, called annotated TSSs. The other class consists of TSSs that align to the genome but have no annotated gene information nearby, called unannotated TSSs, which can be used for predicting new genes.
2) TSS annotation: Here the TSSs that fall within known genes are annotated, including the expression level of each TSS, its position, and the corresponding gene annotation information.
3) Building a TSS map: According to embodiments of the present invention, the TSSs found for a given species by this method can be displayed graphically as a TSS map, from which the position of each TSS and its expression level can be seen directly. Differences in TSS expression and distribution between samples can also be seen.
4) Promoter region identification and 5' UTR length statistics.
5) Experimental reproducibility analysis: According to embodiments of the present invention, correlation analysis of the results of two parallel experiments provides an assessment of the reliability of the experimental results and the stability of the procedure.
6) New gene prediction: According to embodiments of the present invention, for TSSs with no reference gene nearby, the sequences around these TSSs can be extracted for gene prediction. Prokaryotes are predicted with glimmer and eukaryotes with Genscan.
7) Data visualization: According to embodiments of the present invention, the analysis results can be used to plot and inspect the TSS distribution of a gene or region of interest.
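The classification in item 1) can be sketched as follows. The sketch assumes plus-strand coordinates only, a gene table given as a dictionary from gene name to translation start position, and a search window equal to the 700 bp upstream distance given earlier for prokaryotic hosts (5000 bp would be used for eukaryotic hosts). The function name and data layout are hypothetical.

```python
def classify_tss(tss_pos, cds_starts, max_upstream=700):
    """Classify a TSS as annotated or unannotated.

    tss_pos: TSS coordinate on the plus strand.
    cds_starts: dict mapping gene name -> translation start coordinate
        (hypothetical annotation format).
    max_upstream: how far upstream of a translation start a TSS may lie
        and still be assigned to that gene.
    """
    candidates = [
        (cds - tss_pos, name)
        for name, cds in cds_starts.items()
        if 0 <= cds - tss_pos <= max_upstream
    ]
    if not candidates:
        return ("unannotated", None)
    # Assign the TSS to the nearest downstream gene.
    _, name = min(candidates)
    return ("annotated", name)
```

Unannotated TSSs returned by such a function are the ones that would be passed on to gene prediction in item 6).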
In a sixth aspect, the present invention provides an enrichment reagent for enriching transcripts from an RNA sample. According to embodiments of the present invention, the enrichment reagent has 5'-monophosphate-dependent exonuclease activity. With this enrichment reagent, transcripts can be enriched effectively, so the reagent can be applied to high-throughput sequencing of the TSSs of both eukaryotic and prokaryotic RNA, with the advantages of simple operation, high accuracy, and low cost. According to one embodiment of the present invention, the enrichment reagent contains DNase I. This further improves the specificity and efficiency of degrading RNA with a 5' monophosphate, and thereby further improves the efficiency of the transcript enrichment method. According to one embodiment of the present invention, the enrichment reagent may further contain a buffer and a soluble salt, so as to further improve the enzymatic activity of DNase I. According to one embodiment of the present invention, the pH of the enrichment reagent is 8.0. According to one embodiment of the present invention, the buffer is Tris-HCl and the soluble salt is at least one selected from sodium chloride and magnesium chloride. According to one embodiment of the present invention, the RNA sample is treated with the enrichment reagent at 30 degrees Celsius. This further improves the efficiency of transcript enrichment with the enrichment reagent according to embodiments of the present invention. According to embodiments of the present invention, examples of enzymes having 5'-monophosphate-dependent exonuclease activity include: exoribonuclease XRN-1, Terminator™ 5'-phosphate-dependent exonuclease, or TAKARA™ alkaline phosphatase.
In a seventh aspect, the present invention provides a device for building a sequencing library. Referring to Fig. 5, according to embodiments of the present invention, the device includes: a transcript enrichment unit 211, an end trimming unit 212, an RNA adapter ligation unit 213, a reverse transcription unit 214, an amplification unit 215, and a library construction unit 216. According to embodiments of the present invention, the transcript enrichment unit 211 is provided with the enrichment reagent described above, so as to enrich transcripts from the RNA sample; the end trimming unit 212 is connected to the transcript enrichment unit 211 and is adapted to remove the 5' cap or 5' triphosphate of the transcripts, so as to obtain transcripts lacking the 5' cap or 5' triphosphate; the RNA adapter ligation unit 213 is connected to the end trimming unit 212 and is adapted to ligate an RNA adapter to the 5' end of the transcripts lacking the 5' cap or 5' triphosphate, so as to obtain adapter-ligated transcripts; the reverse transcription unit 214 is connected to the RNA adapter ligation unit 213 and is adapted to reverse transcribe the adapter-ligated transcripts, so as to obtain the cDNA corresponding to the transcripts; the amplification unit 215 is connected to the reverse transcription unit 214 and is adapted to amplify the cDNA, so as to obtain an amplification product; and the library construction unit 216 is connected to the amplification unit 215 and is adapted to build a sequencing library from the amplification product. With this device, a sequencing library can be built effectively for the transcripts enriched from a nucleic acid sample, so the device can be applied to high-throughput sequencing of the TSSs of both eukaryotic and prokaryotic RNA, with the advantages of simple operation, high accuracy, and low cost. According to one embodiment of the present invention, the end trimming unit 212 is provided with an end trimming reagent, wherein the end trimming reagent has tobacco acid pyrophosphatase activity. According to one embodiment of the present invention, the trimming reagent comprises: tobacco acid pyrophosphatase, a soluble salt, EDTA, β-mercaptoethanol, and Triton X-100. According to one embodiment of the present invention, the soluble salt is sodium acetate. According to one embodiment of the present invention, the pH of the trimming reagent is 7.5. According to one embodiment of the present invention, the reverse transcription unit 214 is provided with an oligonucleotide having the sequence shown in SEQ ID NO: 1 as the reverse transcription primer. According to one embodiment of the present invention, at least one N in the reverse transcription primer is thio-modified. According to one embodiment of the present invention, the penultimate N in the reverse transcription primer is thio-modified. According to one embodiment of the present invention, the RNA adapter ligation unit 213 is provided with a ligation reagent, wherein the ligation reagent has T4 RNA ligase activity. According to one embodiment of the present invention, the ligation reagent comprises: T4 RNA ligase, a buffer, a soluble salt, and dithiothreitol. According to one embodiment of the present invention, the pH of the ligation reagent is 7.5. According to one embodiment of the present invention, the buffer is Tris-HCl. According to one embodiment of the present invention, the soluble salt is magnesium chloride.
In an eighth aspect, the present invention provides an apparatus for sequencing a nucleic acid sample. Referring to Fig. 4, according to embodiments of the present invention, the apparatus includes: a library construction device 210, which is the device described above, so as to build a sequencing library for the nucleic acid sample; and a sequencing device 220, which is connected to the library construction device 210 and is adapted to sequence the sequencing library, so as to obtain a sequencing result. With this apparatus, RNA transcripts can be sequenced effectively, and the apparatus can be applied to high-throughput sequencing of the TSSs of both eukaryotic and prokaryotic RNA, with the advantages of simple operation, high accuracy, and low cost. According to embodiments of the present invention, the sequencing device is at least one of Illumina HiSeq 2000, Genome Analyzer, the SOLiD sequencing system, Ion Torrent, Ion Proton, 454, the PacBio RS sequencing system, the Helicos tSMS system, and a nanopore sequencing system.
In a ninth aspect, the present invention provides a system for determining TSSs. Referring to Fig. 3, according to embodiments of the present invention, the system includes: a sample extraction apparatus 100, which is used to extract an RNA sample from a host; a nucleic acid sample sequencing apparatus 200, which is connected to the sample extraction apparatus and is the nucleic acid sample sequencing apparatus described above, so as to sequence the RNA sample and obtain a sequencing result consisting of multiple sequencing reads; and a TSS determination apparatus 300, which is connected to the sequencing apparatus 200 and is adapted to determine the TSSs based on the sequencing result. According to embodiments of the present invention, the TSSs in a nucleic acid sample can be determined effectively with this system. Referring to Fig. 6, according to one embodiment of the present invention, the TSS determination apparatus further includes: an alignment device 310, which is used to align the sequencing data to a reference sequence; and a determination device 320, which is adapted to determine the TSS based on the alignment result, wherein the reference sequence contains at least part of the 5'-UTR sequence of a predetermined gene, and the determination device 320 is adapted to select, as the positive sequence, the sequencing read that aligns to the sequence segment corresponding to the predetermined gene and lies closest to the 5' end of that segment, and to determine the first base of the positive sequence to be the transcription start site. According to one embodiment of the present invention, the alignment device is adapted to perform the alignment with SOAP Alignment. According to one embodiment of the present invention, the determination device further includes a screening unit adapted to screen the positive sequences, wherein the screening criterion is that the number of positive sequences is at least N times the average number of reads inside the sequence segment corresponding to the predetermined gene, where N is a real number greater than 1, preferably a real number of at least 10. According to one embodiment of the present invention, the determination device further includes a verification unit adapted to perform a chi-square test on the screening result. According to one embodiment of the present invention, the test value of the chi-square test is greater than 3.84, corresponding to a confidence level greater than 95%.
The term "predetermined gene" as used in the present invention should be interpreted broadly: it may refer to any known gene, or to a nucleotide sequence predicted by known methods to potentially encode a protein.
The solution of the present invention is explained below with reference to examples. Those skilled in the art will understand that the following examples only illustrate the present invention and should not be taken as limiting its scope. Where specific techniques or conditions are not indicated in the examples, they follow the techniques or conditions described in the literature of the art (for example, J. Sambrook et al., Molecular Cloning: A Laboratory Manual, third edition, translated by Huang Peitang et al., Science Press) or the product instructions. Reagents and instruments for which no manufacturer is indicated are conventional, commercially available products, which can be purchased, for example, from Illumina.
General methods
The methods used in the examples mainly comprise TSS library construction and post-sequencing analysis. The TSS library construction method mainly comprises the following steps:
(1) Take a total RNA sample, digest it with DNase I, and purify the digested RNA by ethanol precipitation;
(2) Mix the RNA obtained in (1) with reagent I and react, to enrich RNA carrying a 5' cap or 5' triphosphate;
(3) Purify the RNA obtained in (2) by phenol/chloroform/isoamyl alcohol (25:24:1) extraction;
(4) Mix the RNA purified in (3) with reagent II and react, to remove the 5' cap or 5' triphosphate and obtain a 5' monophosphate;
(5) Purify the RNA obtained in (4) by phenol/chloroform/isoamyl alcohol (25:24:1) extraction;
(6) Add the RNA adapter to the RNA from (5), mix with reagent III and react, to obtain RNA with an adapter at the 5' end;
(7) Reverse transcribe the RNA from (6) with the specific reverse transcription primer to obtain cDNA carrying adapters of defined sequence at both ends, and purify it with magnetic beads;
(8) Amplify the adapter-flanked cDNA fragments obtained in (7) by polymerase chain reaction (PCR), and purify the PCR product with magnetic beads;
(9) Measure the library concentration and fragment size with an Agilent Bioanalyzer 2100 and Q-PCR.
In step (1), the amount of total RNA is 5 μg.
In step (2), reagent I contains: 1 μL 5'-monophosphate-dependent exonuclease (1 U/μL), 50 mM buffer salt, and 2 mM-100 mM soluble salt, pH 8.0, with water as the solvent. The buffer salt in reagent I is Tris-HCl. The soluble salt in reagent I is sodium chloride or magnesium chloride. In step (2), the RNA is mixed with reagent I at 30 °C.
In step (4), reagent II contains: 0.2 μL tobacco acid pyrophosphatase (10 U/μL), 50 mM soluble salt, pH 6.0, 1 mM EDTA, 0.1% β-mercaptoethanol, and 0.01% Triton X-100, with water as the solvent. The soluble salt in reagent II is sodium acetate. The sample is mixed with reagent II at 37 °C.
In step (6), reagent III contains: 1 μL T4 RNA ligase 1, 50 mM buffer salt, 10 mM soluble salt, and 1 mM dithiothreitol, pH 7.5, with water as the solvent. The buffer salt in reagent III is Tris-HCl. The soluble salt in reagent III is magnesium chloride. In step (6), the RNA is mixed with reagent III at 20 °C.
The sequence of the specific reverse transcription primer used in step (7) is: 5'-GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTNNNNNN-3', in which the penultimate N is thio-modified.
In steps (3) and (5), the RNA is purified by phenol/chloroform/isoamyl alcohol extraction followed by ethanol precipitation. The purpose is to remove the enzymes from the reaction mixture so that they do not interfere with the next reaction. Ethanol precipitation also retains small transcripts such as microRNA, so that TSS information for these non-coding RNAs is obtained, which helps in understanding the state of transcriptional regulation.
Referring to Fig. 2, bioinformatic analysis of the data produced by sequencing the TSS library comprises the following steps:
(1) Filter the sequencing reads;
In the present invention, after the high-throughput sequencing reads are received, they are filtered to remove unqualified sequences. The high-throughput sequencing technology may be Illumina HiSeq 2000 or another existing high-throughput sequencing technology.
Unqualified sequences include: reads in which the number of bases with a sequencing quality below a certain threshold exceeds 50% of the total number of bases in the read, where the low-quality threshold depends on the specific sequencing technology and sequencing environment; reads in which the number of undetermined bases (such as N in Illumina HiSeq 2000 sequencing results) exceeds 10% of the total number of bases in the read; and reads that contain an exogenous sequence, as found by comparing the reads with the exogenous sequences introduced by the experiment other than the sample adapter sequence, such as the various adapter sequences. The sequence data obtained after removing unqualified sequences from the raw sequence data are called clean reads and serve as the basis for subsequent analysis.
(2) Align the clean reads to the reference sequences;
In the present invention, the clean reads obtained by high-throughput sequencing are aligned to the reference genome and to the reference gene sequences, respectively, using the short-read mapping program SOAPalignment v2.2, with no base mismatches allowed. The reference genome sequence and the reference gene sequences can be obtained from public databases.
(3) After alignment, the alignment result is first screened to obtain reliable TSS information. The screening works as follows. The first position at which clean reads align to the genome is taken as a raw TSS; however, some of these reads may have aligned inside a gene and would give false-positive TSSs, so further filtering is needed. Because the method enriches the obtained reads at the 5' end of the gene, the number of reads at a true TSS is higher than the average number of reads falling inside the gene. A fold factor N is therefore introduced to filter the TSSs: a candidate TSS is regarded as a true TSS only if its read count is at least N times the average read count inside the corresponding gene.
(4) After filtering, the reliability of the filtered result is verified with a chi-square test; that is, the chi-square test value should be greater than 3.84, corresponding to a confidence level greater than 95%.
(5) After the chi-square test, a series of bioinformatic analyses are carried out on the verified, reliable results, such as:
1) TSS (transcription start site) classification: According to embodiments of the present invention, the screened TSSs can be divided into two classes. One class consists of TSSs that align to the genome and have a corresponding gene annotation, called annotated TSSs. The other class consists of TSSs that align to the genome but have no annotated gene information nearby, called unannotated TSSs, which can be used for predicting new genes.
2) TSS annotation: Here the TSSs that fall within known genes are annotated, including the expression level of each TSS, its position, and the corresponding gene annotation information.
3) Building a TSS map: According to embodiments of the present invention, the TSSs found for a given species by this method can be displayed graphically as a TSS map, from which the position of each TSS and its expression level can be seen directly. Differences in TSS expression and distribution between samples can also be seen.
4) Promoter region identification and 5' UTR length statistics.
5) Experimental reproducibility analysis: According to embodiments of the present invention, correlation analysis of the results of two parallel experiments provides an assessment of the reliability of the experimental results and the stability of the procedure.
6) New gene prediction: According to embodiments of the present invention, for TSSs with no reference gene nearby, the sequences around these TSSs can be extracted for gene prediction. Prokaryotes are predicted with glimmer and eukaryotes with Genscan.
7) Data visualization: According to embodiments of the present invention, the analysis results can be used to plot and inspect the TSS distribution of a gene or region of interest.
Example 1: Transcription start site sequencing analysis of a human RNA sample and an E. coli RNA sample
The human RNA sample (sample one) was purchased from Agilent. The E. coli RNA (sample two) was extracted from E. coli cultured to the exponential growth phase. 1-5 μg of total RNA was digested with DNase I and purified by ethanol precipitation. The purified RNA was mixed and reacted with reagent I to enrich intact RNA carrying a 5' cap or 5' triphosphate, and was then purified by phenol/chloroform/isoamyl alcohol extraction. It was next mixed and reacted with reagent II to remove the 5' cap or triphosphate, leaving a 5' monophosphate, and again purified by phenol/chloroform/isoamyl alcohol extraction. The 5'-monophosphate RNA was mixed and reacted with reagent III and the RNA adapter to add the adapter to the 5' end of the RNA. The RNA carrying the 5' adapter was reverse transcribed with the specific reverse transcription primer into cDNA with defined sequences at both ends, and the cDNA product was purified with magnetic beads. The resulting cDNA fragments were amplified by polymerase chain reaction (PCR), the PCR product was purified with magnetic beads, and the library was loaded for sequencing. Sequencing was performed on an Illumina HiSeq 2000.
Following the bioinformatic analysis workflow of the general methods, screening yielded a set of TSSs. Figure 7 shows the distribution of the screened TSSs on the genome; the upper and lower panels are the TSS distributions of the human RNA sample and the E. coli RNA sample, respectively. Position 0 is the start of the gene coding region, and transcription starts upstream of it. The figure shows that most of the sequences fall upstream of the gene coding region.
In addition, in this example a series of analyses was carried out on the TSS information.
The first was TSS classification. The screened TSSs were divided into two classes: TSSs that align to the genome and have a corresponding gene annotation, called annotated TSSs; and TSSs that align to the genome but have no annotated gene information nearby, called unannotated TSSs, which can be used for predicting new genes.
The second was TSS annotation. Here the TSSs that fall within known genes were annotated, including the expression level of each TSS, its position, and the corresponding gene annotation information. A TSS map was then built: the inventors displayed the TSSs found for a given species by this method graphically as a TSS map, from which the position of each TSS and its expression level can be seen directly. Differences in TSS expression and distribution between samples can also be seen. Figure 8 shows the TSS maps of eight human samples, from which the TSS distribution in the different samples can be seen.
Next came the search for promoter regions and the 5' UTR length statistics. Figure 9 shows the base distribution upstream of the TSS, where abscissa 1 corresponds to the TSS position, which is predominantly a purine (A/G). The upper panel shows the base distribution upstream of human TSSs, with a distinct GC-enriched region, which also constitutes the main promoter and enhancer region in eukaryotes. The lower panel shows the base distribution for E. coli, where a typical TATA box can also be found in the -10 region upstream of the TSS. Figure 10 shows the length distribution of the 5' UTR, that is, the distance from the TSS to the coding region, for human (upper panel) and E. coli (lower panel). The length of the 5' UTR affects how gene function is carried out, and eukaryotic 5' UTRs are longer than prokaryotic ones.
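The 5' UTR length statistic described above is the distance from the TSS to the start of the coding region. A minimal Python sketch follows; it assumes plus-strand, 1-based coordinates in which the TSS is the first transcribed base and the coding start is the first base of the start codon, and the coordinate pairs shown are made-up illustrative values, not data from this example.

```python
from statistics import mean, median

def utr5_lengths(pairs):
    """Compute 5' UTR lengths from (tss, cds_start) coordinate pairs.

    Pairs in which the TSS lies downstream of the coding start are skipped,
    since they do not define a 5' UTR (plus strand assumed).
    """
    return [cds - tss for tss, cds in pairs if cds >= tss]

# Illustrative coordinates only.
lengths = utr5_lengths([(100, 150), (200, 230), (500, 480)])
summary = {"n": len(lengths), "mean": mean(lengths), "median": median(lengths)}
```

The resulting list of lengths is what would be binned to draw a length distribution such as the one in Figure 10.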
In the present embodiment, a correlation analysis was also performed on the results of two replicate experiments, which allows the reliability of the experimental results and the stability of the procedure to be assessed. As shown in Figure 11, the correlation between the two replicate experiments on the same sample is close to 1, indicating high reproducibility.
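One way to quantify the agreement between two replicates is a Pearson correlation over per-TSS read counts. The sketch below assumes each replicate is a mapping from TSS position to read count and treats a TSS missing from one replicate as a count of zero; both choices, and the function names, are assumptions, since the source does not state which correlation measure was used.

```python
import math
from typing import Dict, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def replicate_correlation(rep1: Dict[int, int], rep2: Dict[int, int]) -> float:
    """Correlate read counts over the union of TSS positions in both replicates."""
    positions = sorted(set(rep1) | set(rep2))
    return pearson([rep1.get(p, 0) for p in positions],
                   [rep2.get(p, 0) for p in positions])


r = replicate_correlation({100: 10, 200: 20, 300: 30},
                          {100: 20, 200: 40, 300: 60})
```

A value of `r` near 1 corresponds to the high reproducibility reported for the replicates.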
For TSSs with no reference gene nearby, the present invention extracts the sequences around these TSSs for gene prediction. Glimmer is used for Escherichia coli and GENSCAN for human. Finally, the present invention uses the analysis results to plot and inspect the TSS distribution for genes or regions of interest. As shown in Figure 12, the upper panel shows the TSS distribution of two human genes, NM_018997 and NM_031901, both of which undergo alternative splicing. In the figure, red vertical lines represent the screened TSSs, black vertical lines are the sequences obtained before filtering, blue horizontal lines represent exons, and yellow horizontal lines represent introns. The lower panel shows the TSS distribution of an Escherichia coli operon; prokaryotes have no introns, so only the blue horizontal lines representing genes appear, and the 4 genes of this operon share one TSS. It can be seen that the TSSs screened by the inventors lie upstream of the genes and are therefore reliable.
The description of the present invention is given for the purposes of illustration and description and is not intended to be exhaustive or to limit the invention to the form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art. The embodiments were chosen and described in order to better explain the principles and practical application of the invention, and to enable those of ordinary skill in the art to understand the invention and thereby design various embodiments, with various modifications, suited to particular uses.
In this specification, a description that refers to the terms "an embodiment", "some embodiments", "an example", "a specific example", or "some examples" means that a specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic uses of these terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
Although embodiments of the present invention have been shown and described above, it should be understood that these embodiments are exemplary and are not to be construed as limiting the invention; those of ordinary skill in the art may make changes, modifications, substitutions, and variations to the above embodiments within the scope of the invention without departing from its principles and spirit.

Claims (15)

1. A method for determining a transcription start site, characterized in that it comprises:
extracting an RNA sample from a host, and treating the RNA sample with an enrichment reagent so as to enrich transcripts, wherein the enrichment reagent has 5'-monophosphate-dependent 5' exonuclease activity, and each transcript is an RNA molecule having a cap structure or a triphosphate group at its 5' end;
constructing a sequencing library from the treated RNA sample, including:
removing the cap structure or triphosphate group of the transcript, to obtain a transcript from which the cap structure or triphosphate group has been removed,
ligating an RNA adapter to the 5' end of the transcript from which the cap structure or triphosphate group has been removed, to obtain a transcript ligated to the RNA adapter,
reverse transcribing the transcript ligated to the RNA adapter, to obtain cDNA corresponding to the transcript, wherein the reverse transcription primer has at its end a sequence corresponding to the RNA adapter, and the reverse transcription uses as the reverse transcription primer the oligonucleotide 5'-GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTNNNNNN-3' shown in SEQ ID NO: 1, in which at least one N is thio-modified,
amplifying the cDNA, to obtain an amplification product, and
constructing the sequencing library based on the amplification product;
sequencing the sequencing library, to obtain a sequencing result consisting of a plurality of sequencing reads;
determining the transcription start site based on the sequencing result, including:
aligning the sequencing result to a reference sequence that comprises at least part of the 5'-UTR sequence of a predetermined gene: for a prokaryotic host, the reference sequence comprises the nucleotide sequence between the translation initiation site of the predetermined gene and the site 700 bp upstream of that translation initiation site; for a eukaryotic host, the reference sequence comprises the nucleotide sequence between the translation initiation site of the predetermined gene and the site 5000 bp upstream of that translation initiation site,
based on the alignment result, selecting as positive sequences the sequencing reads that align to the reference sequence and lie most upstream in the reference sequence, and screening the positive sequences according to the following criterion: the number of positive sequences is at least N times the mean number of sequencing reads aligned within the predetermined gene, wherein N is a real number of at least 10, and
determining the first base of the positive sequences in the screening result as the transcription start site.
2. The method according to claim 1, characterized in that the cap structure or triphosphate group of the transcript is removed using an end-trimming reagent, wherein
the end-trimming reagent has tobacco acid pyrophosphatase activity.
3. The method according to claim 1, characterized in that the penultimate N in the reverse transcription primer is thio-modified.
4. The method according to claim 1, characterized in that the sequencing is performed using at least one of Illumina HiSeq 2000, Genome Analyzer, the SOLiD sequencing system, Ion Torrent, Ion Proton, 454, the PacBio RS sequencing system, Helicos tSMS technology, and nanopore sequencing technology.
5. The method according to claim 1, characterized in that the RNA sample is at least part of the total RNA of the host.
6. The method according to claim 1, characterized in that the alignment is performed using SOAP Alignment.
7. The method according to claim 1, characterized in that it further comprises performing a chi-square test on the screening result, the test statistic of the chi-square test being greater than 3.84.
8. A system for determining a transcription start site, characterized in that it comprises:
a sample extraction apparatus for extracting an RNA sample from a host;
a nucleic acid sample sequencing apparatus connected to the sample extraction apparatus, the nucleic acid sample sequencing apparatus including a library construction device and a sequencing device,
the library construction device including:
a transcript enrichment unit provided with an enrichment reagent having 5'-monophosphate-dependent 5' exonuclease activity, so as to enrich transcripts from the RNA sample,
an end-trimming unit connected to the transcript enrichment unit and adapted to remove the 5' cap structure or 5' triphosphate of the transcript, to obtain a transcript from which the 5' cap structure or 5' triphosphate has been removed,
an RNA adapter ligation unit connected to the end-trimming unit and adapted to ligate an RNA adapter to the 5' end of the transcript from which the 5' cap structure or 5' triphosphate has been removed, to obtain a transcript ligated to the RNA adapter,
a reverse transcription unit connected to the RNA adapter ligation unit and adapted to reverse transcribe the transcript ligated to the RNA adapter, to obtain cDNA corresponding to the transcript, wherein the reverse transcription unit is provided with a reverse transcription primer that has at its end a sequence corresponding to the RNA adapter, the sequence of the reverse transcription primer being as shown in SEQ ID NO: 1: 5'-GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTNNNNNN-3', in which at least one N is thio-modified,
an amplification unit connected to the reverse transcription unit and adapted to amplify the cDNA, to obtain an amplification product, and
a library construction unit connected to the amplification unit and adapted to construct a sequencing library based on the amplification product,
the sequencing device being connected to the library construction device and adapted to sequence the sequencing library, to obtain a sequencing result consisting of a plurality of sequencing reads; and
a transcription start site determination apparatus connected to the sequencing device and adapted to determine the transcription start site based on the sequencing result, the transcription start site determination apparatus including:
an alignment device for aligning the sequencing result to a reference sequence, the reference sequence comprising at least part of the 5'-UTR sequence of a predetermined gene,
a determination device adapted to determine the transcription start site based on the alignment result, the determination device being adapted to: select as positive sequences the sequencing reads that align to the reference sequence and lie most upstream in the reference sequence, the determination device further including a screening unit adapted to screen the positive sequences, wherein the screening criterion is: the number of positive sequences is at least N times the mean number of sequencing reads aligned within the predetermined gene, wherein N is a real number greater than 1, and
determine the first base of the positive sequences in the screening result obtained from the screening unit as the transcription start site.
9. The system according to claim 8, characterized in that N is a real number of at least 10.
10. The system according to claim 8, characterized in that the end-trimming unit is provided with an end-trimming reagent, wherein the end-trimming reagent has tobacco acid pyrophosphatase activity.
11. The system according to claim 8, characterized in that the penultimate N in the reverse transcription primer is thio-modified.
12. The system according to claim 8, characterized in that the RNA adapter ligation unit is provided with a ligation reagent, wherein the ligation reagent has T4 RNA ligase activity.
13. The system according to claim 8, characterized in that the sequencing device is at least one selected from Illumina HiSeq 2000, Genome Analyzer, the SOLiD sequencing system, Ion Torrent, Ion Proton, 454, the PacBio RS sequencing system, the Helicos tSMS system, and a nanopore sequencing system.
14. The system according to claim 8, characterized in that the alignment device is adapted to perform the alignment using SOAP Alignment.
15. The system according to claim 8, characterized in that the determination device further includes a verification unit adapted to perform a chi-square test on the screening result, the test statistic of the chi-square test being greater than 3.84.
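The screening criterion recited in claims 1 and 8 (positive-sequence count at least N times the mean read count within the gene) can be sketched as below. This is one possible reading, not the patented implementation: it assumes that positive sequences are grouped by the reference coordinate of their first base, that "mean number of reads within the gene" means reads per gene-body position, and that smaller coordinates are further upstream. The function name and arguments are hypothetical.

```python
from collections import Counter
from typing import List, Sequence


def screen_tss(upstream_read_starts: Sequence[int],
               gene_body_read_count: int,
               gene_length: int,
               n_fold: float = 10.0) -> List[int]:
    """Return candidate TSS positions (most upstream first) that pass the N-fold screen.

    upstream_read_starts: first-base coordinate of each read aligned to the
        upstream reference window (assumed layout).
    gene_body_read_count: number of reads aligned inside the gene.
    gene_length: gene length in bases, used to get a mean per position.
    n_fold: the factor N (at least 10 in claims 1 and 9, greater than 1 in claim 8).
    """
    if gene_length <= 0:
        raise ValueError("gene_length must be positive")
    if n_fold <= 1:
        raise ValueError("n_fold must be greater than 1")
    background = gene_body_read_count / gene_length  # mean reads per position
    threshold = n_fold * background
    pileup = Counter(upstream_read_starts)
    # The first base of the passing reads is reported as the TSS.
    return sorted(pos for pos, count in pileup.items() if count >= threshold)


# Toy data: 30 reads start at position 50, 2 reads start at position 120,
# and the gene body holds 100 reads over 100 bases (background = 1 read/base).
calls = screen_tss([50] * 30 + [120] * 2, gene_body_read_count=100, gene_length=100)
```

With a background of zero reads in the gene body every observed start passes, so a real pipeline would probably also want an absolute minimum count; that safeguard is not part of the claims and is left out here.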
CN201210379402.8A 2012-09-29 2012-09-29 Transcript enrichment method from RNA sample and applications thereof Active CN103710336B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201210379402.8A CN103710336B (en) 2012-09-29 2012-09-29 Transcript enrichment method from RNA sample and applications thereof
PCT/CN2013/081581 WO2014048185A1 (en) 2012-09-29 2013-08-15 Method for enriching transcript from rna sample and use thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201210379402.8A CN103710336B (en) 2012-09-29 2012-09-29 Transcript enrichment method from RNA sample and applications thereof

Publications (2)

Publication Number Publication Date
CN103710336A CN103710336A (en) 2014-04-09
CN103710336B true CN103710336B (en) 2017-02-22

Family

ID=50386954

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210379402.8A Active CN103710336B (en) 2012-09-29 2012-09-29 Transcript enrichment method from RNA sample and applications thereof

Country Status (2)

Country Link
CN (1) CN103710336B (en)
WO (1) WO2014048185A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106319639B (en) * 2015-06-17 2018-09-04 深圳华大智造科技有限公司 Build the method and apparatus of sequencing library
CN113463202B (en) * 2020-03-31 2022-04-15 广州序科码生物技术有限责任公司 Novel RNA high-throughput sequencing method, primer group and kit and application thereof

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1163357A4 (en) * 1999-03-19 2004-11-10 Inst Genetics Llc Primers-attached vector elongation (pave): a 5'-directed cdna cloning strategy
US20040166499A1 (en) * 2000-10-05 2004-08-26 Yoshihide Hayashizaki Oligonucleotide linkers comprising a variable cohesive portion and method for the praparation of polynucleotide libraries by using said linkers
JP2009072062A (en) * 2006-04-07 2009-04-09 Institute Of Physical & Chemical Research Method for isolating 5'-terminals of nucleic acid and its application
WO2009135212A2 (en) * 2008-05-02 2009-11-05 Epicentre Technologies Corporation Selective 5' ligation tagging of rna
CN101967476B (en) * 2010-09-21 2012-11-14 深圳华大基因科技有限公司 Joint connection-based deoxyribonucleic acid (DNA) polymerase chain reaction (PCR)-free tag library construction method
WO2013063308A1 (en) * 2011-10-25 2013-05-02 University Of Massachusetts An enzymatic method to enrich for capped rna, kits for performing same, and compositions derived therefrom
CN102534813B (en) * 2011-11-15 2013-09-04 杭州联川生物技术有限公司 Method for constructing sequencing library of middle-small-segment RNA (Ribonucleic Acid)
CN102533752B (en) * 2012-02-28 2015-01-21 盛司潼 Oligo dT primer and method for constructing cDNA library

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
CRISPR RNA maturation by trans-encoded small RNA and host factor RNase III; Elitza Deltcheva et al.; Nature; 20110311; vol. 471; full text *
Dynamics of transcriptional start site selection during nitrogen stress-induced cell differentiation in Anabaena sp. PCC7120; Jan Mitschke et al.; PNAS; 20111113; vol. 108, no. 50; full text *
The primary transcriptome of the major human pathogen Helicobacter pylori; Cynthia M. Sharma et al.; Nature; 20100311; vol. 464; full text *
The transcriptional landscape and small RNAs of Salmonella enterica serovar Typhimurium; Carsten Kröger et al.; PNAS; 20120425; full text *

Also Published As

Publication number Publication date
CN103710336A (en) 2014-04-09
WO2014048185A1 (en) 2014-04-03

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant