CN103710336B - Transcript enrichment method from RNA sample and applications thereof - Google Patents
- Publication number: CN103710336B
- Application number: CN201210379402.8A
- Authority: CN (China)
- Prior art keywords: sequencing, sequence, RNA, transcript, TSS
- Legal status: Active (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6806—Preparing nucleic acids for analysis, e.g. for polymerase chain reaction [PCR] assay
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N15/00—Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
- C12N15/09—Recombinant DNA-technology
- C12N15/10—Processes for the isolation, preparation or purification of DNA or RNA
- C12N15/1096—Processes for the isolation, preparation or purification of DNA or RNA cDNA Synthesis; Subtracted cDNA library construction, e.g. RT, RT-PCR
Abstract
The invention provides a method for enriching transcripts from an RNA sample and applications thereof. The method comprises treating the RNA sample with an enrichment reagent so as to enrich transcripts, wherein the enrichment reagent has 5'-monophosphate exonuclease activity and a transcript is an RNA molecule whose 5' end carries a cap structure or a triphosphate group. Transcripts can be effectively enriched with this method.
Description
Technical field
The present invention relates to the field of biotechnology. Specifically, it relates to a method for enriching transcripts from an RNA sample and uses thereof. More specifically, it relates to a method for enriching transcripts from an RNA sample, a method for constructing a sequencing library, a sequencing library, a method for sequencing a nucleic acid sample, a method for determining a transcription start site (TSS), an enrichment reagent for enriching transcripts from an RNA sample, a device for constructing a sequencing library, equipment for sequencing a nucleic acid sample, and a system for determining a TSS.
Background
Transcription of a gene begins when RNA polymerase binds to the promoter of the DNA template. Transcription then elongates from the transcription start site (referred to herein as the TSS) until a complete RNA is formed. Every RNA molecule in an organism starts at a TSS, so studying TSSs by high-throughput sequencing helps infer the position and structure of promoters across the whole genome and thereby gives a global view of the gene transcription regulatory network. TSS research also helps correct existing gene annotations and discover new genes.
However, current methods for studying TSSs still need improvement.
Summary of the invention
The present invention aims to solve at least one of the above technical problems to some extent, or at least to provide a useful commercial choice. To this end, one object of the present invention is to provide a means of effectively enriching transcripts and, in turn, effectively determining TSSs.
The present invention was completed on the basis of the following findings of the inventors:
Current high-throughput sequencing methods for studying TSSs generally target RNA with a cap structure and capture the 5' ends of RNA molecules by CAGE or RACE. Common methods include deepCAGE, PEAT, deep-RACE, nanoCAGE and CAGEscan. Of these, deepCAGE, PEAT, deep-RACE and CAGEscan require cumbersome steps such as enzymatic digestion, need a large amount of RNA, produce short sequencing reads (about 20 nt), and apply only to RNA with a cap structure, so they cannot be used to study the TSSs of prokaryotic RNA, which has no cap structure. nanoCAGE is simpler to perform and needs less RNA, but it too applies only to capped RNA, and the data it produces contain a relatively high proportion of false positives. The inventors found that a 5'-monophosphate exonuclease specifically degrades RNA bearing a 5' monophosphate while leaving intact the RNA molecules bearing a 5' cap or a 5' triphosphate. It can therefore be used to enrich transcripts, and the approach applies to high-throughput sequencing of the TSSs of both eukaryotic and prokaryotic RNA, with the advantages of simple operation, high accuracy and low cost.
In a first aspect, the present invention provides a method for enriching transcripts from an RNA sample. According to an embodiment of the present invention, the method comprises: treating the RNA sample with an enrichment reagent so as to enrich transcripts, wherein the enrichment reagent has 5'-monophosphate exonuclease activity and a transcript is an RNA molecule having a cap structure or a 5' triphosphate at its 5' end. Because a 5'-monophosphate exonuclease specifically degrades RNA bearing a 5' monophosphate and does not degrade intact RNA molecules bearing a 5' cap or a 5' triphosphate, an enrichment reagent with this activity effectively enriches transcripts. The method can therefore be applied to high-throughput sequencing of the TSSs of both eukaryotic and prokaryotic RNA, with the advantages of simple operation, high accuracy and low cost.
In a second aspect, the present invention provides a method for constructing a sequencing library. According to an embodiment of the present invention, the method comprises: enriching transcripts from an RNA sample according to the method described above; removing the 5' cap structure or 5' triphosphate of the transcripts to obtain transcripts lacking the 5' cap structure or 5' triphosphate; ligating an RNA adapter to the 5' end of the transcripts lacking the 5' cap structure or 5' triphosphate to obtain adapter-ligated transcripts; reverse-transcribing the adapter-ligated transcripts to obtain cDNA corresponding to the transcripts; amplifying the cDNA to obtain an amplification product; and constructing a sequencing library based on the amplification product. With this method, a sequencing library can be effectively constructed for the transcripts enriched from a nucleic acid sample, so the method can be applied to high-throughput sequencing of the TSSs of both eukaryotic and prokaryotic RNA, with the advantages of simple operation, high accuracy and low cost.
In a third aspect, the present invention provides a sequencing library, characterized in that it is constructed by the method described above. With this sequencing library, RNA transcripts can be effectively sequenced, and the library can be applied to high-throughput sequencing of the TSSs of both eukaryotic and prokaryotic RNA, with the advantages of simple operation, high accuracy and low cost.
In a fourth aspect, the present invention provides a method for sequencing a nucleic acid sample. According to an embodiment of the present invention, the method comprises: constructing a sequencing library according to the method described above; and sequencing the sequencing library to obtain a sequencing result. With this method, RNA transcripts can be effectively sequenced, and the method can be applied to high-throughput sequencing of the TSSs of both eukaryotic and prokaryotic RNA, with the advantages of simple operation, high accuracy and low cost.
In a fifth aspect, the present invention provides a method for determining a TSS. According to an embodiment of the present invention, the method comprises: extracting an RNA sample from a host; obtaining, by the method described above, a sequencing result consisting of a plurality of sequencing reads; and determining the TSS based on the sequencing result. With this method, transcription start sites can be effectively determined, and the method can be applied to high-throughput sequencing of the TSSs of both eukaryotic and prokaryotic RNA, with the advantages of simple operation, high accuracy and low cost.
In a sixth aspect, the present invention provides an enrichment reagent for enriching transcripts from an RNA sample. According to an embodiment of the present invention, the enrichment reagent has 5'-monophosphate exonuclease activity. With this enrichment reagent, transcripts can be effectively enriched, so the reagent can be applied to high-throughput sequencing of the TSSs of both eukaryotic and prokaryotic RNA, with the advantages of simple operation, high accuracy and low cost.
In a seventh aspect, the present invention provides a device for constructing a sequencing library. According to an embodiment of the present invention, the device comprises: a transcript enrichment unit, in which the enrichment reagent described above is provided so as to enrich transcripts from an RNA sample; an end trimming unit, which is connected to the transcript enrichment unit and adapted to remove the 5' cap structure or 5' triphosphate of the transcripts to obtain transcripts lacking the 5' cap structure or 5' triphosphate; an RNA adapter ligation unit, which is connected to the end trimming unit and adapted to ligate an RNA adapter to the 5' end of the transcripts lacking the 5' cap structure or 5' triphosphate to obtain adapter-ligated transcripts; a reverse transcription unit, which is connected to the RNA adapter ligation unit and adapted to reverse-transcribe the adapter-ligated transcripts to obtain cDNA corresponding to the transcripts; an amplification unit, which is connected to the reverse transcription unit and adapted to amplify the cDNA to obtain an amplification product; and a library construction unit, which is connected to the amplification unit and adapted to construct a sequencing library based on the amplification product. With this device, a sequencing library can be effectively constructed for the transcripts enriched from a nucleic acid sample, so the device can be applied to high-throughput sequencing of the TSSs of both eukaryotic and prokaryotic RNA, with the advantages of simple operation, high accuracy and low cost.
In an eighth aspect, the present invention provides equipment for sequencing a nucleic acid sample, characterized in that it comprises: a library construction device, which is the device described above, for constructing a sequencing library from the nucleic acid sample; and a sequencing device, which is connected to the library construction device and adapted to sequence the sequencing library to obtain a sequencing result. With this equipment, RNA transcripts can be effectively sequenced, and the equipment can be applied to high-throughput sequencing of the TSSs of both eukaryotic and prokaryotic RNA, with the advantages of simple operation, high accuracy and low cost.
In a ninth aspect, the present invention provides a system for determining a TSS. According to an embodiment of the present invention, the system comprises: sample extraction equipment for extracting an RNA sample from a host; nucleic acid sample sequencing equipment, which is connected to the sample extraction equipment and is the nucleic acid sample sequencing equipment described above, for sequencing the RNA sample to obtain a sequencing result consisting of a plurality of sequencing reads; and TSS determination equipment, which is connected to the sequencing equipment and adapted to determine the TSS based on the sequencing result. According to an embodiment of the present invention, this system can effectively determine the TSSs in a nucleic acid sample.
Additional aspects and advantages of the present invention will be set forth in part in the description that follows, and in part will become apparent from that description or be learned by practice of the present invention.
Brief description of the drawings
The above and/or additional aspects and advantages of the present invention will become apparent and readily understood from the following description of embodiments taken together with the accompanying drawings, in which:
Fig. 1 is a schematic flow chart of the method for constructing a sequencing library according to an embodiment of the present invention;
Fig. 2 is a schematic flow chart of the bioinformatics analysis for determining TSS sequences according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of the system for determining TSSs according to an embodiment of the present invention;
Fig. 4 is a schematic diagram of the nucleic acid sample sequencing equipment according to an embodiment of the present invention;
Fig. 5 is a schematic diagram of the device for constructing a sequencing library according to an embodiment of the present invention;
Fig. 6 is a schematic diagram of the equipment for determining TSSs according to an embodiment of the present invention;
Fig. 7 shows, according to an embodiment of the present invention, the distribution of the screened TSSs on the genome. The upper and lower panels are the TSS distributions of the human RNA and E. coli RNA samples, respectively. Position 0 is the start of the gene coding region, and the region upstream of it is where transcription initiates; most sequences fall upstream of the gene coding region;
Fig. 8 shows, according to an embodiment of the present invention, the TSS maps of 8 human RNA samples, from which the distribution of TSSs in the different samples can be seen;
Fig. 9 shows, according to an embodiment of the present invention, the base distribution upstream of the TSS, where position 1 on the abscissa corresponds to the TSS, which is predominantly a purine (A/G). The upper panel is the base distribution upstream of the TSS for the human RNA sample, which has a clear GC-rich region, as is typical of the main promoters and enhancers of eukaryotes. The lower panel is the base distribution upstream of the TSS for the E. coli RNA sample, where a typical TATA box can also be found in the upstream -10 region;
Fig. 10 shows, according to an embodiment of the present invention, the length distribution of 5' UTRs, that is, the distance from the TSS to the coding region. The upper panel is the 5' UTR length distribution for the human RNA sample, and the lower panel is that for the E. coli RNA sample;
Fig. 11 shows the correlation analysis, which provides an assessment of the reliability of the experimental results and the stability of the procedure. The upper panel shows two replicates of the human RNA sample, and the lower panel shows two replicates of the E. coli RNA sample; and
Fig. 12 is a schematic diagram of gene prediction results according to an embodiment of the present invention. The upper panel shows the TSS distributions of two human genes, NM_018997 and NM_031901, both of which undergo alternative splicing. Red vertical lines mark the screened TSSs, black vertical lines mark the sequences obtained before filtering, blue horizontal lines represent the exons of the gene, and yellow horizontal lines represent its introns. The lower panel shows the TSS distribution of an E. coli operon. Prokaryotes have no introns, so only the blue horizontal lines representing genes appear; the 4 genes of this operon share one TSS.
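Two of the summaries named in the figure descriptions, the 5' UTR length of Fig. 10 and the replicate correlation of Fig. 11, can be sketched in a few lines. This is only an illustrative sketch, not the analysis actually used for the figures: the function names, the 1-based coordinate convention and the choice of the Pearson coefficient are assumptions that the text does not specify.

```python
from math import sqrt


def utr5_length(tss: int, cds_start: int, strand: str) -> int:
    """Distance in nt from the TSS to the first coding base (assumed 1-based coordinates)."""
    if strand == "+":
        return cds_start - tss
    if strand == "-":
        # On the minus strand the TSS has the larger genomic coordinate.
        return tss - cds_start
    raise ValueError("strand must be '+' or '-'")


def pearson(x, y):
    """Pearson correlation of two equally long count vectors (assumed metric)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two vectors of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return sxy / sqrt(sxx * syy)
```

For example, per-TSS read counts from two replicate libraries could be passed to `pearson` to obtain one number per sample pair, and `utr5_length` could be applied to each gene's TSS and annotated coding start.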
Detailed description of the embodiments
Embodiments of the present invention are described in detail below. Examples of the embodiments are shown in the accompanying drawings, in which the same or similar reference numerals throughout denote the same or similar elements or elements having the same or similar functions. The embodiments described below with reference to the drawings are exemplary; they are intended to explain the present invention and should not be construed as limiting it.
In the present invention, unless otherwise expressly specified and limited, the terms "mounted", "connected", "coupled", "fixed" and the like are to be understood broadly. A connection may, for example, be fixed, detachable or integral; it may be mechanical or electrical; it may be direct or indirect through an intermediary; and it may be an internal connection between two elements. A person of ordinary skill in the art can understand the specific meaning of these terms in the present invention according to the circumstances. In addition, the terms "upstream" and "downstream" as used herein are defined along the 5'-to-3' direction.
In a first aspect, the present invention provides a method for enriching transcripts from an RNA sample. According to an embodiment of the present invention, the method comprises: treating the RNA sample with an enrichment reagent so as to enrich transcripts, wherein the enrichment reagent has 5'-monophosphate exonuclease activity and a transcript is an RNA molecule having a cap structure or a triphosphate at its 5' end. According to embodiments of the present invention, examples of enzymes having 5'-monophosphate exonuclease activity may include: exoribonuclease XRN-1, Terminator™ 5'-Phosphate-Dependent Exonuclease, or TAKARA™ Alkaline Phosphatase. Because a 5'-monophosphate exonuclease specifically degrades RNA bearing a 5' monophosphate and does not degrade intact RNA molecules bearing a 5' cap or a 5' triphosphate, an enrichment reagent with this activity effectively enriches transcripts. The method can therefore be applied to high-throughput sequencing of the TSSs of both eukaryotic and prokaryotic RNA, with the advantages of simple operation, high accuracy and low cost.
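The selection rule described above can be written as a small model: molecules are kept or degraded purely according to the chemistry of their 5' end. The sketch below is illustrative only. The class and function names are hypothetical, the example molecules are invented, and real enzyme treatment is not all-or-nothing as this filter assumes.

```python
from dataclasses import dataclass
from enum import Enum


class FivePrimeEnd(Enum):
    CAP = "5' cap"
    TRIPHOSPHATE = "5' triphosphate"
    MONOPHOSPHATE = "5' monophosphate"


@dataclass(frozen=True)
class RNA:
    name: str
    five_prime: FivePrimeEnd


def enrich(sample):
    """Return the molecules left after an idealized 5'-monophosphate exonuclease treatment."""
    return [rna for rna in sample
            if rna.five_prime is not FivePrimeEnd.MONOPHOSPHATE]


# Invented example sample: two primary transcripts and two processed/degraded species.
sample = [
    RNA("eukaryotic capped mRNA", FivePrimeEnd.CAP),
    RNA("prokaryotic primary transcript", FivePrimeEnd.TRIPHOSPHATE),
    RNA("processed rRNA", FivePrimeEnd.MONOPHOSPHATE),
    RNA("degradation fragment", FivePrimeEnd.MONOPHOSPHATE),
]
kept = [rna.name for rna in enrich(sample)]
```

The same rule keeps both capped and triphosphorylated molecules, which is why the text describes the method as applicable to eukaryotic and prokaryotic RNA alike.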
According to embodiments of the present invention, the method for enriching transcripts from an RNA sample may use any enrichment reagent having 5'-monophosphate exonuclease activity. Examples of enzymes having this activity may include: exoribonuclease XRN-1, Terminator™ 5'-Phosphate-Dependent Exonuclease, or TAKARA™ Alkaline Phosphatase. According to one embodiment of the present invention, the enrichment reagent contains DNase I, which further improves the specificity and efficiency with which RNA bearing a 5' monophosphate is degraded and hence the efficiency of the enrichment method. According to one embodiment of the present invention, the enrichment reagent may further contain a buffer and a soluble salt to further improve the enzymatic activity of DNase I. According to one embodiment of the present invention, the pH of the enrichment reagent is 8.0. According to one embodiment of the present invention, the buffer is Tris-HCl and the soluble salt is at least one selected from sodium chloride and magnesium chloride. According to one embodiment of the present invention, the RNA sample is treated with the enrichment reagent at 30 °C. This further improves the efficiency of transcript enrichment with the enrichment reagent according to embodiments of the present invention.
In a second aspect, the present invention provides a method for constructing a sequencing library. With reference to Fig. 1, according to an embodiment of the present invention, the method comprises:
S100 (transcript enrichment): Enrich transcripts from the RNA sample according to the method described above. This step has been described in detail above and is not repeated here.
S200 (end trimming): Remove the 5' cap structure or 5' triphosphate of the transcripts to obtain transcripts lacking the 5' cap structure or 5' triphosphate. According to one embodiment of the present invention, the 5' cap structure or 5' triphosphate is removed with an end trimming reagent that has tobacco acid pyrophosphatase activity. According to one embodiment of the present invention, the trimming reagent comprises: tobacco acid pyrophosphatase, a soluble salt, EDTA, β-mercaptoethanol and Triton X-100. According to one embodiment of the present invention, the soluble salt is sodium acetate. According to one embodiment of the present invention, the pH of the trimming reagent is 7.5. This further improves end trimming of the RNA, so that the 5' cap structure or 5' triphosphate of the transcripts is removed effectively and the efficiency of sequencing library construction is improved.
S300 (adapter ligation): Ligate an RNA adapter to the 5' end of the transcripts lacking the 5' cap structure or 5' triphosphate to obtain adapter-ligated transcripts. According to one embodiment of the present invention, the RNA adapter is ligated with a ligation reagent that has T4 RNA ligase activity. According to one embodiment of the present invention, the ligation reagent comprises: T4 RNA ligase, a buffer, a soluble salt and dithiothreitol. According to one embodiment of the present invention, the pH of the ligation reagent is 7.5. According to one embodiment of the present invention, the buffer is Tris-HCl. According to one embodiment of the present invention, the soluble salt is magnesium chloride. According to one embodiment of the present invention, the RNA adapter is ligated with the ligation reagent at 30 °C. This improves the efficiency of adapter ligation and thus the efficiency of sequencing library construction.
S400 (reverse transcription): Reverse-transcribe the adapter-ligated transcripts to obtain cDNA corresponding to the transcripts. According to embodiments of the present invention, the reverse transcription primer carries at its end a sequence corresponding to the RNA adapter, so that the resulting cDNA also carries an adapter at its end, which facilitates subsequent library construction and sequencing. As used herein, "corresponding to the RNA adapter" means that the sequence contained in the reverse transcription primer can pair with the RNA adapter and support an amplification reaction, so that cDNA with adapters at both ends is obtained. For example, one of the two reverse transcription primers contains a sequence identical to one RNA adapter, and the other reverse transcription primer contains a sequence complementary to the other RNA adapter. According to one embodiment of the present invention, the reverse transcription uses an oligonucleotide having the sequence shown in SEQ ID NO: 1 as the reverse transcription primer. According to one embodiment of the present invention, at least one N in the reverse transcription primer (SEQ ID NO: 1) is thio-modified, which protects the primer from nuclease degradation. According to one embodiment of the present invention, the penultimate N in the reverse transcription primer (SEQ ID NO: 1) is thio-modified.
S500 (amplification): Amplify the cDNA to obtain an amplification product. A person skilled in the art may amplify by any known method, for example conventional PCR; it is only necessary to design the corresponding primers according to the adapter sequences.
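The remark that primers are designed from the adapter sequence can be illustrated with plain string operations. The sketch below is an assumption-laden toy: the adapter and transcript sequences are made up for illustration (the patent's actual adapter and SEQ ID NO: 1 are not reproduced here), and the helper names are hypothetical. It relies only on the fact that first-strand cDNA is the reverse complement of its RNA template.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return dna.translate(_COMPLEMENT)[::-1]


def first_strand_cdna(rna: str) -> str:
    """First-strand cDNA (written 5'->3') of an RNA template, reading U as T."""
    return reverse_complement(rna.upper().replace("U", "T"))


def forward_primer_for(adapter_rna: str) -> str:
    """A primer with the same sequence as the 5' adapter, written as DNA."""
    return adapter_rna.upper().replace("U", "T")


ADAPTER_RNA = "GGAUCCAGUCGAUACGUCAA"   # hypothetical 5' RNA adapter
TRANSCRIPT = "AGCUUGGCAUCGGAUUACGC"    # hypothetical trimmed transcript

ligated = ADAPTER_RNA + TRANSCRIPT     # adapter sits at the 5' end after ligation
cdna = first_strand_cdna(ligated)
primer = forward_primer_for(ADAPTER_RNA)

# The adapter-derived primer anneals to the 3' end of the first-strand cDNA.
primer_site = cdna[-len(primer):]
```

Because the adapter is ligated directly to the original 5' end of the transcript, the first base read after the adapter-derived sequence corresponds to the first transcribed base.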
S600 (library construction): Construct a sequencing library based on the amplification product. A person skilled in the art can prepare the amplification product for whichever sequencing method is to be used by following the operating instructions provided by the manufacturer, which are not repeated here. It should be noted that the amplification product obtained by the method of the present invention can be used with Illumina HiSeq 2000, Genome Analyzer, the SOLiD sequencing system, Ion Torrent, Ion Proton, 454, the PacBio RS sequencing system, Helicos tSMS technology and nanopore sequencing technology, so that high-throughput sequencing can be achieved.
Thus, with this method, a sequencing library can be effectively constructed for the transcripts enriched from a nucleic acid sample, so the method can be applied to high-throughput sequencing of the TSSs of both eukaryotic and prokaryotic RNA, with the advantages of simple operation, high accuracy and low cost. In addition, it should be noted that a product purification step may optionally be included between the above steps. According to embodiments of the present invention, RNA may be purified by phenol/chloroform/isoamyl alcohol (25:24:1 by volume) extraction followed by ethanol precipitation. The extraction removes the enzymes from the reaction mixture so that they do not interfere with the next reaction, and ethanol precipitation also retains small transcripts such as microRNA, so that TSS information for this class of non-coding RNA is obtained, which helps in understanding the state of transcriptional regulation.
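Why the order of steps S100 to S300 matters can be seen by tracking the 5' end of each molecule through them. The following is a minimal sketch under stated assumptions, not the patented protocol itself: it assumes, as general biochemistry background rather than something stated in the text, that tobacco acid pyrophosphatase leaves a 5' monophosphate and that the ligase joins the adapter only to a 5' monophosphate. All names and example molecules are hypothetical.

```python
from dataclasses import dataclass, replace

CAP, PPP, P = "cap", "ppp", "p"  # 5' end states: cap, triphosphate, monophosphate


@dataclass(frozen=True)
class Molecule:
    name: str
    five_prime: str
    adapter: bool = False


def s100_enrich(mols):
    """S100: idealized exonuclease step, removes 5'-monophosphate RNA."""
    return [m for m in mols if m.five_prime != P]


def s200_trim(mols):
    """S200: cap or triphosphate is converted to a 5' monophosphate (assumed chemistry)."""
    return [replace(m, five_prime=P) if m.five_prime in (CAP, PPP) else m
            for m in mols]


def s300_ligate(mols):
    """S300: the adapter is attached only to 5'-monophosphate ends (assumed chemistry)."""
    return [replace(m, adapter=True) if m.five_prime == P else m for m in mols]


def ligated_names(mols, enrich=True):
    """Names of the molecules that carry an adapter after S100 (optional), S200 and S300."""
    if enrich:
        mols = s100_enrich(mols)
    return [m.name for m in s300_ligate(s200_trim(mols)) if m.adapter]


sample = [
    Molecule("capped mRNA", CAP),
    Molecule("primary bacterial transcript", PPP),
    Molecule("degradation fragment", P),
]
```

In this toy model, skipping S100 lets the degradation fragment receive an adapter as well, so its 5' end would be read as if it were a transcription start. Removing such molecules first is the point of the enrichment step.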
In a third aspect, the present invention provides a sequencing library, characterized in that it is constructed by the method described above. With this sequencing library, RNA transcripts can be effectively sequenced, and the library can be applied to high-throughput sequencing of the TSSs of both eukaryotic and prokaryotic RNA, with the advantages of simple operation, high accuracy and low cost.
In a fourth aspect, the present invention provides a method for sequencing a nucleic acid sample. According to an embodiment of the present invention, the method comprises: constructing a sequencing library according to the method described above; and sequencing the sequencing library to obtain a sequencing result. With this method, RNA transcripts can be effectively sequenced, and the method can be applied to high-throughput sequencing of the TSSs of both eukaryotic and prokaryotic RNA, with the advantages of simple operation, high accuracy and low cost. According to embodiments of the present invention, the sequencing is performed with at least one of Illumina HiSeq 2000, Genome Analyzer, the SOLiD sequencing system, Ion Torrent, Ion Proton, 454, the PacBio RS sequencing system, Helicos tSMS technology and nanopore sequencing technology. The high throughput and deep sequencing offered by these sequencing devices further improve sequencing efficiency. In one embodiment of the present invention, the sequencing is performed with Illumina HiSeq 2000.
In a fifth aspect, the invention provides a method for determining a TSS. According to embodiments of the invention, the method includes: extracting an RNA sample from a host; obtaining, by the method described above, a sequencing result consisting of a plurality of sequencing reads; and determining the TSS based on the sequencing result. With this method, the TSSs in a nucleic acid sample can be determined effectively.
According to one embodiment of the invention, the RNA sample is at least a portion of the total RNA of the host. According to embodiments of the invention, the host may be a eukaryote, such as human, or a prokaryote, such as Escherichia coli.
According to one embodiment of the invention, determining the TSS based on the sequencing result further includes: aligning the sequencing data to a reference sequence; and determining the transcription start site based on the alignment result. Here, the reference sequence contains at least a portion of the 5'-UTR sequence of a predetermined gene. Among the sequencing reads that align to the reference sequence, the read located most upstream in the reference sequence is selected as the positive sequence, and the first base of the positive sequence is taken as the transcription start site. The term "predetermined gene" as used here refers to a set of genes whose possible range is preset on the reference genome; these genes may be known, or may be unknown and inferred by bioinformatics. According to embodiments of the invention, the length of the reference sequence is not particularly limited. The reference sequence includes at least the translation initiation site of the predetermined gene and a sequence of predetermined length upstream of it. Because the transcription start site lies upstream of the translation initiation site, the length of the reference sequence can be chosen so that the transcription start site is included in it. For example, according to embodiments of the invention, for a prokaryotic host the reference sequence contains the nucleotide sequence between the translation initiation site of the predetermined gene and the site 700 bp upstream of that translation initiation site; for a eukaryotic host the reference sequence contains the nucleotide sequence between the translation initiation site of the predetermined gene and the site 5000 bp upstream of that translation initiation site.
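The choice of reference window described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the 1-based inclusive coordinate convention, and the handling of the minus strand and of sequence ends are not specified in the text and are hypothetical; only the 700 bp and 5000 bp window lengths come from the description.

```python
# Upstream window lengths taken from the description:
# 700 bp for prokaryotic hosts, 5000 bp for eukaryotic hosts.
UPSTREAM_BP = {"prokaryote": 700, "eukaryote": 5000}


def reference_window(translation_start, strand, host, seq_length):
    """Return (start, end), 1-based inclusive, of the reference region.

    The region spans from the translation initiation site to the site
    UPSTREAM_BP[host] bases upstream of it. Clipping at the sequence ends
    and minus-strand handling are assumptions made for this sketch.
    """
    span = UPSTREAM_BP[host]
    if strand == "+":
        # Upstream means smaller coordinates on the plus strand.
        return max(1, translation_start - span), translation_start
    if strand == "-":
        # Upstream means larger coordinates on the minus strand.
        return translation_start, min(seq_length, translation_start + span)
    raise ValueError("strand must be '+' or '-'")
```

For a plus-strand prokaryotic gene whose translation starts at position 1000, this sketch returns the window (300, 1000).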
According to one embodiment of the invention, the alignment may be performed using SOAPAlignment. In the present invention, the short-read mapping program SOAPAlignment v2.2 is used to align the clean reads obtained by high-throughput sequencing to the reference genome and to the reference gene sequences, respectively, with no base mismatches allowed. The reference genome sequence and the reference gene sequences can be obtained from public databases.
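To make the "no mismatches" alignment and the selection of the most upstream read concrete, the following toy Python sketch may help. It is not SOAPAlignment and does not reproduce its behavior or options; it uses plain exact string search as a stand-in, considers only the forward strand, and all function names are hypothetical.

```python
def map_exact(read, reference):
    """Return all 0-based positions where the read matches exactly."""
    hits = []
    pos = reference.find(read)
    while pos != -1:
        hits.append(pos)
        pos = reference.find(read, pos + 1)
    return hits


def candidate_tss(reads, reference):
    """Return (position, read) for the most upstream mapped read.

    The first base of that read is the candidate TSS. Returns None if
    no read maps. Forward-strand only, which is a simplification.
    """
    best = None
    for read in reads:
        hits = map_exact(read, reference)
        if not hits:
            continue  # reads with no exact match are discarded
        leftmost = min(hits)
        if best is None or leftmost < best[0]:
            best = (leftmost, read)
    return best
```

In practice a dedicated short-read aligner would be used; the sketch only shows the selection logic applied to its output.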
According to one embodiment of the invention, the method further includes screening the positive sequences. The screening criterion is that the number of positive sequences is at least N times the mean number of sequencing reads aligned inside the predetermined gene, where N is a real number greater than 1, and preferably N is a real number of at least 10. According to embodiments of the invention, after alignment the alignment results can first be screened to obtain reliable TSS information. The screening works as follows. The first position at which a clean read aligns to the gene (the sequence segment corresponding to the predetermined gene) is assumed to be a raw TSS. However, such reads may also have aligned to the interior of the gene and thus give false-positive TSSs, so further filtering is needed. The method enriches the obtained reads at the 5' end of the gene, so the number of reads at a real TSS is higher than the mean number of reads falling inside the gene. A fold factor N between the two is therefore introduced to filter the TSSs: a screened TSS is regarded as real only if its read count is at least N times the mean read count inside the corresponding gene. According to embodiments of the invention, N may be a real number of at least 10.
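The N-fold screening rule can be written as a short Python check. This is a minimal sketch, assuming that the "mean number of reads inside the gene" is a per-position mean over a list of read counts; the text does not define how the mean is taken, and the function and parameter names are hypothetical.

```python
def passes_fold_filter(tss_reads, internal_counts, n_fold=10.0):
    """Return True if a candidate TSS passes the N-fold screening rule.

    tss_reads:       number of reads starting at the candidate TSS
    internal_counts: read counts at positions inside the gene
                     (per-position counts are an assumption)
    n_fold:          the fold factor N; the text prefers N >= 10
    """
    if n_fold <= 1:
        raise ValueError("N must be greater than 1")
    if not internal_counts:
        raise ValueError("need at least one internal count")
    mean_internal = sum(internal_counts) / len(internal_counts)
    return tss_reads >= n_fold * mean_internal
```

With internal counts of 2, 4, 6, and 8 (mean 5) and N = 10, a candidate needs at least 50 reads to be kept.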
According to one embodiment of the invention, the method further includes performing a chi-square test on the screening result. According to one embodiment of the invention, the test value of the chi-square test is greater than 3.84, that is, the confidence level is greater than 95%. According to embodiments of the invention, after filtering, the reliability of the filtered result is verified with a chi-square test. Specifically, building on the previous embodiment, the mean of the fold values of all TSSs and their standard deviation are calculated, and after standardization the chi-square value is calculated by a formula (the formula is not reproduced in this text). According to the chi-square table, the chi-square value at a confidence level of 0.95 is 3.84, so to obtain TSSs with a reliability above 95% the chi-square value calculated by the formula must be greater than 3.84.
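The threshold of 3.84 matches the 95% critical value of a chi-square distribution with one degree of freedom. The short Python sketch below checks that correspondence; it assumes one degree of freedom (the text does not state it) and does not reconstruct the test statistic itself, whose formula is not available here. The function names are hypothetical.

```python
import math


def chi2_cdf_1df(x):
    """CDF of the chi-square distribution with 1 degree of freedom.

    For 1 degree of freedom, P(X <= x) = erf(sqrt(x / 2)).
    """
    if x < 0:
        return 0.0
    return math.erf(math.sqrt(x / 2.0))


def is_reliable(chi_square_value, threshold=3.84):
    """Apply the cutoff from the description: keep values above 3.84."""
    return chi_square_value > threshold
```

Evaluating `chi2_cdf_1df(3.84)` gives a value very close to 0.95, which is consistent with the table lookup described in the text.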
In addition, according to embodiments of the invention, after the sequencing result is obtained, the method may further include a step of removing unqualified reads from the sequencing reads to obtain clean sequencing reads. According to embodiments of the invention, unqualified reads include:
Reads in which the number of bases with a sequencing quality below a certain threshold exceeds 50% of the total number of bases in the read. The low-quality threshold depends on the specific sequencing technology and sequencing environment;
Reads in which the number of undetermined bases (such as N in Illumina HiSeq 2000 sequencing results) exceeds 10% of the total number of bases in the read;
Reads containing exogenous sequence. Apart from the sample adapter sequence, each read is compared with exogenous sequences introduced by other experimental steps, such as various adapter sequences. If exogenous sequence is present in the read, the read is considered unqualified.
The sequence data obtained from the raw sequence data after removal of unqualified reads are referred to as clean reads. They serve as the basis of the subsequent analysis and thereby improve its validity.
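The three filtering rules above can be sketched as a single Python predicate. This is an illustration only: the quality threshold of 20 is an assumed placeholder (the text says the threshold depends on the sequencing technology), exogenous sequences are detected by simple substring search, and the function name is hypothetical.

```python
def is_clean(seq, quals, min_qual=20, exogenous=()):
    """Return True if a read passes the three filters in the description.

    seq:       read sequence as a string
    quals:     per-base quality scores, same length as seq
    min_qual:  low-quality threshold (assumed value, platform dependent)
    exogenous: exogenous sequences (e.g. adapters) to screen against
    """
    n = len(seq)
    if n == 0 or len(quals) != n:
        return False
    seq = seq.upper()
    # Rule 1: more than 50% of bases below the quality threshold.
    low_quality = sum(1 for q in quals if q < min_qual)
    if low_quality > 0.5 * n:
        return False
    # Rule 2: more than 10% undetermined bases (N).
    if seq.count("N") > 0.1 * n:
        return False
    # Rule 3: any exogenous sequence present in the read.
    return not any(x.upper() in seq for x in exogenous)
```

Reads for which `is_clean` returns True correspond to the clean reads used in the downstream analysis.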
In addition, after the chi-square test, a series of bioinformatic analyses is carried out on the verified, reliable results, such as:
1) TSS (transcription start site) classification: According to embodiments of the invention, the screened TSSs can be divided into two classes. One class consists of TSSs that align to the genome and have a corresponding gene annotation, referred to as annotated TSSs. The other class consists of TSSs that align to the genome but have no annotated gene information nearby, referred to as unannotated TSSs, which can be used for new gene prediction.
2) TSS annotation: Here the TSSs that fall in known genes are annotated, including the expression level of the TSS, the position of the TSS, and the corresponding gene annotation information.
3) Building a TSS map: According to embodiments of the invention, the TSSs found by this method in the same species can be displayed graphically to form a TSS map. The position of each TSS and its expression level can be read directly from the map, and differences in TSS expression and distribution between samples can also be seen.
4) Promoter region discovery and 5' UTR length statistics.
5) Experimental reproducibility analysis: According to embodiments of the invention, a correlation analysis of the results of two parallel experiments gives an assessment of the reliability of the experimental results and of operational stability.
6) New gene prediction: According to embodiments of the invention, for TSSs with no reference gene found nearby, the sequences near these TSSs can be extracted for gene prediction. Prokaryotes are predicted with glimmer, and eukaryotes are predicted with Genscan.
7) Data visualization: According to embodiments of the invention, the analysis results can be used to plot and inspect the TSS distribution of a gene or region of interest.
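The split into annotated and unannotated TSSs in item 1) can be sketched in Python as follows. The text does not say how close a gene must be to count as "nearby", so the `max_distance` parameter is a hypothetical choice, as are the function name and the forward-strand-only gene model.

```python
def classify_tss(tss_pos, genes, max_distance=700):
    """Classify a TSS as annotated or unannotated.

    genes:        list of (name, start, end) tuples on the forward strand
    max_distance: how far upstream of a gene start a TSS may lie and still
                  be assigned to that gene (assumed value, not from the text)
    """
    for name, start, end in genes:
        if start - max_distance <= tss_pos <= end:
            return ("annotated", name)
    return ("unannotated", None)
```

TSSs returned as unannotated are the candidates that would be passed on to new gene prediction.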
In a sixth aspect, the invention provides an enrichment reagent for enriching transcripts from an RNA sample. According to embodiments of the invention, the enrichment reagent has 5'-monophosphate exonuclease activity. With this enrichment reagent, transcripts can be enriched effectively, so that it can be applied to high-throughput sequencing of the TSSs of both eukaryotic and prokaryotic RNA, with the advantages of simple operation, high accuracy, and low cost. According to one embodiment of the invention, the enrichment reagent contains DNase I. This further improves the specificity and efficiency of degrading 5'-monophosphate RNA, and thus further improves the efficiency of the transcript enrichment method. According to one embodiment of the invention, the enrichment reagent may further contain a buffer and a soluble salt, in order to further improve the enzymatic activity of DNase I. According to one embodiment of the invention, the pH of the enrichment reagent is 8.0. According to one embodiment of the invention, the buffer is Tris-HCl, and the soluble salt is at least one selected from sodium chloride and magnesium chloride. According to one embodiment of the invention, the RNA sample is treated with the enrichment reagent at 30 degrees Celsius. This further improves the efficiency of transcript enrichment using the enrichment reagent according to embodiments of the invention. According to embodiments of the invention, examples of enzymes with 5'-monophosphate exonuclease activity include: exoribonuclease XRN-1, Terminator(TM) 5'-phosphate-dependent exonuclease, or TAKARA(TM) alkaline phosphatase.
In a seventh aspect, the invention provides a device for building a sequencing library. With reference to Fig. 5, according to embodiments of the invention, the device includes: a transcript enrichment unit 211, an end trimming unit 212, an RNA adapter ligation unit 213, a reverse transcription unit 214, an amplification unit 215, and a library construction unit 216. According to embodiments of the invention, the enrichment reagent described above is provided in the transcript enrichment unit 211, in order to enrich transcripts from the RNA sample. The end trimming unit 212 is connected to the transcript enrichment unit 211 and is adapted to remove the 5' cap or 5' triphosphate of the transcripts, to obtain transcripts from which the 5' cap or 5' triphosphate has been removed. The RNA adapter ligation unit 213 is connected to the end trimming unit 212 and is adapted to ligate an RNA adapter to the 5' end of the transcripts from which the 5' cap or 5' triphosphate has been removed, to obtain transcripts ligated to the RNA adapter. The reverse transcription unit 214 is connected to the RNA adapter ligation unit 213 and is adapted to reverse transcribe the adapter-ligated transcripts, to obtain cDNA corresponding to the transcripts. The amplification unit 215 is connected to the reverse transcription unit 214 and is adapted to amplify the cDNA, to obtain an amplification product. The library construction unit 216 is connected to the amplification unit 215 and is adapted to build a sequencing library based on the amplification product. With this device, a sequencing library can be built effectively for the transcripts enriched from a nucleic acid sample, so that it can be applied to high-throughput sequencing of the TSSs of both eukaryotic and prokaryotic RNA, with the advantages of simple operation, high accuracy, and low cost.
According to one embodiment of the invention, an end trimming reagent is provided in the end trimming unit 212, and the end trimming reagent has tobacco acid pyrophosphatase activity. According to one embodiment of the invention, the trimming reagent comprises: tobacco acid pyrophosphatase, a soluble salt, EDTA, beta-mercaptoethanol, and Triton X-100. According to one embodiment of the invention, the soluble salt is sodium acetate. According to one embodiment of the invention, the pH of the trimming reagent is 7.5. According to one embodiment of the invention, an oligonucleotide having the sequence shown in SEQ ID NO: 1 is provided in the reverse transcription unit 214 as the reverse transcription primer. According to one embodiment of the invention, at least one N in the reverse transcription primer is thio-modified. According to one embodiment of the invention, the penultimate N in the reverse transcription primer is thio-modified. According to one embodiment of the invention, a ligation reagent is provided in the RNA adapter ligation unit 213, and the ligation reagent has T4 RNA ligase activity. According to one embodiment of the invention, the ligation reagent comprises: T4 RNA ligase, a buffer, a soluble salt, and dithiothreitol. According to one embodiment of the invention, the pH of the ligation reagent is 7.5. According to one embodiment of the invention, the buffer is Tris-HCl. According to one embodiment of the invention, the soluble salt is magnesium chloride.
In an eighth aspect, the invention provides an apparatus for sequencing a nucleic acid sample. With reference to Fig. 4, according to embodiments of the invention, the apparatus includes: a library construction device 210, which is the device described above, for building a sequencing library from the nucleic acid sample; and a sequencing device 220, which is connected to the library construction device 210 and is adapted to sequence the sequencing library to obtain a sequencing result. With this apparatus, RNA transcripts can be sequenced effectively, and the apparatus can be applied to high-throughput sequencing of the TSSs of both eukaryotic and prokaryotic RNA, with the advantages of simple operation, high accuracy, and low cost. According to embodiments of the invention, the sequencing device is at least one of the Illumina HiSeq 2000, Genome Analyzer, SOLiD sequencing system, Ion Torrent, Ion Proton, 454, PacBio RS sequencing system, Helicos tSMS system, and nanopore sequencing system.
In a ninth aspect, the invention provides a system for determining a TSS. With reference to Fig. 3, according to embodiments of the invention, the system includes: a sample extraction apparatus 100 for extracting an RNA sample from a host; a nucleic acid sample sequencing apparatus 200, which is connected to the sample extraction apparatus and is the nucleic acid sample sequencing apparatus described above, for sequencing the RNA sample to obtain a sequencing result consisting of a plurality of sequencing reads; and a TSS determination apparatus 300, which is connected to the sequencing apparatus 200 and is adapted to determine the TSS based on the sequencing result. According to embodiments of the invention, the TSSs in a nucleic acid sample can be determined effectively with this system.
With reference to Fig. 6, according to one embodiment of the invention, the TSS determination apparatus further includes: an alignment device 310 for aligning the sequencing data to a reference sequence; and a determination device 320 adapted to determine the TSS based on the alignment result. The reference sequence contains at least a portion of the 5'-UTR sequence of a predetermined gene, and the determination device 320 is adapted to select, as the positive sequence, the sequencing read that aligns to the sequence segment corresponding to the predetermined gene and lies closest to the 5' end of that segment, and to take the first base of the positive sequence as the transcription start site. According to one embodiment of the invention, the alignment device is adapted to perform the alignment using SOAPAlignment. According to one embodiment of the invention, the determination device further includes a screening unit adapted to screen the positive sequences, where the screening criterion is that the number of positive sequences is at least N times the mean number of reads inside the sequence segment corresponding to the predetermined gene, N being a real number greater than 1, and preferably a real number of at least 10. According to one embodiment of the invention, the determination device further includes a verification unit adapted to perform a chi-square test on the screening result. According to one embodiment of the invention, the test value of the chi-square test is greater than 3.84, corresponding to a confidence level greater than 95%.
The term "predetermined gene" as used in the present invention should be interpreted broadly. It may refer to any known gene, or to a nucleic acid sequence predicted by known methods to possibly encode a protein.
The invention is explained below with reference to examples. Those skilled in the art will understand that the following examples are only intended to illustrate the invention and should not be taken as limiting its scope. Where specific techniques or conditions are not indicated in the examples, they follow the techniques or conditions described in the literature in this field (for example, J. Sambrook et al., Molecular Cloning: A Laboratory Manual, 3rd edition, translated by Huang Peitang et al., Science Press) or the product instructions. Reagents or instruments for which no manufacturer is indicated are conventional, commercially available products, for example products that can be purchased from Illumina.
General Method
The method used in the examples mainly includes TSS library construction and post-sequencing analysis. TSS library construction mainly includes the following steps:
(1) Take a total RNA sample, digest it with DNase I, and purify the digested RNA by ethanol precipitation;
(2) Mix the RNA obtained in (1) with reagent I and react, to enrich RNA carrying a 5' cap or 5' triphosphate;
(3) Purify the RNA obtained in (2) by phenol/chloroform/isoamyl alcohol (25:24:1) extraction;
(4) Mix the RNA purified in (3) with reagent II and react, to remove the 5' cap or 5' triphosphate and obtain a 5' monophosphate;
(5) Purify the RNA obtained in (4) by phenol/chloroform/isoamyl alcohol (25:24:1) extraction;
(6) Add the RNA adapter to the RNA from (5), mix with reagent III and react, to obtain RNA with the adapter ligated at the 5' end;
(7) Reverse transcribe the RNA from (6) with a specific reverse transcription primer, to obtain cDNA carrying adapters of defined sequence at both ends, and purify it with magnetic beads;
(8) Amplify the adapter-flanked cDNA fragments obtained in (7) by polymerase chain reaction (PCR), and purify the PCR product with magnetic beads;
(9) Determine the library concentration and fragment size using the Agilent Bioanalyzer 2100 and Q-PCR.
In step (1), the amount of total RNA is 5 μg.
In step (2), reagent I contains: 1 μL 5'-monophosphate exonuclease (1 U/μL), 50 mM buffer salt, and 2 mM-100 mM soluble salt, at pH 8.0, with water as the solvent. The buffer salt in reagent I is Tris-HCl. The soluble salt in reagent I is sodium chloride or magnesium chloride. In step (2), the RNA is mixed and reacted with reagent I at 30 °C.
In step (4), reagent II contains: 0.2 μL tobacco acid pyrophosphatase (10 U/μL), 50 mM soluble salt, pH 6.0, 1 mM EDTA, 0.1% beta-mercaptoethanol, and 0.01% Triton X-100, with water as the solvent. The soluble salt in reagent II is sodium acetate. The sample is mixed and reacted with reagent II at 37 °C.
In step (6), reagent III contains: 1 μL T4 RNA ligase 1, 50 mM buffer salt, 10 mM soluble salt, and 1 mM dithiothreitol, at pH 7.5, with water as the solvent. The buffer salt in reagent III is Tris-HCl. The soluble salt in reagent III is magnesium chloride. In step (6), the RNA is mixed and reacted with reagent III at 20 °C.
The sequence of the specific reverse transcription primer used in step (7) is: 5'-GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTNNNNNN-3', where the penultimate N is thio-modified.
In steps (3) and (5), RNA is purified by phenol/chloroform/isoamyl alcohol extraction followed by ethanol precipitation. This removes the enzymes in the reaction mixture so that they do not interfere with the next reaction. Ethanol precipitation also retains some small transcripts, such as microRNA, so that TSS information for this class of non-coding RNA is obtained, which helps in understanding the state of transcriptional regulation.
With reference to Fig. 2, the data produced by sequencing the TSS library are subjected to bioinformatic analysis, which includes the following steps:
(1) Filtering the sequencing reads;
In the present invention, after the high-throughput sequencing reads are received, they are filtered to remove unqualified reads. The high-throughput sequencing technology may be Illumina HiSeq 2000 sequencing or another existing high-throughput sequencing technology.
Unqualified reads include: reads in which the number of bases with a sequencing quality below a certain threshold exceeds 50% of the total number of bases in the read, where the low-quality threshold depends on the specific sequencing technology and sequencing environment; reads in which the number of undetermined bases (such as N in Illumina HiSeq 2000 sequencing results) exceeds 10% of the total number of bases in the read; and reads containing exogenous sequence. For the last criterion, apart from the sample adapter sequence, each read is compared with exogenous sequences introduced by other experimental steps, such as various adapter sequences, and a read containing exogenous sequence is considered unqualified. The sequence data obtained from the raw sequence data after removal of unqualified reads are referred to as clean reads and serve as the basis of the subsequent analysis.
(2) Aligning the clean reads to the reference sequences;
In the present invention, the short-read mapping program SOAPAlignment v2.2 is used to align the clean reads obtained by high-throughput sequencing to the reference genome and to the reference gene sequences, respectively, with no base mismatches allowed. The reference genome sequence and the reference gene sequences can be obtained from public databases.
(3) After alignment, the alignment results are first screened to obtain reliable TSS information. The screening works as follows. The first position at which a clean read aligns to the genome is assumed to be a raw TSS. However, such reads may also have aligned to the interior of a gene and thus give false-positive TSSs, so further filtering is needed. The method enriches the obtained reads at the 5' end of the gene, so the number of reads at a real TSS is higher than the mean number of reads falling inside the gene. A fold factor N between the two is therefore introduced to filter the TSSs: a screened TSS is regarded as real only if its read count is at least N times the mean read count inside the corresponding gene.
(4) After filtering, the reliability of the filtered result is verified with a chi-square test; that is, the chi-square test value should be greater than 3.84, corresponding to a confidence level greater than 95%.
(5) After the chi-square test, a series of bioinformatic analyses is carried out on the verified, reliable results, such as:
1) TSS (transcription start site) classification: According to embodiments of the invention, the screened TSSs can be divided into two classes. One class consists of TSSs that align to the genome and have a corresponding gene annotation, referred to as annotated TSSs. The other class consists of TSSs that align to the genome but have no annotated gene information nearby, referred to as unannotated TSSs, which can be used for new gene prediction.
2) TSS annotation: Here the TSSs that fall in known genes are annotated, including the expression level of the TSS, the position of the TSS, and the corresponding gene annotation information.
3) Building a TSS map: According to embodiments of the invention, the TSSs found by this method in the same species can be displayed graphically to form a TSS map. The position of each TSS and its expression level can be read directly from the map, and differences in TSS expression and distribution between samples can also be seen.
4) Promoter region discovery and 5' UTR length statistics.
5) Experimental reproducibility analysis: According to embodiments of the invention, a correlation analysis of the results of two parallel experiments gives an assessment of the reliability of the experimental results and of operational stability.
6) New gene prediction: According to embodiments of the invention, for TSSs with no reference gene found nearby, the sequences near these TSSs can be extracted for gene prediction. Prokaryotes are predicted with glimmer, and eukaryotes are predicted with Genscan.
7) Data visualization: According to embodiments of the invention, the analysis results can be used to plot and inspect the TSS distribution of a gene or region of interest.
Example 1: Transcription start site sequencing analysis of a human RNA sample and an E. coli RNA sample
The human RNA sample (sample one) was purchased from Agilent. The E. coli RNA (sample two) was extracted from E. coli cultured to the exponential phase. 1-5 μg of total RNA was digested with DNase I and purified by ethanol precipitation. The purified RNA was mixed and reacted with reagent I to enrich intact RNA carrying a 5' cap or 5' triphosphate, and was then purified by phenol/chloroform/isoamyl alcohol extraction. It was next mixed and reacted with reagent II to remove the 5' cap or triphosphate and leave a monophosphate, and was again purified by phenol/chloroform/isoamyl alcohol extraction. The 5'-monophosphate RNA was mixed and reacted with reagent III and the RNA adapter, adding the adapter to the 5' end of the RNA. The RNA carrying the 5' adapter was reverse transcribed with the specific reverse transcription primer into cDNA carrying fixed sequences at both ends, and the cDNA product was purified with magnetic beads. The resulting cDNA fragments were amplified by polymerase chain reaction (PCR), the PCR product was purified with magnetic beads, and the library was loaded for sequencing. Sequencing was performed on the Illumina HiSeq 2000.
Following the information analysis workflow of the General Method, a series of TSS information was obtained by screening. Fig. 7 shows the distribution of the screened TSSs on the genome; the upper and lower panels are the TSS distributions of the human RNA and E. coli RNA samples, respectively. Position 0 is the start site of the gene coding region, and the sites of transcription initiation lie upstream of it. The figure shows that most reads fall upstream of the gene coding region.
In addition, in this example, a series of analyses was carried out on this TSS information.
First, the TSSs were classified. The screened TSSs were divided into two classes: TSSs that align to the genome and have a corresponding gene annotation, referred to as annotated TSSs; and TSSs that align to the genome but have no annotated gene information nearby, referred to as unannotated TSSs, which can be used for new gene prediction.
Second, the TSSs were annotated. Here the TSSs that fall in known genes were annotated, including the expression level of the TSS, the position of the TSS, and the corresponding gene annotation information. A TSS map was then built: the inventors displayed the TSSs found by this method in the same species graphically to form a TSS map, from which the position of each TSS and its expression level can be read directly. Differences in TSS expression and distribution between samples can also be seen. Fig. 8 shows the TSS maps of 8 human samples, from which the distribution of TSSs in the different samples can be seen.
Next came promoter region discovery and 5' UTR length statistics. Fig. 9 shows the base distribution upstream of the TSS, where position 1 on the horizontal axis corresponds to the TSS, which is predominantly a purine (A/G). The upper panel shows the base distribution upstream of human TSSs, with a clear GC-rich region, which is also the main promoter and enhancer region in eukaryotes. The lower panel shows the base distribution for E. coli, where a typical TATA box can be found at the -10 region upstream of the TSS. Fig. 10 shows the length distribution of the 5' UTR, that is, the distance from the TSS to the coding region, for human (upper panel) and E. coli (lower panel). The length of the 5' UTR affects how gene function is exerted, and eukaryotic 5' UTRs are longer than prokaryotic ones.
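The 5' UTR length statistic described here, the distance from the TSS to the start of the coding region, can be sketched in Python. The sketch assumes plain genomic coordinates and ignores any introns within the 5' UTR; the function names and the bin width are hypothetical.

```python
from collections import Counter


def utr5_length(tss, cds_start, strand="+"):
    """Distance from the TSS to the coding region start, in bases.

    Uses genomic coordinates only (introns are ignored, an assumption).
    """
    length = cds_start - tss if strand == "+" else tss - cds_start
    if length < 0:
        raise ValueError("TSS lies downstream of the coding start")
    return length


def length_histogram(lengths, bin_width=50):
    """Count lengths per bin; keys are the lower edge of each bin."""
    return dict(Counter((n // bin_width) * bin_width for n in lengths))
```

A histogram built this way for each organism would correspond to the kind of length distribution described for Fig. 10.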
In this embodiment, a correlation analysis was also performed on the results of two parallel experiments in order to assess the reliability of the experimental results and the stability of the procedure. As shown in Figure 11, the correlation between two parallel experiments on the same sample is close to 1, indicating high reproducibility.
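The replicate comparison can be illustrated with a short sketch. The text does not say which correlation coefficient was used, so Pearson's r is an assumption here, as are the function names and the representation of each replicate as a dictionary mapping TSS position to read count.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists.

    Raises ZeroDivisionError if either list is constant.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def replicate_correlation(rep1, rep2):
    """Correlate TSS read counts of two replicates (dicts: position -> count).

    Positions seen in only one replicate are counted as 0 in the other.
    """
    positions = sorted(set(rep1) | set(rep2))
    x = [rep1.get(p, 0) for p in positions]
    y = [rep2.get(p, 0) for p in positions]
    return pearson(x, y)
```

A value near 1 from `replicate_correlation` corresponds to the high reproducibility described for Figure 11.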
For TSSs near which no reference gene is found, the present invention extracts the sequences around these TSSs and performs gene prediction on them. Glimmer is used for Escherichia coli and genscan for human. Finally, using the analysis results, the TSS distribution of a gene or region of interest can be plotted for inspection. As shown in Figure 12, the upper panel shows the TSS distribution of two human genes, NM_018997 and NM_031901, both of which undergo alternative splicing. In the figure, red vertical lines represent the screened TSSs, black vertical lines the sequences obtained before filtering, blue horizontal lines the exons of the gene, and yellow horizontal lines the introns. The lower panel shows the TSS distribution of an Escherichia coli operon. Prokaryotes have no introns, so only blue horizontal lines representing genes appear, and the 4 genes of this operon share one TSS. It can be seen that the TSSs screened by the inventors lie upstream of the genes and are therefore reliable.
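Extracting the sequence around an unannotated TSS before handing it to a gene predictor might look like the sketch below. The flank size of 1000 bp is a hypothetical default (the text does not state how much sequence was extracted), and `flank_around_tss` is an illustrative name rather than part of Glimmer or genscan.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def flank_around_tss(genome, tss, strand, flank=1000):
    """Return the sequence within `flank` bases on either side of a TSS.

    genome: plus-strand sequence as a string; tss: 0-based coordinate.
    flank=1000 is an assumed, hypothetical window size.
    The window is clipped at the ends of the sequence, and minus-strand
    TSSs are returned as the reverse complement.
    """
    start = max(0, tss - flank)
    end = min(len(genome), tss + flank + 1)
    window = genome[start:end]
    return window if strand == "+" else reverse_complement(window)
```

The returned string would then be written to a FASTA file and passed to the chosen prediction tool.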
The description of the present invention is given for the sake of example and illustration, and is not intended to be exhaustive or to limit the invention to the disclosed form. Many modifications and variations will be apparent to those of ordinary skill in the art. The embodiments were selected and described in order to better explain the principles and practical application of the invention, and to enable those of ordinary skill in the art to understand the invention and thereby design various embodiments, with various modifications, suited to particular uses.
In the description of this specification, reference to the terms "one embodiment", "some embodiments", "example", "specific example", or "some examples" means that a specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic uses of these terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
Although embodiments of the present invention have been shown and described above, it should be understood that these embodiments are exemplary and are not to be construed as limiting the invention. Those of ordinary skill in the art may make changes, modifications, substitutions, and variations to the above embodiments within the scope of the invention without departing from its principles and purpose.
Claims (15)
1. A method for determining a transcription start site, characterized by comprising:
extracting an RNA sample from a host and treating the RNA sample with an enrichment reagent so as to enrich transcripts, wherein the enrichment reagent has 5'-monophosphate 5' exonuclease activity, and the transcripts are RNA molecules having a cap structure or a triphosphate group at their 5' end;
constructing a sequencing library from the treated RNA sample, including,
removing the cap structure or triphosphate group of the transcripts, so as to obtain transcripts from which the cap structure or triphosphate group has been removed,
ligating an RNA adapter to the 5' end of the transcripts from which the cap structure or triphosphate group has been removed, so as to obtain transcripts ligated to the RNA adapter,
reverse transcribing the transcripts ligated to the RNA adapter, so as to obtain cDNA corresponding to the transcripts, wherein the reverse transcription primer has at its end a sequence corresponding to the RNA adapter, the reverse transcription uses the oligonucleotide 5'-GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTNNNNNN-3' shown in SEQ ID NO: 1 as the reverse transcription primer, and at least one N in the reverse transcription primer is thio-modified,
amplifying the cDNA, so as to obtain an amplification product,
constructing the sequencing library based on the amplification product;
sequencing the sequencing library to obtain a sequencing result consisting of a plurality of sequencing reads;
determining the transcription start site based on the sequencing result, including,
aligning the sequencing result to a reference sequence, the reference sequence comprising at least part of the 5'-UTR sequence of a predetermined gene: for a prokaryotic host, the reference sequence comprises the nucleotide sequence between the translation initiation site of the predetermined gene and the site 700 bp upstream of that translation initiation site; for a eukaryotic host, the reference sequence comprises the nucleotide sequence between the translation initiation site of the predetermined gene and the site 5000 bp upstream of that translation initiation site,
based on the alignment result obtained, selecting as positive sequences the sequencing reads that align to the reference sequence and lie most upstream in the reference sequence, and screening the positive sequences, the screening criterion being that the number of positive sequences is at least N times the average number of sequencing reads aligned within the predetermined gene, wherein N is a real number of at least 10,
determining the first base of the positive sequences in the screening result obtained as the transcription start site.
2. The method according to claim 1, characterized in that an end-trimming reagent is used to remove the cap structure or triphosphate group of the transcripts, wherein
the end-trimming reagent has tobacco acid pyrophosphatase activity.
3. The method according to claim 1, characterized in that the penultimate N in the reverse transcription primer is thio-modified.
4. The method according to claim 1, characterized in that the sequencing is carried out using at least one of Illumina HiSeq 2000, Genome Analyzer, SOLiD sequencing system, Ion Torrent, Ion Proton, 454, PacBio RS sequencing system, Helicos tSMS technology, and nanopore sequencing technology.
5. The method according to claim 1, characterized in that the RNA sample is at least part of the total RNA of the host.
6. The method according to claim 1, characterized in that the alignment is carried out using SOAP Alignment.
7. The method according to claim 1, characterized by further comprising performing a chi-square test on the screening result, the test value of the chi-square test being greater than 3.84.
8. A system for determining a transcription start site, characterized by comprising:
a sample extraction apparatus, the sample extraction apparatus being used to extract an RNA sample from a host;
a nucleic acid sample sequencing apparatus, the nucleic acid sample sequencing apparatus being connected to the sample extraction apparatus and including a library construction device and a sequencing device,
the library construction device including,
a transcript enrichment unit, in which an enrichment reagent is provided, the enrichment reagent having 5'-monophosphate 5' exonuclease activity, so as to enrich transcripts from the RNA sample,
an end-trimming unit, the end-trimming unit being connected to the transcript enrichment unit and adapted to remove the 5' cap structure or 5' triphosphate of the transcripts, so as to obtain transcripts from which the 5' cap structure or 5' triphosphate has been removed,
an RNA adapter ligation unit, the RNA adapter ligation unit being connected to the end-trimming unit and adapted to ligate an RNA adapter to the 5' end of the transcripts from which the 5' cap structure or 5' triphosphate has been removed, so as to obtain transcripts ligated to the RNA adapter,
a reverse transcription unit, the reverse transcription unit being connected to the RNA adapter ligation unit and adapted to reverse transcribe the transcripts ligated to the RNA adapter, so as to obtain cDNA corresponding to the transcripts, wherein a reverse transcription primer is provided in the reverse transcription unit, the reverse transcription primer has at its end a sequence corresponding to the RNA adapter, the sequence of the reverse transcription primer is as shown in SEQ ID NO: 1: 5'-GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTNNNNNN-3', and at least one N in the reverse transcription primer is thio-modified,
an amplification unit, the amplification unit being connected to the reverse transcription unit and adapted to amplify the cDNA, so as to obtain an amplification product, and
a library construction unit, the library construction unit being connected to the amplification unit and adapted to construct a sequencing library based on the amplification product,
the sequencing device being connected to the library construction device and adapted to sequence the sequencing library, so as to obtain a sequencing result consisting of a plurality of sequencing reads; and
a transcription start site determination apparatus, the transcription start site determination apparatus being connected to the sequencing device and adapted to determine the transcription start site based on the sequencing result, the transcription start site determination apparatus including:
an alignment device, the alignment device being used to align the sequencing result to a reference sequence, the reference sequence comprising at least part of the 5'-UTR sequence of a predetermined gene,
a determination device, the determination device being adapted to determine the transcription start site based on the alignment result, wherein the determination device is adapted to:
select as positive sequences the sequencing reads that align to the reference sequence and lie most upstream in the reference sequence, the determination device further including a screening unit adapted to screen the positive sequences, the screening criterion being that the number of positive sequences is at least N times the average number of sequencing reads aligned within the predetermined gene, wherein N is a real number greater than 1,
determine the first base of the positive sequences in the screening result obtained from the screening unit as the transcription start site.
9. The system according to claim 8, characterized in that N is a real number of at least 10.
10. The system according to claim 8, characterized in that an end-trimming reagent is provided in the end-trimming unit, wherein the end-trimming reagent has tobacco acid pyrophosphatase activity.
11. The system according to claim 8, characterized in that the penultimate N in the reverse transcription primer is thio-modified.
12. The system according to claim 8, characterized in that a ligation reagent is provided in the RNA adapter ligation unit, wherein the ligation reagent has T4 RNA ligase activity.
13. The system according to claim 8, characterized in that the sequencing device is at least one selected from Illumina HiSeq 2000, Genome Analyzer, SOLiD sequencing system, Ion Torrent, Ion Proton, 454, PacBio RS sequencing system, Helicos tSMS system, and nanopore sequencing system.
14. The system according to claim 8, characterized in that the alignment device is adapted to carry out the alignment using SOAP Alignment.
15. The system according to claim 8, characterized in that the determination device further includes a verification unit, the verification unit being adapted to perform a chi-square test on the screening result, the test value of the chi-square test being greater than 3.84.
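The reference-window and screening steps recited in claims 1 and 8 can be illustrated with a short sketch. It is one possible reading, not the claimed implementation: the function names are hypothetical, reads are reduced to the plus-strand coordinate of their first base, each start position in the upstream window is treated as a candidate, and the "average number of sequencing reads aligned within the gene" is assumed to mean reads per base between the translation initiation site and the gene end.

```python
from collections import Counter

# Upstream window sizes from claim 1 (bp upstream of the translation start).
UPSTREAM_BP = {"prokaryote": 700, "eukaryote": 5000}


def reference_window(translation_start, host):
    """Return (start, end) of the upstream reference region, end-exclusive."""
    return max(0, translation_start - UPSTREAM_BP[host]), translation_start


def screen_tss(read_starts, translation_start, gene_end, host, n_fold=10.0):
    """Return candidate TSS positions that pass the N-fold screen.

    read_starts: 0-based coordinate of the first base of each aligned read.
    Assumption: the intragenic average is reads per base between
    translation_start and gene_end. If no read falls inside the gene,
    the threshold is 0 and every candidate passes.
    """
    win_start, win_end = reference_window(translation_start, host)
    # Pile up reads by start position inside the upstream window.
    piles = Counter(p for p in read_starts if win_start <= p < win_end)
    intragenic = sum(1 for p in read_starts if translation_start <= p < gene_end)
    threshold = n_fold * intragenic / (gene_end - translation_start)
    # Keep positions whose pile is at least N times the intragenic average.
    return sorted(pos for pos, n in piles.items() if n >= threshold)


# Hypothetical example: a strong pile at 1100, one stray read at 1300,
# and 200 reads spread across a 1000 bp gene body.
reads = [1100] * 50 + [1300] + list(range(1700, 2700, 5))
print(screen_tss(reads, translation_start=1700, gene_end=2700, host="prokaryote"))
```

The chi-square check of claims 7 and 15 is not sketched, because the claims give only the threshold and not the table being tested. The value 3.84 matches the 5% critical value of the chi-square distribution with one degree of freedom.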
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201210379402.8A CN103710336B (en) | 2012-09-29 | 2012-09-29 | Transcript enrichment method from RNA sample and applications thereof |
PCT/CN2013/081581 WO2014048185A1 (en) | 2012-09-29 | 2013-08-15 | Method for enriching transcript from rna sample and use thereof |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201210379402.8A CN103710336B (en) | 2012-09-29 | 2012-09-29 | Transcript enrichment method from RNA sample and applications thereof |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103710336A CN103710336A (en) | 2014-04-09 |
CN103710336B CN103710336B (en) | 2017-02-22 |
Family
ID=50386954
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201210379402.8A Active CN103710336B (en) | 2012-09-29 | 2012-09-29 | Transcript enrichment method from RNA sample and applications thereof |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN103710336B (en) |
WO (1) | WO2014048185A1 (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106319639B (en) * | 2015-06-17 | 2018-09-04 | 深圳华大智造科技有限公司 | Build the method and apparatus of sequencing library |
CN113463202B (en) * | 2020-03-31 | 2022-04-15 | 广州序科码生物技术有限责任公司 | Novel RNA high-throughput sequencing method, primer group and kit and application thereof |
Family Cites Families (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP1163357A4 (en) * | 1999-03-19 | 2004-11-10 | Inst Genetics Llc | Primers-attached vector elongation (pave): a 5'-directed cdna cloning strategy |
US20040166499A1 (en) * | 2000-10-05 | 2004-08-26 | Yoshihide Hayashizaki | Oligonucleotide linkers comprising a variable cohesive portion and method for the praparation of polynucleotide libraries by using said linkers |
JP2009072062A (en) * | 2006-04-07 | 2009-04-09 | Institute Of Physical & Chemical Research | Method for isolating 5'-terminals of nucleic acid and its application |
WO2009135212A2 (en) * | 2008-05-02 | 2009-11-05 | Epicentre Technologies Corporation | Selective 5' ligation tagging of rna |
CN101967476B (en) * | 2010-09-21 | 2012-11-14 | 深圳华大基因科技有限公司 | Joint connection-based deoxyribonucleic acid (DNA) polymerase chain reaction (PCR)-free tag library construction method |
WO2013063308A1 (en) * | 2011-10-25 | 2013-05-02 | University Of Massachusetts | An enzymatic method to enrich for capped rna, kits for performing same, and compositions derived therefrom |
CN102534813B (en) * | 2011-11-15 | 2013-09-04 | 杭州联川生物技术有限公司 | Method for constructing sequencing library of middle-small-segment RNA (Ribonucleic Acid) |
CN102533752B (en) * | 2012-02-28 | 2015-01-21 | 盛司潼 | Oligo dT primer and method for constructing cDNA library |
2012
- 2012-09-29 CN CN201210379402.8A patent/CN103710336B/en active Active
2013
- 2013-08-15 WO PCT/CN2013/081581 patent/WO2014048185A1/en active Application Filing
Non-Patent Citations (4)
Title |
---|
CRISPR RNA maturation by trans-encoded small RNA and host factor RNase III; Elitza Deltcheva et al.; Nature; 20110311; Vol. 471; entire document * |
Dynamics of transcriptional start site selection during nitrogen stress-induced cell differentiation in Anabaena sp. PCC7120; Jan Mitschke et al.; PNAS; 20111113; Vol. 108 (No. 50); entire document * |
The primary transcriptome of the major human pathogen Helicobacter pylori; Cynthia M. Sharma et al.; Nature; 20100311; Vol. 464; entire document * |
The transcriptional landscape and small RNAs of Salmonella enterica serovar Typhimurium; Carsten Kröger et al.; PNAS; 20120425; entire document * |
Also Published As
Publication number | Publication date |
---|---|
CN103710336A (en) | 2014-04-09 |
WO2014048185A1 (en) | 2014-04-03 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | C06 | Publication | |
| | PB01 | Publication | |
| | C10 | Entry into substantive examination | |
| | SE01 | Entry into force of request for substantive examination | |
| | C14 | Grant of patent or utility model | |
| | GR01 | Patent grant | |