WO2024138574A1

WO2024138574A1 - Helicase and use thereof

Info

Publication number: WO2024138574A1
Application number: PCT/CN2022/143631
Authority: WO
Inventors: 刘珍君; 王宗安; 王洪涛; 孔超娣; 曾涛; 郭斐; 王乐乐; 季州翔; 黎宇翔; 董宇亮; 章文蔚; 徐讯
Original assignee: 深圳华大生命科学研究院
Priority date: 2022-12-29
Filing date: 2022-12-29
Publication date: 2024-07-04

Abstract

Disclosed in the present invention are a helicase and a use thereof. The helicase has two tower domains and a PIN domain, and the two tower domains are located on the same side of a helicase three-dimensional structure. According to the technical solution of the present invention, a brand-new helicase BCH3X having a special helix characteristic domain is provided, and the helicase has good salt tolerance and stability, can have high unwinding activity in the case of a high salt content, can be used for nucleic acid control and characterization, and is applied to nanopore sequencing.

Description

Helicase and its application

Technical Field

The present invention relates to the field of biotechnology, and in particular to a helicase and an application thereof.

Background Art

As an emerging single-molecule sequencing technology, nanopore sequencing has brought disruptive changes to the gene sequencing industry through its unique advantages, such as high throughput, long read length, high speed, in situ detection and label-free operation. The technology does not require imaging equipment for detection, so the system can be reduced to a portable size to suit different sequencing scenarios. Because it sequences directly without amplification, there is no length limit on the DNA that can be sequenced and bases can be called in real time; it can also directly sequence RNA, methylated and otherwise modified molecules, and other single molecules. Nanopore sequencing has broad application value in many fields, such as molecular biology, medicine, epidemiology and ecology, including genome mapping, surveillance of epidemics and other infectious diseases, detection of rare species, identification of hidden intermediates, dynamic monitoring of biological non-covalent interactions, characterization of epigenetic and post-translational modifications, and fast, inexpensive protein sequencing.

Nanopore sequencing is a sequencing technology based on electrical signals. A nanopore (protein or solid-state) inserted in a membrane serves as the signal sensor and separates two electrolyte chambers. When a voltage is applied between the two chambers, a stable ionic current flows through the pore. When the molecule to be tested enters the nanopore, the flow of ions is hindered, causing the current signal to fluctuate. Different bases affect the current differently. By detecting the current fluctuations of the nanopore in real time and decoding the signal with machine learning, the molecule to be tested can be sequenced in real time.
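The detection principle described above can be illustrated with a toy sketch that splits a current trace into runs of similar current. This is only an illustration under stated assumptions: the trace values, the fixed threshold and the function name `segment_levels` are hypothetical, and the simple thresholding shown here stands in for, and is not, the machine-learning decoding mentioned in the text.

```python
from statistics import mean


def segment_levels(trace, threshold):
    """Split a current trace into runs of similar current.

    A new segment starts whenever a sample differs from the running mean of
    the current segment by more than `threshold` (same units as the trace).
    Returns a list of (start_index, end_index_exclusive, mean_level).
    """
    if not trace:
        return []
    segments = []
    start = 0
    for i in range(1, len(trace)):
        level = mean(trace[start:i])
        if abs(trace[i] - level) > threshold:
            # The current jumped: close the running segment and open a new one.
            segments.append((start, i, level))
            start = i
    segments.append((start, len(trace), mean(trace[start:])))
    return segments


# Synthetic values in picoamperes (hypothetical, not measured data).
trace = [100.0, 101.0, 99.0, 60.0, 61.0, 59.0, 80.0, 81.0]
levels = segment_levels(trace, threshold=10.0)
for start, end, level in levels:
    print(f"samples {start}..{end - 1}: mean current {level} pA")
```

Each detected level would then be handed to a decoder that maps current levels to bases; that step is outside the scope of this sketch.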

During sequencing, nucleic acid molecules pass through the nanopore channel so quickly that polynucleotide sequence information cannot be obtained accurately. Effectively slowing and controlling the translocation of nucleic acid molecules through the pore is therefore a key technical problem in realizing nanopore sequencing. At present, the most common and effective approach is to use the unwinding action of a helicase to control the translocation of nucleic acid molecules and thereby improve detection accuracy. To better maintain sequencing speed and sequencing uniformity, the helicase needs good salt tolerance and thermal stability in a high-salt electrolyte solution.

The helicase used in current commercial nanopore sequencers is the Dda helicase derived from bacteriophage T4, which has poor production yield, stability and salt tolerance. In particular, high salt inhibits the unwinding activity of the Dda helicase, so that its unwinding speed decreases and its unwinding ability cannot be fully exerted, which lowers the sequencing speed and reduces sequencing efficiency in nanopore sequencing applications.

Summary of the invention

The present invention aims to provide a helicase and application thereof, so as to solve the technical problem of poor salt tolerance of the helicase in the prior art.

To achieve the above object, according to one aspect of the present invention, a helicase is provided, which has two tower domains and one pin domain, and the two tower domains are located on the same side of the three-dimensional structure of the helicase.

Further, the helicase includes at least one of the following: A) BCH326, BCH326 is a protein having the amino acid sequence shown in SEQ ID NO: 1; B) BCH338, BCH338 is a protein having the amino acid sequence shown in SEQ ID NO: 3; C) a protein in which at least one cysteine on the surface of the protein defined in A) or B) is mutated to alanine, glutamine, glycine, histidine, isoleucine, leucine, valine, serine, threonine or methionine; D) a protein in which at least one amino acid at a site on the tower domain and/or pin domain of the amino acid sequence of any of the proteins defined in A), B) and C) is mutated to cysteine or at least one non-natural amino acid is introduced, and the protein has the ability to unwind DNA; and E) a protein having more than 70% homology with the amino acid sequence of any of the proteins defined in A), B), C) and D) and having the same function.

Furthermore, C) includes: proteins in which the C at position 319 of BCH326 is replaced by A, S, T, V, I, L or G; and proteins in which the C at position 326 or 459 of BCH338 is replaced by A, S, T, V, I, L or G.

Further, in D), the mutation of an amino acid at at least one site on the tower domain and/or the pin domain to cysteine, or the introduction of at least one unnatural amino acid, includes at least one of the following: the amino acid at at least one of S389, R340, K341, S342, N343, K343, S344, I345, V346, I347, D348, K349, D350, G351, K352, A353, K354, E355, F356, L357, R358, K359, F360, L361, N362, F363, A364, K365, I366, Y367, N368, F369, T370, N371, K372, G373, G374, H378, G379, R380, R381, I382, T383, K384, K385, S386, K387, K388, E389, L390 and W391 on the tower domain of BCH326 is mutated to cysteine or at least one unnatural amino acid is introduced; the amino acid at at least one of D87, I88, G89, T90, I91, H92, S93, Y94, F95, D96, I97, K98, P99, D100, I101, D102, D103, N104, G105, N106, R107, V108, F109, K110, P111 or S112 on the pin domain of BCH326 is mutated to cysteine or at least one unnatural amino acid is introduced; the amino acid at at least one of S405, K406, F407, L408, V409, P410, L411, G412, D413, G414, S415, K416, E417, D418, L419, F420, P421, L422, Y423, K424, E425, A426, V427, F428, D429, I430, A431, K432, T433, M434, N435, N436, Q437, R438, K439, I440, S441, K442, N443, S444, K445, K446, N447, F448 or W449 on the tower domain of BCH338 is mutated to cysteine or at least one unnatural amino acid is introduced; the amino acid at at least one of E93, I94, R95, P96, D97, I98, N99, E100, F101, G102, E103, R104, I105, F106, V107, P108, K109, L110, R111, D112, M113 and M114 on the pin domain of BCH338 is mutated to cysteine or at least one unnatural amino acid is introduced; preferably, in E), the protein has 70%, 80%, 90%, 95% or 99% or more homology with the amino acid sequence of the protein defined in any one of A), B), C) and D) and has the same function.

Further, the unnatural amino acid is selected from 4-azido-L-phenylalanine, 4-acetyl-L-phenylalanine, 3-acetyl-L-phenylalanine, 4-acetoacetyl-L-phenylalanine, O-allyl-L-tyrosine, 3-(phenylselanyl)-L-alanine, O-2-propyn-1-yl-L-tyrosine, 4-(dihydroxyboryl)-L-phenylalanine, 4-[(ethylsulfanyl)carbonyl]-L-phenylalanine, (2S)-2-amino-3-{4-[(propan-2-ylsulfanyl)carbonyl]phenyl}propanoic acid, (2S)-2-amino-3-{4-[(2-amino-3-sulfanylpropanoyl)amino]phenyl}propanoic acid, O-methyl-L-tyrosine, 4-amino-L-phenylalanine, 4-cyano-L-phenylalanine, 3-cyano-L-phenylalanine, 4-fluoro-L-phenylalanine, 4-iodo-L-phenylalanine, 4-bromo-L-phenylalanine, O-(trifluoromethyl)tyrosine, 4-nitro-L-phenylalanine, 3-hydroxy-L-tyrosine, 3-amino-L-tyrosine, 3-iodo-L-tyrosine, 4-isopropyl-L-phenylalanine, 3-(2-naphthyl)-L-alanine, 4-phenyl-L-phenylalanine, (2S)-2-amino-3-(naphthylamino)propionic acid, 6-(methylsulfanyl)norleucine, 6-oxo-L-lysine, D-tyrosine, (2R)-2-hydroxy-3-(4-hydroxyphenyl)-L-alanine, 2-amino-3-(8-hydroxy-3-quinolyl)propionic acid, 4-benzoyl-L-phenylalanine, S-(2-nitrobenzyl)cysteine, (2R)-2-amino-3-[(2-nitrobenzyl)sulfanyl]propionic acid, (2S)-2-amino-3-[(2-nitrobenzyl)oxy]propionic acid, O-(4,5-dimethoxy-2-nitrobenzyl)-L-serine, (2S)-2-amino-6-({[(2-nitrobenzyl)oxy]carbonyl}amino)hexanoic acid, O-(2-nitrobenzyl)-L-tyrosine and 2-nitrophenylalanine; preferably, the introduction of at least one unnatural amino acid into BCH326 includes at least one of the following: introduction of 4-azido-L-phenylalanine at D100, I101, D102, D103, N104, G105, N106 or R107, and introduction of 4-acetyl-L-phenylalanine at D103, G105 or N106; preferably, the introduction of at least one unnatural amino acid into BCH338 includes at least one of the following: introduction of 4-azido-L-phenylalanine at A431, K432, T433, M434, N435, S441, K442, N443 or S444.

Further, there is at least one amino acid mutation at an amino acid site in the DNA binding region of the helicase and/or at an amino acid site near the ATP catalytic active center, and the mutation includes mutating the original amino acid to an amino acid with a larger side chain; preferably, mutating the original amino acid to an amino acid with a larger side chain includes at least one of the following: asparagine is replaced by glutamine, histidine, arginine or lysine; proline is replaced by arginine, lysine, phenylalanine or leucine; histidine is replaced by arginine, lysine, glutamine, asparagine, phenylalanine, tyrosine or tryptophan; proline is replaced by arginine, lysine, glutamine, asparagine or histidine; phenylalanine is replaced by arginine, lysine, histidine, tyrosine or tryptophan; isoleucine is replaced by phenylalanine, tryptophan, histidine, lysine or arginine; tyrosine is replaced by arginine, lysine or tryptophan; the amino acid sites in the DNA binding region of BCH326 include: L157, V160, L294, G296, N299, L303, A304, I328, F329, T330, N331, G332, G333 and E334, and the amino acid sites near the ATP catalytic active center of BCH326 include: K211, E212, E213, N214, Y215, K216, A217, P218, L219, K220, D221, I222, N223 and N224; the amino acid sites in the DNA binding region of BCH338 include: H89, S90, Y91, F92, E93, I94, R95 and P96, and the amino acid sites near the ATP catalytic active center of BCH338 include: Y152, Q153, L154, P155, P156, V157, F193, L194, I195, K196, E197, Y198, E199, E200 and N201.

Further, at least one of the amino acids on the surface of the helicase that interact with the nanopore binding region is mutated, and the mutation includes mutating the original amino acid to an amino acid with a shorter side chain; preferably, mutating the original amino acid to an amino acid with a shorter side chain includes: asparagine is replaced by isoleucine, valine, leucine, alanine, serine or glycine; lysine is replaced by isoleucine, valine, leucine, alanine, serine or glycine; arginine is replaced by isoleucine, valine, leucine, alanine, serine or glycine; preferably, the amino acids on the surface of BCH326 that interact with the nanopore binding region include: M1, E2, S3, K4, I5, N6, L7, T8, E9, D10, Q11, L12, K13, I14, I15, K16, I189, I190, R191, T192, Q193, N194, K195, N196 and S197; the amino acids on the surface of BCH338 that interact with the nanopore binding region include: M1, G2, E3, I4, K5, L6, N7, E8, E9, Q10, Q11, K12, K177, I177, L178, R179, T180, K181, N182, L213, I214, D215, H216, F217, H218, V219, Y220, G221, D248, L249, T250, D251, S252, T253, E254 and S255.

According to another aspect of the present invention, an isolated DNA molecule is provided. The DNA molecule has (a) a nucleotide sequence encoding any of the above-mentioned helicases; or (b) a nucleotide sequence that hybridizes with the DNA molecule defined in (a) under stringent conditions; or (c) a nucleotide sequence shown in SEQ ID NO: 2 or SEQ ID NO: 4; or (d) a nucleotide sequence that has more than 70% homology with any one of the nucleotide sequences defined in (a) to (c) and encodes a protein having the same function as the helicase.

Furthermore, the DNA molecule has a nucleotide sequence that has 75% or more, preferably 85% or more, more preferably 95% or more, and further preferably 99% or more homology with any of the nucleotide sequences defined in (a) to (c) and encodes a protein having the same function.

According to another aspect of the present invention, a recombinant vector is provided, which comprises any one of the above-mentioned DNA molecules.

Further, the recombinant vector is selected from a plasmid, a virus or a carrier expression vector; further, the recombinant vector includes a regulatory element for controlling the expression of the DNA molecule; further, the regulatory element includes a promoter operably linked to the DNA molecule; preferably, the promoter includes T7, trc, lac, ara or λL; more preferably, the recombinant vector is selected from plasmid pET-28a(+), pET-21a(+) or pET-32a(+).

According to another aspect of the present invention, a host cell is provided, which comprises any one of the above-mentioned DNA molecules of the present invention, or any one of the above-mentioned recombinant vectors of the present invention.

Further, the host cell includes Escherichia coli; preferably, the host cell includes BL21(DE3), BL21 Star(DE3)pLysS, Rosetta(DE3) or Lemo21(DE3).

According to another aspect of the present invention, there is provided an application of the above-mentioned helicase in nucleic acid control or characterization; further, nucleic acid control includes controlling the speed at which nucleic acid passes through a nanopore, controlling the stability of nucleic acid translocation through the pore, or controlling the continuity of nucleic acid translocation through the pore; further, the application includes nanosensor applications and single-molecule nanopore sequencing applications.

According to another aspect of the present invention, a nanopore sequencing kit is provided, wherein the kit comprises a helicase, and the helicase is any of the above-mentioned helicases of the present invention.

According to another aspect of the present invention, a method for nanopore sequencing is provided, which comprises sequencing a nucleic acid molecule to be sequenced under the control of a helicase, wherein the helicase is any of the above-mentioned helicases of the present invention.

By applying the technical solution of the present invention, a new type of helicase BCH3X with a special helical characteristic domain is provided, which has good salt tolerance and stability, can have high unwinding activity under high salt, can be used for the control and characterization of nucleic acids, and is applied to nanopore sequencing.

BRIEF DESCRIPTION OF THE DRAWINGS

The drawings constituting a part of the present application are used to provide a further understanding of the present invention. The exemplary embodiments of the present invention and their descriptions are used to explain the present invention and do not constitute an improper limitation of the present invention. In the drawings:

FIG. 1 shows the Superdex 200 molecular sieve purification results of BCH326, including: (A) the Superdex 200 molecular sieve elution profile of BCH326; (B) a gel image of the molecular sieve elution fractions of BCH326.

FIG. 2 shows the Superdex 200 molecular sieve purification results of BCH338, including: (A) the Superdex 200 molecular sieve elution profile of BCH338; (B) a gel image of the molecular sieve elution fractions of BCH338.

FIG. 3 shows the AlphaFold2 predicted structure of BCH326.

FIG. 4 shows the AlphaFold2 predicted structure of BCH338.

FIG. 5 shows the ATPase activity detection of BCH326 protein.

FIG. 6 shows the ATPase activity detection of BCH338 protein.

FIG. 7 shows the dsDNA unwinding activity detection of BCH326 protein (low salt reaction buffer).

FIG. 8 shows the dsDNA unwinding activity detection of BCH326 protein (high salt reaction buffer).

FIG. 9 shows the dsDNA unwinding activity detection of BCH338 protein (low salt reaction buffer).

FIG. 10 shows the dsDNA unwinding activity detection of BCH338 protein (high salt reaction buffer).

FIG. 11 shows the detection of the restriction sequence blocking the unwinding activity of BCH326 protein (low salt reaction buffer).

FIG. 12 shows the detection of the restriction sequence blocking the unwinding activity of BCH326 protein (high salt reaction buffer).

FIG. 13 shows the detection of the restriction sequence blocking the unwinding activity of BCH338 protein (low salt reaction buffer).

FIG. 14 shows the detection of the restriction sequence blocking the unwinding activity of BCH338 protein (high salt reaction buffer).

FIG. 15 shows a schematic diagram of a linker (a: upper strand; b: lower strand).

FIG. 16 shows a schematic diagram of a sequencing library containing helicase (a: upper strand; b: lower strand; c: double-stranded target fragment; d: helicase; e: cholesterol-labeled double-stranded DNA).

FIG. 17 shows a schematic diagram of a patch clamp amplifier.

FIG. 18 shows a graph of BCH326 sequencing current signals.

FIG. 19 shows a graph of BCH338 sequencing current signals.

FIG. 20 shows the crystal structure of Dda.

Detailed Description

It should be noted that, in the absence of conflict, the embodiments and features in the embodiments of the present application can be combined with each other. The present invention will be described in detail below with reference to the accompanying drawings and in combination with the embodiments.

According to a typical embodiment of the present application, a type of helicase is provided that has two tower domains and one pin domain, the two tower domains being located on the same side of the three-dimensional structure of the helicase. When a helicase with this structure is used for nanopore sequencing, the tower domain and the pin domain can be cross-linked to form a DNA binding region, which facilitates speed-controlled sequencing, increases sequencing continuity and stability, and prevents the DNA from slipping or fluctuating during sequencing and thereby causing fluctuations in the sequencing signal. The helicase also has high salt tolerance, can exert good unwinding ability at high salt concentrations, and improves sequencing efficiency.

According to a typical embodiment of the present application, a helicase is provided, the gene of which is derived from a deep-sea metagenome and has high salt tolerance. The helicase includes at least one of the following: A) BCH326, BCH326 is a protein having an amino acid sequence shown in SEQ ID NO: 1; B) BCH338, BCH338 is a protein having an amino acid sequence shown in SEQ ID NO: 3; C) a protein in which at least one cysteine on the surface of the protein defined in A) or B) is mutated to alanine, glutamine, glycine, histidine, isoleucine, leucine, valine, serine, threonine or methionine; D) a protein in which at least one amino acid at a site on the tower domain and/or pin domain of the amino acid sequence of any of the proteins defined in A), B) and C) is mutated to cysteine or at least one non-natural amino acid is introduced, and has the ability to unwind DNA; and E) a protein having more than 70% homology with the amino acid sequence of any of the proteins defined in A), B), C) and D) and having the same function.

The helicase defined in A) or B) above is derived from the deep-sea metagenome and has a high salt tolerance.

The helicase defined in C) above can improve protein uniformity, thereby improving indicators such as sequencing uniformity, by mutating at least one cysteine on the surface of the helicase to alanine, glutamine, glycine, histidine, isoleucine, leucine, valine, serine, threonine or methionine.

According to a typical embodiment of the present invention, the helicase defined in C) above includes: a protein in which the C at position 319 of BCH326 is replaced by A, S, T, V, I, L or G; and a protein in which the C at position 326 or 459 of BCH338 is replaced by A, S, T, V, I, L or G.
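The substitution notation used above (original residue, 1-based position, replacement residue, for example a cysteine-to-alanine change written as C319A) can be sketched as a small helper. This is a minimal illustration, not part of the disclosed method: the function name `apply_substitution` is hypothetical, and the 8-residue toy peptide is invented for the example and is not SEQ ID NO: 1 or SEQ ID NO: 3.

```python
import re


def apply_substitution(sequence, mutation):
    """Apply a point substitution such as 'C319A' to a protein sequence.

    The mutation string is <original residue><1-based position><new residue>.
    A ValueError is raised if the string is malformed, the position is out of
    range, or the residue found at that position is not the stated original.
    """
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if match is None:
        raise ValueError(f"malformed mutation: {mutation!r}")
    original, position, replacement = match.group(1), int(match.group(2)), match.group(3)
    index = position - 1  # convert 1-based residue numbering to 0-based index
    if index < 0 or index >= len(sequence):
        raise ValueError(f"position {position} is outside the sequence")
    if sequence[index] != original:
        raise ValueError(
            f"expected {original} at position {position}, found {sequence[index]}"
        )
    return sequence[:index] + replacement + sequence[index + 1:]


# Hypothetical toy peptide with a cysteine at position 5 (not a real sequence).
toy = "MESKCNLT"
mutant = apply_substitution(toy, "C5A")
print(toy, "->", mutant)
```

Checking the original residue before substituting guards against off-by-one numbering errors when the same notation is applied to the full-length sequences.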

According to a typical embodiment of the present invention, based on the sequence defined in A) or B), the protein is mutated to stably connect the tower domain and the pin domain together, so that the DNA is fixed in the region formed by the two during the sequencing process, thereby improving the stability and sustainability of the sequencing. For example, the amino acid mutation at at least one site on the tower domain and/or the pin domain of any of the amino acid sequences in A), B) and C) to cysteine or the introduction of at least one non-natural amino acid includes at least one of the following:

On the tower domain of BCH326, the amino acid at at least one of S389, R340, K341, S342, N343, K343, S344, I345, V346, I347, D348, K349, D350, G351, K352, A353, K354, E355, F356, L357, R358, K359, F360, L361, N362, F363, A364, K365, I366, Y367, N368, F369, T370, N371, K372, G373, G374, H378, G379, R380, R381, I382, T383, K384, K385, S386, K387, K388, E389, L390 and W391 is mutated to cysteine or at least one unnatural amino acid is introduced;

On the pin domain of BCH326, the amino acid at at least one of D87, I88, G89, T90, I91, H92, S93, Y94, F95, D96, I97, K98, P99, D100, I101, D102, D103, N104, G105, N106, R107, V108, F109, K110, P111 or S112 is mutated to cysteine or at least one unnatural amino acid is introduced;

On the tower domain of BCH338, the amino acid at at least one of S405, K406, F407, L408, V409, P410, L411, G412, D413, G414, S415, K416, E417, D418, L419, F420, P421, L422, Y423, K424, E425, A426, V427, F428, D429, I430, A431, K432, T433, M434, N435, N436, Q437, R438, K439, I440, S441, K442, N443, S444, K445, K446, N447, F448 or W449 is mutated to cysteine or at least one unnatural amino acid is introduced;

The amino acid of at least one of E93, I94, R95, P96, D97, I98, N99, E100, F101, G102, E103, R104, I105, F106, V107, P108, K109, L110, R111, D112, M113, and M114 on the pin domain of BCH338 is mutated to cysteine or at least one unnatural amino acid is introduced;

The above mutations enable chemical linkage between the tower domain and the pin domain, including covalent or non-covalent linkage.

In addition, in E), the protein preferably has 70%, 80%, 90%, 95% or 99% or more homology with the amino acid sequence of the protein defined in any of A), B), C) and D) and has the same function. The term "homology" used herein has the meaning generally known in the art, and those skilled in the art are also familiar with the rules and standards for determining the homology between different sequences. The sequences defined by different degrees of homology in the present invention must also have the helicase function. Those skilled in the art can obtain such variant sequences under the guidance of the disclosure of this application.

According to a typical embodiment of the present invention, the non-natural amino acids include, but are not limited to, 4-azido-L-phenylalanine, 4-acetyl-L-phenylalanine, 3-acetyl-L-phenylalanine, 4-acetoacetyl-L-phenylalanine, O-allyl-L-tyrosine, 3-(phenylselanyl)-L-alanine, O-2-propyn-1-yl-L-tyrosine, 4-(dihydroxyboryl)-L-phenylalanine, 4-[(ethylsulfanyl)carbonyl]-L-phenylalanine, (2S)-2-amino-3-{4-[(propan-2-ylsulfanyl)carbonyl]phenyl}propanoic acid, (2S)-2-amino-3-{4-[(2-amino-3-sulfanylpropionyl)amino]phenyl}propanoic acid, O-methyl-L-tyrosine, 4-amino-L-phenylalanine, 4-cyano-L-phenylalanine, 3-cyano-L-phenylalanine, 4-fluoro-L-phenylalanine, 4-iodo-L-phenylalanine, 4-bromo-L-phenylalanine, O-(trifluoromethyl)tyrosine, 4-nitro-L-phenylalanine, 3-hydroxy-L-tyrosine, 3-amino-L-tyrosine, 3-iodo-L-tyrosine, 4-isopropyl-L-phenylalanine, 3-(2-naphthyl)-L-alanine, 4-phenyl-L-phenylalanine, (2S)-2-amino-3-(naphth-2-ylamino)propionic acid, 6-(methylsulfanyl)norleucine, 6-oxo-L-lysine, D-tyrosine, (2R)-2-hydroxy-3-(4-hydroxyphenyl)propionic acid, (2R)-2-aminooctanoate, 3-(2,2′-bipyridin-5-yl)-D-alanine, 2-amino-3-(8-hydroxy-3-quinolyl)propionic acid, 4-benzoyl-L-phenylalanine, S-(2-nitrobenzyl)cysteine, (2R)-2-amino-3-[(2-nitrobenzyl)sulfanyl]propionic acid, (2S)-2-amino-3-[(2-nitrobenzyl)oxy]propionic acid, O-(4,5-dimethoxy-2-nitrobenzyl)-L-serine, (2S)-2-amino-6-({[(2-nitrobenzyl)oxy]carbonyl}amino)hexanoic acid, O-(2-nitrobenzyl)-L-tyrosine or 2-nitrophenylalanine.

In one embodiment of the present invention, cysteine mutations or non-natural amino acids are introduced at the above positions, and one or more ends of one or more connectors are preferably used to covalently connect the tower domain and the pin domain of the helicase. If only one end is covalently attached, the one or more connectors connect the two or more cysteines and/or non-natural amino acids transiently. If both or all ends are covalently attached, the one or more connectors connect the two or more cysteines and/or non-natural amino acids permanently. The connector is an agent capable of forming covalent or non-covalent interactions; it can be a commercial cross-linking agent, a small protein, a polypeptide, a synthetic small molecule, etc. It is used to connect the tower domain and the pin domain so that, once the DNA has bound to the helicase, the two domains are joined and the DNA cannot leave the region during translocation through the pore, thereby improving the stability and sustainability of nanopore sequencing.

One of the covalent attachment methods is to cross-link the tower domain and the pin domain of the helicase during sequencing using a cross-linking agent, thereby improving the continuity and stability of sequencing. Suitable chemical cross-linking agents are well known in the art. They include, but are not limited to, those comprising the following functional groups: maleimide, active ester, succinimide, azide, alkyne (e.g., dibenzocyclooctyne (DIBO or DBCO), difluorocycloalkyne and linear alkyne), phosphine (e.g., those used in traceless and non-traceless Staudinger ligation), haloacetyl (e.g., iodoacetamide), phosgene-type reagents, sulfonyl chloride reagents, isothiocyanates, acyl halides, hydrazine, disulfide, vinyl sulfone, aziridine and photosensitive reagents (e.g., aryl azide, diaziridine).

In addition, the present invention can also mutate amino acid sites in the DNA binding region of this type of helicase and amino acid sites near the ATP catalytic active center. The mutation directions include, but are not limited to: mutation to amino acids with larger side chains, thereby increasing (i) electrostatic interactions, (ii) hydrogen bonding and/or (iii) cation-π interactions between at least one amino acid and one or more nucleotides in the ssDNA; and substitution to increase the number of positively charged amino acids so as to reduce the repulsion between the motor protein and the pore.

The mutation of the original amino acid to an amino acid with a larger side chain includes at least one of the following: asparagine (N) is replaced by glutamine (Q), histidine (H), arginine (R) or lysine (K); proline (P) is replaced by arginine (R), lysine (K), phenylalanine (F) or leucine (L); histidine (H) is replaced by arginine (R), lysine (K), glutamine (Q), asparagine (N), phenylalanine (F), tyrosine (Y) or tryptophan (W); proline (P) is replaced by arginine (R), lysine (K), glutamine (Q), asparagine (N) or histidine (H); phenylalanine (F) is replaced by arginine (R), lysine (K), histidine (H), tyrosine (Y) or tryptophan (W); isoleucine (I) is replaced by phenylalanine (F), tryptophan (W), histidine (H), lysine (K) or arginine (R); tyrosine (Y) is replaced by arginine (R), lysine (K) or tryptophan (W), etc.

These sites include, but are not limited to, one or more of the following. Amino acid sites in the DNA binding region: BCH326: L157, V160, L294, G296, N299, L303, A304, I328, F329, T330, N331, G332, G333, E334; BCH338: H89, S90, Y91, F92, E93, I94, R95, P96. Amino acid sites near the ATP catalytic active center: BCH326: K211, E212, E213, N214, Y215, K216, A217, P218, L219, K220, D221, I222, N223, N224; BCH338: Y152, Q153, L154, P155, P156, V157, F193, L194, I195, K196, E197, Y198, E199, E200, N201. In the present invention, an amino acid site near the ATP catalytic active center is understood by those skilled in the art to mean an amino acid site that surrounds the ATP catalytic active center and affects it.
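As an illustration of how the larger-side-chain substitution rules combine with the listed sites, the following sketch enumerates candidate point mutations for the BCH326 DNA binding region. It is a minimal sketch under stated assumptions: the names `LARGER_SIDE_CHAIN` and `candidate_mutations` are hypothetical, the two proline rules in the text are merged into one entry, and residues for which the text gives no explicit rule simply produce no candidates here, even though the text says the rules are not limiting.

```python
# Substitution rules transcribed from the description (one-letter codes).
# Assumption: the two separate proline rules are merged into a single entry.
LARGER_SIDE_CHAIN = {
    "N": "QHRK",
    "P": "RKFLQNH",
    "H": "RKQNFYW",
    "F": "RKHYW",
    "I": "FWHKR",
    "Y": "RKW",
}

# DNA binding region sites of BCH326 as listed in the description.
BCH326_DNA_BINDING = [
    "L157", "V160", "L294", "G296", "N299", "L303", "A304",
    "I328", "F329", "T330", "N331", "G332", "G333", "E334",
]


def candidate_mutations(sites, rules=LARGER_SIDE_CHAIN):
    """Return point mutations such as 'N299Q' for every site covered by a rule."""
    candidates = []
    for site in sites:
        residue, position = site[0], site[1:]
        for replacement in rules.get(residue, ""):
            candidates.append(f"{residue}{position}{replacement}")
    return candidates


mutations = candidate_mutations(BCH326_DNA_BINDING)
print(len(mutations), "candidates, e.g.", mutations[:4])
```

The same helper could be pointed at the ATP-proximal sites or at the BCH338 lists; whether any given candidate retains helicase activity is an experimental question that this enumeration does not address.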

According to one embodiment of the present invention, amino acids with long side chains at the sites on the surface of this type of helicase that interact with the nanopore binding region can be mutated to amino acids with short side chains, so as to reduce the repulsion between the helicase and the nanopore during sequencing.

Common mutation directions include asparagine (N) replaced by isoleucine (I), valine (V), leucine (L), alanine (A), serine (S) or glycine (G); lysine (K) replaced by isoleucine (I), valine (V), leucine (L), alanine (A), serine (S) or glycine (G); arginine (R) replaced by isoleucine (I), valine (V), leucine (L), alanine (A), serine (S) or glycine (G), etc. Preferably, these amino acid positions include, but are not limited to, one or more of the following sites: BCH326: M1, E2, S3, K4, I5, N6, L7, T8, E9, D10, Q11, L12, K13, I14, I15, K16, I189, I190, R191, T192, Q193, N194, K195, N196, S197; BCH338: M1, G2, E3, I4, K5, L6, N7, E8, E9, Q10, Q11, K12, K177, I177, L178, R179, T180, K181, N182, L213, I214, D215, H216, F217, H218, V219, Y220, G221, D248, L249, T250, D251, S252, T253, E254, S255.

According to a typical embodiment of the present application, an isolated DNA molecule is provided, which has (a) a nucleotide sequence encoding any of the above-mentioned helicases; or (b) a nucleotide sequence that hybridizes with the DNA molecule specified in (a) under stringent conditions; or (c) a nucleotide sequence shown in SEQ ID NO: 2 or SEQ ID NO: 4; or (d) a nucleotide sequence that has more than 70% (preferably more than 80%, more preferably more than 85%, further preferably more than 90%, most preferably more than 95%, for example, it can be 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 98.5%, 99%, 99.5%, 99.6%, 99.7%, 99.8% or more, or even more than 99.9%) homology with any of the nucleotide sequences specified in (a) to (c) and encodes a protein having the same function as the above-mentioned helicase.

It should be noted that the “homology” in the present application refers to the identity between any two nucleotide sequences or amino acid sequences, from the first amino acid to the last amino acid encoded by the corresponding genes.
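Since homology is defined here as identity, a percent-identity check against a threshold such as 70% can be sketched as follows. This is an illustrative sketch only: it assumes the two sequences have already been aligned to equal length with `-` marking gaps, and it divides by the alignment length, which is one common convention that the application does not itself specify. The function names and the example peptides are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of alignment columns in which both sequences carry the same residue.

    Both inputs must be pre-aligned to the same length; '-' marks a gap.
    Columns containing a gap never count as matches.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("aligned sequences must not be empty")
    matches = sum(1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-")
    return 100.0 * matches / len(aligned_a)


def exceeds_threshold(aligned_a, aligned_b, threshold=70.0):
    """True if identity is strictly greater than the threshold ('more than 70%')."""
    return percent_identity(aligned_a, aligned_b) > threshold


# Hypothetical 10-residue toy peptides differing at one position.
identity = percent_identity("MESKINLTED", "MESKVNLTED")
print(identity, exceeds_threshold("MESKINLTED", "MESKVNLTED"))
```

In practice the alignment itself would come from a dedicated alignment tool, and the identity value depends on the alignment parameters chosen.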

"Isolated" in this application means changed "by the hand of man" from its natural state, i.e., if it occurs in nature, it is changed and/or separated from its original environment. For example, a polynucleotide or polypeptide naturally present in a living organism is not "isolated", however, the same polynucleotide or polypeptide separated from its coexisting state in its natural state is "isolated" (as the term is used in this article).

The DNA molecule in the present invention hybridizes with the helicase-encoding gene of the present invention under "stringent conditions", which refers to conditions under which the presence of the helicase-encoding gene of the present invention can be identified by nucleic acid hybridization. In the present invention, if two DNA molecules can form an antiparallel double-stranded nucleic acid structure, the two DNA molecules are said to hybridize specifically with each other. If two DNA molecules show complete complementarity, one of them is called the "complement" of the other. In the present invention, if two DNA molecules can hybridize with each other with sufficient stability that they anneal and bind to each other under conventional "highly stringent" conditions, the two DNA molecules are said to have "complementarity". Deviation from complete complementarity is allowed as long as this deviation does not completely prevent the two molecules from forming a double-stranded structure. For a DNA molecule to serve as a primer or probe, it only needs sufficient sequence complementarity to form a stable double-stranded structure under the specific solvent and salt concentration used.

In the present invention, the substantially homologous sequence is a DNA molecule that can specifically hybridize with the complementary strand of another matching DNA molecule under highly stringent conditions. Suitable stringent conditions for promoting DNA hybridization, for example, treatment with 6.0× sodium chloride/sodium citrate (SSC) at about 45°C, followed by washing with 2.0×SSC at 50°C, are well known to those skilled in the art. For example, the salt concentration in the washing step can be selected from about 2.0×SSC, 50°C for low stringency conditions to about 0.2×SSC, 50°C for high stringency conditions. In addition, the temperature conditions in the washing step can be increased from about 22°C at room temperature for low stringency conditions to about 65°C for high stringency conditions. Both the temperature conditions and the salt concentration can be changed, or one of them can be kept constant while the other variable is changed. Preferably, the stringent conditions in the present invention may be specific hybridization with the nucleotide sequence encoding the helicase of the present application in a 6×SSC, 0.5% SDS solution at 65°C, and then washing the membrane once with 2×SSC, 0.1% SDS and once with 1×SSC, 0.1% SDS.

According to a typical embodiment of the present application, a recombinant vector is provided, which comprises the above-mentioned DNA molecule, i.e., the helicase expression gene. The recombinant vector is selected from a plasmid, a virus or a carrier expression vector, etc.; the recombinant vector comprises a regulatory element for controlling the expression of the above-mentioned DNA molecule; the regulatory element comprises a promoter operably connected to the DNA molecule; preferably, the promoter comprises T7, trc, lac, ara or λL; more preferably, the recombinant vector is selected from plasmid PET.28a(+), PET.21a(+) or PET.32a(+), etc.

The helicase expression gene is inserted into the recombinant vector, and the helicase expression gene is replicated in large quantities by utilizing the ability of the recombinant vector to replicate itself in large quantities. "Recombinant" here refers to genetically engineered DNA prepared by transplanting or splicing a gene from one species into the cells of a host organism of a different species. This DNA becomes part of the host's genetic structure and is replicated.

According to a typical embodiment of the present application, a host cell is provided, wherein the host cell is transformed with the above-mentioned DNA molecule or recombinant vector.

The above recombinant vector is transformed into a host cell, and the host cell replicates, transcribes, and translates the helicase expression gene on the recombinant vector, so that a large amount of helicase can be produced. The host cell includes Escherichia coli, which can be BL21 (DE3), BL21Star (DE3) pLysS, Rosetta (DE3), Lemo21 (DE3), etc. The helicase of the present invention can be successfully expressed in the Escherichia coli recombinant protein expression system, and the resulting protein is uniform and of high purity.

According to a typical embodiment of the present application, this type of helicase exhibits higher unwinding activity in a high-salt environment than in a low-salt environment, binds well to single-stranded DNA, and unwinds double-stranded DNA. Its unwinding activity is strong enough that the blocking (restriction) sequence Spacer-18 (Sp18) cannot completely block it. The helicase can be used for the control and characterization of nucleic acids and applied to single-molecule nanopore sequencing. Nucleic acid control includes controlling the speed at which the nucleic acid passes through the nanopore, controlling the stability of nucleic acid perforation (passage through the pore), or controlling the continuity of nucleic acid perforation; further, the application of the helicase includes application in nanosensors and in single-molecule nanopore sequencing. Here, sequencing stability means that the DNA to be tested enters the nanopore at a constant speed, the sequencing signal is stable and complete, the signal-to-noise ratio is high, and the sequencing quality does not decrease significantly as the sequencing time increases. Sequencing continuity means that sequencing signals are output continuously until sequencing of the molecule to be tested is completed, the library is captured continuously, and there is no sudden interruption that would lower sequencing coverage and accuracy.

The beneficial effects of the present application will be further explained in detail below in conjunction with specific embodiments.

Example 1

Cloning, expression and purification of BCH326 and BCH338

1. Cloning and expression of BCH326 and BCH338

The full-length DNA sequences of BCH326 and BCH338 were each ligated into the PET.28a(+) plasmid using the NdeI and XhoI double restriction sites, so that the N-termini of the expressed BCH326 and BCH338 proteins carried a 6×His tag and a thrombin cleavage site.

The cloned PET.28a(+)-BCH326 and PET.28a(+)-BCH338 plasmids were transformed into the E. coli expression strain BL21(DE3) or its derivatives. A single colony was picked, inoculated into 20 mL of LB medium containing kanamycin, and cultured at 37°C overnight with shaking. The culture was then transferred into 2 L of LB medium containing kanamycin and cultured at 37°C with shaking until OD₆₀₀ = 0.6-0.8, cooled to 16°C, and IPTG was added to a final concentration of 500 μM to induce expression overnight.
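The induction step can be checked with a short dilution calculation. This is a minimal sketch: the 1 M IPTG stock concentration is an assumption (the text gives only the final concentration), the helper name is hypothetical, and the small volume added by the stock itself is ignored.

```python
def stock_volume_ml(final_conc_mM: float, culture_volume_ml: float, stock_conc_mM: float) -> float:
    """Volume of stock (mL) needed to reach final_conc_mM, from C1*V1 = C2*V2.

    The small volume contributed by the stock itself is ignored.
    """
    if stock_conc_mM <= 0:
        raise ValueError("stock concentration must be positive")
    return final_conc_mM * culture_volume_ml / stock_conc_mM


STARTER_ML = 20.0        # overnight starter culture (from the text)
CULTURE_ML = 2000.0      # 2 L expression culture (from the text)
IPTG_FINAL_MM = 0.5      # 500 uM final concentration (from the text)
IPTG_STOCK_MM = 1000.0   # assumption: 1 M IPTG stock

inoculum_ratio = CULTURE_ML / STARTER_ML
iptg_ml = stock_volume_ml(IPTG_FINAL_MM, CULTURE_ML, IPTG_STOCK_MM)

if __name__ == "__main__":
    print(f"starter is diluted about 1:{inoculum_ratio:.0f} into the expression culture")
    print(f"IPTG stock to add: {iptg_ml:.2f} mL")
```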

2. Purification of BCH326 and BCH338

Buffer A: 20 mM Tris-HCl pH 7.5, 250 mM NaCl, 20 mM imidazole

Buffer B: 20 mM Tris-HCl pH 7.5, 250 mM NaCl, 300 mM imidazole

Buffer C: 20 mM Tris-HCl pH 7.5, 80 mM NaCl

Buffer D: 20 mM Tris-HCl pH 7.5, 1000 mM NaCl

Buffer E: 20 mM Tris-HCl pH 7.5, 200 mM NaCl

Collect the cells expressing BCH326 and BCH338, resuspend them in Buffer A, disrupt them with a cell disruptor, and centrifuge to obtain the supernatant. Mix the supernatant with Ni-NTA resin pre-equilibrated with Buffer A and allow binding for 1 hour. Collect the resin and wash it extensively with Buffer A until no more impurities are washed out. Next, add Buffer B to the resin to elute the protein. Pass the eluted protein through a desalting column (Cytiva, Sephadex G-25) equilibrated with Buffer C to exchange the protein buffer from Buffer B to Buffer C. Then add the desalted protein solution to ssDNA cellulose resin equilibrated with Buffer C, together with an appropriate amount of thrombin, which specifically recognizes the thrombin cleavage site amino acid sequence LVPR↓GS encoded by the PET.28a(+) vector, thereby removing the affinity His tag carried by the protein. This step is performed at 4°C with overnight incubation on a rotary shaker. The next day, collect the ssDNA cellulose resin, to which the target protein is specifically adsorbed. Wash the resin 3-4 times with Buffer C to remove impurities that are not adsorbed to it, then elute with Buffer D, which disrupts the specific adsorption between the target protein and the ssDNA resin and releases the target protein into solution. Concentrate the protein purified on ssDNA cellulose with a 30K ultrafiltration concentrator (Merck Millipore) in a centrifuge precooled to 4°C, at 3000 g for 10 min per spin, repeating until the final protein volume is 2 mL. Finally, pass the protein through a Superdex 200 (Cytiva) molecular sieve column using Buffer E as the running buffer. Collect the target protein peak, concentrate it, and freeze it. As shown in Figure 1, a large amount of BCH326 protein with good purity can be obtained after purification, and the protein peak is uniform. On average, 0.28 mg of target protein was purified per 1 L of expression culture, which is on par with the yield of helicase Dda (0.23 mg of protein per 1 L of expression culture).

As shown in Figure 2, a large amount of BCH338 protein with good purity can be obtained after purification, and the protein peak is uniform. On average, 0.42 mg of target protein was purified per 1 L of expression culture, which is higher than the yield of helicase Dda (0.23 mg of protein per 1 L of expression culture).

3. Amino acid sequence of BCH326 (SEQ ID NO.1)

4. DNA sequence of BCH326 (SEQ ID NO.2)

5. Amino acid sequence of BCH338 (SEQ ID NO.3)

6. DNA sequence of BCH338 (SEQ ID NO.4)

7. AlphaFold2 structure prediction of BCH326 and BCH338

The structures of BCH326 and BCH338 were predicted with AlphaFold2. The root mean square deviation (RMSD) between the predicted and true protein backbone structures reached [value missing] and [value missing], as shown in Figure 3 and Figure 4. Different secondary structures are marked with different shapes: helix, sheet, and loop. Compared with the Dda helicase (as shown in Figure 20, PDB number 3UPU), these two helicases have two tower domains, both located on the same side of the protein, while the Dda helicase has only one tower domain.
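For reference, an RMSD between two backbone structures is conventionally the square root of the mean squared distance between matched atoms. The sketch below is not the AlphaFold2 pipeline; it assumes the two coordinate sets are already superimposed and list the same atoms in the same order, and the function name is hypothetical.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(coords_a: Sequence[Coord], coords_b: Sequence[Coord]) -> float:
    """Root mean square deviation between two matched coordinate sets.

    Assumption: the structures are already superimposed and the atoms
    (for example backbone C-alpha atoms) are listed in the same order.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate sets must be non-empty and of equal length")
    squared_sum = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared_sum += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared_sum / len(coords_a))


if __name__ == "__main__":
    # Two toy atoms; the second atom is displaced by 2 units along z.
    reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    predicted = [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]
    print(rmsd(reference, predicted))
```

A lower RMSD indicates closer agreement between the two backbones; in practice an optimal superposition is applied before the distances are measured.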

Example 2

ATPase activity detection of BCH326 and BCH338 proteins

1. Preparation of double-stranded DNA (ovDNA) and single-stranded DNA (ssDNA): SEQ ID NO.5 and SEQ ID NO.6 were annealed to form ovDNA with a 5' overhang of 20 T residues. For annealing, the mixture was incubated at 95°C for 5 minutes, cooled to 25°C at a rate of 0.1°C/s, and incubated for a further 30 minutes. The annealing recipe is shown in Table 1. As ssDNA, 100 μM SEQ ID NO.6 was diluted to 10 μM with TE buffer (pH 8).

Table 1. ovDNA annealing recipe

Solution	Volume
100 μM SEQ ID NO.5	5 μL
100 μM SEQ ID NO.6	5 μL
TE buffer (pH 8)	40 μL

2. Prepare high salt reaction buffer (2×): 20 mM HEPES (pH 8.0), 4 mM ATP, 4 mM MgCl₂, 1.0 M KCl.

3. Dilute protein: Dilute BCH326 and BCH338 proteins to 10 μM using 1× PBS.

4. Perform the ATP hydrolysis reaction: add the corresponding reagents according to the reaction system in Table 2, incubate at 30°C for 30 min, and inactivate at 80°C for 5 min. Groups ① and ② are the experimental groups, groups ③ to ⑥ are the corresponding control groups, and each group has 3 replicates.

Table 2. ATP hydrolysis reaction system

No.	Reaction buffer (2×)	DNA	Protein	H₂O
①	10 μL	1 μL (ovDNA)	1 μL	8 μL
②	10 μL	1 μL (ssDNA)	1 μL	8 μL
③	10 μL	none	1 μL	9 μL
④	10 μL	1 μL (ovDNA)	none	9 μL
⑤	10 μL	1 μL (ssDNA)	none	9 μL
⑥	10 μL	none	none	10 μL

5. Detection of the remaining ATP in the reaction: use an ATP detection kit (Beyotime, S0026B) according to the manufacturer's instructions to determine the remaining ATP concentration in the reaction.

6. Experimental results: As shown in Figures 5 and 6, under high salt conditions, both BCH326 and BCH338 have the activity of hydrolyzing ATP.
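The concentrations implied by Tables 1 and 2 can be worked out with C1·V1 = C2·V2. The sketch below uses only volumes and stock concentrations stated in this example; the helper and variable names are hypothetical, and the small salt contribution of the 1× PBS in which the protein is diluted is ignored as a simplifying assumption.

```python
def final_conc(stock_conc: float, added_ul: float, total_ul: float) -> float:
    """Concentration after dilution (same unit as stock_conc), from C1*V1 = C2*V2."""
    if total_ul <= 0:
        raise ValueError("total volume must be positive")
    return stock_conc * added_ul / total_ul


# Table 1: 5 uL of each 100 uM oligo plus 40 uL TE buffer.
annealed_ovdna_uM = final_conc(100.0, 5.0, 5.0 + 5.0 + 40.0)

# Cooling ramp from 95 C to 25 C at 0.1 C/s (about 700 s, excluding the two holds).
ramp_seconds = (95.0 - 25.0) / 0.1

# Table 2, group 1: 10 uL 2x buffer + 1 uL DNA + 1 uL protein + 8 uL water.
total_ul = 10.0 + 1.0 + 1.0 + 8.0
buffer_2x = {"HEPES_mM": 20.0, "ATP_mM": 4.0, "MgCl2_mM": 4.0, "KCl_mM": 1000.0}
final_buffer = {name: final_conc(conc, 10.0, total_ul) for name, conc in buffer_2x.items()}
dna_uM = final_conc(annealed_ovdna_uM, 1.0, total_ul)
protein_uM = final_conc(10.0, 1.0, total_ul)

if __name__ == "__main__":
    print(f"annealed ovDNA stock: {annealed_ovdna_uM:.1f} uM")
    print(f"cooling ramp: about {ramp_seconds:.0f} s")
    print(f"1x buffer in the reaction: {final_buffer}")
    print(f"DNA: {dna_uM:.2f} uM, protein: {protein_uM:.2f} uM")
```

Under these assumptions the "high salt" condition of the ATPase assay corresponds to a final KCl concentration of 0.5 M, the same nominal KCl level as the high salt buffer used in the unwinding assays.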

Example 3

Detection of dsDNA unwinding activity of BCH326 and BCH338 proteins

1. Preparation of double-stranded DNA (ovDNA): anneal SEQ ID NO.7 and SEQ ID NO.8 to form ovDNA with a 5' overhang of 20 T residues. For annealing, incubate at 95°C for 5 minutes, cool to 25°C at a rate of 0.1°C/s, and incubate for a further 30 minutes. The annealing recipe is shown in Table 3.

Table 3. ovDNA annealing recipe

Solution	Volume
100 μM SEQ ID NO.7	5 μL
100 μM SEQ ID NO.8	5 μL
TE buffer (pH 8)	40 μL

2. Prepare reaction buffer: low salt reaction buffer 1 is 100 mM HEPES (pH 8.0), 1 mg/mL BSA, 10 mM MgCl₂, 150 mM KCl; high salt reaction buffer 2 is 100 mM HEPES (pH 8.0), 1 mg/mL BSA, 10 mM MgCl₂, 500 mM KCl.

3. Prepare the reaction solution: add 3 μL of 10 μM annealed ovDNA, 6 μL of 100 μM SEQ ID NO.9 (as competitive DNA to capture the unwound single strand), and 6 μL of 100 mM ATP to 585 μL of low salt or high salt reaction buffer as the experimental reaction solution. Add 1 μL of 10 μM SEQ ID NO.8, 2 μL of 100 μM SEQ ID NO.9, and 2 μL of 100 mM ATP to 195 μL of low salt or high salt reaction buffer as the positive control solution.

4. Dilute protein: Dilute BCH326 and BCH338 proteins to 4.8 μM using 1× PBS.

5. Prepare the unwinding reaction: divide into experimental group ①, negative control group ②, and positive control group ③, and add the corresponding reagents according to Table 4. Use a microplate reader to record the kinetics of fluorescence intensity over 30 minutes at 30°C. Perform 3 replicates for each group.

6. Data analysis: Calculate the percentage of the fluorescence values of the experimental group and the negative control group relative to the fluorescence value of the positive control group.

Table 4. Unwinding reaction recipe

No.	Solution 1	Solution 2
①	58.5 μL experimental reaction solution	1.5 μL protein
②	58.5 μL experimental reaction solution	1.5 μL reaction buffer
③	58.5 μL positive control solution	1.5 μL reaction buffer

7. Experimental results:

Within the error range and the allowable instrument fluctuation, the experimental results are plotted as the ratio of the measured fluorescence value to the fluorescence value of the positive control (because of the sensitivity of the instrument, the negative control group also gives a fluorescence reading). The results show that the negative control group in each experiment remains unchanged during the measurement, while the fluorescence value of the experimental group gradually increases with reaction time, indicating that the protein has the activity of unwinding double-stranded DNA, and the unwinding direction is 5'-3'.

As shown in Figures 7 and 8, under low salt (final KCl concentration of 150 mM) and high salt (final KCl concentration of 500 mM) conditions, BCH326 has the activity of unwinding dsDNA, and as the salt concentration increases, the activity of BCH326 unwinding dsDNA increases. As shown in Figures 9 and 10, under low salt and high salt conditions, BCH338 has the activity of unwinding dsDNA, and as the salt concentration increases, the activity of BCH338 unwinding dsDNA increases.
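The data analysis of this example (fluorescence expressed as a percentage of the positive control) can be sketched as follows. The function names are hypothetical, the numbers in the usage example are toy values and not measurements from Figures 7 to 10, and normalizing time point by time point after averaging the replicates is an assumption, since the text does not specify these details.

```python
from typing import List, Sequence


def mean_trace(replicates: Sequence[Sequence[float]]) -> List[float]:
    """Average replicate kinetic traces time point by time point."""
    if not replicates:
        raise ValueError("at least one replicate is required")
    length = len(replicates[0])
    if any(len(r) != length for r in replicates):
        raise ValueError("all replicates must have the same number of time points")
    return [sum(values) / len(values) for values in zip(*replicates)]


def percent_of_positive(sample: Sequence[float], positive: Sequence[float]) -> List[float]:
    """Express each reading as a percentage of the positive control at the same time point."""
    if len(sample) != len(positive):
        raise ValueError("traces must have the same number of time points")
    if any(p == 0 for p in positive):
        raise ValueError("positive control readings must be non-zero")
    return [100.0 * s / p for s, p in zip(sample, positive)]


if __name__ == "__main__":
    # Toy fluorescence readings (arbitrary units), three replicates per group.
    experimental = [[100.0, 300.0, 500.0], [120.0, 320.0, 520.0], [110.0, 310.0, 510.0]]
    positive = [[1000.0, 1000.0, 1000.0]] * 3
    print(percent_of_positive(mean_trace(experimental), mean_trace(positive)))
```

A rising percentage for group ① together with a flat trace for group ② is the pattern the text interprets as unwinding activity.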

Example 4

Detection of the blocking of BCH326 and BCH338 unwinding activity by a restriction sequence

1. Prepare double-stranded DNA (ovDNA) containing a restriction sequence: anneal SEQ ID NO.7 and SEQ ID NO.10 to form ovDNA (containing a restriction sequence) with a 5' overhang of 20 T residues. For annealing, incubate at 95°C for 5 minutes, cool to 25°C at a rate of 0.1°C/s, and incubate for a further 30 minutes. The annealing recipe is shown in Table 5.

Table 5. Annealing recipe of ovDNA (containing a restriction sequence)

Solution	Volume
100 μM SEQ ID NO.7	5 μL
100 μM SEQ ID NO.10	5 μL
TE buffer (pH 8)	40 μL

2. Prepare reaction buffer: low salt reaction buffer is 100 mM HEPES (pH 8.0), 1 mg/mL BSA, 10 mM MgCl₂, 150 mM KCl; high salt reaction buffer is 100 mM HEPES (pH 8.0), 1 mg/mL BSA, 10 mM MgCl₂, 500 mM KCl.

3. Prepare the reaction solution: add 3 μL of 10 μM annealed ovDNA (containing the restriction sequence), 6 μL of 100 μM SEQ ID NO.9 (20-fold competitive DNA), and 6 μL of 100 mM ATP to 585 μL of low salt or high salt reaction buffer as the experimental reaction solution. Add 1 μL of 10 μM SEQ ID NO.11, 2 μL of 100 μM SEQ ID NO.9 (20-fold competitive DNA), and 2 μL of 100 mM ATP to 195 μL of low salt or high salt reaction buffer as the positive control solution.

4. Dilute protein: Dilute BCH326 and BCH338 proteins to 4.8 μM using 1× PBS.

5. Prepare the unwinding reaction: divide into experimental group ①, negative control group ②, and positive control group ③, and add the corresponding reagents according to Table 6. Use a microplate reader to record the kinetics of fluorescence intensity over 30 minutes at 30°C. Perform 3 replicates for each group.

Table 6. Unwinding reaction recipe

No.	Solution 1	Solution 2
①	58.5 μL experimental reaction solution	1.5 μL protein
②	58.5 μL experimental reaction solution	1.5 μL 1× PBS
③	58.5 μL positive control solution	1.5 μL 1× PBS

7. Experimental results:

As shown in Figures 11 and 12, under low salt (final KCl concentration of 150 mM) and high salt (final KCl concentration of 500 mM) conditions, the restriction sequence weakened the dsDNA unwinding activity of BCH326 but could not completely block it, and BCH326 still showed a trend of continuous unwinding under high salt conditions. As shown in Figure 13, under low salt conditions, the restriction sequence almost completely blocked BCH338 from unwinding dsDNA; as shown in Figure 14, under high salt conditions, the restriction sequence weakened the dsDNA unwinding activity of BCH338 but could not completely block it.
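The reaction mixture of step 3 and Table 6 can be checked with the same dilution arithmetic, which also shows where the "20-fold competitive DNA" figure comes from. This is a minimal sketch using only the volumes and stock concentrations stated in this example; the helper and variable names are hypothetical.

```python
def final_conc(stock_conc: float, added_ul: float, total_ul: float) -> float:
    """Concentration after dilution (same unit as stock_conc), from C1*V1 = C2*V2."""
    if total_ul <= 0:
        raise ValueError("total volume must be positive")
    return stock_conc * added_ul / total_ul


# Step 3, experimental reaction solution: 3 + 6 + 6 + 585 uL.
mix_total_ul = 3.0 + 6.0 + 6.0 + 585.0
substrate_nM = final_conc(10_000.0, 3.0, mix_total_ul)     # 10 uM annealed ovDNA
competitor_nM = final_conc(100_000.0, 6.0, mix_total_ul)   # 100 uM SEQ ID NO.9
atp_mM = final_conc(100.0, 6.0, mix_total_ul)              # 100 mM ATP
competitor_fold = competitor_nM / substrate_nM

# Table 6, group 1: 58.5 uL reaction solution + 1.5 uL of 4.8 uM protein.
well_total_ul = 58.5 + 1.5
protein_nM = final_conc(4_800.0, 1.5, well_total_ul)
substrate_in_well_nM = final_conc(substrate_nM, 58.5, well_total_ul)

if __name__ == "__main__":
    print(f"substrate: {substrate_nM:.1f} nM, competitor: {competitor_nM:.0f} nM "
          f"({competitor_fold:.0f}-fold), ATP: {atp_mM:.1f} mM")
    print(f"in the well: protein {protein_nM:.0f} nM, substrate {substrate_in_well_nM:.2f} nM")
```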

Example 5

Nanopore sequencing applications of BCH326 and BCH338 proteins

1. Two partially complementary DNA strands (top strand, SEQ ID NO.11 and bottom strand, SEQ ID NO.12) were annealed to form a linker, which was then connected to the double-stranded target fragment to be tested using T4 DNA ligase at room temperature and purified to prepare a sequencing library. Figure 15 shows a schematic diagram of the linker (a: top strand; b: bottom strand).

2. BCH326 or BCH338 protein was incubated with the sequencing library at 25°C for 1 h (molar concentration ratio 1:8) to form a sequencing library containing helicase. Figure 16 shows a schematic diagram of a sequencing library containing helicase (a: top strand; b: bottom strand; c: double-stranded target fragment; d: helicase; e: cholesterol-labeled double-stranded DNA).

3. The sequencing library containing helicase was incubated with single-stranded DNA containing cholesterol at the 5' end (ssDNA-chol, SEQ ID NO.13) at room temperature for 10 minutes. The ssDNA-chol sequence is complementary to a part of the bottom strand of the adapter. After cholesterol binds to the phospholipid membrane, it can reduce the amount of library loading and increase the capture rate.

4. Use a patch clamp amplifier or other electrical signal amplifier (as shown in Figure 17) to collect current signals. A Teflon membrane with a micrometer-sized hole in the middle (diameter 50-200 μm) divides the electrolytic cell into two chambers, the cis chamber and the trans chamber; a pair of Ag/AgCl electrodes is placed in each of the cis and trans chambers; a phospholipid bilayer is formed across the micropore between the two chambers, and the nanopore protein CsgG-Eco-(Y51A/F56Q/R97W/R192D-StrepII(C)) is added; electrical measurements are taken after a single nanopore protein has inserted into the phospholipid membrane; the reaction product of step 3 is then added and 180 mV is applied, so that the sequencing library is captured by the nanopore and the nucleic acid passes through the nanopore under the control of the helicase. The buffer used in this experiment was 0.47 M KCl, 25 mM HEPES, 1 mM EDTA, 5 mM ATP, 25 mM MgCl₂, pH 7.6, and the sequencing temperature was 28°C.

5. Sequencing experiments were performed with BCH326 and with BCH338; the sequencing electrical signals are shown in Figure 18 and Figure 19, respectively. The results show that as the helicase controls the entry of the DNA single strand into the nanopore, part of the current is blocked and the current decreases. Because different nucleotides differ in size, the magnitude of the blocked current also differs, so a fluctuating current signal is observed. Both Figure 18 and Figure 19 show complete linker signals and reply signals with a high signal-to-noise ratio, indicating good stability of the sequencing signal.

SEQ ID NO.5：5’-GCGTCGAAAAGCAGTACTTAGGCATT-3’

SEQ ID NO.6：5’-TTTTTTTTTTTTTTTTTTTTTTTAATGCCTAAGTACTGCTTTTCGACGC-3’

SEQ ID NO.7：5’-BHQ-1-GCGTCGAAAAGCAGTACTTAGGCATT-3’

SEQ ID NO.8：5’-TTTTTTTTTTTTTTTTTTTTTTTAATGCCTAAGTACTGCTTTTCGACGC-FAM-3’

SEQ ID NO.9：5’-AATGCCTAAGTACTGCTTTTCGACGCT-3’

SEQ ID NO.10: 5’-TTTTTTTTTTTTTTTTTTTTTTTNNNNAATGCCTAAGTACTGCTTTTCGACGC-FAM-3’(N＝iSP18)

SEQ ID NO.11：5’-TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTNNNNGGTTGTTTCTGTTGGTGCTGATATTGCT-3’(N＝iSP18)

SEQ ID NO.12：5’-GCAATATCAGCACCAACAGAAACAACCTTTGAGGCGAGCGGTCAA-3’

SEQ ID NO.13：5’-cholesterol-TTGACCGCTCGCCTC-3’.
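The base-pairing relationships among the oligonucleotides listed above can be checked with a reverse-complement helper. The sketch below is illustrative: it uses only the unmodified base portions of the listed sequences (labels such as BHQ-1, FAM and cholesterol, the poly-T overhangs and the iSp18 spacers are left out), and the constant and function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


# Base portions copied from the sequence listing (modifications omitted).
SEQ5 = "GCGTCGAAAAGCAGTACTTAGGCATT"            # also the bases of SEQ ID NO.7
DUPLEX_3P = "AATGCCTAAGTACTGCTTTTCGACGC"        # 3' part of SEQ ID NO.6/8/10 after the overhang
SEQ9 = "AATGCCTAAGTACTGCTTTTCGACGCT"           # competitive DNA
TOP_3P = "GGTTGTTTCTGTTGGTGCTGATATTGCT"        # SEQ ID NO.11 after the iSp18 spacers
SEQ12 = "GCAATATCAGCACCAACAGAAACAACCTTTGAGGCGAGCGGTCAA"
SEQ13 = "TTGACCGCTCGCCTC"                       # bases of ssDNA-chol

# The short strand pairs with the 3' part of the overhang strand.
short_strand_pairs = reverse_complement(SEQ5) == DUPLEX_3P
# The competitor shares the duplex region, so it can capture the released short strand.
competitor_matches = SEQ9.startswith(DUPLEX_3P)
# The top strand pairs with the 5' end of the bottom strand, leaving one 3' T unpaired.
linker_pairs = TOP_3P[:-1] == reverse_complement(SEQ12[: len(TOP_3P) - 1])
# ssDNA-chol pairs with the 3' end of the bottom strand.
chol_pairs = SEQ12.endswith(reverse_complement(SEQ13))

if __name__ == "__main__":
    print(short_strand_pairs, competitor_matches, linker_pairs, chol_pairs)
```

The last check is consistent with the statement in Example 5 that the ssDNA-chol sequence is complementary to part of the bottom strand of the adapter.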

The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention. For those skilled in the art, the present invention may have various modifications and variations. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention shall be included in the protection scope of the present invention.

Claims

A helicase, characterized in that the helicase has two tower domains and one pin domain, and the two tower domains are located on the same side of the three-dimensional structure of the helicase.
The helicase according to claim 1, characterized in that the helicase comprises at least one of the following:

A) BCH326, wherein BCH326 is a protein having an amino acid sequence as shown in SEQ ID NO: 1;

B) BCH338, wherein BCH338 is a protein having an amino acid sequence as shown in SEQ ID NO: 3;

C) a protein in which at least one cysteine on the surface of the protein defined in A) or B) is mutated to alanine, glutamine, glycine, histidine, isoleucine, leucine, valine, serine, threonine or methionine;

D) a protein having DNA unwinding ability, wherein at least one amino acid in the tower domain and/or the pin domain of the amino acid sequence of the protein defined in any one of A), B) and C) is mutated to cysteine or at least one unnatural amino acid is introduced; and

E) A protein having an amino acid sequence homology of more than 70% with the protein defined in any one of A), B), C) and D) and having the same function.
The helicase according to claim 2, characterized in that said C) comprises:

A protein in which the amino acid C at position 319 of BCH326 is substituted with A, S, T, V, I, L or G; and

A protein in which the amino acid C at position 326 or position 459 of BCH338 is substituted with A, S, T, V, I, L or G.
The helicase according to claim 2, characterized in that, in D), the amino acid mutation at at least one site on the tower domain and/or the pin domain to cysteine or the introduction of at least one non-natural amino acid comprises at least one of the following:

The amino acid of at least one of S389, R340, K341, S342, N343, K343, S344, I345, V346, I347, D348, K349, D350, G351, K352, A353, K354, E355, F356, L357, R358, K359, F360, L361, N362, F363, A364, K365, I366, Y367, N368, F369, T370, N371, K372, G373, G374, H378, G379, R380, R381, I382, T383, K384, K385, S386, K387, K388, E389, L390 and W391 on the tower domain of BCH326 is mutated to cysteine or at least one unnatural amino acid is introduced;

The amino acid of at least one of D87, I88, G89, T90, I91, H92, S93, Y94, F95, D96, I97, K98, P99, D100, I101, D102, D103, N104, G105, N106, R107, V108, F109, K110, P111 or S112 on the pin domain of BCH326 is mutated to cysteine or at least one unnatural amino acid is introduced;

The amino acid of at least one of S405, K406, F407, L408, V409, P410, L411, G412, D413, G414, S415, K416, E417, D418, L419, F420, P421, L422, Y423, K424, E425, A426, V427, F428, D429, I430, A431, K432, T433, M434, N435, N436, Q437, R438, K439, I440, S441, K442, N443, S444, K445, K446, N447, F448, or W449 on the tower domain of BCH338 is mutated to cysteine or at least one unnatural amino acid is introduced;

The amino acid of at least one of E93, I94, R95, P96, D97, I98, N99, E100, F101, G102, E103, R104, I105, F106, V107, P108, K109, L110, R111, D112, M113, and M114 on the pin domain of BCH338 is mutated to cysteine or at least one unnatural amino acid is introduced;

Preferably, in E), the protein has 70%, 80%, 90%, 95% or 99% or more homology and the same function as the amino acid sequence of the protein defined in any one of A), B), C) and D).
The helicase according to any one of claims 2 to 4, characterized in that the non-natural amino acid is selected from at least one of 4-azido-L-phenylalanine, 4-acetyl-L-phenylalanine, 3-acetyl-L-phenylalanine, 4-acetoacetyl-L-phenylalanine, O-allyl-L-tyrosine, 3-(phenylselenoyl)-L-alanine, O-2-propyn-1-yl-L-tyrosine, 4-(dihydroxyboryl)-L-phenylalanine, 4-[(ethylsulfanyl)carbonyl]-L-phenylalanine, (2S)-2-amino-3-{4-[(propan-2-ylsulfanyl)carbonyl]phenyl}propanoic acid, (2S)-2-amino-3-{4-[(2-amino-3-sulfanylpropionyl)amino]phenyl}propanoic acid, O-methyl-L-tyrosine, 4-amino-L-phenylalanine, 4-cyano-L-phenylalanine, 3-cyano-L-phenylalanine, 4-fluoro-L-phenylalanine, 4-iodo-L-phenylalanine, 4-bromo-L-phenylalanine, O-(trifluoromethyl)tyrosine, 4-nitro-L-phenylalanine, 3-hydroxy-L-tyrosine, 3-amino-L-tyrosine, 3-iodo-L-tyrosine, 4-isopropyl-L-phenylalanine, 3-(2-naphthyl)-L-alanine, 4-phenyl-L-phenylalanine, (2S)-2-amino-3-(naphth-2-ylamino)propionic acid, 6-(methylsulfanyl)norleucine, 6-oxo-L-lysine, D-tyrosine, (2R)-2-hydroxy-3-(4-hydroxyphenyl)propionic acid, (2R)-2-aminooctanoate, 3-(2,2′-bipyridin-5-yl)-D-alanine, 2-amino-3-(8-hydroxy-3-, (2R)-2-amino-3-[(2-nitrobenzyl)sulfanyl]propionic acid, (2S)-2-amino-3-[(2-nitrobenzyl)oxy]propionic acid, O-(4,5-dimethoxy-2-nitrobenzyl)-L-serine, (2S)-2-amino-6-({[(2-nitrobenzyl)oxy]carbonyl}amino)hexanoic acid and O-(2-nitrobenzyl)-L-tyrosine or 2-nitrophenylalanine;

Preferably, the BCH326 introduces at least one unnatural amino acid including at least one of the following: D100 introduces 4-azido-L-phenylalanine, I101 introduces 4-azido-L-phenylalanine, D102 introduces 4-azido-L-phenylalanine, D103 introduces 4-azido-L-phenylalanine, N104 introduces 4-azido-L-phenylalanine, G105 introduces 4-azido-L-phenylalanine, N106 introduces 4-azido-L-phenylalanine, R107 introduces 4-azido-L-phenylalanine, D103 introduces 4-acetyl-L-phenylalanine, G105 introduces 4-acetyl-L-phenylalanine and N106 introduces 4-acetyl-L-phenylalanine;

Preferably, the BCH338 introduces at least one unnatural amino acid including at least one of the following: A431 introduces 4-azido-L-phenylalanine, K432 introduces 4-azido-L-phenylalanine, T433 introduces 4-azido-L-phenylalanine, M434 introduces 4-azido-L-phenylalanine, N435 introduces 4-azido-L-phenylalanine, S441 introduces 4-azido-L-phenylalanine, K442 introduces 4-azido-L-phenylalanine, N443 introduces 4-azido-L-phenylalanine and S444 introduces 4-azido-L-phenylalanine.
The helicase according to any one of claims 1 to 4, characterized in that there is at least one amino acid mutation at an amino acid site in the DNA binding region of the helicase and/or an amino acid site near the ATP catalytic active center, wherein the mutation comprises mutating the original amino acid to an amino acid with a larger side chain;

Preferably, the mutation of the original amino acid to an amino acid with a larger side chain comprises at least one of the following: asparagine is replaced by glutamine, histidine, arginine or lysine; proline is replaced by arginine, lysine, phenylalanine or leucine; histidine is replaced by arginine, lysine, glutamine, asparagine, phenylalanine, tyrosine or tryptophan; proline is replaced by arginine, lysine, glutamine, asparagine or histidine; phenylalanine is replaced by arginine, lysine, histidine, tyrosine or tryptophan; isoleucine is replaced by phenylalanine, tryptophan, histidine, lysine or arginine; tyrosine is replaced by arginine, lysine or tryptophan;

The amino acid sites in the DNA binding region of BCH326 include: L157, V160, L294, G296, N299, L303, A304, I328, F329, T330, N331, G332, G333, and E334, and the amino acid sites near the ATP catalytic active center include: K211, E212, E213, N214, Y215, K216, A217, P218, L219, K220, D221, I222, N223, and N224;

The amino acid sites in the DNA binding region of BCH338 include: H89, S90, Y91, F92, E93, I94, R95 and P96; the amino acid sites near the ATP catalytic active center include: Y152, Q153, L154, P155, P156, V157, F193, L194, I195, K196, E197, Y198, E199, E200 and N201.
The helicase according to any one of claims 1 to 4, characterized in that the amino acid on the surface of the helicase that interacts with the nanopore binding region has a mutation in at least one site, wherein the mutation comprises mutating the original amino acid into an amino acid with a shorter side chain;

Preferably, the mutation of the original amino acid to an amino acid with a shorter side chain includes: asparagine is replaced by isoleucine, valine, leucine, alanine, serine or glycine; lysine is replaced by isoleucine, valine, leucine, alanine, serine or glycine; arginine is replaced by isoleucine, valine, leucine, alanine, serine or glycine;

Preferably, the amino acids on the surface of BCH326 that interact with the nanopore binding region include: M1, E2, S3, K4, I5, N6, L7, T8, E9, D10, Q11, L12, K13, I14, I15, K16, I189, I190, R191, T192, Q193, N194, K195, N196, and S197;

The amino acids on the surface of BCH338 that interact with the nanopore binding region include: M1, G2, E3, I4, K5, L6, N7, E8, E9, Q10, Q11, K12, K177, I177, L178, R179, T180, K181, N182, L213, I214, D215, H216, F217, H218, V219, Y220, G221, D248, L249, T250, D251, S252, T253, E254, and S255.
An isolated DNA molecule, characterized in that the DNA molecule has

(a) a nucleotide sequence encoding the helicase according to any one of claims 2 to 3; or

(b) a nucleotide sequence that hybridizes under stringent conditions to the DNA molecule defined in (a); or

(c) having the nucleotide sequence shown in SEQ ID NO: 2 or SEQ ID NO: 4; or

(d) a nucleotide sequence having 70% or more homology with any one of the nucleotide sequences defined in (a) to (c) and encoding a protein having the same function as the helicase.
The DNA molecule according to claim 8, characterized in that the DNA molecule has a nucleotide sequence that has more than 75%, preferably more than 85%, more preferably more than 95%, and still more preferably more than 99% homology with any one of the nucleotide sequences defined in (a) to (c) and that encodes a protein with the same function.
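The claims do not state how homology is calculated. As a rough sketch only, the following Python assumes that it is measured as percent identity over two sequences that have already been aligned to equal length, with `-` marking gaps; both function names are hypothetical. The toy sequences are invented for the example and are not SEQ ID NO: 2 or SEQ ID NO: 4.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of aligned columns holding the same nucleotide.

    Assumption: inputs are pre-aligned, equal-length strings with '-'
    for gaps. Columns that are gaps in both sequences are ignored;
    a gap opposite a nucleotide counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("no aligned columns")
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


def highest_tier(identity: float):
    """Return the highest threshold from the claims that is satisfied:
    99, 95, 85 or 75 (each 'more than'), 70 ('or more'), else None."""
    for threshold in (99, 95, 85, 75):
        if identity > threshold:
            return threshold
    return 70 if identity >= 70 else None


# Made-up toy sequences, for illustration only.
identity = percent_identity("ATGCATGCAT", "ATGCATGCAA")
print(identity, highest_tier(identity))
```

The strict and non-strict comparisons mirror the claim wording: "70% or more" in item (d), and "more than" for each of the higher thresholds.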
A recombinant vector, characterized in that the recombinant vector comprises the DNA molecule according to claim 8 or 9.
The recombinant vector according to claim 10, characterized in that the recombinant vector is selected from a plasmid, a virus or a carrier expression vector;

Furthermore, the recombinant vector includes a regulatory element for controlling the expression of the DNA molecule;

Furthermore, the regulatory element includes a promoter operably linked to the DNA molecule;

Preferably, the promoter comprises T7, trc, lac, ara or λL;

More preferably, the recombinant vector is selected from plasmid pET-28a(+), pET-21a(+) or pET-32a(+).
A host cell, characterized in that the host cell contains the DNA molecule according to claim 8 or 9, or the recombinant vector according to claim 10 or 11.
The host cell according to claim 12, characterized in that the host cell comprises Escherichia coli;

Preferably, the host cells include BL21(DE3), BL21 Star(DE3)pLysS, Rosetta(DE3) or Lemo21(DE3).
Use of a helicase according to any one of claims 1 to 7 in nucleic acid control or characterization;

Further, the nucleic acid control includes controlling the speed at which the nucleic acid passes through the nanopore, controlling the stability of the translocation of the nucleic acid through the pore, or controlling the continuity of that translocation;

Furthermore, the use includes use in nanosensors and/or in single-molecule nanopore sequencing.
A nanopore sequencing kit, comprising a helicase, characterized in that the helicase is the helicase according to any one of claims 1 to 7.
A nanopore sequencing method, comprising sequencing a nucleic acid molecule to be sequenced under the control of a helicase, wherein the helicase is the helicase according to any one of claims 1 to 7.