WO2002040632A2 - Creation and identification of proteins having new dna binding specificities - Google Patents
- Publication number
- WO2002040632A2 (PCT/US2001/043107)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- gene
- sequence
- dna
- protein
- dna binding
Classifications
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N15/00—Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
- C12N15/09—Recombinant DNA-technology
- C12N15/63—Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
- C12N15/79—Vectors or expression systems specially adapted for eukaryotic hosts
- C12N15/82—Vectors or expression systems specially adapted for eukaryotic hosts for plant cells, e.g. plant artificial chromosomes (PACs)
- C12N15/8216—Methods for controlling, regulating or enhancing expression of transgenes in plant cells
- C12N15/8217—Gene switch
-
- C—CHEMISTRY; METALLURGY
- C07—ORGANIC CHEMISTRY
- C07K—PEPTIDES
- C07K14/00—Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof
- C07K14/435—Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from animals; from humans
- C07K14/46—Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from animals; from humans from vertebrates
- C07K14/47—Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from animals; from humans from vertebrates from mammals
- C07K14/4701—Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from animals; from humans from vertebrates from mammals not used
- C07K14/4702—Regulators; Modulating activity
Definitions
- The invention relates to DNA binding proteins and methods of creating new regulatory proteins.
- DNA binding proteins regulate the activity of a gene or set of genes through their effects on transcription. The regulation typically occurs through binding to DNA. Accordingly, the term "DNA binding proteins" has been adopted to mean the large class of proteins that bind and regulate DNA. Features of this binding may be understood through the specific three-dimensional structure of the protein and of the DNA, which provides information on interactions between the protein and the nucleotide bases and/or sugar-phosphate-backbone moieties of the DNA.
- The regulatory system encompasses gene-regulated transcription (and thereby gene activity regulation).
- The regulatory system comprises, as a minimum, a regulatory gene that encodes a DNA binding protein that influences DNA transcription, a promoter where RNA synthesis is initiated, an operator that consists of at least one transcriptional control sequence, and a structural gene (protein-coding gene) that can be regulated.
- Regulator substances, repressors and/or activators of gene transcription often control gene expression.
- The regulatory mechanisms of transcription and gene expression differ between prokaryotic and eukaryotic organisms. However, the basis for the regulation is similar.
- Protein regulators of gene transcription bind to specific DNA sequences and inhibit or activate the transcription of one or more genes. More than one regulator protein (as an activator or inhibitor) often binds a given gene or gene set through binding to one or more DNA sequences.
- When present in a prokaryote, the DNA binding site often is termed an "operator." When present in a eukaryote, the DNA binding sequence often is termed an "activator," "activator sequence," "enhancer," or "enhancer sequence." Additional DNA sequences are important for transcription and transcriptional regulation. These further sequences form binding sites for general transcription factors such as proteins used for gene transcription generally. One such transcription factor is an RNA polymerase. A DNA sequence that binds an RNA polymerase generally is termed a "promoter" and is important for transcription control.
- Wharton et al. (1984) changed the amino acid sequence of the recognition helix of the 434 repressor protein to that of the 434 cro protein and reported the conversion of binding specificity of the mutated repressor to that of the cro protein.
- Wharton and Ptashne (1985) substituted the recognition helix amino acid sequence of the 434 repressor with that of the P22 repressor protein and reported the conversion of binding specificity to that of the P22 protein.
- Wharton and Ptashne experiments between 434 repressor and λ repressor, or λ cro and
- PCT WO97/37030 describes such a method for selecting seven amino acid long peptides that repress a reporter gene through a zinc-finger motif structure. This method is similar to that reported by Simoncsits et al. (1999), where a reporter gene is used to screen for protein variants that function as repressors. This latter work describes the construction of combinatorial libraries of mutations of single chain variants of 434 repressor and the phenotypic screening of the libraries for desired DNA binding specificities. All of these methods suffer from the serious limitation that their use with libraries larger than 10^4 to 10^5 members becomes very burdensome, since each individual member of the library must be scored for its respective phenotype. In fact in the Simoncsits et al.
- The reporter screening methods are disadvantageous since screening of larger libraries for phenotypic traits is difficult if not impossible. These methods also fail to teach how to balance reporter gene expression through selection methods used on the target gene promoter.
- US patent 5,789,538 shows a phage display/physical screening method that selects for zinc-finger variants that bind to desired target DNA sequences using a library of DNA sequences.
- The DNA sequences encode zinc-fingers with mutational variations at presumed and known DNA-protein interfaces.
- The selected protein variants differ in sequence from wildtype forms and their DNA sequence binding specificities can be selected from a large phage set that displays different zinc-fingers. Further descriptions of this type of technology are found in US patents 6,242,568, 6,013,453, 5,223,409 and 5,571,698. These in vitro / ex vivo technologies, while useful, suffer several disadvantages.
- The conditions in which a protein functions in DNA binding external to the cell as a part of a phage particle are distinctly different from those conditions within the cell where presumably any useful DNA binding protein variant will find its application.
- These differences can be numerous. Among them are, for example, cooperative interactions with other proteins of the cell including those of the transcriptional machinery, the physical stability of the 3-dimensional structure of the protein variants under in vivo conditions, and the proteolytic sensitivity of the protein variants.
- That the DNA binding to be selected in phage display experiments occurs externally to the cell, separated in space from the compartment in which it naturally occurs, means that only DNA binding characteristics will be selected and that any other function or characteristic of the DNA binding protein will be ignored by the phage display system.
- This can lead to the identification of variants that might not reflect the normal mechanisms of transcriptional control that operate within the cell and precludes the selection of protein variants that function in cooperation with other DNA binding or transcription-effecting proteins (whether known or unknown) in transactivation and transcriptional repression processes. In addition, instabilities in the protein structure of the variants due to the differences between cell internal and external milieus will not be adequately controlled.
- US patents 5,096,815 and 5,198,346 describe new DNA binding proteins, in particular repressor proteins, generated through combinatorial mutagenesis of the DNA encoding the proteins, that possess new DNA binding specificities that are identified through genetic selection systems that target DNA binding to desired DNA sequences.
- The repressor proteins described here are similar to normal wildtype proteins except at a number of positions within the gene that encodes the protein.
- Such gene mutational libraries of the DNA binding protein are inserted into a plasmid or other suitable vector for protein expression and are incorporated into a bacterial cell by standard molecular biological techniques.
- DNA targets of the binding protein variants also may be incorporated into a plasmid.
- The target sequence functions as a regulatory operator for a structural gene that, when expressed, provides a selective disadvantage to cell growth.
- When a protein variant binds to its target operator sequence and represses transcription of the deleterious structural gene, the affected cell acquires a selective growth advantage.
- Transcriptional activation systems such as the bacterial two-hybrid system described by Joung et al. (2000) and other related eukaryotic systems (Wilson et al., 1984; Chien et al., 1991) also may suffer disadvantageous effects from genetic pressure.
- Joung et al. report, for example, the occurrence of a relatively high rate of background antibiotic resistance that can be found in their system. This serious problem presumably is attributable to undesirable selective pressure that resulted in increased spectinomycin resistance that was not dependent on the desired DNA binding protein transactivation and that results in increased false positive identification when activation of antibiotic resistance was selected.
- A number of the techniques described above use relatively simple reporter gene transcription systems to report the presence in phenotypic screening experiments of DNA binding protein variants that bind desired target sequences. These techniques do not take into consideration the necessity of balancing the effects of different target DNA sequences on the reporter gene transcriptional activities and thereby are not generally applicable to all target sequences. In addition, these methods suffer from the limitation that each individual clone in the library needs to be scored in the screening system for the effect of the DNA binding protein variant on the phenotype of the reporter gene transcription. While useful for relatively small combinatorial libraries of mutations, these systems are not practical for use with larger libraries. Thus, while interesting, these technologies have severe limitations.
- Any mutation that confers partial or complete resistance to the imposed selection will relieve the growth inhibition and contaminate the desired cells that are selected as harboring a desired protein variant that binds the target DNA. It is difficult to separate these contaminating false positives, and as libraries become larger the frequency of false positives increases. Thus, while interesting, these technologies also have severe limitations.
- New DNA binding gene sequences include DNA sequences that encode proteins that regulate such target genes as well as gene constructs and biological materials that contain such DNA binding proteins and/or their DNA sequences.
- The invention also encompasses methods for discovering transcriptional promoters. Embodiments of these methods: a) identify desired target genes specific for DNA binding proteins; b) target DNA binding protein variants to desired DNA binding sequences; c) remove undesired DNA binding protein variants from a larger library of variants; and d) provide media useful to assay in vivo DNA binding.
- The invention further encompasses kits to identify and produce DNA binding protein variants and/or their DNA sequences.
- One embodiment of the invention is a method for deriving a gene sequence of a DNA binding protein that can bind to a target regulatory sequence, comprising the steps of selecting a starting DNA sequence for a DNA binding protein, mutating the selected sequence, providing a mutated DNA sequence to a cell that has at least one genetically neutral transcriptional unit wherein the transcriptional unit comprises at least one promoter, at least one reporter gene or separator gene and at least one copy of the target regulatory sequence, wherein binding between the DNA binding protein encoded by the mutated sequence and the target regulatory sequence regulates the expression of the at least one reporter gene or separator gene, and screening for the regulated expression of a gene from the transcriptional unit.
- The invention is a method for deriving a gene sequence of a useful DNA binding protein that binds to a target DNA regulatory sequence, comprising the steps of selecting a DNA sequence that encodes a protein, mutating the selected sequence, providing a mutated DNA sequence to a cell that has at least one genetically neutral transcriptional unit wherein the transcriptional unit comprises at least one promoter and at least one reporter gene or separator gene and at least one copy of the target DNA regulatory sequence, wherein binding between the DNA binding protein encoded by the mutated sequence and the target regulatory sequence regulates the expression of the at least one reporter gene or separator gene, and screening for expression of a gene by the transcriptional unit.
- transgenic plants that contain a heterologous gene wherein the heterologous gene comprises a sequence determined by a method as described herein
- transgenic plants that contain a mutated gene wherein the mutated gene comprises a sequence determined by a method as described herein
- tools for controlling gene expression comprising a nucleic acid with a sequence obtained by a method as described herein and genes having a sequence prepared by any of the methods described herein.
- FIGS 1 to 23 depict nucleic acid sequences according to embodiments of the invention.
- the inventors discovered methods and tools that, in most embodiments, avoid the use of regular negative or positive selection pressure to generate superior cell libraries of new
- telomere shortening refers to DNA sequences.
- Regular negative or positive selection pressure refers to gene selection that affects cell survival significantly enough for the gene to be used in selection procedures.
- A "genetically neutral" gene desirably used for selection in the invention is not essential to cell growth and survival and/or, in preferred embodiments, does not measurably affect survival.
- The disadvantages of selection pressure on growth or replication are alleviated in embodiments of the invention by relying on an operator, reporter and/or separator gene product to distinguish cell clones of differing gene sequences without affecting cell survival or replication.
- These disadvantages include, among other things, an unacceptably high level of false positive and false negative clones.
- The disadvantages are particularly acute for larger libraries such as those having more than 1,000,000 members, as spontaneous mutations create more undesirable yet selected sequences at the higher population level.
- Any mutation that gives a selective advantage or disadvantage, respectively, may tend to accumulate and form a colony, and be falsely detected as having an operational gene variant.
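The scaling argument in the preceding paragraphs can be illustrated with a rough calculation. The sketch below assumes a single hypothetical per-clone probability that a spontaneous mutation lets a cell escape the imposed selection; the function name and the rate used in the usage lines are illustrations only and are not figures from this document.

```python
def expected_false_positives(library_size: int, escape_rate: float) -> float:
    """Expected number of clones that pass selection through a spontaneous
    mutation rather than through a desired DNA binding protein variant.

    escape_rate is a hypothetical per-clone probability (an assumption).
    """
    if library_size < 0:
        raise ValueError("library_size must be non-negative")
    if not 0.0 <= escape_rate <= 1.0:
        raise ValueError("escape_rate must be a probability")
    return library_size * escape_rate


if __name__ == "__main__":
    assumed_rate = 1e-6  # hypothetical value, for illustration only
    for size in (10**4, 10**6, 10**8):
        print(size, expected_false_positives(size, assumed_rate))
```

Under this simple model the expected number of escape mutants grows linearly with library size, which is one way to read the statement that the problem is most acute for libraries above 1,000,000 members.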
- The methods discovered and presented here function intracellularly with natural transcriptional regulatory mechanisms that reflect functional DNA binding and thereby eliminate problems associated with extracellular DNA binding methods.
- The identification of DNA binding variants occurs unobtrusively to the cell, and no particularly strong positive or negative consequences from the screening or selection mechanisms affect the growth or survivability of the cells.
- a target DNA binding sequence (a desired operator) is cloned adjacent to a structural gene used for screening and selection so that (1) the expression of the structural gene can be regulated through the binding of a DNA binding protein variant to the operator sequence, and (2) the DNA binding protein variants are expressed from DNA sequences that have been combinatorially mutated.
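To make "combinatorially mutated" concrete, the following sketch enumerates protein-level variants when a chosen set of positions is randomized over the 20 standard amino acids. The parent peptide, the positions, and the function names are hypothetical and serve only as an illustration; the document does not prescribe this procedure.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids, one-letter codes


def library_size(num_positions: int, alphabet_size: int = 20) -> int:
    """Number of distinct variants when num_positions are fully randomized."""
    return alphabet_size ** num_positions


def enumerate_variants(parent: str, positions: list, alphabet: str = AMINO_ACIDS):
    """Yield every variant of parent with the given 0-based positions randomized."""
    for combo in product(alphabet, repeat=len(positions)):
        residues = list(parent)
        for pos, amino_acid in zip(positions, combo):
            residues[pos] = amino_acid
        yield "".join(residues)


if __name__ == "__main__":
    parent = "AAAA"  # hypothetical parent peptide, not a real recognition helix
    variants = list(enumerate_variants(parent, [0, 2]))
    print(len(variants), library_size(2))
    print(library_size(4))
```

Randomizing only four positions already gives 20^4 = 160,000 protein variants, which is above the 10^4 to 10^5 range that the text describes as burdensome for clone-by-clone screening.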
- the screening selection gene(s) use reporter genes and/or separator genes that lack significant negative or positive evolutionary selective pressure for growth or survival of the cells.
- the reporter and separator genes preferably are structural gene(s) that act to distinguish cells expressing the protein from cells having reduced expression of the gene.
- a reporter gene codes for a "reporter" that is detectable either directly or indirectly.
- An example of a directly detectable reporter is a fluorescent protein such as green fluorescent protein.
- An example of an indirectly detectable reporter is an enzyme that is detected by addition of a substrate such as a colorimetric, fluorescent or chemilumigenic substrate.
- a cell can be separated from other cells based on detection of the expression level of reporter inside (intracellular) that cell or outside (extracellular to) the cell, or a combination of intracellular and extracellular.
- a separator gene codes for a protein that leads directly or indirectly to an altered molecular structure on the cell surface. Most typically the separator gene codes for a protein that goes to the outer surface and is found there. Separator gene expression allows physical separation of cells based on binding to the expressed molecule, which may be the separator protein, or something else which is influenced by the separator protein.
- A separator gene may, for example, encode an antibody binding site, such as a single chain antibody, or an antigen.
- a cell (with its genetic complement) can, for example, be physically separated from other cells through specific binding with the separator gene product.
- combinations are possible that allow physical separation of cells based on the regulatory control of gene expression by the mutated DNA binding protein variant.
- a gene may be both a separator gene and a reporter gene.
- A protein that has enzymatic activity yet is expressed at the cell surface can facilitate selection both by presenting a target for binding to the cell and by reacting with a suitable substrate to mark the cell in some manner, such as by formation of an optical product in the vicinity of the cell.
- the gene expression levels of the reporter and/or separator genes are adjusted such that expression levels in the absence of binding of repressor protein to operator sequence are discernible from expression levels influenced by the binding of protein to the desired operator sequence. Still another embodiment is a method wherein lacZ and lacZ' reporter gene product activities are assayed in vivo.
- Another embodiment of the invention is the use of separator gene expression and repression through which clones containing the desired operator-binding protein variant are physically separated from those cells that do not contain such desired variants.
- the expression and/or repression of a reporter or separator gene is used to finally select the cells that contain the desired DNA binding protein variants.
- the selection and screening genes useful in this invention include any natural or synthetic gene or DNA sequence that encodes a peptide, protein or enzyme that can be detected or used to identify or separate cells expressing the product from those cells that have a repressed expression.
- genes that encode detectable products to distinguish or separate cells repressing or expressing the gene should not be present or should not be intact in the host cell used for the screening experiment.
- a gene product may create a colored chemical reaction product, something that consumes a colored reactant, or product(s) that are directly detectable.
- reporter genes that encode proteins and enzymes that synthesize colored products, or that contribute significantly to the milieu required for a colorimetric reaction to proceed.
- the expression of the reporter gene may be enhanced by a method whereby the result can be visually or spectrophotometrically detected.
- gene products or gene fusion products that produce antibodies, fragments of antibodies, antigens, purification tags in the form of proteins, protein domains or peptides that can be expressed on the cell surface and that can be used to remove from a mixed culture those cells expressing such gene products or gene fusion products thereby enriching the remainder of the culture with cells that repress the expression of these genes.
- Preferred genes for the creation of such separation proteins that are under the transcriptional control of the to be identified DNA binding protein variants and that can be expressed and located to the E. coli outer membrane and are therefore of interest for separating repressed from non-repressed expression are the E.
- the reporter and separator genes demonstrate no negative selective pressure for growth or survivability of the cell under the conditions used to discriminate expression from repression.
- An example of a useful reporter gene is the Escherichia coli lacZ gene encoding β-galactosidase.
- In the proper medium, for example one that contains the colorimetric lactose analog Xgal (5-bromo-4-chloro-3-indolyl-β-D-galactopyranoside), the expression and repression of the lacZ gene can be visually or spectrophotometrically discriminated. In the absence of a critical amount of β-galactosidase expression, i.e. under repressed lacZ gene expression conditions, Xgal remains (largely) unhydrolyzed and (largely) colorless.
- When lacZ gene expression is not repressed and β-galactosidase is sufficiently produced, Xgal is hydrolyzed to galactose and an indoxyl derivative, the latter of which is then oxidized by air to a blue indigo dye that is easily detected visually or spectrophotometrically.
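The Xgal readout described above amounts to a threshold decision on β-galactosidase activity. The sketch below shows that decision in code. The activity values, their units, and the "critical" threshold are hypothetical placeholders, since the document does not state a numerical cutoff.

```python
def colony_color(beta_gal_activity: float, critical_activity: float) -> str:
    """Return the expected Xgal phenotype for a clone.

    Activity at or above the (assumed) critical level is taken to hydrolyze
    enough Xgal to turn the colony blue; below it the colony stays colorless.
    """
    return "blue" if beta_gal_activity >= critical_activity else "colorless"


def repressed_clones(activities: dict, critical_activity: float) -> list:
    """Names of clones whose lacZ reporter appears repressed (colorless)."""
    return sorted(
        name
        for name, activity in activities.items()
        if colony_color(activity, critical_activity) == "colorless"
    )


if __name__ == "__main__":
    # Hypothetical activities in arbitrary units, for illustration only.
    measured = {"clone_a": 5.0, "clone_b": 500.0, "clone_c": 20.0}
    print(repressed_clones(measured, critical_activity=100.0))
```

In practice the discrimination is visual or spectrophotometric, so any numeric threshold would have to be calibrated for the medium and strain in use.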
- the entire lacZ gene as the reporter gene
- One preferred embodiment uses the truncated version of this gene, the lacZ' gene, together with the appropriate lacZ M15 mutated β-galactosidase gene expressed from the host cell chromosome, in the process known as α-peptide complementation, to achieve the same results.
- The choice of a DNA binding protein gene as the starting point for the generation of DNA binding protein variants is in principle open to any DNA sequence that encodes an expressible protein.
- genes for producing DNA binding protein variants are known and may be used.
- Regulatory DNA binding proteins can be categorized into at least four known major groups based on typical structures observed in the three dimensional representation of DNA binding proteins. Embodiments of the invention include the generation and/or modification and use of known proteins in each class. Embodiments of the invention include using known sequences of proteins of these classes and conserved amino acid substitutions from these sequences as starting sequences for new and useful binding proteins. Variations in the native sequence can be made using any of the techniques and guidelines for conservative and non-conservative mutations as, for example, set forth in U.S. Pat. No. 5,364,934.
- A first class of proteins that are particularly useful for practice of the invention contains a motif called the helix-turn-helix motif (HTH; Brennan and Matthews, 1989; Pabo and Sauer, 1984) having α-helices that pack against one another.
- The helices are joined by a β-turn or a more extended loop structure, and have been observed to directly or indirectly interact with DNA sequences through side-chains of at least one of the helices.
- The helix-turn-helix motif is not a stable folding unit within the protein but is integrated into a 60 to 90 amino acid residue long domain.
- The protein structures outside of the HTH motif within these domains may differ in structure from one HTH-containing protein to the next.
- HTH motifs, although displaying minimal amino acid homologies, often show similar relative positions of their α-carbon atoms.
- The second helix of the HTH motif (the recognition helix) is positioned in the domain such that this helix is adjacent to the major groove of the DNA.
- The side-chains of this recognition helix have been seen to interact with specific bases of DNA binding sequences through hydrogen bonding, so-called hydrophobic interactions, as well as by complementary van der Waals surface interactions.
- HTH binding motifs may exist in dimeric DNA binding proteins or in monomeric DNA binding proteins. Homodimeric DNA binding proteins possessing HTH motifs bind palindromic or partially palindromic DNA sequences.
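A palindromic DNA site, in the sense used above, is one that reads the same on both strands, i.e. it equals its own reverse complement. The sketch below checks this and also reports how close a partially palindromic site comes. The example sequences are generic illustrations and are not operator sequences taken from this document.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_palindromic(seq: str) -> bool:
    """True if the sequence equals its own reverse complement."""
    return seq.upper() == reverse_complement(seq)


def palindrome_score(seq: str) -> float:
    """Fraction of positions that match the reverse complement (1.0 = perfect)."""
    if not seq:
        raise ValueError("sequence must not be empty")
    forward = seq.upper()
    reverse = reverse_complement(forward)
    matches = sum(a == b for a, b in zip(forward, reverse))
    return matches / len(forward)


if __name__ == "__main__":
    for site in ("GAATTC", "GAATTA"):  # illustrative sequences only
        print(site, is_palindromic(site), palindrome_score(site))
```

A score below 1.0 corresponds to the "partially palindromic" case, where the two half-sites seen by the two monomers of a homodimer are similar but not identical.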
- The HTH DNA binding protein motif and variations thereof can be found in both eukaryotic and prokaryotic organisms and is exemplified by such prokaryotic proteins as λ cro, λ repressor, catabolite activating protein CAP, lac repressor, 434 repressor, 434 cro and others, and by such eukaryotic proteins as any of the homeodomain proteins like antennapedia, NK-2 / vnd, and the POU-specific domain containing proteins and others.
- A second class of DNA binding motifs is one in which one or more zinc ions is a structural component of the DNA binding domain, i.e., the zinc-containing DNA binding proteins.
- A typical motif of this class is the zinc-finger motif.
- A zinc ion is coordinated by cysteine residues or cysteine and histidine residues of the protein and results in a structure resembling a finger that interacts with the DNA in a sequence specific manner.
- DNA binding proteins possessing a zinc-finger motif are exemplified by Zif, EGR1, EGR2, GLI, Wilms' tumor gene, Sp1, Hunchback, Kruppel, ADR1 and BrLA proteins and others.
- Structural variations of the zinc-finger motif also can be classified as zinc-containing motifs. These may have additional finger structures, as exemplified by the glucocorticoid receptor, or may contain binuclear zinc ion centers, such as seen in the yeast GAL4 protein.
- a third major class of known regulatory DNA binding proteins are proteins that contain a leucine zipper motif.
- This structural motif is involved in the dimerization of leucine zipper motif containing proteins.
- The leucine zipper motif generally comprises an α-helical structure having several leucine residues (typically up to five) spaced periodically through the helix (usually every seventh consecutive residue). This repeating structure within an α-helix results in orientation of a leucine residue at a similar position on the face of the helix every second consecutive turn of the helix.
- the interface of two such juxtaposed leucine zipper helices from two separate polypeptide chains results in complementary hydrophobic interactions between the helices that can stabilize the protein dimer formed.
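The heptad spacing described above can be checked directly on a sequence. An α-helix has about 3.6 residues per turn, so seven residues span roughly 7 / 3.6 ≈ 1.9 turns, which is why the leucines recur on the same face about every second turn. The sketch below scans a protein sequence for leucines repeated every seventh residue. The function name, the minimum repeat count, and the test peptide are hypothetical choices for illustration.

```python
def heptad_leucine_runs(seq: str, min_repeats: int = 4) -> list:
    """Find runs of leucine (L) recurring every seventh residue.

    Returns a list of (start_index, repeat_count) tuples, 0-based, for each
    maximal run with at least min_repeats leucines at i, i+7, i+14, ...
    """
    seq = seq.upper()
    runs = []
    for start, residue in enumerate(seq):
        if residue != "L":
            continue
        # Skip positions that merely continue a run begun seven residues earlier.
        if start >= 7 and seq[start - 7] == "L":
            continue
        count = 0
        pos = start
        while pos < len(seq) and seq[pos] == "L":
            count += 1
            pos += 7
        if count >= min_repeats:
            runs.append((start, count))
    return runs


if __name__ == "__main__":
    # Synthetic peptide with a leucine every 7 residues; not a real protein.
    synthetic = "LAAAAAA" * 4
    print(heptad_leucine_runs(synthetic))
```

This is only a pattern match on spacing; it does not test whether the region is actually helical or whether it dimerizes.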
- the leucine zipper class of DNA binding protein motifs is exemplified by several subclasses characterized by additional motifs within the subclasses.
- Examples of such subclasses of leucine zipper DNA binding proteins are the b/zip proteins GCN4, C/EBP, fos, jun, myc and others, the basic helix-loop-helix (b/HLH) proteins exemplified by the MyoD protein, and the basic helix-loop-helix zip proteins (b/HLH/zip) exemplified by the MAX protein.
- A fourth, somewhat more diverse class of regulatory DNA binding proteins is characterized as having β-sheet structures that contribute to DNA binding.
- Members of this group are the TATA binding protein (TBP), a general eukaryotic transcription factor that interacts with the minor groove of TATA box DNA through the factor's β-sheet structures, the prokaryotic Met repressor, the eukaryotic tumor suppressor p53 protein and the specific transcription factor NF-κB protein.
- Advantageous embodiments of the invention utilize genes that code for DNA binding proteins that influence gene transcription. Particularly advantageous are genes or gene sequences that encode bacterial repressor proteins and/or fragments. Of the four different groups of DNA binding proteins enumerated above in this context, the DNA binding proteins that contain helix-turn-helix motifs are particularly preferred. However, it is also possible to use other sequences that encode zinc-containing proteins, leucine zipper containing proteins or members of other types of DNA binding proteins. Sequences of these proteins are known to skilled artisans and are not repeated here due to space restrictions. Embodiments of the invention include the use of known sequences from each class.
- An advantageous embodiment of the invention uses a gene encoding a cro protein based on the homodimeric 434 cro protein, and a second desirable embodiment uses a homeodomain based on the monomeric NK-2 homeodomain protein from the Drosophila melanogaster vnd gene.
- DNA binding proteins from humans.
- specific problems can be approached using species specific binding proteins.
- the methods encompass the use of specific animal and plant DNA binding proteins, as well as those from free-living as well as infective micro-organisms (including viruses). Because the desirable property of protein binding to DNA exists even in smaller portions of the protein, partial gene sequences which code for those portions also may be used.
- The invention is particularly useful in the field of agriculture. The creation and use of DNA binding proteins that recognize specific DNA sequences such as, for example, transcription control molecular affecting virus genes, plant growth genes, senescence genes, fruiting genes, carbohydrate metabolism genes, and other genes, is particularly contemplated. Although the DNA binding activity of proteins as described herein often leads to decreased synthesis of one or more proteins, a skilled artisan will appreciate that increases in individual protein production also are possible.
- proteins that can be produced at increased levels utilizing the present invention include, but are not limited to, nutritionally important proteins; growth promoting factors; proteins for early flowering in plants; proteins giving protection to the plant under certain environmental conditions, e.g., proteins conferring resistance to metals or other toxic substances, such as herbicides or pesticides; stress related proteins which confer tolerance to temperature extremes; proteins conferring resistance to fungi, bacteria, viruses, insects and nematodes; proteins of specific commercial value, e.g., enzymes involved in metabolic pathways, such as EPSP synthase.
- DNA encoding regulatory elements and encoding protein are known to the skilled worker in that field, as exemplified by U.S. No. 5J02,933, issued to Klee et al., and other representative citations in that publication.
- Regulation Through Binding Between Protein and DNA
- Embodiments of the invention utilize binding between protein and DNA. As will be appreciated by a skilled artisan, a variety of binding interactions have been discovered and are useful for these embodiments.
- a DNA target or other DNA according to embodiments of the invention includes not only the specific sequence listed but also similar sequences that are homologous to the sequence.
- DNA homology is determined routinely by a skilled artisan.
- a DNA sequence that is 50% homologous to a cognate binding sequence of 8 base pairs long will have an identical match for any 4 of the bases when the two sequences are lined up side by side.
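- The side-by-side comparison described above can be sketched as a simple ungapped identity count. This is a minimal illustration only: the function name and the two 8 base pair sequences are hypothetical and are not taken from the specification, and a skilled artisan would normally use an alignment tool that also handles gaps.

```python
def percent_identity(seq_a, seq_b):
    """Percentage of positions at which two equal-length sequences match
    when lined up side by side (no gaps are considered)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(a == b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * matches / len(seq_a)


# Hypothetical 8 bp cognate site and a variant that differs at four positions.
cognate = "ACAATTGT"
variant = "ACAAGGCA"
print(percent_identity(cognate, variant))
```

With these two illustrative sequences, four of the eight positions match, which corresponds to the 50% case described in the text.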
- this DNA binding protein is made up of two identical monomers comprised of 71 amino acid residues each that are folded into a single domain having 5 α-helices.
- Helices 2 and 3 (numbered from the N-terminus to C-terminus) form HTH motifs, with the first and fourth helices packing against the HTH to create a hydrophobic core.
- Interactions between the monomers are formed by protein-protein interactions from structures of the C-terminal end of the monomers, specifically in helices 4 and 5 and loop structures between helices 3 and 4 (Mondragon et al., 1989; Harrison and Aggarwal, 1990; Mondragon and Harrison, 1991 and Padmanabhan et al., 1997).
- the second helix of each of the HTH motifs of the monomers (helix 3 of 434 cro) is found to sterically fit into the major groove of the DNA binding sequence.
- the two recognition helices of the homodimer are separated by a distance that allows them to fit into the major grooves of consecutive turns of the DNA double helix.
- the specific DNA sequences that bind with highest affinity to wild-type 434 cro protein form the operators of a regulatory genetic switch that participates in the regulation of lytic or lysogenic life-cycles of the bacteriophage (Ptashne, M. The Genetic Switch). These operators are named OR1, OL1, OR2, OL2, OR3 and OL3 from their positions within the bacteriophage genome.
- the cro protein from 434 binds the OR3 operator sequence with highest affinity, followed by that of the OR1 sequence.
- the specific operator control sequences for 434 cro are partially palindromic DNA sequences of approximately 14 base pairs in length that to varying degrees possess palindromic base sequences in the first and last four bases of the operator DNA.
- the consensus sequence for the palindromic part of these operators is 5'-ACAANNNNNNTTGT-3' (where N is a nonpalindromic base).
- the OR3 operator is an exception to the palindromic consensus and possesses a single 5'-ACAG-3' half-site (Koudelka and Lam, 1993; Bell and Koudelka, 1995).
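- As a minimal sketch of how the consensus above might be applied, the following searches a DNA string for 14 base pair sites of the form ACAA, six unspecified bases, TTGT. The function names and the test sequences are hypothetical and serve only as illustration; the check on the half-sites simply confirms that ACAA and TTGT are reverse complements of one another, which is what makes the consensus palindromic.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


# Consensus from the text; the lookahead allows overlapping sites to be reported.
CONSENSUS = re.compile(r"(?=(ACAA[ACGT]{6}TTGT))")


def find_consensus_operators(dna):
    """Return (0-based start, site) for every consensus match in dna."""
    return [(m.start(), m.group(1)) for m in CONSENSUS.finditer(dna.upper())]


# The two consensus half-sites are reverse complements of each other.
assert reverse_complement("ACAA") == "TTGT"

# Hypothetical sequences: one carries a consensus site, the other an ACAG half-site.
print(find_consensus_operators("GGACAAGAAAGTTTGTCC"))
print(find_consensus_operators("GGACAGGAAAGTTTGTCC"))
```

A site whose first half-site is ACAG, as described for OR3, does not satisfy the strict consensus and is therefore not reported by this search.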
- the surfaces of the amino end of the recognition helices that face the DNA of the major groove form complementary binding surfaces with the DNA operator regulatory sequence. Interactions through these complementary surfaces include hydrogen bonding between amino acid residue side chains and bases of the DNA binding sequence, hydrophobic interaction surfaces, van der Waals surface complementarity and ionic interactions between protein and DNA of the operator.
- the DNA in the complex is bent with respect to standard B-DNA.
- lysine 27 and serine 30 interact with the sugar phosphate backbone of the operator.
- Glutamine 28 can form one or two hydrogen bonds between its sidechain amide carboxyl group and the N6-amino of the adenine base of the first operator base-pair and/or an amide NH and the lone pair of the N7 of adenine 1.
- the second residue of the recognition helix, glutamine 29, can form a hydrogen bond with the 6-oxa group of the guanine base of operator base-pair two.
- Base pair three contacts with the protein are of a hydrophobic nature with the thymine methyl group fitting a pocket constructed from the methylene groups of the side-chains of lysine 27 and glutamine 29.
- Base-pair 4 is the nonconsensus base-pair of the OR3 operator and is therefore implicated in both binding specificity and affinity differences between 434 repressor and 434 cro proteins.
- Target DNAs in many embodiments are regulatory sequences which interact with DNA binding protein(s) to cause a change in gene expression.
- a few examples of such sequences include genes that are substantial or essential for the establishment or maintenance of a disease or disease state, i.e., a gene essential for an infectious state, a toxin, and/or the survival and/or replication of the causative agent of the disease, or genes which encode various traits and/or functions of plants, animals or other organisms.
- Causative agents of disease are microorganisms such as viruses, bacteria, parasites like trypanosomes, protozoan, and plasmodia as well as higher organisms and including cells of the human body, especially those that are of a degenerative, transformed or have otherwise undesirable traits or characteristics, such as those of malignant or benign tumors, lymphomas, myelomas, carcinomas, plant viroids and the like.
- Advantageous target sequences are those that are evolutionarily conserved, highly conserved or relatively highly conserved, for examples of the latter, sequences of the HIV-1 long terminal repeat regions in general and U3 region in particular.
- target sequences from human immunodeficiency virus types 1 and 2, human papilloma viruses, breast, prostate, ovarian, liver, lung, spleen, muscle, cancer cells, plant viruses, plants and the like.
- Palindromic or partially palindromic target sequences are preferred when the desired DNA binding protein variant is a member of the homodimeric proteins.
- Nonpalindromic target sequences are preferred when a monomeric or heterodimeric DNA binding protein variant is desired.
- a target sequence is cloned into a position adjacent to a reporter gene and/or separating gene such that the target can then function as an operator sequence for the regulation of the gene expression in for example a bacterial system using a DNA binding protein as repressor.
- the gene both reporter or separating
- the gene is genetically neutral. That is, a protein gene is chosen such that its up-regulation or down-regulation does not strongly affect cell growth or survival. Examples of such genes include such reporter genes as lacZ and lacZ' derivatives, intrinsically fluorescent proteins such as the green fluorescent protein and derivatives thereof and the luciferase enzyme, and separator genes such as the E.
- Strep-tags protein sequence, W “X” H P G F “Y” “Z”, in which "X” represents any desired amino acid and "Y” and “Z” either both denote Gly, or "Y” denotes Glu and "Z” denotes Arg or Lys
- His-tag protein sequences composed of a minimum of 5 consecutive His residues
- FLAG-Tag epitope protein sequence DYKDDDK, TP Hopp, KS Prickett, V Price, RT Libby, CJ March, P Cerritti, DL Urdal, PJ Conlon. BioTechnology 6:1205-1210, 1988
- the HA epitope protein sequence YPYDVPDYA, HL Niman, RA Houghten, LA Walker, RA
- Glu-Glu epitope protein sequence EEEEYMPME, T Grussenmeyer, KH Scheidtmann, MA Hutchinson, E Eckhart, G Walter. Proc. Natl. Acad. Sci. USA 82:7952-7954, 1985; B Rubinfeld, S Munemitsu, R Clark, L Conroy, K Watt, W Crosier, F McCormick, P Polakis. Cell 65:1033-1042, 1991
- KT3 epitope protein sequence PPEPET, H MacArthur, G Walter. J. Virol.
- IRS epitope protein sequence RYIRS, TC Liang, W Luo, JT Hsieh, SH Lin. Arch. Biochem. Biophys. 329:208-214, 1996; W Luo, TC Liang, JM Li, JT Hsieh, SH Lin. Arch. Biochem. Biophys.
- genes that encode the following proteins are particularly desirable for separators and/or reporters as being genetically neutral: lacZ, lacZ', green fluorescent protein, luciferase, lamB, K88as pilin, K88ad pilin, TraT, PhoE, OmpA, OmpC, OmpF, BtuB, OmpA-lipoprotein fusion, Strep-tag, His-tag, FLAG-Tag epitope, HA epitope, c-myc epitope, AU1 epitope, AU5 epitope, Glu-Glu epitope, KT3 epitope, IRS epitope, BTag epitope, protein kinase C epsilon (Pk) epitope and the Vesicular Stomatitis Virus (VSV) epitope.
- Toxins such as those produced from the S, R and Rz genes of bacteriophage lambda, the gene E protein from bacteriophage phi-x 174, nutritional and chemical resistance genes, genes that metabolize growth inhibitory substances to substances that do not inhibit growth and vice versa, genes that determine a resistance to lytic bacteriophage infections, for example, antibiotic genes, galT,K, tetA, lacZ+ (when used to generate toxic metabolites), pheS, argP, thyA, crp, pyrF, ptsM, secA, malE, ompA, btuB, lamB, tonA, cir, tsx, aroP, cysK, and dctA.
- the combination of promoter sequence, target operator sequence and choice of reporter gene and separator gene used in specific experiments affects the strength of expression of the reporter gene and separator gene.
- the strength of the reporter and/or separator gene expression also may vary as assayed, for example, by the enzymatic activity of the reporter gene itself or by the quantity of separator gene product available on the outer surface of the cell for binding to the separating medium. Accordingly, it is desirable for optimizing identification of cells exhibiting a repressed phenotype due to a DNA binding protein variant binding to the target operator sequence, to balance the strength of gene expression levels with the specific reporter and/or separator genes as well as with the operator used.
- One exemplary embodiment for the discovery of balanced gene expression of the reporter gene and/or separator gene for the identification of cells having repressed phenotypes for these genes utilizes, in preliminary experiments, combinatorial mutagenesis of the minimal promoters used for reporter gene and/or separator gene transcription. Additionally or alternatively, combinatorial mutagenesis of the sequences of the ribosomal binding site through the start codon used for reporter gene and/or separator gene translation can be utilized.
- cells expressing a reporter gene and/or separator gene construct that has mutated transcriptional or translational control sequences that are expressed in the unrepressed states are compared to cells containing optimally balanced promoter-target operator-translational control sequence-reporter and/or separator gene constructs for reporter gene and separator gene expression.
- Such gene expression is comparably assayed in an advantageous embodiment through, for example, the analysis of reporter gene activity measurements and separator gene product surface expression.
- mutagenized separator gene constructs are assayed for their ability to bind separator medium similarly and for the ability to be released from separator medium similarly to known and balanced separator gene expression.
- the identification of such balanced gene expression in newly created constructs is important for the optimal identification of repressed phenotypes.
- the identification of DNA binding protein variants that bind to the desired target operator sequences and not to sequences unrelated to the desired target sequences can be improved by design considerations of the genetic constructions used.
- the target operator DNA sequences need to be placed within a maximal distance from the +1 position of the promoter so that repression of transcription will be achieved.
- the monomeric DNA binding protein variants can be directed to the approximate location of the desired operator by fusion of the coding sequences of the mutagenized DNA binding protein with a second DNA binding domain having a known binding sequence specificity that differs from the desired target specificity.
- This known specificity of the second domain should be of a reduced affinity such that repression of the reporter gene and/or separator genes does not occur by the second domain when used alone. In this embodiment, the desired target operator is cloned adjacent to the known operator of the second DNA binding domain.
- the known operator should optimally be more than 10 basepairs away from the +1 position of the promoter used for reporter and separator gene transcription.
- the variants of the mutagenized DNA binding protein that bind the desired target operator sequence are assisted to the desired target sequence by the second domain binding to its binding sequence. Variants that bind with high affinity to the desired target DNA sequence are found that repress reporter and separator gene expression.
- Site directed or cassette mutagenesis techniques that induce mismatches in the known DNA binding sequence of the assisting domain can be used to reduce, balance or otherwise achieve optimal repressible transcription activities.
- a DNA sequence encoding a DNA binding protein is then mutated so that a large collection of different mutations and combinations of mutations are generated.
- Different collections of mutations and combinations of mutations can be constructed in specific regions of the protein known to have an influence on the DNA binding properties of the protein as well as in regions not directly known to have an influence on DNA binding. Each of these collections is, for the purposes of this description of the invention, termed a combinatorial mutational library.
- These mutational libraries can be constructed such that they have varying complexities, from several tens of thousands of mutations and combinations of mutations to millions or billions of such combinations.
- a multitude of different DNA binding protein variants are created that can bind different DNA sequences with differing binding affinities. Those mutations and combinations of mutations that are able to bind to the target operator DNA sequences can thereby regulate the expression of the adjacent reporter or separator gene by influencing transcription.
- a quantity of the separator gene product may be expressed on the cell surface and can bind a component of the separation media. Through this binding a cell that expresses unrepressed or non-transactivated levels of separator gene product on its surface may be removed or separated from those displaying repressed or transactivated expression.
- cells that contain DNA binding protein variants that bind the desired DNA binding sequence are selected through the resultant activity of the reporter gene product on the final cell culture of cells enriched for repressed or transactivated reporter gene and separator gene phenotype(s).
- An advantageous embodiment of the invention has the reporter and separator genes cloned together as an operon with the target selection DNA binding sequence and minimal promoter sequence on one plasmid vector.
- a DNA binding protein that is mutated into a combinatorial library on a second plasmid vector is expressed together with the first plasmid in a bacterial cell.
- the plasmids then are transformed sequentially into a host cell where preferentially, the separation reporter gene plasmid is first transformed into the host cell followed by transformation of the resultant cells with the combinatorial library expressing the DNA binding protein variants.
- the host cell is any cell that can replicate and express the reporter gene, separator gene and DNA binding protein variants and that is capable of showing a repressed phenotype for the reporter gene and separator genes.
- An advantageous host cell is the Escherichia coli strain DH5α (Life Technologies, Inc., Gaithersburg, MD).
- the protein-coding sequence of the DNA binding protein can be mutated via several known methods. These methods can be random or may use targeting to specific regions of the regulator gene. Especially preferred as an embodiment of this invention are in vitro mutagenesis methods. In these methods, isolated DNA composing parts or all of the DNA binding protein can be mutagenized at specific positions within the gene. Especially preferred for the mutagenesis is the use of mutagenic DNA-cassettes.
- a regulator gene can be modified by the insertion of additional nucleotide residues, especially in form from chemically synthesized oligonucleotides, as well as the deletion of nucleotide residues from the gene, as well as the incorporation of point mutations within the regulator gene.
- Combinations of multiple additions and/or deletions and/or point mutations can also be incorporated in the regulator gene.
- a preferred embodiment of this invention uses combinatorial libraries of mutations of the regulator gene. In principle, in the in vitro mutagenesis reactions, single stranded DNA can be synthesized with many mutations and combinations of mutations within the coding sequence of the regulator gene. Single stranded and/or double stranded DNA can alternatively be enzymatically, chemically or physically treated such that mutations and combinations of mutations within the coding sequence of the regulator gene are created.
- oligonucleotide primers to the single-stranded or denatured double-stranded, mutagenized DNA and the use of in vitro DNA polymerase reactions for the conversion of the oligonucleotide-primed/mutagenized DNA hybrid molecules to double-stranded DNA molecules.
- sequences of interest from the mutagenized double-stranded DNA molecules so created can then be hydrolyzed from the DNA polymerase reaction products through the use of appropriate restriction endonucleases and are thereby made available for use in subsequent cloning experiments. In a preferred embodiment, these subsequent cloning experiments combine the so-mutagenized and restricted, mostly double-stranded DNA molecules that encode variants of the sequence of interest of the DNA binding regulator protein into a cloning vector containing the remaining parts, if any, of the DNA binding regulator protein such that the expression of the DNA binding regulator variants as proteins is assured.
- the resultant regulator DNA binding protein variants that bind a specific DNA sequence or sequences can be genetically fused to DNA sequences known to help activate or repress the transcription of a gene to be regulated in other cell types.
- protein genes that encode zinc-finger DNA binding motifs may be modified.
- libraries of altered proteins that bind DNA sequences are made based on known techniques for genetic manipulation.
- proteins and their DNA binding motifs which are known or that may be discovered in the future may be utilized as starting material for embodiments of the invention.
- US patents 6,013,453 and 6,242,568 show DNA sequences of mutational libraries that encode zinc-finger DNA binding motifs for new DNA binding proteins that bind to desired DNA regulatory sequences.
- the DNA mutational libraries of these zinc-finger protein variants can be used to identify protein species that bind to specified DNA sequences.
- a randomized library of zinc-finger sequences may be examined by binding with one or more DNA sequence triplets.
- randomized zinc-fingers may be positioned between, or next to, two or more zinc-fingers that have defined sequence and binding specificities. These procedures can determine preferred target DNA sequences for the randomized fingers. In this way, new zinc-finger proteins having multiple fingers can be constructed with novel specific DNA sequence binding characteristics and are useful for practice of embodiments of the invention.
- sequences, materials and methods taught in these patent specifications are particularly included by reference.
- DNA sequence specific binding variants are fused to transcriptional activator domains important in the activation of prokaryotic and especially eukaryotic transcription.
- DNA sequences coding for protein domains associated with the inhibition of DNA transcription can be genetically fused with the DNA binding specific protein variants so that transcription of genes for example in eukaryotic cells can be repressed.
- Additional DNA sequences that encode protein domains or signal sequences useful for the targeting of a protein to a certain cellular compartment including extracellular compartments, can be fused to the resultant protein-encoding sequences.
- An important therapeutic use is, for example, the cloning of DNA sequences that encode regulator protein variants that are discovered with these methods and that bind to DNA sequences found in the long terminal repeat region of HIV-1 and additional fusions of these regulatory variants in hematopoietic stem cells.
- a multitude of such regulator variants can be used that recognize many different variations of these long terminal repeat sequences that could arise by mutation of the long terminal repeat DNA sequences of the HIV-1 virus.
- if an HIV-1 virus infects such a genetically-modified lymphocyte, then the transcription of the viral genome and/or parts thereof that are dependent on the HIV long terminal repeat sequences will be inhibited due to the presence of the long terminal repeat DNA sequence-specific transcriptional repressor protein(s). The replication of the virus will thereby be inhibited.
- the so-modified lymphocytes will remain viable and active and will be further available for immune function.
- a further use of the invention is the use of proteins or protein domains derived from the so-identified DNA-binding specific regulators as therapeutic agents.
- a further important potential therapeutic use of the invention is, for example, the identification of regulators that inhibit the expression of genes that are essential for tumor growth or survival or that activate the expression of tumor-suppressor genes or genes that activate cell death - apoptosis programs.
- the DNA encoding such regulators for tumor genes or tumor suppressor genes can be delivered to the tumor cells by gene delivery systems of viral or nonviral types or of microbiological nature.
- the expression of such genes within the tumor cells of the patient should inhibit the growth and replication of the tumor cells.
- a further use of the invention is the use of proteins or protein domains derived from the so-identified DNA-binding specific regulators as therapeutic agents.
- a further important use of the invention is the creation and identification of regulators of gene expression, either as repressors or activators, for genes of interest in what has become known as target validation studies. In these studies, it is of interest to identify and use such regulators for the repression or activation of genes and gene products that are of interest to the pharmacological industry. By the use of such regulators in cell and organism studies, the influence of the repression or activation of the specific gene under study on other related and unrelated genes and gene products can be observed. Such observations can take the form of for example genome-wide, or selective gene set expression studies, for example through DNA array technologies, through northern or western analyses, and through other such technologies.
- a further use of the invention is in the area of target discovery studies. In such studies, combinatorial libraries of DNA binding protein domains of repressor or activator regulatory genes can be inserted using molecular biological gene transfer methods into cell or other assay systems that have phenotypes that are desired to be affected.
- the action of specific repressor or activator construction variants is compared using the phenotype of interest to control experimental cells not having a DNA binding domain in the otherwise identical regulator construction.
- Cells displaying a DNA binding protein variant-dependent desired change in phenotype are investigated further.
- the specific DNA binding protein variant responsible for the phenotype change is isolated and its gene is sequenced.
- the effects of the specific variant are then characterized using genome-wide, or selective gene set expression studies, for example through DNA array technologies, through northern or western analyses, and through other such technologies in order to discover the gene(s) responsible for the phenotypic changes.
- the invention also encompasses a kit for the construction and identification of such DNA sequence specific DNA binding protein variants.
- the kit contains a reporter / separator gene plasmid as well as DNA binding protein expression plasmids and mutational cassettes for the construction of mutational libraries of the DNA binding protein (see Figures 1 through 23).
- the invention is further illustrated by the following examples, which are meant to illustrate embodiments and not to limit the claims in any way.
- Example 1 Use of a screening plasmid with a lacZ-derived reporter gene for the identification of 434 cro variants that bind HIV-1 target sequences.
- Plasmid pP2HIV1 was constructed from synthetic DNA, and from DNA derived from the vectors, pACYC184, pUR222, pUC119 and pUC4KA ⁇. This plasmid was used to screen 434 cro DNA binding protein variants expressed from a repressor plasmid library (described below) to DNA target sequences derived from HIV-1 DNA (GenBank Sequence AF096643.1, bases 373 to 394) in in vivo screening experiments.
- a synthetic double stranded oligonucleotide cassette was created from oligonucleotides having the following 5' to 3' (upper strand) DNA sequence (SEQ ID NO: 1): 5'-TCGGGAAAGATCTAAGTTAGTGTATTGACATGATAGAAGCACTCTACTATATTCCTAGGAGATGCTGCATATAAGCAGCTGCTGGTACCAAGTTCACGTTAAAGGAAACAGACCATGACGCGTATTACG-3'.
- the first base of this sequence is arbitrarily assigned the base number 1 of the pP2HIV1 plasmid.
- This cassette encodes a BglII restriction site followed by an optimized transcription promoter, a StyI restriction site, the HIV-1 target sequence, a KpnI restriction site, a 13 base pair spacer, an optimal Shine-Dalgarno ribosome binding site (AGGA) followed by an 8 base pair spacer and a translation initiation start sequence (ATG).
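- The layout of the cassette described above can be checked against SEQ ID NO: 1 with a short script. This is an illustrative sketch only: the helper name is hypothetical, the recognition sequences used are the standard ones for BglII (AGATCT) and KpnI (GGTACC), and CCTAGG is taken as the one sequence in the cassette that fits the StyI pattern CCWWGG. The bases found between the StyI and KpnI sites are simply reported and are not asserted to be the exact GenBank target coordinates.

```python
# SEQ ID NO: 1 as printed in the text, with line-break spaces removed.
CASSETTE = (
    "TCGGGAAAGATCTAAGTTAGTGTATTGACATGATAGAAGCACTCTACTATATTCC"
    "TAGGAGATGCTGCATATAAGCAGCTGCTGGTACCAAGTTCACGTTAAAGGAAACA"
    "GACCATGACGCGTATTACG"
)

SITES = {"BglII": "AGATCT", "StyI": "CCTAGG", "KpnI": "GGTACC"}


def locate_unique(seq, motif):
    """Return the 1-based start of motif, requiring exactly one occurrence."""
    if seq.count(motif) != 1:
        raise ValueError(motif + " does not occur exactly once")
    return seq.index(motif) + 1


positions = {name: locate_unique(CASSETTE, motif) for name, motif in SITES.items()}

# Bases lying between the StyI and KpnI recognition sequences.
sty_end = positions["StyI"] - 1 + len(SITES["StyI"])  # 0-based index just past StyI
kpn_start = positions["KpnI"] - 1                      # 0-based index of KpnI
between_sites = CASSETTE[sty_end:kpn_start]

# Downstream layout stated in the text: 13 bp spacer, AGGA, 8 bp spacer, ATG.
after_kpn = CASSETTE[kpn_start + len(SITES["KpnI"]):]
spacer_1 = after_kpn[:13]
shine_dalgarno = after_kpn[13:17]
spacer_2 = after_kpn[17:25]
start_codon = after_kpn[25:28]

print(positions)
print(between_sites)
print(spacer_1, shine_dalgarno, spacer_2, start_codon)
```

Slicing by fixed offsets after the KpnI site, rather than searching for ATG, avoids picking up the ATG triplets that also occur inside the promoter and target regions.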
- the synthetic cassette in pP2HIV1 is followed by 12 base pairs of protein coding DNA (5'-ACGCGTATTACG-3') that is fused to 22 base pairs of lacZ'-derived DNA from the vector pUR222 (bases 1857 to 1835 of GenBank sequence L09145.1).
- This DNA is followed in pP2HIV1 by additional lacZ-derived DNA from vector pUC119 (GenBank sequence U07650.1, bases 285-451).
- the pUC119-derived DNA of pP2HIV1 is then followed by 1941 base pairs of DNA derived from pACYC184 (GenBank Sequence X06403.1, bases 3946 to 4245, base 1 to 1521).
- the pACYC184-derived DNA of pP2HIV1 is fused to the kanamycin resistance region of vector pUC4KA ⁇ (GenBank sequence X06404.1, bases 404 to 1673).
- the synthetic P2 promoter region of pP2HIV1, bases 8-52, was optimized for screening blue/white phenotypes using 434 cro-derived repressors in E. coli DH5α on X-gal-containing IM2 medium.
- the IM2 medium contained per liter 10 g bacto-tryptone, 2 g yeast extract, 5 g NaCl, NaOH to pH 7.0, 12 g agar, 0.8 ml 50 mg/ml ampicillin, 1.0 ml 30 mg/ml kanamycin, 2.5 ml 2% 5-bromo-4-chloro-3-indolyl-β-D-galactopyranoside previously dissolved in dimethylformamide and 0.5 ml 1 M isopropyl-β-D-thiogalactopyranoside.
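- The approximate working concentrations implied by this recipe can be worked out with simple dilution arithmetic. The sketch below makes two assumptions that are not stated in the text: that the small added stock volumes can be neglected against one liter of medium, and that the 2% stock is weight per volume (20 mg/ml). The function and variable names are illustrative only.

```python
def final_concentration(stock_conc, stock_volume_ml, medium_volume_ml=1000.0):
    """Concentration after adding a stock to the medium, in the units of
    stock_conc, neglecting the volume contributed by the stock itself."""
    return stock_conc * stock_volume_ml / medium_volume_ml


# Stocks expressed in micrograms per ml (1 mg/ml = 1000 micrograms per ml).
ampicillin_ug_per_ml = final_concentration(50_000, 0.8)
kanamycin_ug_per_ml = final_concentration(30_000, 1.0)
# Assumption: 2% means 2 g per 100 ml, i.e. 20 mg/ml.
xgal_ug_per_ml = final_concentration(20_000, 2.5)
# IPTG stock expressed in mM (1 M = 1000 mM).
iptg_mM = final_concentration(1000, 0.5)

print(ampicillin_ug_per_ml, kanamycin_ug_per_ml, xgal_ug_per_ml, iptg_mM)
```

Under these assumptions the plates contain roughly 40 micrograms/ml ampicillin, 30 micrograms/ml kanamycin, 50 micrograms/ml of the chromogenic substrate and 0.5 mM inducer.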
- the promoter of pP2HIV1 can be removed and replaced by other synthetic promoters with other characteristics using the unique BglII and StyI restriction sites.
- the target DNA sequences can be synthesized from synthetic oligonucleotides and can be exchanged using the unique StyI and KpnI sites of pP2HIV1.
- Plasmid pP2null is identical in sequence with plasmid pP2HIV1 except that the HIV-1-derived target sequence of pP2HIV1 has been deleted.
- Plasmid p434cro2 was used to create combinatorial mutation libraries of the 434 cro gene and to express these protein variants in E. coli cells containing the pP2HIV1 screening plasmid. Plasmid p434cro2 is based on the pUC119 cloning vector (GenBank sequence U07650.1).
- Plasmid p434cro2 was constructed as follows. A synthetic gene encoding a Shine-Dalgarno ribosome binding sequence followed by a 434 cro protein encoding sequence optimized for expression in E. coli was synthesized from four oligonucleotides. The gene included unique restriction sites for the replacement of the DNA encoding the HTH region of 434 cro and was made double-stranded using a T4 DNA polymerase reaction, restricted with HindIII and EcoRI and cloned into HindIII and EcoRI-digested pUC119 DNA. The synthetic 434 cro gene has the sequence shown in Figure 1 (SEQ ID NO: 2).
- the partial coding sequence of the lacZ' gene of the latter was removed by EcoRI and KasI digestion, followed by T4 DNA polymerase filling reaction and re-ligation.
- the resulting 3197 base pair plasmid, p434cro2 was used for the construction of combinatorial mutation libraries of the 434 cro gene.
- an oligonucleotide identical in sequence with the DNA between bases 315 and 383 of p434cro2, which included the SacI and BstEII restriction sites of p434cro2, was synthesized with NNS mutagenic codons in several positions of the DNA that encodes the recognition helix of 434 cro.
- This mutagenic oligonucleotide was annealed to an oligonucleotide primer complementary to its 3' end and filled in using a T4 DNA polymerase reaction. After restriction with SacI and BstEII, the resultant synthetic double stranded cassette was ligated into SacI- and BstEII-cut p434cro2 DNA.
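- To give a feel for the diversity that NNS codons introduce, the following sketch enumerates the NNS codon set (N = A, C, G or T; S = C or G) against the standard genetic code and computes the theoretical library size for a chosen number of randomized positions. The text states only that "several positions" were randomized, so the value of five used in the example is an assumption for illustration, as are the function and variable names.

```python
from itertools import product

# Standard genetic code in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# NNS: any base at positions 1 and 2, C or G at position 3.
NNS_CODONS = [a + b + c for a in "ACGT" for b in "ACGT" for c in "CG"]

encoded_amino_acids = {CODON_TABLE[c] for c in NNS_CODONS} - {"*"}
stop_codons = [c for c in NNS_CODONS if CODON_TABLE[c] == "*"]


def library_size(randomized_positions):
    """Return (DNA variants, stop-free protein variants) for the given
    number of NNS-randomized codon positions."""
    dna_variants = len(NNS_CODONS) ** randomized_positions
    protein_variants = len(encoded_amino_acids) ** randomized_positions
    return dna_variants, protein_variants


print(len(NNS_CODONS), len(encoded_amino_acids), stop_codons)
print(library_size(5))  # five positions is an assumed example value
```

The NNS scheme covers all twenty amino acids with 32 codons while admitting only one stop codon, which is why it is a common choice for cassette mutagenesis.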
- the re-ligated combinatorial p434cro2 preparation was electroporated into DH5 alpha E. coli. Samples were analyzed at this point to assure that the complete library was represented in the transformed cell preparations. The cells were grown at 37° in LB media without ampicillin for one hour and then ampicillin was added to 50 microgram/ml. The cells were then grown for an additional 8 hours to amplify the plasmid DNA. The plasmid DNA was then isolated by conventional procedures.
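- One common way to judge whether a transformed cell preparation is likely to represent the complete library is a sampling estimate: if every variant is equally likely, the number of independent transformants N needed for a given variant to be present with probability P in a library of V variants is N = ln(1 - P) / ln(1 - 1/V). This estimate is not taken from the specification; it is offered as a hedged sketch, and the function name and example library size are assumptions.

```python
import math


def transformants_needed(library_size, completeness=0.95):
    """Independent clones required so that any given variant is present
    with probability `completeness`, assuming all variants are equally likely."""
    if library_size < 1:
        raise ValueError("library_size must be at least 1")
    if not 0.0 < completeness < 1.0:
        raise ValueError("completeness must be strictly between 0 and 1")
    if library_size == 1:
        return 1
    # log1p keeps precision when 1/library_size is very small.
    n = math.log(1.0 - completeness) / math.log1p(-1.0 / library_size)
    return math.ceil(n)


# Assumed example: five NNS-randomized codons give 32**5 DNA variants.
print(transformants_needed(32 ** 5))
```

For large libraries the result is close to three times the library size at 95% completeness, which gives a rough target for the number of transformants to aim for.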
- D. Screening combinatorial libraries in p434cro2 for targeted DNA binding
- The DNA sequence of pP2HIV1 used in this example is shown in Figure 2 (SEQ ID NO: 3). The DNA sequence of p434cro2 is shown in Figure 3 (SEQ ID NO: 4). E. coli DH5 alpha cells containing the pP2HIV1 plasmid were made competent by conventional methods and were subsequently transformed with the 434 cro combinatorial library in p434cro2 DNA. The cells from the transformation were then plated on IM2 media containing and incubated at 37° until colony diameters were between 0.8 and 1.2 mm. The resultant colonies were then optically screened for repression of lacZ' transcription.
- a plasmid, pComp is constructed from plasmid pP2HINl, synthetic D ⁇ A and D ⁇ A derived from the Escherichia coli genome.
- the plasmid is used to select and screen 434 cro D ⁇ A binding protein variants expressed from a repressor plasmid library to D ⁇ A target sequences derived from the cauliflower mosaic virus 35S promoter (Rogers, S.G., Klee, H.J.,
- the first 180 codons of the outer membrane protein ompA in plasmid pComp are isolated from PCR experiments performed with Escherichia coli genomic D ⁇ A1
- This ompA gene fragment encodes the first 159 amino acid residues of the mature ompA protein including its ⁇ -terminal signal peptide fused to a synthetic D ⁇ A cassette that encodes a streptag peptide sequence.
- the tagged-ompA fusion protein coding sequence is followed in the plasmid by a lacZ' derived sequence that encodes an ⁇ complementation peptide from the enzyme ⁇ - galactosidase.
- Both the "tagged" ompA fusion protein and the lacZ' ⁇ -peptide are expressed as a polycistronic messenger R ⁇ A and are under the transcriptional control of a P2 promoter.
- a transcriptional terminator sequence synthesized from oligonucleotides based on the transcriptional terminator from the E. coli genome unc operon is inserted into the plasmid after the lacZ' fragment.
- A target DNA sequence derived from the cauliflower mosaic virus 35S promoter (bases 271 to 287 of GenBank file X04879; Rogers et al., ibid.) is positioned between the promoter and the ompA-fusion protein coding sequence, in a position where functional operator-repressor interactions are known to occur.
- the cauliflower mosaic virus 35S promoter target sequence was identified using a computer program that searches DNA sequences for perfect or imperfect palindromic sequences of a definable length. In the case of the operator target sequence used in pComp, two overlapping 14 base pair targets adjacent to the general transcription factor binding site TATA box of the 35S promoter were identified that possessed imperfect palindromic sequences, the outer four bases of which show 75% palindromicity.
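- The text does not disclose the search program itself. The following is only a minimal sketch of how a scan for perfect or imperfect palindromes of a definable length might look. The function names, the scoring rule (fraction of symmetric position pairs that are Watson-Crick complementary) and the `outer` option for scoring only the outermost pairs are assumptions made for illustration, not the program used by the inventors.

```python
# Hypothetical sketch of a palindrome scan; scoring rule and names are assumptions.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def palindromicity(site, outer=None):
    """Fraction of symmetric position pairs (i, n-1-i) that are complementary.

    If outer=k is given, only the k outermost pairs are scored.
    """
    half = len(site) // 2
    n_pairs = half if outer is None else min(outer, half)
    if n_pairs == 0:
        return 0.0
    hits = sum(COMPLEMENT[site[i]] == site[-1 - i] for i in range(n_pairs))
    return hits / n_pairs


def find_targets(seq, length=14, min_score=0.5, outer=None):
    """Return (start, window, score) for each window scoring at least min_score."""
    seq = seq.upper()
    results = []
    for start in range(len(seq) - length + 1):
        window = seq[start:start + length]
        if not set(window) <= set(COMPLEMENT):
            continue  # skip windows containing ambiguous bases
        score = palindromicity(window, outer)
        if score >= min_score:
            results.append((start, window, score))
    return results
```

- Under these assumptions, a call such as `find_targets(promoter_seq, length=14, min_score=0.75, outer=4)` would list 14 base pair windows whose four outermost position pairs are at least 75% complementary; adjacent hits with overlapping start positions correspond to overlapping candidate targets.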
- the DNA sequence of the cauliflower mosaic virus 35S promoter target used in pComp is given in Figure 4.
- The DNA sequence of the pComp plasmid is given in Figure 5.
- Such targets that are particularly relevant to plant systems and that may bind and compete with other general or specific transcription factors for their binding sites and/or DNA binding sequences that are distinct from those of known transcription factors may alternatively be used.
- Such specific transcription factor binding sites in plant systems are exemplified by, but are not limited to, the myb family, for example the MYB.PH3 transcriptional activator proteins (Solano, R., Nieto, C, Avila, J., Canas, L., Diaz, I., Paz-Ares, J. 1995 EMBO J. 14:1773-1784), the G-box family, for example the Arabidopsis transcription factor GBF-1 (Schindler, U., Terzagi, W. Beckmann, H. Kadesch, T.
- O2 as exemplified by the Opaque-2 transcriptional activator of maize (Maddaloni, M., Donini, G., Balconi, C, Rizzi, E., Gallusci, P., Forlani, F., Lohmer, S., Thompson, R., Salamani, F., Housing, M. 1996 Mol. Gen. Genet. 250:647-654; Izawa, T., Foster, R., Chua, N.-H. 1993 J. Mol. Biol. 230:1131-1144), the Athb-1 family (Sessa, G., Morelli, G., Ruberti, I. 1993 EMBO J.
- the cro repressor expression plasmid p434cro2 is modified to create pP2croT. This modification lowers the probability of selecting cro repressor variants that repress the expression of the lacZ' and ompA-fusion proteins of pComp by binding to the P2 promoter instead of the target operator DNA sequence.
- the modification is carried out by replacing the lacP promoter of p434cro2 that drives the expression of the cro repressor library with the relevant promoter sequence used in pComp.
- This can be achieved by digesting the p434cro2 plasmid with HindIII and Pvu ⁇ restriction endonucleases and ligating the resultant vector fragment with a synthetic DNA cassette encoding the P2 promoter that can be assembled from the oligonucleotide sequences shown in Figure 6.
- the resultant plasmid is named pP2cro.
- A DNA sequence different from that used in the pComp plasmid as the target operator can be included at the operator position of the pP2croT plasmid to allow counter-selection against possible repressor variants that might bind to an undesired DNA sequence.
- pP2cro can be modified such that the N-terminus of the expressed cro variants is fused with the SV40 T-antigen monopartite NLS having the protein sequence P K K K R K V.
- The fusion of the NLS into pP2cro can be achieved by inserting the SV40 T-antigen monopartite NLS into the third codon of the 434 cro gene using a synthetic oligonucleotide cassette, encoding the DNA between the HindIII and AflII restriction sites of pP2cro, that includes the SV40 T-antigen encoding sequence.
- The DNA sequences of these oligonucleotides are given in Figure 7.
- the resultant plasmid is named pP2croT.
- The complete sequence of plasmid pP2croT is shown in Figure 8.
- Combinatorial libraries of mutants of the cro repressor can be constructed in plasmid pP2croT using synthetic oligonucleotides encoding the DNA between the SacI and BstEII restriction endonuclease sites of the plasmid. Except where the amino acid sequence is to be varied, as indicated in the example below, these oligonucleotides preserve the coding sequence of the cro protein variant expressed from pP2croT.
- An example of a library for use in selecting DNA binding variants of the 434 cro protein varies the amino acids present at the positions corresponding to K27, Q28, Q29, S30 and L33 of the cro protein variant of pP2croT by substituting NNS codons in the oligonucleotides for the unique codons in the respective positions in the cro gene.
- Other codon combinations as well as mutations at other positions can be made.
- the synthesized mutagenic oligonucleotide can be primed with two oligonucleotides for in vitro DNA synthesis reactions using T4 DNA polymerase and dTTP, dGTP, dCTP and dATP and appropriate buffer solutions.
- Figure 9 shows representative sequences of oligonucleotides that can be used. After extraction of the DNA with phenol/chloroform and isopropyl alcohol, the resultant double-stranded DNA cassette can be hydrolyzed with Eco91I, electrophoresed on a 3.8% Metaphor® agarose gel (FMC Corporation), and extracted from the gel using a QIAEX II® gel extraction kit (Qiagen GmbH).
- This double-stranded cassette can then be ligated into the vector-containing fragment of pP2croT obtained from a SacI/Eco91I restriction digestion reaction.
- The nominal size of the resultant pP2croT library is 3.3554432 x 10^7.
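- The nominal library size can be checked with a few lines of arithmetic. The sketch below assumes that each of the five varied positions is randomized with an NNS codon (N = A, C, G or T; S = C or G) and that all sequences are equally likely to be sampled during transformation; the helper names and the Poisson coverage estimate are illustrative additions, not part of the described method.

```python
import math
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): AMINO[i] for i, c in enumerate(product(BASES, repeat=3))}


def nns_codons():
    """All codons matching NNS (third base restricted to C or G)."""
    return [a + b + c for a in "ACGT" for b in "ACGT" for c in "CG"]


def library_size(n_positions):
    """Number of distinct DNA sequences when n_positions codons are NNS."""
    return len(nns_codons()) ** n_positions


def expected_coverage(n_transformants, size):
    """Poisson estimate of the fraction of library members sampled at least once."""
    return 1.0 - math.exp(-n_transformants / size)


codons = nns_codons()
size = library_size(5)                       # five varied positions
encoded = {CODON_TABLE[c] for c in codons}   # amino acids plus '*' for stop
coverage = expected_coverage(1e8, size)      # 10^8 transformants
print(len(codons), size, len(encoded), round(coverage, 3))
```

- Thirty-two NNS codons at five positions give 32^5 = 33,554,432 DNA sequences, which matches the nominal size stated above. Under the equal-sampling assumption, 10^8 transformants would be expected to sample roughly 95% of these sequences.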
- The preparation should be thoroughly desalted and electroporated into electro-competent DH5α Escherichia coli such that at least 10^8 transformants are obtained. These cells are then pooled and allowed to grow in liquid LB ampicillin medium for 8 hours, at which time the plasmid DNA is isolated from the culture.
- the purified plasmid DNA is referred to as pP2croT library 1.
- Escherichia coli cells from an appropriate strain, for example DH5α, are transformed with plasmid pComp and the cells are allowed to grow to mid-exponential stage in medium containing kanamycin.
- the cells are treated with an active protease to remove the surface exposed selection tags encoded by the ompA-tagged fusion protein of the plasmid. This is accomplished by adding the protease trypsin to the cell suspension at a final concentration of >4 mg/ml and incubating the cells for a time that can be determined in preliminary experiments that reduces the amounts of surface exposed ompA- tagged fusion proteins to levels that do not interfere with subsequent selection procedures.
- The cells treated as described above are transformed by electroporation with enough of the pP2croT library 1 DNA to produce > 10^8 ampicillin and kanamycin resistant colony forming units.
- The cells are allowed to recover from the electroporation procedure by the addition of media that includes 0.1 g/l IPTG without antibiotics for 1 hour at 37° and to grow for approximately 1 to 2 generations after addition of ampicillin to 50 µg/ml and kanamycin to 30 µg/ml.
- A preparation of magnetic particle beads (for example Dynabeads M500 subcellular® from Dynal, A.S., Norway) is coated as described by the manufacturer with a streptavidin protein, for example the variant Strep-Tactin® from IBA GmbH, Germany.
- The streptavidin protein variant coated magnetic beads should be washed free of unreacted streptavidin protein by washing at least two times with 2 ml cold buffer containing 100 mM Tris-HCl, 150 mM NaCl, pH 8. After the final wash, the beads should be allowed to settle in a dense slurry. Excess buffer is then removed.
- StrepTactin-coated magnetic beads can be obtained from IBA GmbH. Multi-well plates are available commercially that are used in an automated variation of this approach to increase the throughput of the method.
- The E. coli population containing pP2croT library 1 and the pComp selection plasmid is harvested by centrifugation, washed once and resuspended in 2 ml cold buffer containing 100 mM Tris-HCl, 150 mM NaCl, pH 8.
- Two hundred µl of a slurry of the streptavidin protein variant coated magnetic beads are added to the cells and allowed to incubate for at least 30 minutes.
- the tube containing the mixture then is put into the magnetic particle concentrator device (Dynal MPC®-S) and the liquid culture is removed and saved.
- Six additional, consecutive, step-wise elutions of cells bound to the beads are performed (0.5 ml buffer each containing either 0, 0.1 µM, 1 µM.
- Cro repressor gene containing plasmids are isolated by conventional molecular biological techniques from these cultures. The plasmids are then assayed after individual re-transformation into cells containing reporter plasmids possessing either cauliflower mosaic virus 35S promoter DNA target operators, or reporter plasmids having no target operator DNA. Appropriate controls using cro fusion protein variants not able to bind DNA are assayed in parallel with the former samples. Variants of the cro fusion proteins are identified with these techniques that bind to the target DNA sequence from the cauliflower mosaic virus 35S promoter DNA.
- Additional DNA binding protein variants are selected via binding to different target DNA sequences by individually substituting the DNA sequence given in Figure 4 in plasmid pComp by other target DNA sequences of interest, for example, from the human genome, HIV and other viral genomes, oncogenic papilloma genomes, other plant and plant-viral promoters, breast and prostate and other oncogenes and proto-oncogenes and their promoters, as well as others.
- Application of the techniques given in this example to the new separation-reporter plasmids results in the identification of variants of the DNA binding protein that specifically bind to these additional target DNA sequences.
- Example 3 Use of a separation-screening plasmid with ompA-derived separator gene and lacZ-derived reporter gene for the identification of 434 cro variants that bind cauliflower mosaic virus target sequences using alternative separation-reporter plasmids and alternative selection methods.
- A variation of the selection methods described in Example 2 that can be used to identify DNA binding protein fusions that bind new DNA sequences uses a hexa-histidine protein sequence ("His-Tag") displayed on the surface of the E. coli cell in place of the Strept-Tag® peptide described for plasmid pComp.
- Figure 10 shows the DNA sequence of pDomp.
- This example of a selection-reporter plasmid employs a HisTag used to select for DNA binding protein variants that bind 35S promoter sequences from the cauliflower mosaic virus.
- The sequence of plasmid pDomp is identical to that of plasmid pComp of Example 2 except for the portion that defines the surface displayed tag of the ompA-fusion protein.
- A suitable combinatorial library of mutations of a DNA binding protein domain is constructed as described above for plasmid pP2croT and transformed into cells that contain plasmid pDomp.
- Cells that contain plasmid pDomp and that are competent for electro-transformation and subsequent selection and screening are prepared as described in Example 2 for cells containing plasmid pComp.
- Cells containing variants of the DNA binding protein that encode proteins able to repress the transcription of the genes for the ompA-HisTag separator and reporter proteins of pDomp can be enriched from the total population using either separation methods using Ni-ion chelation or separations using anti-histag specific antibodies.
- Enrichment procedures employing Ni-ion chelation techniques begin with the addition of at least 200 µl of a suspension of Ni-NTA magnetic agarose beads (obtained from Qiagen, Inc.) to the cell suspension containing the DNA binding protein library and pDomp that has been allowed to recover from electro-transformation as described in Example 2. Before addition, the beads are washed as described by the manufacturer and resuspended in 50 mM sodium phosphate containing 30 mM NaCl, pH 8. This cell-magnetic bead suspension is incubated at 4° under mild agitation for 1 hour.
- the suspension is put into a magnetic particle concentrator.
- Six individual, consecutive step-wise elutions of cells bound to the beads are performed using 200 µl of a buffer containing 50 mM sodium phosphate and 30 mM NaCl, pH 8, with 0, 20 mM, 50 mM, 100 mM, 150 mM and 250 mM imidazole.
- Eluates of interest are then plated on IM2 indicator medium containing ampicillin and kanamycin such that between 500 and 1000 individual colonies per plate can be observed. Plates are allowed to incubate and the identification and isolation procedures as described in Example 2 are followed. Variants of the cro fusion proteins can be identified with these techniques that bind to the target DNA sequence from the cauliflower mosaic virus 35S promoter DNA.
- an anti-his-tag antibody coated magnetic bead preparation is first prepared.
- A mouse IgG monoclonal antibody (for example Penta-His Antibody, Qiagen GmbH) is added to a suspension of Dynabeads Pan Mouse IgG (Dynal, Inc.) as described by the bead manufacturer using a ratio of 0.1 to 1 µg IgG per 10^7 beads.
- the tube containing the mixture is then put into the magnetic particle concentrator device (Dynal MPC®-S) and the liquid culture is removed and saved.
- A variation of the selection methods described in Example 2 is used to identify DNA binding protein fusions that bind new DNA sequences and uses a FLAG®-Tag protein epitope sequence (protein sequence DYKDDDDK, TP Hopp, KS Prickett, V Price, RT Libby, CJ March, P Cerritti, DL Urdal, PJ Conlon. BioTechnology 6:1205-1210, 1988) displayed on the surface of the E. coli cell in place of the Strept-Tag® peptide described for plasmid pComp.
- Figure 12 gives the DNA sequence of pEomp, an example of such a selection-reporter plasmid that employs a FLAG®Tag used to select for DNA binding protein variants that bind 35S promoter sequences from the cauliflower mosaic virus.
- The sequence of plasmid pEomp is identical to that of plasmid pComp of Example 2 except for the sequence that defines the surface displayed tag of the ompA-fusion protein.
- A suitable combinatorial library of mutations of a DNA binding protein domain is constructed as described above for plasmid pP2croT and is transformed into cells that contain plasmid pEomp.
- Cells that contain plasmid pEomp and that are competent for electro-transformation and subsequent selection and screening are prepared as described in Example 2 for cells containing plasmid pComp.
- Cells containing variants of the DNA binding protein that encode proteins able to repress the transcription of the genes for the ompA-FLAG®Tag separator and reporter proteins of pEomp are enriched from the total population using separation methods with anti-FLAG®-tag specific antibodies.
- an anti-FLAG®-tag M2 murine antibody coated magnetic bead preparation is first prepared.
- An M2 anti-FLAG®Tag antibody (available from several suppliers) is added to a suspension of Dynabeads Pan Mouse IgG (Dynal, Inc.) as described by the bead manufacturer using a ratio of 0.1 to 1 µg IgG per 10^7 beads.
- the tube containing the mixture is then put into the magnetic particle concentrator device (Dynal MPC®-S) and the liquid culture is removed and saved.
- The approximate number of cells in each of the eluates is then estimated as in Example 2. Aliquots of the eluates of interest (normally those eluted at lower peptide concentrations) are then plated on IM2 indicator medium containing ampicillin and kanamycin such that between 500 and 1000 individual colonies per plate can be observed. Plates are allowed to incubate and the identification and isolation procedures as described in Example 2 are followed. Variants of the cro fusion proteins can be identified with these techniques that bind to the target DNA sequence from the cauliflower mosaic virus 35S promoter DNA.
- F. Alternative Separation Epitope Tag Systems
- Epitope tag examples that can be used are exemplified by but not limited to the HA epitope (protein sequence YPYDVPDYA, HL Niman, RA Houghten, LA Walker, RA Reisfeld, IA Wilson, JM Hogle, RA Lerner. Proc. Natl. Acad. Sci. USA 80:4949-4953, 1983; IA Wilson, HL Niman, RA Houghten, ML Cherenson, ML Connolly, RA Lerner.
- the Glu-Glu epitope (protein sequence EEEEYMPME, T Grussenmeyer, KH Scheidtmann, MA Hutchinson, E Eckhart, G Walter. Proc. Natl. Acad. Sci. USA 82:7952-7054, 1985; B Rubinfeld, S Munemitsu, R Clark, L Conroy, K Watt, W Crosier, F McCormick, P Polakis. Cell 65:1033-1042, 1991)
- KT3 epitope protein sequence PPEPET, H MacArthur, G Walter. J. Virol.
- Example 4 Use of Reporter Gene or Separation-reporter gene plasmids and combinatorial libraries of DNA binding proteins with Fluorescence Activated Cell Sorting (FACS) to separate cells containing repressor variants that bind desired target DNA sequences from those that do not
- galactosidase analogs that increase in fluorescence upon hydrolysis by the β-galactosidase enzyme activity can be used to identify members of combinatorial libraries of mutations of DNA binding proteins that bind to desired target DNA sequences.
- That Escherichia coli can be sorted on the basis of fluorescence intensity has been established for some time (Mia, F., Todd, P., Kompala, D.S. 1993 Biotechnology and Bioengineering 42:708-715).
- Electro-competent cells containing a reporter plasmid as in Example 1 or a separation-reporter plasmid as in Example 2 are prepared as described in these examples and combined with the combinatorial library of mutations of the DNA binding protein, also as described in Examples 1 or 2. After electroporation, cells are allowed to grow at 37° for 90 to 120 minutes in M9 medium containing 0.1 g/l IPTG without antibiotics, at which time 30 µg/ml kanamycin and 50 µg/ml ampicillin are added. Growth at 37° is continued for another 60 minutes.
- Cells are then centrifuged and resuspended in M9 medium with antibiotics that contains 5 µM C12FDG. Staining is allowed to proceed for an additional 90 minutes at 37° in the dark, at which point the cell suspension is made 5 mM in phenylethyl-β-D-thiogalactoside.
- The cells are assayed and sorted on the basis of the fluorescence of the fluorescein moiety using an argon laser at 488 nm in a FACS apparatus.
- the FACS machine should be set to compensate for the intrinsic auto-fluorescence of the cell culture.
- Desired cell fractions are then plated on IM2 indicator medium containing ampicillin and kanamycin such that between 500 and 1000 individual colonies per plate can be observed. Plates are allowed to incubate and the identification and isolation procedures as described in Example 2 are followed. Variants of the cro fusion proteins can be identified with these techniques that bind to the target DNA sequence from the cauliflower mosaic virus 35S promoter DNA.
- fluorescently labeled secondary antibodies can be used in combination with primary antibody labeling of the surface displayed epitope tags.
- cells containing repressor variants that bind desired target D ⁇ A sequences can be separated from those that do not.
- the repressor variants can then be identified using conventional molecular biology techniques.
- Example 5 Methods of creating fusion proteins of DNA binding domains identified in experiments designed to identify new binding specificities with domains capable of directing compartmentalization and with domains capable of enhancing transcriptional repression activities or transcriptional activation activities.
- DNA binding protein fusion protein variants, when cloned into appropriate vectors containing appropriate transcription and translation control sequences, can compete for binding with endogenous general transcription factors in the cells (for example, TATA binding protein) for the general transcription factor binding sequence, thereby decreasing expression from the targeted promoter.
- Sequences adjacent to the general transcription factor binding sequence, when targeted by the DNA binding protein fusion protein variant, can often provide specificity to the variant so that the desired general transcription factor binding site at a specific site in the chromosome can be targeted. In plant cells, for example, fusion protein variants possessing DNA binding domains that bind to sequences from, for example, the cauliflower mosaic virus 35S promoter can be used to decrease gene expression from such promoters.
- fusion protein variants possessing DNA binding domains that bind to can be used to decrease gene expression from the respective promoters. In lower eukaryotes and bacteria similarly, sequence specific DNA binding domains that target general or specific transcription factor binding sites can be used to decrease gene expression from the respective promoters.
- Variants of the cro fusion proteins or other DNA binding protein variants that have been identified that bind, for example, the integrated HIV promoter or the cauliflower mosaic virus 35S promoter, or that bind other targets in other desired promoters that were selected as described above, can be further modified by fusion of transcriptional control domains to the C-terminus or N-terminus of the sequence derived from the mutagenized and selected DNA binding domain protein sequence.
- transcriptional control domains that enhance transcriptional repression properties of fusion proteins in plant cells are exemplified by, but not limited to a) the R2R3 Myb gene of Arabidopsis (AtMYB4 gene, amino acid residue numbers
- This particular domain has been shown, when linked to a DNA binding domain specific for a cis-activating regulatory sequence of a promoter, to activate transcription in both plant and mammalian cells.
- Other domains such as that derived from the N-terminal portion amino acids 39-82 or 41-91 of the Opaque-2 transactivation factor from maize can be fused N-terminally or C-terminally with the DNA binding domain variants that bind the desired target DNA.
- domains as the transactivation domain from the VP16 and GAL4 proteins can be used.
- Figure 13 gives examples of AtMyb4, Oshox1, GBF-1, Opaque2, GAL4 and VP16-derived transcriptional repression and activation domains that can be fused to DNA binding protein fusions to enhance transcription rates.
- transcriptional control proteins that enhance transcription of gene activity that are exemplified by, but not limited to the examples given here that can similarly be used to enhance transcription. Derivatives of these proteins can be fused to DNA binding domains such as derived here to increase transcriptional rates.
- Variants can be created that replace the SV40 T antigen NLS sequences in pP2croT with NLS sequences active in a particular species, for example, the putative nuclear location sequences from Arabidopsis (amino acid residue sequence: KKSRRGPRSR, see for example Figure 1c of Maes et al 2001 The Plant Cell 13:229-244), or other NLS sequences, for example AAKRVKLG, QAKKKKLDK, PKKKRKV, CNSAAFEDLRVLS and MNKIPIKDLLNPQC (Tung, C.-H. and Stein, S. 2000 Bioconjugate Chemistry 11:605-618).
- DNA binding variants can also be constructed that are intended to influence the transcription of sequences for the import of proteins into subcellular organelles such as mitochondria or chloroplasts, where, for example, transcription of organelle specific genes can be influenced.
- DNA binding variants identified and selected as above can be created that use the sequences of dimeric cro fusion protein variant structures combined into single-chain versions of the corresponding dimeric proteins, as exemplified for the 434 repressor (Chen, J.Q., Pongor, S., Simoncsits, A. 1997 Nucleic Acids Res. 25:2047-2054; Simoncsits, A., Chen, J.Q., Peripalle, P., Wang, S, Toro, I., Pongor, S. 1997 J. Mol. Biol. 267:118-131), the λ cro repressor (Jana, R., Hazbun, T.R., Fields, J.D., Mossing, M.C. 2000 Biochemistry 37:6446-6455) and the P22 phage arc repressor (Robinson, C.R., Sauer, R.T. 1996 Biochemistry 35:109-116).
- heterodimeric and single chain variants can be made that incorporate one monomeric structure of a DNA binding protein variant selected as above with a second variant monomeric structure that binds to a target DNA sequence that is different to that of the first variant.
- heterodimeric DNA binding proteins and single-chain variants thereof can be created that possess relatively long, non-palindromic binding sequences made up from the half-sites of the two originally identified homodimers. This method is similar to that taught by (Hollis et al.
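- The way a non-palindromic target arises from two homodimer half-sites can be illustrated with a short sketch. The two operator sequences used here are invented placeholders, and the helper names are hypothetical; the sketch only assumes that each homodimer recognizes an even-length palindromic site whose left and right halves are each contacted by one monomer.

```python
# Illustrative only: the operator sequences below are invented placeholders.
_COMP = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(_COMP)[::-1]


def is_palindromic(seq):
    """True if the sequence equals its own reverse complement."""
    return seq == revcomp(seq)


def composite_site(site_a, site_b):
    """Join the left half-site of site_a to the right half-site of site_b."""
    if len(site_a) != len(site_b) or len(site_a) % 2:
        raise ValueError("sites must have equal, even length")
    half = len(site_a) // 2
    return site_a[:half] + site_b[half:]


site_a = "ACAATTGT"   # hypothetical palindromic operator of homodimer A
site_b = "GGTATACC"   # hypothetical palindromic operator of homodimer B
hetero = composite_site(site_a, site_b)
print(hetero, is_palindromic(hetero))
```

- Each input site is its own reverse complement, whereas the composite site is not, which is the property a heterodimer or single-chain fusion of the two monomers would be expected to exploit.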
- Promoter activity of a separator gene/reporter gene polycistron or the promoter activity of a single reporter or separator gene can be optimized to a desired repressor protein by the following methods. It can be observed that strong or weak phenotypes of genes used for separation or reporter activities can mask some combinations of repressor operator transcriptional repression that can be observed when other promoters are utilized. In order not to miss any candidates in selection experiments that use such non-optimized promoter reporter gene combinations, initial optimizations can be performed in a routine manner.
- Plasmid pZ434OR3 can be used to illustrate the methods. Plasmid pZ434OR3 (Figure 13) possesses a nearly full length lacZ reporter gene with a relatively strong lacZ phenotype in comparison to the lacZ' α-complementation reporter gene used in plasmid pP2HIV1. Plasmid pZ434OR3 also possesses a promoter combined with a target operator equivalent to the OR3 operator of phage 434 (sequence AGATCTAAGT TAGTGTATTG ACATGATAGA AGCACTCTAC TATATTCCTA GGAACAGTTT TTCTTGT).
- the promoter sequence was optimized to the strong lacZ-phenotype by first combinatorially mutagenizing it at several bases within the -10 Pribnow Schaller box and in the -35 consensus sequence (Pribnow, D. 1975 J. Mol. Biol. 99, 419-443; Schaller, H., Gray, C, Herrmann, K. 1975 Proc. Natl. Acad. Sci. USA 72:737-741). This was accomplished using cassette mutagenesis.
- the DNA cassettes for the to-be-optimized promoter of pZ434OR3 were constructed with oligonucleotides synthesized with degeneracies at positions within the -35 and -10 consensus sequences.
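- The size of a promoter library built from degenerate oligonucleotides follows directly from the degenerate positions. The sketch below expands IUPAC degeneracy codes into the individual sequences they represent. The two degenerate hexamers are hypothetical examples written around the E. coli -35 (TTGACA) and -10 (TATAAT) consensus sequences; they are not the degeneracies used for pZ434OR3, which the text does not specify.

```python
from itertools import product

# IUPAC nucleotide degeneracy codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand(oligo):
    """List every concrete sequence encoded by a degenerate oligonucleotide."""
    choices = (IUPAC[base] for base in oligo.upper())
    return ["".join(p) for p in product(*choices)]


# Hypothetical degenerate boxes (assumptions for illustration only).
minus35 = expand("TTGNCA")   # one degenerate position in the -35 box
minus10 = expand("TANNAT")   # two degenerate positions in the -10 box
n_promoters = len(minus35) * len(minus10)
print(len(minus35), len(minus10), n_promoters)
```

- With these example degeneracies the cassette library would contain 4 x 16 = 64 promoter variants, a number small enough to be screened exhaustively on indicator plates.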
- The combinatorial library of promoter mutations can be reconstructed from the mutagenized promoter cassettes and double restricted (BglII and StyI) pZ434OR3.
- The religated plasmid can then be transformed into an E. coli strain with a lacZ- phenotype and plated on IM2 plates containing kanamycin. Colonies can be picked that show lacZ+ phenotypes and plasmids can be prepared from overnight cultures made from these colonies.
- Plasmids can then be transformed into strains containing a repressor protein known to be able to bind and repress the target operator present in the plasmid. Colonies with optimally repressed lacZ phenotypes can then be isolated, plasmid can be purified, and the sequence of the optimized promoter mutant can be determined by DNA sequencing techniques.
- Example 7 Determination and optimization of the ideal distance between the promoter and target operator in the selection-reporter plasmid
- An optimal distance between the target operator and the promoter used to drive transcription of the separation-reporter gene polycistron or of single separation or reporter genes can be experimentally determined by the following techniques.
- A series of separation-reporter gene plasmids can be constructed from the to-be-optimized plasmid, such as pP2HIV1, by restriction at, for example, the StyI site between the promoter and operator of the plasmid.
- DNA polymerase fill-in reactions and synthetic cassette and/or linker DNA re-ligations can be performed to generate a series of plasmids that have different DNA sequences and numbers of base pairs between the promoter and operator sequences. The different distances, when unknown, can be experimentally determined by DNA sequencing techniques.
- DNA binding proteins can be desirable to use in combination with the separation and reporter techniques exemplified above for the identification of new variants that bind to desired DNA sequences.
- the structure of the protein can be optimized so that target DNA sequences can be subsequently identified.
- Homeodomain proteins are large DNA binding proteins involved in transcriptional control and development in eukaryotic cells that contain a relatively small domain (ca. 60 amino acid residues) that binds DNA. These small homeodomains can be expressed as relatively stable proteins and can be used as DNA binding domains that can repress transcription from target operators present in the reporter-separation plasmids described here.
- An example of such a homeodomain protein can be constructed from the vnd/NK-2 homeodomain proteins first described in Kim, Y. and Nirenberg, M. 1989 Proc. Natl. Acad. Sci.
- Figure 14 gives an example of a plasmid that expresses a vnd/NK-2 homeodomain.
- a modified reporter separation plasmid such as pComp
- a modified reporter plasmid such as pP2HIV1
- NK-2 binding sequences 5'ACTTGAGG
- Several combinatorial libraries of mutations of the NK-2 homeodomain can be made, for example at positions corresponding to R5, K45, I46, Q50, H52, R53, Y54, and/or T56 (numbering as in the consensus homeodomain from Gehring et al., ibid., and Weiler, S. Gruschus, J.M., Tsao, D.H.H., Yu, L., Wang, L.-H., Nirenberg, M., Ferretti, J.A. 1998 J. Biol. Chem.
- HDLZ proteins are transcription factors that contain both a homeodomain and a leucine zipper dimerization domain (Sessa, G. Morelli, G. Ruberti, I. 1993 EMBO J. 12:3507-3517) and that most likely function in vivo as homodimeric or heterodimeric oligomers.
- Although HDLZ-proteins have only been identified in plants, the small nature of the two domains, their relatively stable independent domain nature and the fact that leucine zipper domains and homeodomains are likely present in every eukaryotic organism will allow skilled artisans to "mix and match" these two domain types to create new HDLZ proteins from DNA/protein sequences from within any desired species. This will be especially important when new transcriptional control proteins are desired that are not transgenic or that should elicit only minimal immunological responses from a given species.
- HDLZ proteins can also be expressed as relatively stable proteins and can be used as homo- or heterodimeric DNA binding domains that can repress transcription from target operators present in the reporter-separation plasmids described here.
- An example of such an HDLZ protein that can be used with the methods presented here can be constructed from, for example, the ATHB-1 or ATHB-2 proteins described in Sessa, G. Morelli, G. Ruberti, I. 1993 EMBO J. 12:3507-3517 and Sessa, G. Morelli, G. Ruberti, I. 1997 J. Mol. Biol. 274:303-309.
- Figure 15 gives an example of a plasmid, pHDLZl, that expresses an ATHB-1 HDLZ-fusion protein.
- Several combinatorial libraries of mutations of the ATHB-1 HDLZ-fusion protein can be made, for example at positions 45, 46, 50, 52, 53, 54, and/or 56 corresponding to the homeodomain consensus sequence numbering from Gehring et al. (ibid.), that can be used with separation-reporter gene plasmids or reporter gene plasmids having optimally located target sequences in the methods described above to identify new variants of the HDLZ fusion that bind new DNA target sequences.
- Leucine zipper domain variants of the above HDLZ fusion proteins can be made that preferentially form heterodimeric or homodimeric structures.
- Zinc finger proteins can be used with the methods described here for the identification of new variants that bind altered target DNA sequences.
- An example of such a zinc finger protein with three individual fingers is the Zif268 immediate early protein (Pavletich, N.P. and Pabo, C.O. 1991 Science 252:809-812).
- A plasmid, for example pKFZif (Figure 16), that encodes a truncated Zif268 protein can be used to create and express combinatorial libraries of Zif268 variants that can be used with the methods described here to identify DNA binding specificity variants of a desired sequence specificity.
- The sites to be mutagenized by combinatorial methods are residues 1, 2, 3, 5 and 6 of the individual zinc finger α-helices as well as residue -1, which immediately precedes each zinc finger α-helix.
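The position scheme above can be made concrete with a short sketch. It marks the randomized residues in a seven-residue window spanning helix positions -1 to 6 and compares the theoretical size of a single-finger library with that of randomizing all three fingers at once. The example window `RSDELTR` is the recognition helix commonly reported for Zif268 finger 1 and is an assumption here, as are the function names and the use of all 20 amino acids at each site.

```python
# Helix positions randomized in the text; position 4 is left unchanged.
RANDOMIZED_POSITIONS = (-1, 1, 2, 3, 5, 6)
# Positions covered by a 7-residue window (there is no position 0).
WINDOW_POSITIONS = (-1, 1, 2, 3, 4, 5, 6)


def mask_helix(window, wildcard="X"):
    """Replace randomized residues of a -1..6 window with a wildcard."""
    if len(window) != len(WINDOW_POSITIONS):
        raise ValueError("window must cover helix positions -1 to 6")
    return "".join(
        wildcard if pos in RANDOMIZED_POSITIONS else residue
        for pos, residue in zip(WINDOW_POSITIONS, window)
    )


def library_size(n_fingers, amino_acids=20):
    """Theoretical protein diversity when n_fingers are randomized together."""
    return amino_acids ** (len(RANDOMIZED_POSITIONS) * n_fingers)


if __name__ == "__main__":
    print(mask_helix("RSDELTR"))  # assumed Zif268 finger 1 window
    print(library_size(1))        # one finger at a time
    print(library_size(3))        # all three fingers at once
```

The gap between the two counts illustrates why screening one finger at a time is attractive.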
- A 9 bp target DNA sequence is cloned between the StyI and KpnI sites of the separation-reporter plasmid pComp and also of a reporter plasmid like pP2HIN1.
- The three finger protein is optimized in three steps, each step being a library screen of one individual finger versus a target sequence chimera made from part of the desired sequence and part of the Zif268 consensus binding sequence. These chimeras are constructed such that in the first screen, a library of the first finger is screened versus a target chimera containing three to four bases of the desired sequence combined with six to seven bases of the Zif268 binding sequence. Consecutive screens of libraries of the remaining fingers versus desired sequences at the appropriate subsites, combined with known binding sequences for the remaining fingers, yield individual finger variants that are specific for the desired 9 bp sequence when combined in the appropriate order.
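The chimera construction can be sketched as follows. The sketch assumes the commonly cited Zif268 consensus site 5'-GCGTGGGCG-3', assumes that finger 1 contacts the 3' triplet and finger 3 the 5' triplet, and uses exactly three desired bases per chimera (the text allows three to four). The desired sequence and the function name are hypothetical.

```python
# Assumption: commonly cited Zif268 consensus site, written 5' to 3'.
ZIF268_SITE = "GCGTGGGCG"

# Assumption: finger 1 binds the 3' triplet, finger 3 the 5' triplet.
FINGER_SUBSITE = {1: slice(6, 9), 2: slice(3, 6), 3: slice(0, 3)}


def chimera_for_finger(desired, finger, consensus=ZIF268_SITE):
    """Consensus site with one finger's triplet taken from the desired site."""
    if len(desired) != 9 or len(consensus) != 9:
        raise ValueError("both sites must be 9 bp long")
    subsite = FINGER_SUBSITE[finger]
    bases = list(consensus)
    bases[subsite] = desired[subsite]
    return "".join(bases)


if __name__ == "__main__":
    desired_site = "GACTTCAAG"  # hypothetical desired 9 bp target
    for finger in (1, 2, 3):
        print(finger, chimera_for_finger(desired_site, finger))
```

This simplified version swaps in only one subsite per chimera; later screens could instead carry forward subsites for which a finger variant has already been selected.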
- The repression of these target sequences preferably is optimized by including mismatches in the Zif268 binding site sequences. This embodiment reduces the affinity of the protein, lowering its ability to repress the target chimera. Variants with a higher affinity for the target chimera can then be identified; their gain comes from an increased affinity for the desired subsite.
- D. Optimization of a zinc-finger homeodomain DNA binding protein fusion for use with a separation reporter plasmid.
- Zinc finger homeodomain fusion proteins described elsewhere are useful in the methods described here for the identification of new variants that bind altered target DNA sequences.
- An example of such a zinc finger homeodomain fusion protein is that reported by Pomerantz, J.L., Sharp, P.A., Pabo, C.O. 1995 Science 267:93-96.
- A plasmid, for example pZFHD (Figure 17), that encodes a similar zinc finger homeodomain fusion protein is used with the methods described above to identify DNA binding specificity variants of a desired sequence specificity.
- Target DNA sequences that reflect the partial subsites of the high affinity 5'TAATGATGGGCG sequence known for the ZFHD are sequentially identified from libraries of the zinc fingers and homeodomain and combined into the final desired target sequence.
- DNA binding protein fusions can be constructed such that dimerization of monomers will occur. This can be advantageous for certain selections using palindromic and partial palindromic target sequences. Optimization of the distance between half sites can be performed using known partial site binding sequences as described above.
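As an illustration of half-site spacing, the sketch below builds palindromic targets from one half site, a spacer of variable length, and the reverse complement of the half site. The half site `TGGGCG`, the use of `N` as a spacer placeholder, and the function names are assumptions chosen for the example and are not taken from the text.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def palindromic_targets(half_site, spacer_lengths):
    """Map each spacer length to half site + spacer + inverted half site."""
    inverted = reverse_complement(half_site)
    return {n: half_site + "N" * n + inverted for n in spacer_lengths}


if __name__ == "__main__":
    # Hypothetical half site; spacer lengths 0 to 4 bp are arbitrary choices.
    for n, target in palindromic_targets("TGGGCG", range(5)).items():
        print(n, target)
```

Each target in the resulting set could be cloned as a separate operator so that the spacing giving the strongest response can be compared across constructs.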
- A zinc finger-leucine zipper fusion can be constructed that can be used with the methods described here for the identification of new variants that bind altered target DNA sequences.
- An example of such a zinc finger-leucine zipper fusion protein is given in Figure 18.
- A zinc finger-homeodomain-leucine zipper fusion can similarly be constructed from, for example, Zif zinc fingers 1 and 2 and the ATHB-1 homeodomain-leucine zipper domains, and can be used with the methods described here for the identification of new variants that bind altered target DNA sequences.
- An example of such a zinc finger-homeodomain-leucine zipper fusion protein is given in Figure 19.
- DNA binding domain-hormone dependent dimerization domain fusion proteins like those used by, for example, Braselmann et al. (1993), Wang et al. (1994) and Beerli et al. (2000) can also be constructed that, when complexed with an appropriate small molecule compound, undergo dimerization that can lead to increases in DNA binding affinity and specificity.
- A plasmid vector like pP2croT that encodes a small molecule-dependent dimeric DNA binding protein composed of a progesterone-dependent dimerization domain fused to a zinc finger DNA binding domain is shown in Figure 20.
- This plasmid, pZFPR1, can be used in experiments to identify variants that bind desired DNA target sequences when screening and selection experiments are performed in the presence of an appropriate progesterone analog like RU486.
- An analogous example using a zinc-finger fusion with an estrogen receptor dimerization domain is given in Example 22 (pZFER1).
- This plasmid, which encodes a DNA binding domain-estrogen dependent dimerization domain fusion protein, can be used in the presence of estrogen analogs to similarly identify variants that bind desired DNA target sequences.
- DNA binding domains can be fused with peptides that direct the dimerization of proteins, such as those found in Wang, B.S. and Pabo, C.O. 1999 Proc. Natl. Acad. Sci. USA 96:9568-9573, to create fusion proteins that can be used with the methods described here for the identification of new variants that bind altered target DNA sequences.
- Example 9 Use of DNA binding dependent transcriptional activation and separation-reporter gene expression to identify new DNA binding variants that bind new DNA sequences.
- The enhancement of transcription of separation tag genes, such as those present in plasmids like pComp, that are cloned behind target DNA binding sites recognized by DNA binding protein domain-protein interaction domain fusions can be used to identify new DNA binding variants from combinatorial libraries made from the DNA binding domain.
- Yeast and bacterial 2- and 1-hybrid systems are known (Chien, C.-T., Bartel, P.L., Sternglanz, R., Fields, S. 1991 Proc. Natl. Acad.
- Plasmid pPN4 is a derivative of plasmid pKFZif that encodes a yeast GAL11P fused to the Zif268 DNA binding domain and an RNA polymerase alpha subunit (rpoA)-yeast GAL4 protein fusion in a second cistron.
- Libraries of zinc fingers in pPN4 can be constructed as described in Examples 8 C, D and E.
- A derivative of pComp, pOLH4a, can be constructed that has a weak promoter for the separation tag gene and reporter gene (the same structural genes as used in Example 2, pComp) and an independent cistron that encodes the yeast HIS3 gene with the same weak bacterial promoter. Transcription of both structural gene sets can be activated by the Zif-RpoA fusion protein encoded on pPN4.
- The strength of the weak promoters present and their relative positions in the pOLH4a plasmid can be optimized by the methods in Examples 6 and 7.
- Each of the relevant structural gene sets is isolated in pOHL2 with transcriptional termination sequences, and each is bounded on its transcriptionally upstream side by a desired target operator sequence for Zif-RpoA fusion proteins or zinc-finger variants thereof produced as described in Example 8C.
- the pPN4 libraries are combined with plasmid pOLH4a in lacZ- hisB- E. coli cells or when pPN4 libraries that are transformed with lacZ- hisB- E.
Landscapes
- Health & Medical Sciences (AREA)
- Genetics & Genomics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Chemical & Material Sciences (AREA)
- Organic Chemistry (AREA)
- Engineering & Computer Science (AREA)
- Zoology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Biotechnology (AREA)
- General Engineering & Computer Science (AREA)
- Wood Science & Technology (AREA)
- Biomedical Technology (AREA)
- Molecular Biology (AREA)
- General Health & Medical Sciences (AREA)
- Biophysics (AREA)
- Biochemistry (AREA)
- Physics & Mathematics (AREA)
- Microbiology (AREA)
- Plant Pathology (AREA)
- Cell Biology (AREA)
- Toxicology (AREA)
- Gastroenterology & Hepatology (AREA)
- Medicinal Chemistry (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- Micro-Organisms Or Cultivation Processes Thereof (AREA)
- Peptides Or Proteins (AREA)
- Preparation Of Compounds By Using Micro-Organisms (AREA)
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
AU2002225623A AU2002225623A8 (en) | 2000-11-17 | 2001-11-16 | Creation and identification of proteins having new dna binding specificities |
US10/416,708 US20040161753A1 (en) | 2001-11-16 | 2001-11-16 | Creation and identification of proteins having new dna binding specificities |
AU2002225623A AU2002225623A1 (en) | 2000-11-17 | 2001-11-16 | Creation and identification of proteins having new dna binding specificities |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US24954600P | 2000-11-17 | 2000-11-17 | |
US60/249,546 | 2000-11-17 |
Publications (2)
Publication Number | Publication Date |
---|---|
WO2002040632A2 true WO2002040632A2 (en) | 2002-05-23 |
WO2002040632A3 WO2002040632A3 (en) | 2009-06-11 |
Family
ID=22943953
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US2001/043107 WO2002040632A2 (en) | 2000-11-17 | 2001-11-16 | Creation and identification of proteins having new dna binding specificities |
Country Status (2)
Country | Link |
---|---|
AU (2) | AU2002225623A1 (en) |
WO (1) | WO2002040632A2 (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5747253A (en) * | 1991-08-23 | 1998-05-05 | Isis Pharmaceuticals, Inc. | Combinatorial oligomer immunoabsorbant screening assay for transcription factors and other biomolecule binding |
US5773218A (en) * | 1992-01-27 | 1998-06-30 | Icos Corporation | Method to identify compounds which modulate ICAM-related protein interactions |
US5871902A (en) * | 1994-12-09 | 1999-02-16 | The Gene Pool, Inc. | Sequence-specific detection of nucleic acid hybrids using a DNA-binding molecule or assembly capable of discriminating perfect hybrids from non-perfect hybrids |
US5955280A (en) * | 1995-04-11 | 1999-09-21 | The General Hospital Corporation | Reverse two-hybrid system |
-
2001
- 2001-11-16 WO PCT/US2001/043107 patent/WO2002040632A2/en not_active Application Discontinuation
- 2001-11-16 AU AU2002225623A patent/AU2002225623A1/en not_active Abandoned
- 2001-11-16 AU AU2002225623A patent/AU2002225623A8/en not_active Abandoned
Also Published As
Publication number | Publication date |
---|---|
AU2002225623A8 (en) | 2009-07-30 |
WO2002040632A3 (en) | 2009-06-11 |
AU2002225623A1 (en) | 2002-05-27 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
KR101268939B1 (en) | Phage display using cotranslational translocation of fusion polypeptides | |
EP0610448B1 (en) | Peptide library and screening method | |
US7416847B1 (en) | In vitro peptide or protein expression library | |
US5989814A (en) | Screening methods in eucaryotic cells | |
KR20190082318A (en) | CRISPR / CPF1 system and method | |
EP2022856A2 (en) | Improvements in or relating to binding proteins for recognition of DNA | |
AU714424B2 (en) | In vivo selection of RNA-binding peptides | |
RU2702087C2 (en) | New methods for displaying cyclic peptides on bacteriophage particles | |
JPS6229966A (en) | Pho a-variant e. coli-sb44, its production, isolation of phoa-gene-containing plasmid utilizing said e. coli, productionof novel plasmid, plasmid vector and pho a-gene- containing plasmid, production of novel strain and production of alkaliphosphatase | |
EP1053347B1 (en) | Peptide detection method | |
JP7521028B2 (en) | Method for selecting cells based on CRISPR/Cas-controlled incorporation of a detectable tag into a target protein | |
EP2190989B1 (en) | Method for manufacturing a modified peptide | |
US8852866B2 (en) | Method for producing and identifying soluble protein domains | |
AU2002341204A1 (en) | Method for producing and identifying soluble protein domains | |
WO2002040632A2 (en) | Creation and identification of proteins having new dna binding specificities | |
EP0995797A1 (en) | Methods for detecting and isolating nuclear transport proteins | |
US20020045188A1 (en) | Methods for validating polypeptide targets that correlate to cellular phenotypes | |
Zhang et al. | Affinity Selection of DNA-Binding Proteins Displayed on Bacteriophage λ | |
Harada et al. | Screening RNA-binding libraries using a bacterial transcription antitermination assay | |
EP3448873B1 (en) | Engineered fha domains | |
KR20030062510A (en) | Method for screening of protein-protein interaction by double library two-hybrid system and the recombinant yeast strains used for the system | |
Bhat et al. | The tobacco BY-2 cell line as a model system to understand in planta nuclear coactivator interactions | |
AU1009499A (en) | Eukaryotic cell-based system for identifying gene modulators | |
KR20070035499A (en) | Process for producing polypeptide | |
WO2007019603A1 (en) | A method for detecting the interaction of a polypeptide of interest with a target nucleotide sequence |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AK | Designated states | Kind code of ref document: A2 Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PH PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG US UZ VN YU ZA ZW |
AL | Designated countries for regional patents | Kind code of ref document: A2 Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG |
121 | Ep: the epo has been informed by wipo that ep was designated in this application | ||
NENP | Non-entry into the national phase in: | Ref country code: DE |
DFPE | Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101) | ||
REG | Reference to national code | Ref country code: DE Ref legal event code: 8642 |
WWE | Wipo information: entry into national phase | Ref document number: 10416708 Country of ref document: US |
32PN | Ep: public notification in the ep bulletin as address of the adressee cannot be established | Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 69(1) EPC FORM 1205A OF 10-09-03 |
122 | Ep: pct application non-entry in european phase | ||
NENP | Non-entry into the national phase in: | Ref country code: JP |
WWW | Wipo information: withdrawn in national office | Country of ref document: JP |