AU2733595A - N-terminally extended proteins expressed in yeast

Info

Publication number: AU2733595A
Application number: AU27335/95A
Authority: AU
Inventors: Jakob Brandt; Thomas Borglum Kjeldsen; Knud Vad
Original assignee: Novo Nordisk AS
Current assignee: Novo Nordisk AS
Priority date: 1994-06-17
Filing date: 1995-06-16
Publication date: 1996-01-15
Also published as: IL114160A0; NO965411L; JP3730255B2; BR9508053A; ZA954983B; EP0765395A1; NO965411D0; IL114160A; CZ363796A3; SG47883A1; CA2192943A1; WO1995035384A1; PL317761A1; FI965030A0; HU9603476D0; MX9606536A; CN1152942A; FI965030A; HUT75840A; JPH10501695A

Description

N-terminally Extended Proteins Expressed in Yeast

FIELD OF INVENTION

The present invention relates to polypeptides expressed and processed in yeast, a DNA construct comprising a DNA sequence encoding such polypeptides, vectors carrying such DNA fragments and yeast cells transformed with the vectors, as well as a process of producing heterologous proteins in yeast.

BACKGROUND OF THE INVENTION

Yeast organisms produce a number of proteins synthesized intracellularly, but having a function outside the cell. Such extracellular proteins are referred to as secreted proteins. These secreted proteins are expressed initially inside the cell in a precursor or a pre-form containing a presequence ensuring effective direction of the expressed product across the membrane of the endoplasmic reticulum (ER). The presequence, normally named a signal peptide, is generally cleaved off from the desired product during translocation. Once it has entered the secretory pathway, the protein is transported to the Golgi apparatus. From the Golgi the protein can follow different routes that lead to compartments such as the cell vacuole or the cell membrane, or it can be routed out of the cell to be secreted to the external medium (Pfeffer, S.R. and Rothman, J.E. Ann.Rev.Biochem. 56 (1987), 829-852).

Several approaches have been suggested for the expression and secretion in yeast of proteins heterologous to yeast. European Publication No. 0088632A describes a process by which proteins heterologous to yeast are expressed, processed and secreted by transforming a yeast organism with an expression vector harbouring DNA encoding the desired protein and a signal peptide, preparing a culture of the transformed organism, growing the culture and recovering the protein from the culture medium. The signal peptide may be the desired protein's own signal peptide, a heterologous signal peptide or a hybrid of native and heterologous signal peptide.

A problem encountered with the use of signal peptides heterologous to yeast might be that the heterologous signal peptide does not ensure efficient translocation and/or cleavage after the signal peptide.

The Saccharomyces cerevisiae MFα1 (α-factor) is synthesized as a prepro form of 165 amino acids comprising a 19 amino acids long signal- or prepeptide followed by a 64 amino acids long "leader" or propeptide, encompassing three N-linked glycosylation sites followed by (LysArg(Asp/Glu, Ala)₂₋₃α-factor)₄ (Kurjan, J. and Herskowitz, I. Cell 30 (1982), 933-943). The signal-leader part of the preproMFα1 has been widely employed to obtain synthesis and secretion of heterologous proteins in S. cerevisiae.

Use of signal/leader peptides homologous to yeast is known from i.a. US Patent No. 4,546,082, European Publications Nos. 0116201A, 0123294A, 0123544A, 0163529A, and 0123289A and European Patent No. 0100561B.

In EP 0123289A utilization of the S. cerevisiae α-factor precursor is described, whereas EP 100561 describes the utilization of the Saccharomyces cerevisiae PHO5 signal and PCT Publication No. WO 95/02059 describes the utilization of the YAP3 signal peptide for secretion of foreign proteins.

US Patent No. 4,546,082 and European Publications Nos. 0116201A, 0123294A, 0123544A, and 0163529A describe processes by which the α-factor signal-leader from Saccharomyces cerevisiae (MFα1 or MFα2) is utilized in the secretion process of expressed heterologous proteins in yeast. By fusing a DNA sequence encoding the S. cerevisiae MFα1 signal/leader sequence at the 5' end of the gene for the desired protein, secretion and processing of the desired protein was demonstrated.

EP 206,783 discloses a system for the secretion of polypeptides from S. cerevisiae whereby the α-factor leader sequence has been truncated to eliminate the four α-factor peptides present on the native leader sequence so as to leave the leader peptide itself fused to a heterologous polypeptide via the α-factor processing site Lys-Arg-Glu-Ala-Glu-Ala. This construction is indicated to lead to efficient processing of smaller peptides (less than 50 amino acids). For the secretion and processing of larger polypeptides, the native α-factor leader sequence has been truncated to leave one or two α-factor peptides between the leader peptide and the polypeptide.

A number of secreted proteins are routed so as to be exposed to a proteolytic processing system which can cleave the peptide bond at the carboxy end of two consecutive basic amino acids. This enzymatic activity is in S. cerevisiae encoded by the KEX 2 gene (Julius, D.A. et al., Cell 37 (1984b), 1075). Processing of the product by the KEX 2 protease is needed for the secretion of active S. cerevisiae mating factor α1 (MFα1 or α-factor) but is not involved in the secretion of active S. cerevisiae mating factor a.

Secretion and correct processing of a polypeptide intended to be secreted is obtained in some cases when culturing a yeast organism which is transformed with a vector constructed as indicated in the references given above. In many cases, however, the level of secretion is very low or there is no secretion, or the proteolytic processing may be incorrect or incomplete. As described in PCT Publication No. WO 90/10075 this is believed to be ascribable, to some extent, to an insufficient exposure of the processing site present between the C-terminal end of the leader peptide and the N-terminal end of the heterologous protein so as to render it inaccessible or, at least, less accessible to proteolytic cleavage.

WO 90/10075 describes a yeast expression system with improved processing of a heterologous polypeptide obtained by providing certain modifications near the processing site at the C-terminal end of the leader peptide and/or the N-terminal end of a heterologous polypeptide fused to the leader peptide. Thus, it is possible to obtain a higher yield of the correctly processed protein than is obtainable with unmodified leader peptide-heterologous polypeptide constructions.

SUMMARY OF THE INVENTION

The present invention describes modifications of the N-terminal end of the heterologous polypeptide designed as extensions which can be cleaved off either by naturally occurring yeast proteases before purification from the culture media or by in vitro proteolysis during or subsequently to purification of the product from the culture media.

The present invention relates to a DNA construct encoding a polypeptide having the following structure

signal peptide-leader peptide-X¹-X²-X³-X⁴-X⁵-X⁶-X⁷-heterologous protein

wherein

X¹ is Lys or Arg;

X² is Lys or Arg, X¹ and X² together defining a yeast processing site;

X³ is Glu or Asp;

X⁴ is a sequence of amino acids with the following structure

(A-B)ₙ

wherein A is Glu or Asp, B is Ala, Val, Leu or Pro, and n is 0 or an integer from 1 to 5, and when n≥2 each A and B is the same or different from the other A(s) and B(s), or X⁴ is a sequence of amino acids with the following structure

(C)ₘ

wherein C is Glu or Asp, and m is 0 or an integer from 1 to 5;

X⁵ is a peptide bond or is one or more amino acids which may be the same or different;

X⁶ is a peptide bond or an amino acid residue selected from the group consisting of Pro, Asp, Thr, Glu, Ala and Gly; and

X⁷ is Lys or Arg.
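The structural definition above can be read as a pattern over amino acid sequences. The following Python sketch is one illustrative way to check whether a candidate extension X³-X⁷ fits that definition; the names `THREE_TO_ONE`, `EXTENSION_PATTERN`, `to_one_letter` and `is_valid_extension` are hypothetical and are not part of the specification. Because X⁵ may be any run of residues, the pattern in practice mainly constrains the first residue (X³) and the last residue (X⁷).

```python
import re

# Three-letter to one-letter codes for the residues named in the definition
# and in the example extensions (illustrative subset, not all 20 residues).
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asp": "D", "Glu": "E", "Gly": "G",
    "Ile": "I", "Leu": "L", "Lys": "K", "Pro": "P", "Ser": "S",
    "Thr": "T", "Val": "V",
}

# X3: Glu/Asp
# X4: (A-B)n with A = Glu/Asp, B = Ala/Val/Leu/Pro, n = 0..5, or (C)m with C = Glu/Asp, m = 0..5
# X5: peptide bond or any residues
# X6: peptide bond or Pro/Asp/Thr/Glu/Ala/Gly
# X7: Lys/Arg
EXTENSION_PATTERN = re.compile(
    r"[ED]"
    r"(?:(?:[ED][AVLP]){0,5}|[ED]{0,5})"
    r"[A-Z]*"
    r"[PDTEAG]?"
    r"[KR]"
)


def to_one_letter(three_letter_sequence):
    """Convert 'Glu Glu Ala ...' into 'EEA...'."""
    return "".join(THREE_TO_ONE[res] for res in three_letter_sequence.split())


def is_valid_extension(three_letter_sequence):
    """Return True if the sequence fits the X3-X4-X5-X6-X7 definition."""
    one_letter = to_one_letter(three_letter_sequence)
    return EXTENSION_PATTERN.fullmatch(one_letter) is not None
```

For instance, `is_valid_extension("Glu Glu Ala Glu Ala Glu Ala Glu Pro Lys")` returns `True`, whereas a sequence that does not start with Glu or Asp, or does not end with Lys or Arg, is rejected.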

In the present context, the term "signal peptide" is understood to mean a presequence which is predominantly hydrophobic in nature and present as an N-terminal sequence on the precursor form of an extracellular protein expressed in yeast. The function of the signal peptide is to allow the heterologous protein to be secreted to enter the endoplasmic reticulum. The signal peptide is normally cleaved off in the course of this process. The signal peptide may be heterologous or homologous to the yeast organism producing the protein but a more efficient cleavage of the signal peptide may be obtained when it is homologous to the yeast organism in question.

The expression "leader peptide" is understood to indicate a predominantly hydrophilic peptide whose function is to allow the heterologous protein to be secreted to be directed from the endoplasmic reticulum to the Golgi apparatus and further to a secretory vesicle for secretion into the medium (i.e. exportation of the expressed protein or polypeptide across the cell wall or at least through the cellular membrane into the periplasmic space of the cell). The expression "heterologous protein" is intended to indicate a protein or polypeptide which is not produced by the host yeast organism in nature.

X³-X⁴-X⁵-X⁶-X⁷ together form an extension at the N-terminal of the heterologous polypeptide. This extension not only increases the fermentation yield but is also, owing to the presence of X³, protected against dipeptidyl aminopeptidase (DPAP A) processing, resulting in a homogeneous N-terminal of the polypeptide. The extension may be constructed in such a way that it will be cleaved off by naturally occurring yeast proteases other than DPAP such as, for instance, yeast aspartic protease 3 (YAP3) before purification of the heterologous protein product from the culture media. Alternatively, the extension may be constructed in such a way that it is resistant to proteolytic cleavage during fermentation so that the N-terminally extended heterologous protein product may be purified from the culture media for subsequent in vitro maturation, e.g. by trypsin, Achromobacter lyticus protease I or enterokinase.

In a still further aspect, the invention relates to a process for producing a heterologous protein in yeast, comprising cultivating the transformed yeast strain in a suitable medium to obtain expression and secretion of the heterologous protein, after which the protein is isolated from the medium.

DETAILED DESCRIPTION OF THE INVENTION

In the peptide structure (A-B)ₙ, n is preferably 2-4 and more preferably 3.

In preferred polypeptides according to the invention X³ may be Glu, A may be Glu, B may be Ala, X⁵ may be a peptide bond or Glu, or Glu Pro Lys Ala, or X⁶ may be Pro or a peptide bond. Examples of possible N-terminal extensions X³-X⁴-X⁵-X⁶-X⁷ are:

Glu Glu Ala Glu Ala Glu (Pro/Ala) (Glu/Lys) (Ala/Glu/Lys/Thr) Arg Ala Pro Arg,

Glu Glu Ala Glu Ala Glu Pro Lys Ala (Thr/Pro) Arg,

Glu Glu Ala Glu Ala Glu Ala Glu Pro Arg,

Glu Glu Ala Glu Ala Glu Ala Glu Pro Lys,

Glu Glu Ala Glu Ala Glu Ala Glu Arg,

Glu Glu Ala Glu Ala Glu Ala (Asp/Ala/Gly/Glu) Lys,

Glu Glu Ala Glu Ala Glu Ala (Pro/Leu/Ile/Thr) Lys,

Glu Glu Ala Glu Ala Glu Ala Arg,

Glu Glu Ala Glu Ala Glu (Glu/Asp/Gly/Ala) Lys,

Glu Glu Ala Glu Ala Pro Lys,

Glu Glu Ala Pro Lys,

Asp Asp Ala Asp Ala Asp Ala Asp Pro Arg,

Glu Glu Glu Glu Pro Lys,

Glu Glu Glu Pro Lys,

Asp Asp Asp Asp Asp Lys, and

Glu Glu Pro Lys.

The signal peptide sequence of the polypeptide of the invention may be any signal peptide which ensures an effective direction of the expressed polypeptide into the secretory pathway of the cell. The signal peptide may be a naturally occurring signal peptide or a functional part thereof, or it may be a synthetic peptide. Suitable signal peptides have been found to be the α-factor signal peptide, the signal peptide of mouse salivary amylase, a modified carboxypeptidase signal peptide, the yeast BAR1 signal peptide and the yeast aspartic protease 3 (YAP3) signal peptide. The mouse salivary amylase signal sequence is described by O. Hagenbüchle et al., Nature 289, 1981, pp. 643-646. The carboxypeptidase signal sequence is described by L.A. Valls et al., Cell 48, 1987, pp. 887-897. The BAR1 signal peptide is disclosed in PCT Publication No. WO 87/02670. The yeast aspartic protease 3 signal peptide is described in PCT Publication No. WO 95/02059.

The leader peptide sequence of the polypeptide of the invention may be any leader peptide which is functional in directing the expressed polypeptide through the endoplasmic reticulum and further along the secretory pathway. Possible leader sequences which are suited for this purpose are natural leader peptides derived from yeast or other organisms, such as the α-factor leader or a functional analogue thereof. The leader peptide may also be a synthetic leader peptide, e.g. one of the synthetic leader peptides disclosed in PCT Publication No. WO 89/02463 or WO 92/11378.

The N-terminally extended heterologous protein produced by the method of the invention may be any protein which may advantageously be produced in yeast. Examples of such proteins are aprotinin, tissue factor pathway inhibitor or other protease inhibitors, and insulin or insulin precursors, insulin analogues, insulin-like growth factor I or II, human or bovine growth hormone, interleukin, tissue plasminogen activator, glucagon, glucagon-like peptide-1 (GLP-1), Factor VII, Factor VIII, Factor XIII, platelet-derived growth factor, enzymes, or a functional analogue of any one of these proteins. In the present context, the term "functional analogue" is meant to indicate a polypeptide with a similar function as the native protein (this is intended to be understood as relating to the nature rather than the level of biological activity of the native protein). The polypeptide may be structurally similar to the native protein and may be derived from the native protein by addition of one or more amino acids to either or both the C- and N-terminal end of the native protein, substitution of one or more amino acids at one or a number of different sites in the native amino acid sequence, deletion of one or more amino acids at either or both ends of the native protein or at one or several sites in the amino acid sequence, or insertion of one or more amino acids at one or more sites in the native amino acid sequence. Such modifications are well known for several of the proteins mentioned above.

Examples of suitable insulin precursors and insulin analogues are B(1-29)-Ala-Ala-Lys-A(1-21) (as described in, e.g., EP 163 529), B(1-27)-Asp-Lys-Ala-Ala-Lys-A(1-21) (as described in, e.g., PCT Publication No. WO 95/00550), B(1-29)-Ala-Ala-Arg-A(1-21) (as described in, e.g., PCT Publication No. WO 95/07931), and B(1-29)-Ser-Asp-Asp-Ala-Arg-A(1-21).

The DNA construct of the invention encoding the polypeptide of the invention may be prepared synthetically by established standard methods, e.g. the phosphoramidite method described by S.L. Beaucage and M.H. Caruthers, Tetrahedron Letters 22, 1981, pp. 1859-1869, or the method described by Matthes et al., EMBO Journal 3, 1984, pp. 801-805. According to the phosphoramidite method, oligonucleotides are synthesized, e.g. in an automatic DNA synthesizer, purified, duplexed and ligated to form the synthetic DNA construct. A currently preferred way of preparing the DNA construct is by polymerase chain reaction (PCR), e.g. as described in Sambrook et al., Molecular Cloning: A Laboratory Manual, Cold Spring Harbor, NY, 1989. The DNA construct of the invention may also be of genomic or cDNA origin, for instance obtained by preparing a genomic or cDNA library and screening for DNA sequences coding for all or part of the polypeptide of the invention by hybridization using synthetic oligonucleotide probes in accordance with standard techniques (cf. Sambrook et al., Molecular Cloning: A Laboratory Manual, Cold Spring Harbor, 1989). In this case, a genomic or cDNA sequence encoding a signal and leader peptide may be joined to a genomic or cDNA sequence encoding the heterologous protein, after which the DNA sequence may be modified at a site corresponding to the amino acid sequence X¹-X²-X³-X⁴-X⁵-X⁶-X⁷ of the polypeptide, e.g. by inserting synthetic oligonucleotides encoding the desired amino acid sequence for homologous recombination in accordance with well-known procedures.

Finally, the DNA construct may be of mixed synthetic and genomic, mixed synthetic and cDNA or mixed genomic and cDNA origin prepared by annealing fragments of synthetic, genomic or cDNA origin (as appropriate), the fragments corresponding to various parts of the entire DNA construct, in accordance with standard techniques. Thus, it may be envisaged that the DNA sequence encoding the heterologous protein may be of genomic or cDNA origin, while the sequence encoding the signal and leader peptide as well as the sequence encoding the N-terminal extension X¹-X²-X³-X⁴-X⁵-X⁶-X⁷ may be prepared synthetically.

In a further aspect, the invention relates to a recombinant expression vector which is capable of replicating in yeast and which carries a DNA construct encoding the above-defined polypeptide. The recombinant expression vector may be any vector which is capable of replicating in yeast organisms. In the vector, the DNA sequence encoding the polypeptide of the invention should be operably connected to a suitable promoter sequence. The promoter may be any DNA sequence which shows transcriptional activity in yeast and may be derived from genes encoding proteins either homologous or heterologous to yeast. The promoter is preferably derived from a gene encoding a protein homologous to yeast. Examples of suitable promoters are the Saccharomyces cerevisiae MFα1, TPI, ADH or PGK promoters.

The DNA sequence encoding the polypeptide of the invention may also be operably connected to a suitable terminator, e.g. the TPI terminator (cf. T. Alber and G. Kawasaki, J. Mol. Appl. Genet. 1, 1982, pp. 419-434).

The recombinant expression vector of the invention further comprises a DNA sequence enabling the vector to replicate in yeast. Examples of such sequences are the yeast plasmid 2μ replication genes REP 1-3 and origin of replication. The vector may also comprise a selectable marker, e.g. the Schizosaccharomyces pombe TPI gene as described by P.R. Russell, Gene 40, 1985, pp. 125-130.

The procedures used to ligate the DNA sequences coding for the polypeptide of the invention, the promoter and the terminator, respectively, and to insert them into suitable yeast vectors containing the information necessary for yeast replication, are well known to persons skilled in the art (cf., for instance, Sambrook et al., op.cit.). It will be understood that the vector may be constructed either by first preparing a DNA construct containing the entire DNA sequence coding for the polypeptide of the invention and subsequently inserting this fragment into a suitable expression vector, or by sequentially inserting DNA fragments containing genetic information for the individual elements (such as the signal, leader or heterologous protein) followed by ligation.

The yeast organism used in the process of the invention may be any suitable yeast organism which, on cultivation, produces large amounts of the heterologous protein or polypeptide in question. Examples of suitable yeast organisms may be strains of the yeast species Saccharomyces cerevisiae, Saccharomyces kluyveri, Schizosaccharomyces pombe or Saccharomyces uvarum. The transformation of the yeast cells may for instance be effected by protoplast formation followed by transformation in a manner known per se. The medium used to cultivate the cells may be any conventional medium suitable for growing yeast organisms. The secreted heterologous protein, a significant proportion of which will be present in the medium in correctly processed form, may be recovered from the medium by conventional procedures including separating the yeast cells from the medium by centrifugation or filtration, precipitating the proteinaceous components of the supernatant or filtrate by means of a salt, e.g. ammonium sulphate, followed by purification by a variety of chromatographic procedures, e.g. ion exchange chromatography, affinity chromatography, or the like.

After secretion to the culture medium, the protein may be subjected to various procedures to remove the sequence X³-X⁴-X⁵-X⁶-X⁷.

When X⁶ is Pro, Thr, Ser, Gly, or Asp, the extensions are found to be stably attached to the heterologous protein during fermentation, protecting the N-terminal of the heterologous protein against the proteolytic activity of yeast proteases such as DPAP. The presence of an N-terminal extension on the heterologous protein may also serve as a protection of the N-terminal amino group of the heterologous protein during chemical processing of the protein, i.e. it may serve as a substitute for a BOC or similar protecting group. In such cases the amino acid sequence X³-X⁴-X⁵-X⁶-X⁷ may be removed from the recovered heterologous protein by means of a proteolytic enzyme which is specific for a basic amino acid (e.g. Lys or Arg) so that the terminal extension is cleaved off at X⁷. Examples of such proteolytic enzymes are trypsin, Achromobacter lyticus protease I, enterokinase, Fusarium oxysporum trypsin-like protease, and yeast aspartic protease 3 (YAP3) of Saccharomyces cerevisiae.

YAP3 has been characterized applying various prohormones and expressions of heterologous protein in Saccharomyces cerevisiae (Niamh, X.C. et al., FEBS Letters, 332, p. 273-276, 1993; Bourbonnais, Y. et al., EMBO Journal, 12, p. 285-294, 1993; Egel-Mitani, M. et al., YEAST, 6, p. 127-137, 1990).

YAP3 appears during large scale fermentations of Saccharomyces cerevisiae. It is active when the pH is from pH 3 to about pH 6. The appearance of YAP3 is most pronounced using minimal types of media for production. In such media the cells are starved of glucose and/or nitrogen, but other conditions which limit growth may be used. The YAP3 protease requires a defined motif flanking the cleavage site. Such a motif is present in the N-terminal extension when the amino acid immediately preceding X⁷ is Ala or Glu (but not when X⁶ is Pro, Thr, Ser, Gly or Asp).

Thus, when X⁶ is a peptide bond (and the amino acid preceding X⁷ is Ala or Glu) or Ala or Glu, the extensions may be cleaved from the heterologous protein during fermentation, most likely subsequent to the secretion thereof from the yeast cells - a process which depends on YAP3 acting on its substrate (the peptide bond after lysine or arginine) in the yeast cells or in the growth media. YAP3 inside, attached to, or escaped from the yeast cells may selectively cleave off the extensions such that the non-extended product may be purified from the growth media. The amino acid sequence X³-X⁷, wherein X⁶ is Ala or Glu or is a peptide bond (and the amino acid preceding X⁷ is Ala or Glu), may be removed from the heterologous protein in the medium by a process involving subjecting the yeast cells to stress to make them release YAP3 into the medium, whereby the amino acid sequence X³-X⁷ is cleaved off. The stress to which the yeast cells are subjected may comprise reducing the pH of the culture medium to below 6.0, preferably below 5, or starving the yeast cells of glucose and/or nitrogen.
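The rule stated in the preceding paragraphs, namely that the residue immediately preceding X⁷ determines whether an extension is removed during fermentation or remains attached, can be summarized in a short Python sketch. This is only an illustration of the rule as worded in the text; the function name `predicted_fate` and the returned labels are hypothetical, and residues not mentioned in the text are reported as "not stated" rather than guessed.

```python
# Residues preceding X7 for which the text reports a stable extension.
STABLE_BEFORE_X7 = {"Pro", "Thr", "Ser", "Gly", "Asp"}
# Residues preceding X7 that form the motif the text associates with YAP3 cleavage.
YAP3_MOTIF_BEFORE_X7 = {"Ala", "Glu"}


def predicted_fate(extension):
    """Classify an extension given as space-separated three-letter residues.

    The last residue is taken as X7 and must be Lys or Arg.
    """
    residues = extension.split()
    if len(residues) < 2 or residues[-1] not in {"Lys", "Arg"}:
        raise ValueError("extension must end in Lys or Arg (X7)")
    before_x7 = residues[-2]
    if before_x7 in YAP3_MOTIF_BEFORE_X7:
        return "in vivo (YAP3)"
    if before_x7 in STABLE_BEFORE_X7:
        return "in vitro (stable during fermentation)"
    return "not stated"
```

Under this reading, `predicted_fate("Glu Glu Ala Glu Ala Glu Ala Glu Pro Lys")` gives the stable, in vitro route, while `predicted_fate("Glu Glu Ala Glu Ala Glu Ala Arg")` gives the in vivo YAP3 route.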

In the process of the invention for producing a heterologous protein, the yeast cell may further be transformed with one or more genes encoding a protease which is specific for basic amino acid residues so that, on cultivation of the cell, the gene or genes are expressed, the consequent production of protease ensuring a more complete cleavage of X³-X⁷ from the heterologous polypeptide.

In a preferred embodiment, one or more additional genes encoding YAP3 may be used to transform the yeast cell so that, on cultivation of the cell the gene or genes are overexpressed, the consequent overproduction of YAP3 ensuring a more complete cleavage of X³-X⁷ from the heterologous polypeptide.

In the present context, yeast aspartic protease 3 (YAP3) embraces the native YAP3 enzyme as well as enzymes derived from the native enzyme, wherein the C-terminal has been modified in order to ensure efficient release of the YAP3 protein from the cell or wherein any modification has taken place maintaining the proteolytic activity.

Alternatively, the protease may be A. lyticus protease I, enterokinase or trypsin.

The present invention is described in further detail in the following examples which are not in any way intended to limit the scope of the invention as claimed.

The invention is described with reference to the drawings, wherein Fig. 1 shows a general scheme for the construction of plasmids containing genes expressing N-terminally extended polypeptides.

In Fig. 1, the following symbols are used:

1: Denotes the TPI gene promoter sequence from S. cerevisiae.

2: Denotes the region encoding a signal/leader peptide (e.g. from the α-factor gene of S. cerevisiae).

3: Denotes the region encoding a heterologous polypeptide.

3*: Denotes the region encoding a N-terminal extended heterologous polypeptide.

4: Denotes the TPI gene terminator sequence of S. cerevisiae.

P1: Denotes a synthetic oligonucleotide PCR primer determining the structure of the N-terminal extension.

P2: Denotes a universal PCR primer for the amplification of region 3.

POT: Denotes TPI gene from S. pombe.

2μ Ori: Denotes a sequence from S. cerevisiae 2μ plasmid including its origin of DNA replication in S. cerevisiae.

Apᴿ: Sequence from pBR322/pUC13 including the ampicillin resistance gene and an origin of DNA replication in E. coli.

Fig. 2 shows the DNA sequence in pJB59 encoding the insulin precursor B_chain(1-27)-Asp-Lys-Ala-Ala-Lys-A_chain(1-21) N-terminally fused to the 85 residues which make up the α-factor signal/leader peptide in which Leu in position 82 and Asp in position 83 have been substituted by Met and Ala, respectively.

Fig. 3 shows the DNA sequence of pAK623 encoding GLP-1₇₋₃₆ₐ N-terminally fused to the synthetic signal/leader sequence "YAP3/S1_pAVA".

Fig. 4 shows the DNA sequence of pKV142 encoding B_chain(1-29)-Ala-Ala-Arg-A_chain(1-21) N-terminally fused to the 85 residues which make up the α-factor signal/leader peptide in which Leu in position 82 and Asp in position 83 have been substituted by Met and Ala, respectively.

Fig. 5 shows the DNA sequence of pAK679 encoding B_chain(1-29)-Ala-Ala-Lys-A_chain(1-21) N-terminally fused to the synthetic signal/leader sequence "YAP3/LA19".

Fig. 6 shows HPLC chromatograms of culture supernatants containing the insulin precursor B_chain(1-27)-Asp-Lys-Ala-Ala-Lys-A_chain(1-21) with or without N-terminal extensions, and with or without in vivo or in vitro processing of the N-terminal extensions.

Fig. 7 shows the products according to figure 4 separated by size on a 10% Tricine-SDS-PAGE gel.

Fig. 8 shows the effect of the presence of YAP3 coexpression on the yield derived from the HPLC data in pJB176 compared to pJB64.

EXAMPLES

Plasmids and DNA

All expression plasmids are of the C-POT type (see figure 1), similar to those described in EP 171 142, which are characterized by containing the Schizosaccharomyces pombe triose phosphate isomerase gene (POT) for the purpose of plasmid selection and stabilization in S. cerevisiae. The plasmids furthermore contain the S. cerevisiae triose phosphate isomerase promoter (region 1 in figure 1) and terminator (region 4 in figure 1). These sequences are identical to the corresponding sequences in plasmid pKFN1003 (described in WO 90/10075) as are all sequences except the sequence of the EcoRI-XbaI fragment encoding the signal/leader/product (regions 2 and 3 in figure 1). In order to express different heterologous proteins, the EcoRI-XbaI fragment of pKFN1003 is simply replaced by an EcoRI-XbaI fragment encoding the signal/leader/product of interest. Such EcoRI-XbaI fragments may be synthesized using synthetic oligonucleotides and PCR according to standard techniques (cf. Sambrook et al., 1989 supra).

Figure 1 shows the general scheme used for the construction of plasmids containing genes expressing N-terminally extended polypeptides, the scheme including the following steps.

A sample of the C-POT plasmid vector is digested with restriction nucleases EcoRV and XbaI and the largest DNA fragment is isolated using standard molecular techniques (Sambrook J, Fritsch EF and Maniatis T, Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory Press, 1989).

Another sample of C-POT plasmid (which may be the same or different from the plasmid mentioned above) is digested with restriction nucleases EcoRV and NcoI and the fragment comprising regions 1 and 2 is isolated.

Polymerase Chain Reaction (PCR) is performed using the Gene Amp PCR reagent kit (Perkin Elmer, 761 Main Ave., Norwalk, CT 06859, USA) according to the manufacturer's instructions and the synthetic oligonucleotide primers P1 and P2 on a template (which may be the same or different from the plasmids mentioned above) encoding the heterologous polypeptide of interest. P1 is designed in such a way that it includes a recognition site for restriction nuclease NcoI (5'-CCATGG-3') followed by the sequences encoding a KEX2 processing site, the N-terminal extension and 10-15 nucleotides identical to the sequence encoding the original N-terminal of the heterologous protein of interest. P2 (e.g. 5'-AATTTATTTTACATAACACTAG-3') is designed in such a way that it amplifies region 3 and the flanking recognition site for restriction nuclease XbaI (5'-TCTAGA-3') in PCRs with P1 using standard techniques described in Sambrook et al., supra.

The PCR product is digested with restriction nucleases NcoI and XbaI and the digested fragment is isolated.

The fragments isolated are ligated together by T4 DNA ligase under standard conditions (described in Sambrook et al., supra).

The ligation mixture is used to transform competent E. coli cells (Apʳ⁻), and transformants are selected for ampicillin resistance according to Sambrook et al., supra.

Plasmids are isolated from the resulting E. coli clones using standard molecular techniques (described in Sambrook et al., supra).

DNA sequencing is performed using enzymatic chain termination (Sequenase, United States Biochemicals) according to the manufacturer's instructions in order to verify (or determine) the DNA sequences encoding the N-terminal extended polypeptide and to ensure that it is in frame with the DNA sequence encoding the signal/leader peptide of region 2.

The plasmid is used to transform the yeast strain MT663 and selected for growth on glucose, as described in detail below:

Yeast transformation: S. cerevisiae strain MT663 (E2-7B XE11-36 a/α, Δtpi/Δtpi, pep 4-3/pep 4-3) (the yeast strain MT663 was deposited in the Deutsche Sammlung von Mikroorganismen und Zellkulturen in connection with filing WO 92/11378 and was given the deposit number DSM 6278) was grown on YPGaL (1% Bacto yeast extract, 2% Bacto peptone, 2% galactose, 1% lactate) to an O.D. at 600 nm of 0.6.

100 ml of culture was harvested by centrifugation, washed with 10 ml of water, recentrifuged and resuspended in 10 ml of a solution containing 1.2 M sorbitol, 25 mM Na₂EDTA pH = 8.0 and 6.7 mg/ml dithiothreitol. The suspension was incubated at 30°C for 15 minutes, centrifuged and the cells resuspended in 10 ml of a solution containing 1.2 M sorbitol, 10 mM Na₂EDTA, 0.1 M sodium citrate, pH = 5.8, and 2 mg Novozym® 234. The suspension was incubated at 30°C for 30 minutes, the cells collected by centrifugation, washed in 10 ml of 1.2 M sorbitol and 10 ml of CAS (1.2 M sorbitol, 10 mM CaCl₂, 10 mM Tris HCl (Tris = Tris(hydroxymethyl)aminomethane) pH = 7.5) and resuspended in 2 ml of CAS. For transformation, 1 ml of CAS-suspended cells was mixed with approx. 0.1 μg of plasmid DNA and left at room temperature for 15 minutes. 1 ml of (20% polyethylene glycol 4000, 10 mM CaCl₂, 10 mM Tris HCl, pH = 7.5) was added and the mixture left for a further 30 minutes at room temperature. The mixture was centrifuged and the pellet resuspended in 0.1 ml of SOS (1.2 M sorbitol, 33% v/v YPD, 6.7 mM CaCl₂) and incubated at 30°C for 2 hours. The suspension was then centrifuged and the pellet resuspended in 0.5 ml of 1.2 M sorbitol. Then, 6 ml of top agar (the SC medium of Sherman et al., Methods in Yeast Genetics, Cold Spring Harbor Laboratory (1982), containing 1.2 M sorbitol plus 2.5% agar) at 52°C was added and the suspension poured on top of plates containing the same agar-solidified, sorbitol-containing medium.
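The molar concentrations in the transformation protocol above can be converted into weighed amounts with the relation mass = concentration x volume x molar mass. The Python sketch below is only a convenience illustration for the CAS solution; the names `MOLAR_MASS`, `CAS`, `grams_needed` and `recipe` are hypothetical, and the molar masses assume anhydrous sorbitol, anhydrous CaCl₂ and Tris base (the text does not state which forms were used, and the pH adjustment with HCl is not modelled).

```python
# Molar masses in g/mol. Assumption: anhydrous compounds and Tris base,
# since the protocol does not specify hydrates or salt forms.
MOLAR_MASS = {"sorbitol": 182.17, "CaCl2": 110.98, "Tris": 121.14}

# CAS as given in the text: 1.2 M sorbitol, 10 mM CaCl2, 10 mM Tris (mol/l).
CAS = {"sorbitol": 1.2, "CaCl2": 0.010, "Tris": 0.010}


def grams_needed(compound, molar, volume_ml):
    """Mass in grams for a given molar concentration and volume in ml."""
    return molar * (volume_ml / 1000.0) * MOLAR_MASS[compound]


def recipe(solution, volume_ml):
    """Mass in grams of every component of a solution for the given volume."""
    return {name: grams_needed(name, conc, volume_ml)
            for name, conc in solution.items()}
```

Calling `recipe(CAS, 10)` returns the component masses for the 10 ml CAS wash used in the protocol; about 2.19 g of that is sorbitol.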

Example 1

Construction of pJB108 and pJB109

Plasmid pJB59 is a derivative of pKFN1003 in which the EcoRI-XbaI fragment encodes the insulin precursor B_chain(1-27)-Asp-Lys-Ala-Ala-Lys-A_chain(1-21) N-terminally fused to a signal/leader sequence corresponding to the 85 residues of the α-factor prepro signal peptide in which Leu in position 82 and Asp in position 83 have been substituted by Met and Ala, respectively (Figure 2). The EcoRI-XbaI fragment is synthesized in an Applied Biosystems DNA synthesizer (Perkin Elmer DNA Thermal Cycler) in accordance with the manufacturer's instructions.

Plasmid constructs designed to express N-terminally extended insulin precursor B_chain(1-27)-Asp-Lys-Ala-Ala-Lys-A_chain(1-21) were obtained by means of a P1-primer with the following sequence

NcoI

5'-GGGGTATCCATGGCTAAGAGAGAAGAAGCTGAAGCTGAAGCT(AC)CAAAGTTCGTTAACCAACAC-3'

GlyLeuSerMetAlaLysArgGluGluAlaGluAlaGluAla Xaa LysPheValAsnGlnHis

the P2-primer: 5'-AATTTATTTTACATAACACTAG-3' and the plasmid pJB59 according to the general scheme described above. PCR and cloning resulted in a construct wherein the DNA sequence encoding the insulin precursor B_chain(1-27)-Asp-Lys-Ala-Ala-Lys-A_chain(1-21) is preceded by a DNA sequence encoding the N-terminal extension Glu-Glu-Ala-Glu-Ala-Glu-Ala-Xaa-Lys, where Xaa is either Pro (pJB108; Pro encoded by CCA) or Thr (pJB109; Thr encoded by ACA).

Example 2

Construction of pJB44, pJB107 and pJB126

Plasmid constructs designed to express additional N-terminally extended versions of the insulin precursor B_chain(1-27)-Asp-Lys-Ala-Ala-Lys-A_chain(1-21) were made by means of a P1-primer with the following sequence

NcoI

5'-GGGGTATCCATGGCTAAGAGAGAAGAAGCTGAAGCTGAAGCTG(CAG)(AC)AAAGTTCGTTAACCAACAC-3'

GlyLeuSerMetAlaLysArgGluGluAlaGluAlaGluAla Xaa LysPheValAsnGlnHis

the P2-primer: 5'-AATTTATTTTACATAACACTAG-3' and the plasmid pJB59 as described in Example 1. Among the resulting plasmids pJB44, pJB107 and pJB126 were isolated. These plasmids encode the insulin precursor B_chain(1-27)-Asp-Lys-Ala-Ala-Lys-A_chain(1-21) preceded by the N-terminal extension Glu-Glu-Ala-Glu-Ala-Glu-Ala-Xaa-Lys, where Xaa is either Glu (pJB44; Glu encoded by GAA), Asp (pJB126; Asp encoded by GAC) or Gly (pJB107; Gly encoded by GGC).
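The degenerate positions written in parentheses in the P1-primers of Examples 1 and 2 each stand for a small set of codons. The following sketch enumerates that set and reads each codon under the standard genetic code. The function name `decode_degenerate` and the tuple representation of a degenerate triplet are hypothetical conveniences introduced here; only the triplets (AC)CA and G(CAG)(AC) are taken from the text.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, with codons enumerated in TCAG order at each position.
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
THREE_LETTER = {
    "A": "Ala", "C": "Cys", "D": "Asp", "E": "Glu", "F": "Phe",
    "G": "Gly", "H": "His", "I": "Ile", "K": "Lys", "L": "Leu",
    "M": "Met", "N": "Asn", "P": "Pro", "Q": "Gln", "R": "Arg",
    "S": "Ser", "T": "Thr", "V": "Val", "W": "Trp", "Y": "Tyr",
    "*": "Stop",
}


def decode_degenerate(positions):
    """Map every codon allowed by a degenerate triplet to its residue.

    positions: three strings, each listing the bases allowed at that codon
    position, e.g. ("AC", "C", "A") for the triplet written (AC)CA.
    """
    return {
        "".join(codon): THREE_LETTER[CODON_TABLE["".join(codon)]]
        for codon in product(*positions)
    }


if __name__ == "__main__":
    print(decode_degenerate(("AC", "C", "A")))    # Xaa triplet of Example 1
    print(decode_degenerate(("G", "CAG", "AC")))  # Xaa triplet of Example 2
```

Under the standard code the Example 2 triplet admits six codons covering Ala, Glu, Asp and Gly, which is consistent with the codons GAA, GAC and GGC reported for the isolated plasmids.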

Example 3

Construction of pJB64 and pJB110

Plasmid constructs designed to express N-terminally extended insulin precursor B_chain(1-27)-Asp-Lys-Ala-Ala-Lys-A_chain(1-21) were obtained by means of a P1-primer with the following sequence

NcoI

5'-GGGGTATCCATGGCTAAGAGAGAAGAAGCTGAAGCTGAAG(CA)AAAGTTCGTTAACCAACAC-3'

GlyLeuSerMetAlaLysArgGluGluAlaGluAlaGlu Xaa LysPheValAsnGlnHis

the P2-primer: 5'-AATTTATTTTACATAACACTAG-3' and the plasmid pJB59 as described in Example 1. Among the resulting plasmids pJB64 and pJB110 were isolated. These plasmids encode B_chain(1-27)-Asp-Lys-Ala-Ala-Lys-A_chain(1-21) preceded by the N-terminal extension Glu-Glu-Ala-Glu-Ala-Glu-Xaa-Lys, where Xaa is either Glu (pJB110; Glu encoded by GAA) or Ala (pJB64; Ala encoded by GCA).

Example 4

Construction of pAK663

Plasmid pAK623 is a derivative of pKFN1003 in which the EcoRI-XbaI fragment encodes GLP-1(7-36)Ala N-terminally fused to a synthetic signal leader sequence YAP3/Sl_pAVA (Fig. 3).

The EcoRI-XbaI fragment was synthesized in an Applied Biosystems DNA synthesizer according to the manufacturer's instructions.

Plasmid constructs designed to express GLP-1(7-36)Ala with the N-terminal extension in the form of Glu-Glu-Ala-Glu-Ala-Glu-Ala-Glu-Arg were obtained by means of a P1-primer with the following sequence

NcoI

5'-AACGTTGCCATGGCTCCAGCTCCAGCTAAGAGAGAAGAAGCTGAAGCTGAAGCTGAAAGACATGCTGAAGGT-3'

AsnValAlaMetAlaProAlaValAlaLysArgGluGluAlaGluAlaGluAlaGluArgHisAlaGluGly

the P2-primer: 5'-AATTTATTTTACATAACACTAG-3' and the plasmid pAK623 according to the method described above, resulting in plasmid pAK663.
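As a check on primer design of this kind, a P1-primer can be translated in frame and searched for the NcoI recognition sequence CCATGG. The sketch below does so for the pAK663 primer under the standard genetic code. The helper names `translate` and `split_at_kex2` are hypothetical, and splitting after the first Lys-Arg pair is merely a convenient way of separating the leader end from the extension in the printout; it is not a model of KEX2 processing.

```python
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, with codons enumerated in TCAG order at each position.
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

NCOI_SITE = "CCATGG"
P1_PAK663 = (
    "AACGTTGCCATGGCTCCAGCTCCAGCTAAGAGAGAAGAAGCT"
    "GAAGCTGAAGCTGAAAGACATGCTGAAGGT"
)


def translate(dna: str) -> str:
    """Translate complete codons from the first base (one-letter codes)."""
    usable = len(dna) - len(dna) % 3  # ignore an incomplete trailing codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def split_at_kex2(protein: str):
    """Split a protein string immediately after its first Lys-Arg (KR) pair."""
    cut = protein.index("KR") + 2
    return protein[:cut], protein[cut:]


if __name__ == "__main__":
    protein = translate(P1_PAK663)
    leader_end, rest = split_at_kex2(protein)
    print("NcoI site at 0-based index:", P1_PAK663.find(NCOI_SITE))
    print("up to Lys-Arg:", leader_end)
    print("after Lys-Arg:", rest)
```

The part after Lys-Arg begins with the nine-residue extension ending in Glu-Arg, followed by His-Ala-Glu-Gly, the first residues of the GLP-1 sequence listed in SEQ ID NO: 32.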

Example 5

Expression of N-terminally extended products and the removal of the extensions

Yeast strain MT663 transformed with the C-POT plasmids described above (pJB59, pJB108, pJB109, pJB44, pJB126, pJB107, pJB64 and pJB110) was grown on YPD (1% yeast extract, 2% peptone and 2% glucose) agar plates. Single colonies were used to start 5 ml liquid cultures in YPD broth pH = 6.0, which were shaken for 72 hours at 30°C. Yields of products were determined directly on culture supernatants by the method described by Snel, Damgaard and Mollerup, Chromatographia 24, 1987, pp. 329-332.

The results with the yeast strains expressing N-terminally extended B_chain(1-27)-Asp-Lys-Ala-Ala-Lys-A_chain(1-21) compared to the non-extended form (pJB59) are shown below

Plasmid   N-terminal extension                    Yield
pJB59                                             100%
pJB108    Glu-Glu-Ala-Glu-Ala-Glu-Ala-Pro-Lys     275%
pJB109    Glu-Glu-Ala-Glu-Ala-Glu-Ala-Thr-Lys     300%
pJB44     Glu-Glu-Ala-Glu-Ala-Glu-Ala-Glu-Lys     325%*
pJB126    Glu-Glu-Ala-Glu-Ala-Glu-Ala-Asp-Lys     300%
pJB107    Glu-Glu-Ala-Glu-Ala-Glu-Ala-Gly-Lys     275%
pJB64     Glu-Glu-Ala-Glu-Ala-Glu-Ala-Lys         250%*
pJB110    Glu-Glu-Ala-Glu-Ala-Glu-Glu-Lys         225%*

Yields marked by an asterisk (*) indicate that the product in these cases is a mixture of N-terminally extended B_chain(1-27)-Asp-Lys-Ala-Ala-Lys-A_chain(1-21) and non-extended B_chain(1-27)-Asp-Lys-Ala-Ala-Lys-A_chain(1-21), the latter being a result of the in vivo cleavage of the extension described below and illustrated in Figures 4a and 5.
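For readers who wish to work with the tabulated yields, they can be held as simple records and expressed as fold increases over the pJB59 control. The sketch below is illustrative only: the record layout and the function names `fold_over_control` and `mixtures` are assumptions introduced here, and the numbers are copied from the table above without further interpretation.

```python
# (plasmid, N-terminal extension, yield in % of pJB59, asterisk = mixture)
RESULTS = [
    ("pJB59", "", 100, False),
    ("pJB108", "Glu-Glu-Ala-Glu-Ala-Glu-Ala-Pro-Lys", 275, False),
    ("pJB109", "Glu-Glu-Ala-Glu-Ala-Glu-Ala-Thr-Lys", 300, False),
    ("pJB44", "Glu-Glu-Ala-Glu-Ala-Glu-Ala-Glu-Lys", 325, True),
    ("pJB126", "Glu-Glu-Ala-Glu-Ala-Glu-Ala-Asp-Lys", 300, False),
    ("pJB107", "Glu-Glu-Ala-Glu-Ala-Glu-Ala-Gly-Lys", 275, False),
    ("pJB64", "Glu-Glu-Ala-Glu-Ala-Glu-Ala-Lys", 250, True),
    ("pJB110", "Glu-Glu-Ala-Glu-Ala-Glu-Glu-Lys", 225, True),
]


def fold_over_control(results, control="pJB59"):
    """Return {plasmid: yield / control yield} for every record."""
    base = next(y for name, _, y, _ in results if name == control)
    return {name: y / base for name, _, y, _ in results}


def mixtures(results):
    """Plasmids whose product is a mixture of extended and non-extended forms."""
    return [name for name, _, _, mixed in results if mixed]


if __name__ == "__main__":
    for name, fold in fold_over_control(RESULTS).items():
        print(f"{name}: {fold:.2f}-fold")
    print("mixtures:", ", ".join(mixtures(RESULTS)))
```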

In the case of pAK663 expressing GLP-1(7-36)Ala N-terminally extended by Glu-Glu-Ala-Glu-Ala-Glu-Ala-Arg, the yield was found to be 20-fold higher than that of pAK623 expressing non-extended GLP-1(7-36)Ala.

Example 6

Removal of N-terminal extensions in vivo

Culture supernatants obtained from cultures of yeast strains transformed with plasmids pJB59, pJB64, pJB44 and pJB108 (see above) were evaluated by HPLC (Figure 4a) and parallel samples were run on a 10% Tricine-SDS-PAGE gel (Figure 5, lanes 1, 2, 3 and 4). From the HPLC chromatograms it appears that the culture supernatants from yeasts with plasmid pJB44 (chromatogram 3) and pJB64 (chromatogram 2) contain both the N-terminally extended and the non-extended B_chain(1-27)-Asp-Lys-Ala-Ala-Lys-A_chain(1-21), whereas culture supernatants from yeast with pJB108 (chromatogram 4) contain only the N-terminally extended form. In the case of pJB64 about 50% of the precursor is found in non-extended form (see also Fig. 5, lane 2), whereas this form represents only a minor part in the case of pJB44.

These results illustrate the ability of yeast to cleave off N-terminal extensions selectively in vivo when the extension is either Glu-Glu-Ala-Glu-Ala-Glu-Ala-Lys (pJB64) or Glu-Glu-Ala-Glu-Ala-Glu-Ala-Glu-Lys (pJB44) and the inability of yeast to cleave off an N-terminal extension in the form of Glu-Glu-Ala-Glu-Ala-Glu-Ala-Pro-Lys (pJB108). The proteolytic activity responsible for cleaving off the extensions may be associated with enzymes in the secretory pathway such as membrane-bound YAP3 in the trans-Golgi system of the yeast cells.

Example 7

Removal of N-terminal extensions in vitro

The culture supernatants described above were used as substrates for proteolytic cleavage with either partially purified YAP3 enzyme isolated from yeast strain ME783 overexpressing YAP3 (Egel-Mitani et al. 1990) or Achromobacter lyticus protease I.

The YAP3 assay was performed as follows:
4 μl of YAP3 enzyme
800 μl of cell-free growth medium
Samples were incubated for 15 h at 37°C in 0.1 M Na citrate buffer, pH 4.0

The Achromobacter lyticus protease I assay was performed as follows:
10 μg of A. lyticus protease I
1 ml of cell-free growth medium
Samples were incubated for 1 h at 37°C in 0.1 M Tris buffer, pH 8.75

Figures 4b and 4c show the results, evaluated by HPLC, obtained from the YAP3 and A. lyticus protease I digestions, respectively. Parallel samples were run on 10% Tricine-SDS-PAGE (Fig. 5, lanes 5-12).

From the chromatograms of Fig. 4b it appears that the YAP3 enzyme is able to cleave off N-terminal extensions selectively when these are in the form of Glu-Glu-Ala-Glu-Ala-Glu-Ala-Lys (pJB64) or Glu-Glu-Ala-Glu-Ala-Glu-Ala-Glu-Lys (pJB44) but not Glu-Glu-Ala-Glu-Ala-Glu-Ala-Pro-Lys (pJB108) (see also Fig. 5, lanes 6, 7 and 8). This result clearly indicates that YAP3 or YAP3-like enzyme(s) are responsible for the partial cleavage of N-terminal extensions seen in vivo (cf. Example 6).

From the chromatograms of Fig. 4c it appears that digestions with A. lyticus protease I in all cases result in the same product, namely B_chain(1-27)-Asp-Lys-(connected by disulfide bonds)-A_chain(1-21), which is the end result of proteolytic cleavage after all Lys residues found in the B_chain(1-27)-Asp-Lys-Ala-Ala-Lys-A_chain(1-21) insulin precursor, including those found between the B and A chain in the precursor.
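The statement that cleavage after every Lys residue yields the same product from each extended precursor can be illustrated with a simple in-silico digestion. In the sketch below the B_chain(1-27) and A_chain(1-21) sequences are transcribed in one-letter code from the translation given in SEQ ID NO: 30; the function name `cleave_after_lys` is hypothetical, and the model is a deliberate simplification that ignores disulfide bonds, so the B- and A-chain fragments appear as separate strings although they remain connected in the actual product.

```python
# One-letter sequences transcribed from the translation in SEQ ID NO: 30.
B_CHAIN_1_27 = "FVNQHLCGSHLVEALYLVCGERGFFYT"
A_CHAIN_1_21 = "GIVEQCCTSICSLYQLENYCN"
PRECURSOR = B_CHAIN_1_27 + "DK" + "AAK" + A_CHAIN_1_21

# N-terminal extensions of pJB64, pJB44 and pJB108 in one-letter code.
EXTENSIONS = {
    "pJB64": "EEAEAEAK",
    "pJB44": "EEAEAEAEK",
    "pJB108": "EEAEAEAPK",
}


def cleave_after_lys(sequence: str) -> list:
    """Split a sequence on the C-terminal side of every Lys (K) residue."""
    fragments, start = [], 0
    for index, residue in enumerate(sequence):
        if residue == "K":
            fragments.append(sequence[start:index + 1])
            start = index + 1
    if start < len(sequence):
        fragments.append(sequence[start:])
    return fragments


if __name__ == "__main__":
    for plasmid, extension in EXTENSIONS.items():
        released, *product = cleave_after_lys(extension + PRECURSOR)
        print(plasmid, "released:", released, "product fragments:", product)
```

Whatever the extension, the fragments following the released extension are identical: B_chain(1-27)-Asp-Lys, the Ala-Ala-Lys connecting peptide and A_chain(1-21).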

In the case of the digestion of the culture supernatant from yeast transformed with pJB59 (expressing the non-extended B_chain(1-27)-Asp-Lys-Ala-Ala-Lys-A_chain(1-21)) a product is seen which does not appear in the other digestions. This product, Arg-B_chain(1-27)-Asp-Lys-(connected by disulfide bonds)-A_chain(1-21), results from A. lyticus protease I cleavage of secreted leader-precursor in the growth medium. A. lyticus protease I cleaves between the Lys and Arg residues in the dibasic KEX2 site of the product which has escaped KEX2 cleavage in the secretory pathway of the yeast cells.

Example 8

Removal of N-terminal extensions by over-expressing YAP3

The YAP3 gene cloned by Egel-Mitani et al. (Yeast 6 (1990) pp. 127-137) was inserted into the C-POT plasmid pJB64 encoding Glu-Glu-Ala-Glu-Ala-Glu-Ala-Lys-B_chain(1-27)-Pro-Lys-Ala-Ala-Lys-A_chain(1-21) (see Example 3) in the following way:

The 2.5 kb SalI/SacI fragment containing the YAP3 gene was isolated from plasmid pME768 (Egel-Mitani et al. Yeast 6 (1990) pp. 127-137) and inserted into the SalI and SacI sites of plasmid pIC19R (March et al. Gene 32 (1984) pp. 481-485). From the resulting plasmid, designated pME834, a 2.5 kb SalI/XhoI fragment containing the YAP3 gene was isolated and inserted into the unique SalI site placed between the POT and the Ap^R sequences of pJB64. One resulting plasmid was designated pJB176.

Yeast strain MT663 was transformed with pJB176 and analyzed as described in Example 5.

As can be seen from the HPLC data on yield shown in Figure 8, the presence of the YAP3 gene in pJB176 clearly effects a higher percentage of non-extended B_chain(1-27)-Pro-Lys-Ala-Ala-Arg-A_chain(1-21) in the culture supernatant of the corresponding yeast transformants compared to the yeast transformant of pJB64.

This result illustrates the ability to make yeast strains with an enhanced capacity to cleave off N-terminal extensions selectively in vivo by manipulating the level of proteolysis caused by YAP3.

Example 9

Construction of pKV143

Plasmid pKV142 is a derivative of pKFN1003 in which the EcoRI-XbaI fragment encodes the insulin precursor B_chain(1-29)-Ala-Ala-Arg-A_chain(1-21) N-terminally fused to a signal/leader sequence corresponding to the 85 residues of the α-factor prepro signal peptide in which Leu in position 82 and Asp in position 83 have been substituted by Met and Ala, respectively (Figure 4).

Plasmid constructs designed to express B_chain(1-29)-Ala-Ala-Arg-A_chain(1-21) with an N-terminal extension in the form Asp-Asp-Ala-Asp-Ala-Asp-Ala-Asp-Pro-Arg were obtained by means of a P1-primer with the following sequence

NcoI

5'-GGGGTATCCATGGCTAAGAGAGACGACGCTGACGCTGACGCTGACCCAAGATTCGTTAACCAACACTTGTGCGG-3'

GlyLeuSerMetAlaLysArgAspAspAlaAspAlaAspAlaAspProArgPheValAsnGlnHisLeuCys

the P2-primer: 5'-AATTTATTTTACATAACACTAG-3' and the plasmid pKV142 according to the general scheme described above.

Example 10

Construction of pKV102

Plasmid constructs designed to express B_chain(1-29)-Ala-Ala-Arg-A_chain(1-21) with an N-terminal extension in the form Glu-Glu-Ala-Glu-Ala-Glu-Ala-Glu-Pro-Lys-Ala-Thr-Arg were obtained by means of a P1-primer with the following sequence

NcoI

5'-GGGGTATCCATGGCTAAGAGAGAAGAAGCTGAAGCTGAAGCTGAACCAAAGGCTACAAGATTCGTTAACCAACACTTGTGCGG-3'

GlyLeuSerMetAlaLysArgGluGluAlaGluAlaGluAlaGluProLysAlaThrArgPheValAsnGlnHisLeuCys

the P2-primer: 5'-AATTTATTTTACATAACACTAG-3' and the plasmid pKV142 according to the general scheme described above.

Example 11

Expression of N-terminally extended B_chain(1-29)-Ala-Ala-Arg-A_chain(1-21)

Yeast strain MT663 was transformed with the C-POT plasmids pKV142, pKV143 and pKV102 and analyzed as described in Example 5.

The results with the yeast strains expressing N-terminally extended B_chain(1-29)-Ala-Ala-Arg-A_chain(1-21) compared to the non-extended form (pKV142) are shown below

Plasmid   N-terminal extension                                    Yield
pKV142                                                            100%
pKV143    Asp-Asp-Ala-Asp-Ala-Asp-Ala-Asp-Pro-Arg                 263%
pKV102    Glu-Glu-Ala-Glu-Ala-Glu-Ala-Glu-Pro-Lys-Ala-Thr-Arg     400%

Example 12

Construction and expression of pIM69 and pIM70

Plasmid pAK679 is a derivative of pKFN1003 in which the EcoRI-XbaI fragment encodes the insulin precursor B_chain(1-29)-Ala-Ala-Lys-A_chain(1-21) N-terminally fused to a synthetic signal/leader sequence YAP3/LA19 (Figure 5).

Plasmid constructs designed to express N-terminally extended insulin precursor B_chain(1-29)-Ala-Ala-Lys-A_chain(1-21) were obtained by a procedure involving two successive PCR reactions. The first PCR reaction was performed by means of the primer with the following sequence

5'-GAAGAAGAAGAAGAAGAACCAAAGTTCGTTAACCAACAC-3'

GluGluGluGluGluGluProLysPheValAsnGlnHis

the P2-primer 5'-AATTTATTTTACATAACACTAG-3' and the plasmid pAK679.

The second PCR reaction was performed by means of the P1-primer

NcoI

5'-GTTGTTAACTTGATCTCCATGGCTAAGAGAGAAGAA-3'

ValValAsnLeuIleSerMetAlaLysArgGluGlu

the P2-primer 5'-AATTTATTTTACATAACACTAG-3' using the PCR product of the first PCR reaction as the DNA template for the second PCR reaction.

The PCR product of the second PCR reaction was cut with NcoI and XbaI and ligated into pAK679 according to the general scheme described above. Among the resulting plasmids two were identified to encode N-terminal extensions of B_chain(1-29)-Ala-Ala-Lys-A_chain(1-21) in the form of Glu-Glu-Glu-Pro-Lys (pIM70) and Glu-Glu-Glu-Glu-Pro-Lys (pIM69), respectively.
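The text does not state why the isolated plasmids carry three or four Glu codons although the first primer encodes six. One purely arithmetical observation, offered here as an assumption and not as the inventors' explanation, is that the six 3'-terminal bases of the second P1-primer (GAAGAA) match the GAA repeat of the first PCR product in several registers, and each register would give a product with a different number of Glu codons. The sketch below enumerates these registers; all function names are hypothetical.

```python
FIRST_PRIMER = "GAAGAAGAAGAAGAAGAACCAAAGTTCGTTAACCAACAC"
SECOND_PRIMER = "GTTGTTAACTTGATCTCCATGGCTAAGAGAGAAGAA"
OVERLAP = SECOND_PRIMER[-6:]  # "GAAGAA", the 3' end of the second primer


def annealing_registers(template: str, overlap: str) -> list:
    """0-based positions at which `overlap` matches `template` exactly."""
    return [
        i for i in range(len(template) - len(overlap) + 1)
        if template.startswith(overlap, i)
    ]


def product_for_register(register: int) -> str:
    """Second primer followed by the template downstream of the matched site."""
    return SECOND_PRIMER + FIRST_PRIMER[register + len(OVERLAP):]


def glu_codons(product: str, start: int = 30) -> int:
    """Count consecutive GAA codons from `start` (just after the AAG AGA codons)."""
    count = 0
    while product[start + 3 * count:start + 3 * count + 3] == "GAA":
        count += 1
    return count


if __name__ == "__main__":
    for register in annealing_registers(FIRST_PRIMER, OVERLAP):
        n = glu_codons(product_for_register(register))
        print(f"register {register}: {n} Glu codons before Pro-Lys")
```

Under this assumption the possible products carry between two and six Glu codons; the four- and three-codon cases correspond in length to the extensions reported for pIM69 and pIM70.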

Yeast strain MT663 was transformed with the C-POT plasmids pAK579, pIM69 and pIM70 and analyzed as described in Example 5.

Whereas the yield of non-extended B_chain(1-29)-Ala-Ala-Lys-A_chain(1-21) from yeast with pAK579 was found to be practically nothing, yeast with pIM69 and pIM70 was found to produce large quantities of Glu-Glu-Glu-Glu-Pro-Lys-B_chain(1-29)-Ala-Ala-Lys-A_chain(1-21) and Glu-Glu-Glu-Pro-Lys-B_chain(1-29)-Ala-Ala-Lys-A_chain(1-21), respectively.

SEQUENCE LISTING

(1) GENERAL INFORMATION:

(i) APPLICANT:

(A) NAME: Novo Nordisk A/S

(B) STREET: Novo Alle

(C) CITY: Bagsvaerd

(E) COUNTRY: Denmark

(F) POSTAL CODE (ZIP): 2880

(G) TELEPHONE: +45 4444 8888

(H) TELEFAX: +45 4449 3256

(I) TELEX: 37304

(ii) TITLE OF INVENTION: N-Terminally Extended Proteins Expressed in Yeast

(iii) NUMBER OF SEQUENCES: 52

(iv) COMPUTER READABLE FORM:

(A) MEDIUM TYPE: Floppy disk

(B) COMPUTER: IBM PC compatible

(C) OPERATING SYSTEM: PC-DOS/MS-DOS

(D) SOFTWARE: PatentIn Release #1.0, Version #1.25 (EPO)

(2) INFORMATION FOR SEQ ID NO: 1:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 6 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(iii) HYPOTHETICAL: NO

(iii) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Saccharomyces cerevisiae

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1:

Lys Arg Glu Ala Glu Ala 1 5

(2) INFORMATION FOR SEQ ID NO: 2:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 13 amino acids

(B) TYPE: amino acid

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(iii) HYPOTHETICAL: NO

(iii) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: synthetic

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2:

Glu Glu Ala Glu Ala Glu Xaa Xaa Xaa Arg Ala Pro Arg 1 5 10

(2) INFORMATION FOR SEQ ID NO: 3:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 11 amino acids

(B) TYPE: amino acid

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(iii) HYPOTHETICAL: NO

(iii) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: synthetic

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3:

Glu Glu Ala Glu Ala Glu Pro Lys Ala Xaa Arg 1 5 10

(2) INFORMATION FOR SEQ ID NO: 4:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 10 amino acids

(B) TYPE: amino acid

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(iii) HYPOTHETICAL: NO

(iii) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: synthetic

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4:

Glu Glu Ala Glu Ala Glu Ala Glu Pro Arg 1 5 10

(2) INFORMATION FOR SEQ ID NO: 5:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 10 amino acids

(B) TYPE: amino acid

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(iii) HYPOTHETICAL: NO

(iii) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: synthetic

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5:

Glu Glu Ala Glu Ala Glu Ala Glu Pro Lys 1 5 10

(2) INFORMATION FOR SEQ ID NO: 6:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 9 amino acids

(B) TYPE: amino acid

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(iii) HYPOTHETICAL: NO

(iii) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: synthetic

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6:

Glu Glu Ala Glu Ala Glu Ala Glu Arg 1 5

(2) INFORMATION FOR SEQ ID NO: 7:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 9 amino acids

(B) TYPE: amino acid

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(iii) HYPOTHETICAL: NO

(iii) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: synthetic

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:

Glu Glu Ala Glu Ala Glu Ala Xaa Lys 1 5

(2) INFORMATION FOR SEQ ID NO: 8:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 9 amino acids

(B) TYPE: amino acid

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(iii) HYPOTHETICAL: NO

(iii) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: synthetic

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8:

Glu Glu Ala Glu Ala Glu Ala Xaa Lys 1 5

(2) INFORMATION FOR SEQ ID NO: 9:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 8 amino acids

(B) TYPE: amino acid

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(iii) HYPOTHETICAL: NO

(iii) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: synthetic

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9:

Glu Glu Ala Glu Ala Glu Ala Arg 1 5

(2) INFORMATION FOR SEQ ID NO: 10:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 8 amino acids

(B) TYPE: amino acid

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(iii) HYPOTHETICAL: NO

(iii) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: synthetic

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10:

Glu Glu Ala Glu Ala Glu Xaa Lys 1 5

(2) INFORMATION FOR SEQ ID NO: 11:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 7 amino acids

(B) TYPE: amino acid

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(iii) HYPOTHETICAL: NO

(iii) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: synthetic

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11:

Glu Glu Ala Glu Ala Pro Lys 1 5

(2) INFORMATION FOR SEQ ID NO: 12:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 5 amino acids

(B) TYPE: amino acid

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(iii) HYPOTHETICAL: NO

(iii) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: synthetic

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12:

Glu Glu Ala Pro Lys 1 5

(2) INFORMATION FOR SEQ ID NO: 13:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 4 amino acids

(B) TYPE: amino acid

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(iii) HYPOTHETICAL: NO

(iii) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: synthetic

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13:

Glu Glu Pro Lys

1

(2) INFORMATION FOR SEQ ID NO: 14:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 22 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(iii) HYPOTHETICAL: NO

(iii) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: synthetic

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14:

AATTTATTTT ACATAACACT AG 22

(2) INFORMATION FOR SEQ ID NO: 15:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 63 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(iii) HYPOTHETICAL: NO

(iii) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: synthetic

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15:

GGGGTATCCA TGGCTAAGAG AGAAGAAGCT GAAGCTGAAG CTNCAAAGTT CGTTAACCAA 60

CAC 63

(2) INFORMATION FOR SEQ ID NO: 16:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 64 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(iii) HYPOTHETICAL: NO

(iii) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: synthetic

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16:

GGGGTATCCA TGGCTAAGAG AGAAGAAGCT GAAGCTGAAG CTGNNAAAGT TCGTTAACCA 60

ACAC 64

(2) INFORMATION FOR SEQ ID NO: 17:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 61 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(iii) HYPOTHETICAL: NO

(iii) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: synthetic

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17:

GGGGTATCCA TGGCTAAGAG AGAAGAAGCT GAAGCTGAAG GNAAAGTTCG TTAACCAACA 60

C 61

(2) INFORMATION FOR SEQ ID NO: 18:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 72 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(iii) HYPOTHETICAL: NO

(iii) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: synthetic

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18:

AACGTTGCCA TGGCTCCAGC TCCAGCTAAG AGAGAAGAAG CTGAAGCTGA AGCTGAAAGA 60

CATGCTGAAG GT 72

(2) INFORMATION FOR SEQ ID NO: 19:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 9 amino acids

(B) TYPE: amino acid

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(iii) HYPOTHETICAL: NO

(iii) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: synthetic

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19:

Glu Glu Ala Glu Ala Glu Ala Pro Lys 1 5

(2) INFORMATION FOR SEQ ID NO: 20:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 9 amino acids

(B) TYPE: amino acid

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(iii) HYPOTHETICAL: NO

(iii) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: synthetic

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20:

Glu Glu Ala Glu Ala Glu Ala Thr Lys 1 5

(2) INFORMATION FOR SEQ ID NO: 21:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 9 amino acids

(B) TYPE: amino acid

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(iii) HYPOTHETICAL: NO

(iii) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: synthetic

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21:

Glu Glu Ala Glu Ala Glu Ala Glu Lys 1 5

(2) INFORMATION FOR SEQ ID NO: 22:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 9 amino acids

(B) TYPE: amino acid

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(iii) HYPOTHETICAL: NO

(iii) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: synthetic

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22:

Glu Glu Ala Glu Ala Glu Ala Asp Lys 1 5

(2) INFORMATION FOR SEQ ID NO: 23:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 9 amino acids

(B) TYPE: amino acid

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(iii) HYPOTHETICAL: NO

(iii) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: synthetic

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23:

Glu Glu Ala Glu Ala Glu Ala Gly Lys 1 5

(2) INFORMATION FOR SEQ ID NO: 24:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 8 amino acids

(B) TYPE: amino acid

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(iii) HYPOTHETICAL: NO

(iii) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: synthetic

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24:

Glu Glu Ala Glu Ala Glu Ala Lys 1 5

(2) INFORMATION FOR SEQ ID NO: 25:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 8 amino acids

(B) TYPE: amino acid

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(iii) HYPOTHETICAL: NO

(iii) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: synthetic

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25:

Glu Glu Ala Glu Ala Glu Glu Lys 1 5

(2) INFORMATION FOR SEQ ID NO: 26:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 9 amino acids

(B) TYPE: amino acid

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(iii) HYPOTHETICAL: NO

(iii) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: synthetic

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26:

Glu Glu Ala Glu Ala Glu Ala Pro Lys 1 5

(2) INFORMATION FOR SEQ ID NO: 27:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 13 amino acids

(B) TYPE: amino acid

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(iii) HYPOTHETICAL: NO

(iii) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: synthetic

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27:

Glu Glu Ala Glu Ala Glu Ala Glu Pro Lys Ala Thr Arg 1 5 10

(2) INFORMATION FOR SEQ ID NO: 28:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 8 amino acids

(B) TYPE: amino acid

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(iii) HYPOTHETICAL: NO

(iii) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: synthetic

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28:

Glu Glu Ala Glu Ala Glu Ala Lys 1 5

(2) INFORMATION FOR SEQ ID NO: 29:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 5 amino acids

(B) TYPE: amino acid

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(iii) HYPOTHETICAL: NO

(iii) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: synthetic

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29:

Glu Glu Ala Pro Lys 1 5

(2) INFORMATION FOR SEQ ID NO: 30:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 660 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(iii) HYPOTHETICAL: NO

(iii) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: synthetic

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 127..540

(ix) FEATURE:

(A) NAME/KEY: sig_peptide

(B) LOCATION: 127..381

(ix) FEATURE:

(A) NAME/KEY: mat_peptide

(B) LOCATION: 382..540

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30:

GTTTGTATTC TTTTCTTGCT TAAATCTATA ACTACAAAAA ACACATACAG GAATTCCATT 60

CAAGAATAGT TCAAACAAGA AGATTACAAA CTATCAATTT CATACACAAT ATAAACGATT 120

AAAAGA ATG AGA TTT CCT TCA ATT TTT ACT GCA GTT TTA TTC GCA GCA 168 Met Arg Phe Pro Ser Ile Phe Thr Ala Val Leu Phe Ala Ala -85 -80 -75

TCC TCC GCA TTA GCT GCT CCA GTC AAC ACT ACA ACA GAA GAT GAA ACG 216 Ser Ser Ala Leu Ala Ala Pro Val Asn Thr Thr Thr Glu Asp Glu Thr -70 -65 -60

GCA CAA ATT CCG GCT GAA GCT GTC ATC GGT TAC TCA GAT TTA GAA GGG 264 Ala Gln Ile Pro Ala Glu Ala Val Ile Gly Tyr Ser Asp Leu Glu Gly -55 -50 -45 -40

GAT TTC GAT GTT GCT GTT TTG CCA TTT TCC AAC AGC ACA AAT AAC GGG 312 Asp Phe Asp Val Ala Val Leu Pro Phe Ser Asn Ser Thr Asn Asn Gly -35 -30 -25

TTA TTG TTT ATA AAT ACT ACT ATT GCC AGC ATT GCT GCT AAA GAA GAA 360 Leu Leu Phe Ile Asn Thr Thr Ile Ala Ser Ile Ala Ala Lys Glu Glu -20 -15 -10

GGG GTA TCC ATG GCT AAG AGA TTC GTT AAC CAA CAC TTG TGC GGT TCC 408 Gly Val Ser Met Ala Lys Arg Phe Val Asn Gln His Leu Cys Gly Ser -5 1 5

CAC TTG GTT GAA GCT TTG TAC TTG GTT TGC GGT GAA AGA GGT TTC TTC 456 His Leu Val Glu Ala Leu Tyr Leu Val Cys Gly Glu Arg Gly Phe Phe 10 15 20 25

TAC ACT GAC AAG GCT GCT AAG GGT ATC GTT GAA CAA TGC TGT ACC TCC 504 Tyr Thr Asp Lys Ala Ala Lys Gly Ile Val Glu Gln Cys Cys Thr Ser 30 35 40

ATC TGC TCC TTG TAC CAA TTG GAA AAC TAC TGC AAC TAGACGCAGC 550

Ile Cys Ser Leu Tyr Gln Leu Glu Asn Tyr Cys Asn 45 50

CCGCAGGCTC TAGAAACTAA GATTAATATA ATTATATAAA AATATTATCT TCTTTTCTTT 610

ATATCTAGTG TTATGTAAAA TAAATTGATG ACTACGGAAA GCTAGCTTTT 660

(2) INFORMATION FOR SEQ ID NO: 31:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 138 amino acids

(B) TYPE: amino acid

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31:

Met Arg Phe Pro Ser Ile Phe Thr Ala Val Leu Phe Ala Ala Ser Ser -85 -80 -75 -70

Ala Leu Ala Ala Pro Val Asn Thr Thr Thr Glu Asp Glu Thr Ala Gln -65 -60 -55

Ile Pro Ala Glu Ala Val Ile Gly Tyr Ser Asp Leu Glu Gly Asp Phe -50 -45 -40

Asp Val Ala Val Leu Pro Phe Ser Asn Ser Thr Asn Asn Gly Leu Leu -35 -30 -25

Phe Ile Asn Thr Thr Ile Ala Ser Ile Ala Ala Lys Glu Glu Gly Val -20 -15 -10

Ser Met Ala Lys Arg Phe Val Asn Gln His Leu Cys Gly Ser His Leu -5 1 5 10

Val Glu Ala Leu Tyr Leu Val Cys Gly Glu Arg Gly Phe Phe Tyr Thr 15 20 25

Asp Lys Ala Ala Lys Gly Ile Val Glu Gln Cys Cys Thr Ser Ile Cys 30 35 40

Ser Leu Tyr Gln Leu Glu Asn Tyr Cys Asn 45 50

(2) INFORMATION FOR SEQ ID NO: 32:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 516 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(iii) HYPOTHETICAL: NO

(iii) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: synthetic

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 133..411

(ix) FEATURE:

(A) NAME/KEY: sig_peptide

(B) LOCATION: 133..321

(ix) FEATURE:

(A) NAME/KEY: mat_peptide

(B) LOCATION: 322..411

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32:

TGTTTGTATT CTTTTCTTGC TTAAATCTAT AACTACAAAA AACACATACA GGAATTCCAT 60

TCAAGAATAG TTCAAACAAG AAGATTACAA ACTATCAATT TCATACACAA TATAAACGAC 120

GGTACCAAAA TA ATG AAA CTG AAA ACT GTA AGA TCT GCG GTC CTT TCG 168 Met Lys Leu Lys Thr Val Arg Ser Ala Val Leu Ser -63 -60 -55

TCA CTC TTT GCA TCT CAG GTC CTT GGC CAA CCA ATT GAC GAC ACT GAA 216 Ser Leu Phe Ala Ser Gln Val Leu Gly Gln Pro Ile Asp Asp Thr Glu -50 -45 -40

TCT AAC ACT ACT TCT GTC AAC TTG ATG GCT GAC GAC ACT GAA TCT ATC 264 Ser Asn Thr Thr Ser Val Asn Leu Met Ala Asp Asp Thr Glu Ser Ile -35 -30 -25 -20

AAC ACT ACT TTG GTT AAC TTG GCT AAC GTT GCC ATG GCT CCA GCT CCA 312 Asn Thr Thr Leu Val Asn Leu Ala Asn Val Ala Met Ala Pro Ala Pro -15 -10 -5

GCT AAG AGA CAT GCT GAA GGT ACC TTC ACC TCT GAC GTC TCG AGT TAC 360 Ala Lys Arg His Ala Glu Gly Thr Phe Thr Ser Asp Val Ser Ser Tyr 1 5 10

TTG GAA GGC CAA GCT GCT AAG GAG TTC ATC GCT TGG TTG GTT AAG GGC 408 Leu Glu Gly Gln Ala Ala Lys Glu Phe Ile Ala Trp Leu Val Lys Gly 15 20 25

GCT TAGTCTAGAA ACTAAGATTA ATATAATTAT ATAAAAATAT TATCTTCTTT 461

Ala 30

TCTTTATATC TAGTGTTATG TAAAATAAAT TGATGACTAC GGAAAGCTAG CTTTT 516

(2) INFORMATION FOR SEQ ID NO: 33:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 93 amino acids

(B) TYPE: amino acid

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 33:

Met Lys Leu Lys Thr Val Arg Ser Ala Val Leu Ser Ser Leu Phe Ala -63 -60 -55 -50

Ser Gln Val Leu Gly Gln Pro Ile Asp Asp Thr Glu Ser Asn Thr Thr -45 -40 -35

Ser Val Asn Leu Met Ala Asp Asp Thr Glu Ser Ile Asn Thr Thr Leu -30 -25 -20

Val Asn Leu Ala Asn Val Ala Met Ala Pro Ala Pro Ala Lys Arg His -15 -10 -5 1

Ala Glu Gly Thr Phe Thr Ser Asp Val Ser Ser Tyr Leu Glu Gly Gln 5 10 15

Ala Ala Lys Glu Phe Ile Ala Trp Leu Val Lys Gly Ala 20 25 30

(2) INFORMATION FOR SEQ ID NO:34:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 10 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(v) FRAGMENT TYPE: internal

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:34:

Asp Asp Ala Asp Ala Asp Ala Asp Pro Arg 1 5 10

(2) INFORMATION FOR SEQ ID NO:35:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 6 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:35:

Glu Glu Glu Glu Pro Lys 1 5

(2) INFORMATION FOR SEQ ID NO:36:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 5 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:36:

Glu Glu Glu Pro Lys 1 5

(2) INFORMATION FOR SEQ ID NO:37:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 6 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:37:

Asp Asp Asp Asp Asp Lys 1 5

(2) INFORMATION FOR SEQ ID NO:38:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 61 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(v) FRAGMENT TYPE: internal

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:38:

Glu Glu Ala Glu Ala Glu Ala Lys Phe Val Asn Gln His Leu Cys Gly 1 5 10 15

Ser His Leu Val Glu Ala Leu Tyr Leu Val Cys Gly Glu Arg Gly Phe 20 25 30

Phe Tyr Thr Pro Lys Ala Ala Lys Gly Ile Val Glu Gln Cys Cys Thr 35 40 45

Ser Ile Cys Ser Leu Tyr Gln Leu Glu Asn Tyr Cys Asn 50 55 60

(2) INFORMATION FOR SEQ ID NO:39:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 53 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:39:

Phe Val Asn Gln His Leu Cys Gly Ser His Leu Val Glu Ala Leu Tyr 1 5 10 15

Leu Val Cys Gly Glu Arg Gly Phe Phe Tyr Thr Pro Lys Ala Ala Arg 20 25 30

Gly Ile Val Glu Gln Cys Cys Thr Ser Ile Cys Ser Leu Tyr Gln Leu 35 40 45

Glu Asn Tyr Cys Asn 50

(2) INFORMATION FOR SEQ ID NO:40:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 10 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 40:

Asp Asp Ala Asp Ala Asp Ala Asp Pro Arg 1 5 10

(2) INFORMATION FOR SEQ ID NO: 41:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 74 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(v) FRAGMENT TYPE: internal

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:41:

GGGGTATCCA TGGCTAAGAG AGACGACGCT GACGCTGACG CTGACCCAAG ATTCGTTAAC 60

CAACACTTGT GCGG 74
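As an illustrative cross-check only, the nucleotide sequence of SEQ ID NO:41 can be translated to see which residues it encodes. The sketch below assumes the standard genetic code and a reading frame that starts at the first nucleotide; the helper name `translate` and the one-letter amino acid output are choices made for this illustration and are not part of the listing. Under those assumptions the translation contains Lys-Arg followed by the ten residues of SEQ ID NO:40 and then the first residues of SEQ ID NO:39.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(dna: str) -> str:
    """Translate DNA in frame 1, ignoring whitespace and any trailing partial codon."""
    dna = "".join(dna.split()).upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# SEQ ID NO:41 as transcribed from the listing (74 nucleotides).
SEQ_ID_NO_41 = (
    "GGGGTATCCA TGGCTAAGAG AGACGACGCT GACGCTGACG "
    "CTGACCCAAG ATTCGTTAAC CAACACTTGT GCGG"
)

protein = translate(SEQ_ID_NO_41)
print(protein)  # GVSMAKRDDADADADPRFVNQHLC
```

The last two nucleotides do not form a complete codon and are dropped by the helper.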

(2) INFORMATION FOR SEQ ID NO:42:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 13 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(v) FRAGMENT TYPE: internal

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:42:

Glu Glu Ala Glu Ala Glu Ala Glu Pro Lys Ala Thr Arg 1 5 10

(2) INFORMATION FOR SEQ ID NO:43:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 83 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:43:

GGGGTATCCA TGGCTAAGAG AGAAGAAGCT GAAGCTGAAG CTGAACCAAA GGCTACAAGA 60

TTCGTTAACC AACACTTGTG CGG 83

(2) INFORMATION FOR SEQ ID NO:44:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 53 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(v) FRAGMENT TYPE: internal

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:44:

Phe Val Asn Gln His Leu Cys Gly Ser His Leu Val Glu Ala Leu Tyr 1 5 10 15

Leu Val Cys Gly Glu Arg Gly Phe Phe Tyr Thr Pro Lys Ala Ala Lys 20 25 30

Gly Ile Val Glu Gln Cys Cys Thr Ser Ile Cys Ser Leu Tyr Gln Leu 35 40 45

Glu Asn Tyr Cys Asn 50

(2) INFORMATION FOR SEQ ID NO:45:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 39 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:45:

GAAGAAGAAG AAGAAGAACC A GTTCGTT AACCAACAC 39

(2) INFORMATION FOR SEQ ID NO:46:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 36 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:46:

GTTGTTAACT TGATCTCCAT GGCTAAGAGA GAAGAA 36

(2) INFORMATION FOR SEQ ID NO:47:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 59 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(v) FRAGMENT TYPE: internal

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:47:

Glu Glu Glu Glu Pro Lys Phe Val Asn Gln His Leu Cys Gly Ser His 1 5 10 15

Leu Val Glu Ala Leu Tyr Leu Val Cys Gly Glu Arg Gly Phe Phe Tyr 20 25 30

Thr Pro Lys Ala Ala Lys Gly Ile Val Glu Gln Cys Cys Thr Ser Ile 35 40 45

Cys Ser Leu Tyr Gln Leu Glu Asn Tyr Cys Asn 50 55

(2) INFORMATION FOR SEQ ID NO:48:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 58 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:48:

Glu Glu Glu Pro Lys Phe Val Asn Gln His Leu Cys Gly Ser His Leu 1 5 10 15

Val Glu Ala Leu Tyr Leu Val Cys Gly Glu Arg Gly Phe Phe Tyr Thr 20 25 30

Pro Lys Ala Ala Lys Gly Ile Val Glu Gln Cys Cys Thr Ser Ile Cys 35 40 45

Ser Leu Tyr Gln Leu Glu Asn Tyr Cys Asn 50 55

(2) INFORMATION FOR SEQ ID NO:49:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 10 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(v) FRAGMENT TYPE: internal

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:49:

Glu Glu Ala Glu Ala Glu Ala Glu Pro Lys 1 5 10

(2) INFORMATION FOR SEQ ID NO:50:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 660 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(v) FRAGMENT TYPE: internal

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:50:

GTTTGTATTC TTTTCTTGCT TAAATCTATA ACTACAAAAA ACACATACAG GAATTCCATT 60

CAAGAATAGT TCAAACAAGA AGATTACAAA CTATCAATTT CATACACAAT ATAAACGATT 120

AAAAGAATGA GATTTCCTTC AATTTTTACT GCAGTTTTAT TCGCAGCATC CTCCGCATTA 180

GCTGCTCCAG TCAACACTAC AACAGAAGAT GAAACGGCAC AAATTCCGGC TCAAGCTGTC 240

ATCGGTTACT CAGATTTAGA AGGGGATTTC GATGTTGCTG TTTTGCCATT TTCCAACAGC 300

ACAAATAACG GGTTATTGTT TATAAATACT ACTATTGCCA GCATTGCTGC TAAAGAAGAA 360

GGGGTATCCA TGGCTAAGAG ATTCGTTAAC CAACACTTGT GCGGTTCCCA CTTGGTTGAA 420

GCTTTGTACT TGGTTTGCGG TGAAAGAGGT TTCTTCTACA CTCCTAAGGC TGCTAGAGGT 480

ATTGTCGAAC AATGCTGTAC CTCCATCTGC TCCTTGTACC AATTGGAAAA CTACTGCAAC 540

TAGACGCAGC CCGCAGGCTC TAGAAACTAA GATTAATATA ATTATATAAA AATATTATCT 600

TCTTTTCTTT ATATCTAGTG TTATGTAAAA TAAATTGATG ACTACGGAAA GCTAGCTTTT 660

(2) INFORMATION FOR SEQ ID NO: 51:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 600 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 51:

TGTTTGTATT CTTTTCTTGC TTAAATCTAT AACTACAAAA AACACATACA GGAATTCCAT 60

TCAAGAATAG TTCAAACAAG AAGATTACAA ACTATCAATT TCATACACAA TATAAACGAC 120

GGTACCAAAA TAATGAAACT GAAAACTGTA AGATCTGCGG TCCTTTCGTC ACTCTTTGCA 180

TCTCAGGTCC TTGGCCAACC AATTGACGAC ACTGAATCTA ACACTACTTC TGTCAACTTG 240

ATGGCTGACG ACACTGAATC TAGATTCGCT ACTAACACTA CTTTGGCTTT GGATGTTGTT 300

AACTTGATCT CCATGGCTAA GAGATTCGTT AACCAACACT TGTGCGGTTC CCACTTGGTT 360

GAAGCTTTGT ACTTGGTTTG CGGTGAAAGA GGTTTCTTCT ACACTCCTAA GGCTGCTAAG 420

GCTATTGTCG AACAATGCTG TACCTCCATC TGCTCCTTGT ACCAATTGGA AAACTACTGC 480

AACTAGACGC AGCCCGCAGG CTCTAGAAAC TAAGATTAAT ATAATTATAT AAAAATATTA 540

TCTTCTTTTC TTTATATCTA GTGTTATGTA AAATAAATTG ATGACTACGG AAAGCTAGCT 600

(2) INFORMATION FOR SEQ ID NO: 52:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 55 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:52:

Phe Val Asn Gln His Leu Cys Gly Ser His Leu Val Glu Ala Leu Tyr 1 5 10 15

Leu Val Cys Gly Glu Arg Gly Phe Phe Tyr Thr Pro Lys Ser Asp Asp 20 25 30

Ala Arg Gly Ile Val Glu Gln Cys Cys Thr Ser Ile Cys Ser Leu Tyr 35 40 45

Gln Leu Glu Asn Tyr Cys Asn 50 55

Claims

1. A DNA construct encoding a polypeptide having the following structure

signal peptide-leader peptide-X¹-X²-X³-X⁴-X⁵-X⁶-X⁷-heterologous protein

wherein

X¹ is Lys or Arg;

X² is Lys or Arg, X¹ and X² together defining a yeast processing site;

X³ is Glu or Asp;

X⁴ is a sequence of amino acids with the following structure

(A - B)n

wherein A is Glu or Asp, B is Ala, Val, Leu or Pro, and n is 0 or an integer from 1 to 5, and when n≥2 each A and B is the same or different from the other A(s) and B(s), or X⁴ is a sequence of amino acids with the following structure

(C)m

wherein C is Glu or Asp, and m is 0 or an integer from 1 to 5;

X⁵ is a peptide bond or is one or more amino acids which may be the same or different;

X⁶ is a peptide bond or an amino acid residue selected from the group consisting of Pro, Asp, Thr, Ser, Glu, Ala and Gly; and

X⁷ is Lys or Arg.
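By way of illustration only, the positional rules recited in claim 1 can be sketched as a small validation routine. The function name `build_spacer`, its parameters and the use of one-letter amino acid codes are hypothetical conveniences introduced here; they do not appear in the specification, and the sketch does not limit or interpret the claim.

```python
ACIDIC = set("ED")             # Glu, Asp (X3, A and C)
BASIC = set("KR")              # Lys, Arg (X1, X2 and X7)
B_RESIDUES = set("AVLP")       # Ala, Val, Leu, Pro (B)
X6_RESIDUES = set("PDTSEAG")   # Pro, Asp, Thr, Ser, Glu, Ala, Gly
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def build_spacer(x1, x2, x3, x4, x5="", x6="", x7="K"):
    """Check each position against the rules of claim 1 and return X1..X7.

    x4 is either a list of (A, B) pairs, for the (A - B)n form, or a string
    of acidic residues, for the (C)m form; an empty value means n = 0 or
    m = 0. An empty x5 or x6 stands for a peptide bond.
    """
    if x1 not in BASIC or x2 not in BASIC:
        raise ValueError("X1 and X2 must each be Lys (K) or Arg (R)")
    if x3 not in ACIDIC:
        raise ValueError("X3 must be Glu (E) or Asp (D)")
    if isinstance(x4, str):
        # (C)m form: up to five acidic residues
        if len(x4) > 5 or not set(x4) <= ACIDIC:
            raise ValueError("X4 as (C)m needs 0 to 5 residues from Glu, Asp")
        x4_seq = x4
    else:
        # (A - B)n form: up to five acidic/B-residue pairs
        pairs = list(x4)
        for pair in pairs:
            if len(pair) != 2 or pair[0] not in ACIDIC or pair[1] not in B_RESIDUES:
                raise ValueError("each (A - B) pair needs A in Glu/Asp and B in Ala/Val/Leu/Pro")
        if len(pairs) > 5:
            raise ValueError("n must be between 0 and 5")
        x4_seq = "".join(a + b for a, b in pairs)
    if not set(x5) <= AMINO_ACIDS:
        raise ValueError("X5 must be a peptide bond or amino acid residues")
    if x6 and x6 not in X6_RESIDUES:
        raise ValueError("X6 must be a peptide bond or one of Pro, Asp, Thr, Ser, Glu, Ala, Gly")
    if x7 not in BASIC:
        raise ValueError("X7 must be Lys (K) or Arg (R)")
    return x1 + x2 + x3 + x4_seq + x5 + x6 + x7


# Lys-Arg followed by Glu Glu Ala Glu Ala Glu Ala Pro Lys (n = 3, X6 = Pro)
print(build_spacer("K", "R", "E", [("E", "A")] * 3, x6="P", x7="K"))  # KREEAEAEAPK
```

Because X⁵ may be any run of residues, a given sequence can often be assigned to the positions in more than one way; the routine therefore builds a sequence from explicitly named parts rather than attempting to parse one.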

2. A DNA construct according to claim 1, wherein X³ is Glu.

3. A DNA construct according to claim 1, wherein A is Glu.

4. A DNA construct according to claim 1, wherein B is Ala.

5. A DNA construct according to claim 1, wherein n is preferably 2-4, more preferably 3.

6. A DNA construct according to claim 1, wherein X⁵ is a peptide bond, Glu, Asp or Glu-Pro-Lys-Ala.

7. A DNA construct according to claim 1, wherein X⁶ is Pro, or a peptide bond.

8. A DNA construct according to claim 1, wherein the signal peptide is the α-factor signal peptide, the yeast aspartic protease 3 signal peptide, the signal peptide of mouse salivary amylase, the carboxypeptidase signal peptide, or the yeast BAR1 signal peptide.

9. A DNA construct according to claim 1, wherein the leader peptide is a natural leader peptide such as the α-factor leader peptide, or a synthetic leader peptide.

10. A DNA construct according to claim 1, wherein the heterologous protein is selected from the group consisting of aprotinin, tissue factor pathway inhibitor or other protease inhibitors, insulin-like growth factor I or II, human or bovine growth hormone, interleukin, tissue plasminogen activator, glucagon, glucagon-like peptide-1, Factor VII, Factor VIII, Factor XIII, platelet-derived growth factor, enzymes, insulin or an insulin precursor, and a functional analogue of any of these proteins.

11. A DNA construct according to claim 10, wherein the heterologous protein is insulin or an insulin precursor or a functional analogue thereof.

12. A DNA construct according to any of claims 1-11, wherein the amino acid sequence X³-X⁷ is

Glu Glu Ala Glu Ala Glu Ala Pro Lys,

Glu Glu Ala Glu Ala Glu Ala Glu Pro Lys Ala Thr Arg,

Glu Glu Ala Glu Ala Glu Ala Lys,

Glu Glu Ala Pro Lys,

Glu Glu Ala Glu Ala Glu Ala Glu Pro Lys,

Asp Asp Ala Asp Ala Asp Ala Asp Pro Arg,

Glu Glu Glu Glu Pro Lys,

Glu Glu Glu Pro Lys or

Asp Asp Asp Asp Asp Lys.

13. A recombinant expression vector which is capable of replicating in yeast and which carries a DNA construct according to claim 12.

14. A yeast strain which is capable of expressing a heterologous protein and which is transformed with a vector according to claim 13.

15. A process for producing a heterologous protein, comprising cultivating a yeast cell according to claim 14 in a suitable medium to obtain expression and secretion of the heterologous protein, after which the protein is isolated from the culture medium.

16. A process according to claim 15, wherein the amino acid sequence X³-X⁷ is removed from the recovered heterologous protein by a process involving treatment with a proteolytic enzyme which is specific for a basic amino acid.

17. A process according to claim 16, wherein the proteolytic enzyme is selected from the group consisting of trypsin, Achromobacter lyticus protease I, enterokinase, Fusarium oxysporum trypsin-like protease, and YAP3.

18. A process according to claim 16, wherein the amino acid sequence X³-X⁷, wherein X⁶ is a peptide bond, Ala, or Glu, is removed from the heterologous protein in the medium by a process involving subjecting the yeast cells to stress to make them release yeast aspartic protease-3 into the medium, whereby the amino acid sequence X³-X⁷ is cleaved off.

19. A process according to claim 18, wherein the stress to which the yeast cells are subjected comprises reducing the pH of the culture medium to below 6.0, or starving the yeast cells by subjecting them to limiting growth conditions.

20. A process according to claim 19, wherein the pH of the culture medium is reduced to below 5.

21. A process according to claim 15, wherein the yeast cell is further transformed with one or more genes encoding a protease which is specific for basic amino acid residues so that, on cultivation of the cell, the gene or genes are expressed, the consequent production of protease ensuring a more complete cleavage of X³-X⁷ from the heterologous polypeptide.

22. A process according to claim 21, wherein the gene coding for the protease is a gene coding for YAP3 so that, on cultivation of the cell, the YAP3 gene or genes are overexpressed, the consequent overproduction of YAP3 ensuring a more complete cleavage of X³-X⁷ from the heterologous polypeptide.

23. A process according to claim 21, wherein the gene coding for the protease codes for A. lyticus protease I or trypsin.