JP2009075737A

JP2009075737A - Semi-supervised learning method, device, and program

Info

Publication number: JP2009075737A
Application number: JP2007242419A
Authority: JP
Inventors: Norihito Teramoto; 礼仁寺本
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2007-09-19
Filing date: 2007-09-19
Publication date: 2009-04-09

Abstract

PROBLEM TO BE SOLVED: To provide a general-purpose, practical, high-accuracy classifier that has few learning parameters and can use supervised learning as its lower learning machine.

SOLUTION: Training data are stored. Unlabeled data and test data are stored. Supervised learning is performed on the training data. A gradient is stored. The training data, the unlabeled data, and the test data are combined, and supervised learning is performed repeatedly. The learned discriminant function is stored. The label of the test data is predicted using the discriminant function.

COPYRIGHT: (C)2009, JPO&INPIT

Description

The present invention relates to a semi-supervised learning method, a semi-supervised learning device, and a semi-supervised learning program that achieve high prediction accuracy by using ensemble learning to learn from unlabeled data as well as labeled data.

A learning scheme in which a learning machine is trained on labeled data (the training data) and then predicts the labels of test data is called supervised learning.

Well-known supervised learning methods include boosting, bagging, and support vector machines, which have been applied to many kinds of data. Boosting, bagging, and support vector machines are described in Non-Patent Documents 1 to 3.

However, supervised learning needs a sufficient amount of training data to build a highly accurate classifier. Because data are generally labeled by hand, obtaining a sufficient amount of training data takes considerable time and labor. It has also been pointed out that prediction accuracy on test data is low when the distribution of the obtained data is strongly biased.

A technique called semi-supervised learning has been proposed to address these problems. Non-Patent Documents 4 to 7 describe cases in which semi-supervised learning is realized by extending conventional boosting and support vector machine methodologies.

Here, semi-supervised learning refers to a learning scheme that takes into account not only the training data but also the distribution of unlabeled data or test data, with the aim of building a highly accurate classifier even when only a small amount of training data is available.

In the following description, unlabeled data and test data are both referred to simply as test data.

Non-Patent Document 1: Y. Freund and R. E. Schapire, "A decision-theoretic generalization of on-line learning and an application to boosting," Journal of Computer and System Sciences, 1997, 23-27.
Non-Patent Document 2: L. Breiman, "Bagging Predictors," Machine Learning, 1996, 123-140.
Non-Patent Document 3: J. P. Vert, K. Tsuda, and B. Schölkopf, "A primer on kernel methods," in Kernel Methods in Computational Biology, MIT Press, 2004, 35-70.
Non-Patent Document 4: F. d'Alché-Buc, Y. Grandvalet, and C. Ambroise, "Semi-supervised margin boost," Advances in Neural Information Processing Systems 14, MIT Press, 2002.
Non-Patent Document 5: Y. Grandvalet, F. d'Alché-Buc, and C. Ambroise, "Boosting mixture models for semi-supervised learning," ICANN, Springer-Verlag, 2002, 41-48.
Non-Patent Document 6: K. P. Bennett, A. Demiriz, and R. Maclin, "Exploiting unlabeled data in ensemble methods," SIGKDD, ACM, 2002.
Non-Patent Document 7: O. Chapelle and A. Zien, "Semi-supervised classification by low density separation," in Proceedings of the Tenth International Workshop on Artificial Intelligence and Statistics, 2005.
Non-Patent Document 8: J. Friedman, "Greedy Function Approximation: A Gradient Boosting Machine," The Annals of Statistics, 2001, 1189-1232.
Non-Patent Document 9: Boaz Leskes, "The Value of Agreement, a New Boosting Algorithm," Master Thesis, Section Computational Science and Institute for Logic, Language & Computation, University of Amsterdam, 2005.
Non-Patent Document 10: A. Buja and Y. S. Lee, "Data Mining Criteria for Tree-Based Regression and Classification," Proceedings of KDD 2001, 27-36.
Non-Patent Document 11: UCI Machine Learning Repository Content Summary [searched on September 12, 2007], Internet <https://mlearn.ics.uci.edu/MLSummary.html>

However, the methods described in Non-Patent Documents 4 to 7 have the following problems.

First, the methods of Non-Patent Documents 4 and 5 require a learning machine that itself performs semi-supervised learning as the lower learning machine used for boosting, so conventional supervised learning techniques cannot be used. As a result, the applicable data formats are limited and the methods lack versatility. They also require long computation times on large-scale data.

The method of Non-Patent Document 6, by contrast, can use a supervised learning machine as the lower learning machine for boosting, which was the problem in Non-Patent Documents 4 and 5. However, it has been reported that, because of a problem in how the learning is formulated, the improvement in classification accuracy is small and performance may even degrade.

The method of Non-Patent Document 7 introduces a loss function on the test data into the objective function of the support vector machine. However, the calculation and optimization procedures are complicated, so the method requires tuning many learning parameters and long computation times.

In view of the problems described above, an object of the present invention is to provide a practical, general-purpose, high-accuracy classifier that has few learning parameters and can use supervised learning as its lower learning machine. Based on learning theory, it does so by using boosting to minimize the variance of the predicted values for unlabeled data while also minimizing the loss function on the training data.

According to a first aspect of the present invention, there is provided a semi-supervised learning method based on ensemble learning, comprising: a data storage step of storing stored data, test data, and unlabeled data; an initial learning step of performing supervised learning based on the training data recorded in the data storage step; a calculation result storage step of storing the data calculated in the initial learning step; and a learning step of iteratively performing supervised learning based on data obtained by combining the attributes of the training data and the unlabeled data recorded in the data storage step, and on the data stored in the calculation result storage step.

According to a second aspect of the present invention, there is provided, as a first device, a semi-supervised learning device based on ensemble learning, comprising: a training data storage unit that stores training data; a test data storage unit that stores unlabeled data and test data; an initial learning unit that performs supervised learning on the training data; a parameter storage unit that stores gradients; a learning unit that combines the training data with the unlabeled data and test data and iteratively performs supervised learning; a discriminant function storage unit that stores the learned discriminant function; and a discrimination unit that predicts the labels of the test data using the discriminant function.

According to a third aspect of the present invention, there is provided, as a second device, a semi-supervised learning device based on ensemble learning, comprising: data storage means for storing stored data, test data, and unlabeled data; initial learning means for performing supervised learning based on the training data recorded by the data storage means; calculation result storage means for storing the data calculated by the initial learning means; and learning means for iteratively performing supervised learning based on data obtained by combining the attributes of the training data and the unlabeled data recorded by the data storage means, and on the data stored by the calculation result storage means.

According to a fourth aspect of the present invention, there is provided, as a first program, a semi-supervised learning program based on ensemble learning that causes a computer to realize: a training data storage function of storing training data; a test data storage function of storing unlabeled data and test data; an initial learning function of performing supervised learning on the training data; a parameter storage function of storing gradients; a learning function of combining the training data with the unlabeled data and test data and iteratively performing supervised learning; a discriminant function storage function of storing the learned discriminant function; and a discrimination function of predicting the labels of the test data using the discriminant function.

According to a fifth aspect of the present invention, there is provided, as a second program, a semi-supervised learning program based on ensemble learning that causes a computer to realize: a data storage function of storing stored data, test data, and unlabeled data; an initial learning function of performing supervised learning based on the training data recorded by the data storage function; a calculation result storage function of storing the data calculated by the initial learning function; and a learning function of iteratively performing supervised learning based on data obtained by combining the attributes of the training data and the unlabeled data recorded by the data storage function, and on the data stored by the calculation result storage function.

According to the present invention, the distribution of the unlabeled data is learned in addition to the training data, so a highly accurate classifier can be built from a small amount of training data.

Next, embodiments of the present invention are described with reference to the drawings.

The present invention is a new method that builds a highly accurate classifier even when little training data is available. It performs boosting-based learning using not only the training data but also the test data, and it can use any supervised learning machine as the lower learning machine. Embodiments for carrying out the invention are described below with reference to the drawings.

Referring to FIG. 1, an embodiment of the present invention includes an input device 1 such as a keyboard, a data processing device 2 that operates under program control, a storage device 3 that stores information, and an output device 4 such as a display device or a printing device.

The data processing device 2 includes an initial learning unit 21, a learning unit 22, and a discrimination unit 23.

The initial learning unit 21 performs boosting-based learning using only the training data and calculates the gradient of the test data. The learning unit 22 performs boosting-based learning using the gradients of the training data and test data as labels, and updates the discriminant function. The discrimination unit 23 predicts the labels of the test data using the learned discriminant function.

The storage device 3 includes a training data storage unit 31 that stores the training data, a test data storage unit 32 that stores the test data, a parameter storage unit 33 that stores the gradients of the loss function, and a discriminant function storage unit 34 that stores the discriminant function.

Next, the operation of the embodiment is described with reference to FIGS. 1, 2, and 3.

First, an execution instruction is given through the input device 1, and the training data and test data are input from the training data storage unit 31 and the test data storage unit 32 to the data processing device 2 (FIG. 2, step A1).

Next, the initial learning unit 21 performs supervised learning of the discriminant function F using the training data (FIG. 2, step A4). The specific operation of the initial learning unit 21, including steps A2 and A3, is described later.

The discriminant function is then updated iteratively (FIG. 2, step A5), and the gradient of the training data under the discriminant function is calculated (FIG. 2, step A6).

After that, the calculated gradient is stored in the parameter storage unit 33 and the discriminant function is stored in the discriminant function storage unit 34.

The specific operation of the initial learning unit 21 is described with reference to FIG. 2.

First, the training data and test data are input from the training data storage unit 31 and the test data storage unit 32 to the data processing device 2 (FIG. 2, step A1).

Next, the initial learning unit 21 sets the number of iterations to t_1 and the reduction parameter to ν_1 (FIG. 2, step A2). It also initializes to 1 the instruction parameter T that counts the iterations (FIG. 2, step A3).

Supervised learning of the discriminant function F is performed using the training data (FIG. 2, step A4).

The discriminant function F_{T-1} obtained in round T−1 is updated to F_T by adding the discriminant function F obtained by learning (FIG. 2, step A5). The update is expressed by the following formula.
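The formula itself is not reproduced in this text. Based on the description of step A5 and of the reduction parameter ν_1, it presumably has the following form, where F is the discriminant function learned in round T. This is a reconstruction from the surrounding description, not the published equation.

```latex
F_T(x) = F_{T-1}(x) + \nu_1 \, F(x), \qquad 0 < \nu_1 \le 1
```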

Here, ν_1 (0 < ν_1 ≦ 1) is a reduction parameter, introduced as a regularizer to prevent overfitting. Introducing ν_1 lets F_T converge to a function that minimizes the generalization error.

Next, the discriminant function F_T is used to obtain the gradient of the training data that minimizes the loss function (FIG. 2, step A6).

Here, the loss function of boosting in supervised learning is designed so that minimizing the error rate on the training data also minimizes the error rate on the test data. Non-Patent Document 8 describes a method called gradient boosting, which searches for the direction that minimizes the loss function by differentiating the loss function with respect to the discriminant function.

One specific loss function is the function expressed by the following formula.
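The formula is not reproduced in this text. Because the claims and the working example specify an exponential (AdaBoost-type) loss, one form consistent with the description is the following. It is offered as an assumption rather than as the published equation.

```latex
L\bigl(y_i, F(x_i)\bigr) = \exp\bigl(-y_i \, F(x_i)\bigr)
```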

Here, L is the loss function, y_i ∈ {−1, 1} is the class label of the i-th data point, and F(x_i) is the discriminant function, that is, the output given the attributes x_i of the i-th data point. Because the loss function is convex, the direction that minimizes it is obtained by differentiating L with respect to F(x_i) and taking the negative.
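The gradient formula is likewise missing from this text. Assuming the exponential loss L = exp(−y_i F(x_i)), differentiating with respect to F(x_i) and negating gives the following reconstruction.

```latex
g_i = -\frac{\partial L\bigl(y_i, F(x_i)\bigr)}{\partial F(x_i)}
    = y_i \exp\bigl(-y_i \, F(x_i)\bigr),
\qquad (x_i, y_i) \in D_{\mathrm{labeled}}
```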

Here, g_i is the gradient for data point x_i, and D_labeled is the set of attribute and label pairs of the training data.
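As a small illustration of the sign convention, the sketch below assumes the exponential loss exp(−y F(x)) and compares the closed-form negative gradient with a central finite difference. The function names `exp_loss` and `negative_gradient` are hypothetical and do not come from the patent.

```python
import math

def exp_loss(y, f):
    """Exponential loss exp(-y * F(x)) for a label y in {-1, +1} (assumed form)."""
    return math.exp(-y * f)

def negative_gradient(y, f):
    """Negative derivative of the exponential loss with respect to F(x)."""
    return y * math.exp(-y * f)

# Compare the closed form with a central finite difference at one point.
y, f, h = 1, -0.5, 1e-6
numeric = -(exp_loss(y, f + h) - exp_loss(y, f - h)) / (2 * h)
analytic = negative_gradient(y, f)
```

A misclassified point (y F(x) < 0) receives a larger gradient magnitude than a correctly classified one, which is what drives later rounds to focus on it.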

Next, 1 is added to the instruction parameter T that counts the iterations (FIG. 2, step A7). When T reaches the predetermined number of iterations t_1, learning ends. Otherwise, the process returns to supervised learning of the discriminant function with the gradients as labels (FIG. 2, step A4) (FIG. 2, step A8).
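The loop of steps A2 to A8 can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes the exponential loss, one-dimensional inputs, and a hand-written regression stump (`fit_stump`, a hypothetical helper) in place of CART or any other lower learning machine.

```python
import math

def fit_stump(xs, targets):
    """Least-squares regression stump on 1-D inputs (a stand-in lower learning machine)."""
    best = None
    for thr in sorted(set(xs)):
        left = [t for x, t in zip(xs, targets) if x <= thr]
        right = [t for x, t in zip(xs, targets) if x > thr]
        lv = sum(left) / len(left)
        rv = sum(right) / len(right) if right else lv
        err = sum((t - (lv if x <= thr else rv)) ** 2 for x, t in zip(xs, targets))
        if best is None or err < best[0]:
            best = (err, thr, lv, rv)
    _, thr, lv, rv = best
    return lambda x: lv if x <= thr else rv

def initial_learning(xs, ys, t1=20, nu1=1.0):
    """Steps A2 to A8: boosting on the labeled data only."""
    terms = []                                   # shrunken weak learners added so far
    F = lambda x: sum(f(x) for f in terms)       # current discriminant function F_T
    targets = list(ys)                           # gradient at F_0 = 0 equals the label
    for _ in range(t1):                          # T = 1, ..., t1
        h = fit_stump(xs, targets)               # step A4: fit the current targets
        terms.append(lambda x, h=h: nu1 * h(x))  # step A5: F_T = F_{T-1} + nu1 * h
        targets = [y * math.exp(-y * F(x)) for x, y in zip(xs, ys)]  # step A6
    return F, targets

xs, ys = [0.0, 1.0, 2.0, 3.0], [-1, -1, 1, 1]    # toy data
F, grads = initial_learning(xs, ys, t1=5)
```

On this toy data every round splits between x = 1 and x = 2, so the margin y F(x) grows and the gradients shrink from round to round.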

Next, the gradient of the test data is calculated using the discriminant function F_T (FIG. 2, step A9).

Here, the test data are unlabeled. The loss function and gradient used for the training data therefore cannot be applied to the test data. However, Non-Patent Document 9 shows that the generalization performance of boosting can be improved by reducing the variance of the predicted values that the set of lower learning machines produces for unlabeled data. In other words, performance can be improved by finding, for the test data, a gradient that minimizes the variance of the predicted values for unlabeled data.

A gradient that minimizes the variance sequentially is therefore derived as follows. Let V_L and V_{L+1} denote the variance of the predicted values when the number of lower learning machines is L and L+1, respectively. These variances can be written in terms of the individual predictions, where f_l is the l-th learned discriminant function and x_i is a test data point. Letting g be the gradient to be obtained by the (L+1)-th lower learning machine, V_{L+1} can be rewritten in terms of g.

g should be chosen so as to maximize ΔV = V_L − V_{L+1}. It is therefore determined by setting the derivative of ΔV with respect to g to zero.
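The equations for V_L, V_{L+1}, and g are not reproduced in this text. One reconstruction consistent with the description assumes the population variance of the L predictions at a single test point x_i; the published definitions may differ, for example in normalization or in summing over test points.

```latex
\begin{aligned}
V_L &= \frac{1}{L}\sum_{l=1}^{L} f_l(x_i)^2
      - \Bigl(\frac{1}{L}\sum_{l=1}^{L} f_l(x_i)\Bigr)^2, \\
V_{L+1} &= \frac{1}{L+1}\Bigl(\sum_{l=1}^{L} f_l(x_i)^2 + g^2\Bigr)
      - \Bigl(\frac{1}{L+1}\Bigl(\sum_{l=1}^{L} f_l(x_i) + g\Bigr)\Bigr)^2, \\
\frac{\partial \Delta V}{\partial g}
  &= -\frac{2g}{L+1} + \frac{2}{(L+1)^2}\Bigl(\sum_{l=1}^{L} f_l(x_i) + g\Bigr) = 0
  \;\Longrightarrow\; g = \frac{1}{L}\sum_{l=1}^{L} f_l(x_i), \\
\frac{\partial^2 \Delta V}{\partial g^2} &= -\frac{2L}{(L+1)^2} < 0,
\qquad \Delta V\Big|_{g = \frac{1}{L}\sum_l f_l(x_i)} = \frac{V_L}{L+1} \ge 0 .
\end{aligned}
```

Under these assumptions the variance-minimizing target for the next learner is simply the mean of the existing predictions, and the second-derivative and ΔV results agree with the properties stated in the text.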

Because the second derivative of ΔV with respect to g is always negative, ΔV is a concave function of g, so the global maximum is guaranteed to be found. Moreover, substituting this g into ΔV gives ΔV > 0, so the variance decreases monotonically.
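The concavity and monotone-decrease claims can be checked numerically. The sketch below assumes that the variance is the population variance of the learners' predictions at one test point and that the maximizing g is their mean; both are assumptions about equations that are missing from this text, and the prediction values are made up.

```python
from statistics import fmean, pvariance

def next_gradient(preds):
    """Assumed target for learner L+1 at one test point: the mean of the L predictions."""
    return fmean(preds)

preds = [0.9, -0.2, 0.4, 0.7]            # hypothetical f_1(x_i), ..., f_L(x_i)
g = next_gradient(preds)
v_before = pvariance(preds)              # V_L
v_after = pvariance(preds + [g])         # V_{L+1} after adding the new prediction g
delta_v = v_before - v_after             # variance reduction achieved by this g
```

Any other choice of g gives a smaller reduction, which is the sense in which the mean is the global maximizer of ΔV under these assumptions.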

The gradients of the training data and test data obtained by the initial learning unit 21 are stored in the parameter storage unit 33, and the learned discriminant function F_T is stored in the discriminant function storage unit 34.

The initial learning unit 21 uses the same algorithm as gradient boosting. It performs this learning so that the variance of the predicted values, and from it the gradient of the test data, can be obtained from multiple discriminant functions.

Next, the operation of the learning unit 22 is described with reference to FIG. 3.

First, the training data and test data are input to the learning unit 22 from the training data storage unit 31 and the test data storage unit 32. The gradients of the training data and test data obtained by the initial learning unit 21 are also input from the parameter storage unit 33 to the learning unit 22 (FIG. 3, step B1).

The number of iterations t_2 and the reduction parameter ν_2 are set (FIG. 3, step B2). The instruction parameter T that counts the iterations is initialized to 1 (FIG. 3, step B3), and supervised learning of the discriminant function F is performed on the training data and test data, using their gradients as labels (FIG. 3, step B4).

Next, the discriminant function F_{T-1} obtained in round T−1 is updated to F_T by adding the discriminant function F obtained by learning (FIG. 3, step B5).

The gradients of the training data and test data are then calculated using the resulting discriminant function F_T (FIG. 3, step B6).

After the gradients of the training data and of the test data have each been obtained in the same way as in the initial learning unit 21, 1 is added to the instruction parameter T (FIG. 3, step B7).

When T reaches the predetermined constant t_2, learning ends. Otherwise, the process returns to supervised learning of the discriminant function with the gradients as labels (FIG. 3, step B4) (FIG. 3, step B8). The learned discriminant function is stored in the discriminant function storage unit 34.
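Steps B2 to B8 can be sketched in the same spirit. The code below is an illustration under stated assumptions, not the patented implementation: it assumes the exponential loss for labeled points, uses the mean of the existing learners' predictions as the test-data target (a reconstruction of the variance-minimizing gradient), and recomputes the gradients at the start of each round, which is equivalent to computing them at the end of the previous one. `fit_stump`, `learning_step`, and the toy data are hypothetical.

```python
import math
from statistics import fmean

def fit_stump(xs, targets):
    """1-D least-squares regression stump, standing in for any supervised learner."""
    def split(thr):
        lo = [t for x, t in zip(xs, targets) if x <= thr]
        hi = [t for x, t in zip(xs, targets) if x > thr] or lo
        lv, hv = fmean(lo), fmean(hi)
        err = sum((t - (lv if x <= thr else hv)) ** 2 for x, t in zip(xs, targets))
        return err, thr, lv, hv
    _, thr, lv, hv = min(split(c) for c in sorted(set(xs)))
    return lambda x: lv if x <= thr else hv

def learning_step(terms, x_lab, y_lab, x_test, t2=20, nu2=1.0, fit=fit_stump):
    """Steps B2 to B8: refit on labeled and test points with gradients as labels."""
    terms = list(terms)                          # additive terms of F from the initial phase
    F = lambda x: sum(f(x) for f in terms)       # current discriminant function
    for _ in range(t2):
        g_lab = [y * math.exp(-y * F(x)) for x, y in zip(x_lab, y_lab)]  # labeled gradients
        g_test = [fmean(f(x) for f in terms) for x in x_test]            # assumed test targets
        h = fit(list(x_lab) + list(x_test), g_lab + g_test)              # step B4
        terms.append(lambda x, h=h: nu2 * h(x))                          # step B5
    return F

initial = [lambda x: -1.0 if x <= 1 else 1.0]    # hypothetical result of the initial phase
F = learning_step(initial, [0, 1, 2, 3], [-1, -1, 1, 1], [0.5, 2.5], t2=3)
label = lambda x: 1 if F(x) >= 0 else -1         # discrimination: sign of F
```

Passing the weak learner as the `fit` argument reflects the point made in the text that any supervised learner can serve as the lower learning machine.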

The discrimination unit 23 receives the test data from the test data storage unit 32 and the discriminant function from the discriminant function storage unit 34, and predicts the labels of the test data.

In the medical and biological fields, for example, the label information can be the presence or absence of a disease or a drug effect, the degree of disease progression, or survival time. Supervised learning methods that can be used include ensemble learning such as bagging and boosting, support vector machines, decision trees, and survival trees. The label information and supervised learning methods named here are only examples; other label information and other supervised learning methods can also be used.

The prediction results for the test data are output from the output device 4.

The semi-supervised learning device can be realized by hardware, software, or a combination of the two.

Next, a working example of the present invention is described.

As the data for the example, clinical information on diabetes was obtained from the UCI repository of machine learning benchmark data (see Non-Patent Document 11).

Performance was evaluated on 768 patients who were examined for the onset of diabetes (attribute name: diabetes), using eight items of clinical information. Of the 768 patients, 268 were diagnosed with diabetes and 500 were diagnosed as not having developed it. Missing values in the clinical attributes were imputed with the most frequent category for categorical data and with the median for numerical data.
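The imputation rule described above can be sketched as follows. The helper `impute` and the sample columns are hypothetical, and missing values are assumed to be represented by `None`.

```python
from statistics import median, mode

def impute(column, categorical):
    """Fill None entries with the mode (categorical) or the median (numeric) of observed values."""
    observed = [v for v in column if v is not None]
    fill = mode(observed) if categorical else median(observed)
    return [fill if v is None else v for v in column]

numeric_col = impute([148, None, 183, 89], categorical=False)   # made-up numeric attribute
category_col = impute(["a", "b", None, "a"], categorical=True)  # made-up categorical attribute
```

The fill value is computed from the observed entries only, so the imputed column keeps the same median or most frequent category as the observed data.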

As the learning parameters of the present invention, the reduction parameters were set to ν_1 = ν_2 = 1, and the AdaBoost-type exponential function of gradient boosting was used as the loss function for the training data. CART, a type of decision tree, was used as the lower learning machine. Details of CART are described in Non-Patent Document 10. For convenience, the method of the present invention is referred to as SSBoost (Semi-Supervised Boosting).

As control methods, gradient boosting using only the training data and CART were used. Details of gradient boosting are described in Non-Patent Document 8. For convenience, the gradient boosting control is referred to as AdaBoost. SSBoost first learned from the training data for 20 rounds in exactly the same way as AdaBoost (t_1 = 20), and then learned for a further 20 rounds using the gradients of the test data as well (t_2 = 20).

The number of iterations for AdaBoost was 40. SSBoost and AdaBoost therefore each performed 40 rounds of learning in total, so performance was compared fairly under equivalent conditions. The results of the comparison are shown in FIG. 4.

The results in FIG. 4 show that SSBoost, the method of the present invention, consistently achieves higher classification performance than AdaBoost and CART.

As described above, implementing the present invention realizes a highly accurate classifier that can use supervised learning as its lower learning machine.

FIG. 1 is a diagram showing the basic configuration of an embodiment of the present invention.
FIG. 2 is a flowchart showing the processing procedure of the initial learning unit in the embodiment of the present invention.
FIG. 3 is a flowchart showing the processing procedure of the iterative learning unit in the embodiment of the present invention.
FIG. 4 is a diagram showing the results of the performance comparison against the control methods, namely gradient boosting using only the training data and CART.

Explanation of symbols

1 Input device
2 Data processing device
3 Storage device
4 Output device
21 Initial learning unit
22 Learning unit
23 Discrimination unit
31 Training data storage unit
32 Test data storage unit
33 Parameter storage unit
34 Discriminant function storage unit

Claims

In a semi-supervised learning method based on ensemble learning,
A data storage step for storing stored data, test data and unlabeled data;
An initial learning step for performing supervised learning based on the training data recorded in the data storage step;
A calculation result storing step for storing data calculated in the initial learning step;
A learning step for repeatedly performing supervised learning based on data obtained by combining the attributes of the training data recorded in the data storage step and unlabeled data and the data stored in the calculation result storage step. Semi-supervised learning method.

The semi-supervised learning method according to claim 1,
wherein the supervised learning in the learning step is performed using, as labels, the gradient of the training data and the gradient of the test data calculated in the initial learning step.

The semi-supervised learning method according to claim 1 or 2,
wherein the supervised learning method in the initial learning step is boosting or gradient boosting.

The semi-supervised learning method according to any one of claims 1 to 3,
wherein the loss function for the training data in the initial learning step is a convex function.

The semi-supervised learning method according to any one of claims 1 to 4,
wherein the loss function for the training data in the initial learning step is an exponential function.
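
By way of a non-limiting illustration, the exponential loss recited above and its gradient can be sketched in Python as follows. The function names are hypothetical, labels are assumed to take values in {-1, +1}, and the use of the negative gradient as the target for the next weak learner follows gradient boosting in general; whether the gradients stored by this method are computed in exactly this way is an assumption.

```python
import math

def exp_loss(y, f):
    """Exponential loss exp(-y * f) for a label y in {-1, +1} and a discriminant value f."""
    return math.exp(-y * f)

def negative_gradient(y, f):
    """Negative derivative of exp(-y * f) with respect to f, i.e. y * exp(-y * f)."""
    return y * math.exp(-y * f)

def pseudo_labels(ys, fs):
    """Per-sample negative gradients, usable as regression targets for the next weak learner."""
    return [negative_gradient(y, f) for y, f in zip(ys, fs)]
```

Samples that the current discriminant function misclassifies (y * f < 0) receive gradients of larger magnitude, so the next weak learner concentrates on them.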

The semi-supervised learning method according to any one of claims 1 to 5,
further comprising an initial regularization step of multiplying the discriminant function learned in each round of the initial learning step by a reduction parameter.
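
By way of a non-limiting illustration, multiplying each round's learned function by a reduction parameter can be sketched in Python as follows. This is a simplified supervised gradient-boosting loop and not the claimed method itself: it assumes one-dimensional inputs, labels in {-1, +1}, an exponential loss, a regression stump standing in for the decision tree, and no line search for the step size. All function names and the default value 0.1 are hypothetical.

```python
import math

def fit_stump(xs, targets):
    """Least-squares regression stump on 1-D inputs: (threshold, left_value, right_value)."""
    best = None
    for t in sorted(set(xs)):
        left = [r for x, r in zip(xs, targets) if x <= t]
        right = [r for x, r in zip(xs, targets) if x > t]
        lv = sum(left) / len(left)
        rv = sum(right) / len(right) if right else lv
        err = sum((r - (lv if x <= t else rv)) ** 2 for x, r in zip(xs, targets))
        if best is None or err < best[0]:
            best = (err, t, lv, rv)
    return best[1:]

def stump_predict(stump, x):
    t, lv, rv = stump
    return lv if x <= t else rv

def boost(xs, ys, rounds=20, shrinkage=0.1):
    """Fit one stump per round to the negative gradient of the exponential loss."""
    stumps = []
    f = [0.0] * len(xs)  # current discriminant value for each training sample
    for _ in range(rounds):
        grads = [y * math.exp(-y * fi) for y, fi in zip(ys, f)]
        stump = fit_stump(xs, grads)
        stumps.append(stump)
        # Regularization: scale this round's function by the reduction parameter.
        f = [fi + shrinkage * stump_predict(stump, x) for fi, x in zip(f, xs)]
    return stumps

def discriminant(stumps, x, shrinkage=0.1):
    """Discriminant value: sum of the shrunken per-round functions."""
    return sum(shrinkage * stump_predict(s, x) for s in stumps)
```

With labels in {-1, +1}, a label can then be predicted from the sign of `discriminant(stumps, x)`. A smaller reduction parameter makes each round contribute less, which generally calls for more rounds.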

The semi-supervised learning method according to any one of claims 1 to 6,
wherein the lower learning machine in the initial learning step is a decision tree.

The semi-supervised learning method according to any one of claims 1 to 7,
wherein the supervised learning method in the learning step is boosting or gradient boosting.

The semi-supervised learning method according to any one of claims 1 to 8,
wherein the unlabeled data in the learning step is the test data.

The semi-supervised learning method according to any one of claims 1 to 9,
wherein the loss function for the training data in the learning step is a convex function.

The semi-supervised learning method according to any one of claims 1 to 10,
wherein the loss function for the training data in the learning step is an exponential function.

The semi-supervised learning method according to any one of claims 1 to 11,
wherein, in the learning step, the variance of the predicted values of the lower learning machine for the unlabeled data is minimized.

The semi-supervised learning method according to any one of claims 1 to 12,
further comprising a regularization step of multiplying the discriminant function learned in each round of the learning step by a reduction parameter.

The semi-supervised learning method according to any one of claims 1 to 13,
wherein the lower learning machine in the learning step is a decision tree.

The semi-supervised learning method according to any one of claims 1 to 14,
further comprising a prediction step of predicting the labels of the test data based on the learning result obtained in the learning step.

A semi-supervised learning device based on ensemble learning, comprising:
a training data storage unit that stores training data; a test data storage unit that stores unlabeled data and test data; an initial learning unit that performs supervised learning using the training data; a parameter storage unit that stores gradients; a learning unit that combines the training data, the unlabeled data, and the test data and repeatedly performs supervised learning; a discriminant function storage unit that stores the learned discriminant function; and a discriminant unit that predicts the labels of the test data using the discriminant function.

A semi-supervised learning device based on ensemble learning, comprising:
data storage means for storing training data, test data, and unlabeled data;
initial learning means for performing supervised learning based on the training data recorded in the data storage means;
calculation result storage means for storing data calculated by the initial learning means; and
learning means for repeatedly performing supervised learning based on data obtained by combining the attributes of the training data and the unlabeled data recorded in the data storage means with the data stored in the calculation result storage means.

The semi-supervised learning device according to claim 17,
wherein the supervised learning in the learning means is performed using, as labels, the gradient of the training data and the gradient of the test data calculated by the initial learning means.

The semi-supervised learning device according to claim 17 or 18,
wherein the supervised learning method in the initial learning means is boosting or gradient boosting.

The semi-supervised learning device according to any one of claims 17 to 19,
wherein the loss function for the training data in the initial learning means is a convex function.

The semi-supervised learning device according to any one of claims 17 to 20,
wherein the loss function for the training data in the initial learning means is an exponential function.

The semi-supervised learning device according to any one of claims 17 to 21,
further comprising first regularization means for multiplying the discriminant function learned in each round by the initial learning means by a reduction parameter.

The semi-supervised learning device according to any one of claims 17 to 22,
wherein the lower learning machine in the initial learning means is a decision tree.

The semi-supervised learning device according to any one of claims 17 to 23,
wherein the supervised learning method in the learning means is boosting or gradient boosting.

The semi-supervised learning device according to any one of claims 17 to 24,
wherein the unlabeled data in the learning means is the test data.

The semi-supervised learning device according to any one of claims 17 to 25,
wherein the loss function for the training data in the learning means is a convex function.

The semi-supervised learning device according to any one of claims 17 to 26,
wherein the loss function for the training data in the learning means is an exponential function.

The semi-supervised learning device according to any one of claims 17 to 27,
wherein, in the learning means, the variance of the predicted values of the lower learning machine for the unlabeled data is minimized.

The semi-supervised learning device according to any one of claims 17 to 28,
further comprising second regularization means for multiplying the discriminant function learned in each round by the learning means by a reduction parameter.

The semi-supervised learning device according to any one of claims 17 to 29,
wherein the lower learning machine in the learning means is a decision tree.

A semi-supervised learning program based on ensemble learning, the program causing a computer to realize:
a training data storage function of storing training data; a test data storage function of storing unlabeled data and test data; an initial learning function of performing supervised learning using the training data; a parameter storage function of storing gradients; a learning function of combining the training data, the unlabeled data, and the test data and repeatedly performing supervised learning; a discriminant function storage function of storing the learned discriminant function; and a discrimination function of predicting the labels of the test data using the discriminant function.

A semi-supervised learning program based on ensemble learning, the program causing a computer to realize:
a data storage function of storing training data, test data, and unlabeled data;
an initial learning function of performing supervised learning based on the training data recorded by the data storage function;
a calculation result storage function of storing data calculated by the initial learning function; and
a learning function of repeatedly performing supervised learning based on data obtained by combining the attributes of the training data and the unlabeled data recorded by the data storage function with the data stored by the calculation result storage function.