CN108171280A - Classifier construction method and classification prediction method - Google Patents
Classifier construction method and classification prediction method
- Publication number
- CN108171280A CN108171280A CN201810098965.7A CN201810098965A CN108171280A CN 108171280 A CN108171280 A CN 108171280A CN 201810098965 A CN201810098965 A CN 201810098965A CN 108171280 A CN108171280 A CN 108171280A
- Authority
- CN
- China
- Prior art keywords
- training
- model
- sample
- error rate
- training sample
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
Landscapes
- Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The present invention provides a classifier construction method and a classification prediction method. The classifier construction method includes: obtaining a training data set of multiple training samples, wherein the training data set includes attribute information and category information; extracting attribute features from the attribute information; taking the attribute features as initial independent variables and the corresponding category information as initial dependent variables, and performing at least one round of model training, wherein each round of model training is carried out by at least two candidate models; and taking the combination of the minimum-error-rate models of the rounds of model training as the trained classifier. With the classifier construction method and the classification prediction method provided by the present invention, at least one round of model training is performed on at least two candidate models to obtain a classifier, and the category of a target sample can be predicted with the trained classifier, which avoids the poor prediction precision and prediction accuracy caused by using a single classification technique; the prediction precision and accuracy are therefore higher.
Description
Technical field
The present invention relates to the field of computer technology, and in particular to a classifier construction method and a classification prediction method.
Background art
A classification algorithm selects, based on a classification model, the best category hypothesis for a sample to be detected from the candidate categories. For example, a lending classification model of a loan platform classifies a client's borrowing intent as likely to borrow or unlikely to borrow.
A classification model as described above mainly involves a model training stage, a model verification stage and a model application stage. In the model training stage, the model is established: a classification model is built from a historical data set. In the model verification stage, the built model is verified against the historical data set, for example by cross validation. In the model application stage, data of unknown category are predicted with the built classification model.
Common classification algorithms include decision tree classification, Bayesian classification, neural network classification and logistic regression classification, among others. However, these algorithms each use a single classification technique; when they are applied to actual business data, the limitations of the algorithms themselves prevent them from reaching satisfactory prediction precision and prediction accuracy.
Summary of the invention
In view of this, the purpose of the present invention is to provide a classifier construction method and a classification prediction method, so as to improve the precision and accuracy of classification prediction.
In a first aspect, the present invention provides a classifier construction method, the method comprising:
obtaining a training data set of multiple training samples, wherein the training data set includes attribute information and category information;
extracting attribute features from the attribute information;
taking the attribute features as initial independent variables and the corresponding category information as initial dependent variables, and performing at least one round of model training, wherein each round of model training is carried out by at least two candidate models; and
taking the combination of the minimum-error-rate models of the rounds of model training as the trained classifier.
With reference to the first aspect, the present invention provides a first possible implementation of the first aspect, wherein each round of training performs the following operations:
determining, based on the training data used in the current round, the values of the current independent variables and the value of the current dependent variable, and training the at least two candidate models participating in the current round;
determining, according to the result of the current round, the first candidate model with the minimum error rate among the at least two candidate models participating in the current round;
determining the erroneous training samples whose classification results obtained by the first candidate model in the current round are wrong;
updating the weights of the erroneous training samples based on a preset weight update rule; and
performing stratified sampling on the multiple training samples according to their current weights to obtain the training data to be used in the next round, and entering the next round of training.
With reference to the first possible implementation of the first aspect, the present invention provides a second possible implementation of the first aspect, wherein, for each round of training other than the first round, before the values of the corresponding independent variables and dependent variable are determined based on the training data used in the current round, the method further includes:
constructing new attribute features for the current round based on the features of the training samples contained in the training data, determined at the end of the previous round, that is to be used in the current round; and
taking the new attribute features as the current independent variables and the corresponding category information as the current dependent variable.
With reference to the first possible implementation of the first aspect, the present invention provides a third possible implementation of the first aspect, wherein, for each round of training other than the first round, determining, according to the result of the current round, the first candidate model with the minimum error rate among the at least two candidate models participating in the current round specifically includes:
determining, based on the attribute information of the multiple training samples, the values of the current independent variables and the value of the current dependent variable for each training sample;
inputting the independent-variable values of each training sample into the at least two candidate models trained in the current round to obtain the classification result of each training sample; and
determining the first candidate model with the minimum error rate among the at least two candidate models participating in the current round according to the obtained classification results and the corresponding category information.
With reference to any one of the first to third possible implementations of the first aspect, the present invention provides a fourth possible implementation of the first aspect, wherein, after the first candidate model with the minimum error rate is determined, the method further includes:
determining the weight of the first candidate model according to a preset value relationship between the minimum error rate and the model weight, wherein the preset value relationship is that the smaller the error rate, the higher the model weight;
and taking the combination of the minimum-error-rate models of the rounds of model training as the trained classifier specifically includes:
taking the weighted combination of the minimum-error-rate model of each round of model training and its corresponding model weight as the trained classifier.
With reference to any one of the first to third possible implementations of the first aspect, the present invention provides a fifth possible implementation of the first aspect, wherein performing stratified sampling on the multiple training samples according to their current weights to obtain the training data to be used in the next round specifically includes:
determining, from the multiple training samples, all or part of the training samples whose current weight is greater than the initial weight, as first training samples;
determining, according to the number of the determined first training samples and a preset quantity relationship, a corresponding number of second training samples whose current weight is less than the initial weight; and
taking the attribute information and category information corresponding to the first training samples and the second training samples as the training data to be used in the next round.
With reference to the fifth possible implementation of the first aspect, the present invention provides a sixth possible implementation of the first aspect, wherein, before the attribute information and category information corresponding to the first training samples and the second training samples are taken as the training data to be used in the next round, the method further includes:
synthesizing a preset number of third training samples based on the distribution characteristics of the first training samples;
and taking the attribute information and category information corresponding to the first training samples and the second training samples as the training data to be used in the next round includes:
taking the attribute information and category information corresponding to the first training samples, the second training samples and the synthesized third training samples as the training data to be used in the next round.
With reference to the fifth possible implementation of the first aspect, the present invention provides a seventh possible implementation of the first aspect, wherein, before the next round of training is entered, the method further includes:
performing class imbalance processing on the training data determined to be used in the next round.
With reference to any one of the first to third possible implementations of the first aspect, the present invention provides an eighth possible implementation of the first aspect, wherein, after the first candidate model with the minimum error rate is determined among the at least two candidate models participating in the current round, the method further includes:
determining that the minimum error rate does not reach a preset error rate threshold.
In a second aspect, the present invention provides a method for predicting a category with a classifier trained according to the first aspect or any one of its first to eighth possible implementations, the method comprising:
obtaining the attribute information of a target sample;
for the minimum-error-rate model obtained in each round of training, determining, based on the attribute information of the target sample, the feature values corresponding to the attribute features used by that model;
inputting the feature values corresponding to each minimum-error-rate model into the corresponding model to obtain the classification result of each minimum-error-rate model; and
weighting and summing the classification results based on the model weight corresponding to each minimum-error-rate model, and determining the obtained sum as the classification result of the target sample.
In the classifier construction method provided by the present invention, a training data set of multiple training samples is first obtained, wherein the training data set includes attribute information and category information; attribute features are then extracted from the attribute information; the attribute features are taken as initial independent variables and the corresponding category information as initial dependent variables, and at least one round of model training is performed, wherein each round of model training is carried out by at least two candidate models; finally, the combination of the minimum-error-rate models of the rounds of model training is taken as the trained classifier. With the classifier construction method and the classification prediction method provided by the present invention, at least one round of model training is performed on at least two candidate models to obtain a classifier, and the category of a target sample can be predicted with the trained classifier, which avoids the poor prediction precision and prediction accuracy caused by using a single classification technique; the prediction precision and accuracy are therefore higher.
To make the above objects, features and advantages of the present invention clearer and easier to understand, preferred embodiments are described in detail below with reference to the accompanying drawings.
Description of the drawings
To describe the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the embodiments are briefly introduced below. It should be understood that the following drawings only illustrate certain embodiments of the present invention and therefore should not be regarded as limiting the scope; those of ordinary skill in the art can obtain other related drawings from these drawings without creative effort.
Fig. 1 shows a flow chart of a classifier construction method provided by an embodiment of the present invention;
Fig. 2 shows a flow chart of another classifier construction method provided by an embodiment of the present invention;
Fig. 3 shows a flow chart of another classifier construction method provided by an embodiment of the present invention;
Fig. 4 shows a flow chart of another classifier construction method provided by an embodiment of the present invention;
Fig. 5 shows a flow chart of another classifier construction method provided by an embodiment of the present invention;
Fig. 6 shows a flow chart of another classifier construction method provided by an embodiment of the present invention;
Fig. 7 shows a flow chart of a classification prediction method provided by an embodiment of the present invention;
Fig. 8 shows a structural diagram of a classifier construction device provided by an embodiment of the present invention;
Fig. 9 shows a structural diagram of a computer device provided by an embodiment of the present invention;
Fig. 10 shows a structural diagram of a classification prediction device provided by an embodiment of the present invention;
Fig. 11 shows a structural diagram of another computer device provided by an embodiment of the present invention.
Specific embodiments
To make the purpose, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. The components of the embodiments of the present invention, as generally described and illustrated in the drawings herein, can be arranged and designed in a variety of different configurations. Therefore, the following detailed description of the embodiments of the present invention provided in the drawings is not intended to limit the scope of the claimed invention, but merely represents selected embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
Considering that the related classification algorithms each use a single classification technique and, when applied to actual business data, cannot reach satisfactory prediction precision and prediction accuracy due to the limitations of the algorithms themselves, the embodiments of the present invention provide a classifier construction method and a classification prediction method to improve the precision and accuracy of classification prediction.
Referring to Fig. 1, which is a flow chart of the classifier construction method provided by an embodiment of the present invention and applied to a computer device, the classifier construction method includes the following steps:
S101, obtaining a training data set of multiple training samples, wherein the training data set includes attribute information and category information.
Here, to facilitate understanding of the attribute information and category information in the training data set, the way the embodiment of the present invention obtains this information is described with a lending prediction scenario. In the lending prediction scenario, the attribute information may include, but is not limited to, basic user information (such as name, age, occupation and identity information) and shopping information (such as shopping time and shopping site), and the category information may include whether there is a lending record and the credit amount of the loan. The attribute information may be obtained from data interfaces opened by Internet sites (such as Tmall or Amazon), or may be crawled to the local computer device using web crawler technology, for example a crawler implemented in python (an object-oriented interpreted programming language). The category information may be determined from the transaction records of the bank cards and credit cards held by the user, or from the credit records of the online lending platforms bound by the user.
It is worth explaining that the above lending prediction scenario is only a specific example; the classifier construction method provided by the embodiment of the present invention can perform category prediction on actual business data in various application scenarios and has strong applicability.
S102, extracting attribute features from the attribute information.
Here, an attribute feature is the result of processing the attribute information. The embodiment of the present invention may apply filtering, type conversion, derivation and similar processing to the attribute information to obtain the processed attribute features. The filtering refers to operations such as filtering out missing or duplicate entries in the attribute information. The type conversion may be normalizing the attribute information so that data from different sources are unified to a common frame of reference, which makes comparisons meaningful. The derivation refers to statistically analyzing the attribute information to obtain additional attribute information; for example, if the attribute information includes shopping information, statistics such as a user's average number of purchases, the maximum amount spent, and the price range of purchases can be derived from the shopping information. The embodiment of the present invention can adaptively select features based on the attribute information actually obtained.
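For illustration only, the following sketch shows the filtering, normalization and derivation steps described above with pandas; the column names (user_id, purchase_amount, age) are hypothetical and not taken from the patent.

```python
import pandas as pd

def build_attribute_features(records: pd.DataFrame) -> pd.DataFrame:
    # Filtering: drop records with missing values and duplicate records.
    cleaned = records.dropna().drop_duplicates()

    # Derivation: per-user shopping statistics (purchase count, maximum and
    # average amount spent) obtained from the shopping information.
    derived = cleaned.groupby("user_id").agg(
        purchase_count=("purchase_amount", "count"),
        max_amount=("purchase_amount", "max"),
        mean_amount=("purchase_amount", "mean"),
        age=("age", "first"),
    )

    # Type conversion: min-max normalization so that features from different
    # sources are placed on a common scale before training.
    return (derived - derived.min()) / (derived.max() - derived.min() + 1e-9)
```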
S103, taking the attribute features as initial independent variables and the corresponding category information as initial dependent variables, and performing at least one round of model training, wherein each round of model training is carried out by at least two candidate models.
S104, taking the combination of the minimum-error-rate models of the rounds of model training as the trained classifier.
In the embodiment of the present invention, each round of model training involves at least two candidate models, where the candidate models may be any combination of a neural network model, an SVM (Support Vector Machine) model and a logistic regression model, or any combination of other classification models.
It is worth explaining that the embodiment of the present invention may determine whether the classifier has converged in the following ways. In the first way, the embodiment of the present invention may judge whether the number of training rounds of the classifier reaches a preset number of rounds (for example, 3 rounds); if the number of rounds reaches the preset threshold, the output of the classifier is determined to have converged, and if the number of rounds does not reach the preset threshold, it is determined not to have converged. In the second way, the embodiment of the present invention may judge whether the output error between the output classification result of the classifier and the actual category information is less than a preset error (for example, 0.0001); if the output error is less than the preset error, the output of the classifier is determined to have converged, and if the output error is greater than or equal to the preset error, it is determined not to have converged. With either judgment, once convergence is determined, the combination of the minimum-error-rate models of all rounds of model training is taken as the classifier.
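As an illustration only, the training procedure of S101 to S104 with the round-count convergence criterion can be sketched as follows. Scikit-learn classifiers stand in for the candidate models, and sample weights are passed directly to the models instead of the stratified re-sampling and class imbalance processing described later, to keep the sketch short; the AdaBoost-style weight formula used here is an assumption and not necessarily the formula of the patent.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

def build_classifier(X, y, max_rounds=3, error_threshold=0.5):
    """Return a list of (model, model_weight) pairs, one per training round."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    sample_weight = np.ones(len(y))
    ensemble = []
    for _ in range(max_rounds):                      # convergence: preset round count
        candidates = [LogisticRegression(max_iter=1000), SVC(), DecisionTreeClassifier()]
        errors = []
        for model in candidates:                     # train every candidate this round
            model.fit(X, y, sample_weight=sample_weight)
            errors.append(np.average(model.predict(X) != y, weights=sample_weight))
        best = int(np.argmin(errors))                # first candidate model (min error)
        e_m = errors[best]
        if e_m >= error_threshold:                   # preset error rate threshold reached
            break
        alpha_m = 0.5 * np.log((1.0 - e_m) / max(e_m, 1e-9))   # assumed weight formula
        ensemble.append((candidates[best], alpha_m))
        wrong = candidates[best].predict(X) != y
        sample_weight[wrong] *= np.exp(alpha_m)      # raise weights of erroneous samples
        sample_weight[~wrong] *= np.exp(-alpha_m)    # lower weights of correct samples
        sample_weight *= len(y) / sample_weight.sum()
    return ensemble
```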
For each round of training, referring to Fig. 2, the classifier construction method provided by the embodiment of the present invention further includes the following steps:
S201, determining, based on the training data used in the current round, the values of the current independent variables and the value of the current dependent variable, and training the at least two candidate models participating in the current round;
S202, determining, according to the result of the current round, the first candidate model with the minimum error rate among the at least two candidate models participating in the current round;
S203, determining the erroneous training samples whose classification results obtained by the first candidate model in the current round are wrong;
S204, updating the weights of the erroneous training samples based on a preset weight update rule;
S205, performing stratified sampling on the multiple training samples according to their current weights to obtain the training data to be used in the next round, and entering the next round of training.
Here, in the embodiment of the present invention, the first round of training trains the at least two candidate models participating in the first round with the values of the initial independent variables and the initial dependent variable, determines the first candidate model with the minimum error rate among the at least two candidate models based on the classification results of all the training samples, and updates the weights according to the erroneous training samples whose classification results determined by the first candidate model are wrong. Different from the first round, each subsequent round trains the at least two participating candidate models with the training data determined by the previous round; as in the first round, the first candidate model with the minimum error rate is still determined from the at least two candidate models based on the classification results of all the training samples.
In order to highlight the erroneous training samples, the preset weight update rule in the embodiment of the present invention raises the weights of the erroneous training samples and correspondingly lowers the weights of the correct training samples.
In a specific implementation, in order to balance the prediction accuracy and efficiency of the classifier, the candidate models used in each round of the embodiment of the present invention may be the same or different. The case where they are the same needs no further explanation; for the case where they are different, it is specially noted that a candidate model whose error rate exceeded the threshold in the previous round is not used in the next round.
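A minimal sketch of the preset weight update rule described above is given below; the multiplicative factor is an arbitrary illustrative choice and is not specified in the text.

```python
def update_weights(weights, is_wrong, factor=2.0):
    """Raise the weights of erroneous samples and lower those of correct ones."""
    return [w * factor if wrong else w / factor for w, wrong in zip(weights, is_wrong)]
```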
In the embodiment of the present invention, for each round of training other than the first round, the attribute features participating in the current round change with the training data. Specifically, referring to Fig. 3, the embodiment of the present invention updates the features with the following steps:
S301, constructing new attribute features for the current round based on the features of the training samples contained in the training data, determined at the end of the previous round, that is to be used in the current round;
S302, taking the new attribute features as the current independent variables and the corresponding category information as the current dependent variable.
Here, the current round is intended to construct new attribute features based on the features of the training samples contained in the training data determined at the end of the previous round, to take the new attribute features and the corresponding category information as the current independent variables and the current dependent variable respectively, to determine the values of the current independent variables and the current dependent variable based on the training data used in the current round, and to train the at least two candidate models participating in the current round.
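The text does not fix how the new attribute features are constructed. One assumed, purely illustrative choice is to append the output of the previous round's minimum-error-rate model as an additional feature column:

```python
import numpy as np

def build_new_features(X, previous_best_model):
    # Assumed construction: the previous round's best model output becomes an
    # extra feature column; the concrete construction is left open in the text.
    X = np.asarray(X, dtype=float)
    extra = np.asarray(previous_best_model.predict(X), dtype=float).reshape(-1, 1)
    return np.hstack([X, extra])
```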
In the embodiment of the present invention, for each round of training other than the first round, the first candidate model with the minimum error rate is determined, according to the result of the current round, from the at least two candidate models participating in the current round. Referring to Fig. 4, this specifically includes the following steps:
S401, determining, based on the attribute information of the multiple training samples, the values of the current independent variables and the value of the current dependent variable for each training sample;
S402, inputting the independent-variable values of each training sample into the at least two candidate models trained in the current round to obtain the classification result of each training sample;
S403, determining the first candidate model with the minimum error rate among the at least two candidate models participating in the current round according to the obtained classification results and the corresponding category information.
Here, the embodiment of the present invention uses the determined values of the current independent variables and the current dependent variable of each training sample: the independent-variable values are input into the at least two candidate models trained in the current round to obtain the classification result of each training sample, and the first candidate model with the minimum error rate is determined among the at least two candidate models participating in the current round according to the comparison between the classification result of each training sample and the corresponding category information.
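For illustration, S401 to S403 amount to computing each candidate model's error rate over all training samples and keeping the model with the minimum error rate; a short sketch, assuming models with a scikit-learn-style predict method, is:

```python
def pick_min_error_model(models, X, y):
    """Return (best_model, min_error_rate) over the candidate models."""
    def error_rate(model):
        predictions = model.predict(X)
        return sum(p != t for p, t in zip(predictions, y)) / len(y)
    errors = [error_rate(m) for m in models]
    best = min(range(len(models)), key=lambda i: errors[i])
    return models[best], errors[best]
```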
It is worth explaining that, after determining the first candidate model with the minimum error rate, the embodiment of the present invention judges the minimum error rate against a preset error rate threshold. Only when the minimum error rate does not reach the preset error rate threshold does training enter the next round; if the minimum error rate reaches the preset error rate threshold (for example, 0.5), training ends at the current round and does not continue.
In addition, before entering the next round of training, the embodiment of the present invention further performs class imbalance processing on the training data determined to be used in the next round. The embodiment of the present invention may perform the imbalance processing with an over-sampling method, with an under-sampling method, or with the SMOTE (Synthetic Minority Over-sampling Technique) method. The over-sampling method repeatedly samples the samples whose proportion is too low (that is, the erroneous training samples) so that the model learns the features of such samples; the under-sampling method reduces the sampling frequency of the samples whose proportion is too high (that is, the correct training samples) to prevent the model from over-learning the features of such samples; the SMOTE method analyzes the minority-class samples (that is, the erroneous training samples), artificially synthesizes new samples according to the minority-class samples, and adds them to the training data, thereby avoiding the over-fitting problem.
In view of the good characteristics of the SMOTE method, the embodiment of the present invention preferably performs the imbalance processing with the SMOTE method. That is, the embodiment of the present invention may synthesize a preset number of third training samples based on the distribution characteristics of the first training samples and add the synthesized third training samples to the training data for training, thereby implementing the class imbalance processing. The preset number of third training samples may be determined according to the distribution characteristics of the first training samples.
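A simplified SMOTE-style synthesis is sketched below for illustration: each new sample is interpolated between an existing minority-class sample and one of its nearest minority-class neighbours (at least two minority samples are assumed). A production implementation could instead rely on an existing SMOTE library.

```python
import random
import numpy as np

def synthesize_minority_samples(minority_X, n_new, k=5):
    """Interpolate n_new synthetic samples among the minority-class samples."""
    minority_X = np.asarray(minority_X, dtype=float)
    synthetic = []
    for _ in range(n_new):
        i = random.randrange(len(minority_X))
        # indices of the k nearest minority neighbours of sample i (itself excluded)
        dists = np.linalg.norm(minority_X - minority_X[i], axis=1)
        neighbours = np.argsort(dists)[1:k + 1]
        j = int(random.choice(list(neighbours)))
        gap = random.random()                         # interpolation factor in [0, 1)
        synthetic.append(minority_X[i] + gap * (minority_X[j] - minority_X[i]))
    return np.array(synthetic)
```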
The classifier in the embodiment of the present invention relies on the minimum-error-rate model of each round of model training. Referring to Fig. 5, the classifier is determined from the minimum-error-rate models of all rounds through the following steps:
S501, determining the weight of the first candidate model according to a preset value relationship between the minimum error rate and the model weight, wherein the preset value relationship is that the smaller the error rate, the higher the model weight;
S502, taking the weighted combination of the minimum-error-rate model of each round of model training and its corresponding model weight as the trained classifier.
Here, the preset value relationship can be determined according to a formula in which α_m denotes the model weight of the minimum-error-rate candidate model determined by the m-th round of training and e_m denotes the minimum error rate determined by the m-th round of training.
In addition, the embodiment of the present invention determines the trained classifier as the weighted combination of the minimum-error-rate model of each round of model training and the model weight determined based on the above formula.
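As an illustrative example of a relation that satisfies the stated property (a smaller error rate gives a higher model weight), the classical AdaBoost-style weight can be used; it is an assumption and not necessarily the formula of the patent:

```python
import math

def model_weight_from_error(e_m: float) -> float:
    # Assumed AdaBoost-style relation: alpha_m = 0.5 * ln((1 - e_m) / e_m),
    # which decreases monotonically as the error rate e_m grows.
    e_m = min(max(e_m, 1e-9), 1.0 - 1e-9)   # keep the logarithm well-defined
    return 0.5 * math.log((1.0 - e_m) / e_m)
```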
In addition, in the embodiment of the present invention, every round of training other than the first round uses training data determined by stratified sampling according to the classification results of the previous round. Determining, according to the classification results of the first round, the training data to be used in the next round, referring to Fig. 6, is specifically achieved through the following steps:
S601, determining, from the multiple training samples, all or part of the training samples whose current weight is greater than the initial weight, as first training samples;
S602, determining, according to the number of the determined first training samples and a preset quantity relationship, a corresponding number of second training samples whose current weight is less than the initial weight;
S603, taking the attribute information and category information corresponding to the first training samples and the second training samples as the training data to be used in the next round.
Here, after the weights of the training samples are updated, all or part of the training samples whose current weight is greater than the initial weight are first determined from all the training samples as the first training samples; the first training samples correspond to the erroneous training samples. Then, according to the number of erroneous training samples and the preset quantity relationship between erroneous training samples and correct training samples (for example, the number of erroneous training samples equals the number of correct training samples), a corresponding number of second training samples whose current weight is less than the initial weight are determined; the second training samples correspond to the correct training samples. The attribute information and category information corresponding to the erroneous training samples and the correct training samples are taken as the training data to be used in the next round.
Similarly, determining, according to the classification results of a later round, the training data to be used in the round after it is implemented in the same way as determining the training data of the next round according to the classification results of the first round, and so on; the details are not repeated here.
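A short sketch of S601 to S603 is given below; the 1:1 quantity relationship between first and second samples is only the example mentioned in the text, and samples is assumed to already pair attribute information with category information.

```python
import random

def stratified_sample(samples, weights, initial_weight=1.0):
    """Pick high-weight (erroneous) samples plus an equal number of low-weight ones."""
    first = [s for s, w in zip(samples, weights) if w > initial_weight]
    low = [s for s, w in zip(samples, weights) if w < initial_weight]
    second = random.sample(low, min(len(first), len(low)))
    return first + second          # training data for the next round
```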
Based on the classifier trained in the above embodiments, an embodiment of the present invention further provides a classification prediction method. As shown in Fig. 7, which is a flow chart of the classification prediction method provided by the embodiment of the present invention and applied to a computer device, the classification prediction method includes the following steps:
S701, obtaining the attribute information of a target sample;
S702, for the minimum-error-rate model obtained in each round of training, determining, based on the attribute information of the target sample, the feature values corresponding to the attribute features used by that model;
S703, inputting the feature values corresponding to each minimum-error-rate model into the corresponding model to obtain the classification result of each minimum-error-rate model;
S704, weighting and summing the classification results based on the model weight corresponding to each minimum-error-rate model, and determining the obtained sum as the classification result of the target sample.
Here, feature extraction is first performed on the obtained attribute information of the target sample; for the minimum-error-rate model obtained in each round of training, the feature values corresponding to the attribute features used by that model are determined based on the extracted attribute information of the target sample; the feature values corresponding to each minimum-error-rate model are then input into the corresponding model to obtain the classification result of each minimum-error-rate model; finally, the classification results are weighted and summed based on the model weight corresponding to each minimum-error-rate model, and the obtained sum is determined as the classification result of the target sample. It can be seen that, with the pre-trained classifier, classification prediction can be performed for the target sample quickly and efficiently, the prediction precision is high, and the degree of automation is also high.
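For a binary case, S701 to S704 can be sketched as a weighted vote; the 0/1 class encoding mapped to a plus/minus one vote is an assumption made only for this example, and ensemble is the list of (model, model_weight) pairs produced during training.

```python
def predict(ensemble, x):
    """Weighted vote of the per-round minimum-error-rate models for one sample x."""
    weighted_sum = sum(alpha * (1.0 if model.predict([x])[0] == 1 else -1.0)
                       for model, alpha in ensemble)
    return 1 if weighted_sum >= 0 else 0
```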
Based on the same inventive concept, an embodiment of the present invention further provides a classifier construction device corresponding to the classifier construction method. Since the principle by which the device in the embodiment of the present invention solves the problem is similar to that of the above classifier construction method, the implementation of the device can refer to the implementation of the method, and repeated parts are not described again. As shown in Fig. 8, which is a structural diagram of the classifier construction device provided by the embodiment of the present invention, the classifier construction device includes:
a training data obtaining module 11, configured to obtain a training data set of multiple training samples, wherein the training data set includes attribute information and category information;
an attribute feature extraction module 12, configured to extract attribute features from the attribute information;
a model training module 13, configured to take the attribute features as initial independent variables and the corresponding category information as initial dependent variables and to perform at least one round of model training, wherein each round of model training is carried out by at least two candidate models; and
a classifier construction module 14, configured to take the combination of the minimum-error-rate models of the rounds of model training as the trained classifier.
In a specific implementation, the model training module 13 is specifically configured to perform the following operations for each round of training:
determining, based on the training data used in the current round, the values of the current independent variables and the value of the current dependent variable, and training the at least two candidate models participating in the current round;
determining, according to the result of the current round, the first candidate model with the minimum error rate among the at least two candidate models participating in the current round;
determining the erroneous training samples whose classification results obtained by the first candidate model in the current round are wrong;
updating the weights of the erroneous training samples based on a preset weight update rule; and
performing stratified sampling on the multiple training samples according to their current weights to obtain the training data to be used in the next round, and entering the next round of training.
The above classifier construction device further includes:
an attribute feature construction module 15, configured to construct new attribute features for the current round based on the features of the training samples contained in the training data determined at the end of the previous round, to take the new attribute features as the current independent variables, and to take the corresponding category information as the current dependent variable.
In one embodiment, the model training module 13 is specifically configured to determine, based on the attribute information of the multiple training samples, the values of the current independent variables and the value of the current dependent variable for each training sample; to input the independent-variable values of each training sample into the at least two candidate models trained in the current round to obtain the classification result of each training sample; and to determine the first candidate model with the minimum error rate among the at least two candidate models participating in the current round according to the obtained classification results and the corresponding category information.
The above classifier construction device further includes:
a model weight determining module 16, configured to determine the weight of the first candidate model according to a preset value relationship between the minimum error rate and the model weight, wherein the preset value relationship is that the smaller the error rate, the higher the model weight;
and the classifier construction module 14 is specifically configured to take the weighted combination of the minimum-error-rate model of each round of model training and its corresponding model weight as the trained classifier.
In another embodiment, the model training module 13 is specifically configured to determine, from the multiple training samples, all or part of the training samples whose current weight is greater than the initial weight as first training samples; to determine, according to the number of the determined first training samples and a preset quantity relationship, a corresponding number of second training samples whose current weight is less than the initial weight; and to take the attribute information and category information corresponding to the first training samples and the second training samples as the training data to be used in the next round.
The above classifier construction device further includes:
a training sample synthesis module 17, configured to synthesize a preset number of third training samples based on the distribution characteristics of the first training samples;
and the model training module 13 is specifically configured to take the attribute information and category information corresponding to the first training samples, the second training samples and the synthesized third training samples as the training data to be used in the next round.
The above classifier construction device further includes:
an imbalance processing module, configured to perform class imbalance processing on the training data determined to be used in the next round; and
a threshold judging module, configured to determine that the minimum error rate does not reach the preset error rate threshold.
Corresponding to the classifier construction methods of Fig. 1 to Fig. 6, an embodiment of the present invention further provides a computer device, as shown in Fig. 9. The device includes a memory 21, a processor 22, and a computer program that is stored on the memory 21 and can run on the processor 22, wherein the processor 22 implements the steps of the above classifier construction method when executing the computer program.
Specifically, the memory 21 and the processor 22 may be a general-purpose memory and processor, which are not specifically limited here. When the processor 22 runs the computer program stored in the memory 21, the above classifier construction method can be executed, thereby solving the problem of poor prediction precision and prediction accuracy caused by a single classification technique and improving the prediction precision and prediction accuracy.
Corresponding to the classifier construction methods of Fig. 1 to Fig. 6, an embodiment of the present invention further provides a computer-readable storage medium on which a computer program is stored; when run by the processor 22, the computer program executes the steps of the above classifier construction method.
Specifically, the storage medium may be a general-purpose storage medium, such as a removable disk or a hard disk. When the computer program on the storage medium is run, the above classifier construction method can be executed, thereby solving the problem of poor prediction precision and prediction accuracy caused by a single classification technique and improving the prediction precision and prediction accuracy.
Based on the same inventive concept, an embodiment of the present invention further provides a classification prediction device corresponding to the classification prediction method. Since the principle by which the device in the embodiment of the present invention solves the problem is similar to that of the above classification prediction method, the implementation of the device can refer to the implementation of the method, and repeated parts are not described again. As shown in Fig. 10, which is a structural diagram of the classification prediction device provided by the embodiment of the present invention, the classification prediction device includes:
an attribute information obtaining module 31, configured to obtain the attribute information of a target sample;
a feature value determining module 32, configured to determine, for the minimum-error-rate model obtained in each round of training and based on the attribute information of the target sample, the feature values corresponding to the attribute features used by that model;
a first classification result determining module 33, configured to input the feature values corresponding to each minimum-error-rate model into the corresponding model to obtain the classification result of each minimum-error-rate model; and
a second classification result determining module 34, configured to weight and sum the classification results based on the model weight corresponding to each minimum-error-rate model and to determine the obtained sum as the classification result of the target sample.
Corresponding to the classification prediction method of Fig. 7, an embodiment of the present invention further provides a computer device 40, as shown in Fig. 11. The device includes a memory 41, a processor 42, and a computer program that is stored on the memory 41 and can run on the processor 42, wherein the processor 42 implements the steps of the above classification prediction method when executing the computer program.
Specifically, the memory 41 and the processor 42 may be a general-purpose memory and processor, which are not specifically limited here. When the processor 42 runs the computer program stored in the memory 41, the above classification prediction method can be executed, thereby solving the problem of poor prediction precision and prediction accuracy caused by a single classification technique and improving the prediction precision and prediction accuracy.
Corresponding to the classification prediction method of Fig. 7, an embodiment of the present invention further provides a computer-readable storage medium on which a computer program is stored; when run by the processor 42, the computer program executes the steps of the above classification prediction method.
Specifically, the storage medium may be a general-purpose storage medium, such as a removable disk or a hard disk. When the computer program on the storage medium is run, the above classification prediction method can be executed, thereby solving the problem of poor prediction precision and prediction accuracy caused by a single classification technique and improving the prediction precision and prediction accuracy.
In the embodiments provided by the present invention, it should be understood that the disclosed device and method may be implemented in other ways. The device embodiments described above are merely illustrative. For example, the division into units is only a division by logical function, and other divisions are possible in actual implementation; for another example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through communication interfaces, devices or units, and may be electrical, mechanical or in other forms.
The units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, the functional units in the embodiments provided by the present invention may be integrated into one processing unit, or each unit may exist physically on its own, or two or more units may be integrated into one unit.
If the functions are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art or in part, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device or the like) to execute all or part of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash disk, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc.
It should be noted that similar reference numerals and letters denote similar items in the drawings; therefore, once an item is defined in one drawing, it does not need to be further defined and explained in subsequent drawings. In addition, the terms "first", "second", "third" and the like are only used to distinguish the descriptions and are not to be understood as indicating or implying relative importance.
Finally, it should be noted that the embodiments described above are only specific embodiments of the present invention, used to illustrate rather than to limit the technical solution of the present invention, and the protection scope of the present invention is not limited to them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that any person skilled in the art can, within the technical scope disclosed by the present invention, still modify the technical solutions recorded in the foregoing embodiments, readily conceive of variations, or make equivalent replacements of some of the technical features; such modifications, variations or replacements do not cause the essence of the corresponding technical solution to depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and shall all be covered by the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
Claims (10)
1. A classifier construction method, characterized by comprising:
obtaining a training data set of multiple training samples, wherein the training data set includes attribute information and category information;
extracting attribute features from the attribute information;
taking the attribute features as initial independent variables and the corresponding category information as initial dependent variables, and performing at least one round of model training, wherein each round of model training is carried out by at least two candidate models; and
taking the combination of the minimum-error-rate models of the rounds of model training as the trained classifier.
2. The method according to claim 1, characterized in that each round of training performs the following operations:
determining, based on the training data used in the current round, the values of the current independent variables and the value of the current dependent variable, and training the at least two candidate models participating in the current round;
determining, according to the result of the current round, the first candidate model with the minimum error rate among the at least two candidate models participating in the current round;
determining the erroneous training samples whose classification results obtained by the first candidate model in the current round are wrong;
updating the weights of the erroneous training samples based on a preset weight update rule; and
performing stratified sampling on the multiple training samples according to their current weights to obtain the training data to be used in the next round, and entering the next round of training.
3. The method according to claim 2, characterized in that, for each round of training other than the first round, before the values of the corresponding independent variables and dependent variable are determined based on the training data used in the current round, the method further comprises:
constructing new attribute features for the current round based on the features of the training samples contained in the training data, determined at the end of the previous round, that is to be used in the current round; and
taking the new attribute features as the current independent variables and the corresponding category information as the current dependent variable.
4. The method according to claim 2, characterized in that, for each round of training other than the first round, determining, according to the result of the current round, the first candidate model with the minimum error rate among the at least two candidate models participating in the current round specifically comprises:
determining, based on the attribute information of the multiple training samples, the values of the current independent variables and the value of the current dependent variable for each training sample;
inputting the independent-variable values of each training sample into the at least two candidate models trained in the current round to obtain the classification result of each training sample; and
determining the first candidate model with the minimum error rate among the at least two candidate models participating in the current round according to the obtained classification results and the corresponding category information.
5. The method according to any one of claims 2-4, characterized in that, after determining the first candidate model with the minimum error rate, the method further comprises:
determining the weight of the first candidate model according to a preset numerical relationship between the minimum error rate and the model weight, wherein the preset numerical relationship is such that the smaller the error rate, the higher the model weight;
and combining the models with the minimum error rate in each round of model training to serve as the trained classifier specifically comprises:
taking the weighted combination of the minimum-error-rate model of each round of model training and its corresponding model weight as the trained classifier.
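Claim 5 only requires that a smaller error rate map to a higher model weight; one common mapping that satisfies this, borrowed from AdaBoost and used here purely as an assumption, is alpha = 0.5 * ln((1 - e) / e):

```python
import numpy as np

def model_weight(error_rate, eps=1e-12):
    """Any mapping where a lower error rate yields a higher weight satisfies
    the claim; the AdaBoost-style formula below is just one such choice."""
    e = np.clip(error_rate, eps, 1 - eps)
    return 0.5 * np.log((1 - e) / e)

# The trained classifier is then the weighted combination of the per-round
# minimum-error-rate models, e.g. a list of (model, weight) pairs:
# classifier = [(model_round_1, model_weight(err_1)),
#               (model_round_2, model_weight(err_2)), ...]
```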
6. The method according to any one of claims 2-4, characterized in that performing stratified sampling on the multiple training samples according to their current weights to obtain the training data to be used in the next round specifically comprises:
determining, from the multiple training samples, all or part of the training samples whose current weight is greater than the initial weight, as first training samples;
determining, according to the number of the first training samples and a preset quantity relationship, a corresponding number of second training samples whose current weight is less than the initial weight; and
taking the attribute information and category information of the first training samples and the second training samples as the training data to be used in the next round of training.
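A minimal sketch of the selection in claim 6, assuming uniform initial weights and a 1:1 preset quantity relationship between first and second samples (both assumptions; the claim fixes neither):

```python
import numpy as np

def stratified_resample(X, y, weights, initial_weight, ratio=1.0, rng=None):
    """Select next-round data: the 'first' samples whose weight rose above
    the initial weight, plus a preset ratio of 'second' samples whose weight
    fell below it."""
    rng = rng or np.random.default_rng()
    first_idx = np.flatnonzero(weights > initial_weight)
    low_idx = np.flatnonzero(weights < initial_weight)
    n_second = min(len(low_idx), int(len(first_idx) * ratio))
    second_idx = (rng.choice(low_idx, size=n_second, replace=False)
                  if n_second else np.array([], dtype=int))
    chosen = np.concatenate([first_idx, second_idx])
    return X[chosen], y[chosen]
```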
7. The method according to claim 6, characterized in that, before taking the attribute information and category information of the first training samples and the second training samples as the training data to be used in the next round, the method further comprises:
synthesizing a preset number of third training samples based on the distribution characteristics of the first training samples;
and taking the attribute information and category information of the first training samples and the second training samples as the training data to be used in the next round comprises:
taking the attribute information and category information of the first training samples, the second training samples and the synthesized third training samples as the training data to be used in the next round of training.
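Claim 7 does not specify how the third training samples are synthesized from the distribution of the first training samples; a SMOTE-like interpolation between paired first samples is one plausible reading, shown here only as an assumption:

```python
import numpy as np

def synthesize_third_samples(X_first, n_new, rng=None):
    """Synthesize n_new samples by interpolating between randomly paired
    first training samples of the same category (a SMOTE-like stand-in for
    the claimed distribution-based synthesis)."""
    rng = rng or np.random.default_rng()
    synthetic = []
    for _ in range(n_new):
        a, b = X_first[rng.integers(len(X_first), size=2)]
        lam = rng.random()                  # interpolation factor in [0, 1)
        synthetic.append(a + lam * (b - a))
    return np.vstack(synthetic)
```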
8. The method according to claim 6, characterized in that, before entering the next round of training, the method further comprises:
performing class-imbalance processing on the training data determined to be used in the next round of training.
9. The method according to any one of claims 2-4, characterized in that, after determining the first candidate model with the minimum error rate from the at least two candidate models participating in the current round, the method further comprises:
determining that the minimum error rate has not reached a preset error-rate threshold.
10. A method for predicting a category based on a classifier trained by the method of any one of claims 1 to 9, characterized by comprising:
obtaining the attribute information of a target sample;
for the minimum-error-rate model obtained in each round of training, determining, based on the attribute information of the target sample, the feature values of the attribute features used by that model;
inputting the feature values corresponding to each minimum-error-rate model into the corresponding model, to obtain the classification result of each minimum-error-rate model;
weighting and summing the classification results based on the model weight of each minimum-error-rate model, and determining the resulting sum as the classification result of the target sample.
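The prediction step of claim 10 can be sketched as a weighted vote over the per-round minimum-error-rate models; encoding the two class labels as -1 and +1 is an assumption made here so the weighted sum is meaningful, and `feature_columns` is a hypothetical bookkeeping field recording which attribute features each round's model used:

```python
import numpy as np

def predict(classifier, x):
    """classifier: list of (model, model_weight, feature_columns) tuples,
    one per training round; x: attribute vector of the target sample."""
    score = 0.0
    for model, weight, feature_columns in classifier:
        features = np.asarray(x)[feature_columns].reshape(1, -1)
        score += weight * model.predict(features)[0]  # labels assumed in {-1, +1}
    return 1 if score >= 0 else -1
```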
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810098965.7A CN108171280A (en) | 2018-01-31 | 2018-01-31 | A kind of grader construction method and the method for prediction classification |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810098965.7A CN108171280A (en) | 2018-01-31 | 2018-01-31 | A kind of grader construction method and the method for prediction classification |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108171280A true CN108171280A (en) | 2018-06-15 |
Family
ID=62512503
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810098965.7A Pending CN108171280A (en) | 2018-01-31 | 2018-01-31 | A kind of grader construction method and the method for prediction classification |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108171280A (en) |
Cited By (33)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108989096A (en) * | 2018-06-28 | 2018-12-11 | 亚信科技(成都)有限公司 | A kind of broadband user's attrition prediction method and system |
CN109063736A (en) * | 2018-06-29 | 2018-12-21 | 考拉征信服务有限公司 | Data classification method, device, electronic equipment and computer readable storage medium |
CN110728289B (en) * | 2018-07-16 | 2022-06-03 | 中移动信息技术有限公司 | Mining method and device for home broadband user |
CN110728289A (en) * | 2018-07-16 | 2020-01-24 | 中移信息技术有限公司 | Mining method and device for home broadband user |
CN111046891A (en) * | 2018-10-11 | 2020-04-21 | 杭州海康威视数字技术股份有限公司 | Training method of license plate recognition model, and license plate recognition method and device |
CN109598281A (en) * | 2018-10-11 | 2019-04-09 | 阿里巴巴集团控股有限公司 | A kind of business risk preventing control method, device and equipment |
CN109543032A (en) * | 2018-10-26 | 2019-03-29 | 平安科技(深圳)有限公司 | File classification method, device, computer equipment and storage medium |
CN111209998A (en) * | 2018-11-06 | 2020-05-29 | 航天信息股份有限公司 | Training method and device of machine learning model based on data type |
CN111209998B (en) * | 2018-11-06 | 2023-08-18 | 航天信息股份有限公司 | Training method and device of machine learning model based on data type |
CN109933834A (en) * | 2018-12-26 | 2019-06-25 | 阿里巴巴集团控股有限公司 | A kind of model creation method and device of time series data prediction |
CN109933834B (en) * | 2018-12-26 | 2023-06-27 | 创新先进技术有限公司 | Model creation method and device for time sequence data prediction |
CN109697636A (en) * | 2018-12-27 | 2019-04-30 | 拉扎斯网络科技(上海)有限公司 | Merchant recommendation method, merchant recommendation device, electronic equipment and medium |
CN109740738B (en) * | 2018-12-29 | 2022-12-16 | 腾讯科技(深圳)有限公司 | Neural network model training method, device, equipment and medium |
CN109740738A (en) * | 2018-12-29 | 2019-05-10 | 腾讯科技(深圳)有限公司 | A kind of neural network model training method, device, equipment and medium |
CN109934291B (en) * | 2019-03-13 | 2020-10-09 | 北京林业大学 | Construction method of forest land tree species classifier, forest land tree species classification method and system |
CN109934291A (en) * | 2019-03-13 | 2019-06-25 | 北京林业大学 | Construction method, forest land tree species classification method and the system of forest land tree species classifier |
CN110110806A (en) * | 2019-05-15 | 2019-08-09 | 济南浪潮高新科技投资发展有限公司 | The balance method to acceptance of the bid and non-acceptance of the bid data based on machine learning techniques |
CN110600105B (en) * | 2019-08-27 | 2022-02-01 | 武汉科技大学 | CT image data processing method, device and storage medium |
CN110600105A (en) * | 2019-08-27 | 2019-12-20 | 武汉科技大学 | CT image data processing method, device and storage medium |
CN110991476A (en) * | 2019-10-18 | 2020-04-10 | 北京奇艺世纪科技有限公司 | Training method and device for decision classifier, recommendation method and device for audio and video, and storage medium |
CN110808968A (en) * | 2019-10-25 | 2020-02-18 | 新华三信息安全技术有限公司 | Network attack detection method and device, electronic equipment and readable storage medium |
CN112749721A (en) * | 2019-10-31 | 2021-05-04 | 彩虹无线(北京)新技术有限公司 | Driving risk evaluation model training method and device |
CN111429414A (en) * | 2020-03-18 | 2020-07-17 | 腾讯科技(深圳)有限公司 | Artificial intelligence-based focus image sample determination method and related device |
CN111429414B (en) * | 2020-03-18 | 2023-04-07 | 腾讯科技(深圳)有限公司 | Artificial intelligence-based focus image sample determination method and related device |
CN111489792A (en) * | 2020-04-14 | 2020-08-04 | 西安交通大学 | T cell receptor sequence classification method based on semi-supervised learning framework |
CN111489792B (en) * | 2020-04-14 | 2022-12-09 | 西安交通大学 | T cell receptor sequence classification method based on semi-supervised learning framework |
CN111899055A (en) * | 2020-07-29 | 2020-11-06 | 亿达信息技术有限公司 | Machine learning and deep learning-based insurance client repurchase prediction method in big data financial scene |
CN112465043A (en) * | 2020-12-02 | 2021-03-09 | 平安科技(深圳)有限公司 | Model training method, device and equipment |
CN112465043B (en) * | 2020-12-02 | 2024-05-14 | 平安科技(深圳)有限公司 | Model training method, device and equipment |
CN112395478A (en) * | 2021-01-18 | 2021-02-23 | 索信达(北京)数据技术有限公司 | Dual-model shared data screening method and system |
CN112990346A (en) * | 2021-04-09 | 2021-06-18 | 北京有竹居网络技术有限公司 | Writing quality evaluation method and device and electronic equipment |
CN113361591A (en) * | 2021-06-03 | 2021-09-07 | 重庆南鹏人工智能科技研究院有限公司 | Category imbalance processing method based on category combination and sample sampling |
CN114580766A (en) * | 2022-03-10 | 2022-06-03 | 天津大学 | Construction method of mud concentration prediction model and dredger mud concentration monitoring method |
Similar Documents
Publication | Title
---|---
CN108171280A (en) | A kind of grader construction method and the method for prediction classification |
CN108053314A (en) | A kind of Loan Demand Forecasting Methodology | |
CN102567391B (en) | Method and device for building classification forecasting mixed model | |
CN104798043B (en) | A kind of data processing method and computer system | |
CN108182634A (en) | A kind of training method for borrowing or lending money prediction model, debt-credit Forecasting Methodology and device | |
CN108416669A (en) | User behavior data processing method, device, electronic equipment and computer-readable medium | |
CN107169573A (en) | Using composite machine learning model come the method and system of perform prediction | |
CN107944874A (en) | Air control method, apparatus and system based on transfer learning | |
CN111222916B (en) | Method and device for determining delivery area, model training method and storage medium | |
CN109376766B (en) | Portrait prediction classification method, device and equipment | |
CN107908566A (en) | Automatic test management method, device, terminal device and storage medium | |
CN108351985A (en) | Method and apparatus for large-scale machines study | |
CN110610193A (en) | Method and device for processing labeled data | |
CN107203789A (en) | Distribution model method for building up, distribution method and relevant apparatus | |
CN108960264A (en) | The training method and device of disaggregated model | |
CN107169574A (en) | Using nested machine learning model come the method and system of perform prediction | |
CN112785005A (en) | Multi-target task assistant decision-making method and device, computer equipment and medium | |
CN108255706A (en) | Edit methods, device, terminal device and the storage medium of automatic test script | |
CN110930038A (en) | Loan demand identification method, loan demand identification device, loan demand identification terminal and loan demand identification storage medium | |
CN107908796A (en) | E-Government duplicate checking method, apparatus and computer-readable recording medium | |
CN112712383A (en) | Potential user prediction method, device, equipment and storage medium of application program | |
CN110348563A (en) | The semi-supervised training method of neural network, device, server and storage medium | |
CN107590737A (en) | Personal credit scores and credit line measuring method | |
CN113609345A (en) | Target object association method and device, computing equipment and storage medium | |
CN109685805A (en) | A kind of image partition method and device |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |
| CB02 | Change of applicant information | Address after: 101-8, 1st floor, building 31, area 1, 188 South Fourth Ring Road West, Fengtai District, Beijing; Applicant after: Guoxin Youyi Data Co., Ltd. Address before: 100070, No. 188, building 31, headquarters square, South Fourth Ring Road West, Fengtai District, Beijing; Applicant before: SIC YOUE DATA Co., Ltd.
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20180615