CN109034398A - Feature selection method, device and storage medium based on federated training - Google Patents


Info

Publication number
CN109034398A
CN109034398A, CN201810918867.3A, CN109034398B
Authority
CN
China
Prior art keywords
training
split
sample
feature
node
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810918867.3A
Other languages
Chinese (zh)
Other versions
CN109034398B (en)
Inventor
成柯葳
范涛
刘洋
陈天健
杨强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
WeBank Co Ltd
Original Assignee
WeBank Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by WeBank Co Ltd filed Critical WeBank Co Ltd
Priority to CN201810918867.3A priority Critical patent/CN109034398B/en
Publication of CN109034398A publication Critical patent/CN109034398A/en
Application granted granted Critical
Publication of CN109034398B publication Critical patent/CN109034398B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a feature selection method based on federated training, comprising the following steps: performing federated training on two aligned training samples using the XGBoost algorithm to construct a gradient boosting tree model, where the gradient boosting tree model comprises multiple regression trees and each split node of a regression tree corresponds to one feature of the training samples; computing the average gain of the split nodes corresponding to the same feature across the gradient boosting tree model, and taking the average gain as the score of the corresponding feature; ranking the features by their scores and outputting the ranking for feature selection, where any feature of the training samples that corresponds to no split node is given a default score. The invention also discloses a feature selection device based on federated training and a computer-readable storage medium. The invention realizes federated training and modeling over the training samples of different data parties, and in turn feature selection over multi-party sample data.

Description

Feature selection method, device and storage medium based on federated training
Technical field
The present invention relates to the field of machine learning, and in particular to a feature selection method, a device and a computer-readable storage medium based on federated training.
Background technique
In the current information age, certain human behaviors, such as consumption behavior, can be represented as data. Big data analysis has grown out of this: behavior analysis models are built through machine learning and can then be used to classify people's behavior or to make predictions based on users' behavioral features.
In existing machine learning practice, sample data is usually trained by a single party on a standalone basis, that is, single-party modeling. A mathematical model built this way can identify the relatively important features in the sample feature set. However, many big data analysis scenarios span multiple domains. For example, a user has both consumption behavior and borrowing behavior: the consumption data is generated at a consumer service provider, while the borrowing data is generated at a financial service provider. If the financial service provider needs to predict the user's borrowing behavior from the user's consumption features, it must use the consumer service provider's consumption data together with its own borrowing data to train a prediction model through machine learning.
For such application scenarios, a new modeling approach is therefore needed that enables joint training over the sample data of different data providers, so that both parties can participate in modeling together.
Summary of the invention
The main purpose of the present invention is to provide a feature selection method, device and computer-readable storage medium based on federated training, aiming to solve the technical problem that the prior art cannot jointly train the sample data of different data providers and therefore cannot let both parties participate in modeling together.
To achieve the above object, the present invention provides a feature selection method based on federated training, comprising the following steps:
performing federated training on two aligned training samples using the XGBoost algorithm to construct a gradient boosting tree model, where the gradient boosting tree model comprises multiple regression trees and each split node of a regression tree corresponds to one feature of the training samples;
computing the average gain of the split nodes that correspond to the same feature across the gradient boosting tree model, and taking the average gain as the score of that feature;
ranking the features by their scores and outputting the ranking for feature selection, where any feature of the training samples that corresponds to no split node is given a default score.
Optionally, the two aligned training samples are a first training sample and a second training sample;
the attributes of the first training sample comprise a sample ID and part of the sample features; the attributes of the second training sample comprise the sample ID, the remaining sample features and a data label;
the first training sample is provided by a first data party and kept local to the first data party, and the second training sample is provided by a second data party and kept local to the second data party.
Optionally, performing federated training on the two aligned training samples using the XGBoost algorithm to construct the gradient boosting tree model comprises:
at the second data party, obtaining the first-order and second-order gradients of each training sample in the sample set of the current node split;
if the current node split is the first split in constructing a regression tree, encrypting the first-order and second-order gradients and sending them, together with the sample IDs of the sample set, to the first data party, so that the first data party computes, from the encrypted gradients, the gain of every candidate split of its local training samples corresponding to those sample IDs;
if the current node split is not the first split of the regression tree, sending only the sample IDs of the sample set to the first data party, so that the first data party reuses the first-order and second-order gradients from the first split to compute the gain of every candidate split of its local training samples corresponding to those sample IDs;
receiving, at the second data party, the encrypted gains of all candidate splits returned by the first data party and decrypting them;
computing, at the second data party, from the first-order and second-order gradients, the gain of every candidate split of its own local training samples corresponding to the sample IDs;
determining, from the gains of all candidate splits computed by both parties, the globally best split node for the current node split;
splitting the sample set of the current node according to the globally best split node and generating new nodes, thereby constructing a regression tree of the gradient boosting tree model.
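The gain each party computes for its candidate splits can be sketched as follows, assuming the standard XGBoost structure-score gain with regularization weight `lam`; the homomorphic encryption and decryption steps of the protocol are deliberately elided:

```python
def split_gain(g_left, h_left, g_total, h_total, lam=1.0):
    """XGBoost structure-score gain for putting (g_left, h_left) in the left child."""
    g_right, h_right = g_total - g_left, h_total - h_left
    return (g_left ** 2 / (h_left + lam)
            + g_right ** 2 / (h_right + lam)
            - g_total ** 2 / (h_total + lam)) / 2

def best_local_split(feature_values, g, h, lam=1.0):
    """Scan the sorted thresholds of one feature and return (best_gain, threshold)."""
    order = sorted(range(len(feature_values)), key=lambda i: feature_values[i])
    g_total, h_total = sum(g), sum(h)
    best = (float("-inf"), None)
    gl = hl = 0.0
    for idx in order[:-1]:              # every proper prefix is a candidate left child
        gl, hl = gl + g[idx], hl + h[idx]
        gain = split_gain(gl, hl, g_total, h_total, lam)
        if gain > best[0]:
            best = (gain, feature_values[idx])
    return best

# Each party scans its own features; the label holder keeps the global maximum.
g, h = [1, 1, -1, -1], [1, 1, 1, 1]
party_a_best = best_local_split([20, 30, 35, 48], g, h)          # e.g. the Age feature
party_b_best = best_local_split([3102, 17250, 14027, 6787], g, h)
print(max(party_a_best, party_b_best))  # here the Age split from party A wins
```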
Optionally, before the step of obtaining, at the second data party, the first-order and second-order gradients of each training sample in the sample set of the current node split, the method further comprises:
when a node split is performed, judging whether the current split belongs to the construction of the first regression tree;
if it does, judging whether it is the first split of the first regression tree;
if the current split is the first split of the first regression tree, initializing, at the second data party, the first-order and second-order gradients of each training sample in the sample set of the current split; if it is a later split of the first regression tree, reusing the gradients of the first split;
if the current split belongs to a regression tree other than the first, judging whether it is the first split of that tree;
if it is the first split of that tree, updating the first-order and second-order gradients according to the previous round of federated training; otherwise reusing the gradients of the first split.
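The initialize/update/reuse decision above can be sketched as a single helper; the squared-error loss (g = prediction − target, h = 1) is an illustrative assumption, since the patent does not fix a loss function:

```python
def gradients_for_round(tree_index, split_index, y_true, y_pred, prev_grads):
    """Decide whether to (re)compute first/second-order gradients for this split.
    Squared-error loss is assumed purely for illustration: g = pred - y, h = 1."""
    if split_index == 0:
        # First split of any tree: (re)compute gradients from current predictions;
        # for tree_index == 0 this is the initialization, afterwards it is the
        # per-round update based on the previous round's model.
        g = [p - t for p, t in zip(y_pred, y_true)]
        h = [1.0] * len(y_true)
        return g, h
    # Later splits of the same tree reuse the gradients of the first split.
    return prev_grads
```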
Optionally, the feature selection method based on federated training further comprises:
when new nodes are generated in constructing a regression tree of the gradient boosting tree model, judging, at the second data party, whether the depth of the current regression tree has reached a preset depth threshold;
if it has, stopping node splitting, which yields one regression tree of the gradient boosting tree model; otherwise continuing with the next round of node splitting.
Optionally, the feature selection method based on federated training further comprises:
when node splitting stops, judging, at the second data party, whether the total number of regression trees has reached a preset count threshold;
if it has, stopping the federated training; otherwise continuing with the next round of federated training.
Optionally, the feature selection method based on federated training further comprises:
recording, at the second data party, information about the globally best split node determined in each round of node splitting;
where the information comprises: the provider of the corresponding sample data, the feature code of the corresponding sample data, and the gain.
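A possible shape for such a record, with illustrative field names (the patent specifies only the three pieces of information, not a concrete structure):

```python
from dataclasses import dataclass

@dataclass
class SplitRecord:
    """What the label-holding party records for each globally best split
    (field names are illustrative, not taken from the patent verbatim)."""
    owner: str          # which data party supplied the split feature
    feature_code: str   # anonymised feature encoding, not the raw feature name
    gain: float

history = []
history.append(SplitRecord(owner="party_A", feature_code="f3", gain=1.25))
print(history[0].feature_code)  # f3
```

Using an anonymised feature code rather than the raw feature name matters here: it lets the label holder aggregate gains per feature without learning what the other party's features actually are.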
Optionally, computing the average gain of the split nodes that correspond to the same feature across the gradient boosting tree model comprises:
at the second data party, taking each globally best split node as a split node of a regression tree in the gradient boosting tree model, and computing the average gain of the split nodes that share the same feature code.
Further, to achieve the above object, the present invention also provides a feature selection device based on federated training, comprising a memory, a processor, and a feature selection program stored on the memory and runnable on the processor, where the feature selection program, when executed by the processor, implements the steps of the feature selection method based on federated training described in any of the above.
Further, to achieve the above object, the present invention also provides a computer-readable storage medium storing a feature selection program, where the feature selection program, when executed by a processor, implements the steps of the feature selection method based on federated training described in any of the above.
The present invention performs federated training on two aligned training samples using the XGBoost algorithm to construct a gradient boosting tree model, where the model is a set of multiple regression trees and each split node of a regression tree corresponds to one feature of the training samples. By computing the average gain of the split nodes corresponding to the same feature across the model and taking the average gain as the feature's score, the features of both parties' training samples are scored. Finally, the features are ranked by score and the ranking is output for feature selection, where a higher score indicates a more important feature. The invention thereby realizes federated training and modeling over the training samples of different data parties, and in turn feature selection over multi-party sample data.
Detailed description of the invention
Fig. 1 is a schematic structural diagram of the hardware operating environment involved in embodiments of the feature selection device based on federated training of the present invention;
Fig. 2 is a schematic flowchart of an embodiment of the feature selection method based on federated training of the present invention;
Fig. 3 is a detailed flowchart of an embodiment of step S10 in Fig. 2;
Fig. 4 is a schematic diagram of a training result in an embodiment of the feature selection method based on federated training of the present invention.
The realization of the objects, functional features and advantages of the present invention will be further described with reference to the accompanying drawings in conjunction with the embodiments.
Specific embodiment
It should be understood that the specific embodiments described herein are only intended to explain the present invention and are not intended to limit it.
The present invention provides a feature selection device based on federated training.
As shown in Fig. 1, Fig. 1 is a schematic structural diagram of the hardware operating environment involved in embodiments of the feature selection device based on federated training.
The feature selection device based on federated training may be a PC, a server, or any other device with computing capability.
As shown in Fig. 1, the device may comprise a processor 1001 such as a CPU, a network interface 1004, a user interface 1003, a memory 1005 and a communication bus 1002, where the communication bus 1002 realizes the connection and communication between these components. The user interface 1003 may comprise a display and an input unit such as a keyboard, and optionally may further comprise standard wired and wireless interfaces. The network interface 1004 may optionally comprise standard wired and wireless interfaces (such as a Wi-Fi interface). The memory 1005 may be a high-speed RAM memory or a non-volatile memory such as a magnetic disk memory, and optionally may be a storage device independent of the processor 1001.
Those skilled in the art will understand that the device structure shown in Fig. 1 does not limit the device, which may comprise more or fewer components than illustrated, combine certain components, or arrange the components differently.
As shown in Fig. 1, the memory 1005, as a computer storage medium, may comprise an operating system, a network communication module, a user interface module and a feature selection program.
In the device shown in Fig. 1, the network interface 1004 is mainly used to connect to a backend server for data communication; the user interface 1003 is mainly used to connect to a client (user terminal) for data communication; and the processor 1001 may be used to call the feature selection program stored in the memory 1005 and perform the following operations:
performing federated training on two aligned training samples using the XGBoost algorithm to construct a gradient boosting tree model, where the gradient boosting tree model comprises multiple regression trees and each split node of a regression tree corresponds to one feature of the training samples;
computing the average gain of the split nodes that correspond to the same feature across the gradient boosting tree model, and taking the average gain as the score of that feature;
ranking the features by their scores and outputting the ranking for feature selection, where any feature of the training samples that corresponds to no split node is given a default score.
Further, the two aligned training samples are a first training sample and a second training sample; the attributes of the first training sample comprise a sample ID and part of the sample features, and the attributes of the second training sample comprise the sample ID, the remaining sample features and a data label; the first training sample is provided by a first data party and kept local to it, and the second training sample is provided by a second data party and kept local to it. The processor 1001 calls the feature selection program stored in the memory 1005 to further perform the following operations:
at the second data party, obtaining the first-order and second-order gradients of each training sample in the sample set of the current node split;
if the current node split is the first split in constructing a regression tree, encrypting the first-order and second-order gradients and sending them, together with the sample IDs of the sample set, to the first data party, so that the first data party computes, from the encrypted gradients, the gain of every candidate split of its local training samples corresponding to those sample IDs;
if the current node split is not the first split of the regression tree, sending only the sample IDs of the sample set to the first data party, so that the first data party reuses the first-order and second-order gradients from the first split to compute the gain of every candidate split of its local training samples corresponding to those sample IDs;
receiving, at the second data party, the encrypted gains of all candidate splits returned by the first data party and decrypting them;
computing, at the second data party, from the first-order and second-order gradients, the gain of every candidate split of its own local training samples corresponding to the sample IDs;
determining, from the gains of all candidate splits computed by both parties, the globally best split node for the current node split;
splitting the sample set of the current node according to the globally best split node and generating new nodes, thereby constructing a regression tree of the gradient boosting tree model.
Further, the processor 1001 calls the feature selection program stored in the memory 1005 to further perform the following operations:
when a node split is performed, judging whether the current split belongs to the construction of the first regression tree;
if it does, judging whether it is the first split of the first regression tree;
if the current split is the first split of the first regression tree, initializing, at the second data party, the first-order and second-order gradients of each training sample in the sample set of the current split; if it is a later split of the first regression tree, reusing the gradients of the first split;
if the current split belongs to a regression tree other than the first, judging whether it is the first split of that tree;
if it is the first split of that tree, updating the first-order and second-order gradients according to the previous round of federated training; otherwise reusing the gradients of the first split.
Further, the processor 1001 calls the feature selection program stored in the memory 1005 to further perform the following operations:
at the first data party, computing from the encrypted first-order and second-order gradients the gain of every candidate split of the local training samples corresponding to the sample IDs;
or, at the first data party, reusing the first-order and second-order gradients of the first split to compute the gain of every candidate split of the local training samples corresponding to the sample IDs;
encrypting the gains of all candidate splits and sending them to the second data party.
Further, the processor 1001 calls the feature selection program stored in the memory 1005 to further perform the following operations:
when new nodes are generated in constructing a regression tree of the gradient boosting tree model, judging, at the second data party, whether the depth of the current regression tree has reached a preset depth threshold;
if it has, stopping node splitting, which yields one regression tree of the gradient boosting tree model; otherwise continuing with the next round of node splitting.
Further, the processor 1001 calls the feature selection program stored in the memory 1005 to further perform the following operations:
when node splitting stops, judging, at the second data party, whether the total number of regression trees has reached a preset count threshold;
if it has, stopping the federated training; otherwise continuing with the next round of federated training.
Further, the processor 1001 calls the feature selection program stored in the memory 1005 to further perform the following operations:
recording, at the second data party, information about the globally best split node determined in each round of node splitting;
where the information comprises: the provider of the corresponding sample data, the feature code of the corresponding sample data, and the gain.
Further, the processor 1001 calls the feature selection program stored in the memory 1005 to further perform the following operations:
at the second data party, taking each globally best split node as a split node of a regression tree in the gradient boosting tree model, and computing the average gain of the split nodes that share the same feature code.
Based on the hardware operating environment involved in the above device embodiments, the following embodiments of the feature selection method based on federated training of the present invention are proposed.
Referring to Fig. 2, Fig. 2 is a schematic flowchart of an embodiment of the feature selection method based on federated training of the present invention. In this embodiment, the method comprises the following steps:
Step S10: performing federated training on two aligned training samples using the XGBoost algorithm to construct a gradient boosting tree model, where the model comprises multiple regression trees and each split node of a regression tree corresponds to one feature of the training samples.
The XGBoost (eXtreme Gradient Boosting) algorithm is an improvement of the boosting approach underlying the GBDT (Gradient Boosting Decision Tree) algorithm. Its internal decision trees are regression trees, and its output is a set of regression trees. The basic training procedure is to traverse all split candidates of all features of the training samples (that is, all ways of splitting a node), select the split with the smallest loss to obtain two children (that is, split the node and generate new nodes), and then continue traversing until:
(1) a split-stopping condition is met, in which case one regression tree is output; or
(2) an iteration-stopping condition is met, in which case the set of regression trees is output.
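For reference, the gain used to compare split candidates (rendered elsewhere in this translation as "financial value") is, in the standard XGBoost formulation, with G and H the sums of first- and second-order gradients in the left (L) and right (R) children, λ the regularization weight and γ the split penalty:

```latex
\mathrm{Gain} = \frac{1}{2}\left[\frac{G_L^2}{H_L+\lambda} + \frac{G_R^2}{H_R+\lambda}
              - \frac{(G_L+G_R)^2}{H_L+H_R+\lambda}\right] - \gamma
```

This formula is the standard one from the XGBoost literature, stated here for context; it is not reproduced from the patent text itself.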
In this embodiment, the training samples used by the XGBoost algorithm are two independent training samples, each belonging to a different data party. If the two training samples are viewed as one whole training sample, then, because they belong to different data parties, the whole sample can be regarded as having been cut, so that the two training samples hold different features of the same samples (the samples are partitioned vertically).
Furthermore, since the two training samples belong to different data parties, sample alignment must be performed on the raw sample data provided by both parties before federated training and modeling can proceed.
In this embodiment, federated training means that the sample training process is completed jointly by the two data parties, and the split nodes of the regression trees in the finally trained gradient boosting tree model correspond to the features of both parties' training samples.
Step S20: computing the average gain of the split nodes corresponding to the same feature in the gradient boosting tree model, and taking the average gain as the score of the corresponding feature.
In the XGBoost algorithm, when all split candidates of all features of the training samples are traversed, the quality of each candidate is evaluated by its gain, and each split node selects the candidate with the smallest loss. The gain of a split node can therefore serve as the basis for evaluating feature importance: the larger the gain, the smaller the loss of the node split, and thus the more important the feature corresponding to that split node.
In this embodiment, since the trained gradient boosting tree model comprises multiple regression trees and different regression trees may split on the same feature, the average gain of the split nodes corresponding to the same feature must be computed across all regression trees of the model, and this average gain is taken as the score of the corresponding feature.
Step S30: ranking the features by their scores and outputting the ranking for feature selection, where any feature of the training samples that corresponds to no split node is given a default score.
In this embodiment, a feature's score represents its importance. After the scores of all features are obtained, the features are ranked and the ranking is output, for example from high to low, so that features ranked earlier are more important than features ranked later. Feature selection can then discard the features of a sample that are irrelevant to prediction or classification. For example, suppose a student sample contains gender, school grades, attendance rate and number of commendations, and the classification target is honor student versus non-honor student; the gender feature is clearly unrelated, or only weakly related, to being an honor student and can therefore be discarded.
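The honor-student example amounts to thresholding the importance ranking; a minimal sketch (the threshold value and feature names are illustrative assumptions):

```python
def select_features(scores, threshold):
    """Keep features whose importance score reaches the threshold; mirrors the
    honor-student example, where 'gender' scores near zero and is dropped."""
    return [feature for feature, score in scores if score >= threshold]

# Ranked (feature, score) pairs as produced by the scoring step.
scores = [("grade", 0.92), ("attendance", 0.55), ("commendations", 0.40), ("gender", 0.01)]
print(select_features(scores, 0.1))  # ['grade', 'attendance', 'commendations']
```

In practice the threshold (or a top-k cutoff) would be chosen by the modeler; the patent only specifies that the ranking is output for feature selection.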
This embodiment performs federated training on two aligned training samples using the XGBoost algorithm to construct a gradient boosting tree model, where the model is a set of multiple regression trees and each split node of a regression tree corresponds to one feature of the training samples. By computing the average gain of the split nodes corresponding to the same feature across the model and taking the average gain as the feature's score, the features of both parties' training sample data are scored. Finally, the features are ranked by score and the ranking is output for feature selection, where a higher score indicates a more important feature. This embodiment thereby realizes federated training and modeling over the training samples of different data parties, and in turn feature selection over multi-party sample data.
Further, for ease of describing the specific implementation of the joint training of the present invention, this embodiment uses two independent training samples as an illustration.
In this embodiment, the first data party provides the first training sample, whose attributes comprise a sample ID and part of the sample features; the second data party provides the second training sample, whose attributes comprise the sample ID, the remaining sample features and a data label.
A sample feature is a feature that a sample exhibits or possesses; for example, if the samples are people, the features may be age, gender, income, education and so on. The data label is used to classify different samples, and the classification result is determined from the features of the sample.
A major significance of the federated training and modeling of the present invention is the two-way privacy protection of both parties' sample data. Therefore, during federated training, the first training sample is kept local to the first data party and the second training sample is kept local to the second data party. For example, the data in Table 1 below is provided by the first data party and kept local to it, and the data in Table 2 below is provided by the second data party and kept local to it.
Table 1
Sample ID | Age | Gender | Amount of given credit
X1 | 20 | 1 | 5000
X2 | 30 | 1 | 300000
X3 | 35 | 0 | 250000
X4 | 48 | 0 | 300000
X5 | 10 | 1 | 200
As shown in Table 1, the attributes of the first training sample include the sample IDs (X1~X5) and the Age, Gender, and Amount of given credit features.
Table 2
Sample ID | Bill Payment | Education | Lable
X1 | 3102 | 2 | 24
X2 | 17250 | 3 | 14
X3 | 14027 | 2 | 16
X4 | 6787 | 1 | 10
X5 | 280 | 1 | 26
As shown in Table 2, the attributes of the second training sample include the sample IDs (X1~X5), the Bill Payment and Education features, and the data label Lable.
Further, referring to Fig. 3, which is a detailed flow diagram of an embodiment of step S10 in Fig. 2: based on the above embodiment, in this embodiment, step S10 specifically includes:
Step S101: at the second data party side, obtain the first-order gradient and the second-order gradient of each training sample in the sample set corresponding to the current round of node splitting;
The XGBoost algorithm is a machine learning modeling method that uses a classifier (i.e., a classification function) to map sample data to one of several given classes, so that it can be applied to data prediction. In the process of learning the classification rule with the classifier, a loss function is needed to judge the fitting error of the machine learning model.
In this embodiment, each time node splitting is performed, the first-order and second-order gradients of each training sample in the sample set corresponding to the current round of node splitting are obtained at the second data party side.
The gradient boosting tree model requires multiple rounds of federated training; each round of federated training generates one regression tree, and generating one regression tree requires multiple node splits.
Therefore, within each round of federated training, the first node split uses the initially saved training samples, while each subsequent node split uses the training samples in the sample set corresponding to the new node generated by the previous split. Within the same round of federated training, every node split reuses the first-order and second-order gradients used in the first split of that round; the next round of federated training then uses the result of the previous round to update the first-order and second-order gradients used in that previous round.
The XGBoost algorithm supports custom loss functions: the first-order and second-order partial derivatives of the objective function are taken with respect to the custom loss function, yielding the first-order and second-order gradients of the local sample data to be trained.
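As an illustration of how these gradients arise, the sketch below computes the first-order gradient g_i and second-order gradient h_i for the common logistic loss; the patent does not fix a particular loss function, so the choice of loss and all names here are assumptions for illustration only.

```python
import math

def logistic_grad_hess(y_true, raw_score):
    """First- and second-order gradients of the logistic loss
    with respect to the raw (pre-sigmoid) prediction score."""
    p = 1.0 / (1.0 + math.exp(-raw_score))  # predicted probability
    g = p - y_true                          # first-order gradient g_i
    h = p * (1.0 - p)                       # second-order gradient h_i
    return g, h

# At a raw score of 0 the predicted probability is 0.5
g, h = logistic_grad_hess(1, 0.0)
```

With these per-sample pairs in hand, a party only ever needs their sums over candidate child nodes to evaluate splits, which is what makes the encrypted exchange below workable.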
Therefore, following the explanation of the XGBoost algorithm and the gradient boosting tree model in the above embodiment, constructing a regression tree requires determining split nodes, and a split node can be determined by its gain value. The gain value gain is computed by the XGBoost split-gain formula:

gain = (1/2) * [ G_L^2 / (H_L + λ) + G_R^2 / (H_R + λ) − (G_L + G_R)^2 / (H_L + H_R + λ) ] − γ

where G_L = Σ_{i∈I_L} g_i and H_L = Σ_{i∈I_L} h_i (and similarly G_R, H_R over I_R), I_L denotes the sample set contained in the left child node after the current node splits, I_R denotes the sample set contained in the right child node, g_i denotes the first-order gradient of sample i, h_i denotes the second-order gradient of sample i, and λ, γ are constants.
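A minimal sketch of this gain computation from the summed gradients of the two child sample sets (function and parameter names are our own):

```python
def split_gain(g_left, h_left, g_right, h_right, lam=1.0, gamma=0.0):
    """XGBoost split gain from the summed first-order gradients (g_*)
    and summed second-order gradients (h_*) of the left/right sample sets."""
    def score(G, H):
        return G * G / (H + lam)
    G_total, H_total = g_left + g_right, h_left + h_right
    return 0.5 * (score(g_left, h_left) + score(g_right, h_right)
                  - score(G_total, H_total)) - gamma

# Example gradient sums: 0.5 * (4/2 + 9/3 - 1/4) = 2.375
gain = split_gain(-2.0, 1.0, 3.0, 2.0)
```

The split whose gain is largest over all candidate division modes becomes the split node, which is exactly the comparison the two parties carry out in steps S102–S106 below.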
Since the sample data to be trained reside at both the first data party and the second data party, the gain values of the split nodes of the respective sample data under each division mode must be computed separately at the first data party side and the second data party side.
In this embodiment, because the first data party and the second data party have performed sample alignment in advance, both parties share the same gradient values; and because the data labels exist only in the sample data of the second data party, the gain values of both parties' split nodes under each division mode are computed based on the first-order and second-order gradients of the second data party's sample data.
Step S102: if the current round of node splitting is the first split in constructing the regression tree, encrypt the first-order gradient and the second-order gradient and send them, together with the sample IDs of the sample set, to the first data party, so that the first data party, based on the encrypted first-order gradient and second-order gradient, computes the gain values of the split nodes of the local training samples corresponding to the sample IDs under each division mode;
In this embodiment, to achieve mutual privacy protection of both parties' sample data during federated training, if the current round of node splitting is the first split in constructing the regression tree, the first-order and second-order gradients of the sample data computed at the second data party side are first encrypted and then sent to the first data party.
At the first data party side, the gain values of the split nodes of the first data party's local sample data under each division mode are computed from the received encrypted first-order and second-order gradients using the gain formula above. Since the first-order and second-order gradients are encrypted, the computed gain values are also ciphertexts, so the gain values themselves need no further encryption.
After the gain values of the split nodes under the various division modes of the sample data have been computed, the split generating a new node can be performed to construct the regression tree. In this embodiment, the construction of the regression trees of the gradient boosting tree model is preferably led by the second data party, which holds the data labels. Therefore, the gain values of the split nodes of the first data party's local sample data under each division mode, computed at the first data party side, must be sent to the second data party.
Step S103: if the current round of node splitting is not the first split in constructing the regression tree, send the sample IDs of the sample set to the first data party, so that the first data party, reusing the first-order gradient and second-order gradient used in the first split, computes the gain values of the split nodes of the local training samples corresponding to the sample IDs under each division mode;
In this embodiment, if the current round of node splitting is not the first split in constructing the regression tree, only the sample IDs of the sample set corresponding to the current split need to be sent to the first data party; the first data party then reuses the first-order and second-order gradients used in the first split to compute the gain values of the split nodes of the local training samples corresponding to the received sample IDs under each division mode.
Step S104: the second data party receives the encrypted gain values of all split nodes returned by the first data party and decrypts them;
Step S105: at the second data party side, based on the first-order gradient and the second-order gradient, compute the gain values of the split nodes of the local training samples corresponding to the sample IDs under each division mode;
At the second data party side, the gain values of the split nodes of the second data party's local sample data to be trained under each division mode are computed from the computed first-order and second-order gradients using the gain formula above.
Step S106: based on the gain values of all split nodes computed by both parties, determine the globally best split node of the current round of node splitting;
Since the initial sample data of both parties have been aligned, the gain values of all split nodes computed by the two parties can be regarded as the gain values of the split nodes of the overall data samples of both parties under each division mode. Therefore, by comparing the gain values, the split node with the maximum gain value is taken as the globally best split node of the current round.
It should be noted that the sample feature corresponding to the globally best split node may belong either to the training sample of the first data party or to the training sample of the second data party.
Optionally, since the construction of the regression trees of the gradient boosting tree model is led by the second data party, the second data party side needs to record the relevant information of the globally best split node determined by each round of node splitting; the relevant information includes the provider of the corresponding sample data, the feature encoding of the corresponding sample data, and the gain value.
For example, if data party A holds the feature f_i corresponding to the globally best split point, the record is (Site A, E_A(f_i), gain). Conversely, if data party B holds the feature f_i corresponding to the globally best split point, the record is (Site B, E_B(f_i), gain). Here, E_A(f_i) denotes data party A's encoding of feature f_i and E_B(f_i) denotes data party B's encoding of feature f_i; the encoding allows feature f_i to be referred to without revealing its original feature data.
Optionally, when performing feature selection in the above embodiment, each globally best split node is preferably taken as a split node of a regression tree in the gradient boosting tree model, and the average gain value of the split nodes corresponding to the same feature encoding is counted.
Step S107: based on the globally best split node of the current round, split the sample set corresponding to the current node and generate new nodes, so as to construct a regression tree of the gradient boosting tree model.
If the sample feature corresponding to the globally best split node of the current round belongs to the training sample of the first data party, the sample data corresponding to the current node being split belong to the first data party. Correspondingly, if the sample feature corresponding to the globally best split node belongs to the training sample of the second data party, the sample data corresponding to the current node belong to the second data party.
Node splitting produces new nodes (a left child node and a right child node), thereby constructing the regression tree. Through multiple rounds of node splitting, new nodes are generated continuously, yielding a regression tree of greater depth; when node splitting stops, a regression tree of the gradient boosting tree model is obtained.
In this embodiment, since the data exchanged between the two parties are all encrypted intermediate results of the model, the training process does not reveal the original feature data, and the encryption algorithm guarantees data privacy throughout training. A partially homomorphic encryption algorithm supporting additive homomorphism is preferably used.
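The additive homomorphism relied on here can be illustrated with a toy Paillier cryptosystem (a common partially homomorphic scheme; the patent does not name a specific one). The sketch below uses insecure demo-sized primes and names of our own choosing; it only shows the key property that multiplying two ciphertexts yields an encryption of the sum of the plaintexts, which is what lets a party accumulate encrypted gradient sums without seeing them.

```python
import math
import random

# Toy Paillier cryptosystem (demo-sized primes; NOT secure)
p, q = 293, 433
n = p * q
n2 = n * n
g = n + 1
lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)

def L(u):
    return (u - 1) // n

mu = pow(L(pow(g, lam, n2)), -1, n)  # modular inverse (Python 3.8+)

def encrypt(m):
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    return (L(pow(c, lam, n2)) * mu) % n

# Additive homomorphism: Enc(a) * Enc(b) mod n^2 decrypts to a + b
c_sum = (encrypt(17) * encrypt(25)) % n2
```

A production system would use a vetted cryptographic library with key sizes of 2048 bits or more; this fragment only demonstrates the additive property.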
Further, in one embodiment, depending on the node-splitting condition, the first-order and second-order gradients of the training samples used for node splitting are obtained as follows:
1. The current node split belongs to the construction of the first regression tree
1.1. If the current split is the first split in constructing the first regression tree, initialize, at the second data party side, the first-order and second-order gradients of each training sample in the sample set corresponding to the current split;
1.2. If the current split is not the first split in constructing the first regression tree, reuse the first-order and second-order gradients used in the first split.
2. The current node split belongs to the construction of a non-first regression tree
2.1. If the current split is the first split in constructing a non-first regression tree, update the first-order and second-order gradients according to the previous round of federated training;
2.2. If the current split is not the first split in constructing a non-first regression tree, reuse the first-order and second-order gradients used in the first split.
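The four cases above collapse to a single caching rule: gradients are (re)computed only on the first split of a tree, and whether the tree is the first one only changes the raw predictions used (initial scores vs. scores updated by the previous round). The sketch below assumes a logistic loss for the computation; all names are our own.

```python
import math

def gradients_for_split(is_first_split_of_tree, cache, labels, raw_preds):
    """Rules 1.1-2.2: (re)compute first- and second-order gradients only on
    the first split of a tree; every later split of that tree reuses the
    cached values unchanged."""
    if is_first_split_of_tree:
        cache = []
        for y, f in zip(labels, raw_preds):
            p = 1.0 / (1.0 + math.exp(-f))      # logistic-loss assumption
            cache.append((p - y, p * (1.0 - p)))  # (g_i, h_i)
    return cache

cache = gradients_for_split(True, None, [1, 0], [0.0, 0.0])
# A later split of the same tree reuses the cache, ignoring newer scores
same = gradients_for_split(False, cache, [1, 0], [9.9, 9.9])
```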
Further, in one embodiment, to reduce the complexity of the regression trees, a depth threshold is preset for the regression trees to limit node splitting.
In this embodiment, each time a new node is generated in constructing a regression tree of the gradient boosting tree model, the second data party side judges whether the depth of the current regression tree has reached the preset depth threshold;
If the depth of the current regression tree reaches the preset depth threshold, node splitting stops and one regression tree of the gradient boosting tree model is obtained; otherwise, the next round of node splitting continues.
It should be noted that the condition limiting node splitting may also be that splitting stops when a node cannot be split further, for example when the current node corresponds to a single sample, in which case node splitting cannot continue.
Further, in another embodiment, to avoid overfitting in training, a quantity threshold is preset for the regression trees to limit the number of regression trees generated.
In this embodiment, when node splitting stops, the second data party side judges whether the total number of regression trees has reached the preset quantity threshold;
If the total number of regression trees reaches the preset quantity threshold, federated training stops; otherwise, the next round of federated training continues.
It should be noted that the condition limiting the number of regression trees generated may also be to stop constructing regression trees when no node can be split further.
For a better understanding of the present invention, the federated training and modeling process of the present invention is described below based on the sample data of Tables 1 and 2 in the above embodiment.
First round of federated training: training the first regression tree
(1) First round of node splitting
1.1. At the second data party side, compute the first-order gradients (g_i) and second-order gradients (h_i) of the sample data in Table 2; encrypt g_i and h_i and send them to the first data party;
1.2. At the first data party side, based on g_i and h_i, compute the gain values of the split nodes under all possible division modes of the sample data in Table 1; send the gain values to the second data party;
Since in Table 1 the Age feature has 5 sample-data division modes, the Gender feature has 2, and the Amount of given credit feature has 5, the sample data in Table 1 have 12 division modes in total, i.e., the gain values of the split nodes corresponding to 12 division modes need to be computed.
1.3. At the second data party side, compute the gain values of the split nodes under all possible division modes of the sample data in Table 2;
Since in Table 2 the Bill Payment feature has 5 sample-data division modes and the Education feature has 3, the sample data in Table 2 have 8 division modes in total, i.e., the gain values of the split nodes corresponding to 8 division modes need to be computed.
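The counting of division modes for Table 2 can be reproduced by taking one candidate split threshold per distinct feature value (a sketch under that assumption; variable names are our own):

```python
# Table 2 sample data: one candidate division mode per distinct value
table2 = {
    "Bill Payment": [3102, 17250, 14027, 6787, 280],
    "Education":    [2, 3, 2, 1, 1],
}

def division_modes(values):
    """Candidate split thresholds: the sorted distinct feature values."""
    return sorted(set(values))

counts = {feat: len(division_modes(vals)) for feat, vals in table2.items()}
total = sum(counts.values())  # 5 + 3 = 8 division modes in Table 2
```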
1.4. From the gain values of the split nodes corresponding to the 12 division modes computed at the first data party side and to the 8 division modes computed at the second data party side, select the feature corresponding to the maximum gain value as the globally best split node of the current round;
1.5. Based on the globally best split node of the current round, split the sample data corresponding to the current node and generate new nodes to construct a regression tree of the gradient boosting tree model.
1.6. Judge whether the depth of the current regression tree reaches the preset depth threshold; if so, stop node splitting and obtain one regression tree of the gradient boosting tree model; otherwise continue the next round of node splitting;
1.7. Judge whether the total number of regression trees reaches the preset quantity threshold; if so, stop federated training; otherwise continue the next round of federated training.
(2) Second and third rounds of node splitting
2.1. Suppose the feature corresponding to the previous split node is Bill Payment ≤ 3102; this feature serves as the split node (the corresponding samples being X1, X2, X3, X4, X5) and generates two new child nodes: the left node corresponds to the sample set (X1, X5) with values ≤ 3102, and the right node to the sample set (X2, X3, X4) with values > 3102. Taking (X1, X5) and (X2, X3, X4) as the new sample sets, the second and third rounds of node splitting are carried out to split the two new nodes respectively and generate further new nodes.
2.2. Since the second and third rounds of node splitting belong to the same round of federated training, the sample gradient values used in the first round of node splitting are reused. Suppose the feature corresponding to one split node of this round is Amount of given credit ≤ 200; this feature serves as the split node (the corresponding samples being X1, X5) and generates two new child nodes: the left node corresponds to sample X5 with value ≤ 200, and the right node to sample X1 with value > 200. Similarly, suppose the feature corresponding to the other split node of this round is Age ≤ 35; this feature serves as the split node (the corresponding samples being X2, X3, X4) and generates two new child nodes: the left node corresponds to samples X2 and X3 with values ≤ 35, and the right node to sample X4 with value > 35. For the specific implementation flow, refer to the first round of node splitting.
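The splits above can be checked with a small partition helper over the feature values of Tables 1 and 2 (a sketch; function and key names are our own):

```python
def partition(samples, feature, threshold):
    """Split sample IDs into (left: value <= threshold, right: value > threshold)."""
    left = [sid for sid, vals in samples.items() if vals[feature] <= threshold]
    right = [sid for sid, vals in samples.items() if vals[feature] > threshold]
    return left, right

# Feature values from Tables 1 and 2 for samples X1-X5
data = {
    "X1": {"Age": 20, "Amount": 5000,   "Bill": 3102},
    "X2": {"Age": 30, "Amount": 300000, "Bill": 17250},
    "X3": {"Age": 35, "Amount": 250000, "Bill": 14027},
    "X4": {"Age": 48, "Amount": 300000, "Bill": 6787},
    "X5": {"Age": 10, "Amount": 200,    "Bill": 280},
}

left, right = partition(data, "Bill", 3102)  # (X1, X5) vs (X2, X3, X4)
```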
Second round of federated training: training the second regression tree
3.1. Since this round of node splitting belongs to the next round of federated training, the first-order and second-order gradients used in the previous round are updated with the result of the previous round of federated training, and the second round of federated training proceeds with node splitting to generate new nodes and construct the next regression tree. For the specific implementation flow, refer to the construction of the previous regression tree.
3.2. As shown in Fig. 4, after two rounds of federated training, the sample data of Tables 1 and 2 in the above embodiment produce two regression trees. The first regression tree contains three split nodes: Bill Payment ≤ 3102, Amount of given credit ≤ 200, and Age ≤ 35; the second regression tree contains two split nodes: Bill Payment ≤ 6787 and Gender == 1.
3.3. Based on the two regression trees of the gradient boosting tree model shown in Fig. 4, the average gain values corresponding to the features of the sample data are: Bill Payment: (gain1 + gain4)/2; Education: 0; Age: gain3; Gender: gain5; Amount of given credit: gain2.
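The per-feature averaging, the default score for features that never appear as a split node (here Education), and the final ranking can be sketched as follows; the numeric values standing in for gain1–gain5 are made-up placeholders, not values from the patent.

```python
# Split records as (feature, gain); gain values are illustrative placeholders
splits = [
    ("Bill Payment", 0.9),             # gain1
    ("Amount of given credit", 0.7),   # gain2
    ("Age", 0.5),                      # gain3
    ("Bill Payment", 0.3),             # gain4
    ("Gender", 0.2),                   # gain5
]
all_features = ["Bill Payment", "Education", "Age", "Gender",
                "Amount of given credit"]
DEFAULT_SCORE = 0.0  # for features with no corresponding split node

sums, counts = {}, {}
for feat, gain in splits:
    sums[feat] = sums.get(feat, 0.0) + gain
    counts[feat] = counts.get(feat, 0) + 1

# Score = average gain of the feature's split nodes, or the default score
scores = {f: (sums[f] / counts[f] if f in sums else DEFAULT_SCORE)
          for f in all_features}
ranking = sorted(all_features, key=scores.get, reverse=True)
```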
The present invention further provides a computer-readable storage medium.
A feature selection program is stored on the computer-readable storage medium of the present invention; when executed by a processor, the feature selection program implements the steps of the feature selection method based on federated training as described in any of the above embodiments.
Through the above description of the embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus the necessary general-purpose hardware platform, and of course also by hardware, but in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, in essence, or the part contributing to the prior art, can be embodied in the form of a software product stored in a storage medium (such as ROM/RAM), including several instructions for causing a terminal (which may be a mobile phone, a computer, a server, a network device, etc.) to execute the methods described in the embodiments of the present invention.
The embodiments of the present invention have been described above with reference to the accompanying drawings, but the invention is not limited to the specific embodiments described, which are merely illustrative rather than restrictive. Under the inspiration of the present invention, those of ordinary skill in the art can devise many further forms without departing from the scope protected by the purpose and claims of the present invention; all equivalent structures or equivalent process transformations made using the description and drawings of the present invention, whether applied directly or indirectly in other related technical fields, likewise fall within the protection of the present invention.

Claims (10)

1. A feature selection method based on federated training, characterized in that the feature selection method based on federated training comprises the following steps:
performing federated training on two aligned training samples using the XGBoost algorithm to construct a gradient boosting tree model, wherein the gradient boosting tree model comprises a plurality of regression trees, and each split node of a regression tree corresponds to one feature of the training samples;
counting the average gain value of the split nodes corresponding to the same feature in the gradient boosting tree model, and taking the average gain value as the score of the corresponding feature;
ranking the features based on the score of each feature and outputting the ranking result for feature selection, wherein if a feature of the training samples corresponds to no split node, that feature is given a default score.
2. The feature selection method based on federated training according to claim 1, characterized in that the two aligned training samples are respectively a first training sample and a second training sample;
the attributes of the first training sample include a sample ID and part of the sample features, and the attributes of the second training sample include the sample ID, the remaining sample features, and a data label;
the first training sample is provided by a first data party and stored locally at the first data party, and the second training sample is provided by a second data party and stored locally at the second data party.
3. The feature selection method based on federated training according to claim 2, characterized in that the performing federated training on the two aligned training samples using the XGBoost algorithm to construct the gradient boosting tree model comprises:
at the second data party side, obtaining the first-order gradient and second-order gradient of each training sample in the sample set corresponding to the current round of node splitting;
if the current round of node splitting is the first split in constructing a regression tree, encrypting the first-order gradient and the second-order gradient and sending them, together with the sample IDs of the sample set, to the first data party, so that the first data party, based on the encrypted first-order gradient and second-order gradient, computes the gain values of the split nodes of the local training samples corresponding to the sample IDs under each division mode;
if the current round of node splitting is not the first split in constructing the regression tree, sending the sample IDs of the sample set to the first data party, so that the first data party, reusing the first-order gradient and second-order gradient used in the first split, computes the gain values of the split nodes of the local training samples corresponding to the sample IDs under each division mode;
receiving, by the second data party, the encrypted gain values of all split nodes returned by the first data party and decrypting them;
at the second data party side, based on the first-order gradient and the second-order gradient, computing the gain values of the split nodes of the local training samples corresponding to the sample IDs under each division mode;
determining, based on the gain values of all split nodes computed by both parties, the globally best split node of the current round of node splitting;
based on the globally best split node of the current round, splitting the sample set corresponding to the current node and generating new nodes to construct a regression tree of the gradient boosting tree model.
4. The feature selection method based on federated training according to claim 3, characterized in that, before the step of obtaining, at the second data party side, the first-order gradient and second-order gradient of each training sample in the sample set corresponding to the current round of node splitting, the method further comprises:
when performing node splitting, judging whether the current round of node splitting corresponds to constructing the first regression tree;
if the current round of node splitting corresponds to constructing the first regression tree, judging whether it is the first split in constructing the first regression tree;
if the current round of node splitting is the first split in constructing the first regression tree, initializing, at the second data party side, the first-order gradient and second-order gradient of each training sample in the sample set corresponding to the current split; if the current round of node splitting is not the first split in constructing the first regression tree, reusing the first-order gradient and second-order gradient used in the first split;
if the current round of node splitting corresponds to constructing a non-first regression tree, judging whether it is the first split in constructing the non-first regression tree;
if the current round of node splitting is the first split in constructing the non-first regression tree, updating the first-order gradient and second-order gradient according to the previous round of federated training; if the current round of node splitting is not the first split in constructing the non-first regression tree, reusing the first-order gradient and second-order gradient used in the first split.
5. The feature selection method based on federated training according to claim 3, characterized in that the feature selection method based on federated training further comprises:
when generating new nodes to construct a regression tree of the gradient boosting tree model, judging, at the second data party side, whether the depth of the current regression tree reaches a preset depth threshold;
if the depth of the current regression tree reaches the preset depth threshold, stopping node splitting and obtaining one regression tree of the gradient boosting tree model; otherwise continuing the next round of node splitting.
6. The feature selection method based on federated training according to claim 5, characterized in that the feature selection method based on federated training further comprises:
when node splitting stops, judging, at the second data party side, whether the total number of regression trees reaches a preset quantity threshold;
if the total number of regression trees reaches the preset quantity threshold, stopping federated training; otherwise continuing the next round of federated training.
7. The feature selection method based on federated training according to any one of claims 3 to 6, characterized in that the feature selection method based on federated training further comprises:
recording, at the second data party side, the relevant information of the globally best split node determined by each round of node splitting;
wherein the relevant information includes the provider of the corresponding sample data, the feature encoding of the corresponding sample data, and the gain value.
8. The feature selection method based on federated training according to claim 7, characterized in that the counting of the average gain value of the split nodes corresponding to the same feature in the gradient boosting tree model comprises:
at the second data party side, taking each globally best split node as a split node of a regression tree in the gradient boosting tree model, and counting the average gain value of the split nodes corresponding to the same feature encoding.
9. A feature selection device based on federated training, characterized in that the feature selection device based on federated training comprises a memory, a processor, and a feature selection program stored on the memory and executable on the processor, wherein the feature selection program, when executed by the processor, implements the steps of the feature selection method based on federated training according to any one of claims 1 to 8.
10. A computer-readable storage medium, characterized in that a feature selection program is stored on the computer-readable storage medium, and the feature selection program, when executed by a processor, implements the steps of the feature selection method based on federated training according to any one of claims 1 to 8.
CN201810918867.3A 2018-08-10 2018-08-10 Gradient lifting tree model construction method and device based on federal training and storage medium Active CN109034398B (en)


Publications (2)

Publication Number Publication Date
CN109034398A true CN109034398A (en) 2018-12-18
CN109034398B CN109034398B (en) 2023-09-12



Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180089587A1 (en) * 2016-09-26 2018-03-29 Google Inc. Systems and Methods for Communication Efficient Distributed Mean Estimation
CN108021984A (en) * 2016-11-01 2018-05-11 第四范式(北京)技术有限公司 Determine the method and system of the feature importance of machine learning sample
CN107704966A (en) * 2017-10-17 2018-02-16 华南理工大学 A kind of Energy Load forecasting system and method based on weather big data
CN107767183A (en) * 2017-10-31 2018-03-06 常州大学 Brand loyalty method of testing based on combination learning and profile point
CN107993139A (en) * 2017-11-15 2018-05-04 华融融通(北京)科技有限公司 A kind of anti-fake system of consumer finance based on dynamic regulation database and method
CN108257105A (en) * 2018-01-29 2018-07-06 南华大学 A kind of light stream estimation for video image and denoising combination learning depth network model
CN108375808A (en) * 2018-03-12 2018-08-07 南京恩瑞特实业有限公司 Dense fog forecasting procedures of the NRIET based on machine learning

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
H. BRENDAN MCMAHAN et al.: "Communication-Efficient Learning of Deep Networks from Decentralized Data", Artificial Intelligence and Statistics *
JAKUB et al.: "Federated Learning: Strategies for Improving Communication Efficiency", arXiv.org *
STEPHEN HARDY et al.: "Private federated learning on vertically partitioned data via entity resolution and additively homomorphic encryption", arXiv.org *
TIANQI CHEN et al.: "XGBoost: A Scalable Tree Boosting System", KDD '16: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining *
许裕栗 et al.: "Application of the Xgboost algorithm in regional electricity consumption forecasting", 自动化仪表 (Process Automation Instrumentation) *

Cited By (30)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109711556A (en) * 2018-12-24 2019-05-03 中国南方电网有限责任公司 Machine patrols data processing method, device, net grade server and provincial server
US11947680B2 (en) 2018-12-28 2024-04-02 Webank Co., Ltd Model parameter training method, terminal, and system based on federation learning, and medium
CN109492420B (en) * 2018-12-28 2021-07-20 深圳前海微众银行股份有限公司 Model parameter training method, terminal, system and medium based on federal learning
CN109492420A (en) * 2018-12-28 2019-03-19 深圳前海微众银行股份有限公司 Model parameter training method, terminal, system and medium based on federation's study
WO2020134704A1 (en) * 2018-12-28 2020-07-02 深圳前海微众银行股份有限公司 Model parameter training method based on federated learning, terminal, system and medium
CN109934179A (en) * 2019-03-18 2019-06-25 中南大学 Human motion recognition method based on automated characterization selection and Ensemble Learning Algorithms
WO2021000572A1 (en) * 2019-07-01 2021-01-07 创新先进技术有限公司 Data processing method and apparatus, and electronic device
CN110297848A (en) * 2019-07-09 2019-10-01 深圳前海微众银行股份有限公司 Recommended models training method, terminal and storage medium based on federation's study
CN110297848B (en) * 2019-07-09 2024-02-23 深圳前海微众银行股份有限公司 Recommendation model training method, terminal and storage medium based on federal learning
WO2021082634A1 (en) * 2019-10-29 2021-05-06 支付宝(杭州)信息技术有限公司 Tree model-based prediction method and apparatus
CN110851786A (en) * 2019-11-14 2020-02-28 深圳前海微众银行股份有限公司 Longitudinal federated learning optimization method, device, equipment and storage medium
CN110990829A (en) * 2019-11-21 2020-04-10 支付宝(杭州)信息技术有限公司 Method, device and equipment for training GBDT model in trusted execution environment
CN111079939A (en) * 2019-11-28 2020-04-28 支付宝(杭州)信息技术有限公司 Machine learning model feature screening method and device based on data privacy protection
CN110941963A (en) * 2019-11-29 2020-03-31 福州大学 Text attribute viewpoint abstract generation method and system based on sentence emotion attributes
CN111178538A (en) * 2019-12-17 2020-05-19 杭州睿信数据科技有限公司 Federated learning method and device for vertical data
CN111178538B (en) * 2019-12-17 2023-08-15 杭州睿信数据科技有限公司 Federal learning method and device for vertical data
CN111178408A (en) * 2019-12-19 2020-05-19 中国科学院计算技术研究所 Health monitoring model construction method and system based on federal random forest learning
CN110968886A (en) * 2019-12-20 2020-04-07 支付宝(杭州)信息技术有限公司 Method and system for screening training samples of machine learning model
CN111368901A (en) * 2020-02-28 2020-07-03 深圳前海微众银行股份有限公司 Multi-party combined modeling method, device and medium based on federal learning
CN111340614A (en) * 2020-02-28 2020-06-26 深圳前海微众银行股份有限公司 Sample sampling method and device based on federal learning and readable storage medium
CN111507479B (en) * 2020-04-15 2021-08-10 深圳前海微众银行股份有限公司 Feature binning method, device, equipment and computer-readable storage medium
CN111507479A (en) * 2020-04-15 2020-08-07 深圳前海微众银行股份有限公司 Feature binning method, device, equipment and computer-readable storage medium
CN113657617A (en) * 2020-04-23 2021-11-16 支付宝(杭州)信息技术有限公司 Method and system for model joint training
CN111291417A (en) * 2020-05-09 2020-06-16 支付宝(杭州)信息技术有限公司 Method and device for protecting data privacy of multi-party combined training object recommendation model
CN111738359A (en) * 2020-07-24 2020-10-02 支付宝(杭州)信息技术有限公司 Two-party decision tree training method and system
CN113435537A (en) * 2021-07-16 2021-09-24 同盾控股有限公司 Cross-feature federated learning method and prediction method based on Soft GBDT
CN113722987A (en) * 2021-08-16 2021-11-30 京东科技控股股份有限公司 Federal learning model training method and device, electronic equipment and storage medium
CN113723477A (en) * 2021-08-16 2021-11-30 同盾科技有限公司 Cross-feature federal abnormal data detection method based on isolated forest
CN113722987B (en) * 2021-08-16 2023-11-03 京东科技控股股份有限公司 Training method and device of federal learning model, electronic equipment and storage medium
CN113723477B (en) * 2021-08-16 2024-04-30 同盾科技有限公司 Cross-feature federal abnormal data detection method based on isolated forest

Also Published As

Publication number Publication date
CN109034398B (en) 2023-09-12

Similar Documents

Publication Publication Date Title
CN109034398A (en) Feature selection approach, device and storage medium based on federation's training
CN109165683A (en) Sample predictions method, apparatus and storage medium based on federation's training
CN109299811B (en) Complex network-based fraud group recognition and risk propagation prediction method
WO2022110721A1 (en) Client category aggregation-based joint risk assessment method and related device
CN108763314A (en) A kind of interest recommends method, apparatus, server and storage medium
CN111932386B (en) User account determining method and device, information pushing method and device, and electronic equipment
US9838484B2 (en) Relevance estimation and actions based thereon
CN109753608A (en) Determine the method for user tag, the training method of autoencoder network and device
CN107291815A (en) Recommend method in Ask-Answer Community based on cross-platform tag fusion
CN112416986B (en) User portrait realizing method and system based on hierarchical personalized federal learning
CN107633257B (en) Data quality evaluation method and device, computer readable storage medium and terminal
CN107003834B (en) Pedestrian detection device and method
CN108446291A (en) The real-time methods of marking and points-scoring system of user credit
CN107741986A (en) User's behavior prediction and corresponding information recommend method and apparatus
Postigo-Boix et al. A social model based on customers’ profiles for analyzing the churning process in the mobile market of data plans
CN111767319A (en) Customer mining method and device based on fund flow direction
CN112101577A (en) XGboost-based cross-sample federal learning and testing method, system, device and medium
CN108876193A (en) A kind of air control model building method based on credit score
CN103366009A (en) Book recommendation method based on self-adaption clustering
CN107368499B (en) Client label modeling and recommending method and device
CN112817563A (en) Target attribute configuration information determination method, computer device, and storage medium
CN111984842B (en) Bank customer data processing method and device
CN112837078B (en) Method for detecting abnormal behavior of user based on clusters
CN106056137A (en) Telecom group service recommending method based on data mining multi-classification algorithm
CN106383738A (en) Task processing method and distributed computing framework

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant