CN109034398A - Feature selection method, device and storage medium based on federated training - Google Patents
- Publication number: CN109034398A (application CN201810918867.3A)
- Authority: CN (China)
- Prior art keywords: training, split, sample, feature, node
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Abstract
The invention discloses a feature selection method based on federated training, comprising the following steps: performing federated training on two aligned training samples using the XGBoost algorithm to construct a gradient boosting tree model, where the model comprises multiple regression trees and each split node of a regression tree corresponds to one feature of the training samples; computing the average gain of the split nodes corresponding to the same feature in the gradient boosting tree model, and taking that average gain as the score of the corresponding feature; ranking the features by score and outputting the ranking for feature selection, where a feature in the training samples that does not correspond to any split node receives a default score. The invention also discloses a feature selection device and a computer-readable storage medium based on federated training. The invention enables federated training and modeling on training samples held by different data parties, and thereby enables feature selection over multi-party sample data.
Description
Technical field
The present invention relates to the field of machine learning, and more particularly to a feature selection method, device and computer-readable storage medium based on federated training.
Background art
In the current information age, many human behaviors, such as consumption, leave a data trail. Big data analysis has grown out of this: behavior models are built through machine learning and then used to classify people's behavior or to make predictions from users' behavioral features.
Existing machine learning techniques usually train on sample data held by a single party, i.e. single-party modeling. Based on the resulting model, the relatively important features in the sample feature set can be determined. In many cross-domain big-data scenarios, however, the data is split across parties. For example, a user both consumes and borrows: the consumption data is generated at a consumer service provider, while the borrowing data is generated at a financial service provider. If the financial service provider needs to predict the user's borrowing behavior from the user's consumption features, it must combine the consumer service provider's consumption data with its own borrowing data through machine learning to build the prediction model.

For such scenarios, therefore, a new modeling approach is needed that jointly trains on sample data from different data providers, so that both parties participate in the modeling.
Summary of the invention
The main purpose of the present invention is to provide a feature selection method, device and computer-readable storage medium based on federated training, aiming to solve the technical problem that the prior art cannot jointly train on sample data from different data providers, and therefore cannot let both parties participate in the modeling.
To achieve the above object, the present invention provides a feature selection method based on federated training, comprising the following steps:

performing federated training on two aligned training samples using the XGBoost algorithm to construct a gradient boosting tree model, where the model comprises multiple regression trees and each split node of a regression tree corresponds to one feature of the training samples;

computing the average gain of the split nodes corresponding to the same feature in the gradient boosting tree model, and taking that average gain as the score of the corresponding feature;

ranking the features by score and outputting the ranking for feature selection, where a feature in the training samples that does not correspond to any split node receives a default score.
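The scoring-and-ranking steps above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the feature names, the gain records, and `DEFAULT_SCORE` are illustrative assumptions.

```python
from collections import defaultdict

DEFAULT_SCORE = 0.0  # assumed score for features that never correspond to a split node


def score_and_rank(split_records, all_features):
    """split_records: (feature, gain) pairs collected over all regression trees."""
    sums, counts = defaultdict(float), defaultdict(int)
    for feature, gain in split_records:
        sums[feature] += gain
        counts[feature] += 1
    scores = {f: (sums[f] / counts[f] if counts[f] else DEFAULT_SCORE)
              for f in all_features}
    # Higher average gain -> more important feature.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


records = [("Age", 0.8), ("Age", 0.6), ("BillPayment", 0.5), ("Education", 0.2)]
ranking = score_and_rank(records, ["Age", "BillPayment", "Education", "Gender"])
# "Gender" never splits a node, so it falls back to DEFAULT_SCORE.
```

Here "Age" is scored by the mean of its two split-node gains, and "Gender" receives the default score because no split node corresponds to it.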
Optionally, the two aligned training samples are a first training sample and a second training sample;

the first training sample's attributes include a sample ID and part of the sample features, and the second training sample's attributes include the sample ID, the remaining sample features, and a data label;

the first training sample is provided by a first data party and stored locally at the first data party, and the second training sample is provided by a second data party and stored locally at the second data party.
Optionally, performing federated training on the two aligned training samples using the XGBoost algorithm to construct a gradient boosting tree model comprises:

at the second data party, obtaining the first-order and second-order gradients of each training sample in the sample set of the current round of node splitting;

if the current split is the first node split of a regression tree, encrypting the first-order and second-order gradients together with the sample IDs of the sample set and sending them to the first data party, so that the first data party uses the encrypted gradients to compute, for its local training samples matching those sample IDs, the gain of the split node under every split mode;

if the current split is not the first node split of the regression tree, sending the sample IDs of the sample set to the first data party, so that the first data party reuses the first-order and second-order gradients of the first split to compute, for its local training samples matching those sample IDs, the gain of the split node under every split mode;

the second data party receiving and decrypting the encrypted gains of all split nodes returned by the first data party;

the second data party computing, from the first-order and second-order gradients, the gain of the split node under every split mode for its own local training samples matching those sample IDs;

determining the globally best split node of the current round from the gains of all split nodes computed by both parties;

splitting the sample set of the current node at the globally best split node and generating new nodes, thereby building a regression tree of the gradient boosting tree model.
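The per-split gain that each party computes locally can be sketched with the standard XGBoost gain formula. This is an illustration only: the `LAMBDA`/`GAMMA` regularizers and the toy gradient values are assumptions, not values given by the patent.

```python
LAMBDA, GAMMA = 1.0, 0.0  # assumed regularization parameters


def split_gain(g_left, h_left, g_right, h_right):
    """Standard XGBoost gain; g_*/h_* are sums of first- and second-order gradients."""
    def term(g, h):
        return g * g / (h + LAMBDA)
    return 0.5 * (term(g_left, h_left) + term(g_right, h_right)
                  - term(g_left + g_right, h_left + h_right)) - GAMMA


def best_split(feature_values, g, h):
    """Try every threshold of one feature; return (threshold, gain)."""
    order = sorted(range(len(feature_values)), key=lambda i: feature_values[i])
    best = (None, float("-inf"))
    gl = hl = 0.0
    gt, ht = sum(g), sum(h)
    for idx in order[:-1]:  # samples with value <= threshold go left
        gl += g[idx]
        hl += h[idx]
        gain = split_gain(gl, hl, gt - gl, ht - hl)
        if gain > best[1]:
            best = (feature_values[idx], gain)
    return best


# Toy data: gradients clearly separate low from high feature values.
thr, gain = best_split([20, 30, 35, 48, 10],
                       [0.9, -0.8, -0.9, -0.7, 0.8],
                       [0.2, 0.2, 0.2, 0.2, 0.2])
```

Each party runs this over its own features; the globally best split node is simply the split with the largest gain across both parties' candidates.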
Optionally, before the step of obtaining, at the second data party, the first-order and second-order gradients of each training sample in the current round's sample set, the method further comprises:

when splitting a node, judging whether the current split belongs to the construction of the first regression tree;

if it does, judging whether it is the first node split of the first regression tree;

if the current split is the first node split of the first regression tree, initializing, at the second data party, the first-order and second-order gradients of each training sample in the current round's sample set; if it is a later split of the first regression tree, reusing the first-order and second-order gradients of the first split;

if the current split belongs to the construction of a later regression tree, judging whether it is the first node split of that tree;

if it is the first node split of a later regression tree, updating the first-order and second-order gradients according to the previous round of federated training; if it is a later split of a later regression tree, reusing the first-order and second-order gradients of the first split.
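The branching above (first tree vs. later tree, first split vs. later split) amounts to a small piece of control flow. The sketch below is a hypothetical rendering; the `init`/`update` helpers and the cached-gradient shape are placeholders, not the patent's code.

```python
def gradients_for_round(first_tree, first_split, cached, init, update):
    """Decide where this round's (g, h) gradients come from.

    first_tree:  are we building the first regression tree?
    first_split: is this the first node split of the current tree?
    cached:      the (g, h) used by this tree's first split, or None.
    init/update: callables producing fresh (g, h).
    """
    if first_split:
        # First split of a tree: initialise for tree 1, otherwise update
        # from the previous round of federated training.
        return init() if first_tree else update()
    # Any later split of the same tree reuses the first split's gradients.
    return cached


g_h = gradients_for_round(first_tree=True, first_split=True,
                          cached=None,
                          init=lambda: ([0.5], [0.25]),
                          update=lambda: ([0.1], [0.09]))
```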
Optionally, the feature selection method based on federated training further comprises:

when new nodes are generated to build a regression tree of the gradient boosting tree model, judging, at the second data party, whether the depth of the current regression tree has reached a preset depth threshold;

if it has, stopping node splitting, which yields one regression tree of the gradient boosting tree model; otherwise, continuing with the next round of node splitting.
Optionally, the feature selection method based on federated training further comprises:

when node splitting stops, judging, at the second data party, whether the total number of regression trees has reached a preset count threshold;

if it has, stopping the federated training; otherwise, continuing with the next round of federated training.
Optionally, the feature selection method based on federated training further comprises:

recording, at the second data party, the relevant information of the globally best split node determined in each round of node splitting;

where the relevant information includes: the provider of the corresponding sample data, the feature coding of the corresponding sample data, and the gain.
Optionally, computing the average gain of the split nodes corresponding to the same feature in the gradient boosting tree model comprises:

at the second data party, taking each globally best split node as a split node of a regression tree in the gradient boosting tree model, and computing the average gain of the split nodes corresponding to the same feature coding.
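The statistic above can be sketched as a group-by-average over the recorded split-node information, i.e. the (provider, feature coding, gain) triples. The record values below are illustrative assumptions.

```python
from collections import defaultdict


def average_gain_by_feature(records):
    """records: (provider, feature_coding, gain) per globally best split node."""
    grouped = defaultdict(list)
    for provider, feature_code, gain in records:
        grouped[feature_code].append(gain)
    return {code: sum(gains) / len(gains) for code, gains in grouped.items()}


records = [
    ("party_A", "f1", 0.9),   # e.g. Age, chosen as best split by two trees
    ("party_A", "f1", 0.7),
    ("party_B", "f4", 0.4),   # e.g. Bill Payment
]
scores = average_gain_by_feature(records)
```

Grouping by feature coding (rather than feature name) matches the recorded relevant information, since the coding identifies the feature without revealing which party's raw column it is.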
Further, to achieve the above object, the present invention also provides a feature selection device based on federated training, comprising a memory, a processor, and a feature selection program stored on the memory and runnable on the processor, where the feature selection program, when executed by the processor, implements the steps of the feature selection method based on federated training described in any of the above.
Further, to achieve the above object, the present invention also provides a computer-readable storage medium storing a feature selection program which, when executed by a processor, implements the steps of the feature selection method based on federated training described in any of the above.
The present invention performs federated training on two aligned training samples using the XGBoost algorithm to construct a gradient boosting tree model, where the model is a set of multiple regression trees and each split node of a regression tree corresponds to one feature of the training samples. By computing the average gain of the split nodes corresponding to the same feature across the model and taking that average gain as the feature's score, the features of both parties' training sample data are scored; the features are then ranked by score and the ranking is output for feature selection, where a higher score means a more important feature. The invention thereby enables federated training and modeling on training samples held by different data parties, and in turn feature selection over multi-party sample data.
Brief description of the drawings
Fig. 1 is a schematic structural diagram of the hardware operating environment involved in embodiments of the feature selection device based on federated training of the present invention;
Fig. 2 is a schematic flowchart of an embodiment of the feature selection method based on federated training of the present invention;
Fig. 3 is a detailed flowchart of an embodiment of step S10 in Fig. 2;
Fig. 4 is a schematic diagram of a training result of an embodiment of the feature selection method based on federated training of the present invention.
The realization of the object, functional features and advantages of the present invention will be further described with reference to the accompanying drawings in conjunction with the embodiments.
Specific embodiment
It should be understood that the specific embodiments described here only explain the present invention and are not intended to limit it.
The present invention provides a feature selection device based on federated training.
As shown in Fig. 1, Fig. 1 is a schematic structural diagram of the hardware operating environment involved in embodiments of the feature selection device based on federated training.

The feature selection device based on federated training of the present invention may be a PC, or a server or other equipment with computing capability.
As shown in Fig. 1, the feature selection device based on federated training may include: a processor 1001 such as a CPU, a network interface 1004, a user interface 1003, a memory 1005, and a communication bus 1002. The communication bus 1002 realizes the connection and communication between these components. The user interface 1003 may include a display (Display) and an input unit such as a keyboard (Keyboard); optionally, the user interface 1003 may also include standard wired and wireless interfaces. The network interface 1004 may optionally include standard wired and wireless interfaces (such as a WI-FI interface). The memory 1005 may be a high-speed RAM memory or a non-volatile memory such as a disk memory; optionally, the memory 1005 may also be a storage device independent of the aforementioned processor 1001.

Those skilled in the art will understand that the device structure shown in Fig. 1 does not limit the feature selection device based on federated training, which may include more or fewer components than illustrated, combine certain components, or arrange the components differently.

As shown in Fig. 1, the memory 1005, as a computer storage medium, may include an operating system, a network communication module, a user interface module, and a feature selection program.
In the feature selection device based on federated training shown in Fig. 1, the network interface 1004 is mainly used to connect to a background server and exchange data with it; the user interface 1003 is mainly used to connect to a client (user terminal) and exchange data with it; and the processor 1001 may be used to call the feature selection program stored in the memory 1005 and perform the following operations:
performing federated training on two aligned training samples using the XGBoost algorithm to construct a gradient boosting tree model, where the model comprises multiple regression trees and each split node of a regression tree corresponds to one feature of the training samples;

computing the average gain of the split nodes corresponding to the same feature in the gradient boosting tree model, and taking that average gain as the score of the corresponding feature;

ranking the features by score and outputting the ranking for feature selection, where a feature in the training samples that does not correspond to any split node receives a default score.
Further, the two aligned training samples are a first training sample and a second training sample; the first training sample's attributes include a sample ID and part of the sample features, and the second training sample's attributes include the sample ID, the remaining sample features, and a data label; the first training sample is provided by a first data party and stored locally at the first data party, and the second training sample is provided by a second data party and stored locally at the second data party. The processor 1001 calls the feature selection program stored in the memory 1005 and also performs the following operations:

at the second data party, obtaining the first-order and second-order gradients of each training sample in the sample set of the current round of node splitting;

if the current split is the first node split of a regression tree, encrypting the first-order and second-order gradients together with the sample IDs of the sample set and sending them to the first data party, so that the first data party uses the encrypted gradients to compute, for its local training samples matching those sample IDs, the gain of the split node under every split mode;

if the current split is not the first node split of the regression tree, sending the sample IDs of the sample set to the first data party, so that the first data party reuses the first-order and second-order gradients of the first split to compute, for its local training samples matching those sample IDs, the gain of the split node under every split mode;

the second data party receiving and decrypting the encrypted gains of all split nodes returned by the first data party;

the second data party computing, from the first-order and second-order gradients, the gain of the split node under every split mode for its own local training samples matching those sample IDs;

determining the globally best split node of the current round from the gains of all split nodes computed by both parties;

splitting the sample set of the current node at the globally best split node and generating new nodes, thereby building a regression tree of the gradient boosting tree model.
Further, the processor 1001 calls the feature selection program stored in the memory 1005 and also performs the following operations:

when splitting a node, judging whether the current split belongs to the construction of the first regression tree;

if it does, judging whether it is the first node split of the first regression tree;

if the current split is the first node split of the first regression tree, initializing, at the second data party, the first-order and second-order gradients of each training sample in the current round's sample set; if it is a later split of the first regression tree, reusing the first-order and second-order gradients of the first split;

if the current split belongs to the construction of a later regression tree, judging whether it is the first node split of that tree;

if it is the first node split of a later regression tree, updating the first-order and second-order gradients according to the previous round of federated training; if it is a later split of a later regression tree, reusing the first-order and second-order gradients of the first split.
Further, the processor 1001 calls the feature selection program stored in the memory 1005 and also performs the following operations:

at the first data party, computing, from the encrypted first-order and second-order gradients, the gain of the split node under every split mode for the local training samples matching the sample IDs;

or, at the first data party, reusing the first-order and second-order gradients of the first split to compute the gain of the split node under every split mode for the local training samples matching the sample IDs;

encrypting the gains of all split nodes and sending them to the second data party.
Further, the processor 1001 calls the feature selection program stored in the memory 1005 and also performs the following operations:

when new nodes are generated to build a regression tree of the gradient boosting tree model, judging, at the second data party, whether the depth of the current regression tree has reached a preset depth threshold;

if it has, stopping node splitting, which yields one regression tree of the gradient boosting tree model; otherwise, continuing with the next round of node splitting.
Further, the processor 1001 calls the feature selection program stored in the memory 1005 and also performs the following operations:

when node splitting stops, judging, at the second data party, whether the total number of regression trees has reached a preset count threshold;

if it has, stopping the federated training; otherwise, continuing with the next round of federated training.
Further, the processor 1001 calls the feature selection program stored in the memory 1005 and also performs the following operations:

recording, at the second data party, the relevant information of the globally best split node determined in each round of node splitting;

where the relevant information includes: the provider of the corresponding sample data, the feature coding of the corresponding sample data, and the gain.
Further, the processor 1001 calls the feature selection program stored in the memory 1005 and also performs the following operations:

at the second data party, taking each globally best split node as a split node of a regression tree in the gradient boosting tree model, and computing the average gain of the split nodes corresponding to the same feature coding.
Based on the hardware operating environment involved in the above embodiments of the feature selection device based on federated training, the following embodiments of the feature selection method based on federated training of the present invention are proposed.

Referring to Fig. 2, Fig. 2 is a schematic flowchart of an embodiment of the feature selection method based on federated training of the present invention. In this embodiment, the feature selection method based on federated training comprises the following steps:
Step S10: performing federated training on two aligned training samples using the XGBoost algorithm to construct a gradient boosting tree model, where the model comprises multiple regression trees and each split node of a regression tree corresponds to one feature of the training samples;
The XGBoost (eXtreme Gradient Boosting) algorithm is an improvement of the boosting algorithm on the basis of the GBDT (Gradient Boosting Decision Tree) algorithm; its internal decision trees are regression trees, and its output is a set of regression trees, i.e. it comprises multiple regression trees. The basic idea of the training process is to traverse all split methods (i.e. node-splitting modes) of all features of the training samples, select the split method with the smallest loss, obtain two leaves (i.e. split the node and generate new nodes), and then continue traversing until:

(1) if the stop-splitting condition is met, one regression tree is output;

(2) if the stop-iteration condition is met, the set of regression trees is output.
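The two stopping conditions imply a nested training loop, which can be sketched as follows. `MAX_DEPTH` and `NUM_TREES` are assumed hyperparameters (the patent only names them as preset thresholds), and `split_node` stands in for one round of (federated) node splitting.

```python
MAX_DEPTH, NUM_TREES = 3, 2  # assumed preset depth and tree-count thresholds


def train(split_node):
    """split_node(depth) performs one round of node splitting."""
    forest = []
    while len(forest) < NUM_TREES:          # stop-iteration condition
        depth, tree = 0, []
        while depth < MAX_DEPTH:            # stop-splitting condition
            tree.append(split_node(depth))
            depth += 1
        forest.append(tree)                 # one regression tree finished
    return forest                           # the regression tree set


forest = train(lambda d: f"split@depth{d}")
```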
In this embodiment, the XGBoost algorithm uses two independent training samples, i.e. each training sample belongs to a different data party. If the two training samples are regarded as one whole training sample, then, since they belong to different data parties, the whole training sample can be regarded as cut apart, with each party holding different features of the same samples (the samples are partitioned longitudinally).

Furthermore, since the two training samples belong to different data parties, federated training and modeling requires the raw sample data provided by the two parties to be aligned first.
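The sample alignment just mentioned amounts to keeping only the sample IDs both parties hold. In practice this would be done privately (e.g. with a private set intersection protocol, which the patent does not specify); a plain set intersection is shown for clarity, with made-up ID sets.

```python
# Each party's raw sample IDs (illustrative).
party_a_ids = {"X1", "X2", "X3", "X4", "X5", "X9"}
party_b_ids = {"X1", "X2", "X3", "X4", "X5", "X7"}

# Alignment: keep only samples present on both sides.
aligned = sorted(party_a_ids & party_b_ids)
# Each party then trains only on its own columns of these shared rows.
```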
In this embodiment, federated training means that the sample training process is completed jointly by the two cooperating data parties; in the regression trees of the finally trained gradient boosting tree model, the split nodes correspond to features of both parties' training samples.
Step S20: computing the average gain of the split nodes corresponding to the same feature in the gradient boosting tree model, and taking that average gain as the score of the corresponding feature;
In the XGBoost algorithm, when traversing all split methods of all features of the training samples, the quality of a split method is evaluated by its gain, and each split node selects the split method with the smallest loss. The gain of a split node can therefore serve as the basis for evaluating feature importance: the larger the gain of a split node, the smaller the splitting loss, and thus the more important the feature corresponding to that split node.
In this embodiment, since the trained gradient boosting tree model comprises multiple regression trees, and different regression trees may split nodes on the same feature, it is necessary to compute, over all regression trees of the model, the average gain of the split nodes corresponding to the same feature, and to take that average gain as the score of the corresponding feature.
Step S30: ranking the features by score and outputting the ranking for feature selection, where a feature in the training samples that does not correspond to any split node receives a default score.
In this embodiment, a feature's score represents its importance. After the score of each feature is obtained, the features are ranked and the ranking is output, for example from high to low, so that features ranked earlier are more important than features ranked later. Feature selection can therefore remove features that are irrelevant to the prediction or classification of the samples. For example, suppose a student sample includes gender, school grades, attendance rate, and praise count, and the classification target is excellent student versus non-excellent student; the gender feature is clearly unrelated, or only weakly related, to being an excellent student and can therefore be removed.
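The selection step of the student example can be sketched as a cutoff on the scores. The scores and the cutoff below are made-up illustrations: a near-zero score for gender reflects that it rarely, if ever, wins a split.

```python
# Hypothetical average-gain scores for the student example.
scores = {"grade": 0.9, "attendance": 0.6, "praise_count": 0.3, "gender": 0.01}
CUTOFF = 0.05  # assumed selection threshold

# Keep features at or above the cutoff, most important first.
selected = [f for f, s in sorted(scores.items(), key=lambda kv: -kv[1])
            if s >= CUTOFF]
```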
This embodiment performs federated training on two aligned training samples using the XGBoost algorithm to construct a gradient boosting tree model, where the model is a set of multiple regression trees and each split node of a regression tree corresponds to one feature of the training samples. By computing the average gain of the split nodes corresponding to the same feature across the model and taking that average gain as the feature's score, the features of the two parties' training sample data are scored; the features are then ranked by score and the ranking is output for feature selection, where a higher score means a more important feature. This embodiment thereby enables federated training and modeling on training samples held by different data parties, and in turn feature selection over multi-party sample data.
Further, to ease the description of the specific implementation of the joint training of the present invention, this embodiment is illustrated with two independent training samples.

In this embodiment, the first data party provides the first training sample, whose attributes include a sample ID and part of the sample features; the second data party provides the second training sample, whose attributes include the sample ID, the remaining sample features, and a data label.
A sample feature is a feature that a sample exhibits or possesses; for example, if the samples are people, the sample features may be age, gender, income, education, and so on. The data label classifies the different samples, with the classification result determined from the samples' features.
A major significance of the federated training and modeling of the present invention is the two-way privacy protection of both parties' sample data. Therefore, during federated training, the first training sample is stored locally at the first data party and the second training sample is stored locally at the second data party. For example, the data in Table 1 below is provided by the first data party and stored locally at the first data party, and the data in Table 2 below is provided by the second data party and stored locally at the second data party.
Table 1
Sample ID | Age | Gender | Amount of given credit |
X1 | 20 | 1 | 5000 |
X2 | 30 | 1 | 300000 |
X3 | 35 | 0 | 250000 |
X4 | 48 | 0 | 300000 |
X5 | 10 | 1 | 200 |
As shown in Table 1, the first training sample's attributes include the sample ID (X1~X5) and the Age, Gender, and Amount of given credit features.
Table 2
Sample ID | Bill Payment | Education | Label |
X1 | 3102 | 2 | 24 |
X2 | 17250 | 3 | 14 |
X3 | 14027 | 2 | 16 |
X4 | 6787 | 1 | 10 |
X5 | 280 | 1 | 26 |
As shown in Table 2, the attributes of the second training sample include the sample IDs (X1~X5), the Bill Payment feature, the Education feature, and the data label.
Further, referring to Fig. 3, Fig. 3 is a detailed flow diagram of an embodiment of step S10 in Fig. 2. Based on the above embodiment, in the present embodiment, step S10 specifically includes:
Step S101: at the second data party, obtain the first-order gradient and the second-order gradient of each training sample in the sample set corresponding to the current round of node splitting;
XGBoost is a machine learning modeling method. It needs a classifier (that is, a classification function) to map sample data to one of several given classes, so that the result can be applied to data prediction. In the process of learning classification rules with the classifier, a loss function is needed to measure the fitting error of the machine learning model.
In the present embodiment, every time node splitting is performed, the first-order gradient and the second-order gradient of each training sample in the sample set corresponding to the current round of node splitting are obtained at the second data party.
Here, the gradient boosting tree model requires multiple rounds of federated training: each round of federated training generates one regression tree, and generating one regression tree requires multiple rounds of node splitting.
Therefore, within each round of federated training, the first node split uses the initially saved training samples, and each subsequent node split uses the training samples of the sample sets corresponding to the new nodes produced by the previous split. Within the same round of federated training, every node split reuses the first-order and second-order gradients used in the first node split of that round; the next round of federated training updates the first-order and second-order gradients used in the previous round according to the previous round's federated training result.
XGBoost supports a user-defined loss function. The first-order and second-order partial derivatives of the objective function are taken using the user-defined loss function, yielding the first-order gradient and the second-order gradient of the local sample data to be trained.
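As an illustrative sketch of this step (assuming squared-error loss as the user-defined loss, with logistic loss shown for comparison; neither is prescribed by the invention), the per-sample gradients reduce to simple derivatives:

```python
import math

def gradients_squared_loss(y_true, y_pred):
    """First- and second-order gradients of L = (1/2) * (pred - y)^2
    with respect to pred. A sketch; the patent allows any custom loss."""
    g = [p - y for p, y in zip(y_pred, y_true)]   # dL/dpred
    h = [1.0 for _ in y_true]                     # d2L/dpred2
    return g, h

def gradients_logistic_loss(y_true, y_pred):
    """Same derivatives for logistic loss with s = sigmoid(pred):
    g = s - y, h = s * (1 - s)."""
    s = [1.0 / (1.0 + math.exp(-p)) for p in y_pred]
    g = [si - y for si, y in zip(s, y_true)]
    h = [si * (1.0 - si) for si in s]
    return g, h
```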
Therefore, based on the explanation of the XGBoost algorithm and the gradient boosting tree model in the above embodiment, constructing a regression tree requires determining split nodes, and a split node can be determined by its gain value. The gain value gain is calculated as:

$$gain = \frac{1}{2}\left[\frac{\left(\sum_{i\in I_L} g_i\right)^2}{\sum_{i\in I_L} h_i + \lambda} + \frac{\left(\sum_{i\in I_R} g_i\right)^2}{\sum_{i\in I_R} h_i + \lambda} - \frac{\left(\sum_{i\in I} g_i\right)^2}{\sum_{i\in I} h_i + \lambda}\right] - \gamma$$

where $I_L$ denotes the sample set contained in the left child node after the current node is split, $I_R$ denotes the sample set contained in the right child node, $I = I_L \cup I_R$, $g_i$ denotes the first-order gradient of sample $i$, $h_i$ denotes the second-order gradient of sample $i$, and $\lambda$ and $\gamma$ are constants.
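A minimal sketch of this gain calculation (function name and argument layout are assumptions for illustration; the formula itself is the standard XGBoost split gain):

```python
def split_gain(g, h, left_idx, right_idx, lam=1.0, gamma=0.0):
    """Gain of splitting node I into I_L and I_R.
    g, h: dicts mapping sample id -> first-/second-order gradient.
    lam (lambda) and gamma are the regularization constants of the formula."""
    def score(idx):
        G = sum(g[i] for i in idx)
        H = sum(h[i] for i in idx)
        return G * G / (H + lam)
    all_idx = list(left_idx) + list(right_idx)
    return 0.5 * (score(left_idx) + score(right_idx) - score(all_idx)) - gamma

# A split that perfectly separates opposite gradients yields a positive gain.
example = split_gain({"a": 1.0, "b": -1.0}, {"a": 1.0, "b": 1.0}, ["a"], ["b"])
```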
Since the sample data to be trained resides at both the first data party and the second data party, the gain values of the split nodes under each splitting mode must be calculated separately at the first data party and at the second data party for their respective sample data.
In the present embodiment, because the first data party and the second data party have performed sample alignment in advance, both parties share the same gradients; and because the data label resides in the second data party's sample data, the gain values of both parties' split nodes under each splitting mode are calculated based on the first-order and second-order gradients of the second data party's sample data.
Step S102: if the current round of node splitting is the first node split in constructing a regression tree, encrypt the first-order gradient and the second-order gradient and send them, together with the sample IDs of the sample set, to the first data party, so that the first data party calculates, based on the encrypted first-order gradient and second-order gradient, the gain values of its local training samples corresponding to the sample IDs under each splitting mode;
In the present embodiment, to realize two-way privacy protection of both parties' sample data during federated training, if the current round of node splitting is the first node split in constructing a regression tree, the first-order and second-order gradients of the sample data calculated at the second data party are first encrypted and then sent to the first data party.
At the first data party, the gain values of the first data party's local sample data split nodes under each splitting mode are calculated from the received first-order and second-order gradients using the above calculation formula of the gain value. Since the first-order and second-order gradients are encrypted, the calculated gain values are also ciphertexts, so the gain values themselves need no further encryption.
After the gain values of the split nodes under the various splitting modes of the sample data have been calculated, the current node can be split to generate new nodes and construct the regression tree. In the present embodiment, the regression trees of the gradient boosting tree model are preferably constructed under the lead of the second data party, which holds the data label. Therefore, the gain values of the first data party's local sample data split nodes under each splitting mode, calculated at the first data party, need to be sent to the second data party.
Step S103: if the current round of node splitting is not the first node split in constructing the regression tree, send the sample IDs of the sample set to the first data party, so that the first data party, reusing the first-order gradient and second-order gradient used in the first node split, calculates the gain values of its local training samples corresponding to the sample IDs under each splitting mode;
In the present embodiment, if the current round of node splitting is not the first node split in constructing the regression tree, only the sample IDs of the sample set corresponding to the current split need to be sent to the first data party, and the first data party, continuing to reuse the first-order and second-order gradients used in the first node split, calculates the gain values of its local training samples corresponding to the received sample IDs under each splitting mode.
Step S104: the second data party receives the encrypted gain values of all split nodes returned by the first data party and decrypts them;
Step S105: at the second data party, based on the first-order gradient and the second-order gradient, calculate the gain values of the local training samples corresponding to the sample IDs under each splitting mode;
At the second data party, the gain values of the local sample data to be trained under each splitting mode are calculated from the computed first-order and second-order gradients using the above calculation formula of the gain value.
Step S106: determine the globally optimal split node of the current round of node splitting based on the gain values of all split nodes calculated by both parties;
Since the initial sample data of both parties has undergone sample alignment, the gain values of all split nodes calculated by the two parties can be regarded as the gain values of the split nodes of the combined data sample under each splitting mode. Therefore, by comparing the gain values, the split node with the largest gain value is taken as the globally optimal split node of the current round of node splitting.
It should be noted that the sample feature corresponding to the globally optimal split node may belong either to the first data party's training sample or to the second data party's training sample.
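The selection of the globally optimal split node can be sketched as a maximum over both parties' candidate records; each candidate here is a hypothetical (provider, encoded feature, gain) triple with made-up example values:

```python
def global_best_split(candidates):
    """candidates: iterable of (site, feature_code, gain) triples gathered
    from both parties after decryption; returns the triple with maximum gain."""
    return max(candidates, key=lambda c: c[2])

# Hypothetical example: party A and party B each contribute candidates.
cands = [
    ("Site A", "EA(f1)", 0.8),
    ("Site A", "EA(f2)", 1.5),
    ("Site B", "EB(f3)", 1.2),
]
best = global_best_split(cands)  # the winning feature may come from either party
```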
Optionally, since the construction of the regression trees of the gradient boosting tree model is led by the second data party, the second data party needs to record the relevant information of the globally optimal split node determined by each round of node splitting. The relevant information includes: the provider of the corresponding sample data, the feature code of the corresponding sample data, and the gain value.
For example, if data party A holds the feature f_i corresponding to the globally optimal split node, the record is (Site A, E_A(f_i), gain). Conversely, if data party B holds the feature f_i corresponding to the globally optimal split node, the record is (Site B, E_B(f_i), gain). Here, E_A(f_i) denotes data party A's encoding of feature f_i and E_B(f_i) denotes data party B's encoding of feature f_i; the encoding allows feature f_i to be referred to without revealing its original feature data.
Optionally, when performing feature selection in the above embodiments, each globally optimal split node is preferably taken as a split node of a regression tree in the gradient boosting tree model, and the average gain value of the split nodes corresponding to the same feature code is counted.
Step S107: based on the globally optimal split node of the current round of node splitting, split the sample set corresponding to the current node to generate new nodes, so as to construct a regression tree of the gradient boosting tree model.
If the sample feature corresponding to the globally optimal split node of the current round belongs to the first data party's training sample, the sample data corresponding to the current node being split belongs to the first data party. Correspondingly, if the sample feature corresponding to the globally optimal split node of the current round belongs to the second data party's training sample, the sample data corresponding to the current node being split belongs to the second data party.
Node splitting produces new nodes (a left child node and a right child node), thereby constructing the regression tree. Through multiple rounds of node splitting, new nodes are generated continuously, yielding a regression tree of greater depth; when node splitting stops, a regression tree of the gradient boosting tree model is obtained.
In the present embodiment, since all data communicated between the two parties consists of encrypted intermediate model results, the training process does not reveal the original feature data. At the same time, an encryption algorithm is used to guarantee data privacy throughout training; preferably, a partially homomorphic encryption algorithm supporting additive homomorphism is used.
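To illustrate the additive homomorphism relied on here, the following is a toy Paillier sketch with deliberately tiny, insecure parameters; a real system would use a vetted library and large keys:

```python
import math
import random

# Toy Paillier cryptosystem illustrating the additive homomorphism used here:
# Enc(a) * Enc(b) mod n^2 decrypts to a + b. Illustration only.
p, q = 293, 433                      # toy primes; real keys use ~1024-bit primes
n, n2 = p * q, (p * q) ** 2
g = n + 1                            # standard generator choice
lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lambda(n) = lcm(p-1, q-1)
mu = pow(lam, -1, n)                 # with g = n + 1, mu = lambda^{-1} mod n

def encrypt(m: int) -> int:
    r = random.randrange(1, n)       # random blinding factor with gcd(r, n) = 1
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c: int) -> int:
    l = (pow(c, lam, n2) - 1) // n   # the L function, L(x) = (x - 1) / n
    return (l * mu) % n

# The first data party can aggregate encrypted gradients without decrypting:
total = (encrypt(20) * encrypt(22)) % n2
assert decrypt(total) == 42
```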
Further, in one embodiment, depending on the node splitting condition, the first-order and second-order gradients of the training samples used for node splitting are obtained as follows:
1. The current round of node splitting constructs the first regression tree
1.1 If the current round of node splitting is the first node split in constructing the first regression tree, initialize, at the second data party, the first-order gradient and second-order gradient of each training sample in the sample set corresponding to the current split;
1.2 If the current round of node splitting is not the first node split in constructing the first regression tree, reuse the first-order gradient and second-order gradient used in the first node split.
2. The current round of node splitting constructs a non-first regression tree
2.1 If the current round of node splitting is the first node split in constructing a non-first regression tree, update the first-order gradient and second-order gradient according to the previous round of federated training;
2.2 If the current round of node splitting is not the first node split in constructing a non-first regression tree, reuse the first-order gradient and second-order gradient used in the first node split.
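The gradient bookkeeping rules above can be sketched as follows (`init_gradients` and `update_gradients` are hypothetical stand-ins using squared-error loss, not taken from the patent text):

```python
def init_gradients(state):
    # stand-in for the initialization rule: squared loss at initial prediction 0
    return [-y for y in state["y"]], [1.0] * len(state["y"])

def update_gradients(state):
    # stand-in for the update rule: recompute gradients at current predictions
    return ([p - y for p, y in zip(state["pred"], state["y"])],
            [1.0] * len(state["y"]))

def gradients_for_split(state, is_first_tree, is_first_split):
    """Initialize on the first split of the first tree, update on the first
    split of later trees, and reuse the cached gradients otherwise."""
    if is_first_split:
        hook = init_gradients if is_first_tree else update_gradients
        state["g"], state["h"] = hook(state)
    # non-first splits within a tree reuse the cached gradients
    return state["g"], state["h"]

state = {"y": [1.0, 2.0], "pred": [0.5, 1.5]}
g_first, _ = gradients_for_split(state, is_first_tree=True, is_first_split=True)
g_reuse, _ = gradients_for_split(state, is_first_tree=True, is_first_split=False)
```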
Further, in one embodiment, to reduce the complexity of the regression trees, a depth threshold is preset for the regression trees to limit node splitting.
In the present embodiment, each time a new node is generated to construct a regression tree of the gradient boosting tree model, the second data party judges whether the depth of the current regression tree reaches the preset depth threshold.
If the depth of the current regression tree reaches the preset depth threshold, node splitting stops and one regression tree of the gradient boosting tree model is obtained; otherwise, the next round of node splitting continues.
It should be noted that the condition limiting node splitting may also be to stop splitting when a node cannot be split further, for example, when the samples corresponding to the current node can no longer be divided, node splitting cannot continue.
Further, in another embodiment, to avoid overfitting during training, a quantity threshold is preset for the regression trees to limit the number of regression trees generated.
In the present embodiment, when node splitting stops, the second data party judges whether the total number of regression trees reaches the preset quantity threshold.
If the total number of regression trees reaches the preset quantity threshold, federated training stops; otherwise, the next round of federated training continues.
It should be noted that the condition limiting the number of regression trees generated may also be to stop constructing regression trees when nodes can no longer be split.
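The two stopping rules (depth threshold per tree, quantity threshold per model) can be sketched as an outer control loop; `grow_one_level` is a hypothetical callback performing one round of node splitting and returning False when no node can be split further:

```python
def train_federated_gbdt(depth_threshold, quantity_threshold, grow_one_level):
    """Outer control loop combining the two stopping rules described above.
    Returns the number of regression trees generated."""
    trees = 0
    while trees < quantity_threshold:          # quantity-threshold rule
        depth = 0
        while depth < depth_threshold:         # depth-threshold rule
            if not grow_one_level():           # no splittable node left
                break
            depth += 1
        trees += 1
    return trees
```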
For a better understanding of the present invention, the federated training and modeling process of the present invention is illustrated below based on the sample data of Tables 1 and 2 in the above embodiments.
First round of federated training: training the first regression tree
(1) First round of node splitting
1.1 At the second data party, calculate the first-order gradients (g_i) and second-order gradients (h_i) of the sample data in Table 2; encrypt g_i and h_i and send them to the first data party;
1.2 At the first data party, based on g_i and h_i, calculate the gain values gain of the split nodes of the sample data in Table 1 under all possible splitting modes; send the gain values to the second data party;
Since in Table 1 the Age feature has 5 sample data splitting modes, the Gender feature has 2 sample data splitting modes, and the Amount of given credit feature has 5 sample data splitting modes, the sample data in Table 1 has 12 splitting modes in total, that is, the gain values of the split nodes corresponding to 12 splitting modes need to be calculated.
1.3 At the second data party, calculate the gain values gain of the split nodes of the sample data in Table 2 under all possible splitting modes;
Since in Table 2 the Bill Payment feature has 5 sample data splitting modes and the Education feature has 3 sample data splitting modes, the sample data in Table 2 has 8 splitting modes in total, that is, the gain values of the split nodes corresponding to 8 splitting modes need to be calculated.
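One common convention for enumerating splitting modes, one candidate threshold per distinct feature value, reproduces the count for Table 2 (a sketch; the patent does not fix the counting convention, and its count for Table 1 may follow a different one):

```python
def candidate_splits(column):
    """One candidate threshold per distinct value in a feature column."""
    return sorted(set(column))

# Table 2 columns:
bill_payment = [3102, 17250, 14027, 6787, 280]
education = [2, 3, 2, 1, 1]
n_modes = len(candidate_splits(bill_payment)) + len(candidate_splits(education))
# 5 + 3 = 8 splitting modes for Table 2, matching the count above
```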
1.4 From the gain values of the split nodes corresponding to the 12 splitting modes calculated at the first data party and the gain values of the split nodes corresponding to the 8 splitting modes calculated at the second data party, select the feature corresponding to the maximum gain value as the globally optimal split node of the current round of node splitting;
1.5 Based on the globally optimal split node of the current round of node splitting, split the sample data corresponding to the current node to generate new nodes, so as to construct a regression tree of the gradient boosting tree model.
1.6 Judge whether the depth of the current regression tree reaches the preset depth threshold; if it does, stop node splitting, obtaining a regression tree of the gradient boosting tree model, otherwise continue with the next round of node splitting;
1.7 Judge whether the total number of regression trees reaches the preset quantity threshold; if it does, stop federated training, otherwise continue with the next round of federated training.
(2) Second and third rounds of node splitting
2.1 Suppose the feature corresponding to the previous round's split node is Bill Payment less than or equal to 3102; this feature serves as the split node (corresponding samples X1, X2, X3, X4, X5) and generates two new nodes, where the left node corresponds to the sample set less than or equal to 3102 (X1, X5) and the right node corresponds to the sample set greater than 3102 (X2, X3, X4). The sample sets (X1, X5) and (X2, X3, X4) are then used as new sample sets for the second and third rounds of node splitting respectively, so as to split the two new nodes and generate further new nodes.
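The sample-set split described in step 2.1 can be sketched directly (an illustrative helper, not from the patent text):

```python
def partition(samples, feature, threshold):
    """Split a sample set on `feature <= threshold`.
    samples: dict mapping sample id -> feature dict.
    Returns (left_ids, right_ids)."""
    left = [i for i, f in samples.items() if f[feature] <= threshold]
    right = [i for i, f in samples.items() if f[feature] > threshold]
    return left, right

# Table 2 example: Bill Payment <= 3102 splits {X1..X5} into (X1, X5) and (X2, X3, X4).
samples = {"X1": {"Bill Payment": 3102}, "X2": {"Bill Payment": 17250},
           "X3": {"Bill Payment": 14027}, "X4": {"Bill Payment": 6787},
           "X5": {"Bill Payment": 280}}
left, right = partition(samples, "Bill Payment", 3102)
```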
2.2 Since the second and third rounds of node splitting belong to the same round of federated training, the sample gradient values used in the first round of node splitting continue to be reused. Suppose the feature corresponding to one split node of the current round is Amount of given credit less than or equal to 200; this feature serves as the split node (corresponding samples X1, X5) and generates two new nodes, where the left node corresponds to sample X5, which is less than or equal to 200, and the right node corresponds to sample X1, which is greater than 200. Similarly, suppose the feature corresponding to the other split node of the current round is Age less than or equal to 35; this feature serves as the split node (corresponding samples X2, X3, X4) and generates two new nodes, where the left node corresponds to samples X2 and X3, which are less than or equal to 35, and the right node corresponds to sample X4, which is greater than 35. The specific implementation flow refers to the first-round node splitting process.
Second round of federated training: training the second regression tree
3.1 Since the current round of node splitting belongs to the next round of federated training, the first-order and second-order gradients used in the previous round of federated training are updated with the previous round's training result, and the second round of federated training proceeds with node splitting to generate new nodes and construct the next regression tree. The specific implementation flow refers to the construction process of the previous regression tree.
3.2 As shown in Fig. 4, after two rounds of federated training, the sample data of Tables 1 and 2 in the above embodiments produces two regression trees. The first regression tree contains three split nodes: Bill Payment less than or equal to 3102, Amount of given credit less than or equal to 200, and Age less than or equal to 35. The second regression tree contains two split nodes: Bill Payment less than or equal to 6787, and Gender == 1.
3.3 Based on the two regression trees of the gradient boosting tree model shown in Fig. 4, the average gain values corresponding to the features of the sample data are: Bill Payment is (gain1 + gain4)/2; Education is 0; Age is gain3; Gender is gain5; Amount of given credit is gain2.
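The scoring step, averaging gains per feature and assigning the default score to features that never appear as a split node (Education here), can be sketched as follows (the gain values are hypothetical):

```python
from collections import defaultdict

def feature_scores(split_records, all_features, default_score=0.0):
    """Average the gain values of split nodes sharing the same feature;
    features that never appear as a split node get the default score.
    split_records: iterable of (feature, gain) pairs. Returns a ranking."""
    sums, counts = defaultdict(float), defaultdict(int)
    for feature, gain in split_records:
        sums[feature] += gain
        counts[feature] += 1
    scores = {f: (sums[f] / counts[f] if counts[f] else default_score)
              for f in all_features}
    # Rank features by score, highest first, for feature selection.
    return sorted(scores.items(), key=lambda kv: -kv[1])

# Hypothetical gain values for the five split nodes in the walkthrough:
records = [("Bill Payment", 0.9), ("Amount of given credit", 0.8),
           ("Age", 0.6), ("Bill Payment", 0.5), ("Gender", 0.3)]
features = ["Age", "Gender", "Amount of given credit",
            "Bill Payment", "Education"]
ranking = feature_scores(records, features)
# Bill Payment averages (0.9 + 0.5) / 2; Education gets the default 0.0
```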
The present invention also provides a computer-readable storage medium.
A feature selection program is stored on the computer-readable storage medium of the present invention; when executed by a processor, the feature selection program implements the steps of the feature selection method based on federated training described in any of the above embodiments.
From the above description of the embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, and of course also by hardware, though in many cases the former is the preferred implementation. Based on this understanding, the technical solution of the present invention, or the part thereof that contributes to the prior art, can be embodied in the form of a software product stored in a storage medium (such as ROM/RAM) and including instructions that cause a terminal (which may be a mobile phone, a computer, a server, a network device, or the like) to execute the methods described in the embodiments of the present invention.
The embodiments of the present invention have been described above with reference to the accompanying drawings, but the invention is not limited to the above specific embodiments, which are merely illustrative rather than restrictive. Under the inspiration of the present invention, those skilled in the art can make many further forms without departing from the scope protected by the purpose of the present invention and the claims; all equivalent structures or equivalent process transformations made using the description and drawings of the present invention, whether applied directly or indirectly in other related technical fields, fall within the protection of the present invention.
Claims (10)
1. A feature selection method based on federated training, characterized in that the feature selection method based on federated training comprises the following steps:
performing federated training on two aligned training samples using the XGBoost algorithm to construct a gradient boosting tree model, wherein the gradient boosting tree model comprises a plurality of regression trees, and one split node of a regression tree corresponds to one feature of a training sample;
counting the average gain value of the split nodes corresponding to the same feature in the gradient boosting tree model, and taking the average gain value as the score of the corresponding feature;
ranking the features based on the score of each feature and outputting the ranking result for feature selection, wherein if a feature of the training samples corresponds to no split node, that feature uses a default score.
2. The feature selection method based on federated training according to claim 1, characterized in that the two aligned training samples are a first training sample and a second training sample respectively;
the attributes of the first training sample include a sample ID and part of the sample features, and the attributes of the second training sample include the sample ID, the remaining sample features, and a data label;
the first training sample is provided by a first data party and stored locally at the first data party, and the second training sample is provided by a second data party and stored locally at the second data party.
3. The feature selection method based on federated training according to claim 2, characterized in that the performing federated training on the two aligned training samples using the XGBoost algorithm to construct the gradient boosting tree model comprises:
at the second data party, obtaining the first-order gradient and the second-order gradient of each training sample in the sample set corresponding to the current round of node splitting;
if the current round of node splitting is the first node split in constructing a regression tree, encrypting the first-order gradient and the second-order gradient and sending them, together with the sample IDs of the sample set, to the first data party, so that the first data party calculates, based on the encrypted first-order gradient and second-order gradient, the gain values of its local training samples corresponding to the sample IDs under each splitting mode;
if the current round of node splitting is not the first node split in constructing the regression tree, sending the sample IDs of the sample set to the first data party, so that the first data party, reusing the first-order gradient and second-order gradient used in the first node split, calculates the gain values of its local training samples corresponding to the sample IDs under each splitting mode;
the second data party receiving and decrypting the encrypted gain values of all split nodes returned by the first data party;
at the second data party, based on the first-order gradient and the second-order gradient, calculating the gain values of the local training samples corresponding to the sample IDs under each splitting mode;
determining the globally optimal split node of the current round of node splitting based on the gain values of all split nodes calculated by both parties;
based on the globally optimal split node of the current round of node splitting, splitting the sample set corresponding to the current node to generate new nodes, so as to construct a regression tree of the gradient boosting tree model.
4. The feature selection method based on federated training according to claim 3, characterized in that before the step of obtaining, at the second data party, the first-order gradient and the second-order gradient of each training sample in the sample set corresponding to the current round of node splitting, the method further comprises:
when performing node splitting, judging whether the current round of node splitting corresponds to constructing the first regression tree;
if the current round of node splitting corresponds to constructing the first regression tree, judging whether it is the first node split in constructing the first regression tree;
if the current round of node splitting is the first node split in constructing the first regression tree, initializing, at the second data party, the first-order gradient and second-order gradient of each training sample in the sample set corresponding to the current round of node splitting; if the current round of node splitting is not the first node split in constructing the first regression tree, reusing the first-order gradient and second-order gradient used in the first node split;
if the current round of node splitting corresponds to constructing a non-first regression tree, judging whether it is the first node split in constructing the non-first regression tree;
if the current round of node splitting is the first node split in constructing the non-first regression tree, updating the first-order gradient and second-order gradient according to the previous round of federated training; if the current round of node splitting is not the first node split in constructing the non-first regression tree, reusing the first-order gradient and second-order gradient used in the first node split.
5. The feature selection method based on federated training according to claim 3, characterized in that the feature selection method based on federated training further comprises:
when generating new nodes to construct a regression tree of the gradient boosting tree model, judging, at the second data party, whether the depth of the current regression tree reaches a preset depth threshold;
if the depth of the current regression tree reaches the preset depth threshold, stopping node splitting to obtain a regression tree of the gradient boosting tree model, otherwise continuing with the next round of node splitting.
6. The feature selection method based on federated training according to claim 5, characterized in that the feature selection method based on federated training further comprises:
when node splitting stops, judging, at the second data party, whether the total number of regression trees reaches a preset quantity threshold;
if the total number of regression trees reaches the preset quantity threshold, stopping federated training, otherwise continuing with the next round of federated training.
7. The feature selection method based on federated training according to any one of claims 3-6, characterized in that the feature selection method based on federated training further comprises:
at the second data party, recording the relevant information of the globally optimal split node determined by each round of node splitting;
wherein the relevant information includes: the provider of the corresponding sample data, the feature code of the corresponding sample data, and the gain value.
8. The feature selection method based on federated training according to claim 7, characterized in that the counting of the average gain value of the split nodes corresponding to the same feature in the gradient boosting tree model comprises:
at the second data party, taking each globally optimal split node as a split node of a regression tree in the gradient boosting tree model, and counting the average gain value of the split nodes corresponding to the same feature code.
9. A feature selection apparatus based on federated training, characterized in that the feature selection apparatus based on federated training comprises a memory, a processor, and a feature selection program stored on the memory and executable on the processor, wherein the feature selection program, when executed by the processor, implements the steps of the feature selection method based on federated training according to any one of claims 1-8.
10. A computer-readable storage medium, characterized in that a feature selection program is stored on the computer-readable storage medium, wherein the feature selection program, when executed by a processor, implements the steps of the feature selection method based on federated training according to any one of claims 1-8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810918867.3A CN109034398B (en) | 2018-08-10 | 2018-08-10 | Gradient lifting tree model construction method and device based on federal training and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109034398A true CN109034398A (en) | 2018-12-18 |
CN109034398B CN109034398B (en) | 2023-09-12 |
Family
ID=64633061
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810918867.3A Active CN109034398B (en) | 2018-08-10 | 2018-08-10 | Gradient lifting tree model construction method and device based on federal training and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109034398B (en) |
Cited By (22)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109492420A (en) * | 2018-12-28 | 2019-03-19 | 深圳前海微众银行股份有限公司 | Model parameter training method, terminal, system and medium based on federation's study |
CN109711556A (en) * | 2018-12-24 | 2019-05-03 | 中国南方电网有限责任公司 | Machine patrols data processing method, device, net grade server and provincial server |
CN109934179A (en) * | 2019-03-18 | 2019-06-25 | 中南大学 | Human motion recognition method based on automated characterization selection and Ensemble Learning Algorithms |
CN110297848A (en) * | 2019-07-09 | 2019-10-01 | 深圳前海微众银行股份有限公司 | Recommended models training method, terminal and storage medium based on federation's study |
CN110851786A (en) * | 2019-11-14 | 2020-02-28 | 深圳前海微众银行股份有限公司 | Longitudinal federated learning optimization method, device, equipment and storage medium |
CN110941963A (en) * | 2019-11-29 | 2020-03-31 | 福州大学 | Text attribute viewpoint abstract generation method and system based on sentence emotion attributes |
CN110968886A (en) * | 2019-12-20 | 2020-04-07 | 支付宝(杭州)信息技术有限公司 | Method and system for screening training samples of machine learning model |
CN110990829A (en) * | 2019-11-21 | 2020-04-10 | 支付宝(杭州)信息技术有限公司 | Method, device and equipment for training GBDT model in trusted execution environment |
CN111079939A (en) * | 2019-11-28 | 2020-04-28 | 支付宝(杭州)信息技术有限公司 | Machine learning model feature screening method and device based on data privacy protection |
CN111178538A (en) * | 2019-12-17 | 2020-05-19 | 杭州睿信数据科技有限公司 | Federated learning method and device for vertical data |
CN111178408A (en) * | 2019-12-19 | 2020-05-19 | 中国科学院计算技术研究所 | Health monitoring model construction method and system based on federated random forest learning |
CN111291417A (en) * | 2020-05-09 | 2020-06-16 | 支付宝(杭州)信息技术有限公司 | Method and device for protecting data privacy of multi-party combined training object recommendation model |
CN111340614A (en) * | 2020-02-28 | 2020-06-26 | 深圳前海微众银行股份有限公司 | Sample sampling method and device based on federated learning and readable storage medium |
CN111368901A (en) * | 2020-02-28 | 2020-07-03 | 深圳前海微众银行股份有限公司 | Multi-party joint modeling method, device and medium based on federated learning |
CN111507479A (en) * | 2020-04-15 | 2020-08-07 | 深圳前海微众银行股份有限公司 | Feature binning method, device, equipment and computer-readable storage medium |
CN111738359A (en) * | 2020-07-24 | 2020-10-02 | 支付宝(杭州)信息技术有限公司 | Two-party decision tree training method and system |
WO2021000572A1 (en) * | 2019-07-01 | 2021-01-07 | 创新先进技术有限公司 | Data processing method and apparatus, and electronic device |
WO2021082634A1 (en) * | 2019-10-29 | 2021-05-06 | 支付宝(杭州)信息技术有限公司 | Tree model-based prediction method and apparatus |
CN113435537A (en) * | 2021-07-16 | 2021-09-24 | 同盾控股有限公司 | Cross-feature federated learning method and prediction method based on Soft GBDT |
CN113657617A (en) * | 2020-04-23 | 2021-11-16 | 支付宝(杭州)信息技术有限公司 | Method and system for model joint training |
CN113722987A (en) * | 2021-08-16 | 2021-11-30 | 京东科技控股股份有限公司 | Federated learning model training method and device, electronic device and storage medium |
CN113723477A (en) * | 2021-08-16 | 2021-11-30 | 同盾科技有限公司 | Cross-feature federated abnormal data detection method based on isolation forest |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107704966A (en) * | 2017-10-17 | 2018-02-16 | 华南理工大学 | Energy load forecasting system and method based on weather big data |
CN107767183A (en) * | 2017-10-31 | 2018-03-06 | 常州大学 | Brand loyalty testing method based on combination learning and profile points |
US20180089587A1 (en) * | 2016-09-26 | 2018-03-29 | Google Inc. | Systems and Methods for Communication Efficient Distributed Mean Estimation |
CN107993139A (en) * | 2017-11-15 | 2018-05-04 | 华融融通(北京)科技有限公司 | Consumer finance anti-fraud system and method based on a dynamically adjusted database |
CN108021984A (en) * | 2016-11-01 | 2018-05-11 | 第四范式(北京)技术有限公司 | Method and system for determining the feature importance of machine learning samples |
CN108257105A (en) * | 2018-01-29 | 2018-07-06 | 南华大学 | Deep network model for joint learning of optical flow estimation and denoising for video images |
CN108375808A (en) * | 2018-03-12 | 2018-08-07 | 南京恩瑞特实业有限公司 | NRIET dense fog forecasting method based on machine learning |
- 2018-08-10: CN application CN201810918867.3A filed, granted as patent CN109034398B (status: Active)
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20180089587A1 (en) * | 2016-09-26 | 2018-03-29 | Google Inc. | Systems and Methods for Communication Efficient Distributed Mean Estimation |
CN108021984A (en) * | 2016-11-01 | 2018-05-11 | 第四范式(北京)技术有限公司 | Method and system for determining the feature importance of machine learning samples |
CN107704966A (en) * | 2017-10-17 | 2018-02-16 | 华南理工大学 | Energy load forecasting system and method based on weather big data |
CN107767183A (en) * | 2017-10-31 | 2018-03-06 | 常州大学 | Brand loyalty testing method based on combination learning and profile points |
CN107993139A (en) * | 2017-11-15 | 2018-05-04 | 华融融通(北京)科技有限公司 | Consumer finance anti-fraud system and method based on a dynamically adjusted database |
CN108257105A (en) * | 2018-01-29 | 2018-07-06 | 南华大学 | Deep network model for joint learning of optical flow estimation and denoising for video images |
CN108375808A (en) * | 2018-03-12 | 2018-08-07 | 南京恩瑞特实业有限公司 | NRIET dense fog forecasting method based on machine learning |
Non-Patent Citations (5)
Title |
---|
H. Brendan McMahan et al.: "Communication-efficient learning of deep networks from decentralized data", Artificial Intelligence and Statistics * |
Jakub et al.: "Federated learning strategies for improving communication efficiency", arXiv.org * |
Stephen Hardy et al.: "Private federated learning on vertically partitioned data via entity resolution and additively homomorphic encryption", arXiv.org * |
Tianqi Chen et al.: "XGBoost: A Scalable Tree Boosting System", KDD '16: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining * |
许裕栗 et al.: "Application of the XGBoost algorithm in regional electricity consumption forecasting" (Xgboost算法在区域用电预测中的应用), Process Automation Instrumentation (自动化仪表) * |
Cited By (30)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109711556A (en) * | 2018-12-24 | 2019-05-03 | 中国南方电网有限责任公司 | Machine-inspection data processing method and device, grid-level server and provincial-level server |
US11947680B2 (en) | 2018-12-28 | 2024-04-02 | Webank Co., Ltd | Model parameter training method, terminal, and system based on federation learning, and medium |
CN109492420B (en) * | 2018-12-28 | 2021-07-20 | 深圳前海微众银行股份有限公司 | Model parameter training method, terminal, system and medium based on federated learning |
CN109492420A (en) * | 2018-12-28 | 2019-03-19 | 深圳前海微众银行股份有限公司 | Model parameter training method, terminal, system and medium based on federated learning |
WO2020134704A1 (en) * | 2018-12-28 | 2020-07-02 | 深圳前海微众银行股份有限公司 | Model parameter training method based on federated learning, terminal, system and medium |
CN109934179A (en) * | 2019-03-18 | 2019-06-25 | 中南大学 | Human motion recognition method based on automatic feature selection and ensemble learning algorithms |
WO2021000572A1 (en) * | 2019-07-01 | 2021-01-07 | 创新先进技术有限公司 | Data processing method and apparatus, and electronic device |
CN110297848A (en) * | 2019-07-09 | 2019-10-01 | 深圳前海微众银行股份有限公司 | Recommendation model training method, terminal and storage medium based on federated learning |
CN110297848B (en) * | 2019-07-09 | 2024-02-23 | 深圳前海微众银行股份有限公司 | Recommendation model training method, terminal and storage medium based on federated learning |
WO2021082634A1 (en) * | 2019-10-29 | 2021-05-06 | 支付宝(杭州)信息技术有限公司 | Tree model-based prediction method and apparatus |
CN110851786A (en) * | 2019-11-14 | 2020-02-28 | 深圳前海微众银行股份有限公司 | Longitudinal federated learning optimization method, device, equipment and storage medium |
CN110990829A (en) * | 2019-11-21 | 2020-04-10 | 支付宝(杭州)信息技术有限公司 | Method, device and equipment for training GBDT model in trusted execution environment |
CN111079939A (en) * | 2019-11-28 | 2020-04-28 | 支付宝(杭州)信息技术有限公司 | Machine learning model feature screening method and device based on data privacy protection |
CN110941963A (en) * | 2019-11-29 | 2020-03-31 | 福州大学 | Text attribute opinion summary generation method and system based on sentence sentiment attributes |
CN111178538A (en) * | 2019-12-17 | 2020-05-19 | 杭州睿信数据科技有限公司 | Federated learning method and device for vertical data |
CN111178538B (en) * | 2019-12-17 | 2023-08-15 | 杭州睿信数据科技有限公司 | Federated learning method and device for vertical data |
CN111178408A (en) * | 2019-12-19 | 2020-05-19 | 中国科学院计算技术研究所 | Health monitoring model construction method and system based on federated random forest learning |
CN110968886A (en) * | 2019-12-20 | 2020-04-07 | 支付宝(杭州)信息技术有限公司 | Method and system for screening training samples of machine learning model |
CN111368901A (en) * | 2020-02-28 | 2020-07-03 | 深圳前海微众银行股份有限公司 | Multi-party joint modeling method, device and medium based on federated learning |
CN111340614A (en) * | 2020-02-28 | 2020-06-26 | 深圳前海微众银行股份有限公司 | Sample sampling method and device based on federated learning and readable storage medium |
CN111507479B (en) * | 2020-04-15 | 2021-08-10 | 深圳前海微众银行股份有限公司 | Feature binning method, device, equipment and computer-readable storage medium |
CN111507479A (en) * | 2020-04-15 | 2020-08-07 | 深圳前海微众银行股份有限公司 | Feature binning method, device, equipment and computer-readable storage medium |
CN113657617A (en) * | 2020-04-23 | 2021-11-16 | 支付宝(杭州)信息技术有限公司 | Method and system for model joint training |
CN111291417A (en) * | 2020-05-09 | 2020-06-16 | 支付宝(杭州)信息技术有限公司 | Method and device for protecting data privacy of multi-party combined training object recommendation model |
CN111738359A (en) * | 2020-07-24 | 2020-10-02 | 支付宝(杭州)信息技术有限公司 | Two-party decision tree training method and system |
CN113435537A (en) * | 2021-07-16 | 2021-09-24 | 同盾控股有限公司 | Cross-feature federated learning method and prediction method based on Soft GBDT |
CN113722987A (en) * | 2021-08-16 | 2021-11-30 | 京东科技控股股份有限公司 | Federated learning model training method and device, electronic device and storage medium |
CN113723477A (en) * | 2021-08-16 | 2021-11-30 | 同盾科技有限公司 | Cross-feature federated abnormal data detection method based on isolation forest |
CN113722987B (en) * | 2021-08-16 | 2023-11-03 | 京东科技控股股份有限公司 | Training method and device for a federated learning model, electronic device and storage medium |
CN113723477B (en) * | 2021-08-16 | 2024-04-30 | 同盾科技有限公司 | Cross-feature federated abnormal data detection method based on isolation forest |
Also Published As
Publication number | Publication date |
---|---|
CN109034398B (en) | 2023-09-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109034398A (en) | Feature selection method, device and storage medium based on federated training | |
CN109165683A (en) | Sample prediction method, device and storage medium based on federated training | |
CN109299811B (en) | Complex network-based fraud group recognition and risk propagation prediction method | |
WO2022110721A1 (en) | Client category aggregation-based joint risk assessment method and related device | |
CN108763314A (en) | Interest recommendation method, device, server and storage medium | |
CN111932386B (en) | User account determining method and device, information pushing method and device, and electronic equipment | |
US9838484B2 (en) | Relevance estimation and actions based thereon | |
CN109753608A (en) | Method for determining user tags, and autoencoder network training method and device | |
CN107291815A (en) | Recommendation method for question-and-answer communities based on cross-platform tag fusion | |
CN112416986B (en) | User profile realization method and system based on hierarchical personalized federated learning | |
CN107633257B (en) | Data quality evaluation method and device, computer readable storage medium and terminal | |
CN107003834B (en) | Pedestrian detection device and method | |
CN108446291A (en) | Real-time scoring method and scoring system for user credit | |
CN107741986A (en) | User behavior prediction and corresponding information recommendation method and device | |
Postigo-Boix et al. | A social model based on customers’ profiles for analyzing the churning process in the mobile market of data plans | |
CN111767319A (en) | Customer mining method and device based on fund flow direction | |
CN112101577A (en) | XGBoost-based cross-sample federated learning and testing method, system, device and medium | |
CN108876193A (en) | Risk control model building method based on credit scoring | |
CN103366009A (en) | Book recommendation method based on adaptive clustering | |
CN107368499B (en) | Client label modeling and recommending method and device | |
CN112817563A (en) | Target attribute configuration information determination method, computer device, and storage medium | |
CN111984842B (en) | Bank customer data processing method and device | |
CN112837078B (en) | Method for detecting abnormal behavior of user based on clusters | |
CN106056137A (en) | Telecom group service recommendation method based on a data mining multi-classification algorithm | |
CN106383738A (en) | Task processing method and distributed computing framework |
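The abstract describes scoring each feature by the average gain of its split nodes across the boosted trees, with a default score for any feature that never appears as a split node. A minimal single-party sketch of that scoring step follows; the `score_features` helper, the tree representation, and the default value of 0.0 are illustrative assumptions, and the federated (encrypted, multi-party) training itself is not shown:

```python
from collections import defaultdict

def score_features(trees, all_features, default_score=0.0):
    """Score features by the average split-node gain across a boosted ensemble.

    `trees` is a list of regression trees, each given as a list of
    (feature, gain) pairs, one per split node. Features that never split
    receive `default_score`. Returns features sorted by score, descending.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for tree in trees:
        for feature, gain in tree:
            totals[feature] += gain
            counts[feature] += 1
    scores = {}
    for feature in all_features:
        if counts[feature]:
            # Average gain over all split nodes that used this feature.
            scores[feature] = totals[feature] / counts[feature]
        else:
            # Feature never chosen as a split node: assign the default score.
            scores[feature] = default_score
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Two toy trees: "income" splits twice with gains 5.0 and 7.0 (average 6.0),
# "age" once with gain 3.0, and "city" never splits.
trees = [[("age", 3.0), ("income", 5.0)], [("income", 7.0)]]
ranking = score_features(trees, ["age", "income", "city"])
```

In the real setting the trees come from XGBoost-style federated training over aligned samples; this sketch only illustrates the scoring and ranking performed once the ensemble exists.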
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||