CN112668723A

CN112668723A - Machine learning method and system

Info

Publication number: CN112668723A
Application number: CN202011589671.8A
Authority: CN
Inventors: 李国琪
Original assignee: Hangzhou Hikvision Digital Technology Co Ltd
Current assignee: Hangzhou Hikvision Digital Technology Co Ltd
Priority date: 2020-12-29
Filing date: 2020-12-29
Publication date: 2021-04-16
Anticipated expiration: 2040-12-29
Also published as: CN112668723B

Abstract

An embodiment of the invention provides a machine learning method and a machine learning system. The method comprises the following steps: a decision end determines, according to a preset feature engineering strategy, a target mapping relation identifier corresponding to a to-be-processed data set of an execution end, wherein the target mapping relation identifier represents a target mapping relation between the original features of each object in the to-be-processed data set and the target features of each object, and the target features are features obtained by performing feature engineering on the original features according to the preset feature engineering strategy; the decision end sends the target mapping relation identifier to the execution end; the execution end maps the original features of the to-be-processed data set according to the target mapping relation represented by the target mapping relation identifier to obtain the target features of each object; and the execution end performs machine learning based on the target features of each object to obtain a model for processing objects of the same type as those objects. The development cost of machine learning can thereby be effectively reduced.

Description

Machine learning method and system

Technical Field

The invention relates to the technical field of machine learning, in particular to a machine learning method and a machine learning system.

Background

An electronic device with machine learning capability can obtain a model through machine learning; the model represents the mapping relationship, learned from a data set, between features and results, and is used for inference. However, in some application scenarios there may be no explicit mapping relationship between the features in the data set and the results. It is then difficult for the electronic device to learn, from those features, a model that effectively represents the mapping relationship between features and results; that is, machine learning is inefficient and inaccurate.

Therefore, in these application scenarios, feature engineering needs to be performed on the data set so that a more obvious mapping relationship exists between the features in the data set and the results. For convenience of description, the features of each object in the data set before feature engineering are referred to as original features, and the features of each object after feature engineering are referred to as target features.

However, the representation of the target features obtained after feature engineering differs according to the machine learning framework used; for example, the representation under a Python (a programming language) framework differs from that under a Java (a programming language) framework. For the target features to be applicable to different machine learning frameworks, a corresponding feature engineering method needs to be developed for each framework. For example, one feature engineering method is developed for a Python framework to obtain target features suitable for the Python framework, and another is developed for a Java framework to obtain target features suitable for the Java framework.

The development cost of machine learning is high due to the need to develop a variety of different feature engineering methods.

Disclosure of Invention

The embodiment of the invention aims to provide a machine learning method so as to reduce the development cost of machine learning. The specific technical scheme is as follows:

in a first aspect of embodiments of the present invention, a machine learning method is provided, the method including:

a decision end determines a target mapping relation identifier corresponding to a to-be-processed data set of an execution end according to a preset feature engineering strategy, wherein the target mapping relation identifier is used for representing a target mapping relation between original features of each object in the to-be-processed data set and target features of each object, and the target features are features obtained by performing feature engineering on the original features according to the preset feature engineering strategy;

the decision end sends the target mapping relation identifier to the execution end;

the execution end maps the original features of the to-be-processed data set according to the target mapping relation represented by the target mapping relation identifier to obtain the target features of the objects;

and the execution end performs machine learning based on the target features of the objects to obtain a model for processing objects of the same type as the objects.

In a possible embodiment, before determining, according to a preset feature engineering policy, a target mapping relationship identifier corresponding to a to-be-processed data set at an execution end, the method further includes:

the execution end acquires the mapping relation and the mapping relation identification which are supported and realized by the execution end to obtain mapping relation information, wherein the mapping relation information is used for representing the corresponding relation between the mapping relation and the mapping relation identification which are supported and realized by the execution end;

the execution end sends the mapping relation information to a decision end;

the decision end receives the mapping relation information sent by the execution end;

the determining a target mapping relationship identifier corresponding to a to-be-processed data set of an execution end according to a preset feature engineering strategy comprises the following steps:

determining a target mapping relation between the original characteristics and the target characteristics of the data set to be processed of the execution end according to a preset characteristic engineering strategy;

and determining a mapping relation identifier corresponding to the target mapping relation as a target mapping relation identifier according to the corresponding relation represented by the mapping relation information.

In a possible embodiment, the determining, according to a preset feature engineering policy, a target mapping relationship identifier corresponding to a to-be-processed data set of an execution end includes:

determining a target characteristic engineering strategy corresponding to the data set to be processed from a plurality of preset characteristic engineering strategies;

and determining a target mapping relation identifier corresponding to the data set to be processed of the execution end by adopting the target characteristic engineering strategy.

In a possible embodiment, the determining a target feature engineering policy corresponding to the to-be-processed data set from a plurality of different preset feature engineering policies includes:

for each preset feature engineering strategy, determining an estimated score of the preset feature engineering strategy, wherein the estimated score represents the degree of dispersion of the feature values in each dimension of the target features obtained by performing feature engineering on the original features of the to-be-processed data set of the execution end according to the preset feature engineering strategy, and the estimated score is negatively correlated with the degree of dispersion;

and determining the preset feature engineering strategy with the highest estimated score as the target feature engineering strategy.

In a possible embodiment, after the mapping the original features of the data set to be processed according to the target mapping relationship represented by the target mapping relationship identifier to obtain the target features of the data set to be processed, the method further includes:

the decision end takes the target features of the to-be-processed data set as new original features of the to-be-processed data set, and returns to the step of determining, according to the preset feature engineering strategy, the target mapping relation identifier corresponding to the to-be-processed data set of the execution end;

the performing machine learning based on the target features of the objects to obtain a model for processing similar objects of the objects, includes:

and performing machine learning based on the target characteristics of the objects until a preset cycle ending condition is reached to obtain a model for processing the similar objects of the objects.

In a second aspect of the embodiments of the present invention, there is provided a machine learning system, including a decision end and an execution end;

the decision end is used for determining a target mapping relation identifier corresponding to a to-be-processed data set of an execution end according to a preset feature engineering strategy, the target mapping relation identifier is used for representing a target mapping relation between original features of each object in the to-be-processed data set and target features of each object, and the target features are features obtained by performing feature engineering on the original features according to the preset feature engineering strategy; sending the target mapping relation identifier to the execution end;

the execution end is used for mapping the original features of the data set to be processed according to the target mapping relation represented by the target mapping relation identifier to obtain the target features of the objects; and performing machine learning based on the target characteristics of the objects to obtain a model for processing the similar objects of the objects.

In a possible embodiment, the execution end is further configured to acquire a mapping relationship and a mapping relationship identifier that are supported and implemented by the execution end to obtain mapping relationship information, where the mapping relationship information is used to represent a corresponding relationship between the mapping relationship and the mapping relationship identifier that are supported and implemented by the execution end; sending the mapping relation information to a decision end;

the decision end is further configured to receive the mapping relationship information sent by the execution end;

the decision end is specifically used for determining a target mapping relation between the original features and the target features of the data set to be processed of the execution end according to a preset feature engineering strategy;

In a possible embodiment, the decision terminal is specifically configured to determine, for each preset feature engineering strategy, an estimated score of the preset feature engineering strategy, where the estimated score is used to represent a discrete degree of a feature value in each dimension in a target feature obtained by performing feature engineering on an original feature of a to-be-processed data set of an execution terminal according to the preset feature engineering strategy, and the estimated score is negatively related to the discrete degree;

In a possible embodiment, the decision end is further configured to use a target feature of the to-be-processed data set as a new original feature of the to-be-processed data set, and return to execute the step of determining the target mapping relationship identifier corresponding to the to-be-processed data set of the execution end according to a preset feature engineering policy;

the execution end is specifically configured to perform machine learning based on the target features of the objects until a preset loop end condition is reached, so as to obtain a model for processing similar objects of the objects.

The execution end is further configured to take the target features of the data set as new original features of the data set, and to return to the step of receiving the target mapping relation identifier sent by the decision end, until a preset loop end condition is reached.

The embodiment of the invention has the following beneficial effects:

With the machine learning method and system provided by the embodiments of the invention, the decision end determines the target mapping relation identifier according to the preset feature engineering strategy and uses that identifier to guide the execution end in mapping the original features in the data set. Feature engineering and feature mapping are thus decomposed into two mutually independent steps. Because the feature mapping is performed by the execution end according to the machine learning framework it uses, the resulting target features are suitable for that framework. The method can therefore obtain target features suitable for whichever machine learning framework the execution end uses, so different feature engineering methods need not be developed for different machine learning frameworks, and the development cost of machine learning is reduced.

Of course, not all of the advantages described above need to be achieved at the same time in the practice of any one product or method of the invention.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other embodiments can be obtained by using the drawings without creative efforts.

Fig. 1 is a schematic flow chart of a machine learning method according to an embodiment of the present invention;

Fig. 2 is a schematic flow chart of another machine learning method according to an embodiment of the present invention;

Fig. 3 is a schematic structural diagram of a machine learning system according to an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

To describe the machine learning method provided by the embodiment of the present invention more clearly, a possible application scenario is described below by way of example. It can be understood that the following example is only one possible application scenario; in other possible embodiments, the machine learning method may also be applied to other application scenarios, which is not limited in this embodiment.

Assuming that a user needs to obtain, through machine learning, a classification model for judging whether a person is overweight, data of a plurality of persons can be collected in advance to obtain a data set. The data set includes the original features of the plurality of persons, and the original features of each person may include the person's height and weight.

Assuming that machine learning is performed based on the original features of each person, the classification model obtained by machine learning can be theoretically used to represent the following mapping relationship:

R = F(H, W)

where R is the classification result indicating whether a person is overweight (for example, the person is overweight when R is greater than a preset threshold and not overweight otherwise), H is the height of the person, W is the weight of the person, and F(H, W) is a mapping function. It can be understood that whether a person is overweight depends on both the height and the weight of the person. If it is assumed that the body mass index determines whether a person is overweight, then in one possible embodiment F(H, W) may ideally be expressed in the following form:

F(H, W) = W / H^2

If feature engineering is performed on the original features of each person, for example, if the user knows from experience that whether a person is overweight depends on the body mass index, the body mass index of each person can be obtained by dividing the person's weight by the square of the person's height, and the body mass index is used as the target feature of the person.

Then, machine learning is performed based on the target features of the respective persons, and the classification model obtained by the machine learning can be theoretically used for representing the following mapping relationship:

R = G(BMI)

where BMI is the body mass index of a person and G (BMI) is the mapping function, if it is assumed that the body mass index is used to determine whether a person is overweight, then in one possible embodiment G (BMI) may ideally be expressed as follows:

G(BMI) = BMI

It can be seen that the form of G(BMI) is simpler than that of F(H, W), so machine learning based on the target features is more efficient and an accurate model is easier to obtain. Therefore, in machine learning, feature engineering is often performed on the original features of each person to improve the efficiency and accuracy of machine learning.
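The overweight example can be sketched in Python as follows. This is only an illustration: the function names `bmi` and `is_overweight`, the units (metres and kilograms), the threshold of 25, and the sample data are all assumptions that do not appear in the text, which speaks only of "a preset threshold".

```python
def bmi(height_m: float, weight_kg: float) -> float:
    """Target feature: weight divided by the square of height."""
    return weight_kg / height_m ** 2


# Assumed value for illustration; the text only mentions "a preset threshold".
THRESHOLD = 25.0


def is_overweight(bmi_value: float) -> bool:
    """R = G(BMI): overweight when the value exceeds the preset threshold."""
    return bmi_value > THRESHOLD


# Made-up original features: (height in m, weight in kg).
people = [(1.80, 70.0), (1.60, 80.0)]

# Feature engineering: two original features become one target feature.
target_features = [bmi(h, w) for h, w in people]
results = [is_overweight(b) for b in target_features]
```

After the feature engineering step, the decision rule reduces to a single comparison on one value, which is the sense in which G(BMI) is simpler than F(H, W).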

However, dividing a person's weight by the square of the person's height is implemented differently under different machine learning frameworks. Illustratively, the function that implements the square computation under a Java framework is the pow function, whereas the function that implements it under a Python framework is the power function. A user is therefore required to develop code specific to the framework used by the execution end in order to implement feature engineering, which makes the development cost of machine learning high.

Referring to fig. 1, fig. 1 is a schematic flow chart of a machine learning method according to an embodiment of the present invention, which may include:

S101, the decision end determines a target mapping relation identifier corresponding to a to-be-processed data set of the execution end according to a preset feature engineering strategy.

S102, the decision end sends the target mapping relation identifier to the execution end.

S103, the execution end maps the original features of the to-be-processed data set according to the target mapping relation represented by the target mapping relation identifier to obtain the target features of each object.

S104, the execution end performs machine learning based on the target features of each object to obtain a model for processing objects of the same type as each object.

With this embodiment, the decision end determines the target mapping relation identifier according to the preset feature engineering strategy and uses it to guide the execution end in mapping the original features in the data set. Feature engineering and feature mapping are thereby decomposed into two mutually independent steps, and because the execution end performs the feature mapping according to the machine learning framework it uses, the resulting target features are suitable for that framework.
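The flow of S101 to S104 can be sketched as below, under several assumptions: the classes `DecisionEnd` and `ExecutionEnd`, the identifier `"bmi"`, the toy threshold learner, and the data are all hypothetical, the strategy is reduced to a stub, and sending is reduced to passing a string in one process.

```python
from typing import Callable, Dict, List, Tuple

Row = Tuple[float, float]  # (height, weight); units assumed to be m and kg


class DecisionEnd:
    """Chooses an identifier; knows nothing about the execution end's framework."""

    def choose_identifier(self) -> str:
        # Stand-in for a preset feature engineering strategy (S101).
        return "bmi"


class ExecutionEnd:
    """Owns the framework-specific operators and the learning step."""

    def __init__(self) -> None:
        self.operators: Dict[str, Callable[[Row], float]] = {
            "bmi": lambda row: row[1] / row[0] ** 2,
        }

    def map_features(self, identifier: str, rows: List[Row]) -> List[float]:
        # S103: apply the mapping named by the received identifier.
        return [self.operators[identifier](row) for row in rows]

    def learn(self, features: List[float], labels: List[bool]) -> Callable[[float], bool]:
        # S104: toy learner that puts a threshold midway between the classes.
        highest_negative = max(f for f, y in zip(features, labels) if not y)
        lowest_positive = min(f for f, y in zip(features, labels) if y)
        threshold = (highest_negative + lowest_positive) / 2
        return lambda value: value > threshold


rows = [(1.8, 70.0), (1.6, 80.0), (1.7, 60.0), (1.5, 75.0)]  # made-up data
labels = [False, True, False, True]

identifier = DecisionEnd().choose_identifier()            # S101
execution_end = ExecutionEnd()                            # S102: identifier is "sent"
features = execution_end.map_features(identifier, rows)   # S103
model = execution_end.learn(features, labels)             # S104
```

Only the identifier string crosses the boundary between the two classes, so the decision end contains no framework-specific code.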

In S101, the target mapping relation identifier represents a target mapping relation between the original features of each object in the to-be-processed data set and the target features of each object, where the target features are obtained by performing feature engineering on the original features according to the preset feature engineering strategy. The identifier may take different forms in different application scenarios. For example, in one possible embodiment, the target mapping relation identifier is the operator name of a feature operator that can implement the target mapping relation: if the target features are obtained by normalizing the original features, that is, the target mapping relation can be implemented by a normalization operator, the target mapping relation identifier may be the name of the normalization operator. In other possible embodiments, the target mapping relation identifier may also take the form of a number or another identifier of the feature operator that implements the target mapping relation, which is not limited in this embodiment.
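A minimal sketch of an operator name serving as the identifier is given below. The registry `OPERATORS`, the name `"normalize"`, and the choice of min-max normalization are assumptions made for illustration; the text does not say which kind of normalization is meant.

```python
from typing import Callable, Dict, List


def normalize(values: List[float]) -> List[float]:
    """Min-max normalization to [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical registry on the execution end: identifier -> feature operator.
OPERATORS: Dict[str, Callable[[List[float]], List[float]]] = {
    "normalize": normalize,
}


def apply_mapping(identifier: str, original: List[float]) -> List[float]:
    """Resolve the identifier to an operator and map the original feature."""
    return OPERATORS[identifier](original)


target = apply_mapping("normalize", [150.0, 160.0, 170.0])
```

A numeric identifier would work the same way, with integer keys in place of the operator names.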

Depending on the application scenario, the objects in the to-be-processed data set may be persons, vehicles, road signs, and the like, and the features included in the original features of the objects may differ with the object type and the application scenario. For example, when the object is a person, the original features may include one or more of the person's height, weight, age, face image, sex, voiceprint, whether a mask is worn, and the like; when the object is a vehicle, the original features may include one or more of the vehicle's color, model, license plate number, and outline.

The decision end and the execution end may be two different physical devices, or two different virtual devices, and one of them may be a physical device and the other may be a virtual device. When the decision end and the execution end are two virtual devices, the decision end and the execution end may be two virtual devices running on the same physical device, or two virtual devices running on different physical devices.

One decision end can be connected to a plurality of execution ends, and one execution end can also be connected to a plurality of decision ends. For convenience of description, the case of one decision end and one execution end is taken as an example below. The principles for one decision end with multiple execution ends, multiple decision ends with one execution end, and multiple decision ends with multiple execution ends are the same, and are therefore not repeated here.

In S102, the decision end may send the target mapping relationship identifier to the execution end through the connection established with the execution end.

In S103, because the execution end itself maps the original features, the resulting target features are suited to the machine learning framework used by the execution end. For example, if the machine learning framework used by the execution end is a Python framework, the target features obtained are, in theory, suitable for the Python framework.

It will be appreciated that although a mapping relation is implemented differently under different machine learning frameworks, the mapping relation implemented is theoretically the same. For example, normalization is implemented differently under a Python framework and under a Java framework, but both frameworks can in theory implement normalization. Therefore, different machine learning frameworks can in theory map the original features of the to-be-processed data set according to the mapping relation represented by the target mapping relation identifier. That is, the target mapping relation identifier sent by the decision end can be correctly acted on by execution ends that use different machine learning frameworks.
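The point that one identifier can be served by different implementations can be sketched as follows. The two dictionaries `backend_a` and `backend_b` are hypothetical stand-ins for execution ends built on different frameworks; both are written in Python here purely for illustration, one using `math.pow` and the other plain multiplication.

```python
import math
from typing import Callable, Dict, List

Operator = Callable[[List[float]], List[float]]

# Two hypothetical execution ends, each with its own implementation of "square".
backend_a: Dict[str, Operator] = {
    "square": lambda xs: [math.pow(x, 2) for x in xs],
}
backend_b: Dict[str, Operator] = {
    "square": lambda xs: [x * x for x in xs],
}


def execute(backend: Dict[str, Operator], identifier: str, original: List[float]) -> List[float]:
    """Apply whichever implementation the backend registers under the identifier."""
    return backend[identifier](original)


heights = [1.5, 2.0]
out_a = execute(backend_a, "square", heights)
out_b = execute(backend_b, "square", heights)
```

The decision end would send the same string `"square"` to either backend, and the mapping each one realizes is the same even though the code behind it differs.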

In S104, the model obtained through machine learning may be a model for performing different processing on the same type of object as each object according to different application scenarios and actual requirements, for example, the model may be a classification model for determining the gender of a person, a detection model for detecting a road sign in an image, a recognition model for recognizing a license plate number, or a model for performing other processing, which is not limited in this embodiment.

As analyzed for S103, the decision end in the machine learning method provided by the embodiment of the present application does not need to be concerned with the machine learning framework used by the execution end, so execution ends using different machine learning frameworks can all obtain target features suitable for their own frameworks. The machine learning method provided by the embodiment of the invention is therefore highly usable, and different machine learning methods need not be developed for different machine learning frameworks.

Referring to fig. 2, fig. 2 is a schematic flow chart of another machine learning method according to an embodiment of the present invention, which may include:

S201, the execution end collects the mapping relations that it supports and the corresponding mapping relation identifiers to obtain mapping relation information.

The mapping relation information represents the correspondence between the mapping relations supported by the execution end and their mapping relation identifiers. Taking mapping relations represented in the form of feature operators as an example, the execution end may scan the feature operators of the machine learning framework it uses, together with their names, such as normalization, addition, subtraction, multiplication, division, and cross entropy, and use the resulting correspondence between feature operators and operator names as the mapping relation information.
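One way the scanning in S201 might look is sketched below. The class `FeatureOperators` is a hypothetical stand-in for a framework's operator collection, and describing each operator by its call signature is an assumption; the text does not specify how a mapping relation is described when it is reported.

```python
import inspect
from typing import Dict


class FeatureOperators:
    """Hypothetical operator collection of one execution end."""

    @staticmethod
    def add(a, b):
        return [x + y for x, y in zip(a, b)]

    @staticmethod
    def divide(a, b):
        return [x / y for x, y in zip(a, b)]

    @staticmethod
    def square(a):
        return [x * x for x in a]


def collect_mapping_info(namespace) -> Dict[str, str]:
    """Map each public operator name (the identifier) to its call signature."""
    info = {}
    for name, func in inspect.getmembers(namespace, inspect.isfunction):
        if not name.startswith("_"):
            info[name] = str(inspect.signature(func))
    return info


# Serializable description that could be sent to the decision end.
mapping_info = collect_mapping_info(FeatureOperators)
```

Because the result contains only strings, it can be transmitted to the decision end without exposing any framework-specific objects.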

S202, the decision end receives the mapping relation information sent by the execution end.

S203, the decision end determines a target mapping relation between the original features and the target features of the data set to be processed of the execution end according to a preset feature engineering strategy.

In one possible embodiment, the preset feature engineering strategy may be a single feature engineering strategy, such as any one of the Meta-Learning strategy, the Expand-Reduce strategy, the Hierarchical Organization of Transformations strategy, and the Reinforcement Learning strategy.

In other possible embodiments, the preset feature engineering strategy may comprise a plurality of feature engineering strategies, such as several of the aforementioned Meta-Learning, Expand-Reduce, Hierarchical Organization of Transformations, and Reinforcement Learning strategies. The preset feature engineering strategy may include some or all of these four feature engineering strategies and, in other possible embodiments, may also include feature engineering strategies other than these four.

When the preset feature engineering strategy comprises a plurality of feature engineering strategies, a target feature engineering strategy corresponding to the data set to be processed may be determined from the plurality of preset feature engineering strategies, and a target mapping relationship corresponding to the data set to be processed at the execution end may be determined by using the target feature engineering strategy.

The way in which the target feature engineering strategy is determined from the plurality of preset feature engineering strategies may differ between application scenarios. Different preset feature engineering strategies have different advantages, so a preset feature engineering strategy can be selected according to actual requirements to determine the target mapping relation, allowing the target features obtained by the execution end to suit different application scenarios.

In a possible embodiment, an estimated score is determined for each preset feature engineering strategy, where the estimated score represents the degree of dispersion of the feature values in each dimension of the target features obtained by performing feature engineering on the original features of the to-be-processed data set of the execution end according to that preset feature engineering strategy, and the estimated score is negatively correlated with the degree of dispersion. The preset feature engineering strategy with the highest estimated score is then determined as the target feature engineering strategy.

With this embodiment, a suitable feature engineering strategy can be selected from the plurality of built-in preset feature engineering strategies to determine the target mapping relation, so the machine learning method provided by the embodiment of the invention can suit different application scenarios. In addition, because the preset feature engineering strategies are built in, the user does not need to write code manually, which improves the efficiency of feature engineering, reduces its labor cost, and lowers the requirements placed on the user.

It will be appreciated that features are used to distinguish between different objects, and therefore if the degree of dispersion between feature values of different objects in a feature dimension is large, it may be better to distinguish different objects according to the feature values in that feature dimension. If the degree of dispersion between feature values of different objects is small in a feature dimension, it is difficult to distinguish different objects according to the feature values in the feature dimension.

For example, assume that the objects are students and that one feature dimension is whether a red scarf is worn. Because both boy students and girl students wear red scarves, the degree of dispersion between the feature values of the objects in this dimension is small, and it is difficult to distinguish boy students from girl students according to whether a red scarf is worn. Assume that another feature dimension is whether a skirt is worn. Owing to the school uniform design, boy students do not wear skirts and girl students do, so the feature values of boy students and girl students differ in this dimension; that is, the degree of dispersion between the feature values of the objects is large, and it is relatively easy to distinguish boy students from girl students according to whether a skirt is worn.

The degree of dispersion may be expressed in different ways in different embodiments. For example, it may be expressed in the form of an entropy value, and in one possible embodiment it may be expressed by a feature importance calculated by a random forest method.
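As an illustration of the entropy form mentioned above, the following minimal sketch computes the Shannon entropy of the feature values in one feature dimension. The function name `dispersion_entropy` and the sample data are assumptions made for illustration only and are not taken from the embodiment; the data mirror the red scarf and skirt example.

```python
import math
from collections import Counter


def dispersion_entropy(values):
    """Shannon entropy (in bits) of the feature values in one feature dimension."""
    counts = Counter(values)
    total = len(values)
    entropy = 0.0
    for count in counts.values():
        p = count / total
        entropy -= p * math.log2(p)
    return entropy


# Hypothetical data: two boy students followed by two girl students.
wears_red_scarf = [1, 1, 1, 1]  # every student wears a red scarf
wears_skirt = [0, 0, 1, 1]      # only the girl students wear a skirt

scarf_entropy = dispersion_entropy(wears_red_scarf)
skirt_entropy = dispersion_entropy(wears_skirt)
```

Under these assumptions the red scarf dimension has an entropy of 0 and the skirt dimension an entropy of 1 bit, which matches the observation that the skirt dimension separates the two groups of students while the red scarf dimension does not.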

S204, the decision end determines a mapping relation identifier corresponding to the target mapping relation according to the corresponding relation represented by the mapping relation information, and the mapping relation identifier is used as the target mapping relation identifier.

When determining the target mapping relationship according to the feature engineering strategy, the decision end may make the determination based on the data set to be processed itself, or based on the feature information of the data set to be processed. For example, in a possible embodiment, if the bandwidth between the decision end and the execution end is sufficient and the transmission rate is high, the execution end may send the to-be-processed data set to the decision end. The decision end constructs the basic information and the meta-features of the to-be-processed data set from that data set as the feature information of the to-be-processed data set, and determines the target mapping relationship according to the feature information and a preset feature engineering strategy.

In another possible embodiment, the execution end itself may construct the basic information and the meta-features of the to-be-processed data set as the feature information of the to-be-processed data set and send the feature information to the decision end. The decision end then determines the target mapping relationship according to the feature information of the to-be-processed data set and a preset feature engineering strategy.
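The construction of feature information at the execution end can be sketched as follows. The embodiment does not list which basic information and meta-features are constructed, so the fields used here (object count, feature count, per-column mean and variance) and the function name `build_feature_information` are purely illustrative assumptions.

```python
from statistics import mean, pvariance


def build_feature_information(data_set):
    """Summarise a data set given as a list of rows of numeric original features.

    The chosen fields are illustrative assumptions, not a list given by the embodiment.
    """
    columns = list(zip(*data_set))  # transpose rows into feature columns
    return {
        "basic_information": {
            "object_count": len(data_set),
            "feature_count": len(columns),
        },
        "meta_features": [
            {"mean": mean(col), "variance": pvariance(col)} for col in columns
        ],
    }


# Hypothetical to-be-processed data set held by the execution end.
data_set = [[1.0, 10.0], [2.0, 10.0], [3.0, 10.0]]
feature_information = build_feature_information(data_set)
```

Only `feature_information`, rather than the data set itself, would then need to be sent to the decision end, which is the point of this variant when bandwidth is limited.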

S205, the decision end sends the target mapping relation identifier to the execution end.

The step is the same as the step S102, and reference may be made to the related description of the step S102, which is not described herein again.

S206, the execution end maps the original characteristics of the data set to be processed according to the target mapping relation represented by the target mapping relation identification to obtain the target characteristics of each object in the data set to be processed.

This step is the same as S103, and reference may be made to the related description of S103, which is not described herein again.
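Steps S204 to S206 can be pictured with the following minimal sketch, in which the decision end looks up an identifier in the mapping relation information and the execution end applies the mapping relation that the identifier represents. The identifiers `M1` and `M2`, the mapping names, and both function names are hypothetical and chosen only for illustration.

```python
import math

# Mapping relations the execution end supports, keyed by identifier.
# The identifiers and functions are hypothetical examples.
EXECUTION_END_MAPPINGS = {
    "M1": lambda x: x * x,
    "M2": math.sqrt,
}

# Mapping relation information reported to the decision end: names only, no code.
MAPPING_RELATION_INFORMATION = {"M1": "square", "M2": "square_root"}


def decision_end_pick_identifier(target_mapping_name, mapping_information):
    """S204: find the identifier whose mapping relation matches the target one."""
    for identifier, name in mapping_information.items():
        if name == target_mapping_name:
            return identifier
    raise KeyError(f"no identifier for mapping relation {target_mapping_name!r}")


def execution_end_map(identifier, original_features):
    """S206: apply the mapping relation represented by the identifier."""
    mapping = EXECUTION_END_MAPPINGS[identifier]
    return [[mapping(value) for value in row] for row in original_features]


# S205 is modelled here simply as passing the identifier between the two functions.
target_identifier = decision_end_pick_identifier("square_root", MAPPING_RELATION_INFORMATION)
target_features = execution_end_map(target_identifier, [[4.0, 9.0], [16.0, 25.0]])
```

In this sketch only the short identifier crosses from the decision end to the execution end; the conversion code itself stays on the execution end.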

S207, the execution end performs machine learning based on the target characteristics of each object to obtain a model for processing the similar object of each object.

The step is the same as the step S104, and reference may be made to the related description of the step S104, which is not described herein again.

It can be understood that, in some possible application scenarios, the target features obtained by performing feature conversion on the original features only once may still lack an explicit association with the result. Therefore, in a possible embodiment, after the execution end maps the original features of the data set to be processed according to the target mapping relationship represented by the target mapping relationship identifier to obtain the target features of the data set to be processed, the target features may be used as new original features of the data set to be processed. The decision end then executes again the aforementioned step of determining, according to the preset feature engineering strategy, the target mapping relationship identifier corresponding to the data set to be processed of the execution end, and sends the newly determined target mapping relationship identifier to the execution end. The execution end maps the original features of the data set to be processed according to the target mapping relationship represented by the newly determined target mapping relationship identifier to obtain the target features of each object. This is repeated until a preset loop end condition is reached, for example, the loop has been executed 3 to 5 times. The loop then ends, and the execution end performs machine learning based on the latest target features of each object to obtain a model for processing the similar objects of each object. With this embodiment, the target features can have a more explicit association with the result.
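The loop described above can be sketched as follows, assuming that the loop end condition is a fixed number of rounds. The callables `decide_identifier` and `apply_mapping` are hypothetical stand-ins for the decision end and the execution end respectively.

```python
def iterate_feature_engineering(original_features, decide_identifier, apply_mapping, max_rounds=3):
    """Repeat the decide/map steps until the preset loop end condition is reached.

    decide_identifier stands in for the decision end and apply_mapping for the
    execution end; both are hypothetical callables supplied by the caller.
    """
    features = original_features
    used_identifiers = []
    for _ in range(max_rounds):  # loop end condition: a fixed number of rounds
        identifier = decide_identifier(features)
        # The target features of this round become the original features of the next.
        features = apply_mapping(identifier, features)
        used_identifiers.append(identifier)
    return features, used_identifiers


# Hypothetical stand-ins: always choose "double", which doubles every value.
final_features, history = iterate_feature_engineering(
    [1.0, 2.0],
    decide_identifier=lambda features: "double",
    apply_mapping=lambda identifier, features: [2 * v for v in features],
    max_rounds=3,
)
```

Machine learning would then be performed on `final_features`, the latest target features, rather than on the output of any intermediate round.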

In order to more clearly describe the machine learning method provided by the embodiment of the present invention, the feature engineering strategies mentioned in the foregoing S203 will be described below, and since each feature engineering strategy is not a main inventive point of the present invention, a brief description is made here.

Meta-learning strategy: the meta-model in the hyperparameter library of the decision end can directly make a prediction from the original features of the to-be-processed data of the execution end, so that the target mapping relation is obtained by inference.

Expand-Reduce strategy: the Expand-Reduce strategy can be divided into an Expand stage and a Reduce stage, wherein the Expand stage can be executed by a decision end or an execution end, and the Reduce stage is executed by the decision end.

In the Expand stage, k feature conversion functions (T1, T2, T3, …, Tk) may be called, where T1 is the first feature conversion function, T2 is the second feature conversion function, and so on. These functions perform feature conversion on the original features to generate new features. For convenience of description, the original features are denoted as (f1, f2, …, fn), where f1 is the first of the original features, f2 is the second, and so on. The newly generated features are (T1(f1), T1(f2), …, T1(fn), T2(f1), …, Tk(fn)). Since the newly generated features have k × n dimensions, they are expanded compared with the n-dimensional original features, and this stage is therefore called the Expand stage. It can be understood that if the Expand stage is executed by the decision end, the execution end needs to send the original features to the decision end.
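A minimal sketch of the Expand stage follows. The three conversion functions standing in for T1 to Tk and the function name `expand` are assumptions made for illustration; the embodiment does not specify which conversion functions are used.

```python
import math

# Hypothetical feature conversion functions T1..Tk (here k = 3).
CONVERSION_FUNCTIONS = {
    "T1": lambda x: x * x,
    "T2": lambda x: math.log1p(x),
    "T3": lambda x: -x,
}


def expand(original_features):
    """Apply every conversion function to every original feature column.

    original_features maps a feature column identifier (f1..fn) to its values.
    Returns a dict keyed by (operator, column identifier) with k * n entries.
    """
    expanded = {}
    for operator, function in CONVERSION_FUNCTIONS.items():
        for column_id, values in original_features.items():
            expanded[(operator, column_id)] = [function(v) for v in values]
    return expanded


original = {"f1": [1.0, 2.0], "f2": [3.0, 4.0]}
new_features = expand(original)
```

With k = 3 functions and n = 2 original features, `new_features` holds 3 × 2 = 6 generated feature columns, ordered as T1(f1), T1(f2), T2(f1), and so on.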

In the Reduce stage, N features are selected from the newly generated k × n dimensional features according to a preset screening strategy; the selection may be made according to evaluation indexes such as accuracy and/or recall. The decision end records the feature operator of each selected feature, the feature column identifier, and the correspondence between the feature operator and the feature column identifier, and sends the feature operator and the feature column identifier to the execution end. The feature operator is used to represent the feature conversion function corresponding to the selected feature, and the feature column identifier is used to represent the original feature corresponding to the selected feature. For example, assuming that T2(f3) is among the N selected features, a feature operator representing the feature conversion function T2 and a feature column identifier representing the original feature f3 may be recorded, and the execution end may perform feature conversion on the original feature f3 by using the feature conversion function T2 according to the recorded feature operator and feature column identifier, so as to obtain the feature T2(f3).
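The Reduce stage can be sketched as a ranking followed by a record of how each selected feature is rebuilt. The evaluation values below are invented for illustration, and how they are obtained (for example from accuracy or recall) is outside the scope of this sketch; the function name `reduce_features` is likewise an assumption.

```python
def reduce_features(scores, keep):
    """Select the `keep` best-scoring features and record how to rebuild them.

    scores maps (feature operator, feature column identifier) to an evaluation
    value such as accuracy; how that value is obtained is outside this sketch.
    """
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [
        {"feature_operator": operator, "feature_column": column_id}
        for (operator, column_id), _ in ranked[:keep]
    ]


# Hypothetical evaluation values for four generated features.
scores = {
    ("T1", "f1"): 0.61,
    ("T2", "f3"): 0.83,
    ("T1", "f3"): 0.58,
    ("T2", "f1"): 0.74,
}
selected = reduce_features(scores, keep=2)
```

Here the record for T2(f3) consists only of the operator `T2` and the column identifier `f3`, which is all the execution end needs in order to recompute that feature locally.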

Organization of Transformations strategy: the Organization of Transformations strategy also includes an Expand stage and a stage similar to the Reduce stage described above. In the Expand stage, the original features may be expanded into a plurality of features. For example, if the original features are represented as a feature table, the feature table may be expanded into a plurality of feature tables. Training is performed on each expanded feature to obtain an evaluation value of each feature, such as AUC (area under the ROC curve) or accuracy, where the ROC curve refers to the receiver operating characteristic curve.

In the stage similar to the Reduce stage, some of the nodes may be discarded based on the obtained evaluation values and a threshold preset for the evaluation value, the result is recorded, and a next round of search is performed, where the search may be DFS (Depth First Search) or BFS (Breadth First Search).
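The search with threshold-based discarding can be sketched as follows. The tree, the evaluation values, and the function name `search` are hypothetical; in particular, treating a discarded node as pruning everything expanded from it is an assumption of this sketch rather than something the embodiment states.

```python
from collections import deque


def search(root, children, evaluate, threshold, depth_first=False):
    """Explore a tree of expanded features, discarding nodes below the threshold.

    children(node) returns the nodes expanded from `node`; evaluate(node) returns
    its evaluation value (for example AUC). Both are hypothetical callables.
    """
    kept = []
    frontier = deque([root])
    while frontier:
        # DFS takes the most recently added node, BFS the oldest one.
        node = frontier.pop() if depth_first else frontier.popleft()
        if evaluate(node) < threshold:
            continue  # discard this node and everything expanded from it
        kept.append(node)
        frontier.extend(children(node))
    return kept


# Hypothetical tree and evaluation values.
TREE = {"root": ["a", "b"], "a": ["a1", "a2"], "b": ["b1"]}
VALUES = {"root": 0.70, "a": 0.80, "b": 0.40, "a1": 0.90, "a2": 0.50, "b1": 0.95}

bfs_kept = search("root", lambda n: TREE.get(n, []), VALUES.get, threshold=0.6)
dfs_kept = search("root", lambda n: TREE.get(n, []), VALUES.get, threshold=0.6, depth_first=True)
```

In this example node `b` falls below the threshold, so `b1` is never evaluated even though its own value is high; this is the cost saving that discarding nodes is meant to provide.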

Reinforcement learning strategy: the principle is similar to that of the aforementioned Organization of Transformations strategy. The only difference is that the search is not performed by DFS or BFS but is based on an MDP (Markov Decision Process).

Referring to fig. 3, fig. 3 is a schematic structural diagram of a machine learning system according to an embodiment of the present invention, which may include:

a decision end 301 and an execution end 302. It should be understood that the machine learning system shown in fig. 3 is only one possible structural schematic diagram of the machine learning system provided in the embodiment of the present invention, and in other possible embodiments, the machine learning system provided in the embodiment of the present invention may also include a plurality of decision terminals 301 and a plurality of execution terminals 302.

The decision end 301 is configured to determine, according to a preset feature engineering policy, a target mapping relationship identifier corresponding to a to-be-processed data set of the execution end 302, where the target mapping relationship identifier is used to represent a target mapping relationship between an original feature of each object in the to-be-processed data set and a target feature of each object, and the target feature is a feature obtained by performing feature engineering on the original feature according to the preset feature engineering policy; sending the target mapping relationship identifier to the execution end 302;

the execution end 302 is configured to map the original features of the to-be-processed data set according to the target mapping relationship represented by the target mapping relationship identifier, so as to obtain target features of the objects; and performing machine learning based on the target characteristics of the objects to obtain a model for processing the similar objects of the objects.

In a possible embodiment, the execution end 302 is further configured to acquire a mapping relationship and a mapping relationship identifier that the execution end supports to implement, to obtain mapping relationship information, where the mapping relationship information is used to represent a corresponding relationship between the mapping relationship and the mapping relationship identifier that the execution end supports to implement; sending the mapping relation information to the decision terminal 301;

the decision end 301 is further configured to receive the mapping relationship information sent by the execution end 302;

the decision end 301 is specifically configured to determine a target mapping relationship between an original feature and a target feature of a to-be-processed data set of an execution end according to a preset feature engineering strategy;

In a possible embodiment, the decision end 301 is specifically configured to determine a target feature engineering policy corresponding to the data set to be processed from multiple preset feature engineering policies;

In a possible embodiment, the decision end 301 is specifically configured to determine, for each preset feature engineering strategy, an estimated score of the preset feature engineering strategy, where the estimated score is used to represent the degree of dispersion of the feature values in each dimension of the target features obtained by performing feature engineering on the original features of the to-be-processed data set of the execution end according to the preset feature engineering strategy, and the estimated score is negatively correlated with the degree of dispersion;

In a possible embodiment, the decision end 301 is further configured to use the target features of the data set to be processed as new original features of the data set to be processed, and return to execute the step of determining the target mapping relationship identifier corresponding to the data set to be processed of the execution end 302 according to a preset feature engineering policy;

the execution end 302 is specifically configured to perform machine learning based on the target features of the objects until a preset loop end condition is reached, so as to obtain a model for processing similar objects of the objects.

In the above embodiments, the implementation may be realized wholly or partially by software, hardware, firmware, or any combination thereof. When implemented in software, it may be realized wholly or partially in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in accordance with the embodiments of the invention are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example, from one website, computer, server, or data center to another website, computer, server, or data center in a wired (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, radio, microwave) manner. The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device, such as a server or a data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)).

It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.

The embodiments in the present specification are described in a related manner; for the same or similar parts, the embodiments may be referred to each other, and each embodiment focuses on its differences from the other embodiments. In particular, since the system embodiments are substantially similar to the method embodiments, they are described relatively briefly, and for the relevant points reference may be made to the corresponding description of the method embodiments.

The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims

1. A machine learning method, the method comprising:

2. The method according to claim 1, wherein before the determining, according to a preset feature engineering policy, a target mapping relationship identifier corresponding to a to-be-processed data set of an execution end, the method further comprises:

the execution end sends the mapping relation information to a decision end;

3. The method according to claim 1, wherein the determining, according to a preset feature engineering policy, a target mapping relationship identifier corresponding to a to-be-processed data set at an execution end comprises:

4. The method according to claim 3, wherein the determining a target feature engineering policy corresponding to the data set to be processed from a plurality of different preset feature engineering policies comprises:

5. The method according to claim 1, wherein after the mapping the original features of the data set to be processed according to the target mapping relationship represented by the target mapping relationship identifier to obtain the target features of the data set to be processed, the method further comprises:

6. A machine learning system is characterized in that the machine learning system comprises a decision end and an execution end;

7. The system according to claim 6, wherein the execution end is further configured to acquire a mapping relationship and a mapping relationship identifier that are supported and implemented by the execution end, to obtain mapping relationship information, where the mapping relationship information is used to represent a corresponding relationship between the mapping relationship and the mapping relationship identifier that are supported and implemented by the execution end; sending the mapping relation information to a decision end;

8. The system according to claim 6, wherein the decision-making end is specifically configured to determine a target feature engineering policy corresponding to the data set to be processed from a plurality of preset feature engineering policies;

9. The system according to claim 8, wherein the decision-making end is specifically configured to determine, for each preset feature engineering strategy, an estimated score of the preset feature engineering strategy, where the estimated score is used to indicate a degree of dispersion of feature values in each dimension in a target feature obtained by feature engineering an original feature of a to-be-processed data set of an execution end according to the preset feature engineering strategy, and the estimated score is negatively correlated with the degree of dispersion;

10. The system according to claim 6, wherein the decision-making end is further configured to use a target feature of a to-be-processed data set as a new original feature of the to-be-processed data set, and return to perform the step of determining a target mapping relationship identifier corresponding to the to-be-processed data set of the execution end according to a preset feature engineering policy;