CN113435602A - Method and system for determining feature importance of machine learning sample - Google Patents

Info

Publication number: CN113435602A
Application number: CN202110542599.1A
Authority: CN (China)
Prior art keywords: feature, model, machine learning, features, importance
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Other languages: Chinese (zh)
Inventors: 罗远飞, 涂威威
Current assignee: 4Paradigm Beijing Technology Co Ltd (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Original assignee: 4Paradigm Beijing Technology Co Ltd
Application filed by 4Paradigm Beijing Technology Co Ltd
Priority: CN202110542599.1A (the priority date is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed)

Links

Images

Classifications

    • G06N 20/00 — Machine learning (G Physics; G06 Computing, calculating or counting; G06N Computing arrangements based on specific computational models)
    • G06F 18/2113 — Selection of the most significant subset of features by ranking or filtering the set of features, e.g. using a measure of variance or of feature cross-correlation (G06F Electric digital data processing; G06F 18/00 Pattern recognition)
    • G06F 18/214 — Generating training patterns; bootstrap methods, e.g. bagging or boosting


Abstract

A method and system for determining the feature importance of a machine learning sample are provided. The method comprises: (A) acquiring historical data records, wherein a historical data record comprises a label for a machine learning problem and at least one piece of attribute information used to generate the features of a machine learning sample; (B) training at least one feature pool model using the acquired historical data records, wherein a feature pool model is a machine learning model that provides a prediction result for the machine learning problem based on at least a part of the features; (C) acquiring the effect of the at least one feature pool model and determining the importance of each feature according to the acquired effect, wherein, in step (B), the feature pool model is trained by performing a discretization operation on at least one continuous feature among the at least a part of the features. With the method and system, the importance of each feature of a machine learning sample can be determined effectively.

Description

Method and system for determining feature importance of machine learning sample
The present application is a divisional application of the patent application entitled "Method and system for determining feature importance of machine learning samples", filed on November 1, 2016, with application No. 201610935697.0.
Technical Field
The present invention relates generally to the field of artificial intelligence, and more particularly, to a method and system for determining feature importance of a machine learning sample.
Background
With the advent of massive data, artificial intelligence techniques have evolved rapidly, and in order to extract value from the massive data, it is necessary to generate samples suitable for machine learning based on data records.
Here, each data record may be considered as a description of an event or object, corresponding to an example or sample. In a data record, various items are included that reflect the performance or nature of an event or object in some respect, and these items may be referred to as "attributes".
In practice, the prediction effect of a machine learning model depends on the choice of model, the available data, the extracted features, and so on. How the features of a machine learning sample are extracted from the various attributes of the raw data records therefore has a great influence on the effect of the model. Accordingly, it is highly desirable to know the importance of the various features of a machine learning sample, both for model training and for model understanding. For example, feature importance may be computed from a tree model trained with XGBoost, by calculating the expected splitting gain of each feature. Although this approach can account for interactions between features, its training cost is high, and different parameter settings can strongly affect the resulting importance values.
In fact, feature importance is difficult to determine intuitively: technicians must not only master machine learning knowledge but also understand the actual prediction problem in depth, and prediction problems are often entangled with the differing practical experience of different industries, so satisfactory results are difficult to achieve.
Disclosure of Invention
Exemplary embodiments of the present invention aim to overcome the deficiency of the prior art that it is difficult to efficiently determine the importance of the various features of a machine learning sample.
According to an exemplary embodiment of the present invention, there is provided a method of determining the importance of individual features of a machine learning sample, comprising: (A) acquiring historical data records, wherein a historical data record comprises a label for a machine learning problem and at least one piece of attribute information of each feature used to generate a machine learning sample; (B) training at least one feature pool model using the acquired historical data records, wherein a feature pool model is a machine learning model that provides a prediction result for the machine learning problem based on at least a part of the features among the individual features; (C) acquiring the effect of the at least one feature pool model, and determining the importance of the individual features according to the acquired effect of the at least one feature pool model, wherein, in step (B), the feature pool model is trained by performing a discretization operation on at least one continuous feature among the at least a part of the features.
Optionally, in the method, in step (C), the importance of the corresponding feature based on the feature pool model is determined according to a difference between effects of the feature pool model on the original test data set and a transformed test data set, where the transformed test data set refers to a data set obtained by replacing a value of a target feature whose importance is to be determined in the original test data set with one of: zero values, random values, values obtained by scrambling the order of the original values of the target features.
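The transformed-test-set comparison above can be sketched as follows. This is a minimal illustration, not the patent's implementation: `model` stands for any trained feature pool model (here just a callable scoring a row), a tiny pairwise AUC serves as the "effect" measure, and all names are illustrative.

```python
import random

def auc(labels, scores):
    """Pairwise AUC: fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def feature_importance(model, rows, labels, target_idx, mode="zero", seed=0):
    """Importance of feature `target_idx` = drop in AUC when its column is replaced.

    `mode` selects the replacement described in the patent: zero values,
    random values, or a shuffled (order-scrambled) copy of the original column.
    """
    base = auc(labels, [model(r) for r in rows])
    column = [r[target_idx] for r in rows]
    rng = random.Random(seed)
    if mode == "zero":
        new_col = [0.0] * len(column)
    elif mode == "random":
        new_col = [rng.uniform(min(column), max(column)) for _ in column]
    else:  # "shuffle"
        new_col = column[:]
        rng.shuffle(new_col)
    transformed = [r[:target_idx] + [v] + r[target_idx + 1:]
                   for r, v in zip(rows, new_col)]
    return base - auc(labels, [model(r) for r in transformed])
```

A larger drop in effect on the transformed test set indicates a more important feature; a feature the model ignores yields a drop of zero.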
Optionally, in the method, the at least one feature pool model includes an all-features model, where the all-features model refers to a machine learning model that provides a prediction result about a machine learning problem based on all of the features among the respective features.
Optionally, in the method, the at least one feature pool model comprises a plurality of machine learning models that provide a prediction result about the machine learning problem based on different feature groups, wherein in step (C), the importance of the respective features is determined according to a difference between effects of the at least one feature pool model on the original test data set.
Optionally, in the method, the at least one feature pool model includes one or more main feature pool models and at least one sub-feature pool model respectively corresponding to each main feature pool model, where a sub-feature pool model refers to a machine learning model that provides a prediction result about a machine learning problem based on remaining features except a target feature whose importance is to be determined among features based on which the corresponding main feature pool model is based, and in step (C), the importance of the corresponding target feature is determined according to a difference between effects of the main feature pool model and respective sub-feature pool models corresponding thereto on an original test data set.
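The main/sub feature pool comparison amounts to a leave-one-out scheme: train one model on a feature group and one on the group minus the target feature, then take the difference of their effects. Below is a dependency-free sketch in which "training" is replaced by a trivial sum-of-features scorer — a stand-in for the patent's learner, not the real thing.

```python
def auc(labels, scores):
    """Pairwise AUC: fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def train_pool_model(rows, feature_idxs):
    """Stand-in 'training': score a row by summing the pooled features.

    A real implementation would fit e.g. a logistic regression model on the
    selected feature columns; the sum keeps the sketch dependency-free.
    """
    return lambda row: sum(row[i] for i in feature_idxs)

def leave_one_out_importance(rows, labels, feature_idxs, target):
    """Importance of `target` = effect of the main model minus the effect of
    the sub model trained without the target feature."""
    main = train_pool_model(rows, feature_idxs)
    sub = train_pool_model(rows, [i for i in feature_idxs if i != target])
    main_auc = auc(labels, [main(r) for r in rows])
    sub_auc = auc(labels, [sub(r) for r in rows])
    return main_auc - sub_auc
```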
Optionally, in the method, the at least one feature pool model includes a plurality of single-feature models, wherein a single-feature model refers to a machine learning model that provides a prediction result regarding a machine learning problem based on a target feature whose importance is to be determined among the respective features, wherein in step (C), the importance of the corresponding target feature is determined according to a difference between effects of the single-feature models on the original test data set.
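For single-feature models, the effect of each model alone serves as the importance of its feature. In the minimal sketch below the single-feature "model" degenerates to ranking by the raw feature value (any monotone one-feature model ranks identically); a real system would fit an actual model per feature.

```python
def auc(labels, scores):
    """Pairwise AUC: fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def single_feature_importances(rows, labels):
    """AUC of each feature used alone as a ranking score, one per column."""
    n_features = len(rows[0])
    return [auc(labels, [r[i] for r in rows]) for i in range(n_features)]
```

A feature whose single-feature AUC is close to 0.5 carries little signal on its own; one close to 1.0 is individually predictive.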
Optionally, in the method, the discretization operation comprises a basic binning operation and at least one additional operation.
Optionally, in the method, the at least one additional operation comprises at least one of the following kinds of operations: logarithm operation, exponential operation, absolute value operation, Gaussian transformation operation.
Optionally, in the method, the at least one additional operation comprises an additional binning operation binning in the same manner as the basic binning operation but with different binning parameters; alternatively, the at least one additional operation comprises an additional binning operation in a different binning manner than the basic binning operation.
Optionally, in the method, the basic binning operation and the additional binning operation correspond to equal-width binning operations of different widths or equal-depth binning of different depths, respectively.
Optionally, in the method, the different widths or different depths numerically constitute a geometric progression or an arithmetic progression.
Optionally, in the method, the step of performing the basic binning operation and/or the additional binning operation comprises: additionally providing an outlier bin such that continuous feature values that are outliers are sorted into the outlier bin.
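The multi-granularity binning described above — a basic equal-width binning plus additional binnings whose widths form, e.g., a geometric progression, with a dedicated outlier bin — might look as follows. The parameter names and the choice of -1 as the outlier bin index are illustrative assumptions, not specified by the patent.

```python
def equal_width_bin(value, lo, hi, width, outlier_bin=-1):
    """Map a continuous value to an equal-width bin index over [lo, hi].

    Values outside the range (or NaN) go to a dedicated outlier bin, as the
    patent suggests for abnormal values.
    """
    if value != value or value < lo or value > hi:  # NaN or out of range
        return outlier_bin
    # Clamp so that value == hi falls into the last regular bin.
    return min(int((value - lo) / width), int((hi - lo) / width) - 1)

def multi_granularity_bins(value, lo, hi, base_width=1.0, ratio=2.0, levels=3):
    """Basic binning plus additional binnings whose widths form a geometric
    progression: base_width, base_width*ratio, base_width*ratio**2, ..."""
    return [equal_width_bin(value, lo, hi, base_width * ratio ** k)
            for k in range(levels)]
```

Each level yields one discrete feature per continuous input, so the downstream model sees the same value at several granularities at once.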
Optionally, in the method, in step (B), the feature pool model is trained based on a logistic regression algorithm.
Optionally, in the method, the effect of the feature pool model comprises AUC of the feature pool model.
Optionally, in the method, the original test data set is composed of the acquired historical data records, wherein, in step (B), the acquired historical data records are divided into a plurality of groups of historical data records so as to train each feature pool model group by group, and step (B) further includes: performing prediction on the next group of historical data records using the feature pool model trained on the current group of historical data records so as to obtain the grouped AUC corresponding to the next group, and synthesizing the grouped AUCs to obtain the AUC of the feature pool model, wherein, after the grouped AUC corresponding to the next group is obtained, the feature pool model trained on the current group continues to be trained using the next group of historical data records.
Optionally, in the method, in step (B), when the next group of historical data records includes missing historical data records that lack attribute information for at least a part of the features on which the feature pool model is based, the grouped AUC corresponding to the next group of historical data records is obtained in one of the following ways: calculating the grouped AUC using only the prediction results of the historical data records other than the missing historical data records in the next group; calculating the grouped AUC using the prediction results of all historical data records in the next group, wherein the prediction result of a missing historical data record is set to a default value determined based on the value range of the prediction results or based on the label distribution of the acquired historical data records; or multiplying the AUC calculated from the prediction results of the historical data records other than the missing historical data records in the next group by the proportion of those records in the next group to obtain the grouped AUC.
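The group-by-group "test then train" evaluation can be sketched as below. The `OnlineModel` class-means scorer is a stand-in for an incrementally trainable feature pool model (the patent does not prescribe it), grouped AUCs are synthesized by simple averaging, and the missing-record variants are omitted for brevity.

```python
def auc(labels, scores):
    """Pairwise AUC: fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return 0.5
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

class OnlineModel:
    """Stand-in for an incrementally trained feature pool model: scores a row
    by its alignment with the running (positive mean - negative mean) vector."""
    def __init__(self, n_features):
        self.sums = {0: [0.0] * n_features, 1: [0.0] * n_features}
        self.counts = {0: 0, 1: 0}

    def fit_group(self, rows, labels):
        for row, y in zip(rows, labels):
            self.counts[y] += 1
            for i, v in enumerate(row):
                self.sums[y][i] += v

    def score(self, row):
        def mean(y, i):
            return self.sums[y][i] / self.counts[y] if self.counts[y] else 0.0
        return sum(v * (mean(1, i) - mean(0, i)) for i, v in enumerate(row))

def progressive_auc(groups):
    """Test-then-train: predict each group with the model trained on all
    previous groups, record the grouped AUC, then continue training on that
    group; the model's AUC is the mean of the grouped AUCs."""
    model = OnlineModel(len(groups[0][0][0]))
    group_aucs = []
    for rows, labels in groups:
        if model.counts[0] and model.counts[1]:  # model has seen both classes
            group_aucs.append(auc(labels, [model.score(r) for r in rows]))
        model.fit_group(rows, labels)
    return sum(group_aucs) / len(group_aucs)
```

Because every record is scored before the model has trained on it, the synthesized AUC approximates held-out performance without setting aside a separate test set.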
Optionally, in the method, in step (B), when the feature pool model is trained based on a logistic regression algorithm, the regularization term set for continuous features is different from the regularization term set for discrete features.
Optionally, in the method, step (B) further comprises: providing an interface to a user for configuring at least one of the following items of the feature pool model: at least a part of the features based on the feature pool model, the algorithm type of the feature pool model, the algorithm parameters of the feature pool model, the operation type of the discretization operation, and the operation parameters of the discretization operation, and in the step (B), the feature pool model is trained according to the items configured by the user through the interface.
Optionally, in the method, in step (B), the interface is provided to the user in response to the user's instruction to determine feature importance.
Optionally, the method further comprises: (D) the importance of the determined individual characteristics is graphically presented to the user.
Optionally, in the method, in the step (D), the respective features are presented in order of importance of the features, and/or a part of the features among the respective features is highlighted, wherein the part of the features includes an important feature corresponding to a high importance, an unimportant feature corresponding to a low importance, and/or an abnormal feature corresponding to an abnormal importance.
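A minimal sketch of the presentation step: ordering features by importance and tagging the extremes for highlighting. Thresholding by rank is an assumption here; the patent leaves the highlighting criterion open.

```python
def summarize_importances(names, scores, top_k=1):
    """Order features by descending importance and tag the extremes,
    mirroring the suggestion to present features sorted and to highlight
    important / unimportant ones."""
    ranked = sorted(zip(names, scores), key=lambda p: p[1], reverse=True)
    return [(name, score,
             "important" if i < top_k else
             "unimportant" if i >= len(ranked) - top_k else "")
            for i, (name, score) in enumerate(ranked)]
```

The returned triples can feed any graphical front end, e.g. a bar chart with the tagged rows drawn in a highlight color.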
According to another exemplary embodiment of the invention, a system for determining the importance of individual features of a machine learning sample is provided, comprising: data record acquisition means for acquiring a history data record including a label on a machine learning problem and at least one attribute information for each feature used to generate a machine learning sample; the model training device is used for training at least one characteristic pool model by utilizing the acquired historical data records, wherein the characteristic pool model is a machine learning model which provides a prediction result about a machine learning problem based on at least one part of characteristics in the characteristics; and the importance determining device is used for acquiring the effect of the at least one characteristic pool model and determining the importance of each characteristic according to the acquired effect of the at least one characteristic pool model, wherein the model training device trains the characteristic pool model by performing discretization operation on at least one continuous characteristic in the at least one part of characteristics.
Optionally, in the system, the importance determination means determines the importance of the corresponding feature based on the feature pool model according to a difference between effects of the feature pool model on the original test data set and a transformed test data set, where the transformed test data set refers to a data set obtained by replacing a value of a target feature whose importance is to be determined in the original test data set with one of: zero values, random values, values obtained by scrambling the order of the original values of the target features.
Optionally, in the system, the at least one feature pool model includes an all-features model, where the all-features model refers to a machine learning model that provides a prediction result about a machine learning problem based on all of the features among the individual features.
Optionally, in the system, the at least one feature pool model comprises a plurality of machine learning models that provide a prediction result about the machine learning problem based on different feature groups, wherein the importance determination means determines the importance of the respective feature according to a difference between effects of the at least one feature pool model on the raw test data set.
Optionally, in the system, the at least one feature pool model includes one or more main feature pool models and at least one sub-feature pool model corresponding to each main feature pool model, wherein a sub-feature pool model refers to a machine learning model that provides a prediction result about a machine learning problem based on remaining features except a target feature whose importance is to be determined among features based on which the corresponding main feature pool model is based, and wherein the importance determination means determines the importance of the corresponding target feature according to a difference between effects of the main feature pool model and the respective sub-feature pool models on the original test data set.
Optionally, in the system, the at least one feature pool model includes a plurality of single-feature models, where a single-feature model refers to a machine learning model that provides a prediction result regarding a machine learning problem based on a target feature whose importance is to be determined among the respective features, and the importance determination means determines the importance of the corresponding target feature according to a difference between effects of the single-feature models on the original test data set.
Optionally, in the system, the discretization operation comprises a basic binning operation and at least one additional operation.
Optionally, in the system, the at least one additional operation comprises at least one of the following kinds of operations: logarithm operation, exponential operation, absolute value operation, Gaussian transformation operation.
Optionally, in the system, the at least one additional operation comprises an additional binning operation binning in the same manner as the basic binning operation but with different binning parameters; alternatively, the at least one additional operation comprises an additional binning operation in a different binning manner than the basic binning operation.
Optionally, in the system, the basic binning operation and the additional binning operation correspond to equal-width binning operations of different widths or equal-depth binning of different depths, respectively.
Optionally, in the system, the different widths or different depths numerically constitute a geometric progression or an arithmetic progression.
Optionally, in the system, the step of performing the basic binning operation and/or the additional binning operation comprises: additionally providing an outlier bin such that continuous feature values that are outliers are sorted into the outlier bin.
Optionally, in the system, the model training device trains the feature pool model based on a logistic regression algorithm.
Optionally, in the system, the effect of the feature pool model comprises an AUC of the feature pool model.
Optionally, in the system, the original test data set is composed of the acquired historical data records, wherein the model training device divides the acquired historical data records into a plurality of groups of historical data records so as to train each feature pool model group by group, and the model training device further performs prediction on the next group of historical data records using the feature pool model trained on the current group of historical data records so as to obtain the grouped AUC corresponding to the next group, and synthesizes the grouped AUCs to obtain the AUC of the feature pool model, wherein, after the grouped AUC corresponding to the next group is obtained, the feature pool model trained on the current group continues to be trained using the next group of historical data records.
Optionally, in the system, when the next group of historical data records includes missing historical data records that lack attribute information for at least a part of the features on which the feature pool model is based, the model training device obtains the grouped AUC corresponding to the next group of historical data records in one of the following ways: calculating the grouped AUC using only the prediction results of the historical data records other than the missing historical data records in the next group; calculating the grouped AUC using the prediction results of all historical data records in the next group, wherein the prediction result of a missing historical data record is set to a default value determined based on the value range of the prediction results or based on the label distribution of the acquired historical data records; or multiplying the AUC calculated from the prediction results of the historical data records other than the missing historical data records in the next group by the proportion of those records in the next group to obtain the grouped AUC.
Optionally, in the system, when the model training device trains the feature pool model based on a logistic regression algorithm, the regularization term set for continuous features is different from the regularization term set for discrete features.
Optionally, the system further comprises: a display device, wherein the model training device further controls the display device to provide an interface for the user to configure at least one of the following items of the feature pool model: the at least a part of features on which the feature pool model is based, the algorithm type of the feature pool model, the algorithm parameters of the feature pool model, the operation type of the discretization operation, and the operation parameters of the discretization operation, and the model training device trains the feature pool model according to the items configured by the user through the interface.
Optionally, in the system, the model training device controls the display device to provide the interface to the user in response to the user's instruction to determine feature importance.
Optionally, in the system, the display means also graphically presents the determined importance of each feature to the user.
Optionally, in the system, the display means presents the respective features in order of importance of the features, and/or highlights a part of the features among the respective features, wherein the part of the features includes an important feature corresponding to a high importance, an unimportant feature corresponding to a low importance, and/or an abnormal feature corresponding to an abnormal importance.
According to another exemplary embodiment of the present invention, a computing apparatus for determining the importance of individual features of a machine learning sample is provided, comprising a processor and a storage component, the storage component having stored therein a set of computer-executable instructions which, when executed by the processor, perform the following steps: (A) acquiring historical data records, wherein a historical data record comprises a label for a machine learning problem and at least one piece of attribute information of each feature used to generate a machine learning sample; (B) training at least one feature pool model using the acquired historical data records, wherein a feature pool model is a machine learning model that provides a prediction result for the machine learning problem based on at least a part of the features among the individual features; (C) acquiring the effect of the at least one feature pool model, and determining the importance of each feature according to the acquired effect of the at least one feature pool model, wherein, in step (B), the feature pool model is trained by performing a discretization operation on at least one continuous feature among the at least a part of the features.
Optionally, in the computing apparatus, in step (C), the importance of the corresponding feature based on the feature pool model is determined according to a difference between effects of the feature pool model on the original test data set and a transformed test data set, where the transformed test data set refers to a data set obtained by replacing a value of a target feature whose importance is to be determined in the original test data set with one of: zero values, random values, values obtained by scrambling the order of the original values of the target features.
Optionally, in the computing apparatus, the at least one feature pool model includes an all-feature model, where the all-feature model refers to a machine learning model that provides a prediction result about a machine learning problem based on all of the features among the respective features.
Optionally, in the computing device, the at least one feature pool model comprises a plurality of machine learning models that provide a prediction result about a machine learning problem based on different feature groups, wherein in step (C), the importance of the respective features is determined according to a difference between effects of the at least one feature pool model on the original test data set.
Optionally, in the computing apparatus, the at least one feature pool model includes one or more main feature pool models and at least one sub-feature pool model respectively corresponding to each main feature pool model, where a sub-feature pool model refers to a machine learning model that provides a prediction result about a machine learning problem based on remaining features except for a target feature whose importance is to be determined among features based on which the corresponding main feature pool model is based, and in step (C), the importance of the corresponding target feature is determined according to a difference between effects of the main feature pool model and the respective sub-feature pool models corresponding thereto on an original test data set.
Optionally, in the computing apparatus, the at least one feature pool model includes a plurality of single-feature models, wherein a single-feature model refers to a machine learning model that provides a prediction result regarding a machine learning problem based on a target feature whose importance is to be determined among the respective features, wherein in step (C), the importance of the corresponding target feature is determined according to a difference between effects of the single-feature models on the original test data set.
Optionally, in the computing device, the discretization operation comprises a basic binning operation and at least one additional operation.
Optionally, in the computing device, the at least one additional operation comprises at least one of the following kinds of operations: logarithm operation, exponential operation, absolute value operation, Gaussian transformation operation.
Optionally, in the computing device, the at least one additional operation comprises an additional binning operation in the same manner as the basic binning operation but with different binning parameters; alternatively, the at least one additional operation comprises an additional binning operation in a different binning manner than the basic binning operation.
Optionally, in the computing device, the basic binning operation and the additional binning operation correspond to equal-width binning operations of different widths or equal-depth binning of different depths, respectively.
Optionally, in the computing device, the different widths or different depths numerically constitute a geometric progression or an arithmetic progression.
Optionally, in the computing device, the step of performing the basic binning operation and/or the additional binning operation comprises: additionally providing an outlier bin such that continuous feature values that are outliers are sorted into the outlier bin.
Optionally, in the computing device, in step (B), the feature pool model is trained based on a logistic regression algorithm.
Optionally, in the computing device, the effect of the feature pool model comprises an AUC of the feature pool model.
Optionally, in the computing device, the original test data set is composed of the acquired historical data records, wherein, in step (B), the acquired historical data records are divided into a plurality of groups of historical data records so as to train each feature pool model group by group, and step (B) further includes: performing prediction on the next group of historical data records using the feature pool model trained on the current group of historical data records so as to obtain the grouped AUC corresponding to the next group, and synthesizing the grouped AUCs to obtain the AUC of the feature pool model, wherein, after the grouped AUC corresponding to the next group is obtained, the feature pool model trained on the current group continues to be trained using the next group of historical data records.
Optionally, in the computing apparatus, in step (B), when the next group of historical data records includes missing historical data records that lack attribute information for at least a part of the features on which the feature pool model is based, then, when performing prediction on the next group using the feature pool model trained on the current group, the grouped AUC corresponding to the next group is obtained in one of the following ways: calculating the grouped AUC using only the prediction results of the historical data records other than the missing historical data records in the next group; calculating the grouped AUC using the prediction results of all historical data records in the next group, wherein the prediction result of a missing historical data record is set to a default value determined based on the value range of the prediction results or based on the label distribution of the acquired historical data records; or multiplying the AUC calculated from the prediction results of the historical data records other than the missing historical data records in the next group by the proportion of those records in the next group to obtain the grouped AUC.
Optionally, in the computing device, in step (B), when the feature pool model is trained based on a logistic regression algorithm, the regularization term set for continuous features differs from the regularization term set for discrete features.
Optionally, in the computing device, step (B) further comprises: providing an interface to a user for configuring at least one of the following items of the feature pool model: the at least a part of features on which the feature pool model is based, the algorithm type of the feature pool model, the algorithm parameters of the feature pool model, the operation type of the discretization operation, and the operation parameters of the discretization operation, wherein in step (B) the feature pool model is trained according to the items configured by the user through the interface.
Optionally, in the computing device, in step (B), the interface is provided to the user in response to an indication by the user regarding the determination of feature importance.
Optionally, in the computing device, when the set of computer-executable instructions is executed by the processor, the following steps are further performed: (D) the determined importance of the individual features is graphically presented to the user.
Optionally, in the computing device, in the step (D), the respective features are presented in order of importance of the features, and/or a part of the features among the respective features is highlighted, wherein the part of the features includes an important feature corresponding to a high importance, an unimportant feature corresponding to a low importance, and/or an abnormal feature corresponding to an abnormal importance.
In the method and system for determining the feature importance of machine learning samples according to the exemplary embodiments of the present invention, the importance of each feature is determined from the effect of a feature pool model based on at least a part of the features of the machine learning sample, wherein continuous features among the at least a part of features are discretized when training the feature pool model, so that the effect of the feature pool model effectively reflects the importance of the relevant features and the importance of each feature can be effectively obtained.
Drawings
These and/or other aspects and advantages of the present invention will become more apparent and more readily appreciated from the following detailed description of the embodiments of the invention, taken in conjunction with the accompanying drawings of which:
FIG. 1 illustrates a block diagram of a system for determining feature importance of machine learning samples according to an exemplary embodiment of the present invention;
FIG. 2 illustrates a flow diagram of a method of determining feature importance of machine learning samples according to an exemplary embodiment of the present invention;
FIG. 3 shows a flowchart of a method of determining feature importance of machine learning samples according to another example embodiment of the present invention;
FIG. 4 illustrates an example of a feature importance presentation interface in accordance with an exemplary embodiment of the present invention; and
fig. 5 illustrates an example of a feature importance presentation interface according to another exemplary embodiment of the present invention.
Detailed Description
In order that those skilled in the art will better understand the present invention, exemplary embodiments of the present invention will be described in further detail below with reference to the accompanying drawings and specific embodiments.
In an exemplary embodiment of the present invention, the feature importance is determined by: training a feature pool model based on at least a portion of features of the machine learning samples, wherein successive features are subject to a discretization process. On the basis, the importance of each feature is measured based on the prediction effect of the feature pool model.
Here, machine learning is a natural product of the development of artificial intelligence research, and aims to improve the performance of a system by computational means, using experience. In a computer system, "experience" usually exists in the form of "data", from which a "model" can be generated by a machine learning algorithm: by providing experience data to a machine learning algorithm, a model is generated based on those data, and when faced with a new situation the model provides a corresponding judgment, i.e., a prediction result. Whether training a machine learning model or making predictions with a trained one, the data needs to be converted into machine learning samples that include various features. Machine learning may be implemented as "supervised learning", "unsupervised learning", or "semi-supervised learning"; it should be noted that the present invention is not limited to any specific machine learning algorithm. In addition, other means such as statistical algorithms may also be incorporated in the process of training and applying a model.
Fig. 1 illustrates a block diagram of a system for determining feature importance of machine learning samples according to an exemplary embodiment of the present invention. Specifically, the feature importance determination system measures the importance of each corresponding feature by using the prediction effect of a feature pool model based on at least a part of the features, wherein at least a part of original continuous features based on the feature pool model are subjected to discretization processing. In this way, the importance of individual features (particularly consecutive features) can be determined more efficiently.
The system shown in fig. 1 may be implemented entirely by a computer program, as a software program, as a dedicated hardware device, or as a combination of software and hardware. Accordingly, each device constituting the system shown in fig. 1 may be a virtual module that realizes the corresponding function only by means of a computer program, may be a general-purpose or dedicated device that realizes the function by means of a hardware structure, or may be a processor or the like on which the corresponding computer program runs. With the system, the importance of various features of the machine learning samples can be determined, and the importance information is helpful for model training and/or model interpretation.
As shown in fig. 1, the data record acquisition apparatus 100 is configured to acquire a history data record, wherein the history data record includes a label about a machine learning problem and at least one attribute information of each feature used for generating a machine learning sample.
The historical data records may be data generated online, data generated and stored in advance, or data received from an external device through an input device or transmission medium, for example, data received by a cloud from a client or by a client from a cloud. Such data may relate to information about an individual, business, or organization, such as identity, educational background, occupation, assets, contact details, liabilities, income, profit, tax, and the like. Alternatively, the data may relate to business-related items, such as the transaction amount, transaction parties, subject matter, and transaction location of a contract. It should be noted that the attribute information mentioned in the exemplary embodiments of the present invention may relate to the performance or nature of any object or matter in some respect, and is not limited to defining or describing individuals, objects, organizations, units, institutions, items, events, and so forth.
The data record acquisition device 100 may acquire structured or unstructured data from different sources, such as text data or numerical data. The acquired historical data records may be used to form machine learning samples, participate in the training and/or testing of machine learning models. Such data may originate from within an entity desiring to apply machine learning, e.g., from a bank, business, school, etc. desiring to apply machine learning; such data may also originate from other than the aforementioned entities, such as from data providers, the internet (e.g., social networking sites), mobile operators, APP operators, courier companies, credit agencies, and so forth. Optionally, the internal data and the external data can be used in combination to form a machine learning sample carrying more information, thereby facilitating the discovery of more important features.
The data may be input to the data record obtaining apparatus 100 through an input device, or automatically generated by the data record obtaining apparatus 100 according to the existing data, or may be obtained by the data record obtaining apparatus 100 from a network (e.g., a storage medium (e.g., a data warehouse) on the network), and furthermore, an intermediate data exchange device such as a server may facilitate the data record obtaining apparatus 100 to obtain the corresponding data from an external data source. Here, the acquired data may be converted into a format that is easy to handle by a data conversion module such as a text analysis module in the data record acquisition apparatus 100. That is, the data record acquisition device 100 may be a device having the capability of receiving and processing data records, or may simply be a device that provides data records that are already prepared. It should be noted that the data record acquisition apparatus 100 may be configured as various modules composed of software, hardware, and/or firmware, and some or all of these modules may be integrated or cooperate together to accomplish a specific function.
The model training apparatus 200 is configured to train at least one feature pool model using the acquired historical data record, where the feature pool model refers to a machine learning model that provides a prediction result regarding a machine learning problem based on at least a part of features among the respective features, and the model training apparatus 200 trains the feature pool model by performing a discretization operation on at least one continuous feature among the at least a part of features.
Here, the feature pool model is designed based on at least a part of the features of the machine learning sample, and accordingly, the model training apparatus 200 may generate training samples of the feature pool model based on the historical data records. Specifically, assume that a historical data record has attribute information {p1, p2, ..., pm}; based on this attribute information and the label, machine learning samples corresponding to the machine learning problem can be generated, which will be applied to model training and/or testing for that problem. In particular, the feature part of a machine learning sample can be expressed as {f1, f2, ..., fn}, where n is a positive integer, and exemplary embodiments of the present invention are directed to determining the degree of importance of each feature in {f1, f2, ..., fn}. To this end, the model training apparatus 200 is required to train a feature pool model that provides a prediction result about the machine learning problem based on at least a part of the features, where the model training apparatus 200 may select at least a part of the features from {f1, f2, ..., fn} as the features of the training samples of the feature pool model, and use the labels of the corresponding historical data records as the labels of the training samples. According to an exemplary embodiment of the present invention, some or all of the continuous features among the selected features are subjected to discretization processing.
Here, the model training apparatus 200 may train one or more feature pool models. The importance of a feature may be obtained by comparing the prediction effect of the same feature pool model (which may be based on all or a part of the features of the machine learning sample) on an original test data set and on a transformed test data set, where the transformed test data set is obtained by transforming the values of certain target features in the original test data set, so that the difference in prediction effect reflects the predictive contribution, i.e., the importance, of the target features. Alternatively, the importance of features may be derived from the difference in the prediction effect of different feature pool models on the same test data set (i.e., the original test data set), where the different feature pool models may be designed based on different combinations of features, so that the difference in prediction effect reflects the respective predictive contributions, i.e., importance, of different features. In particular, a single-feature model can be trained for each feature of the machine learning sample, and the prediction effect of that single-feature model can then represent the importance of the feature on which it is based. It should be noted that the above two ways of measuring feature importance can be used alone or in combination.
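As a minimal sketch of the single-feature case: since AUC is invariant under monotone transformations of the score, the test AUC of a single-feature model that is monotone in its input (for example, a logistic regression on one feature) can be approximated by ranking records with the raw feature value itself. The helper and function names below are illustrative assumptions, not part of the described system:

```python
def auc(y_true, y_score):
    """Rank-based AUC: probability a random positive outranks a random negative."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def single_feature_importance(features, y_true):
    """features: dict of feature name -> list of values (one per test record).
    Returns each feature's standalone ranking power as a proxy importance."""
    out = {}
    for name, vals in features.items():
        a = auc(y_true, vals)
        # fold direction: a perfectly anti-correlated feature is also predictive
        out[name] = max(a, 1.0 - a)
    return out
```

A feature that perfectly orders the labels scores 1.0; a constant feature scores 0.5, the level of random guessing.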
As described above, according to the exemplary embodiment of the present invention, when training the feature pool model, the model training apparatus 200 may train the feature pool model by performing a discretization operation on at least one continuous feature, where the model training apparatus 200 may process the continuous feature in any suitable discretization manner, so that the feature pool model trained based on the discretized continuous feature (or along with other features) can better reflect the importance of each feature.
Here, as an example, the discretization operation may include a basic binning operation and at least one additional operation, and accordingly, the model training apparatus 200 may perform the basic binning operation and the at least one additional operation for each of some continuous features according to which the feature pool model is based when training the feature pool model, to generate a basic binning feature and at least one additional feature corresponding to each continuous feature.
Here, among the features of the machine learning sample, there may be continuous features generated based on at least a part of the attribute information of the data records, where a continuous feature is a feature as opposed to a discrete feature (e.g., a category feature), and its value may be a numerical value having continuity, such as a distance, an age, or an amount. In contrast, the values of discrete features have no continuity; for example, they may be unordered categories such as "from Beijing", "from Shanghai", or "from Tianjin", or "gender is male" and "gender is female".
For example, some continuous value attribute in the history data record can be directly used as a corresponding continuous feature in the machine learning sample, for example, the attributes of distance, age, amount, etc. can be directly used as the corresponding continuous feature. Furthermore, certain attributes (e.g., continuous attributes and/or discrete attributes) in the history data record may also be processed to obtain corresponding continuous features, for example, a height to weight ratio as the corresponding continuous features.
It should be noted that in addition to the continuous features that will be subjected to the basic binning operation and the additional operation, the training samples of the feature pool model may also include other continuous features and/or discrete features included from the machine learning samples, wherein the other continuous features may participate in the training of the feature pool model without undergoing the discretization operation.
It can be seen that according to an exemplary embodiment of the present invention, for each successive feature to be subjected to the basic binning operation, at least one additional operation may additionally be performed, thereby enabling to obtain multiple features characterizing certain properties of the original data record from different angles, scales/layers simultaneously.
Here, the binning operation is a specific method of discretizing a continuous feature, that is, dividing a value range of the continuous feature into a plurality of sections (i.e., a plurality of bins), and determining a corresponding binning feature value based on the divided bins. Binning operations can be broadly divided into supervised binning and unsupervised binning, each of which includes some specific binning modes, e.g., supervised binning includes minimum entropy binning, minimum description length binning, etc., while unsupervised binning includes equal width binning, equal depth binning, k-means cluster-based binning, etc. In each binning mode, corresponding binning parameters, such as width, depth, etc., may be set. It should be noted that, according to the exemplary embodiment of the present invention, the binning operation performed by the model training apparatus 200 is not limited to the kind of binning manner nor to the parameters of the binning operation, and the specific representation manner of the binning features generated accordingly is also not limited.
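The two unsupervised binning modes named above can be contrasted in a few lines; the bin counts, range handling, and function names below are simplifying assumptions for illustration:

```python
def equal_width_bin(x, lo, hi, n_bins):
    """Equal-width binning: split [lo, hi] into n_bins intervals of equal width."""
    width = (hi - lo) / n_bins
    return min(int((x - lo) // width), n_bins - 1)  # clamp x == hi into the last bin

def equal_depth_bins(values, n_bins):
    """Equal-depth binning: each bin receives roughly the same number of values.
    Returns the bin index of every value, in the original input order."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    depth = len(values) / n_bins
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = min(int(rank // depth), n_bins - 1)
    return bins
```

Equal-width depends only on the value range (so it can be computed online, record by record), while equal-depth depends on the empirical distribution of all values in the set being binned.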
In addition to performing the basic binning operation, the model training apparatus 200 may perform at least one additional operation on the continuous features, where an additional operation may be any functional operation that generates continuous or discrete features; for example, it may be a logarithmic operation, an exponential operation, an absolute value operation, or the like. In particular, an additional operation may itself be a binning operation (referred to as an "additional binning operation"), which differs from the basic binning operation in binning mode and/or binning parameters. Thus, the at least one additional operation may consist of operations of the same or different kinds, each under the same or different operation parameters (e.g., the exponent in an exponential operation, the base in a logarithmic operation, the depth or width in a binning operation); an additional operation may be an expression whose main body is a logarithmic, exponential, or absolute value operation, or a combination of multiple operations.
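A sketch of how one continuous value might fan out into a basic binned feature plus several additional features viewing the same value at other scales; the particular operation set, widths, and value range are illustrative assumptions:

```python
import math

def discretize_with_additions(x, lo=0.0, hi=100.0):
    """Turn one continuous value into a basic binning feature plus additional
    features characterizing the same quantity from other angles/scales."""
    def bin_id(v, width):
        n_bins = int((hi - lo) // width)
        return min(int((v - lo) // width), n_bins - 1)
    return {
        "basic_bin_w10": bin_id(x, 10),   # basic binning operation (width 10)
        "extra_bin_w2": bin_id(x, 2),     # additional binning operation, finer width
        "log": math.log1p(abs(x)),        # logarithmic operation
        "abs": abs(x),                    # absolute value operation
    }
```

For the running example value 61.5, the basic width-10 bin is 6 while the additional width-2 bin is 30, so the two binnings expose the same value at a coarse and a fine granularity simultaneously.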
In this way, the model training apparatus 200 can convert each of at least a portion of the continuous features into the basic bin features and the corresponding at least one additional feature, thereby improving the effectiveness of the machine learning material for the feature pool model and providing a better basis for the subsequent feature importance determination.
Next, the model training apparatus 200 may generate a training sample including at least the generated basic binned features and at least one additional feature for training the corresponding feature pool model. Here, in the training sample, in addition to the basic binning feature and the additional feature generated by the model training apparatus 200, any other feature may be included, wherein the other feature may be a feature belonging to a machine learning sample that should be generated based on a history data record.
The model training apparatus 200 may train the feature pool model based on the training samples described above. Here, the model training apparatus 200 may learn an appropriate feature pool model from the training samples using a suitable machine learning algorithm (e.g., logistic regression).
The importance determination device 300 is configured to obtain an effect of the trained at least one feature pool model, and determine the importance of each feature according to the obtained effect of the at least one feature pool model. Here, the importance determination apparatus 300 may acquire the effect of the feature pool model by applying the trained feature pool model to the corresponding test data set, and may also receive the effect of the feature pool model from other parties connected thereto.
In particular, the performance of the feature pool model on the test set may serve as a predictive effect for the feature pool model, and this predictive effect may be used to measure the predictive power of the feature group on which the feature pool model is based. By measuring the effect difference of different characteristic pool models on the original test data set or the effect difference of the same characteristic pool model on different test characteristics, the importance of each characteristic of the machine learning sample can be comprehensively obtained.
Here, as an example, the effect of the feature pool model may include the AUC (Area Under the ROC (Receiver Operating Characteristic) Curve) of the feature pool model.
For example, assume that the features on which a certain feature pool model is based are three features {f1, f3, f5} among the feature part {f1, f2, ..., fn} of the machine learning samples, and that the continuous feature f1 among them is subjected to discretization in the training samples of the feature pool model; accordingly, the AUC of this feature pool model on the test data set can reflect the predictive power of the feature combination {f1, f3, f5}. In addition, assume that another feature pool model is based on the two features {f1, f3}, where, likewise, the continuous feature f1 is discretized; the AUC of that feature pool model on the test data set can reflect the predictive power of the feature combination {f1, f3}. On this basis, the difference between the two AUCs can be used to reflect the importance of the feature f5.
As another example, assume that the features on which a certain feature pool model is based are three features {f1, f3, f5} among the feature part {f1, f2, ..., fn} of the machine learning samples, and that the continuous feature f1 among them is subjected to discretization in the training samples of the feature pool model; accordingly, the AUC of the feature pool model on the original test data set can reflect the predictive power of the feature combination {f1, f3, f5}. Here, to determine the importance of the target feature f5, the values of the feature f5 in each test sample of the original test data set are processed to obtain a transformed test data set, and the AUC of the feature pool model on the transformed test data set is then obtained. On this basis, the difference between the two AUCs can be used to reflect the importance of the target feature f5. As an example, in the transformation process, the value of the feature f5 in each original test sample may be replaced by a zero value or a random value, or the original values of the feature f5 may be shuffled in order across the test samples.
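The transformed-test-set comparison can be sketched end to end in a few lines; the fixed linear stand-in for a trained feature pool model and the zero-value transformation are illustrative assumptions (random values or shuffling the column would work the same way):

```python
def auc(y_true, y_score):
    """Rank-based AUC: probability a random positive outranks a random negative."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def zero_out_importance(model, X, y, col):
    """Importance of feature `col` = AUC on the original test set minus AUC on
    a copy in which that feature's values are replaced by zero."""
    base = auc(y, [model(row) for row in X])
    X_t = [row[:col] + [0.0] + row[col + 1:] for row in X]
    return base - auc(y, [model(row) for row in X_t])

# Stand-in "feature pool model": a fixed linear score over two features,
# leaning heavily on the first one.
model = lambda row: 0.9 * row[0] + 0.1 * row[1]
X = [[0.1, 0.9], [0.9, 0.2], [0.2, 0.1], [0.8, 0.7]]
y = [0, 1, 0, 1]
```

With this toy data, zeroing the first feature costs 0.5 of AUC while zeroing the second costs nothing, mirroring their weights in the stand-in model.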
It should be understood that each of the above-described devices may be individually configured as software, hardware, firmware, or any combination thereof that performs the specified function. These means may correspond, for example, to an application-specific integrated circuit, to pure software code, or to a combination of software and hardware elements or modules. Further, one or more functions implemented by the apparatuses may also be performed in a unified manner by components in a physical device (e.g., a processor, a client, a server, or the like).
A flowchart of a method of determining the feature importance of machine learning samples according to an exemplary embodiment of the present invention is described below with reference to fig. 2. Here, the method shown in fig. 2 may be performed by the feature importance determination system shown in fig. 1, may be implemented entirely in software by a computer program, or may be performed by a specifically configured computing device. For convenience of description, it is assumed that the method shown in fig. 2 is performed by the feature importance determination system shown in fig. 1.
As shown, in step S100, a history data record including a label about a machine learning problem and at least one attribute information of each feature used to generate a machine learning sample is acquired by the data record acquisition apparatus 100.
Here, the history data record is a real record about the machine learning problem desired to be predicted, which includes both the attribute information and the label, such history data record will be used to form the machine learning sample as a material of the machine learning, and the exemplary embodiment of the present invention is intended to determine the degree of importance of each feature in the formed machine learning sample.
Specifically, as an example, the data record obtaining apparatus 100 may collect the historical data in a manual, semi-automatic or fully automatic manner, or process the collected raw historical data so that the processed historical data record has a proper format or form. As an example, the data record acquisition apparatus 100 may collect the history data in a batch.
Here, the data record acquisition apparatus 100 may receive the history data record manually input by the user through an input apparatus (e.g., a workstation). Further, the data record acquisition device 100 may systematically retrieve the historical data records from the data source in a fully automated manner, for example, by systematically requesting the data source and obtaining the requested historical data from the response via a timer mechanism implemented in software, firmware, hardware, or a combination thereof. The data sources may include one or more databases or other servers. The manner in which the data is obtained in a fully automated manner may be implemented via an internal network and/or an external network, which may include transmitting encrypted data over the internet. Where servers, databases, networks, etc. are configured to communicate with one another, data collection may be automated without human intervention, but it should be noted that there may still be some user input action in this manner. The semi-automatic mode is between the manual mode and the full-automatic mode. The semi-automatic mode differs from the fully automatic mode in that a trigger mechanism activated by the user replaces, for example, a timer mechanism. In this case, the request for extracting data is generated only in the case where a specific user input is received. Each time data is acquired, the captured historical data may preferably be stored in a non-volatile memory. As an example, a data warehouse may be utilized to store raw data collected during acquisition as well as processed data.
The obtained historical data records can be derived from the same or different data sources; that is, each historical data record can also be the concatenation of different data records. For example, in addition to the information records filled in when a customer applies to a bank to open a credit card (which include attribute information fields such as income, educational background, position, and property status), the data record obtaining device 100 may also obtain the customer's other records at the bank, such as loan records and daily transaction data, and these obtained records may be spliced into a complete historical data record along with a label as to whether the customer is a fraudulent customer. Furthermore, the data record obtaining device 100 can also obtain data from other private or public sources, such as data from a data provider, from the internet (e.g., social networking sites), from a mobile operator, from an APP operator, from an express company, from a credit agency, and so on.
Optionally, the data record obtaining apparatus 100 may store and/or process the collected data by means of a hardware cluster (such as a Hadoop cluster, a Spark cluster, etc.), for example, store, classify, and otherwise operate offline. In addition, the data record acquisition device 100 may perform online streaming processing on the acquired data.
As an example, a data conversion module such as a text analysis module may be included in the data record obtaining device 100, and accordingly, in step S100, the data record obtaining device 100 may convert unstructured data such as text into structured data that is easier to use for further processing or reference. Text-based data may include emails, documents, web pages, graphics, spreadsheets, call center logs, transaction reports, and the like.
Next, in step S200, at least one feature pool model is trained by the model training device 200 using the acquired historical data record, wherein the feature pool model refers to a machine learning model that provides a prediction result regarding a machine learning problem based on at least a part of features among the respective features, and wherein the feature pool model is trained by performing a discretization operation on at least one continuous feature among the at least a part of features.
Here, the model training device 200 may perform any appropriate discretization operation for the at least one continuous feature, respectively, and as an example, the model training device 200 may perform a basic binning operation and at least one additional operation to generate a basic binning feature and at least one additional feature corresponding to each continuous feature, respectively, and the generated basic binning feature and the at least one additional feature may constitute at least a part of features of a training sample of the feature pool model as discretized features.
As described above, a continuous feature serves as a feature in the machine learning sample and may be generated from at least a part of the attribute information of the historical data records; for example, continuously valued attribute information such as distance, age, and amount may be used directly as continuous features, or some attribute information may be further processed, for example, taking the ratio of height to weight as a continuous feature.
After the continuous features are obtained, basic binning may be performed on the obtained continuous features by the model training apparatus 200, where the model training apparatus 200 may perform basic binning in various binning manners and/or binning parameters.
Taking unsupervised equal-width binning as an example, assume the value interval of the continuous feature is [0, 100]. If the corresponding binning parameter (i.e., the width) is 50, 2 bins are obtained; in this case a continuous feature with a value of 61.5 falls into the 2nd bin, and if the two bins are labeled 0 and 1, the bin label corresponding to this feature is 1. Alternatively, with a bin width of 10, 10 bins are obtained; a value of 61.5 then falls into the 7th bin, and if the ten bins are numbered 0 to 9, the corresponding bin number is 6. Alternatively, with a bin width of 2, 50 bins are obtained; a value of 61.5 then falls into the 31st bin, and if the fifty bins are numbered 0 to 49, the corresponding bin number is 30. As an example, the bin number of a specific continuous feature and the corresponding feature value can be determined by online calculation, without looking up a mapping table, thereby saving storage space.
After mapping a continuous feature to multiple bins, the corresponding feature value may be any custom-defined value. That is, the basic binning operation generates a multi-dimensional basic binning feature corresponding to the continuous feature, where each dimension may indicate whether the corresponding bin is hit by the continuous feature, for example, with "1" indicating that the continuous feature is sorted into the corresponding bin and "0" indicating that it is not. Accordingly, in the above example with 10 bins, the basic binning feature is a 10-dimensional feature, and the basic binning feature corresponding to a continuous feature with a value of 61.5 may be represented as [0, 0, 0, 0, 0, 0, 1, 0, 0, 0]. Alternatively, each dimension may indicate the feature value of the respective continuous feature sorted into the corresponding bin; accordingly, in the above example, the basic binning feature corresponding to the continuous feature with a value of 61.5 may be represented as [0, 0, 0, 0, 0, 0, 61.5, 0, 0, 0]. Or, each dimension may indicate the average of the feature values of all continuous features sorted into the corresponding bin; or the median of those feature values; or a boundary value of those feature values, where the boundary value may be an upper boundary value or a lower boundary value.
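The indicator and value representations above can be illustrated with a small helper (a hypothetical sketch assuming equal-width bins, not the patent's implementation):

```python
def binning_feature(x, lo, hi, n_bins, mode="indicator"):
    """Build the multi-dimensional basic binning feature of a single value x
    over [lo, hi] with n_bins equal-width bins: mode 'indicator' places 1 in
    the hit bin, mode 'value' places x itself there; other dims stay 0."""
    width = (hi - lo) / n_bins
    bin_id = min(int((x - lo) // width), n_bins - 1)
    vec = [0.0] * n_bins
    vec[bin_id] = 1.0 if mode == "indicator" else x
    return vec

print(binning_feature(61.5, 0, 100, 10))           # 1 in the 7th dimension
print(binning_feature(61.5, 0, 100, 10, "value"))  # 61.5 in the 7th dimension
```

The average, median, and boundary variants would replace the placed value with a statistic computed over all samples falling into the bin.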
In addition, the values of the basic binning features can be normalized to facilitate subsequent operations. Suppose the jth value of the ith continuous feature to be discretized is x_ij; the binning feature can then be expressed as (BinID, x'_ij), where BinID indicates the number of the bin into which the continuous feature is sorted, its value range being 0, 1, ..., B-1 (B being the total number of bins), and x'_ij is the normalized value of x_ij. The feature (BinID, x'_ij) means that, among the dimensions of the basic binning feature, the dimension corresponding to the bin numbered BinID has the feature value x'_ij and all other dimensions have the value 0.

Here, x'_ij can be expressed by the following formula:

x'_ij = (x_ij - min_i) * B / (max_i - min_i) - BinID

where max_i is the maximum value of the ith continuous feature, min_i is the minimum value of the ith continuous feature, and

BinID = floor((x_ij - min_i) * B / (max_i - min_i))

where floor(.) denotes the round-down operation.
Taking the unsupervised equal-width binning as an example, assuming that the value range of the continuous feature is [0,100], in the case of a binning width of 50, the continuous feature having a value of 61.5 may correspond to the basic binning feature (1, 0.23) according to the above calculation formula, and in the case of a binning width of 10, the continuous feature having a value of 61.5 may correspond to the basic binning feature (6, 0.15) according to the above calculation formula.
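The (BinID, x') computation just illustrated can be sketched in Python. This is a minimal illustration of the above formulas, not the patent's implementation:

```python
import math

def sparse_bin_feature(x, lo, hi, n_bins):
    """Return (BinID, x') per the formulas above: BinID is the equal-width bin
    number and x' is x normalized to [0, 1) within that bin."""
    scaled = (x - lo) * n_bins / (hi - lo)
    bin_id = min(int(math.floor(scaled)), n_bins - 1)  # clamp x == hi into the last bin
    return bin_id, scaled - bin_id

b, v = sparse_bin_feature(61.5, 0, 100, 2)
print(b, round(v, 2))   # 1 0.23
b, v = sparse_bin_feature(61.5, 0, 100, 10)
print(b, round(v, 2))   # 6 0.15
```

Both examples from the text are reproduced: width 50 (B = 2) gives (1, 0.23), and width 10 (B = 10) gives (6, 0.15).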
Here, in order to obtain the above feature (BinID, x'_ij), in step S200 the model training apparatus 200 may compute BinID and x'_ij online for each value x_ij according to the above formulas; alternatively, the model training apparatus 200 may generate in advance a mapping table of the value range of each BinID and obtain the BinID corresponding to a continuous feature by looking up that table.
Further, as an example, noise in the history data record may also be reduced by removing outliers in the continuous features before performing the basic binning operation. In this way, the effectiveness of using the binned features to determine feature importance can be further improved.
Specifically, an outlier bin may be additionally set, such that continuous features with outlier values are sorted into it. For example, for a continuous feature with value interval [0, 1000], a certain number of samples may be selected for pre-binning, for example equal-width binning with a bin width of 10; the number of samples in each bin is then recorded, and bins with few samples (e.g., fewer than a threshold) may be merged into at least one outlier bin. As an example, if the bins at both ends contain few samples, the sparsely populated bins may be merged into an outlier bin while the remaining bins are kept; assuming that bins 0 to 10 each contain few samples, bins 0 to 10 may be merged into one outlier bin, so that continuous features with values in [0, 110) are uniformly sorted into the outlier bin.
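The pre-binning step above can be sketched as follows (a toy construction with our own names; the threshold and sample are illustrative assumptions):

```python
from collections import Counter

def sparse_bin_ids(values, lo, hi, width, min_count):
    """Pre-bin a sample of values with equal width and return the set of bin
    ids whose sample count falls below min_count; these are the candidates to
    merge into a single outlier bin."""
    n_bins = int((hi - lo) / width)
    counts = Counter(min(int((v - lo) // width), n_bins - 1) for v in values)
    return {b for b in range(n_bins) if counts.get(b, 0) < min_count}

# toy sample clustered in [550, 650), with two stray small values
sample = [550 + (i % 100) for i in range(1000)] + [5, 105]
outliers = sparse_bin_ids(sample, 0, 1000, 10, min_count=5)
print(0 in outliers, 10 in outliers, 55 in outliers)  # True True False
```

At prediction time, any value landing in one of the returned bin ids would be mapped to the merged outlier bin instead.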
In addition to the basic binning operation described above, in step S200 the model training apparatus 200 further performs, for each continuous feature on which the basic binning operation was performed, at least one additional operation different from the basic binning operation, to obtain at least one corresponding additional feature.
Here, an additional operation may be any functional operation, possibly with corresponding operation parameters, and one or more additional operations may be performed for a single continuous feature; these may be operations of different kinds, or operations of the same kind but with different operation parameters.
In particular, the additional operation may also indicate a binning operation, where, similar to the basic binning characteristics, the additional binning characteristics generated by the additional binning operation may also be multidimensional characteristics, where each dimension indicates whether a respective continuous characteristic is binned in the corresponding bin; or, each dimension indicates a feature value of a respective continuous feature classified in the corresponding bin; or, each dimension indicates an average value of the eigenvalues of all the consecutive features classified in the corresponding bin; or, each dimension indicates the median of the eigenvalues of all the successive features that are sorted in the corresponding bin; alternatively, each dimension indicates the boundary values of the eigenvalues of all the consecutive features that are sorted into the corresponding bin.
In particular, the at least one additional operation may comprise an additional binning operation in the same manner as the basic binning operation but with different binning parameters; alternatively, the at least one additional operation may comprise an additional binning operation in a manner different from the basic binning operation. The binning manner may be any of various supervised and/or unsupervised binning manners. For example, supervised binning includes minimum-entropy binning, minimum-description-length binning, and the like, while unsupervised binning includes equal-width binning, equal-depth binning, binning based on k-means clustering, and the like.
As an example, the basic binning operation and the additional binning operation may correspond to equal-width binning operations of different widths, respectively. That is to say, the basic binning operation and the additional binning operation adopt the same binning mode but different division granularities, so that the generated basic binning characteristics and the additional binning characteristics can better depict the rules of the original historical data records, and the importance of each characteristic can be better determined. In particular, the different widths used for the basic binning operation and the additional binning operation may numerically form an equal ratio series, e.g., the basic binning operation may be equally wide binned by a width of a value of 2, and the additional binning operation may be equally wide binned by a width of a value of 4, a value of 8, a value of 16, etc. Alternatively, the different widths used for the basic binning operation and the additional binning operation may numerically form an arithmetic series, e.g., the basic binning operation may be equally wide binned by a width of value 2, and the additional binning operation may be equally wide binned by a width of value 4, value 6, value 8, etc.
As another example, the basic binning operation and the additional binning operation may correspond to equal-depth binning operations of different depths, respectively. That is to say, the basic binning operation and the additional binning operation adopt the same binning mode but different division granularities, so that the generated basic binning characteristics and the additional binning characteristics can better depict the rules of the original historical data records, and the importance of each characteristic can be better determined. In particular, the different depths employed by the basic binning operation and the additional binning operation may numerically form an equal-ratio series, e.g., the basic binning operation may be equally deep binned by a depth of 10, while the additional binning operation may be equally deep binned by a depth of 100, 1000, 10000, etc. Alternatively, the different depths used for the basic binning operation and the additional binning operation may numerically form an arithmetic series, e.g., the basic binning operation may be equally deep binned by a depth of 10, and the additional binning operation may be equally deep binned by a depth of 20, 30, 40, etc.
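The multi-granularity scheme of the two paragraphs above can be sketched for equal-width binning with widths forming a geometric series (a minimal illustration; the widths are the example values from the text):

```python
def multi_granularity_bin_ids(x, lo, hi, widths):
    """Bin one value under several equal-width granularities at once: the first
    width plays the role of the basic binning, the rest are additional binnings."""
    ids = {}
    for w in widths:
        n_bins = int((hi - lo) / w)
        ids[w] = min(int((x - lo) // w), n_bins - 1)
    return ids

geometric = [2, 4, 8, 16]  # widths forming an equal-ratio (geometric) series
print(multi_granularity_bin_ids(61.5, 0, 100, geometric))
# {2: 30, 4: 15, 8: 7, 16: 3}
```

An equal-depth variant would replace the width-based index with a quantile-based index, but the idea of one basic granularity plus coarser additional granularities is the same.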
According to an exemplary embodiment of the invention, the additional operations may further comprise non-binning operations; for example, the at least one additional operation may comprise at least one operation among the following classes of operations, each under the same or different operation parameters: logarithm operation, exponential operation, absolute value operation, Gaussian transformation operation. It should be noted that the additional operation here is not limited in kind of operation or in operation parameters, and may take any suitable form; that is, the additional operation may have either a simple form such as a square operation or a complex operation expression. For example, for the jth value x_ij of the ith continuous feature, the following additional operation may be performed to obtain an additional feature x''_ij:

x''_ij = sign(x_ij) * log2(1 + |x_ij|), where sign is the sign function.
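The signed logarithm transform above translates directly into code (a one-line sketch; the function name is ours):

```python
import math

def signed_log2(x):
    """Additional non-binning feature x'' = sign(x) * log2(1 + |x|)."""
    sign = (x > 0) - (x < 0)
    return sign * math.log2(1 + abs(x))

print(signed_log2(7.0))   # 3.0
print(signed_log2(-7.0))  # -3.0
print(signed_log2(0.0))   # 0.0
```

The transform compresses large magnitudes while preserving sign and mapping zero to zero, which is why it is a popular companion feature to raw continuous values.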
In addition to the basic binning features and the additional features described above, other features included in the training samples of the feature pool model may be generated, which may be obtained by the model training apparatus 200 by performing various feature processes such as direct extraction, discretization, field combination, extraction of partial field values, rounding, and the like on at least a part of the attribute information of the history data record.
Next, training samples of the feature pool model, comprising the above-described features along with corresponding labels, are generated by the model training apparatus 200. According to an exemplary embodiment of the present invention, the above-described processing may be performed in memory under a distributed parallel computing framework, where the distributed parallel computing framework may have distributed parameter servers.
Further, as an example, the generated training samples may be used directly in the training process of the feature pool model. In particular, the step of generating the training samples may be considered as part of the training process of the feature pool model, and accordingly, the training samples need not be explicitly saved to a hard disk, which may significantly increase the operating speed compared to conventional approaches.
Next, the feature pool model may be trained by the model training apparatus 200 based on the training samples. Here, the model training apparatus 200 may learn an appropriate feature pool model from the training samples using an appropriate machine learning algorithm (e.g., logistic regression). As an example, in case the training samples of the feature pool model include both continuous features and discontinuous features, different regularization terms may be set for the continuous features and the discontinuous features, respectively; that is, the regularization term applied to the continuous features differs from that applied to the discontinuous features.
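Separate regularization strengths per feature group can be sketched with plain numpy gradient descent. This is a toy construction of ours (not the patent's training code); the data, penalty values, and function name are illustrative assumptions:

```python
import numpy as np

def train_lr_grouped_l2(X, y, cont_idx, disc_idx, lam_cont, lam_disc,
                        lr=0.1, n_iter=500):
    """Logistic regression by batch gradient descent with a separate L2
    penalty strength for continuous columns and discrete columns."""
    n, d = X.shape
    lam = np.zeros(d)
    lam[cont_idx] = lam_cont
    lam[disc_idx] = lam_disc
    w = np.zeros(d)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ w))          # predicted probabilities
        w -= lr * (X.T @ (p - y) / n + lam * w)   # log-loss gradient + per-group L2
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] > 0).astype(float)
# heavily penalizing column 1 (treated as "discrete" here) shrinks its weight
w = train_lr_grouped_l2(X, y, cont_idx=[0], disc_idx=[1],
                        lam_cont=0.01, lam_disc=10.0)
print(abs(w[0]) > abs(w[1]))  # True
```

The per-dimension lambda vector is the whole trick: each group's gradient carries its own shrinkage term.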
In the above example, a more stable feature pool model with better prediction performance can be trained, so that the importance of each feature can be effectively determined based on the prediction effect of the feature pool model.
Specifically, in step S300, the effect of the trained at least one feature pool model is acquired by the importance determination means 300, and the importance of each feature of the machine learning sample is determined according to the acquired effect of the at least one feature pool model.
Here, the importance determining apparatus 300 may acquire the effect of the feature pool model by applying the trained feature pool model to the corresponding test data set, and may also receive the effect of the feature pool model from other parties connected thereto.
As an example, the significance determination apparatus 300 may determine the significance of the respective features on which the feature pool model is based from a difference between effects of the feature pool model on an original test dataset and a transformed test dataset, wherein the transformed test dataset refers to a dataset obtained by replacing values of target features in the original test dataset whose significance is to be determined with one of: zero values, random values, values obtained by scrambling the order of the original values of the target features.
Here, each feature pool model may be based on at least one feature of the machine learning sample, and accordingly, a predictive effect of the feature pool model on the original test data set may be obtained. In addition, the prediction effect of the feature pool model on the transformed test data set can be obtained by transforming the values of the target features on the original test data set. The difference between the two predicted effects can be used to measure the importance of the target feature.
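The transformed-test-set scheme above is essentially permutation importance: shuffle one feature's values on the test set and measure the AUC drop. A minimal numpy sketch follows; for brevity, a least-squares linear scorer stands in for the trained feature pool model, and the synthetic data, seed, and helper names are our assumptions:

```python
import numpy as np

def rank_auc(y, s):
    """Mann-Whitney rank-sum AUC (assumes no tied scores)."""
    r = np.empty(len(s))
    r[np.argsort(s)] = np.arange(1, len(s) + 1)
    pos = y == 1
    n_pos, n_neg = pos.sum(), len(y) - pos.sum()
    return (r[pos].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

rng = np.random.default_rng(42)
X = rng.normal(size=(2000, 3))
y = ((X[:, 0] + 0.3 * X[:, 1] + rng.normal(scale=0.5, size=2000)) > 0).astype(int)
X_tr, X_te, y_tr, y_te = X[:1000], X[1000:], y[:1000], y[1000:]

# a least-squares linear scorer stands in for the trained feature pool model
w, *_ = np.linalg.lstsq(X_tr, y_tr, rcond=None)
auc_all = rank_auc(y_te, X_te @ w)

importance = {}
for i in range(3):
    X_perm = X_te.copy()
    X_perm[:, i] = rng.permutation(X_perm[:, i])  # break this feature's link to the labels
    importance[i] = auc_all - rank_auc(y_te, X_perm @ w)

print(importance)  # feature 0 causes the largest AUC drop; feature 2's drop is near zero
```

Replacing the shuffle with zeros or random values, as the text allows, only changes one line of the loop.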
As an example, the at least one feature pool model may comprise a full feature model, where the full feature model refers to a machine learning model that provides a prediction result regarding the machine learning problem based on all the features of the machine learning sample. Specifically, it is assumed that the model training apparatus 200 trains a full feature model in step S200, the full feature model being trained to give a prediction result regarding the machine learning problem based on all the features {f_1, f_2, ..., f_n} of the machine learning sample. The importance determination apparatus 300 can obtain the prediction effect (e.g., AUC_all) of the full feature model on the original test data set, where the original test data set may be composed of additional historical data records obtained by the data record acquisition device 100.

In this example, to determine the importance of any target feature f_i in {f_1, f_2, ..., f_n} (where 1 <= i <= n), the original test data set may be processed accordingly to obtain the transformed test data set for the target feature f_i, e.g., by replacing the value of feature f_i in each test sample of the original test data set with another value such as a zero value or a random value, or by shuffling the values of feature f_i across the test samples. Accordingly, the importance determination apparatus 300 can obtain the test effect (e.g., AUC_i) of the above full feature model on the transformed test data set.

After obtaining the effects of the full feature model on the original test data set and the transformed test data set, respectively, the importance determination apparatus 300 may use the difference between the two effects (i.e., AUC_all - AUC_i) as a reference for measuring the importance of the target feature f_i.
The above shows an example of determining the importance of individual features by transforming the original test data set while relying on the same feature pool model. However, the exemplary embodiments of the present invention are not limited thereto; the number of feature pool models and the feature group on which each feature pool model is based may be designed in any suitable manner, as long as the prediction effects of the feature pool models allow the importance of each feature to be inferred.
For example, the at least one feature pool model trained by the model training apparatus 200 in step S200 may include a plurality of machine learning models that provide a prediction result regarding a machine learning problem based on different feature groups, and accordingly, in step S300, the importance determining apparatus 300 may determine the importance of the respective features according to a difference between effects of the at least one feature pool model on the original test data set.
Here, the at least one feature pool model includes one or more main feature pool models and at least one sub-feature pool model respectively corresponding to each main feature pool model, wherein a sub-feature pool model refers to a machine learning model that provides a prediction result regarding a machine learning problem based on remaining features except for a target feature whose importance is to be determined among features on which the corresponding main feature pool model is based, and accordingly, the importance determination apparatus 300 may determine the importance of the corresponding target feature according to a difference between effects of the main feature pool model and the respective sub-feature pool models corresponding thereto on the original test data set.
As an example, the at least one feature pool model may include a full feature model serving as a main feature pool model and at least one sub-feature pool model, where the full feature model refers to a machine learning model providing a prediction result regarding the machine learning problem based on all the features of the machine learning sample, and correspondingly, a sub-feature pool model refers to a machine learning model providing a prediction result regarding the machine learning problem based on the remaining features other than a target feature whose importance is to be determined. Accordingly, in step S300, the importance determination apparatus 300 may determine the importance of the corresponding target feature according to the difference between the effects of the full feature model and each sub-feature pool model on the original test data set.
Specifically, it is assumed that the model training apparatus 200 trains a full feature model in step S200, the full feature model being trained to give a prediction result regarding the machine learning problem based on all the features {f_1, f_2, ..., f_n} of the machine learning sample. The importance determination apparatus 300 may obtain the prediction effect (e.g., AUC_all) of the full feature model on the original test data set, where the original test data set may be composed of additional historical data records obtained by the data record acquisition device 100.

In this example, to determine the importance of any target feature f_i in {f_1, f_2, ..., f_n} (where 1 <= i <= n), a corresponding sub-feature pool model may additionally be trained in step S200, the sub-feature pool model being trained to give a prediction result regarding the machine learning problem based on the features {f_1, f_2, ..., f_{i-1}, f_{i+1}, ..., f_n} other than the target feature f_i. Accordingly, the importance determination apparatus 300 may obtain the prediction effect (e.g., AUC_i) of the sub-feature pool model on the original test data set.

After separately acquiring the effects of the full feature model and each sub-feature pool model on the original test data set, the importance determination apparatus 300 may use the difference between the two effects (i.e., AUC_all - AUC_i) as a reference for measuring the importance of the feature f_i.
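The main-model / sub-model comparison above is a leave-one-feature-out scheme; it can be sketched as follows. As before, a least-squares linear scorer is our stand-in for the feature pool models, and the synthetic data and names are illustrative assumptions:

```python
import numpy as np

def rank_auc(y, s):
    """Mann-Whitney rank-sum AUC (assumes no tied scores)."""
    r = np.empty(len(s))
    r[np.argsort(s)] = np.arange(1, len(s) + 1)
    pos = y == 1
    n_pos, n_neg = pos.sum(), len(y) - pos.sum()
    return (r[pos].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

rng = np.random.default_rng(7)
X = rng.normal(size=(2000, 3))
y = ((X[:, 0] + 0.3 * X[:, 1] + rng.normal(scale=0.5, size=2000)) > 0).astype(int)
X_tr, X_te, y_tr, y_te = X[:1000], X[1000:], y[:1000], y[1000:]

def fit_auc(cols):
    """Train a linear scorer (stand-in for a feature pool model) on a column
    subset and return its AUC on the held-out test half."""
    w, *_ = np.linalg.lstsq(X_tr[:, cols], y_tr, rcond=None)
    return rank_auc(y_te, X_te[:, cols] @ w)

auc_all = fit_auc([0, 1, 2])  # main feature pool model on all features
drops = {i: auc_all - fit_auc([j for j in range(3) if j != i]) for i in range(3)}
print(drops)  # removing feature 0 hurts most; removing feature 2 barely matters
```

Unlike the permutation scheme, each sub-model is retrained, so the drop reflects how well the remaining features can compensate for the removed one.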
Here, it should be noted that the above full feature model is merely an example and is not intended to limit the scope of the exemplary embodiments of the present invention. In fact, among the feature pool models there may be a plurality of main feature pool models, each having its own sub-feature pool models; that is, each main feature pool model may be based on at least a portion of the features of the machine learning samples, and different main feature pool models may or may not share common features.
Further, as an alternative, the at least one feature pool model trained by the model training apparatus 200 in step S200 may include a plurality of single-feature models, wherein a single-feature model refers to a machine learning model that provides a prediction result regarding a machine learning problem based on a target feature whose importance is to be determined among the respective features of the machine learning sample, and accordingly, in step S300, the importance determining apparatus 300 may determine the importance of the corresponding target feature according to a difference between effects of the respective single-feature models on the original test data set.
Specifically, it is assumed that the model training apparatus 200 trains a plurality of single-feature models in step S200, each trained to give a prediction result regarding the machine learning problem based on a certain feature {f_i} of the machine learning sample. Here, the number of single-feature models may be the same as the number of features of the machine learning samples. Accordingly, the importance determination apparatus 300 may obtain the prediction effect (e.g., AUC_i) of each single-feature model on the same test data set (e.g., the original test data set). Here, since discretization has been performed on the continuous features (preferably, the basic binning operation and the additional operations), it can be ensured that the single-feature models reflect the prediction ability of the respective features more stably; accordingly, after the effects of all the single-feature models on the same test data set are respectively acquired, the importance determination apparatus 300 can obtain the relative importance of the respective features based on the differences between those effects.
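The single-feature-model variant can be sketched in the same toy setting (linear scorers as stand-ins for the models; data and seed are illustrative assumptions):

```python
import numpy as np

def rank_auc(y, s):
    """Mann-Whitney rank-sum AUC (assumes no tied scores)."""
    r = np.empty(len(s))
    r[np.argsort(s)] = np.arange(1, len(s) + 1)
    pos = y == 1
    n_pos, n_neg = pos.sum(), len(y) - pos.sum()
    return (r[pos].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

rng = np.random.default_rng(3)
X = rng.normal(size=(2000, 3))
y = ((X[:, 0] + 0.3 * X[:, 1] + rng.normal(scale=0.5, size=2000)) > 0).astype(int)
X_tr, X_te, y_tr, y_te = X[:1000], X[1000:], y[:1000], y[1000:]

# one single-feature model per feature, all scored on the same test set
single_auc = {}
for i in range(3):
    w, *_ = np.linalg.lstsq(X_tr[:, [i]], y_tr, rcond=None)
    single_auc[i] = rank_auc(y_te, X_te[:, [i]] @ w)
print(single_auc)  # feature 0's model ranks best; feature 2's stays near chance (0.5)
```

The AUCs here are only comparable with each other, which is exactly the point of the scheme: ranking features by the standalone predictive power of their single-feature models.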
The method for determining the importance of features according to an exemplary embodiment of the present invention is illustrated above with reference to fig. 2, however, it should be understood that the method illustrated in fig. 2 is not intended to limit the concrete implementation manner of the exemplary embodiment of the present invention, but merely to provide an exemplary description of the basic concept of the exemplary embodiment of the present invention, and in fact, a person skilled in the art may implement the exemplary embodiment of the present invention in any suitable manner by modifying and/or embodying the scheme illustrated in fig. 2. For example, the steps in the flowchart shown in fig. 2 are not limited in any way in terms of timing, for example, steps S200 and S300 need not be limited to be performed in a strict order, and alternatively, a part of the model test operation may be performed during the process of training the feature pool model to determine the effect of the feature pool model.
Specifically, as described above, according to an exemplary embodiment of the present invention, in step S200, the trained at least one feature pool model may include a plurality of machine learning models that provide a prediction result regarding a machine learning problem based on different feature groups, and, in step S300, the importance of the respective features may be determined according to a difference between effects of the at least one feature pool model on an original test data set.
Here, the original test data set may be composed of acquired historical data records. Accordingly, in step S200, the acquired historical data records are divided into a plurality of groups of historical data records to train the respective feature pool models step by step, and step S200 further includes: performing prediction on the next group of historical data records using the feature pool model trained on the groups so far, to obtain the grouped AUC corresponding to that next group, and synthesizing the grouped AUCs to obtain the AUC of the feature pool model; after the grouped AUC corresponding to the next group of historical data records is obtained, the feature pool model may continue to be trained using that next group.
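This test-then-train scheme (often called progressive validation) can be sketched as follows; the tiny gradient-descent logistic regression, batch sizes, and synthetic data are our assumptions, not the patent's implementation:

```python
import numpy as np

def rank_auc(y, s):
    """Mann-Whitney rank-sum AUC (assumes no tied scores)."""
    r = np.empty(len(s))
    r[np.argsort(s)] = np.arange(1, len(s) + 1)
    pos = y == 1
    n_pos, n_neg = pos.sum(), len(y) - pos.sum()
    return (r[pos].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 2))
y = ((X[:, 0] + rng.normal(scale=0.5, size=5000)) > 0).astype(int)
batches = np.array_split(np.arange(5000), 5)  # five groups of historical records

w = np.zeros(2)
grouped_aucs = []
for k, idx in enumerate(batches):
    Xk, yk = X[idx], y[idx]
    if k > 0:  # test-then-train: score group k with the model trained on groups < k
        grouped_aucs.append(rank_auc(yk, Xk @ w))
    for _ in range(200):  # then continue logistic-regression training on group k
        p = 1.0 / (1.0 + np.exp(-Xk @ w))
        w -= 0.1 * Xk.T @ (p - yk) / len(yk)

overall_auc = float(np.mean(grouped_aucs))  # synthesize the grouped AUCs
print(len(grouped_aucs), round(overall_auc, 2))
```

Every record is thus scored exactly once by a model that has never trained on it, so no separate hold-out set is needed.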
Fig. 3 shows a flowchart of a method of determining the feature importance of machine learning samples according to another exemplary embodiment of the present invention. For convenience of description, it is again assumed that the method illustrated in fig. 3 is performed by the feature importance determining system illustrated in fig. 1. Also, as an example, the feature pool model here may be a machine learning model based on the logistic regression algorithm, and the effect of the feature pool model may be represented by AUC.
Referring to fig. 3, in step S100, a history data record including a label on a machine learning problem and at least one attribute information of each feature used to generate a machine learning sample is acquired by the data record acquisition apparatus 100. Here, for the sake of brevity, various details of the data record acquisition apparatus 100 acquiring the history data record will not be described again.
Next, in step S210, the acquired historical data records are divided into a plurality of groups by the model training apparatus 200, and the divided groups of historical data records are used to train the feature pool models step by step in batches. Alternatively, the training process may be performed online, in which case the training samples of the feature pool models need not be explicitly saved to a hard disk.
In step S220, the kth group of historical data records is obtained as the next group by the model training apparatus 200, where k is a positive integer. According to an exemplary embodiment of the present invention, since each feature pool model is trained step by step in batches using the plurality of groups of historical data records, it can be understood that, prior to obtaining the kth group, each feature pool model has already been trained based on the previous k-1 batches of historical data records, where a particular feature pool model among them may be denoted as LR_{k-1}.

In step S230, the model training apparatus 200 obtains the grouped AUCs achieved by the one or more feature pool models trained so far under the test of the kth group of historical data records. Taking the above feature pool model LR_{k-1} as an example, the model training apparatus 200 uses LR_{k-1} to perform prediction on the kth group of historical data records to derive the grouped AUC corresponding to that group, i.e., AUC_k. Specifically, in order to use the kth group of historical data records as a test data set, a test sample is generated from each historical data record in the group, where the feature portion of the test sample is consistent with the feature portion of the training samples of the feature pool model; that is, the model training apparatus 200 may obtain the feature portion of the test sample according to the same feature engineering process as the training samples, while discarding the labels of the historical data records, thereby obtaining the test samples of the feature pool model. Next, the model training apparatus 200 inputs the obtained test samples into the feature pool model to obtain corresponding prediction results. Based on these prediction results, the model training apparatus 200 may obtain the grouped AUC_k of the feature pool model LR_{k-1} for the kth group of historical data records. In a similar manner, the model training apparatus 200 may acquire the grouped AUCs of all previously trained feature pool models for the kth group of historical data records, and store these grouped AUCs.
In practice, some historical data records may lack certain attribute information related to the features of the feature pool model, and in this case, the model training apparatus 200 may take corresponding countermeasures in order to better obtain the AUC of the feature pool model.
Specifically, when prediction is performed for the next group of historical data records using the feature pool model trained on the current group, if the next group includes missing historical data records that lack the attribute information needed to generate at least some of the features on which the feature pool model is based, the model training apparatus 200 may derive the grouped AUC corresponding to the next group based on one of the following:
in the first case: the model training apparatus 200 may calculate the group AUC using only the predicted results of the history data records other than the missing history data record in the next set of history data records. Specifically, assume that the kth group of history data records includes 1000 history data records in total, wherein only 100 history data records include all attribute information on which the feature portion of the feature pool model is based, i.e., 900 history data records belong to the missing history data records. In this case, the model training apparatus 200 may perform prediction using only the 100 pieces of history data records having the complete correlation attribute information, and take the AUC obtained based on the prediction result as the group AUC.
In the second case: the model training apparatus 200 may calculate the grouped AUC using the prediction results of all the historical data records of the next group, where the prediction result of each missing historical data record is set to a default value determined based on the value range of the prediction results or based on the label distribution of the acquired historical data records. Specifically, assume that the kth group of historical data records includes 1000 historical data records in total, of which only 100 include all the attribute information on which the feature portion of the feature pool model is based, i.e., 900 belong to the missing historical data records. In this case, the model training apparatus 200 may input the 100 historical data records with complete relevant attribute information into the feature pool model for prediction, and set the prediction results of the remaining 900 to a default value. As an example, the default value may be determined based on the value range of the prediction results, e.g., with a value range of [0, 1], the default value may be set to the midpoint 0.5; alternatively, the default value may be determined based on the label distribution of the acquired historical data records, e.g., assuming there are 300 positive samples (i.e., with label 1) among the 1000 historical data records of the kth group, the default value may be set to the positive-sample probability, i.e., 0.3. Once the prediction results of all 1000 historical data records are obtained as described above, the model training apparatus 200 may take the AUC computed from these prediction results as the grouped AUC.
In the third case, the model training apparatus 200 may multiply the AUC calculated using the prediction results of the non-missing history data records in the next group by the proportion of those non-missing records in the group to obtain the grouped AUC. Specifically, assume that the kth group of history data records includes 1000 history data records in total, of which only 100 include all the attribute information on which the feature portion of the feature pool model is based, i.e., the remaining 900 history data records are missing history data records. In this case, the model training apparatus 200 may input the 100 history data records having complete relevant attribute information into the feature pool model for prediction, obtain the corresponding AUC based on the prediction results, and then multiply that AUC by the proportion of non-missing history data records (i.e., 0.1) to determine the final grouped AUC.
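The third case differs from the first only in the final scaling step, as this sketch shows (names illustrative, `roc_auc_score` from scikit-learn):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def group_auc_weighted(labels, predictions, is_missing):
    """Compute the AUC on the non-missing records only, then scale it by
    the proportion of non-missing records in the group (0.1 in the
    100-out-of-1000 example in the text)."""
    labels = np.asarray(labels)
    predictions = np.asarray(predictions, dtype=float)
    keep = ~np.asarray(is_missing)
    auc = roc_auc_score(labels[keep], predictions[keep])
    return auc * keep.mean()  # weight by the non-missing proportion

# 2 of 4 records are non-missing: AUC 1.0 on those, scaled by 0.5
auc = group_auc_weighted(
    labels=[0, 1, 0, 1],
    predictions=[0.1, 0.9, 0.0, 0.0],
    is_missing=[False, False, True, True],
)
```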
It should be noted that the above three cases are merely exemplary ways of handling missing history data records and are not intended to limit the exemplary embodiments of the present invention. Any manner similar or equivalent to the above three may also be applied to the exemplary embodiments of the present invention.
After the feature pool models are tested, in step S240, the model training apparatus 200 continues to train each of the one or more feature pool models trained so far based on the kth group of history data records.
Taking the above-mentioned specific feature pool model LR_{k-1} as an example, in step S240, the model training apparatus 200 continues the model training using the kth group of history data records to obtain the updated feature pool model LR_k. Specifically, in order to use the kth group of history data records as the training data set, a training sample needs to be generated from each history data record in the group: the model training apparatus 200 may obtain the feature portion of the training sample through the corresponding feature engineering process, and use the label of the history data record as the label of the training sample. Then, the model training apparatus 200 continues to train the feature pool model based on the obtained training samples to obtain the updated feature pool model LR_k. In a similar manner, the model training apparatus 200 may update all previously trained feature pool models using the kth group of history data records.
It can be seen that, according to the exemplary embodiment of the present invention, the corresponding grouped AUCs can be obtained during the staged training of the feature pool models themselves, which makes training and testing more efficient and speeds up the system as a whole. In fact, the AUC obtained in the above example is strongly correlated with the true test AUC (in tests on a particular data set, the correlation can exceed 0.85), so, as an example, the importance of each feature on which a feature pool model is based may be determined from the grouped AUCs obtained in the above manner.
Next, in step S250, the model training apparatus 200 determines whether the acquired kth group of history data records is the last group of the divided history data records. If it is determined in step S250 that the current kth group is not the last group, the process returns to step S220 to acquire the next divided group, i.e., the (k+1)th group of history data records. Otherwise, if it is determined in step S250 that the current kth group is the last group, the process proceeds to step S310, in which the importance determining apparatus 300 determines the importance of each feature of the machine learning sample based on the saved grouped AUCs of each feature pool model.
Specifically, in step S310, the importance determination apparatus 300 may integrate the respective grouped AUCs of each feature pool model to derive an AUC representing the performance of the corresponding feature pool model.
After obtaining the performance (i.e., the AUC) of each feature pool model, the importance determining apparatus 300 may regard that performance as an importance reference for the feature group involved in the feature pool model (i.e., at least a part of the features of the machine learning sample whose importance is to be determined), and derive the importance of each target feature, or an importance ranking among the target features, by integrating the performance differences between the feature pool models.
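As a toy illustration of deriving importance from performance differences (the feature names and AUC values below are invented for illustration and do not come from the original), consider a main feature pool model compared against sub-models that each omit one target feature:

```python
# Illustrative values: per-feature importance is the AUC drop between a
# main feature pool model and the sub-model that omits that feature;
# features are then ranked by the size of the drop.
main_model_auc = 0.85
sub_model_auc = {  # AUC of the sub-model trained without the named feature
    "age": 0.80,
    "city": 0.84,
    "income": 0.70,
}
importance = {f: main_model_auc - auc for f, auc in sub_model_auc.items()}
ranking = sorted(importance, key=importance.get, reverse=True)
```

Here omitting "income" costs the most AUC, so it ranks as the most important target feature.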
It should also be noted that the flowchart shown in fig. 3 is not intended to limit processing details such as timing, but merely serves to explain an exemplary embodiment of the present invention. As an example, the training/testing of the various feature pool models may be performed in parallel and/or online.
According to the exemplary embodiments of the present invention, for the machine learning samples used in machine learning, the importance of each feature included therein can be effectively determined, thereby facilitating better model training and/or model interpretation.
Alternatively, the feature importance determination system shown in fig. 1 may further include a display device (not shown). Accordingly, in step S200 shown in fig. 2, the model training apparatus 200 may control the display device to provide an interface through which a user configures at least one of the following items of a feature pool model: the at least a part of the features on which the feature pool model is based, the algorithm type of the feature pool model, the algorithm parameters of the feature pool model, the operation type of the discretization operation, and the operation parameters of the discretization operation. Further, in this step, the model training apparatus 200 may train each feature pool model according to the items configured by the user through the interface.

Here, as an example, in step S200, the interface may be provided to the user in response to the user's indication regarding determining feature importance. For example, during the training of a machine learning model, in order to determine the importance of each feature in the corresponding machine learning training samples, the user may indicate during the feature engineering process that the importance of each feature is desired. To this end, according to an exemplary embodiment of the present invention, a control such as a feature importance operator may be provided to the user in an interface related to feature engineering or the modeling process. When the user clicks the control, an interface for configuring the feature pool model may be presented, in which items such as the algorithm, features, and regularization terms of the feature pool model may be set; in particular, items relating to how continuous features of the feature pool model are discretized (for example, the parameters of the binning operations) may also be set.
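A configurable discretization of the kind described — a basic binning operation plus an additional binning with different parameters — can be sketched with scikit-learn's `KBinsDiscretizer` (the bin counts and strategies are illustrative choices, not values from the original):

```python
import numpy as np
from sklearn.preprocessing import KBinsDiscretizer

# One continuous feature, five samples
X = np.array([[0.10], [0.35], [0.40], [0.80], [0.95]])

# Basic binning operation: equal-width binning with 2 bins
basic = KBinsDiscretizer(n_bins=2, encode="onehot-dense", strategy="uniform")
# Additional binning operation: same feature, different parameters
# (4 quantile bins instead of 2 equal-width bins)
extra = KBinsDiscretizer(n_bins=4, encode="onehot-dense", strategy="quantile")

# The discretized feature portion concatenates both binning results,
# so each sample activates exactly one bin per binning operation
features = np.hstack([basic.fit_transform(X), extra.fit_transform(X)])
```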
For example, as an alternative, regularization terms may be set separately for continuous features and non-continuous features, and different regularization weights may also be set for different continuous features.
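A sketch of such a split regularization term (the coefficient values and weights below are invented for illustration; in practice they would be the model's learned weights and the user-configured hyperparameters):

```python
import numpy as np

# Coefficients of a feature pool model, split by feature type
w_continuous = np.array([0.5, -1.2])     # e.g. coefficients of binned continuous features
w_discrete = np.array([0.3, 0.8, -0.1])  # e.g. coefficients of non-continuous features

# Separate L2 regularization weights for the two groups,
# as configurable through the interface described above
lambda_continuous = 0.1
lambda_discrete = 0.01

penalty = (lambda_continuous * np.sum(w_continuous ** 2)
           + lambda_discrete * np.sum(w_discrete ** 2))
```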
Here, the display device may be a simple display screen, in which case the feature importance determination system may further include an input device (e.g., a keyboard, a mouse, a microphone, a camera, etc.) through which the user configures the items on the interface; alternatively, the display device may be a touch display screen with a touch input function, in which case the user may complete the configuration of the items directly on the interface through the touch screen.
In addition, after the feature importance determination system according to an exemplary embodiment of the present invention acquires the importance of each feature of the machine learning sample, the determined importance information of each feature may be graphically presented to the user.
Fig. 4 illustrates an example of a feature importance presentation interface according to an exemplary embodiment of the present invention. In the interface illustrated in fig. 4, a feature importance analysis report is presented, in which a feature importance ranking and some additional information are listed. As an example, when the indication bar of a certain feature is clicked or hovered over, sample information or attribute information about that feature may additionally be displayed.
Alternatively, the features may be presented in order of importance, and/or a subset of the features may be highlighted, including important features corresponding to high importance, unimportant features corresponding to low importance, and/or abnormal features corresponding to abnormal importance.
Fig. 5 shows an example of a feature importance presentation interface according to another exemplary embodiment of the present invention. In the interface shown in fig. 5, the features of the machine learning sample are not only shown in order of importance, but abnormal features corresponding to abnormal importance are also highlighted; optionally, possible reasons for the abnormal features are further provided, enhancing the user interaction experience.
It should be understood that, in the existing machine learning field, programmers are in most cases required to write code to complete the machine learning process, and even where software systems such as modeling platforms have been developed, business personnel other than machine learning experts still find them difficult to benefit from. According to the exemplary embodiments of the present invention, however, the importance of each feature in a machine learning sample can be determined effectively and automatically, lowering the threshold for applying machine learning. In addition, according to the exemplary embodiments of the present invention, the determined feature importance and/or the related settings of the determination process can be displayed to the user in a friendly interactive manner, further enhancing the usability of the machine learning platform: users with stronger machine learning skills can conveniently set and/or adjust details of the determination process, while ordinary users can intuitively see the important features, unimportant features, and/or abnormal features in the machine learning sample.
It should be noted that the feature importance determination system according to the exemplary embodiment of the present invention may rely entirely on the execution of a computer program to realize the corresponding functions, i.e., each apparatus corresponds to a step in the functional architecture of the computer program, so that the entire system may be invoked through a dedicated software package (e.g., a lib library) to realize the corresponding functions.
Alternatively, the various apparatuses in the feature importance determination system may be implemented by hardware, software, firmware, middleware, microcode, or any combination thereof. When implemented in software, firmware, middleware or microcode, the program code or code segments to perform the corresponding operations may be stored in a computer-readable medium such as a storage medium, so that a processor may perform the corresponding operations by reading and executing the corresponding program code or code segments.
Here, the exemplary embodiments of the present invention may also be realized as a computing apparatus including a storage component and a processor, the storage component having stored therein a set of computer-executable instructions which, when executed by the processor, perform the above-described feature importance determination method.
In particular, the computing devices may be deployed in servers or clients, as well as on node devices in a distributed network environment. Further, the computing device may be a PC computer, tablet device, personal digital assistant, smart phone, web application, or other device capable of executing the above-described instruction sets.
The computing device need not be a single computing device, but can be any collection of devices or circuits that can execute the above instructions (or instruction sets), individually or in combination. The computing device may also be part of an integrated control system or system manager, or may be configured as a portable electronic device that interfaces with local or remote systems (e.g., via wireless transmission).
In the computing device, the processor may include a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a programmable logic device, a special purpose processor system, a microcontroller, or a microprocessor. By way of example, and not limitation, processors may also include analog processors, digital processors, microprocessors, multi-core processors, processor arrays, network processors, and the like.
Some of the operations described above with respect to the feature importance determination method may be implemented by software, some of the operations may be implemented by hardware, and further, the operations may be implemented by a combination of hardware and software.
The processor may execute instructions or code stored in one of the memory components, which may also store data. Instructions and data may also be transmitted and received over a network via a network interface device, which may employ any known transmission protocol.
The memory component may be integral to the processor, e.g., having RAM or flash memory disposed within an integrated circuit microprocessor or the like. Further, the storage component may comprise a stand-alone device, such as an external disk drive, storage array, or any other storage device usable by a database system. The storage component and the processor may be operatively coupled or may communicate with each other, such as through an I/O port, a network connection, etc., so that the processor can read files stored in the storage component.
Further, the computing device may also include a video display (such as a liquid crystal display) and a user interaction interface (such as a keyboard, mouse, touch input device, etc.). All components of the computing device may be connected to each other via a bus and/or network.
The operations described above with respect to the feature importance determination method may be described as various interconnected or coupled functional blocks or functional diagrams. However, these functional blocks or functional diagrams may equally be integrated into a single logic device or operated according to imprecise boundaries.
In particular, as described above, a computing device for determining the importance of the various features of a machine learning sample according to an exemplary embodiment of the present invention may include a storage component and a processor, the storage component having stored therein a set of computer-executable instructions which, when executed by the processor, perform the following steps: (A) acquiring a history data record, wherein the history data record includes a label for a machine learning problem and at least one piece of attribute information used to generate each feature of a machine learning sample; (B) training at least one feature pool model using the acquired history data records, wherein a feature pool model is a machine learning model that provides a prediction result for the machine learning problem based on at least a part of the features; (C) acquiring an effect of the at least one feature pool model and determining the importance of each feature according to the acquired effect, wherein, in step (B), the feature pool model is trained by performing a discretization operation on at least one continuous feature among the at least a part of the features.
It should be noted that the details of the processing of the feature importance determination method according to the exemplary embodiment of the present invention have been described above with reference to fig. 2 to 5, and the details of the processing when the computing device performs the steps will not be described herein.
While exemplary embodiments of the invention have been described above, it should be understood that the above description is illustrative only and not exhaustive, and that the invention is not limited to the exemplary embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the invention. Therefore, the protection scope of the present invention should be subject to the scope of the claims.

Claims (10)

1. A method of determining the importance of various features of a machine learning sample, comprising:
(A) obtaining a historical data record, wherein the historical data record comprises a label for a machine learning problem and at least one piece of attribute information used to generate each feature of a machine learning sample;
(B) training at least one feature pool model by using the acquired historical data records, wherein the feature pool model is a machine learning model which provides a prediction result about the machine learning problem based on at least a part of the features;
(C) obtaining an effect of the at least one feature pool model, and determining the importance of the respective features according to the obtained effect of the at least one feature pool model,
wherein, in step (B), the feature pool model is trained by performing a discretization operation on at least one continuous feature among the at least a portion of features.
2. The method of claim 1, wherein in step (C), the importance of each feature on which the feature pool model is based is determined from the difference between the effects of the feature pool model on an original test data set and on a transformation test data set,
the transformation test data set being a data set obtained by replacing, in the original test data set, the values of the target feature whose importance is to be determined with one of the following: zero values, random values, or values obtained by shuffling the order of the original values of the target feature.
3. The method of claim 1, wherein the at least one feature pool model comprises a plurality of machine learning models that provide predictions about machine learning problems based on different sets of features,
wherein in step (C) the importance of the individual features is determined from the differences between the effects of the at least one feature pool model on the original test data set.
4. The method of claim 3, wherein the at least one feature pool model comprises one or more main feature pool models and at least one sub-feature pool model respectively corresponding to each main feature pool model, wherein a sub-feature pool model refers to a machine learning model that provides a prediction result regarding a machine learning problem based on remaining features except for a target feature whose importance is to be determined among features on which the corresponding main feature pool model is based,
wherein in step (C) the importance of the respective target feature is determined from the difference between the effect of the main feature pool model and its respective sub-feature pool model on the original test data set.
5. The method of claim 3, wherein the at least one feature pool model includes a plurality of single-feature models, wherein a single-feature model refers to a machine learning model that provides a prediction result regarding a machine learning problem based on a target feature of which importance is to be determined among the respective features,
wherein in step (C) the importance of the corresponding target features is determined from the differences between the effects of the single-feature models on the original test data set.
6. The method of claim 1, wherein the discretization operation comprises a basic binning operation and at least one additional operation.
7. The method of claim 6, wherein the at least one additional operation comprises an additional binning operation that is binned in the same manner as the basic binning operation but with different binning parameters; alternatively, the at least one additional operation comprises an additional binning operation in a different binning manner than the basic binning operation.
8. The method of claim 1, wherein step (B) further comprises: providing an interface to a user for configuring at least one of the following items of the feature pool model: at least a part of the features on which the feature pool model is based, the algorithm type of the feature pool model, the algorithm parameters of the feature pool model, the operation type of the discretization operation, the operation parameters of the discretization operation,
and, in step (B), training the feature pool models individually according to items configured by the user through the interface.
9. A system for determining the importance of various features of a machine learning sample, comprising:
data record acquisition means for acquiring a historical data record, wherein the historical data record comprises a label for a machine learning problem and at least one piece of attribute information used to generate each feature of a machine learning sample;
model training means for training at least one feature pool model by using the acquired historical data records, wherein the feature pool model is a machine learning model which provides a prediction result about the machine learning problem based on at least a part of the features;
importance determination means for acquiring an effect of the at least one feature pool model and determining the importance of the respective features based on the acquired effect of the at least one feature pool model,
wherein the model training means trains the feature pool model by performing a discretization operation on at least one continuous feature among the at least one portion of features.
10. A computing device for determining the importance of individual features of a machine learning sample, comprising a storage component having stored therein a set of computer-executable instructions which, when executed by a processor, perform the steps of:
(A) obtaining a historical data record, wherein the historical data record comprises a label for a machine learning problem and at least one piece of attribute information used to generate each feature of a machine learning sample;
(B) training at least one feature pool model by using the acquired historical data records, wherein the feature pool model is a machine learning model which provides a prediction result about the machine learning problem based on at least a part of the features;
(C) obtaining an effect of the at least one feature pool model, and determining the importance of the respective features according to the obtained effect of the at least one feature pool model,
wherein, in step (B), the feature pool model is trained by performing a discretization operation on at least one continuous feature among the at least a portion of features.
CN202110542599.1A 2016-11-01 2016-11-01 Method and system for determining feature importance of machine learning sample Pending CN113435602A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110542599.1A CN113435602A (en) 2016-11-01 2016-11-01 Method and system for determining feature importance of machine learning sample

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610935697.0A CN108021984A (en) 2016-11-01 2016-11-01 Determine the method and system of the feature importance of machine learning sample
CN202110542599.1A CN113435602A (en) 2016-11-01 2016-11-01 Method and system for determining feature importance of machine learning sample

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
CN201610935697.0A Division CN108021984A (en) 2016-11-01 2016-11-01 Determine the method and system of the feature importance of machine learning sample

Publications (1)

Publication Number Publication Date
CN113435602A true CN113435602A (en) 2021-09-24

Family

ID=62070586

Family Applications (2)

Application Number Title Priority Date Filing Date
CN202110542599.1A Pending CN113435602A (en) 2016-11-01 2016-11-01 Method and system for determining feature importance of machine learning sample
CN201610935697.0A Pending CN108021984A (en) 2016-11-01 2016-11-01 Determine the method and system of the feature importance of machine learning sample

Family Applications After (1)

Application Number Title Priority Date Filing Date
CN201610935697.0A Pending CN108021984A (en) 2016-11-01 2016-11-01 Determine the method and system of the feature importance of machine learning sample

Country Status (1)

Country Link
CN (2) CN113435602A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI806425B (en) * 2022-02-14 2023-06-21 宏碁股份有限公司 Feature selection method

Families Citing this family (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110717597B (en) * 2018-06-26 2024-09-10 第四范式(北京)技术有限公司 Method and device for acquiring time sequence characteristics by using machine learning model
CN110751285B (en) * 2018-07-23 2024-01-23 第四范式(北京)技术有限公司 Training method and system and prediction method and system for neural network model
CN109165683B (en) * 2018-08-10 2023-09-12 深圳前海微众银行股份有限公司 Sample prediction method, device and storage medium based on federal training
CN109034398B (en) * 2018-08-10 2023-09-12 深圳前海微众银行股份有限公司 Gradient lifting tree model construction method and device based on federal training and storage medium
CN109408583B (en) * 2018-09-25 2023-04-07 平安科技(深圳)有限公司 Data processing method and device, computer readable storage medium and electronic equipment
CN109360084A (en) * 2018-09-27 2019-02-19 平安科技(深圳)有限公司 Appraisal procedure and device, storage medium, the computer equipment of reference default risk
CN109657285A (en) * 2018-11-27 2019-04-19 中国科学院空间应用工程与技术中心 The detection method of turbine rotor transient stress
CN109783337B (en) * 2018-12-19 2022-08-30 北京达佳互联信息技术有限公司 Model service method, system, apparatus and computer readable storage medium
CN109784721B (en) * 2019-01-15 2021-01-26 广东度才子集团有限公司 Employment data analysis and data mining analysis platform system
CN109800048A (en) * 2019-01-22 2019-05-24 深圳魔数智擎科技有限公司 Result methods of exhibiting, computer readable storage medium and the computer equipment of model
CN110660485A (en) * 2019-08-20 2020-01-07 南京医渡云医学技术有限公司 Method and device for acquiring influence of clinical index
CN110708285B (en) * 2019-08-30 2022-04-29 中国平安人寿保险股份有限公司 Flow monitoring method, device, medium and electronic equipment
US11531831B2 (en) * 2019-09-30 2022-12-20 Meta Platforms, Inc. Managing machine learning features
CN110956272B (en) * 2019-11-01 2023-08-08 第四范式(北京)技术有限公司 Method and system for realizing data processing
CN113128694B (en) * 2019-12-31 2024-07-19 北京超星未来科技有限公司 Method, device and system for data acquisition and data processing in machine learning
CN111401475A (en) * 2020-04-15 2020-07-10 支付宝(杭州)信息技术有限公司 Method and system for generating attack sample
WO2021139115A1 (en) * 2020-05-26 2021-07-15 平安科技(深圳)有限公司 Feature selection method, apparatus and device, and storage medium
CN111797995B (en) * 2020-06-29 2024-01-26 第四范式(北京)技术有限公司 Method and device for generating interpretation report of model prediction sample
CN112819034A (en) * 2021-01-12 2021-05-18 平安科技(深圳)有限公司 Data binning threshold calculation method and device, computer equipment and storage medium
CN117705141B (en) * 2024-02-06 2024-05-07 腾讯科技(深圳)有限公司 Yaw recognition method, yaw recognition device, computer readable medium and electronic equipment

Also Published As

Publication number Publication date
CN108021984A (en) 2018-05-11

Similar Documents

Publication Publication Date Title
CN113435602A (en) Method and system for determining feature importance of machine learning sample
CN107871166B (en) Feature processing method and feature processing system for machine learning
CN111797928A (en) Method and system for generating combined features of machine learning samples
WO2019015631A1 (en) Method for generating combined features for machine learning samples and system
US20210397980A1 (en) Information recommendation method and apparatus, electronic device, and readable storage medium
US10191968B2 (en) Automated data analysis
CN108833458B (en) Application recommendation method, device, medium and equipment
US20230139783A1 (en) Schema-adaptable data enrichment and retrieval
CN116757297A (en) Method and system for selecting features of machine learning samples
CN112889042A (en) Identification and application of hyper-parameters in machine learning
CN113570064A (en) Method and system for performing predictions using a composite machine learning model
CN111797927A (en) Method and system for determining important features of machine learning samples
CN107273979B (en) Method and system for performing machine learning prediction based on service level
US20130204831A1 (en) Identifying associations in data
CN114298323A (en) Method and system for generating combined features of machine learning samples
CN116882520A (en) Prediction method and system for predetermined prediction problem
CN113822440A (en) Method and system for determining feature importance of machine learning samples
US11176187B2 (en) Data insight discovery using a clustering technique
CN107909087A (en) Generate the method and system of the assemblage characteristic of machine learning sample
US11631205B2 (en) Generating a data visualization graph utilizing modularity-based manifold tearing
CN110147389B (en) Account processing method and device, storage medium and electronic device
CN113610240A (en) Method and system for performing predictions using nested machine learning models
CN111797998A (en) Method and system for generating combined features of machine learning samples
CN117223016A (en) Industry specific machine learning application
CN113934894A (en) Data display method based on index tree and terminal equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination