CN108021984A - Method and system for determining the feature importance of a machine learning sample
- Publication number: CN108021984A
- Application number: CN201610935697.0A
- Authority: CN (China)
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion)
Classifications
- G06N 20/00 - Machine learning
- G06F 18/2113 - Selection of the most significant subset of features by ranking or filtering the set of features, e.g. using a measure of variance or of feature cross-correlation
- G06F 18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
Abstract
A method and system for determining the feature importance of a machine learning sample are provided. The method includes: (A) obtaining historical data records, wherein each historical data record includes a label related to a machine learning problem and at least one piece of attribute information; (B) training at least one feature pool model using the obtained historical data records, wherein a feature pool model is a machine learning model that provides a prediction result for the machine learning problem based on at least a part of the features; (C) obtaining the effect of the at least one feature pool model and determining the importance of each feature according to the obtained effect, wherein, in step (B), the feature pool model is trained by performing a discretization operation on at least one continuous feature among the at least a part of the features. With the described method and system, the importance of each feature of a machine learning sample can be determined effectively.
Description
Technical Field
The present invention relates generally to the field of artificial intelligence, and more particularly, to a method and system for determining feature importance of a machine learning sample.
Background
With the advent of massive amounts of data, artificial intelligence techniques have evolved rapidly, and in order to extract value from the massive amounts of data, it is necessary to generate samples suitable for machine learning based on data records.
Here, each data record may be considered as a description of an event or object, corresponding to an example or sample. In a data record, various items are included that reflect the performance or nature of an event or object in some respect, and these items may be referred to as "attributes".
In practice, the predictive effect of a machine learning model is related to the selection of the model, the available data, the extracted features, and so on. How the features of the machine learning samples are extracted from the various attributes of the raw data records therefore has a great influence on the effect of the machine learning model. Accordingly, it is highly desirable to know the importance of the various features of a machine learning sample, both from a model training and a model understanding perspective. For example, feature importance may be estimated from the expected splitting gain of each feature in a tree model trained with XGBoost. Although this approach can account for interactions between features, its training cost is high and the resulting importance values are highly sensitive to the chosen parameters.
In fact, feature importance is difficult to determine intuitively: technicians must not only master machine learning knowledge but also deeply understand the actual prediction problem, which is in turn bound up with the practical experience of different industries, so satisfactory results are hard to achieve.
Disclosure of Invention
Exemplary embodiments of the present invention aim to overcome the deficiency of the prior art that it is difficult to efficiently determine the importance of the various features of a machine learning sample.
According to an exemplary embodiment of the invention, there is provided a method of determining the importance of individual features of a machine learning sample, comprising: (A) obtaining historical data records, wherein each historical data record comprises a label related to a machine learning problem and at least one piece of attribute information for each feature used to generate a machine learning sample; (B) training at least one feature pool model by using the acquired historical data records, wherein a feature pool model is a machine learning model which provides a prediction result about the machine learning problem based on at least a part of the features; (C) acquiring an effect of the at least one feature pool model, and determining the importance of each feature according to the acquired effect of the at least one feature pool model, wherein in step (B), the feature pool model is trained by performing a discretization operation on at least one continuous feature among the at least a part of features.
Optionally, in the method, in step (C), the importance of the corresponding feature based on the feature pool model is determined according to a difference between effects of the feature pool model on an original test data set and a transformed test data set, where the transformed test data set refers to a data set obtained by replacing a value of a target feature whose importance is to be determined in the original test data set with one of: zero values, random values, values obtained by scrambling the order of the original values of the target features.
Optionally, in the method, the at least one feature pool model includes an all-feature model, where the all-feature model refers to a machine learning model that provides a prediction result about a machine learning problem based on all of the features among the individual features.
Optionally, in the method, the at least one feature pool model comprises a plurality of machine learning models that provide a prediction result about the machine learning problem based on different feature groups, wherein in step (C), the importance of the respective features is determined according to a difference between effects of the at least one feature pool model on the original test data set.
Optionally, in the method, the at least one feature pool model includes one or more main feature pool models and at least one sub-feature pool model respectively corresponding to each main feature pool model, where a sub-feature pool model refers to a machine learning model that provides a prediction result about a machine learning problem based on remaining features except for a target feature whose importance is to be determined among features based on which the corresponding main feature pool model is based, and in step (C), the importance of the corresponding target feature is determined according to a difference between effects of the main feature pool model and the respective sub-feature pool models corresponding thereto on the original test data set.
Optionally, in the method, the at least one feature pool model includes a plurality of single-feature models, wherein a single-feature model refers to a machine learning model that provides a prediction result regarding a machine learning problem based on a target feature whose importance is to be determined among the respective features, wherein in step (C), the importance of the corresponding target feature is determined according to a difference between effects of the single-feature models on the original test data set.
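By way of a non-authoritative illustration of the single-feature-model option above, the following Python sketch trains one model per feature and ranks features by the resulting test AUC. The synthetic data set, feature names, and the use of scikit-learn's LogisticRegression are assumptions for illustration only; in the described method, continuous features would additionally be discretized before training.

```python
# Sketch: rank features by the test AUC of a model trained on each feature alone.
# Data and column names are hypothetical.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 4))                     # four hypothetical candidate features
y = (X[:, 0] + 0.3 * X[:, 2] + rng.normal(size=1000) > 0).astype(int)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

importance = {}
for j in range(X.shape[1]):
    # single-feature model: trained on feature j alone
    model = LogisticRegression().fit(X_tr[:, [j]], y_tr)
    auc = roc_auc_score(y_te, model.predict_proba(X_te[:, [j]])[:, 1])
    importance[f"f{j + 1}"] = auc                  # higher AUC suggests higher importance

print(sorted(importance.items(), key=lambda kv: -kv[1]))
```

A larger test AUC for a single-feature model indicates greater predictive power, and hence greater importance, of that feature.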
Optionally, in the method, the discretization operation comprises a basic binning operation and at least one additional operation.
Optionally, in the method, the at least one additional operation comprises at least one operation among the following kinds of operations: logarithm operation, exponential operation, absolute value operation, gaussian transformation operation.
Optionally, in the method, the at least one additional operation comprises an additional binning operation that uses the same binning manner as the basic binning operation but different binning parameters; alternatively, the at least one additional operation comprises an additional binning operation that uses a different binning manner from the basic binning operation.
Optionally, in the method, the basic binning operation and the additional binning operation correspond to equal-width binning operations of different widths or equal-depth binning of different depths, respectively.
Optionally, in the method, the different widths or different depths numerically constitute a geometric series or an arithmetic series.
Optionally, in the method, the step of performing a basic binning operation and/or an additional binning operation comprises: additionally providing an outlier bin such that continuous features having outlier values are assigned to the outlier bin.
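As a rough sketch of the binning configuration described in the preceding paragraphs, the snippet below applies several equal-width binnings whose widths form a geometric series and routes out-of-range values to a dedicated outlier bin. The value range, widths, and helper function are hypothetical and illustrate only one possible realization.

```python
# Illustrative equal-width binning with an extra outlier bin; parameters are assumptions.
import numpy as np

def equal_width_bin(values, width, lower, upper):
    """Return bin indices; values outside [lower, upper] go to a dedicated outlier bin."""
    values = np.asarray(values, dtype=float)
    n_bins = int(np.ceil((upper - lower) / width))
    idx = np.clip(((values - lower) // width).astype(int), 0, n_bins - 1)
    outlier_bin = n_bins                      # one extra bin index reserved for outliers
    idx[(values < lower) | (values > upper)] = outlier_bin
    return idx

ages = np.array([3, 17, 25, 42, 67, 150])     # 150 is treated as an outlier
for width in (5, 10, 20, 40):                 # widths form a geometric series (ratio 2)
    print(width, equal_width_bin(ages, width, lower=0, upper=100))
```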
Optionally, in the method, in step (B), the feature pool model is trained based on a logistic regression (log-probability regression) algorithm.
Optionally, in the method, the effect of the feature pool model comprises AUC of the feature pool model.
Optionally, in the method, the original test data set is composed of the acquired historical data records, wherein in step (B), the acquired historical data records are divided into a plurality of groups of historical data records to train each feature pool model step by step, and step (B) further includes: performing prediction on the next group of historical data records by using the feature pool model trained on the current group of historical data records to obtain a group AUC corresponding to the next group of historical data records, and synthesizing the group AUCs to obtain the AUC of the feature pool model, wherein, after the group AUC corresponding to the next group of historical data records is obtained, the feature pool model trained on the current group of historical data records continues to be trained using the next group of historical data records.
Optionally, in the method, in step (B), when prediction is performed for the next group of historical data records using the feature pool model trained on the current group, and the next group includes a missing historical data record lacking attribute information for at least a part of the features on which the feature pool model is based, the group AUC corresponding to the next group of historical data records is obtained in one of the following ways: calculating the group AUC using only the prediction results of the historical data records other than the missing historical data record in the next group; calculating the group AUC using the prediction results of all historical data records in the next group, wherein the prediction result of the missing historical data record is set to a default value determined based on the value range of the prediction results or based on the label distribution of the acquired historical data records; or multiplying the AUC calculated from the prediction results of the historical data records other than the missing historical data record by the proportion of those records in the next group to obtain the group AUC.
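The staged "predict on the next group, then keep training" evaluation described above can be sketched as follows, assuming an incrementally trainable stand-in model (scikit-learn's SGDClassifier) and synthetic data. The group size, synthesizing group AUCs by a simple mean, and the strategy of skipping missing records when computing the group AUC are all illustrative assumptions rather than the claimed implementation.

```python
# Sketch of progressive (group-wise) AUC evaluation with one missing-record strategy.
import numpy as np
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
X = rng.normal(size=(3000, 5))
y = (X[:, 0] - X[:, 3] + rng.normal(size=3000) > 0).astype(int)
missing = rng.random(3000) < 0.05            # records lacking some attribute information

model = SGDClassifier(loss="log_loss", random_state=0)   # stands in for any feature pool model
group_size, group_aucs = 500, []
groups = [slice(i, i + group_size) for i in range(0, len(y), group_size)]

model.partial_fit(X[groups[0]], y[groups[0]], classes=[0, 1])
for g in groups[1:]:
    keep = ~missing[g]                       # strategy 1: drop missing records from the group AUC
    if keep.sum() and len(set(y[g][keep])) == 2:
        scores = model.decision_function(X[g][keep])
        group_aucs.append(roc_auc_score(y[g][keep], scores))
    model.partial_fit(X[g], y[g])            # then continue training on this group

print("feature pool model AUC (mean of group AUCs):", np.mean(group_aucs))
```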
Optionally, in the method, in step (B), when the feature pool model is trained based on a log-probability regression algorithm, the regularization term set for the continuous features is different from the regularization term set for the discontinuous features.
Optionally, in the method, step (B) further comprises: providing an interface to a user for configuring at least one of the following items of the feature pool model: at least a part of the features based on which the feature pool model is based, the algorithm type of the feature pool model, the algorithm parameters of the feature pool model, the operation type of the discretization operation, and the operation parameters of the discretization operation, and in the step (B), the feature pool model is trained according to the items configured by the user through the interface.
Optionally, in the method, in step (B), the interface is provided to the user in response to an indication by the user regarding the determination of feature importance.
Optionally, the method further comprises: (D) the determined importance of the individual features is graphically presented to the user.
Optionally, in the method, in the step (D), the respective features are presented in order of importance of the features, and/or a part of the features among the respective features is highlighted, wherein the part of the features includes an important feature corresponding to a high importance, an unimportant feature corresponding to a low importance, and/or an abnormal feature corresponding to an abnormal importance.
According to another exemplary embodiment of the invention, a system for determining importance of individual features of a machine learning sample is provided, comprising: data record acquisition means for acquiring a history data record, wherein the history data record includes a label about a machine learning problem and at least one attribute information for each feature used to generate a machine learning sample; the model training device is used for training at least one characteristic pool model by utilizing the acquired historical data records, wherein the characteristic pool model is a machine learning model which provides a prediction result about a machine learning problem based on at least one part of characteristics in the various characteristics; and importance determination means for acquiring an effect of the at least one feature pool model, and determining importance of the respective features based on the acquired effect of the at least one feature pool model, wherein the model training means trains the feature pool model by performing a discretization operation on at least one continuous feature among the at least one part of features.
Optionally, in the system, the importance determination device determines the importance of the corresponding feature based on the feature pool model according to a difference between effects of the feature pool model on an original test data set and a transformed test data set, where the transformed test data set refers to a data set obtained by replacing a value of a target feature whose importance is to be determined in the original test data set with one of: zero values, random values, values obtained by scrambling the order of the original values of the target features.
Optionally, in the system, the at least one feature pool model includes an all-features model, wherein the all-features model is a machine learning model that provides a prediction result about a machine learning problem based on all of the features among the respective features.
Optionally, in the system, the at least one feature pool model comprises a plurality of machine learning models that provide a prediction result about a machine learning problem based on different feature groups, wherein the importance determination means determines the importance of the respective features according to a difference between effects of the at least one feature pool model on the original test data set.
Optionally, in the system, the at least one feature pool model includes one or more main feature pool models and at least one sub-feature pool model respectively corresponding to each main feature pool model, wherein a sub-feature pool model refers to a machine learning model that provides a prediction result about a machine learning problem based on remaining features except for a target feature whose importance is to be determined among features on which the corresponding main feature pool model is based, and wherein the importance determination means determines the importance of the corresponding target feature according to a difference between effects of the main feature pool model and the respective sub-feature pool models corresponding thereto on the original test data set.
Optionally, in the system, the at least one feature pool model includes a plurality of single-feature models, wherein a single-feature model refers to a machine learning model that provides a prediction result regarding a machine learning problem based on a target feature whose importance is to be determined among the respective features, wherein the importance determination means determines the importance of the corresponding target feature according to a difference between effects of the single-feature models on the original test data set.
Optionally, in the system, the discretization operation comprises a basic binning operation and at least one additional operation.
Optionally, in the system, the at least one additional operation comprises at least one operation among the following classes of operations: logarithm operation, exponential operation, absolute value operation, gaussian transformation operation.
Optionally, in the system, the at least one additional operation comprises an additional binning operation that uses the same binning manner as the basic binning operation but different binning parameters; alternatively, the at least one additional operation comprises an additional binning operation that uses a different binning manner from the basic binning operation.
Optionally, in the system, the basic binning operation and the additional binning operation correspond to equal-width binning operations of different widths or equal-depth binning of different depths, respectively.
Optionally, in the system, the different widths or different depths numerically constitute a geometric series or an arithmetic series.
Optionally, in the system, the step of performing a basic binning operation and/or an additional binning operation comprises: additionally providing an outlier bin such that continuous features having outlier values are assigned to the outlier bin.
Optionally, in the system, the model training means trains the feature pool model based on a log-probability regression algorithm.
Optionally, in the system, the effect of the feature pool model comprises an AUC of the feature pool model.
Optionally, in the system, the original test data set is composed of acquired historical data records, wherein the model training device divides the acquired historical data records into a plurality of groups of historical data records to train each feature pool model step by step, and the model training device further performs prediction on a next group of historical data records using the feature pool model trained on the current group of historical data records to obtain a group AUC corresponding to the next group of historical data records, and synthesizes each group AUC to obtain an AUC of the feature pool model, wherein after obtaining the group AUC corresponding to the next group of historical data records, the feature pool model trained on the current group of historical data records continues to be trained using the next group of historical data records.
Optionally, in the system, when the next group of historical data records includes a missing historical data record lacking attribute information for at least a part of the features on which the feature pool model is based, the model training means obtains the group AUC corresponding to the next group of historical data records in one of the following ways: calculating the group AUC using only the prediction results of the historical data records other than the missing historical data record in the next group; calculating the group AUC using the prediction results of all historical data records in the next group, wherein the prediction result of the missing historical data record is set to a default value determined based on the value range of the prediction results or based on the label distribution of the acquired historical data records; or multiplying the AUC calculated from the prediction results of the historical data records other than the missing historical data record by the proportion of those records in the next group to obtain the group AUC.
Optionally, in the system, when the model training device trains the feature pool model based on a log-probability regression algorithm, the regularization term set for the continuous features is different from the regularization term set for the discontinuous features.
Optionally, the system further comprises: a display device, wherein the model training device further controls the display device to provide an interface for a user to configure at least one of the following items of the feature pool model: at least a part of the features on which the feature pool model is based, the algorithm type of the feature pool model, the algorithm parameters of the feature pool model, the operation type of the discretization operation, and the operation parameters of the discretization operation, and the model training device trains the feature pool model according to the items configured by the user through the interface.
Optionally, in the system, the model training means controls the display means to provide the interface to the user in response to an indication by the user regarding the determination of feature importance.
Optionally, in the system, the display means also graphically presents the determined importance of the respective feature to the user.
Optionally, in the system, the display means presents the respective features in order of importance of the features, and/or highlights a part of the features among the respective features, wherein the part of the features includes an important feature corresponding to a high importance, an unimportant feature corresponding to a low importance, and/or an abnormal feature corresponding to an abnormal importance.
According to another exemplary embodiment of the present invention, a computing apparatus for determining the importance of respective features of a machine learning sample is provided, comprising a storage component and a processor, wherein the storage component has stored therein a set of computer-executable instructions which, when executed by the processor, perform the following steps: (A) obtaining historical data records, wherein each historical data record comprises a label related to a machine learning problem and at least one piece of attribute information for each feature used to generate a machine learning sample; (B) training at least one feature pool model by using the acquired historical data records, wherein a feature pool model is a machine learning model which provides a prediction result about the machine learning problem based on at least a part of the features; (C) acquiring an effect of the at least one feature pool model, and determining the importance of each feature according to the acquired effect of the at least one feature pool model, wherein in step (B), the feature pool model is trained by performing a discretization operation on at least one continuous feature among the at least a part of features.
Optionally, in the computing apparatus, in step (C), the importance of the corresponding feature based on the feature pool model is determined according to a difference between effects of the feature pool model on an original test data set and a transformed test data set, where the transformed test data set refers to a data set obtained by replacing a value of a target feature whose importance is to be determined in the original test data set with one of: zero values, random values, values obtained by scrambling the order of the original values of the target features.
Optionally, in the computing apparatus, the at least one feature pool model includes an all-features model, where the all-features model refers to a machine learning model that provides a prediction result about a machine learning problem based on all of the features among the respective features.
Optionally, in the computing device, the at least one feature pool model comprises a plurality of machine learning models that provide a prediction result about a machine learning problem based on different feature groups, wherein in step (C), the importance of the respective features is determined according to a difference between effects of the at least one feature pool model on the original test data set.
Optionally, in the computing apparatus, the at least one feature pool model includes one or more main feature pool models and at least one sub-feature pool model respectively corresponding to each main feature pool model, where a sub-feature pool model refers to a machine learning model that provides a prediction result regarding a machine learning problem based on remaining features except for a target feature whose importance is to be determined among features based on which the corresponding main feature pool model is based, and in step (C), the importance of the corresponding target feature is determined according to a difference between effects of the main feature pool model and the respective sub-feature pool models corresponding thereto on an original test data set.
Optionally, in the computing apparatus, the at least one feature pool model includes a plurality of single-feature models, wherein a single-feature model refers to a machine learning model that provides a prediction result regarding a machine learning problem based on a target feature whose importance is to be determined among the respective features, wherein in step (C), the importance of the corresponding target feature is determined according to a difference between effects of the single-feature models on the original test data set.
Optionally, in the computing device, the discretization operation comprises a basic binning operation and at least one additional operation.
Optionally, in the computing device, the at least one additional operation comprises at least one operation among the following kinds of operations: logarithm operation, exponential operation, absolute value operation, gaussian transformation operation.
Optionally, in the computing device, the at least one additional operation comprises an additional binning operation in the same manner as the basic binning operation but with different binning parameters; alternatively, the at least one additional operation comprises an additional binning operation in a different binning manner than the basic binning operation.
Optionally, in the computing device, the basic binning operation and the additional binning operation correspond to equal-width binning operations of different widths or equal-depth binning of different depths, respectively.
Optionally, in the computing device, the different widths or different depths numerically constitute a geometric series or an arithmetic series.
Optionally, in the computing device, the step of performing a basic binning operation and/or an additional binning operation comprises: additionally providing an outlier bin such that continuous features having outlier values are assigned to the outlier bin.
Optionally, in the computing device, in step (B), the feature pool model is trained based on a log-probability regression algorithm.
Optionally, in the computing device, the effect of the feature pool model comprises an AUC of the feature pool model.
Optionally, in the computing device, the original test data set is composed of the acquired historical data records, wherein in step (B), the acquired historical data records are divided into a plurality of groups of historical data records to train each feature pool model step by step, and step (B) further includes: performing prediction on the next group of historical data records by using the feature pool model trained on the current group of historical data records to obtain a group AUC corresponding to the next group of historical data records, and synthesizing the group AUCs to obtain the AUC of the feature pool model, wherein, after the group AUC corresponding to the next group of historical data records is obtained, the feature pool model trained on the current group of historical data records continues to be trained using the next group of historical data records.
Optionally, in the computing apparatus, in step (B), when prediction is performed for the next group of historical data records using the feature pool model trained on the current group, and the next group includes a missing historical data record lacking attribute information for at least a part of the features on which the feature pool model is based, the group AUC corresponding to the next group of historical data records is obtained in one of the following ways: calculating the group AUC using only the prediction results of the historical data records other than the missing historical data record in the next group; calculating the group AUC using the prediction results of all historical data records in the next group, wherein the prediction result of the missing historical data record is set to a default value determined based on the value range of the prediction results or based on the label distribution of the acquired historical data records; or multiplying the AUC calculated from the prediction results of the historical data records other than the missing historical data record by the proportion of those records in the next group to obtain the group AUC.
Optionally, in the computing apparatus, in the step (B), when the feature pool model is trained based on a log-probability regression algorithm, the regularization term set for the continuous features is different from the regularization term set for the discontinuous features.
Optionally, in the computing device, step (B) further comprises: providing an interface to a user for configuring at least one of the following items of the feature pool model: at least a part of the features based on which the feature pool model is based, the algorithm type of the feature pool model, the algorithm parameters of the feature pool model, the operation type of the discretization operation, and the operation parameters of the discretization operation, and in the step (B), the feature pool model is trained according to the items configured by the user through the interface.
Optionally, in the computing device, in step (B), the interface is provided to the user in response to an indication by the user regarding the determination of feature importance.
Optionally, in the computing device, when the set of computer-executable instructions is executed by the processor, the following steps are further performed: (D) the determined importance of the individual features is graphically presented to the user.
Optionally, in the computing device, in the step (D), the respective features are presented in order of importance of the features, and/or a part of the features among the respective features is highlighted, wherein the part of the features includes an important feature corresponding to a high importance, an unimportant feature corresponding to a low importance, and/or an abnormal feature corresponding to an abnormal importance.
In the method and system for determining the feature importance of the machine learning sample according to the exemplary embodiment of the present invention, the importance of each feature is determined accordingly by using the effect of the feature pool model based on at least a part of the features of the machine learning sample, wherein continuous features in the at least a part of the features need to be discretized when the feature pool model is trained, so that the importance of the relevant features can be effectively reflected by the effect of the feature pool model, and the importance of each feature can be effectively obtained.
Drawings
These and/or other aspects and advantages of the present invention will become more apparent and more readily appreciated from the following detailed description of the embodiments of the invention, taken in conjunction with the accompanying drawings of which:
FIG. 1 illustrates a block diagram of a system for determining feature importance of machine learning samples according to an exemplary embodiment of the present invention;
FIG. 2 illustrates a flow diagram of a method of determining feature importance of machine learning samples according to an exemplary embodiment of the present invention;
FIG. 3 shows a flowchart of a method of determining feature importance of machine learning samples according to another example embodiment of the present invention;
FIG. 4 illustrates an example of a feature importance presentation interface in accordance with an exemplary embodiment of the present invention; and
fig. 5 illustrates an example of a feature importance presentation interface according to another exemplary embodiment of the present invention.
Detailed Description
In order that those skilled in the art will better understand the present invention, exemplary embodiments thereof will be described in further detail below with reference to the accompanying drawings and detailed description.
In an exemplary embodiment of the present invention, the feature importance is determined as follows: a feature pool model is trained based on at least a part of the features of the machine learning samples, with continuous features subjected to discretization processing. On this basis, the importance of each feature is measured based on the prediction effect of the feature pool model.
Here, machine learning is a natural product of the development of artificial intelligence research, aimed at improving the performance of a system itself by means of computation and experience. In a computer system, "experience" usually exists in the form of "data", from which a "model" can be generated by a machine learning algorithm; that is, by providing empirical data to a machine learning algorithm, a model can be generated that, when faced with a new situation, provides a corresponding judgment, i.e., a prediction. Whether a machine learning model is being trained or a trained machine learning model is being used for prediction, the data needs to be converted into machine learning samples that include various features. Machine learning may be implemented in the form of "supervised learning", "unsupervised learning", or "semi-supervised learning"; it should be noted that the present invention is not limited to any particular machine learning algorithm. It should also be noted that other means, such as statistical algorithms, may also be incorporated during the training and application of the model.
Fig. 1 illustrates a block diagram of a system for determining feature importance of machine learning samples according to an exemplary embodiment of the present invention. Specifically, the feature importance determination system measures the importance of each corresponding feature by using the prediction effect of a feature pool model that is based on at least a part of the features, wherein at least a part of the original continuous features on which the feature pool model is based are subjected to discretization processing. In this way, the importance of individual features (particularly continuous features) can be determined more efficiently.
The system shown in fig. 1 may be implemented entirely by a computer program, as a software program, as a dedicated hardware device, or as a combination of software and hardware. Accordingly, each device constituting the system shown in fig. 1 may be a virtual module that realizes the corresponding function only by means of a computer program, may be a general-purpose or dedicated device that realizes the function by means of a hardware structure, or may be a processor or the like on which the corresponding computer program runs. With the system, the importance of various features of the machine learning samples can be determined, and the importance information is helpful for model training and/or model interpretation.
As shown in fig. 1, the data record acquisition apparatus 100 is configured to acquire a history data record, wherein the history data record includes a label about a machine learning problem and at least one attribute information of each feature used for generating a machine learning sample.
The history data record may be data generated online, data generated and stored in advance, or data received from an external device through an input device or a transmission medium, for example, data received by the cloud from a client or data received by a client from the cloud. Such data may relate to information about an individual, business, or organization, such as identity, educational background, occupation, assets, contact details, liabilities, income, profit, tax, and the like. Alternatively, the data may relate to information about business-related items, such as the transaction amount, transaction parties, subject matter, and transaction location of a contract. It should be noted that the attribute information mentioned in the exemplary embodiments of the present invention may relate to the performance or nature of any object or matter in some respect, and is not limited to defining or describing individuals, objects, organizations, units, institutions, items, events, and so forth.
The data record acquisition device 100 may acquire structured or unstructured data from different sources, such as text data or numerical data. The acquired historical data records may be used to form machine learning samples, participate in training and/or testing of machine learning models. Such data may originate from within an entity desiring to apply machine learning, e.g., from a bank, business, school, etc. desiring to apply machine learning; such data may also originate from other than the aforementioned entities, such as from data providers, the internet (e.g., social networking sites), mobile operators, APP operators, courier companies, credit agencies, and so forth. Optionally, the internal data and the external data can be used in combination to form a machine learning sample carrying more information, thereby facilitating the discovery of more important features.
The data may be input to the data record obtaining apparatus 100 through an input device, or automatically generated by the data record obtaining apparatus 100 according to the existing data, or may be obtained by the data record obtaining apparatus 100 from a network (e.g., a storage medium (e.g., a data warehouse) on the network), and furthermore, an intermediate data exchange device such as a server may facilitate the data record obtaining apparatus 100 to obtain the corresponding data from an external data source. Here, the acquired data may be converted into a format that is easy to handle by a data conversion module such as a text analysis module in the data record acquisition apparatus 100. That is, the data record acquisition device 100 may be a device having the capability of receiving and processing data records, or may simply be a device that provides data records that are already prepared. It should be noted that the data record acquisition apparatus 100 may be configured as various modules composed of software, hardware, and/or firmware, and some or all of these modules may be integrated or cooperate together to accomplish a specific function.
The model training apparatus 200 is configured to train at least one feature pool model using the acquired historical data record, where the feature pool model refers to a machine learning model that provides a prediction result regarding a machine learning problem based on at least a part of the features, and the model training apparatus 200 trains the feature pool model by performing a discretization operation on at least one continuous feature of the at least a part of the features.
Here, the feature pool model is designed based on at least a part of the features of the machine learning sample, and accordingly, the model training apparatus 200 may generate training samples of the feature pool model based on the history data records. Specifically, assume that a history data record has attribute information {p1, p2, …, pm}; based on these pieces of attribute information and the label, a machine learning sample corresponding to the machine learning problem can be generated, which will be used for model training and/or testing for the machine learning problem. In particular, the feature part of the machine learning sample can be expressed as {f1, f2, …, fn}, where n is a positive integer, and exemplary embodiments of the present invention are directed to determining the degree of importance of each feature in the feature part {f1, f2, …, fn}. To this end, the model training apparatus 200 needs to train a feature pool model that provides a prediction result regarding the machine learning problem based on at least a part of these features: the model training apparatus 200 may select at least a part of the features from {f1, f2, …, fn} as the features of the training samples of the feature pool model, and use the labels of the corresponding history data records as the labels of the training samples. According to an exemplary embodiment of the present invention, some or all of the continuous features among the selected at least a part of the features are subjected to discretization processing. Here, the model training apparatus 200 may train one or more feature pool models. On the one hand, the importance of a corresponding feature may be obtained based on the difference in the prediction effects of the same feature pool model (which may be based on all or a part of the features of the machine learning sample) on an original test data set and a transformed test data set, where the transformed test data set is obtained by transforming the values of a certain target feature in the original test data set, so that the difference in prediction effect reflects the predictive contribution, i.e., the importance, of that target feature. Alternatively, the importance of corresponding features may be derived based on the difference in the prediction effects of different feature pool models on the same test data set (i.e., the original test data set), where the different feature pool models may be designed based on different feature combinations, so that the difference in prediction effect reflects the respective predictive contribution, i.e., importance, of different features. In particular, a single-feature model can be trained for each feature of the machine learning sample, and accordingly, the prediction effect of the single-feature model can represent the importance of the feature on which it is based. It should be noted that the above two ways of measuring feature importance can be used alone or in combination.
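A minimal sketch of training such a feature pool model might look as follows, assuming pandas/scikit-learn, a hypothetical table of historical data records, and logistic regression over a chosen feature subset whose continuous features are discretized by equal-width binning; it is illustrative only and does not represent the model training apparatus 200 itself.

```python
# Sketch: train a feature pool model on a feature subset with discretized continuous features.
# The data table, column names, and chosen subset are hypothetical.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import KBinsDiscretizer, OneHotEncoder

df = pd.DataFrame({                       # stand-in for historical data records
    "distance": [1.2, 3.4, 0.5, 7.8, 2.2, 5.1, 0.9, 6.3] * 50,
    "age":      [23, 45, 31, 52, 38, 27, 61, 44] * 50,
    "city":     ["bj", "sh", "tj", "bj", "sh", "tj", "bj", "sh"] * 50,
    "label":    [0, 1, 0, 1, 1, 0, 1, 1] * 50,
})
feature_pool = ["distance", "age", "city"]            # at least a part of the features
continuous, discrete = ["distance", "age"], ["city"]

pool_model = Pipeline([
    ("prep", ColumnTransformer([
        # discretize continuous features by equal-width binning
        ("binned", KBinsDiscretizer(n_bins=5, encode="onehot-dense", strategy="uniform"), continuous),
        ("onehot", OneHotEncoder(handle_unknown="ignore"), discrete),
    ])),
    ("clf", LogisticRegression(max_iter=1000)),
])

X_tr, X_te, y_tr, y_te = train_test_split(df[feature_pool], df["label"], random_state=0)
pool_model.fit(X_tr, y_tr)
print("feature pool model AUC:", roc_auc_score(y_te, pool_model.predict_proba(X_te)[:, 1]))
```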
As described above, according to the exemplary embodiment of the present invention, when training the feature pool model, the model training apparatus 200 may train the feature pool model by performing a discretization operation on at least one continuous feature, where the model training apparatus 200 may process the continuous feature in any suitable discretization manner, so that the feature pool model trained based on the discretized continuous feature (or along with other features) can better reflect the importance of each feature.
Here, as an example, the discretization operation may include a basic binning operation and at least one additional operation, and accordingly, the model training apparatus 200 may perform the basic binning operation and the at least one additional operation for each of some continuous features according to which the feature pool model is based when training the feature pool model, to generate a basic binning feature and at least one additional feature corresponding to each continuous feature.
Here, among the features of the machine learning sample, there may be a continuous feature generated based on at least a part of the attribute information of the data record, where the continuous feature is a feature as opposed to a discrete feature (e.g., a category feature), and a value thereof may be a numerical value having a certain continuity, for example, a distance, an age, an amount, and the like. In contrast, as an example, the values of the discrete features do not have continuity, and may be the features of unordered classification such as "from beijing", "from shanghai", or "from tianjin", "sex is male", and "sex is female", for example.
For example, some continuous value attribute in the history data record can be directly used as a corresponding continuous feature in the machine learning sample, for example, the attributes of distance, age, amount, etc. can be directly used as the corresponding continuous feature. In addition, certain attributes (e.g., continuous attributes and/or discrete attributes) in the history data record can be processed to obtain corresponding continuous features, for example, the ratio of height to weight is used as the corresponding continuous features.
It should be noted that in addition to the continuous features that will be subjected to the basic binning operation and the additional operation, the training samples of the feature pool model may also include other continuous features and/or discrete features included from the machine learning samples, wherein the other continuous features may participate in the training of the feature pool model without undergoing the discretization operation.
It can be seen that according to an exemplary embodiment of the present invention, for each successive feature to be subjected to the basic binning operation, at least one additional operation may additionally be performed, thereby enabling to obtain a plurality of features characterizing certain properties of the original data record from different angles, scales/layers simultaneously.
Here, the binning operation refers to a specific method of discretizing a continuous feature, that is, dividing a value range of the continuous feature into a plurality of sections (i.e., a plurality of bins), and determining a corresponding binning feature value based on the divided bins. Binning operations can be broadly divided into supervised binning and unsupervised binning, with each of these two types including some specific binning modes, e.g., supervised binning including minimum entropy binning, minimum description length binning, etc., and unsupervised binning including equal width binning, equal depth binning, k-means cluster-based binning, etc. In each binning mode, corresponding binning parameters, such as width, depth, etc., may be set. It should be noted that, according to the exemplary embodiment of the present invention, the binning operation performed by the model training apparatus 200 is not limited to the kind of binning manner nor to the parameters of the binning operation, and the specific representation manner of the accordingly generated binning feature is also not limited.
In addition to performing the basic binning operation, the model training apparatus 200 may perform at least one additional operation on the continuous features, where the additional operation may be any functional operation that may generate continuous features or discrete features, for example, the additional operation may be a logarithmic operation, an exponential operation, an absolute value operation, or the like. In particular, the additional operation may also be a binning operation (referred to as an "additional binning operation"), where the additional binning operation differs from the basic binning operation in the binning mode and/or the binning parameters. It follows that the at least one additional operation may be an operation of the same or different kind of operation, each under the same or different operation parameters (e.g. exponent in exponential operation, base in logarithmic operation, depth in binning operation, width in binning operation, etc.), where the additional operation may be an expression operation with a main body of logarithmic operation, exponential operation, absolute value operation, etc., or may be a combination of a plurality of operations.
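For illustration, the sketch below derives a basic binned feature plus several additional features (an additional binning with a different width, a logarithm, and an absolute value) from a single continuous feature; the specific operations and parameters are assumptions made for the example.

```python
# Sketch: one continuous feature -> basic binned feature + additional features.
import numpy as np

def derive_features(x):
    x = np.asarray(x, dtype=float)
    return {
        "basic_bin_w10": np.floor_divide(x, 10).astype(int),   # basic equal-width binning, width 10
        "extra_bin_w20": np.floor_divide(x, 20).astype(int),   # additional binning, different width
        "log":           np.log1p(np.abs(x)),                  # logarithm operation
        "abs":           np.abs(x),                            # absolute value operation
    }

amount = [12.0, 95.5, 7.3, 240.0]
for name, values in derive_features(amount).items():
    print(name, values)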
In this way, the model training apparatus 200 can convert each of at least a portion of the continuous features into the basic bin features and the corresponding at least one additional feature, thereby improving the effectiveness of the machine learning material for the feature pool model and providing a better basis for the subsequent feature importance determination.
Next, the model training apparatus 200 may generate training samples including at least the generated basic binned features and at least one additional feature for training the corresponding feature pool models. Here, in the training sample, any other feature may be included in addition to the basic binning feature and the additional feature generated by the model training device 200, wherein the other feature may be a feature belonging to a machine learning sample that should be generated based on a history data record.
The model training apparatus 200 may train the feature pool model based on the training samples described above. Here, the model training apparatus 200 may learn an appropriate feature pool model from the training samples using an appropriate machine learning algorithm (e.g., logarithmic probability regression).
The importance determination device 300 is configured to obtain an effect of the trained at least one feature pool model, and determine the importance of each feature according to the obtained effect of the at least one feature pool model. Here, the importance determination apparatus 300 may acquire the effect of the feature pool model by applying the trained feature pool model to the corresponding test data set, and may also receive the effect of the feature pool model from other parties connected thereto.
In particular, the performance of the feature pool model on a test set may serve as the prediction effect of the feature pool model, and this prediction effect may be used to measure the predictive power of the feature group on which the feature pool model is based. The importance of each feature of the machine learning sample can then be obtained by measuring the difference in effect of different feature pool models on the original test data set, or the difference in effect of the same feature pool model on different test data sets.
Here, as an example, the effect of the feature pool model may include the AUC (Area Under the ROC (Receiver Operating Characteristic) Curve) of the feature pool model.
For example, assume that the features on which a certain feature pool model is based are three features {f1, f3, f5} from the feature part {f1, f2, …, fn} of the machine learning samples, and that the continuous feature f1 among them has been discretized in the training samples of the feature pool model; accordingly, the AUC of this feature pool model on the test data set can reflect the predictive power of the feature combination {f1, f3, f5}. In addition, assume that another feature pool model is based on the two features {f1, f3}, with the continuous feature f1 likewise discretized; the AUC of this feature pool model on the test data set can then reflect the predictive power of the feature combination {f1, f3}. On this basis, the difference between the two AUCs can be used to reflect the importance of the feature f5.
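The comparison in the example above can be sketched as follows with synthetic data, where the AUC gap between a model trained on {f1, f3, f5} and one trained on {f1, f3} stands in for the importance of f5; the data, the algorithm choice, and the omission of the discretization of f1 are simplifying assumptions.

```python
# Sketch: importance of f5 as the AUC difference between two feature pool models.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 5))                       # columns stand for f1..f5
y = (X[:, 0] + 2.0 * X[:, 4] + rng.normal(size=2000) > 0).astype(int)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

def pool_auc(cols):
    m = LogisticRegression().fit(X_tr[:, cols], y_tr)
    return roc_auc_score(y_te, m.predict_proba(X_te[:, cols])[:, 1])

auc_with    = pool_auc([0, 2, 4])                    # feature combination {f1, f3, f5}
auc_without = pool_auc([0, 2])                       # feature combination {f1, f3}
print("importance of f5 ~", auc_with - auc_without)
```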
As another example, assume that the features on which a certain feature pool model is based are three features {f1, f3, f5} from the feature part {f1, f2, …, fn} of the machine learning samples, and that the continuous feature f1 among them has been discretized in the training samples of the feature pool model; accordingly, the AUC of this feature pool model on the original test data set can reflect the predictive power of the feature combination {f1, f3, f5}. Here, to determine the importance of the target feature f5, the value of the feature f5 in each test sample included in the original test data set may be transformed to obtain a transformed test data set, and the AUC of the feature pool model on the transformed test data set may then be obtained. On this basis, the difference between the two AUCs can be used to reflect the importance of the target feature f5. As an example, in the transformation process, the value of the feature f5 in each original test sample may be replaced by a zero value, by a random value, or by a value obtained by scrambling the order of the original values of the feature f5.
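A corresponding sketch of the transformation-based measurement, again on hypothetical data, replaces the test values of f5 with zeros, random values, or a shuffled copy and reports the resulting AUC drop as the importance of f5.

```python
# Sketch: importance of f5 as the AUC drop on a transformed test data set.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 5))
y = (X[:, 0] + 2.0 * X[:, 4] + rng.normal(size=2000) > 0).astype(int)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = LogisticRegression().fit(X_tr[:, [0, 2, 4]], y_tr)      # feature pool {f1, f3, f5}

def auc_on(X_test):
    return roc_auc_score(y_te, model.predict_proba(X_test[:, [0, 2, 4]])[:, 1])

base = auc_on(X_te)
for name, transform in {
    "zero":    lambda c: np.zeros_like(c),
    "random":  lambda c: rng.normal(size=len(c)),
    "shuffle": lambda c: rng.permutation(c),
}.items():
    X_t = X_te.copy()
    X_t[:, 4] = transform(X_te[:, 4])                # replace the values of f5 only
    print(f"importance of f5 ({name}):", base - auc_on(X_t))
```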
It should be understood that each of the above-described devices may be individually configured as software, hardware, firmware, or any combination thereof that performs the specified function. These means may correspond, for example, to an application-specific integrated circuit, to pure software code, or to a combination of software and hardware elements or modules. Further, one or more functions implemented by these apparatuses may also be collectively performed by components in a physical entity device (e.g., a processor, a client, a server, or the like).
A flow chart of a method of determining feature importance of machine learning samples according to an exemplary embodiment of the present invention is described below with reference to fig. 2. Here, by way of example, the method shown in fig. 2 may be performed by the feature importance determination system shown in fig. 1, may be implemented entirely in software by a computer program, or may be performed by a specifically configured computing device. For convenience of description, it is assumed that the method shown in fig. 2 is performed by the feature importance determining system shown in fig. 1.
As shown, in step S100, a history data record including a label about a machine learning problem and at least one attribute information of each feature used to generate a machine learning sample is acquired by the data record acquisition apparatus 100.
Here, the history data record is a real record about the machine learning problem that is desired to be predicted, and it includes both the attribute information and the label. Such history data records will be used to form machine learning samples as the material of machine learning, and the exemplary embodiment of the present invention is intended to determine the degree of importance of each feature in the formed machine learning samples.
Specifically, as an example, the data record obtaining apparatus 100 may collect the historical data in a manual, semi-automatic or fully automatic manner, or process the collected raw historical data so that the processed historical data record has a proper format or form. As an example, the data record acquisition apparatus 100 may collect the history data in a batch.
Here, the data record obtaining apparatus 100 may receive the history data record manually input by the user through an input device (e.g., a workstation). Further, the data record acquisition device 100 may systematically retrieve the historical data records from the data source in a fully automated manner, for example, by systematically requesting the data source and obtaining the requested historical data from the response via a timer mechanism implemented in software, firmware, hardware, or a combination thereof. The data sources may include one or more databases or other servers. The manner in which the data is obtained in a fully automated manner may be implemented via an internal network and/or an external network, which may include transmitting encrypted data over the internet. Where servers, databases, networks, etc. are configured to communicate with one another, data collection may be automated without human intervention, but it should be noted that certain user input operations may still exist in this manner. The semi-automatic mode is between the manual mode and the full-automatic mode. The semi-automatic mode differs from the fully automatic mode in that a trigger mechanism activated by the user replaces, for example, a timer mechanism. In this case, the request for extracting data is generated only in the case where a specific user input is received. Each time data is acquired, the captured historical data may preferably be stored in non-volatile memory. As an example, a data warehouse may be utilized to store raw data collected during acquisition as well as processed data.
The obtained historical data records may originate from the same or different data sources, that is, each historical data record may also be a concatenation of different historical data records. For example, in addition to obtaining information data records (which include attribute information fields of income, academic history, post, property status, etc.) filled by a customer when applying for opening a credit card to a bank, the data record obtaining device 100 may also obtain other data records of the customer at the bank, such as loan records, daily transaction data, etc., and these obtained data records may be spliced into a complete historical data record along with indicia as to whether the customer is a fraudulent customer. Furthermore, the data record acquisition device 100 may also acquire data originating from other private or public sources, such as data originating from a data provider, data originating from the internet (e.g., social networking sites), data originating from a mobile operator, data originating from an APP operator, data originating from an express company, data originating from a credit agency, and so forth.
Optionally, the data record acquiring apparatus 100 may store and/or process the acquired data by means of a hardware cluster (such as a Hadoop cluster, a Spark cluster, etc.), for example, store, sort, and perform other offline operations. In addition, the data record acquisition device 100 may perform online streaming processing on the acquired data.
As an example, a data conversion module such as a text analysis module may be included in the data record obtaining device 100, and accordingly, in step S100, the data record obtaining device 100 may convert unstructured data such as text into more easily usable structured data for further processing or reference later. Text-based data may include emails, documents, web pages, graphics, spreadsheets, call center logs, transaction reports, and the like.
Next, in step S200, at least one feature pool model is trained by the model training device 200 using the acquired historical data record, wherein the feature pool model refers to a machine learning model that provides a prediction result regarding a machine learning problem based on at least a part of features among the respective features, and wherein the feature pool model is trained by performing a discretization operation on at least one continuous feature among the at least a part of features.
Here, the model training device 200 may perform any suitable discretization operation for the at least one continuous feature, respectively, and as an example, the model training device 200 may perform a basic binning operation and at least one additional operation to generate a basic binning feature and at least one additional feature corresponding to each continuous feature, respectively, and the generated basic binning feature and the at least one additional feature may constitute at least a part of the features of the training samples of the feature pool model as discretized features.
As described above, a continuous feature is a feature used in the machine learning sample that may be generated from at least a part of the attribute information of the history data record. For example, continuously-valued attribute information of the history data record, such as distance, age, and amount of money, may be used directly as continuous features; alternatively, a continuous feature may be obtained by further processing certain attribute information of the history data record, for example, by using the ratio of height to weight as a continuous feature.
After the continuous features are obtained, basic binning may be performed on the obtained continuous features by the model training apparatus 200, where the model training apparatus 200 may perform basic binning in various binning manners and/or binning parameters.
Taking unsupervised equal-width binning as an example, assume that the value interval of the continuous feature is [0,100]. If the corresponding binning parameter (i.e., the width) is 50, 2 bins can be obtained; in this case, a continuous feature with a value of 61.5 corresponds to the 2nd bin, and if the two bins are numbered 0 and 1, the bin corresponding to the continuous feature is numbered 1. Alternatively, assuming a bin width of 10, 10 bins can be obtained; in this case, the continuous feature with a value of 61.5 corresponds to the 7th bin, and if the ten bins are numbered 0 to 9, the continuous feature corresponds to the bin numbered 6. Alternatively, assuming a bin width of 2, 50 bins can be obtained; in this case, the continuous feature with a value of 61.5 corresponds to the 31st bin, and if the fifty bins are numbered 0 to 49, the continuous feature corresponds to the bin numbered 30. As an example, the bin number of a specific continuous feature and the corresponding feature value can be determined by online calculation, so that no mapping table lookup is needed and storage space overhead is saved.
After mapping a continuous feature to one of multiple bins, the corresponding feature value may be any custom-defined value. That is, a basic binning operation is performed to generate a multi-dimensional basic binning feature corresponding to the continuous feature, where each dimension may indicate whether the continuous feature is sorted into the corresponding bin, for example, by using "1" to indicate that the continuous feature is sorted into the corresponding bin and "0" to indicate that it is not. Accordingly, in the above example, assuming that 10 bins are used, the basic binning feature may be a 10-dimensional feature, and the basic binning feature corresponding to a continuous feature with a value of 61.5 may be represented as [0,0,0,0,0,0,1,0,0,0]. Alternatively, each dimension may indicate the feature value of the respective continuous feature sorted into the corresponding bin; accordingly, in the above example, the basic binning feature corresponding to the continuous feature with a value of 61.5 may be represented as [0,0,0,0,0,0,61.5,0,0,0]. Or, each dimension may indicate the average of the feature values of all continuous features sorted into the corresponding bin; or, each dimension may indicate the median of the feature values of all continuous features sorted into the corresponding bin; or, each dimension may indicate a boundary value of the feature values of all continuous features sorted into the corresponding bin, where the boundary value may be an upper boundary value or a lower boundary value.
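For illustration, a minimal sketch of generating such a multi-dimensional basic binning feature for a single value is given below (Python with NumPy); it assumes the value interval starts at 0, as in the [0,100] example above, and the function and parameter names are hypothetical:

import numpy as np

def basic_binning_feature(value, width=10.0, n_bins=10, mode="one_hot"):
    # Equal-width binning of one continuous value into an n_bins-dimensional
    # feature; mode="one_hot" writes 1 into the bin's dimension, mode="value"
    # writes the original value there instead.
    bin_id = min(int(np.floor(value / width)), n_bins - 1)  # clamp the upper edge
    feat = np.zeros(n_bins)
    feat[bin_id] = 1.0 if mode == "one_hot" else value
    return feat

# basic_binning_feature(61.5) puts the non-zero entry in dimension 6 of a
# 10-dimensional vector, matching the example above.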
In addition, the values of the basic binning feature may be normalized before performing subsequent operations. Suppose that the jth value of the ith continuous feature to be discretized is x_ij; the binning feature can then be expressed as (BinID, x'_ij), where BinID indicates the number of the bin into which the continuous feature is divided and takes a value in 0, 1, …, B−1 (B being the total number of bins), and x'_ij is the normalized value of x_ij. The above feature (BinID, x'_ij) represents a basic binning feature in which the feature value of the dimension corresponding to the bin numbered BinID is x'_ij and the feature values of all other dimensions are 0.

Here, x'_ij can be expressed by the following formula:

x'_ij = ((x_ij − min_i) / (max_i − min_i)) × B − BinID

where max_i is the maximum value of the ith continuous feature, min_i is the minimum value of the ith continuous feature, and

BinID = ⌊((x_ij − min_i) / (max_i − min_i)) × B⌋

where ⌊·⌋ denotes the rounding-down (floor) operation.
Taking the unsupervised equal-width binning as an example, assuming that the value interval of the continuous feature is [0,100], in the case of a binning width of 50, according to the above calculation formula, the continuous feature having a value of 61.5 may correspond to the basic binning feature (1,0.23), and in the case of a binning width of 10, according to the above calculation formula, the continuous feature having a value of 61.5 may correspond to the basic binning feature (6, 0.15).
Here, in order to obtain the above feature (BinID, x'_ij), in step S200 the model training device 200 may compute BinID and x'_ij online for each value x_ij according to the above formulas; alternatively, the model training apparatus 200 may generate in advance a mapping table of the value range corresponding to each BinID and obtain the BinID corresponding to each continuous feature by looking up the table.
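A minimal online computation of the (BinID, x'_ij) pair under the formulas above might look as follows (Python; the names are hypothetical, and clamping a value equal to max_i into the last bin is an implementation choice of this sketch):

import numpy as np

def binid_and_normalized_value(x, x_min, x_max, n_bins):
    # Compute the bin number and the normalized within-bin value online,
    # without any precomputed mapping table.
    scaled = (x - x_min) / (x_max - x_min) * n_bins
    bin_id = min(int(np.floor(scaled)), n_bins - 1)
    return bin_id, scaled - bin_id

# On the value interval [0, 100]: binid_and_normalized_value(61.5, 0, 100, 2)
# gives (1, 0.23) and binid_and_normalized_value(61.5, 0, 100, 10) gives
# (6, 0.15), up to floating-point rounding, as in the worked example above.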
Further, as an example, noise in the history data record may also be reduced by removing outliers in the continuous features before performing the basic binning operation. In this way, the effectiveness of using the binned features to determine feature importance can be further improved.
Specifically, an outlier bin may be additionally set such that continuous features having outliers are sorted into the outlier bin. For example, for a continuous feature with a value interval of [0,1000], a certain number of samples may be selected for pre-binning, for example, equal-width binning with a bin width of 10; the number of samples in each bin is then recorded, and bins with a small number of samples (e.g., fewer than a threshold value) may be merged into at least one outlier bin. As an example, if the number of samples in the bins at both ends is small, the bins with fewer samples may be merged into an outlier bin while the remaining bins are kept; assuming that the number of samples in bins 0-10 is small, bins 0-10 may be merged into one outlier bin, so that continuous features whose values fall within the range covered by those bins are uniformly sorted into the outlier bin.
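The pre-binning and merging of sparse bins described above could be sketched as follows (Python with NumPy); the sample-count threshold and the choice to map all sparse bins to a single shared outlier bin number are assumptions of this illustration:

import numpy as np

def merge_sparse_bins(values, width=10.0, min_count=30):
    # Pre-bin a sample of values with equal width, then treat every bin whose
    # sample count is below min_count as part of a single outlier bin.
    bin_ids = np.floor(np.asarray(values, dtype=float) / width).astype(int)
    ids, counts = np.unique(bin_ids, return_counts=True)
    sparse = set(ids[counts < min_count].tolist())
    kept = sorted(set(ids.tolist()) - sparse)
    mapping = {b: i for i, b in enumerate(kept)}
    outlier_id = len(kept)  # shared number for the outlier bin
    return np.array([mapping.get(b, outlier_id) for b in bin_ids]), sparse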
In addition to performing the basic binning operation described above, in step S200 the model training apparatus 200 further performs, on the continuous features on which the basic binning operation has been performed, at least one additional operation different from the basic binning operation to obtain at least one corresponding additional feature.
Here, the additional operation may be any function operation that may have corresponding operation parameters, and the additional operation performed for a single continuous feature may be one or more operations that may be of different kinds or operations of the same kind but different operation parameters.
In particular, the additional operation may also be a binning operation. Similar to the basic binning feature, the additional binning feature generated by an additional binning operation may also be a multi-dimensional feature, where each dimension indicates whether the respective continuous feature is sorted into the corresponding bin; or each dimension indicates the feature value of the respective continuous feature sorted into the corresponding bin; or each dimension indicates the average of the feature values of all continuous features sorted into the corresponding bin; or each dimension indicates the median of the feature values of all continuous features sorted into the corresponding bin; or each dimension indicates a boundary value of the feature values of all continuous features sorted into the corresponding bin.
In particular, the at least one additional operation may comprise an additional binning operation in the same manner as the basic binning operation but with different binning parameters; alternatively, the at least one additional operation may comprise an additional binning operation that is binned differently than the basic binning operation. The binning mode includes various binning modes under supervision binning and/or unsupervised binning. For example, supervised binning includes minimum entropy binning, minimum description length binning, and the like, while unsupervised binning includes equal width binning, equal depth binning, k-means cluster-based binning, and the like.
As an example, the basic binning operation and the additional binning operation may correspond to equal-width binning operations of different widths, respectively. That is to say, the basic binning operation and the additional binning operation use the same binning mode but different granularities of division, so that the generated basic binning features and additional binning features can better depict the regularities of the original historical data records, which is more beneficial to determining the importance of each feature. In particular, the different widths used for the basic binning operation and the additional binning operation may numerically form a geometric progression; e.g., the basic binning operation may perform equal-width binning with a width of 2, and the additional binning operations may perform equal-width binning with widths of 4, 8, 16, etc. Alternatively, the different widths used for the basic binning operation and the additional binning operation may numerically form an arithmetic progression; e.g., the basic binning operation may perform equal-width binning with a width of 2, and the additional binning operations may perform equal-width binning with widths of 4, 6, 8, etc.
As another example, the basic binning operation and the additional binning operation may correspond to equal-depth binning operations of different depths, respectively. That is to say, the basic binning operation and the additional binning operation use the same binning mode but different granularities of division, so that the generated basic binning features and additional binning features can better depict the regularities of the original historical data records, which is more beneficial to determining the importance of each feature. In particular, the different depths employed by the basic binning operation and the additional binning operation may numerically form a geometric progression; e.g., the basic binning operation may perform equal-depth binning with a depth of 10, while the additional binning operations may perform equal-depth binning with depths of 100, 1000, 10000, etc. Alternatively, the different depths used for the basic binning operation and the additional binning operation may numerically form an arithmetic progression; e.g., the basic binning operation may perform equal-depth binning with a depth of 10, and the additional binning operations may perform equal-depth binning with depths of 20, 30, 40, etc.
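For example, a family of equal-width binning operations whose widths form a geometric (or arithmetic) progression could be enumerated as in the following sketch; the starting width and the number of additional operations are assumptions of this example:

import numpy as np

def binning_widths(base_width=2.0, count=4, progression="geometric"):
    # The first width is used for the basic binning operation, the rest for
    # additional binning operations at coarser granularities.
    if progression == "geometric":
        return [base_width * (2 ** k) for k in range(count)]   # 2, 4, 8, 16, ...
    return [base_width * (k + 1) for k in range(count)]        # 2, 4, 6, 8, ...

def multi_granularity_bin_ids(x, widths):
    # Bin number of a single continuous value under each width.
    return {w: int(np.floor(x / w)) for w in widths}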
According to an exemplary embodiment of the invention, the additional operations may further comprise non-binning operations; for example, the at least one additional operation may comprise at least one of the following kinds of operations, each under the same or different operation parameters: a logarithm operation, an exponential operation, an absolute value operation, and a Gaussian transformation operation. It should be noted that the additional operation here is not limited in kind of operation or operation parameters and may take any suitable form; that is, the additional operation may have either a simple form such as a squaring operation or a more complex operation expression. For example, for the jth value x_ij of the ith continuous feature, an additional operation may be performed thereon to obtain an additional feature x''_ij as follows:

x''_ij = sign(x_ij) × log2(1 + |x_ij|), where sign is the sign function.
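This particular additional operation can be written directly, for instance as the following small helper (a sketch; the vectorized NumPy form and the function name are assumptions of the example):

import numpy as np

def signed_log2(x):
    # x'' = sign(x) * log2(1 + |x|), applied element-wise.
    x = np.asarray(x, dtype=float)
    return np.sign(x) * np.log2(1.0 + np.abs(x))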
In addition to the basic binning features and the additional features described above, other features included in the training samples of the feature pool model may be generated, which may be obtained by the model training apparatus 200 by performing various feature processes such as direct extraction, discretization, field combination, extraction of partial field values, rounding, and the like on at least a part of the attribute information of the history data record.
Next, training samples of the feature pool model, which include the features described above together with the corresponding labels, are generated by the model training apparatus 200. According to an exemplary embodiment of the present invention, the above-described processing may be performed in memory under a distributed parallel computing framework, where the distributed parallel computing framework may have distributed parameter servers.
Further, as an example, the generated training samples may be used directly in the training process of the feature pool model. In particular, the step of generating the training samples may be considered as part of the training process of the feature pool model, and accordingly, the training samples need not be explicitly saved to a hard disk, which may significantly increase the operating speed compared to conventional approaches.
Next, the feature pool model may be trained by the model training apparatus 200 based on the training samples. Here, the model training apparatus 200 may learn an appropriate feature pool model from the training samples using a suitable machine learning algorithm (e.g., logistic regression, also called log-odds regression). As an example, in the case that the training samples of the feature pool model include both continuous features and discontinuous features, different regular terms may be set for the continuous features and the discontinuous features, respectively; that is, the regular term set for the continuous features is different from the regular term set for the discontinuous features.
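One way to realize different regular terms for continuous and discretized features is to attach separate L2 coefficients to the two blocks of model coefficients. The following minimal logistic-regression sketch (plain gradient descent, with hypothetical parameter names and values) only illustrates this idea and is not the embodiment's actual training procedure:

import numpy as np

def train_feature_pool_model(X_cont, X_disc, y, lam_cont=0.1, lam_disc=0.01,
                             lr=0.1, epochs=200):
    # Logistic regression where continuous columns (X_cont) and discretized
    # columns (X_disc) receive different L2 regularization strengths.
    X = np.hstack([X_cont, X_disc])
    n = X.shape[0]
    lam = np.concatenate([np.full(X_cont.shape[1], lam_cont),
                          np.full(X_disc.shape[1], lam_disc)])
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # predicted probabilities
        grad_w = X.T @ (p - y) / n + lam * w     # log-loss gradient + per-block L2
        grad_b = float(np.mean(p - y))
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b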
In the above example, a more stable and better predicted feature pool model can be trained, so as to effectively determine the importance of each feature based on the predicted effect of the feature pool model.
Specifically, in step S300, the effect of the trained at least one feature pool model is acquired by the importance determination means 300, and the importance of each feature of the machine learning sample is determined according to the acquired effect of the at least one feature pool model.
Here, the importance determination apparatus 300 may acquire the effect of the feature pool model by applying the trained feature pool model to the corresponding test data set, and may also receive the effect of the feature pool model from other parties connected thereto.
As an example, the significance determination apparatus 300 may determine the significance of the respective features on which the feature pool model is based according to a difference between effects of the feature pool model on an original test data set and a transformed test data set, wherein the transformed test data set refers to a data set obtained by replacing values of target features of which significance is to be determined in the original test data set with one of: zero values, random values, values obtained by scrambling the order of the original values of the target features.
Here, each feature pool model may be based on at least one feature of the machine learning sample, and accordingly, a predictive effect of the feature pool model on the original test data set may be obtained. In addition, the prediction effect of the feature pool model on the transformed test data set can be obtained by transforming the values of the target features on the original test data set. The difference between the two predicted effects can be used to measure the importance of the target feature.
As an example, the at least one feature pool model may include an all-feature model, where the all-feature model refers to a machine learning model that provides a prediction result regarding the machine learning problem based on all features among the respective features of the machine learning samples. Specifically, it is assumed that in step S200 the model training apparatus 200 trains an all-feature model, which is trained based on all features {f_1, f_2, …, f_n} of the machine learning samples to give a prediction result about the machine learning problem. The importance determination apparatus 300 may obtain the prediction effect (e.g., AUC_all) of the all-feature model on the original test data set; here, the original test data set may be composed of additional historical data records obtained by the data record acquisition device 100.
In this example, to determine the importance of any target feature f_i among {f_1, f_2, …, f_n} (where 1 ≤ i ≤ n), the original test data set may be processed accordingly to obtain a transformed test data set for the target feature f_i, e.g., by replacing the value of the feature f_i in each test sample of the original test data set with another value, such as a zero value or a random value, or with values obtained by scrambling the order of the values of the feature f_i among the respective test samples. Accordingly, the importance determination apparatus 300 can obtain the test effect (e.g., AUC_i) of the above-described all-feature model on the transformed test data set.
After obtaining the effects of the all-feature model on the original test data set and on the transformed test data set, respectively, the importance determination apparatus 300 may use the difference between the two effects (i.e., AUC_all − AUC_i) as a reference for measuring the importance of the target feature f_i.
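An end-to-end illustration of this AUC_all − AUC_i measure, using a generic classifier and the scrambling transformation, might look like the following (Python with scikit-learn; the choice of logistic regression here is only for the sake of the example):

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

def auc_drop_importance(X_train, y_train, X_test, y_test, i, rng=None):
    # Train an all-feature model, then compare its AUC on the original test
    # set with its AUC after feature i's test values are scrambled.
    rng = np.random.default_rng(rng)
    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    auc_all = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    X_t = X_test.copy()
    X_t[:, i] = rng.permutation(X_t[:, i])
    auc_i = roc_auc_score(y_test, model.predict_proba(X_t)[:, 1])
    return auc_all - auc_i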
The above shows an example of determining the importance of the individual features on which the same feature pool model depends by transforming the original test data set. However, the exemplary embodiments of the present invention are not limited thereto, and the number of feature pool models and the feature groups on which each feature pool model is based may be designed in any suitable manner, as long as the importance of each feature can be inferred from the prediction effects of the feature pool models.
For example, the at least one feature pool model trained by the model training apparatus 200 in step S200 may include a plurality of machine learning models that provide prediction results regarding machine learning problems based on different feature groups, and accordingly, in step S300, the importance determining apparatus 300 may determine the importance of the respective features according to the difference between the effects of the at least one feature pool model on the original test data set.
Here, the at least one feature pool model includes one or more main feature pool models and at least one sub-feature pool model respectively corresponding to each main feature pool model, wherein a sub-feature pool model refers to a machine learning model that provides a prediction result regarding a machine learning problem based on remaining features except for a target feature whose importance is to be determined among features on which the corresponding main feature pool model is based, and accordingly, the importance determination apparatus 300 may determine the importance of the corresponding target feature according to a difference between effects of the main feature pool model and the respective sub-feature pool models corresponding thereto on the original test data set.
As an example, the at least one feature pool model may include one overall feature model as a main feature pool model and at least one corresponding sub-feature pool model, wherein the overall feature model refers to a machine learning model that provides a prediction result regarding a machine learning problem based on all features of the machine learning sample, and correspondingly, the sub-feature pool model refers to a machine learning model that provides a prediction result regarding a machine learning problem based on the remaining features except for a target feature whose importance is to be determined among the overall features, and accordingly, in step S300, the importance determination apparatus 300 may determine the importance of the corresponding target feature according to a difference between the overall feature model and the effect of each sub-feature pool model on the original test data set.
Specifically, it is assumed that in step S200 the model training apparatus 200 trains an all-feature model, which is trained based on all features {f_1, f_2, …, f_n} of the machine learning samples to give a prediction result about the machine learning problem. The importance determination apparatus 300 may obtain the prediction effect (e.g., AUC_all) of the all-feature model on the original test data set; here, the original test data set may be composed of additional historical data records obtained by the data record acquisition device 100.
In this example, to determine the importance of any target feature f_i among {f_1, f_2, …, f_n} (where 1 ≤ i ≤ n), a corresponding sub-feature pool model may additionally be trained in step S200, which is trained based on the remaining features {f_1, f_2, …, f_{i-1}, f_{i+1}, …, f_n} other than the target feature f_i to give a prediction result about the machine learning problem. Accordingly, the importance determination apparatus 300 may acquire the prediction effect (e.g., AUC_i) of the sub-feature pool model on the original test data set.
After separately acquiring the effects of the all-feature model and of each sub-feature pool model on the original test data set, the importance determination apparatus 300 may use the difference between the two effects (i.e., AUC_all − AUC_i) as a reference for measuring the importance of the feature f_i.
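A sketch of this main-model/sub-model comparison for every feature (again with a generic scikit-learn classifier standing in for the feature pool models; the names are hypothetical):

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

def leave_one_out_importance(X_train, y_train, X_test, y_test):
    # AUC of the all-feature model minus the AUC of the sub-feature pool
    # model trained without the target feature, for each feature index.
    def fit_auc(cols):
        m = LogisticRegression(max_iter=1000).fit(X_train[:, cols], y_train)
        return roc_auc_score(y_test, m.predict_proba(X_test[:, cols])[:, 1])
    all_cols = list(range(X_train.shape[1]))
    auc_all = fit_auc(all_cols)
    return {i: auc_all - fit_auc([c for c in all_cols if c != i])
            for i in all_cols}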
Here, it should be noted that all the feature models described above are only examples, and are not intended to limit the scope of the exemplary embodiments of the present invention. In fact, in the feature pool model, there may be a plurality of main feature pool models each having a respective sub-feature pool model, that is, each main feature pool model may be based on at least a portion of the features of the machine learning samples, where there may or may not be common features involved between different main feature pool models.
Further, as an alternative, the at least one feature pool model trained by the model training apparatus 200 in step S200 may include a plurality of single-feature models, wherein a single-feature model refers to a machine learning model that provides a prediction result regarding a machine learning problem based on a target feature whose importance is to be determined among respective features of the machine learning sample, and accordingly, in step S300, the importance determining apparatus 300 may determine the importance of the respective target feature according to a difference between effects of the respective single-feature models on the original test data set.
Specifically, it is assumed that in step S200 the model training apparatus 200 trains a plurality of single-feature models, each of which is trained based on a certain feature f_i of the machine learning sample to give a prediction result about the machine learning problem. Here, the number of single-feature models may be the same as the number of features of the machine learning sample. Accordingly, the importance determination apparatus 300 may obtain the prediction effect (e.g., AUC_i) of each single-feature model on the same test data set (e.g., the original test data set). Here, since the discretization process has been performed for the continuous features (preferably, the basic binning operation and the additional operations may be performed), it can be ensured that each single-feature model reflects the prediction capability of its feature relatively stably; accordingly, after the effects of all the single-feature models on the same test data set are respectively obtained, the importance determination apparatus 300 may obtain the relative degree of importance between the corresponding features based on the differences between the respective effects.
The method of determining feature importance according to the exemplary embodiment of the present invention is illustrated above with reference to fig. 2, however, it should be understood that the method illustrated in fig. 2 is not intended to limit the specific implementation manner of the exemplary embodiment of the present invention, but merely provides an exemplary description of the basic concept of the exemplary embodiment of the present invention, and in fact, a person skilled in the art may implement the exemplary embodiment of the present invention by modifying and/or embodying the scheme illustrated in fig. 2 in any appropriate manner. For example, the steps in the flowchart shown in fig. 2 are not limited in any way in terms of timing, for example, steps S200 and S300 need not be limited to be performed in a strict order, and alternatively, a part of the model test operation may be performed during the training of the feature pool model to determine the effect of the feature pool model.
Specifically, as described above, according to an exemplary embodiment of the present invention, in step S200, the trained at least one feature pool model may include a plurality of machine learning models that provide a prediction result regarding a machine learning problem based on different feature groups, and, in step S300, the importance of the respective features may be determined according to a difference between effects of the at least one feature pool model on the original test data set.
Here, the original test data set may be composed of acquired history data records, and accordingly, in step S200, the acquired history data records are divided into a plurality of sets of history data records to train respective feature pool models step by step, and step S200 further includes: and performing prediction on the next group of historical data records by using the feature pool model trained by the current group of historical data records to obtain grouped AUCs corresponding to the next group of historical data records, and synthesizing the grouped AUCs to obtain the AUC of the feature pool model, wherein after the grouped AUCs corresponding to the next group of historical data records are obtained, the feature pool model trained by the current group of historical data records can be continuously trained by using the next group of historical data records.
Fig. 3 shows a flowchart of a method of determining feature importance of machine learning samples according to another exemplary embodiment of the present invention. Also, for convenience of description, it is assumed that the method shown in fig. 3 is performed by the feature importance determining system shown in fig. 1. Also, as an example, the feature pool model herein may be a machine learning model based on a log probability regression algorithm, and the effect of the feature pool model may be represented by AUC.
Referring to fig. 3, in step S100, a history data record including a label on a machine learning problem and at least one attribute information of each feature used to generate a machine learning sample is acquired by the data record acquisition apparatus 100. Here, for the sake of brevity, various details of the data record acquisition apparatus 100 acquiring the history data record will not be described again.
Next, in step S210, the acquired historical data records are divided into a plurality of sets of historical data records by the model training apparatus 200, and the divided sets of historical data records are used to gradually train the feature pool models in batches. Alternatively, the training process may be performed on-line, in which case the training samples for the feature pool model need not be explicitly saved to a hard disk.
In step S220, the kth set of history data records is obtained as the next set of history data records by the model training apparatus 200, where k is a positive integer. According to an exemplary embodiment of the present invention, since each feature pool model is trained stepwise in batches using the plurality of sets of history data records, it can be understood that, prior to obtaining the kth set of historical data records, the feature pool models have already been trained in stages based on the previous k−1 batches of historical data records; here, a specific one of these feature pool models is denoted LR_{k-1}.
In step S230, the model training apparatus 200 respectively obtains, for the trained feature pool model or models, the corresponding grouped AUC obtained by testing on the kth group of historical data records. Taking the above-mentioned specific feature pool model LR_{k-1} as an example, the model training apparatus 200 uses the feature pool model LR_{k-1} to perform prediction on the kth group of historical data records to derive the grouped AUC corresponding to the kth group, i.e., AUC_k. Specifically, in order to use the kth group of historical data records as the test data set, a test sample is generated based on each historical data record in the kth group, where the feature portion of the test sample is consistent with the feature portion of the training samples of the feature pool model; that is, the model training apparatus 200 may obtain the feature portion of the test sample according to a feature engineering process similar to that of the training samples, while discarding the labels of the historical data records, thereby obtaining the test samples of the feature pool model. Then, the model training apparatus 200 inputs the obtained test samples into the feature pool model to obtain corresponding prediction results. Based on these prediction results, the model training apparatus 200 may obtain the grouped AUC_k of the feature pool model LR_{k-1} for the kth group of historical data records. In a similar manner, the model training apparatus 200 can acquire and save the grouped AUCs of all the previously trained feature pool models for the kth set of historical data records.
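This test-then-train loop can be sketched as follows (Python, with scikit-learn's incremental SGD-based logistic regression standing in for the feature pool model; the batch format and the skip of groups containing only one class are assumptions of this example):

import numpy as np
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import roc_auc_score

def progressive_training_with_grouped_auc(batches):
    # batches: iterable of (X_k, y_k) groups of historical data records.
    model = SGDClassifier(loss="log_loss")  # may be "log" on older scikit-learn
    classes = np.array([0, 1])
    grouped_aucs = []
    for k, (X_k, y_k) in enumerate(batches):
        if k > 0 and len(np.unique(y_k)) == 2:
            scores = model.decision_function(X_k)            # predict before training
            grouped_aucs.append(roc_auc_score(y_k, scores))  # grouped AUC_k
        model.partial_fit(X_k, y_k, classes=classes)         # then continue training
    return model, grouped_aucs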
In practice, some historical data records may lack certain attribute information related to the features of the feature pool model, and in this case, the model training apparatus 200 may take corresponding countermeasures in order to better obtain the AUC of the feature pool model.
Specifically, when prediction is performed for the next set of historical data records using the feature pool model trained on the current set of historical data records, and the next set includes missing historical data records that lack the attribute information needed to generate at least a portion of the features on which the feature pool model is based, the model training apparatus 200 may derive the grouped AUC corresponding to the next set of historical data records in one of the following ways:
in the first case: the model training apparatus 200 may calculate the group AUC using only the predicted results of the other history data records in the next set of history data records except the missing history data record. Specifically, assume that the kth group of history data records includes 1000 history data records in total, wherein only 100 history data records include all attribute information on which the feature portion of the feature pool model is based, i.e., 900 history data records belong to the missing history data records. In this case, the model training apparatus 200 may perform prediction using only the 100 pieces of historical data records having complete correlation attribute information, and take the AUC obtained based on the prediction result as the packet AUC.
In the second case: the model training apparatus 200 may calculate the group AUC using the prediction results of all the historical data records of the next group of historical data records, wherein the prediction result of the missing historical data record is set as a default value determined based on the value range of the prediction result or based on the label distribution of the acquired historical data record. Specifically, assume that the kth group of history data records includes 1000 history data records in total, wherein only 100 history data records include all attribute information on which the feature portion of the feature pool model is based, i.e., 900 history data records belong to the missing history data records. In this case, the model training apparatus 200 may input the 100 historical data records with complete relevant attribute information into the feature pool model for prediction, and set the prediction results of the 900 historical data records as default values, where, as an example, the default values may be determined based on the value range of the prediction results, for example, in the case that the value range of the prediction results is [0,1], the default values may be set as an intermediate value of 0.5; alternatively, the default value may be determined based on the label distribution of the acquired historical data records, for example, assuming that there are 300 positive samples (i.e., labels 1) in 1000 historical data records included in the kth group of historical data records, the default value may be set to be a probability of a positive sample, for example, 0.3. When the corresponding prediction results of all 1000 pieces of history data are obtained as described above, the model training apparatus 200 may take the AUC obtained based on the prediction results as the group AUC.
In the third case: the model training apparatus 200 may multiply the AUC calculated using the prediction result of the history data other than the missing history data in the next set of history data with the proportion of the history data other than the missing history data in the next set of history data to obtain the grouped AUC. Specifically, assume that the kth group of history data records includes 1000 history data records in total, wherein only 100 history data records include all attribute information on which the feature portion of the feature pool model is based, i.e., 900 history data records belong to the missing history data records. In this case, the model training apparatus 200 may input the 100 historical data records with complete related attribute information into the feature pool model for prediction, obtain corresponding AUC based on the obtained prediction result, and then the model training apparatus 200 may multiply the obtained AUC by the proportion (i.e., 0.1) occupied by the non-missing historical data records to determine the final grouped AUC.
It should be noted that the above three cases are merely exemplary processing manners when there is a missing history data record, and are not intended to limit exemplary embodiments of the present invention. Any means similar or equivalent to the above three means may also be applied to the exemplary embodiments of the present invention.
After the test of the feature pool models is performed, in step S240, the training of one or more feature pool models trained up to now is continued by the model training apparatus 200 based on the k-th set of history data records, respectively.
Taking the above-mentioned specific feature pool model LR_{k-1} as an example, in step S240, the model training apparatus 200 continues to train this model using the kth set of historical data records to obtain an updated feature pool model LR_k. Specifically, in order to use the kth group of historical data records as the training data set, training samples need to be generated based on each historical data record in the kth group; that is, the model training apparatus 200 may obtain the feature portion of a training sample according to the corresponding feature engineering process, while using the label of the historical data record as the label of the training sample, thereby obtaining the training samples of the feature pool model. Then, the model training apparatus 200 continues to train the feature pool model based on the obtained training samples to obtain the updated feature pool model LR_k. In a similar manner, the model training apparatus 200 may update all previously trained feature pool models using the kth set of historical data records.
It can be seen that, according to the exemplary embodiment of the present invention, in the process of training the feature pool models in stages, the corresponding grouped AUCs can be obtained at the same time, which makes the training and testing of the models more efficient and faster and optimizes the system as a whole. In fact, the AUC obtained in the above example is strongly correlated with the true test AUC (as tested, the correlation can reach above 0.85 on a particular data set); therefore, as an example, the importance of each feature of the feature pool model can be determined based on the grouped AUCs obtained in the above manner.
Next, in step S250, it is determined by the model training apparatus 200 whether the acquired kth group of history data records is the last group of divided history data records. If it is determined in step S250 that the current kth group of history data records is not the last group of history data records, it returns to step S220 to obtain the next group of divided history data records, i.e., the (k + 1) th group of history data records. In contrast, if it is determined in step S250 that the current kth set of history data records is the last set of history data records, it proceeds to step S310, where the importance of each feature of the machine learning sample is determined by the importance determining means 300 based on the saved grouped AUC of each feature pool model.
Specifically, in step S310, the importance determination apparatus 300 may integrate the respective grouped AUCs of each feature pool model to derive an AUC representing the performance of the corresponding feature pool model.
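How the grouped AUCs are synthesized is not fixed by the description above; one simple possibility, shown only as an assumption of this sketch, is a (size-weighted) average over the groups:

import numpy as np

def overall_auc_from_grouped(grouped_aucs, group_sizes=None):
    # Synthesize per-group AUCs into a single effect value for the model.
    grouped_aucs = np.asarray(grouped_aucs, dtype=float)
    if group_sizes is None:
        return float(grouped_aucs.mean())
    return float(np.average(grouped_aucs,
                            weights=np.asarray(group_sizes, dtype=float)))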
After obtaining the performance (i.e., AUC) of each feature pool model, the importance determination apparatus 300 may regard the performance of the feature pool model as an importance reference of a feature group (i.e., at least a part of features among features in a machine learning sample whose importance is to be determined) to which the feature pool model relates, and deduce the importance of each target feature or an importance ranking between the target features by integrating performance differences between the feature pool models.
Also, it should be noted that: the flowchart shown in fig. 3 is not intended to limit details of processing such as timing, but is merely used as an example to explain an exemplary embodiment of the present invention. As an example, the training/testing of the various feature pool models may be performed in parallel and/or online.
According to the exemplary embodiments of the present invention, for the machine learning samples used in the machine learning, the importance degree of each feature included therein can be effectively determined, thereby facilitating better model training and/or model interpretation.
Alternatively, the feature importance determination system shown in fig. 1 may further include a display device (not shown), and accordingly, in step S200 shown in fig. 2, the display device may be controlled by the model training device 200 to provide an interface for configuring at least one item among the following items of the feature pool model to the user: at least one part of features based on the feature pool model, the algorithm type of the feature pool model, the algorithm parameters of the feature pool model, the operation type of the discretization operation and the operation parameters of the discretization operation. Further, in this step, the model training apparatus 200 may train the feature pool models individually according to items configured by the user through the interface. Here, as an example, in step S200, the interface may be provided to the user in response to an indication of the user regarding determining the feature importance. For example, during the training process of the machine learning model, in order to determine the importance of each feature in the corresponding machine learning training sample, the user may make an indication during the feature engineering process to expect to acquire the importance of each feature. To this end, according to an exemplary embodiment of the present invention, a control such as a feature importance operator may be provided to a user under other relevant interfaces of a feature engineering or modeling process, and when the user clicks the control, an interface related to configuring a feature pool model may be presented to the user, in which various items such as an algorithm, a feature, a regular term, and the like of the feature pool model may be set, and particularly, items related to how to discretize continuous features of the feature pool model (for example, various parameters of a binning operation, and the like) may also be set. For example, as an alternative, the regular terms of the continuous features and the non-continuous features may be set separately, and different weights of the regular terms corresponding to different continuous features may also be set separately.
Here, the display device may be a simple display screen, in which case the feature importance determination system may further include an input device (e.g., a keyboard, a mouse, a microphone, a camera, etc.) that facilitates a user to configure items through the interface; alternatively, the display device may be a touch display screen with touch input function, in which case the user may complete the configuration of the items on the interface directly through the touch screen.
In addition, after the feature importance determination system according to an exemplary embodiment of the present invention obtains the importance of each feature of the machine learning sample, the determined importance information of each feature may be graphically presented to the user.
Fig. 4 illustrates an example of a feature importance presentation interface according to an exemplary embodiment of the present invention, in the interface illustrated in fig. 4, a feature importance analysis report is presented, in which a feature importance ranking and some additional information are listed, and as an example, when an indication bar of a certain feature is clicked or moved, sample information or attribute information and the like about the feature may be additionally displayed.
Alternatively, the respective features may be presented in order of importance of the features, and/or a part of the features among the respective features may be highlighted, wherein the part of the features includes an important feature corresponding to a high importance, an unimportant feature corresponding to a low importance, and/or an abnormal feature corresponding to an abnormal importance.
Fig. 5 shows an example of a feature importance presentation interface according to another exemplary embodiment of the present invention, in the interface shown in fig. 5, not only the features of the machine learning samples are shown in order of importance, but also abnormal features corresponding to abnormal importance are highlighted, optionally, possible reasons for the abnormal features are further provided, and the user interaction experience is enhanced.
It should be understood that: in the existing machine learning field, the programmer is required to write codes to complete the machine learning process in most cases, and even if some software systems such as a modeling platform are developed, the software systems still face the problem that business personnel except a machine learning expert are difficult to benefit. However, according to the exemplary embodiments of the present invention, the importance of each feature in the machine learning sample can be effectively and automatically determined, so that the threshold of applying machine learning is reduced. In addition, according to the exemplary embodiment of the present invention, the determination result about the feature importance and/or the related setting about the determination manner can be presented to the user in a friendly interactive manner, so that the usability of the machine learning platform is further enhanced, accordingly, the user with higher machine learning technical ability can conveniently set and/or adjust the details in the determination process, and the ordinary user can intuitively know the important features, the non-important features and/or the abnormal features and the like in the machine learning sample.
It should be noted that the feature importance system according to the exemplary embodiment of the present invention may fully rely on the execution of the computer program to realize the corresponding functions, i.e., the respective means correspond to the respective steps in the functional architecture of the computer program, so that the entire system is called by a dedicated software package (e.g., lib library) to realize the corresponding functions.
Alternatively, the various means in the feature importance system may be implemented by hardware, software, firmware, middleware, microcode, or any combination thereof. When implemented in software, firmware, middleware or microcode, the program code or code segments to perform the corresponding operations may be stored in a computer-readable medium such as a storage medium, so that a processor may perform the corresponding operations by reading and executing the corresponding program code or code segments.
Here, the exemplary embodiments of the present invention may also be realized as a computing apparatus including a storage component and a processor, wherein the storage component stores a set of computer-executable instructions that, when executed by the processor, perform the above-described feature importance determination method.
In particular, the computing devices may be deployed in servers or clients, as well as on node devices in a distributed network environment. Further, the computing device may be a PC computer, tablet device, personal digital assistant, smart phone, web application, or other device capable of executing the set of instructions described above.
The computing device need not be a single computing device, but can be any device or collection of circuits capable of executing the above instructions (or instruction sets), individually or in combination. The computing device may also be part of an integrated control system or system manager, or may be configured as a portable electronic device that interfaces locally or remotely (e.g., via wireless transmission).
In the computing device, the processor may include a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a programmable logic device, a special purpose processor system, a microcontroller, or a microprocessor. By way of example, and not limitation, processors may also include analog processors, digital processors, microprocessors, multi-core processors, processor arrays, network processors, and the like.
Some of the operations described above with respect to the feature importance determination method may be implemented by software, some of the operations may be implemented by hardware, and further, the operations may be implemented by a combination of software and hardware.
The processor may execute instructions or code stored in one of the memory components, which may also store data. Instructions and data may also be transmitted and received over a network via a network interface device, which may employ any known transmission protocol.
The memory component may be integral to the processor, e.g., having RAM or flash memory disposed within an integrated circuit microprocessor or the like. Further, the storage component may comprise a stand-alone device, such as an external disk drive, storage array, or any other storage device usable by a database system. The storage component and the processor may be operatively coupled or may communicate with each other, such as through an I/O port, a network connection, etc., so that the processor can read files stored in the storage component.
Further, the computing device may also include a video display (such as a liquid crystal display) and a user interaction interface (such as a keyboard, mouse, touch input device, etc.). All components of the computing device may be connected to each other via a bus and/or a network.
The operations described above with respect to the feature importance determination method may be described as various interconnected or coupled functional blocks or functional diagrams. However, these functional blocks or functional diagrams may equally be integrated into a single logic device or operated according to imprecise boundaries.
In particular, as described above, a computing device for determining importance of various features of a machine learning sample according to an exemplary embodiment of the present invention may include a storage component having stored therein a set of computer-executable instructions that, when executed by the processor, perform the steps of: (A) obtaining a historical data record, wherein the historical data record comprises marks about machine learning problems and at least one piece of attribute information of each feature used for generating machine learning samples; (B) training at least one characteristic pool model by using the acquired historical data records, wherein the characteristic pool model is a machine learning model which provides a prediction result about a machine learning problem based on at least one part of characteristics in the characteristics; (C) acquiring an effect of the at least one feature pool model, and determining the importance of each feature according to the acquired effect of the at least one feature pool model, wherein in the step (B), the feature pool model is trained by performing a discretization operation on at least one continuous feature among the at least one part of features.
It should be noted that the processing details of the feature importance determination method according to the exemplary embodiment of the present invention have been described above with reference to Figs. 2 to 5; the processing details of the computing device when performing the above steps are therefore not repeated here.
While exemplary embodiments of the invention have been described above, it should be understood that the above description is illustrative only and not exhaustive, and that the invention is not limited to the exemplary embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the invention. Therefore, the protection scope of the present invention should be subject to the scope of the claims.
Claims (10)
1. A method of determining the importance of respective features of a machine learning sample, comprising:
(A) acquiring a historical data record, wherein the historical data record comprises a label regarding a machine learning problem and at least one piece of attribute information used to generate the respective features of a machine learning sample;
(B) training at least one feature pool model by using the acquired historical data record, wherein a feature pool model is a machine learning model that provides a prediction result regarding the machine learning problem based on at least a part of the respective features;
(C) acquiring an effect of the at least one feature pool model, and determining the importance of the respective features according to the acquired effect of the at least one feature pool model,
wherein, in step (B), the feature pool model is trained by performing a discretization operation on at least one continuous feature among the at least a part of the features.
2. The method of claim 1, wherein, in step (C), the importance of a respective feature on which the feature pool model is based is determined from the difference between the effects of the feature pool model on the original test data set and on the transformed test data set,
wherein the transformed test data set is a data set obtained by replacing, in the original test data set, the values of a target feature whose importance is to be determined with one of the following: zero values, random values, or values obtained by shuffling the order of the original values of the target feature.
3. The method of claim 1, wherein the at least one feature pool model comprises a plurality of machine learning models that provide prediction results regarding the machine learning problem based on different feature sets,
wherein, in step (C), the importance of the respective features is determined from the differences between the effects of the at least one feature pool model on the original test data set.
4. The method of claim 3, wherein the at least one feature pool model comprises one or more main feature pool models and at least one sub-feature pool model respectively corresponding to each main feature pool model, wherein a sub-feature pool model refers to a machine learning model that provides a prediction result regarding the machine learning problem based on the remaining features, among the features on which the corresponding main feature pool model is based, other than a target feature whose importance is to be determined,
wherein, in step (C), the importance of each target feature is determined from the difference between the effects of the main feature pool model and its corresponding sub-feature pool model on the original test data set.
5. The method of claim 3, wherein the at least one feature pool model comprises a plurality of single-feature models, wherein a single-feature model refers to a machine learning model that provides a prediction result regarding the machine learning problem based on a target feature, among the respective features, whose importance is to be determined,
wherein, in step (C), the importance of the corresponding target feature is determined from the differences between the effects of the single-feature models on the original test data set.
6. The method of claim 1, wherein the discretization operation comprises a basic binning operation and at least one additional operation.
7. The method of claim 6, wherein the at least one additional operation comprises an additional binning operation that bins in the same manner as the basic binning operation but with different binning parameters; alternatively, the at least one additional operation comprises an additional binning operation that bins in a manner different from the basic binning operation.
8. The method of claim 1, wherein step (B) further comprises: providing an interface to a user for configuring at least one of the following items of the feature pool model: at least a part of the features on which the feature pool model is based, the algorithm type of the feature pool model, the algorithm parameters of the feature pool model, the operation type of the discretization operation, the operation parameters of the discretization operation,
and, in step (B), training the feature pool models individually according to items configured by the user through the interface.
9. A system for determining the importance of respective features of a machine learning sample, comprising:
a data record acquisition device configured to acquire a historical data record, wherein the historical data record includes a label regarding a machine learning problem and at least one piece of attribute information used to generate the respective features of a machine learning sample;
a model training device configured to train at least one feature pool model by using the acquired historical data record, wherein a feature pool model is a machine learning model that provides a prediction result regarding the machine learning problem based on at least a part of the respective features;
an importance determination device configured to acquire an effect of the at least one feature pool model and to determine the importance of the respective features according to the acquired effect of the at least one feature pool model,
wherein the model training device trains the feature pool model by performing a discretization operation on at least one continuous feature among the at least a part of the features.
10. A computing device for determining the importance of respective features of a machine learning sample, comprising a storage component in which a set of computer-executable instructions is stored, wherein the instructions, when executed by a processor, perform the following steps:
(A) acquiring a historical data record, wherein the historical data record comprises a label regarding a machine learning problem and at least one piece of attribute information used to generate the respective features of a machine learning sample;
(B) training at least one feature pool model by using the acquired historical data record, wherein a feature pool model is a machine learning model that provides a prediction result regarding the machine learning problem based on at least a part of the respective features;
(C) acquiring an effect of the at least one feature pool model, and determining the importance of the respective features according to the acquired effect of the at least one feature pool model,
wherein, in step (B), the feature pool model is trained by performing a discretization operation on at least one continuous feature among the at least a part of the features.
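As an illustrative sketch of the transformed-test-data-set comparison recited in claim 2, and not the embodiment's implementation, the snippet below replaces the values of each target feature in the test set with zeros, random values, or a shuffled copy of its original values and reads the resulting drop in AUC as that feature's importance. The synthetic data, the logistic-regression feature pool model, and the normal distribution used for the random replacement are assumptions.

```python
# Sketch of claim 2: importance from the difference in effect between the
# original test data set and a transformed test data set in which the target
# feature's values are replaced by zeros, random values, or a shuffled copy.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
pool_model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
base_auc = roc_auc_score(y_te, pool_model.predict_proba(X_te)[:, 1])

rng = np.random.default_rng(0)
replacements = {
    "zero":    lambda col: np.zeros_like(col),
    "random":  lambda col: rng.normal(size=col.shape),   # assumed distribution
    "shuffle": lambda col: rng.permutation(col),
}
for j in range(X_te.shape[1]):
    for name, replace in replacements.items():
        X_trans = X_te.copy()
        X_trans[:, j] = replace(X_te[:, j])   # build the transformed test data set
        auc = roc_auc_score(y_te, pool_model.predict_proba(X_trans)[:, 1])
        # A larger drop in effect indicates a more important target feature.
        print(f"feature {j:2d} [{name:7s}] importance = {base_auc - auc:+.4f}")
```

The same difference-in-effect reading also underlies claims 3 to 5, only with the difference taken between models rather than between test data sets.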
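Continuing under the same illustrative assumptions (synthetic data, logistic regression, AUC), the next sketch covers the model-to-model comparisons of claims 4 and 5: a main feature pool model trained on all features, one sub-feature pool model per target feature trained on the remaining features, and one single-feature model per target feature, all evaluated on the same original test data set.

```python
# Sketch of claims 4 and 5: importance from differences in effect between a
# main feature pool model, its sub-feature pool models (all features except
# the target feature) and single-feature models, on the original test data set.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

def pool_model_effect(feature_idx, X_tr, y_tr, X_te, y_te):
    """Train a feature pool model on the given feature subset and return its AUC."""
    model = LogisticRegression(max_iter=1000).fit(X_tr[:, feature_idx], y_tr)
    return roc_auc_score(y_te, model.predict_proba(X_te[:, feature_idx])[:, 1])

X, y = make_classification(n_samples=2000, n_features=6, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
all_features = list(range(X.shape[1]))

main_auc = pool_model_effect(all_features, X_tr, y_tr, X_te, y_te)  # main feature pool model

for j in all_features:
    remaining = [f for f in all_features if f != j]
    sub_auc = pool_model_effect(remaining, X_tr, y_tr, X_te, y_te)   # sub-feature pool model
    single_auc = pool_model_effect([j], X_tr, y_tr, X_te, y_te)      # single-feature model
    print(f"feature {j}: main-vs-sub difference = {main_auc - sub_auc:+.4f}, "
          f"single-feature AUC = {single_auc:.4f}")
```

A feature whose removal costs the main feature pool model the most AUC, or whose single-feature model scores highest on its own, is ranked as more important.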
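Claims 6 and 7 describe a discretization operation that combines a basic binning operation with additional binning operations that differ either in binning parameters or in binning manner. The snippet below is one possible reading of that combination, not the embodiment's exact scheme: it concatenates two equal-width binnings with different bin counts and an equal-frequency binning of the same continuous feature.

```python
# Sketch of claims 6 and 7: a basic binning operation plus additional binning
# operations with different parameters (bin count) or a different manner
# (equal-frequency instead of equal-width). All concrete settings are assumptions.
import numpy as np
from sklearn.preprocessing import KBinsDiscretizer

x = np.random.default_rng(0).normal(size=(1000, 1))   # one continuous feature

binnings = [
    KBinsDiscretizer(n_bins=5,  encode="onehot-dense", strategy="uniform"),   # basic binning
    KBinsDiscretizer(n_bins=20, encode="onehot-dense", strategy="uniform"),   # same manner, different parameter
    KBinsDiscretizer(n_bins=5,  encode="onehot-dense", strategy="quantile"),  # different binning manner
]

# Concatenate all binned encodings so a feature pool model can be trained on
# the same continuous feature discretized at several granularities.
x_discretized = np.hstack([b.fit_transform(x) for b in binnings])
print(x_discretized.shape)   # (1000, 30)
```

Feeding a feature pool model several binned views of one continuous feature is a simple way to keep the discretization from discarding information that a single bin width would lose.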
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610935697.0A CN108021984A (en) | 2016-11-01 | 2016-11-01 | Determine the method and system of the feature importance of machine learning sample |
CN202110542599.1A CN113435602A (en) | 2016-11-01 | 2016-11-01 | Method and system for determining feature importance of machine learning sample |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610935697.0A CN108021984A (en) | 2016-11-01 | 2016-11-01 | Determine the method and system of the feature importance of machine learning sample |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110542599.1A Division CN113435602A (en) | 2016-11-01 | 2016-11-01 | Method and system for determining feature importance of machine learning sample |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108021984A true CN108021984A (en) | 2018-05-11 |
Family
ID=62070586
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110542599.1A Pending CN113435602A (en) | 2016-11-01 | 2016-11-01 | Method and system for determining feature importance of machine learning sample |
CN201610935697.0A Pending CN108021984A (en) | 2016-11-01 | 2016-11-01 | Determine the method and system of the feature importance of machine learning sample |
Family Applications Before (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110542599.1A Pending CN113435602A (en) | 2016-11-01 | 2016-11-01 | Method and system for determining feature importance of machine learning sample |
Country Status (1)
Country | Link |
---|---|
CN (2) | CN113435602A (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
TWI806425B (en) * | 2022-02-14 | 2023-06-21 | 宏碁股份有限公司 | Feature selection method |
- 2016-11-01 CN CN202110542599.1A patent/CN113435602A/en active Pending
- 2016-11-01 CN CN201610935697.0A patent/CN108021984A/en active Pending
Cited By (30)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110717597A (en) * | 2018-06-26 | 2020-01-21 | 第四范式(北京)技术有限公司 | Method and device for acquiring time sequence characteristics by using machine learning model |
CN110751285A (en) * | 2018-07-23 | 2020-02-04 | 第四范式(北京)技术有限公司 | Training method and system and prediction method and system of neural network model |
CN110751285B (en) * | 2018-07-23 | 2024-01-23 | 第四范式(北京)技术有限公司 | Training method and system and prediction method and system for neural network model |
CN109034398A (en) * | 2018-08-10 | 2018-12-18 | 深圳前海微众银行股份有限公司 | Feature selection approach, device and storage medium based on federation's training |
CN109165683A (en) * | 2018-08-10 | 2019-01-08 | 深圳前海微众银行股份有限公司 | Sample predictions method, apparatus and storage medium based on federation's training |
CN109034398B (en) * | 2018-08-10 | 2023-09-12 | 深圳前海微众银行股份有限公司 | Gradient lifting tree model construction method and device based on federal training and storage medium |
CN109165683B (en) * | 2018-08-10 | 2023-09-12 | 深圳前海微众银行股份有限公司 | Sample prediction method, device and storage medium based on federal training |
CN109408583A (en) * | 2018-09-25 | 2019-03-01 | 平安科技(深圳)有限公司 | Data processing method and device, computer readable storage medium, electronic equipment |
CN109408583B (en) * | 2018-09-25 | 2023-04-07 | 平安科技(深圳)有限公司 | Data processing method and device, computer readable storage medium and electronic equipment |
CN109360084A (en) * | 2018-09-27 | 2019-02-19 | 平安科技(深圳)有限公司 | Appraisal procedure and device, storage medium, the computer equipment of reference default risk |
CN109657285A (en) * | 2018-11-27 | 2019-04-19 | 中国科学院空间应用工程与技术中心 | The detection method of turbine rotor transient stress |
CN109783337A (en) * | 2018-12-19 | 2019-05-21 | 北京达佳互联信息技术有限公司 | Model service method, system, device and computer readable storage medium |
CN109783337B (en) * | 2018-12-19 | 2022-08-30 | 北京达佳互联信息技术有限公司 | Model service method, system, apparatus and computer readable storage medium |
CN109784721A (en) * | 2019-01-15 | 2019-05-21 | 东莞市友才网络科技有限公司 | A kind of plateform system of employment data analysis and data mining analysis |
CN109784721B (en) * | 2019-01-15 | 2021-01-26 | 广东度才子集团有限公司 | Employment data analysis and data mining analysis platform system |
CN109800048A (en) * | 2019-01-22 | 2019-05-24 | 深圳魔数智擎科技有限公司 | Result methods of exhibiting, computer readable storage medium and the computer equipment of model |
CN110660485A (en) * | 2019-08-20 | 2020-01-07 | 南京医渡云医学技术有限公司 | Method and device for acquiring influence of clinical index |
CN110708285A (en) * | 2019-08-30 | 2020-01-17 | 中国平安人寿保险股份有限公司 | Flow monitoring method, device, medium and electronic equipment |
CN112580817A (en) * | 2019-09-30 | 2021-03-30 | 脸谱公司 | Managing machine learning features |
CN110956272B (en) * | 2019-11-01 | 2023-08-08 | 第四范式(北京)技术有限公司 | Method and system for realizing data processing |
CN110956272A (en) * | 2019-11-01 | 2020-04-03 | 第四范式(北京)技术有限公司 | Method and system for realizing data processing |
CN113128694A (en) * | 2019-12-31 | 2021-07-16 | 北京超星未来科技有限公司 | Method, device and system for data acquisition and data processing in machine learning |
CN113128694B (en) * | 2019-12-31 | 2024-07-19 | 北京超星未来科技有限公司 | Method, device and system for data acquisition and data processing in machine learning |
CN111401475A (en) * | 2020-04-15 | 2020-07-10 | 支付宝(杭州)信息技术有限公司 | Method and system for generating attack sample |
WO2021139115A1 (en) * | 2020-05-26 | 2021-07-15 | 平安科技(深圳)有限公司 | Feature selection method, apparatus and device, and storage medium |
CN111797995A (en) * | 2020-06-29 | 2020-10-20 | 第四范式(北京)技术有限公司 | Method and device for generating interpretation report of model prediction sample |
CN111797995B (en) * | 2020-06-29 | 2024-01-26 | 第四范式(北京)技术有限公司 | Method and device for generating interpretation report of model prediction sample |
CN112819034A (en) * | 2021-01-12 | 2021-05-18 | 平安科技(深圳)有限公司 | Data binning threshold calculation method and device, computer equipment and storage medium |
CN117705141A (en) * | 2024-02-06 | 2024-03-15 | 腾讯科技(深圳)有限公司 | Yaw recognition method, yaw recognition device, computer readable medium and electronic equipment |
CN117705141B (en) * | 2024-02-06 | 2024-05-07 | 腾讯科技(深圳)有限公司 | Yaw recognition method, yaw recognition device, computer readable medium and electronic equipment |
Also Published As
Publication number | Publication date |
---|---|
CN113435602A (en) | 2021-09-24 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108021984A (en) | Determine the method and system of the feature importance of machine learning sample | |
CN107871166B (en) | Feature processing method and feature processing system for machine learning | |
CN111797928A (en) | Method and system for generating combined features of machine learning samples | |
WO2019015631A1 (en) | Method for generating combined features for machine learning samples and system | |
US10191968B2 (en) | Automated data analysis | |
CN108833458B (en) | Application recommendation method, device, medium and equipment | |
US20200057958A1 (en) | Identification and application of hyperparameters for machine learning | |
US10621492B2 (en) | Multiple record linkage algorithm selector | |
CN113570064A (en) | Method and system for performing predictions using a composite machine learning model | |
CN109033408B (en) | Information pushing method and device, computer readable storage medium and electronic equipment | |
CN116757297A (en) | Method and system for selecting features of machine learning samples | |
CN111797927A (en) | Method and system for determining important features of machine learning samples | |
CN107273979B (en) | Method and system for performing machine learning prediction based on service level | |
US20130204831A1 (en) | Identifying associations in data | |
CN114298323A (en) | Method and system for generating combined features of machine learning samples | |
CN116882520A (en) | Prediction method and system for predetermined prediction problem | |
CN113822440A (en) | Method and system for determining feature importance of machine learning samples | |
CN110147389B (en) | Account processing method and device, storage medium and electronic device | |
CN107909087A (en) | Generate the method and system of the assemblage characteristic of machine learning sample | |
CN113610240A (en) | Method and system for performing predictions using nested machine learning models | |
CN111369344B (en) | Method and device for dynamically generating early warning rules | |
US20220351004A1 (en) | Industry specific machine learning applications | |
US11354297B2 (en) | Detecting positivity violations in multidimensional data | |
CN115545103A (en) | Abnormal data identification method, label identification method and abnormal data identification device | |
AU2021204470A1 (en) | Benefit surrender prediction |
Legal Events
Date | Code | Title | Description
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20180511 |