CN108364106A - Reimbursement bill risk prediction method and device, terminal equipment and storage medium

Publication number
CN108364106A
Authority
CN
China
Prior art keywords
model
prediction
reimbursement
risk level
success rate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810161565.6A
Other languages
Chinese (zh)
Inventor
袁军
陆源
魏尧东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN201810161565.6A priority Critical patent/CN108364106A/en
Priority to PCT/CN2018/081527 priority patent/WO2019165673A1/en
Publication of CN108364106A publication Critical patent/CN108364106A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0635 Risk analysis of enterprise or organisation activities
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/03 Credit; Loans; Processing thereof
    • G06Q40/12 Accounting

Abstract

The invention discloses a reimbursement bill risk prediction method, device, equipment and medium. The method includes: obtaining historical reimbursement bill information as sample data and dividing it into training samples and test samples according to a preset proportion; determining the risk level of each training sample according to preset reimbursement bill risk levels; performing model training with an association rule algorithm on the training samples in each risk level to obtain an initial prediction model; predicting the test samples with the initial prediction model and, for each combination obtained by selecting one group of model parameters from each risk level, calculating the prediction success rate of each risk level, the total prediction success rate and the test time; and performing regression analysis on the model parameters, prediction success rates, test times and total prediction success rates to obtain a target prediction model. The target prediction model helps staff identify the risk level of a reimbursement bill efficiently and improves the accuracy of reimbursement bill risk level prediction.

Description

Reimbursement bill risk prediction method and device, terminal equipment and storage medium
Technical Field
The invention relates to the technical field of computers, in particular to a reimbursement bill risk prediction method and device, terminal equipment and a storage medium.
Background
Malicious and false reimbursement claims occur in day-to-day expense reimbursement. To strengthen risk management, a reimbursement bill risk level prediction model is currently most often built with an association-rule-based mining algorithm and used to predict the risk level of a reimbursement bill. However, when the risk level data are unevenly distributed, reimbursement bills with a low-probability risk level make up only a small share of the training data, and a traditional association-rule-based mining algorithm discards them as noise. The resulting model therefore never learns the characteristics of reimbursement bills with a low-probability risk level, and its accuracy is low when it predicts the risk level of a new reimbursement bill.
Disclosure of Invention
The embodiment of the invention provides a risk prediction method for a reimbursement bill, which aims to solve the problem that the accuracy of the risk level prediction of the reimbursement bill by the current reimbursement bill risk level prediction model is low.
In a first aspect, an embodiment of the present invention provides a reimbursement bill risk prediction method, including:
acquiring historical reimbursement bill information, and taking the historical reimbursement bill information as sample data;
dividing the sample data into training samples and test samples according to a preset proportion;
determining the reimbursement bill risk level of each training sample according to the definitions of N preset reimbursement bill risk levels, wherein N is a positive integer;
performing model training with an association rule algorithm on the training samples in each reimbursement bill risk level to obtain an initial prediction model, wherein the initial prediction model comprises the association rules that meet the preset model parameter requirements in each reimbursement bill risk level, and the model parameters comprise a support degree and a confidence degree;
performing model prediction on the test samples with the initial prediction model and, for each combination obtained by selecting one group of model parameters from each reimbursement bill risk level, calculating the prediction success rate of each reimbursement bill risk level, the total prediction success rate and the test time under that combination;
and performing regression analysis on the model parameters, the prediction success rates, the test time and the total prediction success rate to obtain a target prediction model.
In a second aspect, an embodiment of the present invention provides a reimbursement bill risk prediction device, including:
a sample data acquisition module, configured to acquire historical reimbursement bill information and take the historical reimbursement bill information as sample data;
a first dividing module, configured to divide the sample data into training samples and test samples according to a preset proportion;
a risk level presetting module, configured to determine the reimbursement bill risk level of each training sample according to the definitions of N preset reimbursement bill risk levels, wherein N is a positive integer;
an initial prediction model acquisition module, configured to perform model training with an association rule algorithm on the training samples in each reimbursement bill risk level to obtain an initial prediction model, wherein the initial prediction model comprises the association rules that meet the preset model parameter requirements in each reimbursement bill risk level, and the model parameters comprise a support degree and a confidence degree;
an initial prediction model testing module, configured to perform model prediction on the test samples with the initial prediction model and, for each combination obtained by selecting one group of model parameters from each reimbursement bill risk level, calculate the prediction success rate of each reimbursement bill risk level, the total prediction success rate and the test time under that combination;
and a target prediction model acquisition module, configured to perform regression analysis on the model parameters, the prediction success rates, the test time and the total prediction success rate to obtain a target prediction model.
In a third aspect, an embodiment of the present invention provides a terminal device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the steps of the reimbursement bill risk prediction method when executing the computer program.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of the reimbursement bill risk prediction method.
In the reimbursement bill risk prediction method, device, terminal equipment and storage medium provided by the embodiments of the invention, historical reimbursement bill information is acquired as sample data and divided into training samples and test samples according to a preset proportion, so that the quality of the model trained on the training samples can be evaluated with the test samples. After the reimbursement bill risk levels are defined and the risk level of each training sample is determined, an association rule algorithm is used to train a model on the training samples in each risk level, the target association rules that meet the preset model parameter requirements in each risk level are obtained, and an initial prediction model is constructed. Because training is carried out separately for each risk level, the characteristics of reimbursement bill data that make up only a small proportion of the sample data can be learned instead of being discarded as noise, which improves the accuracy of the model. Finally, the initial prediction model is used to predict the test samples; for each combination obtained by selecting one group of model parameters from each risk level, the prediction success rate of each risk level, the total prediction success rate and the test time are calculated, and regression analysis is performed on these discrete data to obtain the target prediction model. Model prediction and regression analysis together yield accurate model configuration parameters, so the target prediction model can help staff identify the risk level of a reimbursement bill accurately and efficiently, which effectively improves the accuracy of reimbursement bill risk level prediction.
Drawings
In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a flowchart of the reimbursement bill risk prediction method provided in embodiment 1 of the present invention;
FIG. 2 is a flowchart of the implementation of step S4 in the reimbursement bill risk prediction method provided in embodiment 1 of the present invention;
FIG. 3 is a flowchart of the implementation of step S5 in the reimbursement bill risk prediction method provided in embodiment 1 of the present invention;
FIG. 4 is a flowchart of the implementation of step S6 in the reimbursement bill risk prediction method provided in embodiment 1 of the present invention;
FIG. 5 is a flowchart of testing the accuracy of the target prediction model by cross validation in the reimbursement bill risk prediction method provided in embodiment 1 of the present invention;
FIG. 6 is a schematic diagram of the reimbursement bill risk prediction device provided in embodiment 2 of the present invention;
FIG. 7 is a schematic diagram of the terminal device provided in embodiment 4 of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example 1
Referring to FIG. 1, FIG. 1 shows the implementation flow of a reimbursement bill risk prediction method according to an embodiment of the present invention. The method is applied to the reimbursement bill auditing systems of enterprises and public institutions to identify the risk level of a reimbursement bill and to improve the accuracy of risk level prediction. As shown in FIG. 1, the reimbursement bill risk prediction method includes steps S1 to S6, detailed as follows:
S1: Acquire historical reimbursement bill information, and take the historical reimbursement bill information as sample data.
In the embodiment of the invention, the sample data are collected from the historical reimbursement bills in the reimbursement bill database to obtain the historical reimbursement bill information.
Historical reimbursement bills are the data that enterprises and public institutions store in the reimbursement bill database in the course of production and operation. Each piece of historical reimbursement bill information includes information obtained from the reimbursement bill and information generated while the bill was processed. Specifically, it includes, but is not limited to, attribute information such as the reimbursement bill number, reimbursement bill name, manager's Chinese name, reimburser's Chinese name, department name, reimbursement amount, total amount and number of attached documents. Mining and learning are performed with the historical reimbursement bill information as sample data.
Specifically, when reimbursement bill sample data are collected, stored and processed, the Hadoop big data platform is used to collect the sample data from the historical reimbursement bills stored in the reimbursement bill database.
Hadoop is a distributed system infrastructure that implements a distributed file system, the Hadoop Distributed File System (HDFS). HDFS provides high-throughput data access and is well suited to applications with large data sets. During sample data collection, data processing is carried out with HDFS and the data warehouse tool Hive. Hive is a Hadoop-based data warehouse tool for storing, querying and analyzing large-scale data stored in Hadoop, so collecting the sample data with the Hadoop big data platform has the advantage of high collection efficiency.
S2: Divide the sample data into training samples and test samples according to a preset proportion.
In the embodiment of the present invention, the proportion for dividing the sample data is set in advance.
It should be noted that the preset proportion may be derived from historical experience or from an analysis of the sample data, and may be set according to the needs of the practical application, which is not limited herein.
The training samples are the data set used for machine learning: the data in the training samples are used to train the machine learning model and so determine its parameters. The test samples are used to test the discriminative ability of the trained model, for example its prediction success rate for the reimbursement bill risk level.
Specifically, the sample data are divided into training samples and test samples according to the preset proportion. For example, with a ratio of 9:1, 90% of the sample data are used as training samples and the remaining 10% as test samples. If 6.05 million sample records are collected in total, then at 9:1, 5.445 million records are used as training samples for feature learning, and the remaining 605,000 records are used as test samples to predict the reimbursement bill risk level and verify the prediction success rate of the model.
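The 9:1 division described above can be sketched in a few lines of Python. This is only an illustration: the function name `split_samples`, the record layout, and the choice to shuffle before cutting are assumptions, since the text specifies the proportion but not how records are assigned to each part.

```python
import random

def split_samples(samples, train_ratio=0.9, seed=0):
    """Shuffle a copy of the samples and split it into (training, test) lists.

    Shuffling is an assumption of this sketch; the source only fixes the ratio.
    """
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical stand-in for historical reimbursement bill records.
records = [{"bill_no": i} for i in range(1000)]
train, test = split_samples(records, train_ratio=0.9)
print(len(train), len(test))  # 900 100
```

With the 6.05 million records of the example, the same call would yield 5.445 million training samples and 605,000 test samples.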
S3: Determine the reimbursement bill risk level of each training sample according to the definitions of N preset reimbursement bill risk levels, wherein N is a positive integer.
In the embodiment of the present invention, definitions of N reimbursement bill risk levels are preset to distinguish the risks of reimbursement bills, where N is a positive integer. The definitions may be set according to the requirements of the actual application, which is not limited herein. The higher the risk level of a reimbursement bill, the greater the risk the bill carries.
Specifically, the reimbursement bill risk level of each training sample is determined according to the preset definitions, and each training sample is marked with the identification information of its reimbursement bill risk level.
For a better understanding of this step, a specific reimbursement bill risk classification is described below as an example. Table 1 shows classification criteria that divide the reimbursement bill risk level into four levels: 0, 1, 2 and 3.
Table 1
S4: Perform model training with an association rule algorithm on the training samples in each reimbursement bill risk level to obtain an initial prediction model, wherein the initial prediction model comprises the association rules that meet the preset model parameter requirements in each reimbursement bill risk level, and the model parameters comprise a support degree and a confidence degree.
Specifically, according to the risk level identification information marked on each training sample in step S3, the collected training samples are grouped by the preset reimbursement bill risk level classification criteria, and machine learning is performed on each group separately with the association rule algorithm. Model parameter requirements, including but not limited to a support threshold and a confidence threshold, are preset for each group of training samples. The model parameters that meet the support and confidence thresholds, and the corresponding association rules, are screened out according to these requirements, and the initial prediction model is constructed from the model parameters and their corresponding association rules.
It should be noted that the preset model parameter requirements may contain one group of support and confidence thresholds or multiple groups. The thresholds may be chosen from historical experience or from the distribution of the data, which is not limited herein.
For example, when four reimbursement bill risk levels 0, 1, 2 and 3 are preset, the grouping is as follows:
$P_0$: $sup_0 = x_0$, $confid_0 = y_0$
$P_1$: $sup_1 = x_1$, $confid_1 = y_1$
$P_2$: $sup_2 = x_2$, $confid_2 = y_2$
$P_3$: $sup_3 = x_3$, $confid_3 = y_3$
where $P_0$, $P_1$, $P_2$ and $P_3$ are the groups of training samples with reimbursement bill risk levels 0, 1, 2 and 3 respectively, $sup_i$ is the support threshold, $confid_i$ is the confidence threshold, $x_i \in [0,1]$, $y_i \in [0,1]$, $y_i \ge x_i$, and $i = 0, 1, 2, 3$. For example, the values may be $x_0 = 0.6$, $y_0 = 0.8$; $x_1 = 0.1$, $y_1 = 0.7$; $x_2 = 0.6$, $y_2 = 0.95$; $x_3 = 0.1$, $y_3 = 0.7$, or $x_0 = 0.8$, $y_0 = 0.95$; $x_1 = 0.2$, $y_1 = 0.7$; $x_2 = 0.8$, $y_2 = 0.9$; $x_3 = 0.4$, $y_3 = 0.7$, and so on.
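As a minimal sketch of this grouping, the per-level thresholds can be held in a dictionary and checked against the stated constraints (both values in [0, 1] and confidence at least as large as support). The threshold values are the first example set above; the field name `risk_level` and both helper functions are hypothetical.

```python
from collections import defaultdict

# (support threshold, confidence threshold) per risk level,
# taken from the first example set in the text.
THRESHOLDS = {
    0: (0.6, 0.8),
    1: (0.1, 0.7),
    2: (0.6, 0.95),
    3: (0.1, 0.7),
}

def validate_thresholds(thresholds):
    """Check x_i, y_i in [0, 1] and y_i >= x_i for every risk level."""
    for level, (sup, confid) in thresholds.items():
        if not (0.0 <= sup <= 1.0 and 0.0 <= confid <= 1.0):
            raise ValueError(f"level {level}: thresholds must lie in [0, 1]")
        if confid < sup:
            raise ValueError(f"level {level}: confidence must be >= support")

def group_by_risk_level(samples):
    """Group training samples into P_0 ... P_N by their marked risk level."""
    groups = defaultdict(list)
    for sample in samples:
        groups[sample["risk_level"]].append(sample)  # hypothetical field name
    return dict(groups)

# Hypothetical training samples already marked with a risk level (step S3).
samples = [
    {"bill_no": "R001", "risk_level": 0},
    {"bill_no": "R002", "risk_level": 3},
    {"bill_no": "R003", "risk_level": 0},
]
validate_thresholds(THRESHOLDS)
groups = group_by_risk_level(samples)
```

Each group can then be mined separately with its own thresholds, which is what keeps the rarer risk levels from being drowned out by the common ones.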
S5: Perform model prediction on the test samples with the initial prediction model and, for each combination obtained by selecting one group of model parameters from each reimbursement bill risk level, calculate the prediction success rate of each risk level, the total prediction success rate and the test time under that combination.
In the embodiment of the invention, data mining is performed on the training samples in each reimbursement bill risk level, and one or more groups of model parameter requirements are preset for each risk level to screen the association rules that meet them. One group of model parameters is selected from each risk level and the selected groups are combined; for each such combination, the initial prediction model is used to predict the test samples, the prediction success rate of each risk level and the total prediction success rate under the combination are calculated, and the test time t needed to predict the risk level of all test samples under the combination is recorded.
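The enumeration of combinations in step S5 can be sketched with `itertools.product`. Several points are assumptions of this sketch rather than details from the text: the success rate of a level is taken to be the fraction of test samples of that level that are predicted correctly, the `predict` callable and the `risk_level` field are hypothetical, and every test sample is assumed to carry one of the configured levels.

```python
import itertools
import time

def evaluate_combinations(param_groups, test_samples, predict):
    """Evaluate every combination of one parameter group per risk level.

    param_groups: {level: [(sup, confid), ...]}
    predict:      callable (sample, combo) -> predicted level (hypothetical)
    """
    levels = sorted(param_groups)
    results = []
    for choice in itertools.product(*(param_groups[lv] for lv in levels)):
        combo = dict(zip(levels, choice))
        hits = {lv: 0 for lv in levels}
        totals = {lv: 0 for lv in levels}
        start = time.perf_counter()
        for sample in test_samples:
            true_level = sample["risk_level"]
            totals[true_level] += 1
            if predict(sample, combo) == true_level:
                hits[true_level] += 1
        elapsed = time.perf_counter() - start  # test time t for this combination
        results.append({
            "params": combo,
            "per_level": {lv: hits[lv] / totals[lv] if totals[lv] else 0.0
                          for lv in levels},
            "total": sum(hits.values()) / len(test_samples) if test_samples else 0.0,
            "time": elapsed,
        })
    return results

def always_zero(sample, combo):
    """Stub predictor used only to exercise the loop."""
    return 0

param_groups = {0: [(0.6, 0.8), (0.8, 0.95)], 1: [(0.1, 0.7)]}
test_samples = [{"risk_level": 0}, {"risk_level": 0}, {"risk_level": 1}]
results = evaluate_combinations(param_groups, test_samples, always_zero)
```

With two parameter groups for level 0 and one for level 1, the sketch evaluates 2 × 1 = 2 combinations; in general the number of combinations is the product of the group counts across all risk levels.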
S6: Perform regression analysis on the model parameters, the prediction success rates, the test time and the total prediction success rate to obtain a target prediction model.
In the embodiment of the invention, regression analysis is performed on the discrete data obtained for each combination in step S5, such as the prediction success rates, the test time and the total prediction success rate. The analysis determines the quantitative dependence among the variables and yields a continuous function or a denser discrete equation that fits the discrete data. The function or discrete equation is then solved and analyzed, and the group of discrete data with the highest total prediction success rate and the highest model parameter values is taken as the optimal configuration parameters of the model; the larger the support and confidence thresholds, the more accurate the resulting association rules. A target prediction model is constructed from the optimal configuration parameters and the association rules that meet them. The target prediction model is used to predict the risk level of reimbursement bills, which improves the accuracy of reimbursement bill risk prediction.
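The text does not say which form of regression is used, so the following is only a stand-in: a one-variable ordinary least squares fit of total success rate against a single support threshold, followed by the selection rule described above (highest total success rate, ties broken in favour of larger thresholds). The numbers are made-up toy values, and `fit_line` and `pick_best` are hypothetical helper names.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b

def pick_best(results):
    """Highest total success rate; ties go to the larger threshold values."""
    return max(
        results,
        key=lambda r: (r["total"], sum(s + c for s, c in r["params"].values())),
    )

# Toy results in the shape produced by step S5 (values are illustrative only).
results = [
    {"params": {0: (0.6, 0.8)}, "total": 0.90},
    {"params": {0: (0.7, 0.9)}, "total": 0.92},
    {"params": {0: (0.8, 0.95)}, "total": 0.92},
]
supports = [r["params"][0][0] for r in results]
totals = [r["total"] for r in results]
intercept, slope = fit_line(supports, totals)  # dependence of success rate on support
best = pick_best(results)                      # configuration kept for the target model
```

A real implementation would regress on all thresholds and the test time jointly; the single-variable fit is used here only to keep the sketch dependency-free.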
In the embodiment corresponding to FIG. 1, historical reimbursement bill information is acquired as sample data and divided into training samples and test samples according to a preset proportion, so that the quality of the model trained on the training samples can be evaluated with the test samples. After the reimbursement bill risk levels are defined and the risk level of each training sample is determined, an association rule algorithm is used to train a model on the training samples in each risk level, the target association rules that meet the preset model parameter requirements in each risk level are obtained, and an initial prediction model is constructed. Because training is carried out separately for each risk level, the characteristics of reimbursement bill data that make up only a small proportion of the sample data can be learned instead of being discarded as noise, which improves the accuracy of the model. Finally, the initial prediction model is used to predict the test samples; for each combination obtained by selecting one group of model parameters from each risk level, the prediction success rate of each risk level, the total prediction success rate and the test time are calculated, and regression analysis is performed on these discrete data to obtain the target prediction model. Model prediction and regression analysis together yield accurate model configuration parameters, so the target prediction model can help staff identify the risk level of a reimbursement bill accurately and efficiently, which effectively improves the accuracy of reimbursement bill risk level prediction.
Next, on the basis of the embodiment corresponding to FIG. 1, a specific embodiment is used to describe in detail how step S4 performs model training with an association rule algorithm on the training samples in each reimbursement bill risk level to obtain the initial prediction model.
Referring to FIG. 2, FIG. 2 shows a specific implementation flow of step S4 provided in the embodiment of the present invention, detailed as follows:
S41: Perform data preprocessing on the training samples in each reimbursement bill risk level to obtain a data set to be processed in each reimbursement bill risk level.
In the embodiment of the invention, data preprocessing comprises data cleaning, data integration and data conversion of the training samples.
Data cleaning selects the attribute information needed in a training sample as feature values for training and learning. Data integration merges the training sample data of each reimbursement bill risk level into one data file as a data set. Data conversion converts the data types of the training samples in the data set into a uniform format; for example, because association rule algorithms are generally applied to mining Boolean data, all data types are converted into Boolean data.
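One way to picture the cleaning and conversion steps is the sketch below, which keeps a chosen subset of attributes and then maps each record to a row of Boolean flags, one per `attribute=value` item. The attribute names, the sample records and the `attribute=value` encoding are assumptions for illustration; the text only requires that the data end up in a uniform Boolean format.

```python
def select_attributes(record, keep):
    """Data cleaning: keep only the attributes used as feature values."""
    return {k: record[k] for k in keep if k in record}

def to_boolean_row(record, vocabulary):
    """Data conversion: map a record to {item: bool} over a fixed vocabulary."""
    present = {f"{k}={v}" for k, v in record.items()}
    return {item: item in present for item in vocabulary}

# Hypothetical training samples for one risk level.
records = [
    {"bill_no": "R001", "department": "Sales", "attachments": 2},
    {"bill_no": "R002", "department": "IT", "attachments": 0},
]
keep = ["department", "attachments"]

cleaned = [select_attributes(r, keep) for r in records]
# Data integration: one shared vocabulary (and one data set) per risk level.
vocabulary = sorted({f"{k}={v}" for r in cleaned for k, v in r.items()})
rows = [to_boolean_row(r, vocabulary) for r in cleaned]
```

Continuous attributes such as the reimbursement amount would normally need to be binned into ranges before this encoding, a step the sketch leaves out.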
After the training samples in each reimbursement bill risk level have been preprocessed, the data set to be processed in each risk level is obtained, which improves the data quality of the training samples.
S42: Perform data mining on the data sets to be processed with an association rule algorithm to obtain a plurality of item sets in each reimbursement bill risk level.
In the embodiment of the invention, an association rule algorithm is used to mine each data set to be processed. Each reimbursement bill training sample is a transaction, denoted T, and each training sample is marked with corresponding transaction identification information. The set of transactions is the transaction set, denoted D. Each attribute of a reimbursement bill is an item, denoted w, and each transaction contains several attributes. A set of items is an item set, $W = \{w_1, w_2, \ldots, w_j\}$, where j is the number of items in the item set. After every training sample in the data set to be processed has been marked, the identification information of each transaction corresponds to one item set, giving a plurality of item sets in each reimbursement bill risk level.
S43: For each reimbursement bill risk level, screen out the target item sets that meet the model parameter requirements from the item sets in that risk level, and establish association rules from the target item sets.
In the embodiment of the invention, one or more pairs of support and confidence thresholds are preset for the training of each risk level's training samples. From each data set, the target item sets whose support is greater than or equal to the support threshold are screened out as frequent item sets; preliminary rules are then generated from the frequent item sets, the confidence of each preliminary rule is calculated, and the rules whose confidence is greater than or equal to the confidence threshold are taken as association rules.
It should be noted that, the support degree is the percentage of the total number of transactions in the transaction set D that contains both the transaction a and the transaction B, the confidence degree is the percentage of the total number of transactions in the transaction set D that contains both the transaction a and the transaction B and the number of transactions that contain the transaction a, and the rule may be expressed by the following formulaIndicating that the association between transaction a and transaction B is reflected.
Specifically, the support degree can be calculated according to formula (1):
$$\mathrm{sup}(A \Rightarrow B) = \frac{\mathrm{count}(A \cup B)}{\|D\|} \tag{1}$$
wherein sup is the support degree, count(A ∪ B) is the number of transactions in the transaction set D that contain both A and B, and ||D|| is the number of transactions in the transaction set D.
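Specifically, the confidence may be calculated according to formula (2):
$$\mathrm{confidence}(A \Rightarrow B) = \frac{\mathrm{count}(A \cup B)}{\mathrm{count}(A)} \tag{2}$$
wherein confidence(A ⇒ B) is the confidence degree of the rule A ⇒ B, and count(A) is the number of transactions in the transaction set D that contain A.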
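The two measures above can be illustrated with a minimal Python sketch. The toy transactions, the attribute strings and the threshold values are hypothetical and are not taken from the embodiment; the sketch only assumes that each transaction is a set of items and that a candidate rule A ⇒ B is kept when both measures reach their thresholds.

```python
from typing import FrozenSet, List

Transaction = FrozenSet[str]


def count_containing(transactions: List[Transaction], items: FrozenSet[str]) -> int:
    """Number of transactions that contain every item in `items`."""
    return sum(1 for t in transactions if items <= t)


def support(transactions: List[Transaction], a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Formula (1): share of all transactions that contain both A and B."""
    return count_containing(transactions, a | b) / len(transactions)


def confidence(transactions: List[Transaction], a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Formula (2): transactions containing A and B over transactions containing A."""
    n_a = count_containing(transactions, a)
    return count_containing(transactions, a | b) / n_a if n_a else 0.0


# Hypothetical toy transaction set D (attribute names are illustrative only).
D = [
    frozenset({"dept=sales", "type=travel", "amount=high"}),
    frozenset({"dept=sales", "type=travel", "amount=low"}),
    frozenset({"dept=it", "type=travel", "amount=high"}),
    frozenset({"dept=sales", "type=meal", "amount=low"}),
]
A = frozenset({"dept=sales"})
B = frozenset({"type=travel"})

sup_ab = support(D, A, B)       # 2 of 4 transactions contain A and B
conf_ab = confidence(D, A, B)   # 2 of the 3 transactions containing A also contain B

# Hypothetical thresholds; the rule is kept only if both are met.
SUP_THRESHOLD, CONF_THRESHOLD = 0.4, 0.6
keeps_rule = sup_ab >= SUP_THRESHOLD and conf_ab >= CONF_THRESHOLD
```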
S44: and constructing an initial prediction model according to the association rule and the model parameter requirement corresponding to the association rule.
Specifically, on the basis that an association rule is obtained by performing data mining on a training sample by using an association rule algorithm according to a preset support degree threshold value and a preset confidence degree threshold value, the obtained association rules are summarized by using the preset support degree threshold value and the preset confidence degree threshold value as model parameters, and an initial prediction model is generated and used for predicting the risk level of a reimbursement order in a test sample.
In the embodiment corresponding to fig. 2, the quality of data used for training the machine learning model is improved by preprocessing the training samples in each reimbursement order risk level, a support threshold and a confidence threshold are preset for each reimbursement order risk level as model parameter requirements, the training samples in each reimbursement order risk level are subjected to data mining by using an association rule algorithm, the association among the data is mined to obtain an association rule, and an initial prediction model is generated by combining the preset model parameters for predicting the risk level of the reimbursement order. By adopting the mode of carrying out model training according to different reimbursement bill risk levels, the characteristics of reimbursement bill data with a small proportion in sample data can be learned, the condition that the reimbursement bill data is discarded as noise is avoided, and the accuracy of the model is improved.
Based on the embodiment corresponding to fig. 1 or fig. 2, the following describes in detail, through a specific embodiment, how step S5 performs model prediction on the test sample by using the initial prediction model and calculates, for each combination mode obtained by selecting one set of model parameters from each reimbursement order risk level, the prediction success rate of each reimbursement order risk level, the total prediction success rate and the test time.
Referring to fig. 3, fig. 3 shows a specific implementation flow of step S5 provided in the embodiment of the present invention, which is detailed as follows:
S51: determining the reimbursement bill risk level of each test sample and the number of test samples of each reimbursement bill risk level according to the definition of the preset N reimbursement bill risk levels.
In the embodiment of the present invention, according to the definition of N reimbursement order risk levels preset in step S3, the risk level of each reimbursement order in the test sample is determined, the identification information of the reimbursement order risk level corresponding to each test sample is identified, and the number of test samples of each reimbursement order risk level is obtained through statistics according to the identification information.
And predicting the risk level of the reimbursement bill of the test sample by using an initial prediction model obtained by training and learning, and verifying and correcting rules generated in the generation process of the model.
S52: calculating the probability of each reimbursement order risk level in the test sample according to formula (3):
$$P_i = \frac{R_i}{S} \tag{3}$$
wherein i ∈ [1, N], P_i is the probability of the i-th reimbursement order risk level in the test sample, R_i is the number of test samples of the i-th reimbursement order risk level, and S is the total number of test samples.
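As a small illustration of formula (3) and of the high-to-low probability ordering used later in step S54, the following Python sketch counts the labelled test samples per risk level. The label list is hypothetical, and the function names are introduced here only for illustration.

```python
from collections import Counter
from typing import Dict, List


def level_probabilities(labels: List[int]) -> Dict[int, float]:
    """Formula (3): P_i = R_i / S for every risk level present in the labels."""
    counts = Counter(labels)   # R_i per risk level
    total = len(labels)        # S
    return {level: n / total for level, n in counts.items()}


def levels_by_probability(probs: Dict[int, float]) -> List[int]:
    """Risk levels sorted from the most to the least probable."""
    return sorted(probs, key=lambda level: probs[level], reverse=True)


# Hypothetical risk-level labels of ten test samples.
labels = [0, 0, 0, 0, 0, 0, 1, 1, 1, 3]
probs = level_probabilities(labels)
order = levels_by_probability(probs)
```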
S53: and selecting a group of model parameters from each reimbursement bill risk level to combine to obtain L combination modes, wherein L is a positive integer.
In the embodiment of the invention, the association rule mining is carried out on the training sample of each reimbursement order risk grade according to one or more groups of preset model parameter requirements in each reimbursement order risk grade, and the model parameters comprise a support degree threshold value and a confidence degree threshold value. And in the training sample of each reimbursement bill risk level, screening according to each group of model parameters to obtain a corresponding association rule meeting the requirements of the model parameters.
Specifically, in multiple sets of model parameters of N reimbursement order risk levels, one set of model parameters is selected from each reimbursement order risk level to be combined, so as to obtain L different combination modes, wherein L is a positive integer.
For example, when the risk level of the reimbursement order is preset to four levels of 0,1,2 and 3, the model parameters of each reimbursement order risk level are respectively preset to:
P0:(sup0,confid0)={(x01,y01),(x02,y02),(x03,y03)}
P1:(sup1,confid1)={(x11,y11),(x12,y12)}
P2:(sup2,confid2)={(x21,y21)}
P3:(sup3,confid3)={(x31,y31),(x32,y32)}
the combination method is as follows:
L1:{(x01,y01),(x11,y11),(x21,y21),(x31,y31)}
L2:{(x01,y01),(x12,y12),(x21,y21),(x31,y31)}
L3:{(x01,y01),(x11,y11),(x21,y21),(x32,y32)}
and so on; in total there are 3 × 2 × 1 × 2 = 12 combination modes, that is, L = 12.
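The enumeration of combination modes is a Cartesian product over the per-level parameter sets. A minimal Python sketch, using the symbolic parameter names from the example as placeholder strings (the dictionary layout is an assumption made for illustration):

```python
from itertools import product

# Preset (support, confidence) parameter sets per risk level, as placeholders.
params = {
    0: [("x01", "y01"), ("x02", "y02"), ("x03", "y03")],
    1: [("x11", "y11"), ("x12", "y12")],
    2: [("x21", "y21")],
    3: [("x31", "y31"), ("x32", "y32")],
}

# One parameter set is picked from each risk level; every pick is a combination mode.
combos = list(product(*(params[level] for level in sorted(params))))
num_combos = len(combos)
```

Each element of `combos` holds one (support, confidence) pair per risk level, in level order.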
S54: and aiming at each combination mode, performing reimbursement bill risk level prediction on the test samples by using the initial prediction model according to the sequence of the probability from high to low to obtain the prediction result of each test sample, and acquiring the test time for performing reimbursement bill risk level prediction in the combination mode.
In the embodiment of the invention, the probability of each reimbursement order risk level in the test sample is calculated according to the formula (3), the initial prediction model obtained by training is used for performing reimbursement order risk level prediction on the test sample according to the sequence from high probability to low probability aiming at each combination mode to obtain the prediction result of each test sample, the test time for completing reimbursement order risk level prediction on all the test samples in the combination mode is obtained, and the prediction result of each test sample and the corresponding test time in L combination modes are obtained together and are used for further analyzing the accuracy of the initial prediction model.
S55: and comparing the prediction result of each test sample with the reimbursement bill risk level of the test sample, if the prediction result of each test sample is the same as the reimbursement bill risk level of the test sample, confirming that the test sample is successfully predicted, and counting the number of the test samples successfully predicted under each reimbursement bill risk level.
Specifically, according to the prediction result of the reimbursement bill risk level of each test sample predicted in step S54, the prediction result is compared with the identification information of the reimbursement bill risk level of the test sample for analysis, if the two reimbursement bill risk levels are the same, the test sample is determined to be successfully predicted, and if the two reimbursement bill risk levels are different, the test sample is determined to be failed in prediction.
And counting the number of successful predictions of the test samples under each reimbursement bill risk level, and calculating the success rate of prediction of each reimbursement bill risk level under each combination mode.
S56: calculating the prediction success rate of each reimbursement bill risk level under each combination mode according to formula (4):
$$\mathrm{hitrate}_i = \frac{M_i}{R_i} \tag{4}$$
wherein hitrate_i is the prediction success rate of the i-th reimbursement order risk level, M_i is the number of test samples successfully predicted under the i-th reimbursement order risk level, and R_i is the number of test samples of the i-th reimbursement order risk level.
S57: calculating the total prediction success rate under each combination mode according to formula (5):
$$\mathrm{hitRate} = \frac{\sum_{i=1}^{N} M_i}{S} \tag{5}$$
wherein hitRate is the total prediction success rate, M_i is the number of test samples successfully predicted under the i-th reimbursement bill risk level, and S is the total number of test samples.
For example, when the reimbursement bill risk level is preset to four levels of 0,1,2 and 3, 605790 reimbursement bill test samples are collected to perform reimbursement bill risk level prediction, and the number of test samples of each reimbursement bill risk level is identified and counted according to the preset reimbursement bill risk level definition, wherein the number of reimbursement bill samples of 0 risk level is 561627, the number of reimbursement bill samples of 1 risk level is 34818, the number of reimbursement bill samples of 2 risk level is 13, and the number of reimbursement bill samples of 3 risk level is 9332.
When sup0 = 0.8, confid0 = 0.95, sup1 = 0.4, confid1 = 0.7, sup2 = 0.4, confid2 = 0.95, sup3 = 0.4 and confid3 are taken as the preset model parameter requirements, reimbursement bill risk level prediction is performed on the test samples, and the prediction result of each test sample is compared with the reimbursement bill risk level identified by the identification information of that test sample. The numbers of successfully predicted test samples at each risk level are: 561527 at risk level 0, 30821 at risk level 1, 1 at risk level 2 and 1532 at risk level 3, so the total number of successfully predicted reimbursement bills is 593881.
Calculated according to formula (4): the prediction success rate for risk level 0 is hitrate0 = 561527/561627 ≈ 99.98219%, for risk level 1 is hitrate1 = 30821/34818 ≈ 88.52030%, for risk level 2 is hitrate2 = 1/13 ≈ 7.69230%, and for risk level 3 is hitrate3 = 1532/9332 ≈ 16.41663%. The total prediction success rate calculated according to formula (5) is hitRate = 593881/605790 ≈ 98.03413%.
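The arithmetic of formulas (4) and (5) can be re-run from the sample counts stated in this example. The following Python sketch uses only those counts; the variable names are chosen for illustration.

```python
# R_i: number of test samples per risk level (from the example above).
R = {0: 561627, 1: 34818, 2: 13, 3: 9332}
# M_i: number of successfully predicted test samples per risk level.
M = {0: 561527, 1: 30821, 2: 1, 3: 1532}

S = sum(R.values())  # total number of test samples

# Formula (4): per-level prediction success rate.
hitrate = {level: M[level] / R[level] for level in R}

# Formula (5): total prediction success rate.
hit_rate_total = sum(M.values()) / S

for level in sorted(hitrate):
    print(f"risk level {level}: {hitrate[level]:.5%}")
print(f"total: {hit_rate_total:.5%}")
```

The sketch makes the imbalance visible: the total success rate is dominated by risk level 0, which accounts for most of the test samples, while the rare levels 2 and 3 are predicted far less reliably.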
In the embodiment corresponding to fig. 3, the probability of each reimbursement order risk level in the test sample is calculated, a set of model parameters is selected from each reimbursement order risk level to be combined, the initial prediction model is used for performing reimbursement order risk level prediction on the test sample according to the sequence from high probability to low probability, the recognition rate of the initial prediction model is checked, and the efficiency of model testing is improved. The prediction result of each test sample is compared with the pre-identified reimbursement order risk level to obtain the number of test sample prediction successes under each reimbursement order risk level, the prediction success rate and the total prediction success rate of each reimbursement order risk level under each combination mode are calculated, so that the accuracy of the initial prediction model is further analyzed according to the prediction success rate, the test time and the total prediction success rate, rules generated in the generation process of the model are verified and corrected, the initial prediction model is optimized, an accurate target prediction model is obtained, the target prediction model can assist workers to accurately and efficiently identify the risk level of the reimbursement order, and the accuracy of predicting the reimbursement order risk level is effectively improved.
Based on the embodiment corresponding to fig. 3, the following describes in detail, through a specific embodiment, how step S6 performs regression analysis on the model parameters, the prediction success rate, the test time and the total prediction success rate to obtain the target prediction model.
Referring to fig. 4, fig. 4 shows a specific implementation flow of step S6 provided in the embodiment of the present invention, which is detailed as follows:
S61: taking the model parameters, the prediction success rate and the test time in each reimbursement bill risk level as design variables, taking the total prediction success rate as a target variable, and performing function fitting by using the design variables and the target variable to obtain a fitting function.
In the embodiment of the present invention, the model parameters in each reimbursement order risk level, the prediction success rate and the test time are used as design variables, the total prediction success rate is used as a target variable, the design variables and the target variable are used for performing function fitting, the result of predicting the test sample in each combination mode is used as a group of data, and L groups of result data obtained in step S53 are fitted, and the fitting mode can be specifically expressed as:
wherein n represents the number of reimbursement order risk levels, t is the test time for completing the reimbursement order risk level prediction of all test samples in each combination mode, and δ is an operation configuration parameter, namely a preset constant configured according to the software and hardware of the system, which can be set according to the requirements of the practical application and is not limited here.
The program execution efficiency of the fitting process can be adjusted by a combination of the parameter t and the parameter δ.
Specifically, the function fitting may be performed with tools such as spreadsheet software (Microsoft Excel) or mathematical software (MATLAB, Matrix Laboratory). Nonlinear regression analysis is performed on the discrete data, which include the model parameters (support degree and confidence degree), the prediction success rate, the total prediction success rate and the like, to find the relationship between the design variables and the target variable, and the expression f(x) of the fitting function is determined from that relationship, yielding an equation that fits the discrete data.
S62: and solving the fitting function, taking a group of design variables with the highest total prediction success rate and the highest values of the model parameters as model configuration parameters according to the solving result, and constructing a target prediction model according to the model configuration parameters, wherein the model accuracy of the target prediction model is the highest total prediction success rate.
Specifically, a fitting function f (x) obtained through fitting is solved, a group of design variables with the highest total prediction success rate and the highest model parameter value are used as model configuration parameters according to a solving result, the obtained association rule is more accurate when a support degree threshold value and a confidence degree threshold value are larger, and a target prediction model is constructed according to the model configuration parameters and the association rule meeting the requirements of the model configuration parameters.
When the target prediction model is used for predicting the risk level of the reimbursement bill, the model accuracy of the target prediction model is taken as the highest total prediction success rate and is taken as the standard for evaluating the quality of the model, and the higher the total prediction success rate of the model is, the higher the model accuracy is.
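The selection criterion of step S62 can be sketched in Python as follows. This is a simplification under stated assumptions: it picks directly among the tested combination modes rather than solving a fitted function, the result records are hypothetical, and "highest values of the model parameters" is interpreted here as the largest sum of support and confidence thresholds, which is only one possible reading.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ComboResult:
    """Outcome of testing one combination mode (hypothetical record layout)."""
    params: Tuple[Tuple[float, float], ...]  # (support, confidence) per risk level
    hit_rate: float                          # total prediction success rate
    test_time: float                         # test time for this combination


def pick_configuration(results: List[ComboResult]) -> ComboResult:
    """Prefer the highest total success rate, then the largest threshold values."""
    def key(r: ComboResult) -> Tuple[float, float]:
        return (r.hit_rate, sum(s + c for s, c in r.params))
    return max(results, key=key)


# Hypothetical results for three combination modes over two risk levels.
results = [
    ComboResult(((0.8, 0.95), (0.4, 0.7)), hit_rate=0.97, test_time=12.0),
    ComboResult(((0.7, 0.90), (0.4, 0.7)), hit_rate=0.98, test_time=11.0),
    ComboResult(((0.8, 0.95), (0.5, 0.7)), hit_rate=0.98, test_time=13.0),
]
best = pick_configuration(results)
```

Here the last two records tie on the total success rate, and the one with the larger thresholds is chosen as the model configuration parameters.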
In the embodiment corresponding to fig. 4, the model parameters in each reimbursement order risk level, the prediction success rate and the test time are used as design variables, the total prediction success rate is used as a target variable, nonlinear regression analysis is performed to perform function fitting to find the relationship between the design variables and the target variable to obtain an expression of the fitting function, the fitting function is solved, a group of design variables with the highest total prediction success rate and the highest values of the model parameters is used as model configuration parameters according to the solving result, the accuracy of the association rules is improved, and the target prediction model is constructed according to the model configuration parameters and the association rules corresponding to the requirements of the model parameters, so that the accuracy of the prediction of the target prediction model is improved.
On the basis of the embodiment corresponding to fig. 4, after the model parameters, the prediction success rate, the test time, and the total prediction success rate are subjected to regression analysis in step S6 to obtain the target prediction model, a reasonable model may be further selected by using a cross-validation method.
As shown in fig. 5, the risk prediction method for reimbursement order further includes:
S71: partitioning the sample data into K sub-sample data.
In the embodiment of the invention, the target prediction model after fitting optimization is verified by using a cross-validation precision test method, the collected reimbursement single sample data is divided into K sub-sample data by adopting a random division mode, the target prediction model is constructed for multiple times in a machine learning mode, the precision of the constructed target prediction model is evaluated, and the overfitting of the trained model is avoided, wherein K is a positive integer.
Overfitting means that the fitting function agrees closely with the training samples, but the model configuration parameters obtained by solving it yield a low success rate when used to predict the reimbursement order risk level of the test samples.
It should be noted that the cross validation may adopt hold-out validation, K-fold cross validation, leave-one-out cross validation (LOOCV) or the like: after the sample data is cut into smaller sub-samples, most of the sub-samples are used for model construction, and the remaining small part is used for testing the constructed model.
S72: selecting one sub-sample data as a test sample from K sub-sample data, and performing model training, model prediction and regression analysis on the remaining K-1 sub-sample data as training samples to obtain K target prediction models and the model accuracy of each target prediction model, wherein K is a positive integer.
In the embodiment of the invention, one of the K pieces of sub-sample data is selected as a test sample of the verification model, and the other K-1 pieces of sub-sample data are used as training samples for feature learning, the processes from the step S3 to the step S6 are executed, model training, model prediction and regression analysis are carried out, and the construction of the target prediction model is completed once, so that the target prediction model and the model accuracy thereof are obtained. According to the construction mode, each sub-sample data in the K sub-sample data is used as a test sample to carry out one-time construction of the target prediction model, and K results are obtained, wherein the K results comprise the K target prediction models and the model accuracy of each target prediction model.
S73: and taking the target prediction model with the highest model accuracy as a reasonable model.
Specifically, the model accuracies of the K target prediction models are compared and analyzed, and the target prediction model with the highest model accuracy is taken as the reasonable model, so that a reliable and stable reasonable model is obtained.
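Steps S71 to S73 follow the usual K-fold pattern, which can be sketched in Python as below. The `build_and_score` callback stands in for the whole pipeline of steps S3 to S6 and is an assumption of this sketch; the majority-class baseline and the sample data are hypothetical placeholders used only to make the example runnable.

```python
import random
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[int, int]  # (sample id, risk level); hypothetical layout


def k_fold_indices(n: int, k: int, seed: int = 0) -> List[List[int]]:
    """Randomly split indices 0..n-1 into k disjoint folds (S71)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(
    samples: Sequence[Sample],
    k: int,
    build_and_score: Callable[[List[Sample], List[Sample]], Tuple[object, float]],
):
    """Build one model per fold (S72) and return the most accurate one (S73)."""
    folds = k_fold_indices(len(samples), k)
    results = []
    for i, test_idx in enumerate(folds):
        test = [samples[j] for j in test_idx]
        train = [samples[j] for f, fold in enumerate(folds) if f != i for j in fold]
        results.append(build_and_score(train, test))  # (model, accuracy)
    return max(results, key=lambda r: r[1]), results


def majority_baseline(train: List[Sample], test: List[Sample]) -> Tuple[int, float]:
    """Placeholder for steps S3 to S6: always predict the most common level."""
    labels = [y for _, y in train]
    guess = max(set(labels), key=labels.count)
    accuracy = sum(1 for _, y in test if y == guess) / len(test)
    return guess, accuracy


# Hypothetical data: 20 samples, every fourth one at risk level 1, the rest at 0.
samples = [(i, 1 if i % 4 == 0 else 0) for i in range(20)]
best, all_results = cross_validate(samples, 5, majority_baseline)
```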
The reasonable model can be used for fitting sample data on one hand, and predicting the risk level of new reimbursement bill data at high accuracy on the other hand, can be used for predicting the accurate reimbursement bill risk level, and stores the reimbursement bill data and the corresponding reimbursement bill risk level into the reimbursement bill database.
Further, according to a preset time interval, which may be 1 month, 2 months or other time ranges, the historical reimbursement bill information is randomly acquired from the reimbursement bill database at predetermined time intervals, the processes from the step S1 to the step S6 are repeatedly executed, autonomous machine learning is completed, and the updated target prediction model is obtained, so that the accuracy of the model is further optimized, the success rate of predicting the reimbursement bill risk level is improved, and accurate prediction of the reimbursement bill risk level is realized.
In the embodiment corresponding to fig. 5, the model precision is tested by a cross validation method, and the randomly divided sub-sample data is used for training and validating multiple times, so that overfitting of the trained target prediction model is avoided. The target prediction model with the highest model precision is selected from the validation results as the reasonable model, which both fits the sample data and predicts new reimbursement bill data with high accuracy, thereby improving the accuracy of reimbursement bill risk level prediction.
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present invention.
Example 2
Fig. 6 shows a reimbursement order risk prediction device corresponding to the reimbursement order risk prediction method shown in embodiment 1. For convenience of explanation, only the parts related to the embodiment of the present invention are shown.
As shown in fig. 6, the risk prediction apparatus for reimbursement orders includes a sample data collection module 61, a sample data division module 62, a risk level presetting module 63, an initial prediction model obtaining module 64, an initial prediction model testing module 65, and a target prediction model obtaining module 66. The functional modules are explained in detail as follows:
the sample data acquisition module 61 is used for acquiring historical reimbursement note information and taking the historical reimbursement note information as sample data;
the first dividing module 62 is configured to divide the sample data into a training sample and a test sample according to a preset proportion;
a risk grade presetting module 63, configured to determine an reimbursement bill risk grade of each training sample according to definitions of N reimbursement bill risk grades, where N is a positive integer;
an initial prediction model obtaining module 64, configured to perform model training using an association rule algorithm on a training sample in each reimbursement order risk level to obtain an initial prediction model, where the initial prediction model includes an association rule that satisfies a preset model parameter requirement in each reimbursement order risk level, and the model parameters include a support degree and a confidence degree;
an initial prediction model testing module 65, configured to perform model prediction on a test sample by using an initial prediction model, and calculate a prediction success rate of each reimbursement order risk level, and a total prediction success rate and test time in each combination mode obtained by selecting a set of model parameters from each reimbursement order risk level for combination;
and the target prediction model obtaining module 66 is used for performing regression analysis on the model parameters, the prediction success rate, the test time and the total prediction success rate to obtain a target prediction model.
Further, the initial prediction model obtaining module 64 includes:
the data preprocessing unit 641 is configured to perform data preprocessing on the training samples in each reimbursement order risk level to obtain a to-be-processed data set in each reimbursement order risk level;
the training sample mining unit 642 is used for performing data mining on the data sets to be processed by using an association rule algorithm to obtain a plurality of item sets in each reimbursement note risk level;
an association rule obtaining unit 643, configured to, for each reimbursement bill risk level, screen out a target item set that meets the requirement of the model parameter from the item sets in the reimbursement bill risk level, and establish an association rule according to the target item set;
the initial prediction model building unit 644 is configured to build an initial prediction model according to the association rule and the model parameter requirement corresponding to the association rule.
Further, the initial prediction model test module 65 includes:
the first statistical unit 651 is configured to determine the reimbursement bill risk level of each test sample and the number of test samples of each reimbursement bill risk level according to the definition of the preset N reimbursement bill risk levels;
a first calculating unit 652, configured to calculate a probability of each reimbursement order risk level in the test sample according to the following formula:
$$P_i = \frac{R_i}{S}$$
wherein i ∈ [1, N], P_i is the probability of the i-th reimbursement order risk level in the test sample, R_i is the number of test samples of the i-th reimbursement bill risk level, and S is the total number of test samples;
the prediction mode combination unit 653 is configured to select a set of model parameters from each reimbursement bill risk level for combination, so as to obtain L types of combination modes, where L is a positive integer;
the test sample prediction unit 654 is configured to, for each combination mode, perform reimbursement order risk level prediction on the test samples by using the initial prediction model in the order from high probability to low probability, obtain a prediction result of each test sample, and obtain test time for performing reimbursement order risk level prediction in the combination mode;
the second statistical unit 655 is configured to compare the prediction result of each test sample with the reimbursement order risk level of the test sample, confirm that the test sample is successfully predicted if the prediction result of each test sample is the same as the reimbursement order risk level of the test sample, and count the number of the test sample in each reimbursement order risk level in each combination mode;
a second calculating unit 656, configured to calculate a prediction success rate of each reimbursement note risk level in each combination mode according to the following formula:
$$\mathrm{hitrate}_i = \frac{M_i}{R_i}$$
wherein hitrate_i is the prediction success rate of the i-th reimbursement order risk level, and M_i is the number of test samples successfully predicted under the i-th reimbursement bill risk level;
a third calculating unit 657, configured to calculate a total prediction success rate in each combination mode according to the following formula:
$$\mathrm{hitRate} = \frac{\sum_{i=1}^{N} M_i}{S}$$
wherein hitRate is the total prediction success rate.
Further, the target prediction model obtaining module 66 includes:
a data fitting unit 661, configured to use the model parameters, the prediction success rate, and the test time in each reimbursement order risk level as design variables, use the total prediction success rate as a target variable, and perform function fitting using the design variables and the target variable to obtain a fitting function;
and the target prediction model construction unit 662 is configured to solve the fitting function, use a set of design variables with the highest total prediction success rate and the highest values of the model parameters as model configuration parameters according to the solution result, and construct a target prediction model according to the model configuration parameters, where the model accuracy of the target prediction model is the highest total prediction success rate.
Further, the reimbursement bill risk prediction device further includes:
a second partitioning module 67, configured to partition the sample data into K sub-sample data;
the cross validation module 68 is configured to select one sub-sample data as a test sample from the K sub-sample data, perform model training, model prediction, and regression analysis on the remaining K-1 sub-sample data as training samples, and obtain K target prediction models and model accuracy of each target prediction model, where K is a positive integer;
and a reasonable model obtaining module 69, configured to use the target prediction model with the highest model accuracy as the reasonable model.
The process of implementing each function by each module in the reimbursement note risk prediction apparatus provided in this embodiment may specifically refer to the description of method embodiment 1, and is not described herein again.
Example 3
This embodiment provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the method for predicting risk of reimbursement bills in embodiment 1 is implemented, and details are not repeated here to avoid repetition. Alternatively, the computer program, when executed by the processor, implements the functions of each module/unit in the reimbursement note risk prediction apparatus in embodiment 2, and is not described herein again to avoid redundancy.
It is to be understood that the computer-readable storage medium may include: any entity or device capable of carrying the computer program code, recording medium, usb disk, removable hard disk, magnetic disk, optical disk, computer Memory, Read-Only Memory (ROM), Random Access Memory (RAM), electrical carrier wave signals, telecommunications signals, and the like.
Example 4
Fig. 7 is a schematic diagram of a terminal device according to an embodiment of the present invention. As shown in fig. 7, the terminal device 7 of this embodiment includes: a processor 70, a memory 71, and a computer program 72, such as a reimbursement order risk prediction program, stored in the memory 71 and executable on the processor 70. The processor 70, when executing the computer program 72, implements the steps of the above-described embodiments of the reimbursement order risk prediction method, such as steps S1 to S6 shown in fig. 1. Alternatively, the processor 70, when executing the computer program 72, implements the functions of each module/unit in each device embodiment described above, such as the functions of modules 61 to 66 shown in fig. 6.
Illustratively, the computer program 72 may be divided into one or more modules/units, which are stored in the memory 71 and executed by the processor 70 to carry out the invention. One or more modules/units may be a series of computer program instruction segments capable of performing specific functions, which are used to describe the execution of the computer program 72 in the terminal device 7. The computer program 72 may be partitioned into a sample data collection module, a sample data partitioning module, a risk level presetting module, an initial prediction model obtaining module, an initial prediction model testing module, and a target prediction model obtaining module. The modules are described in detail as follows:
the sample data acquisition module is used for acquiring historical reimbursement note information and taking the historical reimbursement note information as sample data;
the first dividing module is used for dividing the sample data into a training sample and a test sample according to a preset proportion;
the risk grade presetting module is used for determining the reimbursement bill risk grade of each training sample according to the preset definition of N reimbursement bill risk grades, wherein N is a positive integer;
the initial prediction model acquisition module is used for carrying out model training by using an association rule algorithm aiming at a training sample in each reimbursement order risk grade to obtain an initial prediction model, wherein the initial prediction model comprises an association rule which meets the preset model parameter requirement in each reimbursement order risk grade, and the model parameters comprise support degree and confidence degree;
the initial prediction model testing module is used for performing model prediction on the test samples by using the initial prediction model and, for each combination mode obtained by selecting one group of model parameters from each reimbursement bill risk level and combining the selected groups, calculating the prediction success rate of each reimbursement bill risk level, the total prediction success rate, and the test time;
and the target prediction model acquisition module is used for carrying out regression analysis on the model parameters, the prediction success rate, the test time and the total prediction success rate to obtain a target prediction model.
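The first three modules can be illustrated with a minimal sketch. The field names (`amount`, `missing_invoice`), the thresholds, the 0.8 split ratio, and the three level names are hypothetical choices made only for illustration; the embodiment does not specify them.

```python
import random

def split_samples(samples, train_ratio=0.8, seed=0):
    """Shuffle a copy of the samples and split it by a preset proportion."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]

def risk_level(sample):
    """Hypothetical preset definition of N = 3 risk levels (assumed rules)."""
    if sample["missing_invoice"] or sample["amount"] > 10000:
        return "high"
    if sample["amount"] > 2000:
        return "medium"
    return "low"

# Made-up historical reimbursement bills used as sample data.
samples = [{"amount": 500 * k, "missing_invoice": k % 7 == 0} for k in range(1, 31)]
train, test = split_samples(samples)

# Group the training samples by their risk level.
by_level = {}
for s in train:
    by_level.setdefault(risk_level(s), []).append(s)
```

Grouping the training samples by level first is what allows the later modules to mine rules separately within each risk level.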
Further, the initial prediction model obtaining module comprises:
the data preprocessing unit is used for preprocessing the training samples in each reimbursement bill risk level to obtain a data set to be processed in each reimbursement bill risk level;
the training sample mining unit is used for carrying out data mining on the data set to be processed by using an association rule algorithm to obtain a plurality of item sets in each reimbursement note risk level;
the association rule obtaining unit is used for screening out a target item set meeting the requirements of the model parameters from the item sets in the reimbursement bill risk level aiming at each reimbursement bill risk level and establishing an association rule according to the target item set;
and the initial prediction model building unit is used for building an initial prediction model according to the association rule and the model parameter requirement corresponding to the association rule.
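The four units above can be sketched as follows for a single risk level. This is only an illustration under stated assumptions: the text does not name a specific association rule algorithm, so the sketch uses plain exhaustive enumeration of item sets (not an optimized algorithm), and the item names and thresholds are made up.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support, max_size=3):
    """Return {itemset: support} for item sets whose support >= min_support."""
    n = len(transactions)
    items = sorted({item for t in transactions for item in t})
    result = {}
    for size in range(1, max_size + 1):
        for candidate in combinations(items, size):
            cset = frozenset(candidate)
            support = sum(1 for t in transactions if cset <= t) / n
            if support >= min_support:
                result[cset] = support
    return result

def association_rules(itemsets, min_confidence):
    """Build rules (antecedent, consequent, support, confidence)."""
    rules = []
    for itemset, support in itemsets.items():
        for r in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), r):
                a = frozenset(antecedent)
                # Every subset of a frequent item set is itself frequent,
                # so its support is already stored in `itemsets`.
                confidence = support / itemsets[a]
                if confidence >= min_confidence:
                    rules.append((a, itemset - a, support, confidence))
    return rules

# Hypothetical preprocessed data set for one risk level.
high_risk = [
    frozenset({"weekend", "taxi", "no_receipt"}),
    frozenset({"weekend", "taxi"}),
    frozenset({"weekend", "taxi", "no_receipt"}),
    frozenset({"meal", "no_receipt"}),
]
itemsets = frequent_itemsets(high_risk, min_support=0.5)
rules = association_rules(itemsets, min_confidence=0.9)
```

Running the same two functions once per risk level, each with its own support and confidence thresholds, would yield the per-level rule sets that make up the initial prediction model.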
Further, the initial prediction model testing module comprises:
the first statistical unit is used for determining the reimbursement bill risk level of each test sample and the number of the test samples of each reimbursement bill risk level according to the definition of N preset reimbursement bill risk levels;
the first calculating unit is used for calculating the probability of each reimbursement bill risk level in the test samples according to the following formula:
$$P_i = \frac{R_i}{S}$$
wherein i ∈ [1, N], P_i is the probability of the i-th reimbursement bill risk level in the test samples, R_i is the number of test samples of the i-th reimbursement bill risk level, and S is the total number of test samples;
the prediction mode combination unit is used for selecting a group of model parameters from each reimbursement bill risk grade to be combined to obtain L combination modes, wherein L is a positive integer;
the test sample prediction unit is used for performing reimbursement bill risk level prediction on the test samples by using the initial prediction model according to the sequence of the probability from high to low aiming at each combination mode to obtain the prediction result of each test sample and obtain the test time for performing reimbursement bill risk level prediction in the combination mode;
the second statistical unit is used for comparing the prediction result of each test sample with the reimbursement bill risk level of the test sample, confirming the success of the prediction of the test sample if the prediction result of each test sample is the same as the reimbursement bill risk level of the test sample, and counting the number of the success of the prediction of the test sample under each reimbursement bill risk level in each combination mode;
a second calculating unit, configured to calculate the prediction success rate of each reimbursement bill risk level in each combination mode according to the following formula:
$$\mathit{hitRate}_i = \frac{M_i}{R_i}$$
wherein hitRate_i is the prediction success rate of the i-th reimbursement bill risk level, and M_i is the number of successfully predicted test samples under the i-th reimbursement bill risk level;
a third calculating unit, configured to calculate the total prediction success rate in each combination mode according to the following formula:
$$\mathit{hitRate} = \sum_{i=1}^{N} P_i \cdot \mathit{hitRate}_i$$
wherein hitRate is the total prediction success rate.
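The testing module can be sketched as below. The sketch reads the definitions above as P_i = R_i / S and hitRate_i = M_i / R_i, and takes the total success rate as the P_i-weighted sum of the per-level rates; these readings, the toy rules in `RULES`, the candidate parameter groups, and the stand-in predictor `toy_predict` are all assumptions for illustration, not the embodiment's actual model.

```python
import time
from collections import Counter
from itertools import product

def evaluate(test_samples, predict, combo):
    """Return per-level hit rates, total hit rate and test time for one combination."""
    counts = Counter(level for _, level in test_samples)       # R_i
    total = len(test_samples)                                   # S
    prob = {lvl: r / total for lvl, r in counts.items()}        # P_i = R_i / S
    order = sorted(prob, key=prob.get, reverse=True)            # most probable level first
    hits = Counter()                                            # M_i
    start = time.perf_counter()
    for features, actual in test_samples:
        if predict(features, combo, order) == actual:
            hits[actual] += 1
    elapsed = time.perf_counter() - start
    hit_rate = {lvl: hits[lvl] / counts[lvl] for lvl in counts}  # M_i / R_i
    total_rate = sum(prob[lvl] * hit_rate[lvl] for lvl in counts)
    return hit_rate, total_rate, elapsed

# Hypothetical rules per level: (antecedent items, confidence).
RULES = {"high": [({"no_receipt"}, 0.85)], "low": [({"receipt"}, 0.75)]}

def toy_predict(features, combo, order):
    """Stand-in predictor: try levels in order, using rules that pass the confidence threshold."""
    for lvl in order:
        _, min_conf = combo[lvl]
        for antecedent, conf in RULES[lvl]:
            if conf >= min_conf and antecedent <= features:
                return lvl
    return None

# One group of (support, confidence) per level; the product gives L = 4 combinations.
candidates = {"high": [(0.3, 0.8), (0.5, 0.9)], "low": [(0.2, 0.7), (0.4, 0.8)]}
levels = list(candidates)
combos = [dict(zip(levels, choice)) for choice in product(*candidates.values())]

test_samples = [({"no_receipt", "taxi"}, "high"), ({"receipt"}, "low"),
                ({"receipt", "meal"}, "low"), ({"taxi"}, "high")]
results = [evaluate(test_samples, toy_predict, c) for c in combos]
```

Each entry of `results` holds the three quantities (per-level success rates, total success rate, test time) that the regression step later consumes for one combination mode.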
Further, the target prediction model obtaining module includes:
the data fitting unit is used for taking the model parameters in the risk level of each reimbursement bill, the prediction success rate and the test time as design variables, taking the total prediction success rate as a target variable, and performing function fitting by using the design variables and the target variable to obtain a fitting function;
and the target prediction model construction unit is used for solving the fitting function, taking a group of design variables with the highest total prediction success rate and the highest values of the model parameters as model configuration parameters according to the solving result, and constructing the target prediction model according to the model configuration parameters, wherein the model accuracy of the target prediction model is the highest total prediction success rate.
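A minimal sketch of the fitting and selection step follows. It assumes a linear fitting function solved by ordinary least squares, restricts the design variables to one level's support and confidence thresholds, and uses made-up observations; the embodiment does not state the form of the fitting function, so all of these are illustrative assumptions. The solver raises `ZeroDivisionError` if the design variables are collinear.

```python
def fit_linear(X, y):
    """Ordinary least squares via the normal equations (X^T X) b = X^T y."""
    rows = [[1.0] + list(x) for x in X]            # prepend an intercept column
    k = len(rows[0])
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    b = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting on the augmented matrix.
    M = [A[i] + [b[i]] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(k):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][k] / M[i][i] for i in range(k)]

def fitted_value(coef, x):
    """Evaluate the fitted function at design point x."""
    return coef[0] + sum(c * v for c, v in zip(coef[1:], x))

# Made-up observations: (support, confidence) -> total prediction success rate.
X = [(0.1, 0.6), (0.2, 0.9), (0.3, 0.7), (0.4, 0.8)]
y = [0.43, 0.57, 0.56, 0.64]

coef = fit_linear(X, y)
# Pick the design point with the highest fitted success rate,
# breaking ties in favour of larger parameter values.
best = max(X, key=lambda x: (round(fitted_value(coef, x), 9), x))
```

Here `best` plays the role of the model configuration parameters, and its fitted value plays the role of the model accuracy of the target prediction model.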
Further, the computer program 72 may also be divided into:
a second partitioning module for partitioning the sample data into K sub-sample data;
the cross validation module is used for selecting one sub-sample data as a test sample from K sub-sample data, taking the residual K-1 sub-sample data as a training sample, and performing model training, model prediction and regression analysis to obtain K target prediction models and the model accuracy of each target prediction model, wherein K is a positive integer;
and the reasonable model obtaining module is used for taking the target prediction model with the highest model accuracy as a reasonable model.
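The three modules above amount to K-fold cross-validation, which can be sketched as follows. The callable `build_and_score` is a hypothetical placeholder for the whole training, prediction and regression pipeline; the majority-label scorer used in the example is a toy stand-in, not the embodiment's model.

```python
from collections import Counter

def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) for each of the K folds."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx

def select_reasonable_model(samples, k, build_and_score):
    """Run K rounds and keep the model with the highest accuracy."""
    best_model, best_acc = None, float("-inf")
    for train_idx, test_idx in k_fold_indices(len(samples), k):
        train = [samples[j] for j in train_idx]
        test = [samples[j] for j in test_idx]
        model, acc = build_and_score(train, test)
        if acc > best_acc:
            best_model, best_acc = model, acc
    return best_model, best_acc

def majority_scorer(train, test):
    """Toy pipeline: the 'model' is the most common training label."""
    label = Counter(train).most_common(1)[0][0]
    accuracy = sum(1 for t in test if t == label) / len(test)
    return label, accuracy

labels = ["low"] * 7 + ["high"] * 3
model, accuracy = select_reasonable_model(labels, 5, majority_scorer)
```

Each sub-sample serves as the test sample exactly once, so the K accuracies are measured on disjoint data before the best-scoring model is kept.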
The terminal device 7 may be a desktop computer, a notebook computer, a palmtop computer, a cloud server, or another computing device. The terminal device 7 may include, but is not limited to, the processor 70 and the memory 71. Those skilled in the art will appreciate that Fig. 7 is merely an example of the terminal device 7 and does not limit it; the terminal device 7 may include more or fewer components than those shown, combine certain components, or use different components. For example, the terminal device 7 may also include input/output devices, network access devices, buses, and the like.
The Processor 70 may be a Central Processing Unit (CPU), other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic, discrete hardware components, etc. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The memory 71 may be an internal storage unit of the terminal device 7, such as a hard disk or internal memory of the terminal device 7. The memory 71 may also be an external storage device of the terminal device 7, such as a plug-in hard disk provided on the terminal device 7, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash memory card (Flash Card), and the like. Further, the memory 71 may include both an internal storage unit of the terminal device 7 and an external storage device. The memory 71 is used for storing the computer program and other programs and data required by the terminal device 7. The memory 71 may also be used to temporarily store data that has been output or is to be output.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present invention, and are intended to be included within the scope of the present invention.

Claims (10)

1. A reimbursement bill risk prediction method, comprising:
acquiring historical reimbursement bill information, and taking the historical reimbursement bill information as sample data;
dividing the sample data into a training sample and a test sample according to a preset proportion;
determining the reimbursement bill risk level of each training sample according to the definition of N preset reimbursement bill risk levels, wherein N is a positive integer;
performing model training by using an association rule algorithm aiming at the training sample in each reimbursement order risk grade to obtain an initial prediction model, wherein the initial prediction model comprises an association rule which meets the preset model parameter requirement in each reimbursement order risk grade, and the model parameters comprise support degree and confidence degree;
performing model prediction on the test sample by using the initial prediction model and, for each combination mode obtained by selecting one group of the model parameters from each reimbursement bill risk level and combining the selected groups, calculating the prediction success rate of each reimbursement bill risk level, the total prediction success rate, and the test time;
and carrying out regression analysis on the model parameters, the prediction success rate, the test time and the total prediction success rate to obtain a target prediction model.
2. The reimbursement bill risk prediction method of claim 1, wherein the model training using association rule algorithm for the training samples in each of the reimbursement bill risk classes to obtain an initial prediction model comprises:
performing data preprocessing on the training samples in each reimbursement order risk grade to obtain a data set to be processed in each reimbursement order risk grade;
performing data mining on the data set to be processed by using an association rule algorithm to obtain a plurality of item sets in each reimbursement bill risk level;
aiming at each reimbursement bill risk level, screening out a target item set meeting the requirement of the model parameters from the item sets in the reimbursement bill risk level, and establishing an association rule according to the target item set;
and constructing the initial prediction model according to the association rule and the model parameter requirement corresponding to the association rule.
3. The reimbursement bill risk prediction method according to claim 1 or 2, wherein the performing model prediction on the test sample by using the initial prediction model and, for each combination mode obtained by selecting one group of the model parameters from each reimbursement bill risk level and combining the selected groups, calculating the prediction success rate of each reimbursement bill risk level, the total prediction success rate, and the test time comprises:
determining the reimbursement bill risk level of each test sample and the number of the test samples of each reimbursement bill risk level according to the definition of the preset N reimbursement bill risk levels;
calculating the probability of each reimbursement bill risk level in the test sample according to the following formula:
$$P_i = \frac{R_i}{S}$$
wherein i ∈ [1, N], P_i is the probability of the i-th reimbursement bill risk level in the test sample, R_i is the number of test samples of the i-th reimbursement bill risk level, and S is the total number of the test samples;
selecting a group of model parameters from each reimbursement bill risk grade to combine to obtain L combination modes, wherein L is a positive integer;
for each combination mode, according to the sequence of the probability from high to low, the initial prediction model is used for conducting reimbursement order risk level prediction on the test samples to obtain the prediction result of each test sample, and the test time for conducting reimbursement order risk level prediction in the combination mode is obtained;
comparing the prediction result of each test sample with the reimbursement bill risk level of the test sample, confirming that the test sample is successfully predicted if the two are the same, and counting the number of successfully predicted test samples under each reimbursement bill risk level in each combination mode;
calculating the prediction success rate of each reimbursement bill risk level in each combination mode according to the following formula:
$$\mathit{hitRate}_i = \frac{M_i}{R_i}$$
wherein hitRate_i is the prediction success rate of the i-th reimbursement bill risk level, and M_i is the number of successfully predicted test samples under the i-th reimbursement bill risk level;
calculating the total prediction success rate in each combination mode according to the following formula:
$$\mathit{hitRate} = \sum_{i=1}^{N} P_i \cdot \mathit{hitRate}_i$$
wherein hitRate is the total prediction success rate.
4. The reimbursement note risk prediction method of claim 3, wherein the performing regression analysis on the model parameters, the prediction success rate, the test time, and the total prediction success rate to obtain a target prediction model comprises:
taking the model parameters in each reimbursement bill risk level, the prediction success rate and the test time as design variables, taking the total prediction success rate as a target variable, and performing function fitting by using the design variables and the target variable to obtain a fitting function;
and solving the fitting function, taking a group of design variables with the highest total prediction success rate and the highest values of the model parameters as model configuration parameters according to a solving result, and constructing a target prediction model according to the model configuration parameters, wherein the model accuracy of the target prediction model is the highest total prediction success rate.
5. The reimbursement note risk prediction method of claim 4, wherein after performing regression analysis on the model parameters, the prediction success rate, the test time, and the total prediction success rate to obtain a target prediction model, the reimbursement note risk prediction method further comprises:
partitioning the sample data into K sub-sample data;
selecting one sub-sample data from the K sub-sample data as the test sample, and performing the model training, the model prediction and the regression analysis on the remaining K-1 sub-sample data as the training samples to obtain K target prediction models and the model accuracy of each target prediction model, wherein K is a positive integer;
and taking the target prediction model with the highest model accuracy as a reasonable model.
6. A reimbursement order risk prediction device, comprising:
the sample data acquisition module is used for acquiring historical reimbursement note information and taking the historical reimbursement note information as sample data;
the first dividing module is used for dividing the sample data into a training sample and a test sample according to a preset proportion;
the risk grade presetting module is used for determining the reimbursement bill risk grade of each training sample according to the definition of N preset reimbursement bill risk grades, wherein N is a positive integer;
an initial prediction model obtaining module, configured to perform model training on the training samples in each reimbursement order risk level by using an association rule algorithm to obtain an initial prediction model, where the initial prediction model includes an association rule that satisfies a preset model parameter requirement in each reimbursement order risk level, and the model parameter includes a support degree and a confidence degree;
the initial prediction model testing module is used for performing model prediction on the test sample by using the initial prediction model and, for each combination mode obtained by selecting one group of the model parameters from each reimbursement bill risk level and combining the selected groups, calculating the prediction success rate of each reimbursement bill risk level, the total prediction success rate, and the test time;
and the target prediction model acquisition module is used for carrying out regression analysis on the model parameters, the prediction success rate, the test time and the total prediction success rate to obtain a target prediction model.
7. The reimbursement order risk prediction device of claim 6, wherein the initial prediction model acquisition module comprises:
the data preprocessing unit is used for preprocessing the training samples in each reimbursement bill risk grade to obtain a to-be-processed data set in each reimbursement bill risk grade;
the training sample mining unit is used for carrying out data mining on the data set to be processed by using an association rule algorithm to obtain a plurality of item sets in each reimbursement note risk level;
an association rule obtaining unit, configured to, for each reimbursement bill risk level, screen out, from the item sets in the reimbursement bill risk level, a target item set that meets the requirement of the model parameter, and establish an association rule according to the target item set;
and the initial prediction model building unit is used for building the initial prediction model according to the association rule and the model parameter requirement corresponding to the association rule.
8. The reimbursement bill risk prediction device according to claim 6, wherein said reimbursement bill risk prediction device further comprises:
a second partitioning module for partitioning the sample data into K sub-sample data;
the cross validation module is used for selecting one sub-sample data from the K sub-sample data as the test sample, using the remaining K-1 sub-sample data as the training sample, and performing the model training, the model prediction and the regression analysis to obtain K target prediction models and the model accuracy of each target prediction model, wherein K is a positive integer;
and the reasonable model obtaining module is used for taking the target prediction model with the highest model accuracy as a reasonable model.
9. A terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the steps of the reimbursement order risk prediction method according to any one of claims 1 to 5.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the reimbursement order risk prediction method according to any one of claims 1 to 5.
CN201810161565.6A 2018-02-27 2018-02-27 A kind of expense report Risk Forecast Method, device, terminal device and storage medium Pending CN108364106A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201810161565.6A CN108364106A (en) 2018-02-27 2018-02-27 A kind of expense report Risk Forecast Method, device, terminal device and storage medium
PCT/CN2018/081527 WO2019165673A1 (en) 2018-02-27 2018-04-02 Reimbursement form risk prediction method, apparatus, terminal device, and storage medium

Publications (1)

Publication Number Publication Date
CN108364106A true CN108364106A (en) 2018-08-03

Family

ID=63003052

Country Status (2)

Country Link
CN (1) CN108364106A (en)
WO (1) WO2019165673A1 (en)



Also Published As

Publication number Publication date
WO2019165673A1 (en) 2019-09-06

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 2018-08-03)