CN116631572B

CN116631572B - Acute myocardial infarction clinical decision support system and device based on artificial intelligence

Info

Publication number: CN116631572B
Application number: CN202310908402.0A
Authority: CN
Inventors: 何昆仑; 孙宇慧
Original assignee: Chinese PLA General Hospital
Current assignee: Chinese PLA General Hospital
Priority date: 2023-07-24
Filing date: 2023-07-24
Publication date: 2023-12-19
Anticipated expiration: 2043-07-24
Also published as: CN116631572A

Abstract

The invention provides an artificial intelligence-based acute myocardial infarction clinical decision support system and device, and relates to the field of intelligent medical treatment. The system comprises an acquisition unit, a feature extraction unit, a subtype classification unit, a medication decision unit and a classification model construction unit. These units are used, respectively, for acquiring genetic information of a patient sample, extracting features, inputting the features into a classification model for classification, giving medication advice based on the classification result, and constructing the classification model. The system gives medication advice based on the genetic information of patients and provides molecular indicators for the clinical diagnosis of acute myocardial infarction. It classifies the data set using a random forest classification algorithm with adaptive pruning optimization, which improves the efficiency of clinical decision-making for acute myocardial infarction and reduces patient waiting time and physician workload.

Description

Acute myocardial infarction clinical decision support system and device based on artificial intelligence

Technical Field

The invention belongs to the field of intelligent medical treatment, and in particular relates to an artificial intelligence-based acute myocardial infarction clinical decision support system, equipment, a computer-readable storage medium and application thereof.

Background

Acute myocardial infarction (AMI) is a cardiovascular emergency and a manifestation of coronary heart disease. Compared with stable angina, acute myocardial infarction is characterized by sudden ischemic necrosis of the myocardium caused by coronary artery spasm or blockage, accompanied by significant myocardial injury and cardiac dysfunction. Current clinical decisions for acute myocardial infarction are mainly based on the patient's age, sex, complications, cardiac function, symptoms, clinical manifestations, time window, coronary artery lesions and the like.

With the development of artificial intelligence and bioinformatics, it has become possible to collect and analyze clinical information at the molecular level of patients and to provide personalized clinical decisions to patients based on such information. The clinical information on the molecular level refers to personalized information obtained by analyzing data on the molecular level of a patient's genome, transcriptome, proteome, metabolome, and the like. By collecting and analyzing these data, we can better understand the patient's disease risk, pathology and therapeutic response, and thus formulate a more accurate and efficient treatment regimen for the patient.

Artificial intelligence plays a key role in this process. Through machine learning techniques, artificial intelligence can extract valuable information from a large number of molecular data and construct predictive models and classification algorithms to assist physicians in making clinical decisions. For example, based on genomic data of a patient, artificial intelligence may predict a patient's response to a certain drug, thereby guiding drug selection and dose adjustment. In addition, artificial intelligence can also help doctors discover new biomarkers for early diagnosis and prognosis of disease by integrating multiple types of molecular data.

Disclosure of Invention

Drawing on advances in artificial intelligence and bioinformatics, the application collects and analyzes molecular-level clinical information of patients with acute myocardial infarction, predicts the patients' clinical responses to different drugs, and provides personalized medication decisions, thereby improving the diagnosis and treatment of the disease and the patients' quality of life.

The invention discloses an artificial intelligence-based acute myocardial infarction clinical decision support system, which comprises:

an acquisition unit for acquiring genetic information of a patient sample;

a feature extraction unit for extracting features from the genetic information to obtain genetic characteristics;

a subtype classification unit for inputting the genetic characteristics into a classification model to classify the patient and determine whether the patient is drug-sensitive;

a medication decision unit for giving medication advice based on the classification result of the patient, the medication advice being to administer myocardial infarction drug therapy when the patient is drug-sensitive;

a classification model construction unit for constructing the classification model, the construction method of the classification model comprising the following steps:

acquiring genetic information and classification labels of acute myocardial infarction patients;

extracting the characteristics of the genetic information to obtain genetic characteristics;

inputting the genetic characteristics into a random forest for model construction to obtain a preliminary classification result, comparing the preliminary classification result with the classification labels to generate a loss function, and optimizing a machine learning model based on the loss function to obtain a trained classification model;

the random forest is a random forest which performs self-adaptive pruning on decision trees in a random forest algorithm.

Further, the process of adaptively pruning the decision tree comprises the following steps:

step 1: starting from the leaf nodes, selecting each internal node layer by layer upwards;

step 2: calculating the loss function of the node rt;

step 3: calculating the overall loss function of the decision tree;

step 4: for each node, calculating its loss function value on the validation set, and deciding whether to prune based on that value.

Further, the loss function of the node rt is:

wherein C(rt) represents the training error of node rt, |rt| represents the number of leaf nodes of node rt, and α is a non-negative hyperparameter for balancing the training error and the number of leaf nodes.

Further, the overall loss function of the decision tree is:

wherein C(rT) represents the training error of the decision tree rT, |rT| represents the number of leaf nodes of the decision tree rT, and the degree of pruning is controlled by adjusting the hyperparameter α.
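The formulas themselves are not reproduced in this text. A form consistent with the symbol definitions given for the node and for the whole tree is the standard cost-complexity pruning criterion. The notation $C_\alpha$ is an assumption introduced here for illustration and is not confirmed to be the exact expression used by the invention:

```latex
C_\alpha(r_t) = C(r_t) + \alpha\,\lvert r_t \rvert,
\qquad
C_\alpha(r_T) = C(r_T) + \alpha\,\lvert r_T \rvert,
\qquad \alpha \ge 0
```

Under this form, a larger α penalizes leaf nodes more heavily and therefore favors more aggressive pruning, while α = 0 reduces the loss to the training error alone.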

Further, the myocardial infarction medicine comprises any one or more of the following components: epinephrine, dobutamine, norepinephrine, aramine, isoproterenol.

Further, feature extraction is performed on the genetic information by using Lasso regression.

Further, the genetic characteristics include one or more of the following genes: NISCH, PLAGL2, TLR1, ARID5B, TAF9B, SERINC, CARKD, EPG5, ANKLE2, BC043227, FAM188A, ERVH.4, ZNF26, ERO1LB, TMEM208, EXOC2, SGMS1, BRPF3.

Further, the genetic characteristics are NISCH and PLAGL2, and the patients are classified based on the NISCH and PLAGL2, and whether the patients are drug sensitive or not is determined.

An artificial intelligence based acute myocardial infarction clinical decision support device, the device comprising a memory and a processor;

the memory is used for storing program instructions;

the processor is used for calling the program instructions, and when the program instructions are executed, the processor performs the following artificial intelligence-based acute myocardial infarction clinical decision support method, which comprises the following steps:

obtaining genetic information of a patient sample;

inputting the genetic characteristics into a classification model to classify the patient, and judging whether the patient is drug sensitive or not;

giving medication advice based on the classification result of the patient, the medication advice being to administer myocardial infarction drug therapy when the patient is drug-sensitive;

the method for constructing the classification model comprises the following steps:

A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the artificial intelligence-based acute myocardial infarction clinical decision support method performed by the above artificial intelligence-based acute myocardial infarction clinical decision support device.

The invention has the advantages that:

1. The application discloses an artificial intelligence-based acute myocardial infarction clinical decision support system. The system acquires the genetic information of a patient, extracts features from it, classifies the patient based on the extracted genetic characteristics, and determines whether the patient is drug-sensitive. Compared with traditional clinical judgment methods, the system can determine from the patient's genetic information whether the patient is suitable for myocardial infarction drug therapy.

2. The system disclosed by the application can give personalized medication advice according to the classification result of the patient. When the patient is drug sensitive, the system recommends administration of myocardial infarction drug treatment, so that the treatment effect is improved, and the personalized drug advice can better meet the specific requirements of the patient and avoid unnecessary drug treatment.

3. The decision trees in the random forest classification algorithm used by the system disclosed by the application are optimized by adaptive pruning, so the system can make decisions quickly, classifying patients and giving medication advice in a short time. This improvement to the algorithm helps the decision support system improve the efficiency of clinical decision-making and reduce patient waiting time and physician workload.

4. The system disclosed by the application classifies patients with a machine learning model that can be learned and optimized automatically, and is highly extensible and flexible. More genetic characteristics and genes can be added as needed, and the algorithm of the machine learning model can be adjusted, so that the system can be continuously optimized and improved to adapt to different clinical demands and research progress.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a schematic diagram of an artificial intelligence-based acute myocardial infarction clinical decision support system according to an embodiment of the present invention;

FIG. 2 is a schematic diagram showing correlation analysis of 18 important genes according to an embodiment of the present invention;

FIG. 3 is a box diagram showing the difference of gene expression levels of genes NISCH and PLAGL2 in a control group, a drug sensitive group and a drug tolerant group according to an embodiment of the present invention, wherein A is the gene NISCH and B is the gene PLAGL2;

FIG. 4 is a schematic diagram of the deep forest classification model before improvement according to an embodiment of the present invention;

FIG. 5 is a schematic diagram of the improved deep forest classification model according to an embodiment of the present invention;

FIG. 6 is a schematic diagram of an artificial intelligence-based acute myocardial infarction clinical decision support apparatus according to an embodiment of the present invention;

FIG. 7 is a schematic flow chart of an artificial intelligence-based acute myocardial infarction clinical decision support method according to an embodiment of the present invention.

Detailed Description

In order to enable those skilled in the art to better understand the present invention, the following description will make clear and complete descriptions of the technical solutions according to the embodiments of the present invention with reference to the accompanying drawings.

In some of the flows described in the specification and claims of the present invention and in the above figures, a plurality of operations appearing in a particular order are included, but it should be clearly understood that the operations may be performed in other than the order in which they appear herein or in parallel, the sequence numbers of the operations such as S101, S102, etc. are merely used to distinguish between the various operations, and the sequence numbers themselves do not represent any order of execution. In addition, the flows may include more or fewer operations, and the operations may be performed sequentially or in parallel. It should be noted that, the descriptions of "first" and "second" herein are used to distinguish different messages, devices, modules, etc., and do not represent a sequence, and are not limited to the "first" and the "second" being different types.

The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments according to the invention without any creative effort, are within the protection scope of the invention.

FIG. 1 shows an artificial intelligence-based acute myocardial infarction clinical decision support system provided in an embodiment of the present invention, which includes:

S101: The acquisition unit is used for acquiring the genetic information of the patient sample.

In one embodiment, the sample comprises any one or more of the following: blood, tissue, saliva; the genetic information comprises any one or more of the following: genotype, genetic variation information, gene expression information, and epigenetic information.

The genetic variation information refers to the variation of the genome of an individual, which is inconsistent with the normal gene sequence and possibly related to disease susceptibility and drug sensitivity, and the genetic variation mainly comprises the following categories:

1. Single nucleotide polymorphism (Single Nucleotide Polymorphism, SNP): SNPs are the most common type of genetic variation, meaning that at least two different nucleotides are present at one nucleotide position in the genome.

2. Indel variation (Insertion/Deletion, indel): indel refers to a variation in the occurrence of inserted or deleted bases in the genome, resulting in a change in the length of the gene sequence.

3. Copy number variation (Copy Number Variation, CNV): CNV refers to the change in copy number of a DNA sequence in the genome. It can lead to an increase or decrease in the copy number of the gene, thereby having an important effect on gene expression.

4. Gene rearrangement (Gene Rearrangement): gene rearrangement refers to the cleavage and recombination of certain gene segments in the genome, resulting in changes in the position, order and orientation of the genes.

5. Gene amplification (Gene Amplification): gene amplification refers to an increase in the number of copies of a gene in the genome, resulting in an increase in the expression of the gene.

Epigenetic information refers to genetic changes occurring in the genome that do not involve DNA sequence changes, affecting cellular and individual functions by affecting gene expression and regulation. The following are several main types of epigenetic information:

1. Gene amplification (Gene Amplification): gene amplification refers to an increase in the number of copies of a gene in the genome, resulting in an increase in the expression of the gene. Gene amplification is closely related to the occurrence and development of tumors.

2. Histone modification: histones are the major components of chromatin and can regulate gene expression through a variety of chemical modifications. These modifications include acetylation, methylation, phosphorylation, ubiquitination, and the like. Different modifications can affect the structure and compactness of chromatin, thereby regulating accessibility and transcriptional activity of genes.

3. Non-coding RNA: non-coding RNA (ncRNA) refers to RNA molecules that are produced during transcription that do not participate in the coding of proteins. These ncRNA molecules can regulate gene expression by a variety of mechanisms, including miRNA, siRNA, lncRNA and the like. They can interact with DNA, RNA and proteins, and participate in transcription regulation, RNA splicing, transport, translation and stability processes.

4. Chromatin conformation: chromatin structure refers to the structure and organization of chromatin in three dimensions. The three-dimensional structure of chromatin can regulate gene expression through chromatin interactions, chromatin loops, and formation of chromatin domains. Changes in chromatin conformation can affect gene accessibility and transcription efficiency.

In a specific embodiment, the GSE66360 dataset is downloaded from NCBI; it includes blood samples from 49 myocardial infarction patients and 50 healthy control samples. The samples are annotated using the platform annotation file GPL570 and the R package AnnoProbe (https://github.com/ableno/AnnoProbe). Where several probes correspond to the same gene, their average value is taken as the gene expression level. After this processing, an expression matrix of 21655 genes across 99 samples is obtained.
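The probe-averaging step can be sketched as follows. This is a minimal illustration in Python rather than the R workflow named above; the function name, probe identifiers and expression values are hypothetical and serve only to show how several probes mapping to one gene are reduced to a single mean value per sample.

```python
from collections import defaultdict


def collapse_probes_to_genes(probe_expr, probe_to_gene):
    """Average the expression of all probes that map to the same gene.

    probe_expr:    dict of probe id -> list of expression values (one per sample)
    probe_to_gene: dict of probe id -> gene symbol; unannotated probes are skipped
    """
    grouped = defaultdict(list)
    for probe_id, values in probe_expr.items():
        gene = probe_to_gene.get(probe_id)
        if gene:
            grouped[gene].append(values)

    gene_expr = {}
    for gene, rows in grouped.items():
        n_probes = len(rows)
        # zip(*rows) walks the samples column by column.
        gene_expr[gene] = [sum(column) / n_probes for column in zip(*rows)]
    return gene_expr


# Hypothetical probe ids and values, for illustration only.
probe_expr = {
    "probe_a": [2.0, 4.0],
    "probe_b": [4.0, 8.0],
    "probe_c": [1.0, 1.0],
    "probe_d": [9.0, 9.0],  # no gene annotation, so it is dropped
}
probe_to_gene = {"probe_a": "NISCH", "probe_b": "NISCH", "probe_c": "PLAGL2"}

gene_expr = collapse_probes_to_genes(probe_expr, probe_to_gene)
```

In this toy input the two probes annotated to NISCH are averaged sample by sample, and the unannotated probe does not appear in the result.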

S102: The feature extraction unit is used for extracting features from the genetic information to obtain the genetic characteristics.

In one embodiment, features are extracted from the genetic information using Lasso regression.

Lasso regression is a linear regression method used for feature selection and regularization. It achieves feature selection by adding an L1 regularization term to the loss function, that is, by penalizing the coefficients so that the coefficients of some features become 0, which imposes a sparsity constraint on the features. The advantage of Lasso regression is that important features are selected automatically and the complexity and generalization error of the model are reduced. Lasso regression can also mitigate multicollinearity: when features are highly correlated, L1 regularization can set the coefficient of one of them to 0, thereby reducing redundancy.
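How the L1 penalty drives some coefficients exactly to 0 can be illustrated with a small coordinate-descent sketch. It assumes the common objective (1/(2n))·||y − Xw||² + λ·||w||₁; the function names, the toy data and the value of λ are illustrative assumptions, and this is not the solver used in the embodiment.

```python
def soft_threshold(value, lam):
    """Shrink value towards 0 by lam; values within [-lam, lam] become exactly 0."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimise (1/(2n)) * ||y - Xw||^2 + lam * ||w||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # correlation of feature j with the partial residual
            z = 0.0    # mean squared value of feature j
            for i in range(n):
                pred_without_j = sum(X[i][k] * w[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                z += X[i][j] ** 2
            rho /= n
            z /= n
            w[j] = soft_threshold(rho, lam) / z if z > 0 else 0.0
    return w


# Toy data: the response depends only on the first feature (y = 2 * x0).
X = [[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]]
y = [2.0, -2.0, 2.0, -2.0]

w = lasso_coordinate_descent(X, y, lam=0.5)
```

Here the coefficient of the informative feature is shrunk from 2 to 1.5 and the coefficient of the irrelevant feature is exactly 0, which is the selection behavior described above.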

In one embodiment, the genetic characteristics comprise one or more of the following genes: NISCH, PLAGL2, TLR1, ARID5B, TAF9B, SERINC, CARKD, EPG5, ANKLE2, BC043227, FAM188A, ERVH.4, ZNF26, ERO1LB, TMEM208, EXOC2, SGMS1, BRPF3.

Among these, NISCH encodes a neuron-related protein involved in the development and functional regulation of neurons. PLAGL2 encodes a transcription factor protein involved in biological processes such as embryonic development, cell proliferation and differentiation. TLR1 encodes Toll-like receptor 1, which is involved in the immune response by recognizing and responding to signals from bacteria and other microorganisms. ARID5B encodes a transcriptional regulator involved in biological processes such as embryonic development, cell proliferation and differentiation. TAF9B encodes a transcription factor protein involved in the transcriptional regulation of genes. SERINC1 encodes a membrane protein involved in the metabolic and functional regulation of cell membranes. CARKD encodes a ketoacid kinase involved in cellular energy metabolism. EPG5 encodes a protein involved in cellular autophagy. ANKLE2 encodes a protein involved in cell division and chromosomal stability. BC043227 encodes a long non-coding RNA. ERVH.4 encodes an endogenous retrovirus. ZNF26 encodes a zinc finger transcription factor involved in the transcriptional regulation of genes. ERO1LB encodes a protein responsible for regulating the oxidative folding of proteins in the cell. TMEM208 encodes a membrane protein. EXOC2 encodes a protein involved in the regulation of cellular secretory pathways. SGMS1 encodes sphingomyelin synthase 1, which is involved in the metabolic and functional regulation of cell membranes. BRPF3 encodes a protein involved in the transcriptional regulation of genes.

In one specific embodiment, the 49 myocardial infarction patients are divided into subgroups using consensus clustering based on the NMF algorithm. NMF identifies latent features in a gene expression profile by decomposing the original matrix into two non-negative matrices. The integrated correlation coefficient is used to determine the optimal k value. The optimal number of clusters is set to k=2, and the two subgroups are called the drug-sensitive group and the drug-resistant group. When k=2, the consensus matrix heat map maintains clear and sharp boundaries, indicating that the samples form stable and robust clusters. The drug-sensitive group and the drug-resistant group comprise 15 and 34 myocardial infarction patients, respectively.
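The consensus matrix mentioned above can be sketched as follows. The sketch does not perform NMF; it assumes that cluster labels from repeated clustering runs are already available, and the function name and the label lists are hypothetical. Each entry is the fraction of runs in which two samples were assigned to the same cluster, so stable clusters show values close to 1 within a cluster and close to 0 between clusters.

```python
def consensus_matrix(label_runs):
    """Fraction of runs in which each pair of samples shares a cluster.

    label_runs: list of runs, each a list with one cluster label per sample.
    """
    n_samples = len(label_runs[0])
    counts = [[0] * n_samples for _ in range(n_samples)]
    for labels in label_runs:
        for i in range(n_samples):
            for j in range(n_samples):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    n_runs = len(label_runs)
    return [[count / n_runs for count in row] for row in counts]


# Hypothetical labels from three clustering runs with k = 2 on four samples.
# Label values are arbitrary per run; only co-membership matters.
label_runs = [
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
]

M = consensus_matrix(label_runs)
```

In this toy input, samples 0 and 1 always cluster together (value 1.0), samples 0 and 3 never do (value 0.0), and the third run makes the pair (2, 3) less stable.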

Differential expression analysis was performed on the two disease subgroups using limma in R 4.0.5. The screening criteria for differentially expressed mRNAs were an adjusted p-value < 0.05 and |logFC| > 1, which yielded a total of 8482 differentially expressed mRNAs. Lasso regression analysis was used for high-dimensional variable screening and feature selection on the 8482 differentially expressed genes, and 18 important features were selected. Correlation analysis of the 18 important genes was performed using the R corrgram package, and the results are shown in FIG. 2. To screen for optimal biomarkers, machine learning was performed on the 18 important genes with the R randomForest package: a classification model was constructed with the random forest algorithm, and the 18 mRNAs were ranked by Mean Decrease Accuracy from largest to smallest, in the following order: NISCH, PLAGL2, TLR1, ARID5B, TAF9B, SERINC, CARKD, EPG5, ANKLE2, BC043227, FAM188A, ERVH.4, ZNF26, ERO1LB, TMEM208, EXOC2, SGMS1, BRPF3.
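The screening criteria can be expressed as a simple filter over a table of differential expression results. The sketch below is in Python rather than R; the field names (`gene`, `logFC`, `adj_p`), the gene names and the numbers are hypothetical placeholders, not values from the embodiment.

```python
def filter_degs(results, p_cutoff=0.05, lfc_cutoff=1.0):
    """Keep genes with adjusted p-value < p_cutoff and |logFC| > lfc_cutoff."""
    return [
        row["gene"]
        for row in results
        if row["adj_p"] < p_cutoff and abs(row["logFC"]) > lfc_cutoff
    ]


# Hypothetical result rows, for illustration only.
results = [
    {"gene": "GENE_A", "logFC": 1.8, "adj_p": 0.001},    # passes both criteria
    {"gene": "GENE_B", "logFC": -1.2, "adj_p": 0.03},    # down-regulated, passes
    {"gene": "GENE_C", "logFC": 0.4, "adj_p": 0.0001},   # fold change too small
    {"gene": "GENE_D", "logFC": 2.5, "adj_p": 0.20},     # not significant
]

degs = filter_degs(results)
```

Both criteria must hold at once, so a gene with a large fold change but a non-significant adjusted p-value is excluded, as is a significant gene with a small fold change.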

S103: The subtype classification unit is used for classifying the patient based on the genetic characteristics and determining whether the patient is drug-sensitive.

In one embodiment, the genetic characteristic is NISCH and PLAGL2, and the patient is classified based on the NISCH and PLAGL2 to determine whether the patient is drug sensitive.

In a specific embodiment, mRNAs are added one at a time, from top to bottom in the order of the random forest ranking. At each step, classification is performed with the random forest algorithm (R randomForest package), and the accuracy and AUC are obtained by ten-fold cross-validation. As can be seen from the figure, the AUC and accuracy reach their maximum when the number of mRNAs reaches 2, so the first 2 mRNAs are chosen as the optimal biomarkers. In one embodiment, the optimal biomarkers are NISCH and PLAGL2.
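The incremental selection procedure can be sketched as a loop over prefixes of the ranked gene list. In the sketch below, `score_fn` is a hypothetical callable standing in for "train a random forest on these genes and return the cross-validated AUC"; the mock scores are made-up numbers used only to exercise the loop and are not results from the embodiment.

```python
def best_prefix_size(ranked_genes, score_fn):
    """Evaluate the top-1, top-2, ... genes and return the smallest best prefix."""
    best_k, best_score = 0, float("-inf")
    for k in range(1, len(ranked_genes) + 1):
        score = score_fn(ranked_genes[:k])
        # Strict comparison keeps the smallest k when later prefixes only tie.
        if score > best_score:
            best_k, best_score = k, score
    return best_k, ranked_genes[:best_k]


ranked = ["NISCH", "PLAGL2", "TLR1", "ARID5B"]

# Made-up scores standing in for a cross-validated AUC per prefix length.
mock_scores = {1: 0.80, 2: 0.95, 3: 0.95, 4: 0.90}

k, selected = best_prefix_size(ranked, lambda genes: mock_scores[len(genes)])
```

Preferring the smallest prefix on ties is a design choice of this sketch: it favors the most compact biomarker panel among equally performing candidates.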

A support vector machine (R e1071 package) was applied to the 2 mRNAs screened above to construct a diagnostic model, and a random forest algorithm was used to construct a classification model for comparison; the AUC of both models was found to be high. ROC curves were obtained for the classifiers and for each of the 2 genes used alone to distinguish the two subtypes. The differences in the expression levels of the two genes among the three groups are shown in FIG. 3, where A is a box plot of the expression level of gene NISCH in the three groups and B is a box plot of the expression level of gene PLAGL2 in the three groups.
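The AUC used to compare the models can be computed directly from its rank interpretation: it is the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative sample, with ties counted as one half. The following is a minimal sketch; the labels and scores are made up for illustration and do not come from the embodiment.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present to compute the AUC")

    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Hypothetical labels (1 = drug-sensitive, 0 = drug-resistant) and classifier scores.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]

auc = roc_auc(labels, scores)
```

In this toy input, 8 of the 9 positive-negative pairs are ordered correctly, so the AUC is 8/9.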

S104: The medication decision unit is used for giving medication advice based on the classification result of the patient; when the patient is drug-sensitive, the medication advice is to administer myocardial infarction drug therapy.

In one embodiment, the myocardial infarction drug comprises any one or more of the following: epinephrine, dobutamine, norepinephrine, aramine, isoproterenol.

Of these 5 drugs, epinephrine activates β-adrenergic receptors during emergency treatment of myocardial infarction, enhancing myocardial contractility and increasing heart rate to restore cardiac function. Dobutamine is a positive inotropic drug commonly used to treat cardiac insufficiency, including in myocardial infarction; it increases myocardial contractility by activating β1-adrenergic receptors, helping to improve cardiac pumping function. Norepinephrine is commonly used to treat hypotension and shock in patients with myocardial infarction; it constricts blood vessels by activating α-adrenergic receptors, increasing blood pressure and tissue perfusion. Aramine (metaraminol) is an α-adrenergic receptor agonist that can be used to treat hypotension caused by myocardial infarction by constricting blood vessels and increasing blood pressure and tissue perfusion. Isoproterenol (isoprenaline) is a β-adrenergic receptor agonist that can be used to treat cardiac arrest and arrhythmias caused by myocardial infarction; it helps to improve cardiac function by increasing myocardial contractility and heart rate.

In one embodiment, drug target retrieval is performed on the 8 drugs Dopamine, epinephrine, dobutamine, norepinephrine, aramine, levosimendan, isoproterenol, milrinone in the iLINCS database, and the corresponding action targets of the drugs are retrieved with reference to table 1.

Table 1 drug target correspondence table

The 8 drugs are as follows:

1. Dopamine: a neurotransmitter that can also be used as a drug. It acts primarily through dopamine receptors, which include the D1, D2, D3, D4 and D5 subtypes. Dopamine helps regulate movement, emotion and reward in the central nervous system and also participates in regulating the cardiovascular system.

2. Epinephrine: an important hormone and neurotransmitter that plays a critical role in the body's stress response. It acts primarily through α-adrenergic and β-adrenergic receptors. Its effects include increasing heart rate, promoting myocardial contraction, dilating the bronchi and increasing blood glucose concentration.

3. Dobutamine: a β1-adrenergic receptor agonist. By binding to β1-adrenergic receptors it increases myocardial contractility and cardiac output, and it can be used to treat cardiovascular diseases such as heart failure and myocardial ischemia.

4. Norepinephrine: an important hormone and neurotransmitter, similar to epinephrine. It acts primarily through α-adrenergic and β-adrenergic receptors. Its effects include vasoconstriction and increased heart rate and contractility.

5. Aramine (metaraminol): a potent α-adrenergic receptor agonist. It constricts blood vessels by binding to α-adrenergic receptors and is mainly used to treat hypotension and shock.

6. Levosimendan: a calcium sensitizer used to enhance myocardial contractility. It enhances myocardial contractility by increasing the sensitivity of myocardial cells to calcium ions, and improves heart failure and myocardial ischemia.

7. Isoproterenol (isoprenaline): a non-selective β-adrenergic receptor agonist. By binding to β-adrenergic receptors it dilates the bronchi and increases heart rate and myocardial contractility.

8. Milrinone: a phosphodiesterase III inhibitor used to enhance myocardial contractility. By inhibiting phosphodiesterase III activity it increases the intracellular level of cyclic adenosine monophosphate (cAMP), enhances myocardial contractility and improves heart failure.

Differential expression analysis was performed for each of the two disease subgroups against healthy controls using limma in R 4.0.5. With screening criteria of adjusted p-value < 0.05 and |logFC| > 1, the drug-sensitive group yielded 6065 differentially expressed mRNAs and the drug-resistant group yielded 618 differentially expressed mRNAs.

Venn diagram analysis of the differentially expressed genes and the drug targets shows that the target genes of five drugs (Epinephrine, Dobutamine, Norepinephrine, Aramine and Isoproterenol) intersect with the differentially expressed genes of the drug-sensitive group. ADRB2 and ADRA1A are targets of Epinephrine, and both genes are differentially expressed between the drug-sensitive group and the normal control group, so the medication advice is to treat patients in the drug-sensitive group with the myocardial infarction drug Epinephrine. Similarly, ADRB2 is a target of Dobutamine and Isoproterenol and differs significantly between the drug-sensitive group and the normal control group, so the medication advice is to treat patients in the drug-sensitive group with Dobutamine and Isoproterenol. ADRA1A is a target of Norepinephrine and Aramine and differs between the drug-sensitive group and the normal control group, so the medication advice is to treat patients in the drug-sensitive group with Norepinephrine and Aramine. In summary, patients in the drug-sensitive group can be treated with any one or more of the following myocardial infarction drugs: Epinephrine, Dobutamine, Norepinephrine, Aramine, Isoproterenol.
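The intersection logic behind this analysis can be sketched with set operations. The drug-to-target mapping below uses only the targets named in the paragraph above; the entry `HypotheticalDrug`, the genes `GENE_X` and `GENE_Y`, and the abbreviated list of differentially expressed genes are made up for illustration and are not taken from Table 1.

```python
# Targets named in the text, plus one made-up entry to show a non-matching drug.
drug_targets = {
    "Epinephrine": {"ADRB2", "ADRA1A"},
    "Dobutamine": {"ADRB2"},
    "Isoproterenol": {"ADRB2"},
    "Norepinephrine": {"ADRA1A"},
    "Aramine": {"ADRA1A"},
    "HypotheticalDrug": {"GENE_X"},  # illustrative only, not from the source
}

# Illustrative subset of the drug-sensitive group's differentially expressed genes.
sensitive_degs = {"ADRB2", "ADRA1A", "GENE_Y"}


def recommend_drugs(drug_targets, degs):
    """Return drugs whose targets overlap the differentially expressed genes."""
    recommended = {}
    for drug, targets in drug_targets.items():
        shared = targets & degs
        if shared:
            recommended[drug] = sorted(shared)
    return recommended


recommended = recommend_drugs(drug_targets, sensitive_degs)
```

A drug is suggested only when at least one of its targets is differentially expressed in the drug-sensitive group, which mirrors the reasoning applied to the five drugs above.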

S105: the classification model construction unit is used for constructing the classification model, and the construction method of the classification model comprises the following steps:

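In one embodiment, the process of adaptively pruning the decision tree comprises:

step 1: starting from the leaf nodes, selecting each internal node layer by layer upwards;

step 2: calculating the loss function of the node rt;

step 3: calculating the overall loss function of the decision tree;

step 4: for each node, calculating its loss function value on the validation set, and deciding whether to prune based on that value.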
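A bottom-up pruning pass of this kind can be sketched as follows. The sketch makes several assumptions that are not confirmed by the source: the loss takes the cost-complexity form "errors + α × number of leaves", the errors are counted on the validation samples that reach each node, and an internal node is collapsed into a leaf whenever that does not increase the loss. The `Node` class, the helper functions and the toy data are hypothetical.

```python
class Node:
    """Binary decision tree node; a node with feature=None is a leaf."""

    def __init__(self, label, feature=None, threshold=None, left=None, right=None):
        self.label = label          # majority class at this node
        self.feature = feature      # index of the feature tested here
        self.threshold = threshold  # go left when x[feature] <= threshold
        self.left = left
        self.right = right

    def is_leaf(self):
        return self.feature is None


def count_leaves(node):
    if node.is_leaf():
        return 1
    return count_leaves(node.left) + count_leaves(node.right)


def predict(node, x):
    while not node.is_leaf():
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.label


def errors(node, X, y):
    return sum(1 for xi, yi in zip(X, y) if predict(node, xi) != yi)


def prune(node, X_val, y_val, alpha):
    """Prune children first, then decide whether to collapse this node."""
    if node.is_leaf():
        return node
    go_left = [x[node.feature] <= node.threshold for x in X_val]
    X_l = [x for x, g in zip(X_val, go_left) if g]
    y_l = [t for t, g in zip(y_val, go_left) if g]
    X_r = [x for x, g in zip(X_val, go_left) if not g]
    y_r = [t for t, g in zip(y_val, go_left) if not g]
    node.left = prune(node.left, X_l, y_l, alpha)
    node.right = prune(node.right, X_r, y_r, alpha)

    subtree_loss = errors(node, X_val, y_val) + alpha * count_leaves(node)
    leaf_loss = sum(1 for t in y_val if t != node.label) + alpha  # a single leaf
    return Node(label=node.label) if leaf_loss <= subtree_loss else node


# Toy tree: the right subtree contains a split that only fits noise.
tree = Node(label=0, feature=0, threshold=0.5,
            left=Node(label=0),
            right=Node(label=1, feature=1, threshold=0.5,
                       left=Node(label=1), right=Node(label=0)))
X_val = [[0.2, 0.1], [0.3, 0.9], [0.8, 0.2], [0.9, 0.8]]
y_val = [0, 0, 1, 1]

leaves_before = count_leaves(tree)
pruned = prune(tree, X_val, y_val, alpha=0.5)
```

In this toy case the noisy split in the right subtree is collapsed into a single leaf, while the root split is kept because removing it would misclassify two validation samples.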

In a specific embodiment, the loss function of the node rt is:

where C(rt) represents the training error of node rt, |rt| represents the number of leaf nodes of node rt, and α is a non-negative hyperparameter for balancing the training error and the number of leaf nodes.

The overall loss function of the decision tree is:

In one embodiment, the classification label is a drug-sensitivity classification result indicating whether one or more drugs are effective. Specifically, the one or more drugs include any one or more of the following: Dopamine, Epinephrine, Dobutamine, Norepinephrine, Aramine, Levosimendan, Isoproterenol, Milrinone.

Random forest is an ensemble learning algorithm that performs classification and regression by building multiple decision trees. Each decision tree is built from a randomly selected subset of features and samples, and the forest predicts by voting (classification) or averaging (regression). Random forests offer high accuracy and robustness, and can process high-dimensional data and large-scale data sets.
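The three ingredients named above (random samples, random features, voting) can be illustrated with a small standard-library sketch. The helper names and the sizes used in the example are assumptions for illustration only; training of the individual trees is left out.

```python
import random
from collections import Counter


def bootstrap_sample(n_samples, rng):
    """Indices of a bootstrap sample: n_samples draws with replacement."""
    return [rng.randrange(n_samples) for _ in range(n_samples)]


def random_feature_subset(n_features, n_selected, rng):
    """Indices of n_selected distinct features chosen at random."""
    return sorted(rng.sample(range(n_features), n_selected))


def majority_vote(tree_predictions):
    """Class predicted by most trees; ties go to the label seen first."""
    return Counter(tree_predictions).most_common(1)[0][0]


# Illustrative sizes only: 10 samples, 18 features, 4 features per tree.
rng = random.Random(0)
rows = bootstrap_sample(10, rng)
cols = random_feature_subset(18, 4, rng)
label = majority_vote(["sensitive", "not_sensitive", "sensitive"])
```

Each tree would be trained on the rows in `rows` restricted to the columns in `cols`; the forest output is the majority label across trees.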

In one embodiment, after feature selection is complete, the resulting sample features are used to train an improved deep forest classification model.

The deep forest classification model adopts a cascade structure, and each layer of forests is an ensemble of decision trees. The model automatically determines the number of cascade layers as follows: each forest generates class vectors through k-fold cross validation, that is, each sample is used as training data k-1 times and k-1 class vectors are generated; validation data are obtained from the samples to be classified; when a new layer of forests is added, the performance of the whole deep forest framework is evaluated on the validation data, and if the performance is not significantly improved, the number of layers is no longer increased. During training of the deep forest classification model, each layer uses several random forests and several completely random forests. Each forest in the first layer produces a class vector with one entry per category; the generated feature vector is concatenated with the original feature vector and input to the next stage. Each subsequent layer, up to the second-to-last layer, is similar.
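The layer-count rule described above amounts to early stopping on a validation score. The sketch below shows only that control flow, under stated assumptions: `train_and_score_layer` is a hypothetical caller-supplied callback, the threshold `min_gain` is an illustrative stand-in for "not significantly improved", and the layer that fails to improve is discarded.

```python
def grow_cascade(train_and_score_layer, max_layers=10, min_gain=1e-3):
    """Add cascade layers until the validation score stops improving.

    train_and_score_layer(depth) is a hypothetical callback that trains
    layer number `depth` and returns the validation score of the whole
    cascade up to that layer.
    Returns (number_of_layers_kept, best_score).
    """
    best_score = float("-inf")
    kept = 0
    for depth in range(1, max_layers + 1):
        score = train_and_score_layer(depth)
        if score - best_score < min_gain:
            break  # no significant gain: drop this layer and stop growing
        best_score = score
        kept = depth
    return kept, best_score


# Made-up validation accuracies per depth, for illustration only.
fake_scores = {1: 0.80, 2: 0.86, 3: 0.88, 4: 0.8801, 5: 0.90}
layers, score = grow_cascade(lambda d: fake_scores[d], max_layers=5)
```

In this made-up run the fourth layer adds less than `min_gain`, so growth stops with three layers even though a later layer would have scored higher; that is the trade-off of a greedy stopping rule.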

A decision tree is a tree structure in which each internal node represents a test on an attribute, each branch represents an outcome of the test, and each leaf node represents a class. A decision tree is a predictive model representing a mapping between object attributes and object values.

The random forest is a classifier that trains on and predicts sample data using a plurality of decision trees. Each tree is generated by randomly selecting a subset of candidate features from the whole feature space, the size of the subset being determined by the number of input features, and then selecting the candidate feature with the best Gini value as the splitting feature of the node.

The completely random forest is a classifier that trains on and predicts samples using a plurality of decision trees. Each tree in the completely random forest is generated by randomly selecting one feature from the whole feature space as the splitting feature of a node.

The Gini value refers to the Gini coefficient. In the CART decision tree algorithm, the Gini index measures the impurity or uncertainty of the data and is used to determine the optimal binary split of a categorical variable.
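As an illustration of how a Gini value can drive the choice of a split, the sketch below computes the Gini impurity of a label set and picks the binary threshold with the lowest weighted impurity. The function names and the toy data are assumptions; thresholds are taken as midpoints between sorted distinct values, which is one common convention and not necessarily the one used in the invention.

```python
from collections import Counter


def gini(labels):
    """Gini impurity 1 - sum(p_k ** 2) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_gini(values, labels, threshold):
    """Weighted Gini impurity after the binary split value <= threshold."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


def best_threshold(values, labels):
    """Midpoint threshold with the lowest weighted Gini (needs >= 2 distinct values)."""
    distinct = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    return min(candidates, key=lambda t: split_gini(values, labels, t))


# Toy expression values for one feature and made-up class labels.
values = [1, 2, 4, 5]
labels = ["a", "a", "b", "b"]
threshold = best_threshold(values, labels)
```

A lower weighted impurity means purer child nodes, so "best Gini value" is read here as "lowest impurity after the split".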

After training of the deep forest classification model is completed, the samples that actually need to be processed are tested with the trained model to obtain their classification results. The category corresponding to the maximum of the averaged results of the last layer of forests is taken as the classification result output by the deep forest classification model.

The quantities involved are: the number of forests contained in each layer of the deep forest; the number of categories of the data set; the class into which the data set is classified; the classification result output by the deep forest classification model; the maximum value among the averaged results of the last layer of forests; and the average value of the results of the last layer of forests.
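The output rule described above (average the results of the last-layer forests, then take the class with the largest average) can be sketched as follows. The function name, the class names and the class vectors are made up for illustration.

```python
def predict_from_last_layer(class_vectors, class_names):
    """Average the per-forest class vectors; return the top class and the averages."""
    n_forests = len(class_vectors)
    n_classes = len(class_names)
    averages = [
        sum(vector[k] for vector in class_vectors) / n_forests
        for k in range(n_classes)
    ]
    best = max(range(n_classes), key=lambda k: averages[k])  # index of the maximum average
    return class_names[best], averages


# Made-up class vectors from four last-layer forests (two classes).
vectors = [[0.7, 0.3], [0.6, 0.4], [0.4, 0.6], [0.9, 0.1]]
label, averages = predict_from_last_layer(vectors, ["drug_sensitive", "not_sensitive"])
```

Here one forest disagrees with the other three, but the averaged class vector still favours the first class, which becomes the output.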

A schematic diagram of the deep forest classification model before improvement is shown in Fig. 4.

The traditional deep forest algorithm consists of random forests and completely random forests, but neither kind of forest uses adaptive pruning, so the generalization capability of the model is low and overfitting occurs easily.

On this basis, in the improved deep forest, an adaptively pruned random forest replaces the completely random forest.

Specifically, the adaptive pruning random forest classifier improves the generalization capability of the model and reduces the overfitting phenomenon by pruning each decision tree.

Assume a training set containing rN samples, where each sample is represented by rM features and there are rC categories in total. The training set is expressed as:

$$rD = \{(x_i, y_i)\}_{i=1}^{rN}$$

where $x_i$ represents the feature vector of the i-th sample and $y_i$ represents the corresponding category.

The random forest is formed by rT decision trees, and each decision tree is constructed as follows:

1. Construct a new data set of size rN from the training set rD by sampling with replacement.

2. Randomly select rm features from the rM features for constructing the decision tree.

3. Construct a decision tree from the new data set and the selected rm features. The construction may use the CART algorithm or another decision tree construction algorithm.

4. Perform adaptive pruning on the constructed decision tree to improve the generalization capability of the model. Specifically:

1) Pruning:

Pruning is performed on each internal node, starting from the leaf nodes and moving up layer by layer. For each node, the loss function value of the node on the validation set is calculated, and if the loss function value is smaller after pruning, the pruning operation is performed.

2) Loss function:

The loss function of node rt is defined as:

$$L(rt) = C(rt) + \alpha\,|rt|$$

where $C(rt)$ represents the training error of node rt, $|rt|$ represents the number of leaf nodes of node rt, and $\alpha$ is a non-negative hyperparameter for balancing the training error and the number of leaf nodes.

3) Pruning strategies:

The overall loss function of the decision tree is defined as:

$$L(rT) = C(rT) + \alpha\,|rT|$$

where $C(rT)$ represents the training error of the decision tree rT and $|rT|$ represents the number of leaf nodes of the decision tree rT. The degree of pruning can be controlled by adjusting the hyperparameter $\alpha$.
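The pruning procedure above can be sketched as a bottom-up pass that collapses a node whenever doing so lowers a loss of the form "error plus alpha times number of leaves". This is a minimal sketch under assumptions: the `Node` class, the field `error_as_leaf` (the error the node would have if turned into a leaf, taken as given, for example measured on a validation set) and the toy tree are all illustrative and not taken from the source.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    error_as_leaf: float  # error of this node if it were collapsed into a leaf
    children: List["Node"] = field(default_factory=list)

    def is_leaf(self):
        return not self.children


def leaves(node):
    """Number of leaf nodes under `node`."""
    return 1 if node.is_leaf() else sum(leaves(c) for c in node.children)


def subtree_error(node):
    """Total error of the leaves under `node`."""
    if node.is_leaf():
        return node.error_as_leaf
    return sum(subtree_error(c) for c in node.children)


def loss(node, alpha):
    """Error plus alpha times the number of leaves, for the subtree at `node`."""
    return subtree_error(node) + alpha * leaves(node)


def prune(node, alpha):
    """Bottom-up pruning: collapse a node when that strictly lowers its loss."""
    for child in node.children:
        prune(child, alpha)
    if not node.is_leaf():
        collapsed_loss = node.error_as_leaf + alpha  # a single leaf remains
        if collapsed_loss < loss(node, alpha):
            node.children = []
    return node


def build_example():
    """Toy tree: root -> (A -> two leaves, B leaf). Error values are made up."""
    a = Node(4.0, [Node(2.0), Node(1.75)])
    return Node(10.0, [a, Node(3.0)])
```

On the toy tree, `alpha = 0` keeps all three leaves, `alpha = 0.5` collapses only the subtree A, and `alpha = 5` collapses the whole tree to its root, which shows how the hyperparameter controls the degree of pruning.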

In addition, the traditional deep forest concatenates the original input at every cascade layer, which easily causes feature redundancy and slows training. The improved deep forest provided by the invention concatenates the original input only in the last layer, which greatly reduces data redundancy. A schematic diagram of the improved deep forest classification model is shown in Fig. 5.
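The effect on feature width can be made concrete with a small calculation. The sketch below is one reading of the paragraph above, stated as an assumption: in the traditional cascade every layer after the first receives the class vectors plus the original features, whereas in the improved cascade only the last layer receives the original features again. All sizes are made up.

```python
def layer_input_dims(n_layers, n_original, n_forests, n_classes, concat_every_layer):
    """Input width of each cascade layer (layer 1 sees only the original features)."""
    class_vector_width = n_forests * n_classes  # one class vector per forest
    dims = [n_original]
    for layer in range(2, n_layers + 1):
        is_last = layer == n_layers
        if concat_every_layer or is_last:
            dims.append(class_vector_width + n_original)
        else:
            dims.append(class_vector_width)
    return dims


# Illustrative sizes: 4 layers, 1000 original features, 4 forests per layer, 2 classes.
traditional = layer_input_dims(4, 1000, 4, 2, concat_every_layer=True)
improved = layer_input_dims(4, 1000, 4, 2, concat_every_layer=False)
```

Under these assumed sizes the intermediate layers of the improved cascade work on 8 features instead of 1008, which is where the reduction in redundancy and training time would come from.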

Fig. 6 is an artificial intelligence-based acute myocardial infarction clinical decision support apparatus according to an embodiment of the present invention, including: a memory and a processor;

the memory is used for storing program instructions;

the processor is used for calling the program instructions; when the program instructions are executed, the following artificial intelligence-based acute myocardial infarction clinical decision support method is performed. The flow chart of the method is shown in Fig. 7, and the method comprises the following steps:

S701: obtaining genetic information of a patient sample;

S702: extracting features from the genetic information to obtain genetic features;

S703: classifying the patient based on the genetic features, and judging whether the patient is drug sensitive;

S704: giving medication advice based on the classification of the patient, the medication advice being to administer myocardial infarction drug therapy when the patient is drug sensitive;

S705: constructing the classification model, wherein the construction method of the classification model comprises the following steps:

A computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the above-described artificial intelligence-based acute myocardial infarction clinical decision support method.

The results of the verification of the present verification embodiment show that assigning an inherent weight to an indication may moderately improve the performance of the present method relative to the default settings.

It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, which are not repeated herein.

In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of the units is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.

The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.

Those of ordinary skill in the art will appreciate that all or part of the steps in the various methods of the above embodiments may be implemented by a program instructing related hardware. The program may be stored in a computer readable storage medium, and the storage medium may include: read-only memory (ROM), random access memory (RAM), magnetic disk or optical disk, and the like.

Those of ordinary skill in the art will appreciate that all or a portion of the steps in implementing the methods of the above embodiments may be implemented by a program to instruct related hardware, where the program may be stored in a computer readable storage medium, where the storage medium may be a read only memory, a magnetic disk or optical disk, etc.

While the foregoing describes the computer device provided by the present invention in detail, those skilled in the art will appreciate that the foregoing description is not meant to limit the invention, the scope of which is defined by the appended claims.

Claims

1. An artificial intelligence-based acute myocardial infarction clinical decision support system, the system comprising:

an acquisition unit for acquiring genetic information of a patient sample;

inputting the genetic features into a deep forest for model construction to obtain a preliminary classification result, comparing the preliminary classification result with the classification labels to generate a loss function, and optimizing a machine learning model based on the loss function to obtain a trained classification model;

the deep forest is cascaded with the original input only in the last layer, and adaptive pruning is performed on the decision trees in the deep forest algorithm;

the process of self-adaptive pruning of the decision tree comprises the following steps:

step 2: calculating a loss function of the node rt;

step 3: calculating the overall loss function of the decision tree;

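2. The artificial intelligence based acute myocardial infarction clinical decision support system as set forth in claim 1 wherein the loss function of the node rt is:

$$L(rt) = C(rt) + \alpha\,|rt|$$

where $C(rt)$ represents the training error of node rt, $|rt|$ represents the number of leaf nodes of node rt, and $\alpha$ is a non-negative hyperparameter for balancing the training error and the number of leaf nodes.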

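3. The artificial intelligence based acute myocardial infarction clinical decision support system as set forth in claim 1 wherein the overall loss function of the decision tree is:

$$L(rT) = C(rT) + \alpha\,|rT|$$

where $C(rT)$ represents the training error of the decision tree rT, $|rT|$ represents the number of leaf nodes of the decision tree rT, and $\alpha$ is a non-negative hyperparameter controlling the degree of pruning.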

4. The artificial intelligence-based acute myocardial infarction clinical decision support system as set forth in claim 1, wherein the myocardial infarction medicine includes any one or more of the following: epinephrine, dobutamine, norepinephrine, aramine, isoproterenol.

5. The artificial intelligence-based acute myocardial infarction clinical decision support system as set forth in claim 1, wherein the feature extraction of the genetic information is performed by Lasso regression.

6. The artificial intelligence based acute myocardial infarction clinical decision support system as set forth in claim 1 wherein the genetic signature includes one or more of the following genes: NISCH, PLAGL2, TLR1, ARID5B, TAF9B, SERINC, CARKD, EPG5, ANKLE2, BC043227, FAM188A, ervh.4, ZNF26, ERO1LB, TMEM208, EXOC2, SGMS1, BRPF3.

7. The artificial intelligence based acute myocardial infarction clinical decision support system as set forth in claim 1 wherein the genetic features are NISCH and PLAGL2 and the patient is classified based on the NISCH and PLAGL2 to determine if the patient is drug sensitive.

8. An artificial intelligence-based acute myocardial infarction clinical decision support device, characterized in that the device comprises a memory and a processor;

the memory is used for storing program instructions;

obtaining genetic information of a patient sample;

the deep forest is cascaded with the original input only in the last layer, and adaptive pruning is applied to the decision trees of the random forest in the deep forest algorithm;

step 2: calculating a loss function of the node rt;

step 3: calculating the overall loss function of the decision tree;

9. A computer readable storage medium having stored thereon a computer program, wherein the computer program when executed by a processor implements the artificial intelligence based acute myocardial infarction clinical decision support method in the artificial intelligence based acute myocardial infarction clinical decision support apparatus of claim 8.