CN112908487B - Automatic identification method and system for updated content of clinical guideline - Google Patents
- Publication number
- CN112908487B; CN202110418664.XA
- Authority
- CN
- China
- Prior art keywords
- guideline
- clinical
- module
- clinical guideline
- sentence
- Legal status
- Active
Classifications
- G16H70/00: ICT specially adapted for the handling or processing of medical references
- G06F18/22: Matching criteria, e.g. proximity measures
- G06F40/211: Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
- G06F40/284: Lexical analysis, e.g. tokenisation or collocates
- G06F40/289: Phrasal analysis, e.g. finite state techniques or chunking
- G06F40/30: Semantic analysis
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- General Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Physics & Mathematics (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Primary Health Care (AREA)
- Public Health (AREA)
- Medical Informatics (AREA)
- Epidemiology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Machine Translation (AREA)
Abstract
The invention provides a method and system for automatically identifying updated content in clinical guidelines. The method first takes a module hierarchy tree, built in advance from the headings at each level of clinical guidelines. Using this tree, it parses and structurally extracts a first clinical guideline and a second clinical guideline. This yields at least a first guideline module for the first clinical guideline and a second guideline module for the second clinical guideline. The method then determines the difference information between the first guideline module and the second guideline module. It marks corresponding labels at the positions of that difference information in the first clinical guideline and in the second clinical guideline. As a result, differences and changes between clinical guidelines can be found without manually consulting the two guidelines being compared. This improves the efficiency and accuracy of determining those differences and changes.
Description
Technical Field
The invention relates to the technical field of data processing. In particular, it relates to a method and system for automatically identifying updated content in clinical guidelines.
Background
The scope of clinical research (for example, oncology clinical research) keeps expanding, and clinical diagnosis and treatment technologies keep advancing. New medical evidence therefore emerges continuously, and clinical guidelines are updated more and more frequently.
At present, many new versions of clinical guidelines do not provide update notes. Even when update notes are provided, they do not clearly show the knowledge differences between the new and old versions or where exactly those differences are located. Clinicians therefore have to consult the new and old versions manually to find the differences and changes between them. This consumes a great deal of labour and time, and manual review is prone to omissions. As a result, differences and changes between guideline versions are identified with low efficiency and low accuracy.
Disclosure of Invention
In view of this, embodiments of the present invention provide a method and system for automatically identifying updated content in clinical guidelines. They address the low efficiency and low accuracy of current methods for determining differences and changes between clinical guidelines.
To achieve the above object, embodiments of the present invention provide the following technical solutions.
A first aspect of the embodiments of the invention discloses a method for automatically identifying updated content in clinical guidelines. The method comprises the following steps:
using a module hierarchy tree built in advance from the headings at each level of clinical guidelines, parsing and structurally extracting a first clinical guideline and a second clinical guideline, respectively, to obtain at least a first guideline module corresponding to the first clinical guideline and a second guideline module corresponding to the second clinical guideline, where the first guideline module is the text content under a lowest-level heading of the first clinical guideline, and the second guideline module is the text content under a lowest-level heading of the second clinical guideline;
if the first clinical guideline and the second clinical guideline come from the same source and the first clinical guideline contains an update description relative to the second clinical guideline, determining first difference information between the two guidelines from the update description of the first clinical guideline, and marking corresponding labels at the positions of the first difference information in the first clinical guideline and the second clinical guideline, respectively, where each label is an added label, a deleted label or a modified label;
if the first clinical guideline and the second clinical guideline come from different sources, or come from the same source but the first clinical guideline contains no update description, matching the first guideline modules with the second guideline modules, taking each matched first guideline module and second guideline module as a first to-be-processed guideline module and a second to-be-processed guideline module, respectively, and taking each first guideline module that matches none of the second guideline modules as a third to-be-processed guideline module;
determining second difference information between the first clinical guideline and the second clinical guideline from the first to-be-processed guideline module and the second to-be-processed guideline module, and marking corresponding labels at the positions of the second difference information in the first clinical guideline and the second clinical guideline, respectively; and
determining third difference information between the first clinical guideline and the second clinical guideline from the third to-be-processed guideline module and all the second guideline modules, and marking corresponding labels at the positions of the third difference information in the first clinical guideline and the second clinical guideline, respectively.
Preferably, matching the first guideline modules with the second guideline modules comprises:
for each first guideline module, determining the title similarity between the title of the first guideline module and the title of each second guideline module by using a preset deep semantic matching model; and
for each first guideline module, if all the title similarities are smaller than a title similarity threshold, determining that the first guideline module matches none of the second guideline modules; if at least one title similarity is greater than or equal to the title similarity threshold, determining that the first guideline module matches the second guideline module with the highest title similarity.
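The title-matching rule above can be sketched as follows. The patent relies on a preset deep semantic matching model that is not specified here. This sketch therefore substitutes a character-level stand-in, `difflib.SequenceMatcher`, for that model. The threshold of 0.8 and the function names `title_similarity` and `match_modules` are hypothetical. An unmatched module maps to `None`, which corresponds to a third to-be-processed guideline module.

```python
from difflib import SequenceMatcher
from typing import Callable, Dict, List, Optional


def title_similarity(a: str, b: str) -> float:
    """Stand-in for the deep semantic matching model (assumption): case-insensitive character ratio."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_modules(first_titles: List[str], second_titles: List[str],
                  threshold: float = 0.8,
                  sim: Callable[[str, str], float] = title_similarity) -> Dict[int, Optional[int]]:
    """Map each first-module index to its best second-module index, or None if unmatched."""
    matches: Dict[int, Optional[int]] = {}
    for i, t1 in enumerate(first_titles):
        scores = [sim(t1, t2) for t2 in second_titles]
        if not scores or max(scores) < threshold:
            matches[i] = None  # matches no second module: third to-be-processed module
        else:
            # Pick the second module with the highest title similarity.
            matches[i] = max(range(len(scores)), key=scores.__getitem__)
    return matches
```

For example, `match_modules(["Surgical treatment", "Follow-up", "Immunotherapy"], ["Follow-up", "Surgical treatment", "Radiotherapy"])` pairs the first two titles and leaves "Immunotherapy" unmatched. Passing `sim` as a parameter lets a real semantic model replace the stand-in without changing the matching logic.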
Preferably, determining the second difference information between the first clinical guideline and the second clinical guideline from the first to-be-processed guideline module and the second to-be-processed guideline module, and marking corresponding labels at the positions of the second difference information in the first clinical guideline and the second clinical guideline, respectively, comprises:
splitting the first to-be-processed guideline module and the second to-be-processed guideline module into sentences, to obtain a plurality of first sentences of the first to-be-processed guideline module and a plurality of second sentences of the second to-be-processed guideline module;
for the m-th first sentence of the first to-be-processed guideline module, calculating its sentence similarity to each of the H second sentences of the second to-be-processed guideline module, where m is an integer from 1 to x that starts at 1 and increases by 1, x is the total number of first sentences in the first to-be-processed guideline module, H is an integer from 1 to y, and y is the total number of second sentences in the second to-be-processed guideline module;
if the sentence similarity between the m-th first sentence and the n-th second sentence equals 1, determining that the two sentences are identical and applying no label, where n is an integer from 1 to y;
if the sentence similarity between the m-th first sentence and the n-th second sentence is greater than or equal to a sentence similarity threshold and less than 1, marking a modified label at the position of the m-th first sentence in the first clinical guideline and at the position of the n-th second sentence in the second clinical guideline; in addition, when n is greater than m, identifying each third sentence that meets all of the following conditions and marking a deleted label at its position in the second clinical guideline: it precedes the n-th second sentence in the second to-be-processed guideline module, its sentence similarity to the first sentence is below the sentence similarity threshold, and it has not yet been labeled;
if the sentence similarities between the m-th first sentence and all H second sentences are smaller than the sentence similarity threshold, marking an added label at the position of the m-th first sentence in the first clinical guideline.
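The sentence-level labeling rules above can be sketched as follows. The sketch makes several assumptions that the text does not state:

- "the n-th second sentence" is taken to be the second sentence with the highest similarity to the m-th first sentence;
- a second sentence matched as identical counts as already processed, so it is never later marked as deleted;
- `difflib.SequenceMatcher` stands in for the unspecified sentence similarity measure;
- the threshold of 0.8 and the function names are hypothetical.

Indices are 0-based, so the comparison `n > m` is equivalent to the text's 1-based condition.

```python
from difflib import SequenceMatcher
from typing import Callable, Dict, List, Set, Tuple


def ratio(a: str, b: str) -> float:
    """Stand-in sentence similarity in [0, 1]; identical strings score exactly 1.0."""
    return SequenceMatcher(None, a, b).ratio()


def label_sentences(first: List[str], second: List[str], threshold: float = 0.8,
                    sim: Callable[[str, str], float] = ratio) -> Tuple[Dict[int, str], Dict[int, str]]:
    """Return (labels on first sentences, labels on second sentences), keyed by sentence index."""
    first_labels: Dict[int, str] = {}
    second_labels: Dict[int, str] = {}
    processed: Set[int] = set()  # second-sentence indices already handled
    for m, s1 in enumerate(first):
        scores = [sim(s1, s2) for s2 in second]
        if not scores or max(scores) < threshold:
            first_labels[m] = "added"  # no second sentence is similar enough
            continue
        n = max(range(len(scores)), key=scores.__getitem__)  # best-matching second sentence
        processed.add(n)
        if scores[n] == 1.0:
            continue  # identical sentences: no label
        first_labels[m] = "modified"
        second_labels[n] = "modified"
        if n > m:
            # Unprocessed second sentences before n that are dissimilar to s1 are treated as deleted.
            for k in range(n):
                if k not in processed and scores[k] < threshold:
                    second_labels[k] = "deleted"
                    processed.add(k)
    return first_labels, second_labels
```

For example, comparing `["Use drug X at 20 mg daily.", "Follow up yearly.", "Brand new item."]` against `["Old advice A.", "Use drug X at 10 mg daily.", "Follow up yearly."]` gives the following result. The dosage sentence is marked as modified on both sides. "Old advice A." is marked as deleted because the match skips past it. "Brand new item." is marked as added.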
Preferably, determining the third difference information between the first clinical guideline and the second clinical guideline from the third to-be-processed guideline module and all the second guideline modules, and marking corresponding labels at the positions of the third difference information in the first clinical guideline and the second clinical guideline, respectively, comprises:
calculating a first-sentence similarity between the first P% of the content of the first sentence in the third to-be-processed guideline module and each of the second sentences of each second guideline module;
if at least one first-sentence similarity is greater than a first-sentence similarity threshold, determining that the third to-be-processed guideline module matches the second guideline module with the highest first-sentence similarity;
starting from the first P% of the first sentence of the third to-be-processed guideline module, changing the existing labels located after that content in the first clinical guideline into modified labels; and
starting from the second sentence with the highest first-sentence similarity in the second guideline module matched with the third to-be-processed guideline module, changing the existing labels located after that second sentence in the second clinical guideline into modified labels.
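The relocation of an unmatched (third to-be-processed) guideline module can be sketched as follows. The text does not say what the P% prefix is compared against. This sketch assumes it is compared with the same P% prefix of each second sentence. It also assumes that "located behind" means at or after the matched position, and that labels are stored as position-to-label maps. `SequenceMatcher` stands in for the similarity measure. P = 50%, the threshold of 0.8 and all function names are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Optional, Tuple


def head(text: str, p: float) -> str:
    """First p (0 < p <= 1) fraction of a sentence, at least one character."""
    return text[: max(1, int(len(text) * p))]


def locate_unmatched_module(first_sentence: str, second_modules: List[List[str]],
                            p: float = 0.5, threshold: float = 0.8) -> Optional[Tuple[int, int]]:
    """Return (second module index, sentence index) of the best prefix match, or None."""
    best: Optional[Tuple[int, int]] = None
    best_score = -1.0
    prefix = head(first_sentence, p)
    for i, sentences in enumerate(second_modules):
        for j, sentence in enumerate(sentences):
            # Assumption: compare prefix against prefix of the same proportion.
            score = SequenceMatcher(None, prefix, head(sentence, p)).ratio()
            if score > best_score:
                best, best_score = (i, j), score
    # The text requires the best similarity to be strictly greater than the threshold.
    return best if best_score > threshold else None


def relabel_from(labels: Dict[int, str], start: int) -> Dict[int, str]:
    """Turn every existing label at or after position `start` into 'modified'."""
    return {pos: ("modified" if pos >= start else label) for pos, label in labels.items()}
```

Once `locate_unmatched_module` returns a match, `relabel_from` is applied twice. In the first guideline, it is applied from the first sentence of the third to-be-processed module. In the second guideline, it is applied from the matched second sentence.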
Preferably, after the first clinical guideline and the second clinical guideline are respectively parsed and structurally extracted to obtain at least the first guideline module and the second guideline module, the method further comprises:
preprocessing the first guideline module and the second guideline module, respectively, and extracting knowledge features from the preprocessed first guideline module and the preprocessed second guideline module, respectively.
Preferably, marking the modified label at the position of the m-th first sentence in the first clinical guideline and marking the modified label at the position of the n-th second sentence in the second clinical guideline further comprises:
comparing the knowledge features in the m-th first sentence with those in the n-th second sentence to obtain knowledge-feature difference information; and
marking corresponding labels at the positions of the knowledge-feature difference information in the m-th first sentence and the n-th second sentence, respectively.
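The knowledge-feature comparison for a pair of modified sentences can be sketched as follows. The text does not define which knowledge features are extracted. This sketch therefore uses a hypothetical stand-in: numeric values with an optional unit, found by a regular expression. Features present only in the first sentence are reported as added, and features present only in the second sentence are reported as deleted.

```python
import re
from typing import Dict, List

# Hypothetical stand-in for "knowledge features": a number with an optional unit.
FEATURE = re.compile(r"\d+(?:\.\d+)?(?:\s?(?:mg|ml|mm|cm|g|%))?")


def feature_diff(first_sentence: str, second_sentence: str) -> Dict[str, List[str]]:
    """Set difference of extracted features between the m-th first and n-th second sentence."""
    f1 = set(FEATURE.findall(first_sentence))
    f2 = set(FEATURE.findall(second_sentence))
    return {"added": sorted(f1 - f2), "deleted": sorted(f2 - f1)}
```

With `feature_diff("Use drug X at 20 mg daily.", "Use drug X at 10 mg daily.")`, the dose "20 mg" is reported as added and "10 mg" as deleted. This locates the change inside the sentence rather than only flagging the whole sentence as modified.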
Preferably, the method further comprises:
displaying labels of different categories in different display forms.
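Displaying label categories in different forms can be sketched as follows, assuming the output is HTML. Added, deleted and modified sentences are wrapped in the standard `<ins>`, `<del>` and `<mark>` elements. The mapping `TAGS` and the function `render` are hypothetical, and the actual display forms used in the embodiment are not specified in the text.

```python
from html import escape
from typing import Dict, List

# Hypothetical mapping from label category to an HTML element.
TAGS = {"added": "ins", "deleted": "del", "modified": "mark"}


def render(sentences: List[str], labels: Dict[int, str]) -> str:
    """Join sentences, wrapping each labeled one in the element for its category."""
    parts = []
    for i, sentence in enumerate(sentences):
        tag = TAGS.get(labels.get(i, ""))
        text = escape(sentence)  # keep guideline text safe to embed in HTML
        parts.append(f"<{tag}>{text}</{tag}>" if tag else text)
    return " ".join(parts)
```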
Preferably, after the first clinical guideline and the second clinical guideline are respectively parsed and structurally extracted to obtain at least the first guideline module and the second guideline module, the method further comprises:
standardizing the first guideline module and the second guideline module.
Preferably, after the first clinical guideline and the second clinical guideline are respectively parsed and structurally extracted to obtain at least the first guideline module and the second guideline module, the method further comprises:
storing the first guideline modules and the second guideline modules in a database, together with the hierarchical relationships among all the first guideline modules and the hierarchical relationships among all the second guideline modules.
A second aspect of the embodiments of the present invention discloses a system for automatically identifying updated content in clinical guidelines. The system comprises:
a parsing unit, configured to parse and structurally extract a first clinical guideline and a second clinical guideline, respectively, using a module hierarchy tree built in advance from the headings at each level of clinical guidelines, to obtain at least a first guideline module corresponding to the first clinical guideline and a second guideline module corresponding to the second clinical guideline, where the first guideline module is the text content under a lowest-level heading of the first clinical guideline, and the second guideline module is the text content under a lowest-level heading of the second clinical guideline;
a first processing unit, configured to, if the first clinical guideline and the second clinical guideline come from the same source and the first clinical guideline contains an update description relative to the second clinical guideline, determine first difference information between the two guidelines from the update description of the first clinical guideline, and mark corresponding labels at the positions of the first difference information in the first clinical guideline and the second clinical guideline, respectively, where each label is an added label, a deleted label or a modified label;
a second processing unit, configured to match the first guideline modules with the second guideline modules if the first clinical guideline and the second clinical guideline come from different sources, or come from the same source but the first clinical guideline contains no update description; to take each matched first guideline module and second guideline module as a first to-be-processed guideline module and a second to-be-processed guideline module, respectively; and to take each first guideline module that matches none of the second guideline modules as a third to-be-processed guideline module;
a third processing unit, configured to determine second difference information between the first clinical guideline and the second clinical guideline from the first to-be-processed guideline module and the second to-be-processed guideline module, and mark corresponding labels at the positions of the second difference information in the first clinical guideline and the second clinical guideline, respectively; and
a fourth processing unit, configured to determine third difference information between the first clinical guideline and the second clinical guideline from the third to-be-processed guideline module and all the second guideline modules, and mark corresponding labels at the positions of the third difference information in the first clinical guideline and the second clinical guideline, respectively.
The method and system provided by the embodiments of the invention work as follows:
- The first and second clinical guidelines are parsed and structurally extracted into guideline modules using the module hierarchy tree built in advance from the headings at each level.
- When the two guidelines come from the same source and the first contains an update description relative to the second, first difference information is determined from that update description.
- Otherwise, the first and second guideline modules are matched. Second difference information is determined from the matched (first and second to-be-processed) modules. Third difference information is determined from the unmatched (third to-be-processed) modules and all the second guideline modules.
- In each case, corresponding labels are marked at the positions of the difference information in both guidelines.

Differences and changes between clinical guidelines are thus found without manually consulting the two guidelines being compared. This improves the efficiency and accuracy of determining them.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in describing them are briefly introduced below. Evidently, the drawings below show only some embodiments of the present invention. A person skilled in the art can derive other drawings from them without inventive effort.
FIG. 1 is a flowchart of a method for automatically identifying updated content in clinical guidelines provided by an embodiment of the present invention;
FIG. 2 is a flowchart of marking labels in a first clinical guideline and a second clinical guideline provided by an embodiment of the present invention;
FIG. 3 is another flowchart of the method for automatically identifying updated content in clinical guidelines provided by an embodiment of the present invention;
FIG. 4 is a schematic diagram of a summary of key points from a section of the EAU renal cell carcinoma clinical guideline provided by an embodiment of the present invention;
FIG. 5 is a schematic diagram of marking update labels according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of marking labels corresponding to knowledge-feature difference information according to an embodiment of the present invention;
FIG. 7 is another schematic diagram of marking labels corresponding to knowledge-feature difference information according to an embodiment of the present invention;
FIG. 8 is a further schematic diagram of marking labels corresponding to knowledge-feature difference information according to an embodiment of the present invention;
FIG. 9 is a schematic diagram of displaying an update sequence according to an embodiment of the present invention;
FIG. 10 is a block diagram of a system for automatically identifying updated content in clinical guidelines according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments that those skilled in the art obtain from these embodiments without inventive effort fall within the scope of protection of the application.
In the present disclosure, the terms "comprises," "comprising," and any other variants thereof are intended to cover a non-exclusive inclusion. A process, method, article or apparatus that comprises a list of elements therefore includes not only those elements but may also include other elements not expressly listed or inherent to it. Without further limitation, an element introduced by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article or apparatus that comprises it.
As noted in the background, comparing the differences and changes between clinical guidelines currently requires a clinician to consult the guidelines manually. This manual comparison consumes a great deal of labour and resources and is prone to omissions. Comparisons between clinical guidelines are therefore made with low efficiency and low accuracy.
Embodiments of the invention therefore provide a method and system for automatically identifying updated content in clinical guidelines. A first clinical guideline and a second clinical guideline are parsed and structurally extracted, respectively, using a module hierarchy tree built in advance from the headings at each level of clinical guidelines. This yields at least a first guideline module corresponding to the first clinical guideline and a second guideline module corresponding to the second clinical guideline. The difference information between the first guideline module and the second guideline module is then determined. Corresponding labels are marked at the positions of that difference information in the first clinical guideline and the second clinical guideline, respectively. The two guidelines being compared therefore need not be consulted manually to find their differences and changes, which improves the efficiency and accuracy of determining those differences and changes.
It should be noted that a clinical guideline has multiple levels of headings. A guideline module in the embodiments of the present invention (such as the first and second guideline modules referred to below) specifically refers to the text content under a lowest-level heading of the clinical guideline. It follows that each guideline module also has a corresponding heading.
It will be appreciated that the clinical guidelines referred to in the embodiments of the invention (for example, oncology clinical guidelines) are guidelines for a particular healthcare field, such as medical guidelines or clinical practice guidelines.
In the embodiments of the present invention, the aim is to compare a first clinical guideline with a second clinical guideline and find the differences between them. The first clinical guideline may be called the matching clinical guideline, and the second clinical guideline may be called the clinical guideline to be matched.
FIG. 1 shows a flowchart of the method for automatically identifying updated content in clinical guidelines provided by an embodiment of the present invention. Referring to FIG. 1, the method comprises:
Step S101: using a module hierarchy tree built in advance from the headings at each level of clinical guidelines, parse and structurally extract the first clinical guideline and the second clinical guideline, respectively, to obtain at least a first guideline module corresponding to the first clinical guideline and a second guideline module corresponding to the second clinical guideline.
It should be noted that the first guideline module is the text content under a lowest-level heading of the first clinical guideline, and the second guideline module is the text content under a lowest-level heading of the second clinical guideline.
In a specific implementation of step S101, a module hierarchy tree is constructed in advance from the headings at each level of clinical guidelines. Mapping rules between the module hierarchy tree and the guideline modules of a clinical guideline are then formulated, so that guideline modules are mapped onto the module hierarchy tree.
The unstructured first clinical guideline is parsed and structurally extracted with the module hierarchy tree to obtain at least the first guideline module. The unstructured second clinical guideline is parsed and structurally extracted in the same way to obtain at least the second guideline module.
It will be appreciated that parsing and structured extraction of the first clinical guideline yields data in a specified format (for example, CSV). In this data, the columns hold the module names of the first guideline modules and the rows hold their module contents. Likewise, parsing and structured extraction of the second clinical guideline yields data in the specified format, whose columns hold the module names of the second guideline modules and whose rows hold their module contents.
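Extracting guideline modules and writing them in the column-per-module CSV layout described above can be sketched as follows. The text does not describe how headings are recognized in the source document. This sketch therefore assumes Markdown-style headings, with one `#` per level, and treats a heading as lowest-level when no other heading lies beneath it. `extract_modules` and `modules_to_csv` are hypothetical names.

```python
import csv
import io
from typing import Dict, List, Tuple

Path = Tuple[str, ...]


def extract_modules(lines: List[str]) -> Dict[Path, str]:
    """Return {heading path: body text} for lowest-level headings only (assumes '#' headings)."""
    path: List[str] = []
    headings: List[Path] = []
    bodies: Dict[Path, List[str]] = {}
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            level = len(line) - len(line.lstrip("#"))
            path = path[: level - 1] + [line[level:].strip()]
            headings.append(tuple(path))
        else:
            bodies.setdefault(tuple(path), []).append(line)

    def is_leaf(p: Path) -> bool:
        # A heading is lowest-level if no other heading extends its path.
        return not any(len(q) > len(p) and q[: len(p)] == p for q in headings)

    return {p: " ".join(t) for p, t in bodies.items() if p and is_leaf(p)}


def modules_to_csv(modules: Dict[Path, str]) -> str:
    """Header row = module names (leaf headings); second row = module contents."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow([p[-1] for p in modules])
    writer.writerow(list(modules.values()))
    return buffer.getvalue()
```

In this sketch, introductory text under a heading that has subheadings is not treated as a module. This follows the definition of a guideline module as the text under a lowest-level heading.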
Preferably, after the first and second guideline modules are extracted, they are stored in a database (for example, a relational database). The first guideline modules are stored as the data in the specified format described above. The hierarchical relationships among all the first guideline modules and among all the second guideline modules are stored in the database as well. Here, the hierarchical relationships among the first guideline modules specifically means the hierarchical relationships among their module names, and likewise for the second guideline modules.
That is, the database stores two kinds of data. The first is the guideline modules themselves (for example, the CSV data above). The second is the module names together with the hierarchical relationships among them, which constitute the module hierarchy tree. In the database, the guideline modules and the structure between them can be mapped automatically according to the hierarchical relationships among the module names (that is, the module hierarchy tree).
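Storing both kinds of data can be sketched with Python's built-in `sqlite3` module standing in for the relational database. The schema is hypothetical. A `module` table holds module names and contents, and a `hierarchy` table holds parent/child edges between heading names, which together form the module hierarchy tree. The sketch assumes heading names are unique within one guideline.

```python
import sqlite3
from typing import Dict, List, Tuple


def store_modules(conn: sqlite3.Connection, modules: Dict[Tuple[str, ...], str]) -> None:
    """Store module contents and heading edges (hypothetical schema)."""
    conn.execute("CREATE TABLE IF NOT EXISTS module (name TEXT PRIMARY KEY, content TEXT)")
    conn.execute("CREATE TABLE IF NOT EXISTS hierarchy (parent TEXT, child TEXT, UNIQUE (parent, child))")
    for path, content in modules.items():
        # Assumes leaf heading names are unique within one guideline.
        conn.execute("INSERT OR REPLACE INTO module VALUES (?, ?)", (path[-1], content))
        for parent, child in zip(path, path[1:]):
            conn.execute("INSERT OR IGNORE INTO hierarchy VALUES (?, ?)", (parent, child))
    conn.commit()


def children(conn: sqlite3.Connection, parent: str) -> List[str]:
    """Direct children of a heading in the stored module hierarchy tree."""
    rows = conn.execute("SELECT child FROM hierarchy WHERE parent = ? ORDER BY child", (parent,))
    return [row[0] for row in rows]
```

Walking `children` from the top-level heading reproduces the tree, so each module's position in the guideline can be recovered from the database alone.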
Note that, the extracted module contents corresponding to the first guide module and the second guide module may have non-standard contents, so that it is preferable to perform standardization processing on the first guide module and the second guide module.
The normalization is performed as follows (the first and second guideline modules are referred to collectively as guideline modules here). Because the module content (i.e., the text) of a guideline module contains many inserted reference marks, regular-expression matching rules are defined to remove all reference marks from the guideline module. At the same time, the guideline module undergoes noise reduction such as text cleaning, stop-word removal, stemming, conversion of English number words into Arabic numerals, and expansion of abbreviations into their full forms, so that the module content of the guideline module is converted into normalized text.
The normalized first and second guideline modules are then used in the processing described in the following steps.
Step S102: if the first clinical guideline and the second clinical guideline belong to the same source and the first clinical guideline has an update description relative to the second clinical guideline, first difference information between the first clinical guideline and the second clinical guideline is determined from the update description of the first clinical guideline, and corresponding labels are marked at the positions corresponding to the first difference information in the first clinical guideline and the second clinical guideline, respectively.
It should be noted that each label is a new label, a deletion label, or a modification label. A new label marks content added in the first clinical guideline compared with the second clinical guideline, i.e., content that exists only in the first clinical guideline and not in the second. A deletion label marks content deleted from the first clinical guideline compared with the second clinical guideline, i.e., content that exists only in the second clinical guideline and not in the first. A modification label marks content modified in the first clinical guideline compared with the second clinical guideline, i.e., content that is similar but not identical in the two guidelines.
The update description attached to a clinical guideline states which changes were made to which content on which page of the guideline; alternatively, the PDF contains hyperlinks to the updated content, and clicking a hyperlink navigates to the updated part.
The first clinical guideline and the second clinical guideline may come from the same source or from different sources. When they belong to the same source, the first clinical guideline is an updated version of the second clinical guideline. In that case, if the first clinical guideline has an update description (relative to the second clinical guideline), the update description is used to determine the difference information between the two guidelines; if it has no update description, the first guideline modules and the second guideline modules must be compared to determine the difference information between the two guidelines.
Likewise, if the first clinical guideline and the second clinical guideline belong to different sources, the first guideline modules and the second guideline modules must be compared to determine the difference information between the two guidelines.
For example, suppose renal tumor guidelines are available from two sources: one written by the NCCN (i.e., the source of the clinical guideline), whose renal tumor guidelines come with update descriptions, and the other written by the EAU. If the first clinical guideline is the NCCN 2020 V2 version and the second clinical guideline is the NCCN 2020 V1 version, the two guidelines belong to the same source and the first clinical guideline comes with an update description, so the update description is used directly to determine the difference information between them. If the first clinical guideline is the EAU 2020 V2 version and the second clinical guideline is the EAU 2020 V1 version, the two guidelines belong to the same source but the first clinical guideline has no update description, so the first guideline modules and the second guideline modules must be compared to determine the difference information. If the first clinical guideline is the EAU 2020 V2 version and the second clinical guideline is the NCCN 2020 V2 version, the two guidelines belong to different sources, so the first guideline modules and the second guideline modules must likewise be compared to determine the difference information.
In implementing step S102, if the first and second clinical guidelines belong to the same source (in which case the first clinical guideline is the new version of the second clinical guideline) and the first clinical guideline has an update description relative to the second clinical guideline, preset guideline update identification and labeling rules are used to automatically locate and identify the updated guideline modules and updated sentences between the two guidelines, thereby determining the first difference information between them, and corresponding labels are marked at the positions corresponding to the first difference information in the first clinical guideline and the second clinical guideline.
If the first difference information is a modified portion of the first clinical guideline relative to the second clinical guideline (i.e., a guideline module in which a modification occurred and the modified sentences within that module), a modification label is marked at the position corresponding to the modified portion in both the first clinical guideline and the second clinical guideline.
If the first difference information is a newly added portion of the first clinical guideline relative to the second clinical guideline, a new label is marked at the position corresponding to the added portion in the first clinical guideline.
If the first difference information is a deleted portion of the first clinical guideline relative to the second clinical guideline, a deletion label is marked at the position corresponding to the deleted portion in the second clinical guideline.
It should be noted that the guideline update identification and labeling rules are set up as follows: the writing conventions of clinical guideline update descriptions are summarized, and the rules are defined according to those conventions. For example, in some clinical guidelines, content deleted from the old version is described in a sentence containing "removed", content modified in the new version is described in a sentence containing "modified", and content newly added in the new version is listed in a separate table.
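A minimal sketch of such rules as ordered regular expressions over update-description sentences. The keyword lists, the rule order, the page-code pattern (e.g., "KID-2"), and the function name `classify_update` are assumptions for illustration; real rules would be derived from each source's actual writing conventions.

```python
import re

# Hypothetical rules modeled on the conventions described above: update sentences
# mentioning "removed" signal deletions, "modified" signals modifications, and
# "added"/"new" signals additions. Real rules would be written per guideline source.
RULES = [
    ("deletion", re.compile(r"\b(?:removed|deleted)\b", re.IGNORECASE)),
    ("modification", re.compile(r"\b(?:modified|revised|changed)\b", re.IGNORECASE)),
    ("new", re.compile(r"\b(?:added|new)\b", re.IGNORECASE)),
]
# Assumed page-code style for locating the change, e.g. "KID-2".
PAGE_CODE = re.compile(r"\b[A-Z]{2,}-\d+\b")


def classify_update(sentence):
    """Return (label, page_codes) for one update-description sentence, or None."""
    for label, pattern in RULES:
        if pattern.search(sentence):
            return label, PAGE_CODE.findall(sentence)
    return None


print(classify_update("KID-2: Footnote c was removed."))  # ('deletion', ['KID-2'])
```

Rules are checked in order, so the first matching rule wins; a sentence that both removes and adds text would need a finer rule set.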
As described above, the difference information may indicate a modified, newly added, or deleted portion of the first clinical guideline relative to the second clinical guideline. Preferably, when the corresponding labels are marked at the positions corresponding to the first difference information in the first and second clinical guidelines, different categories of labels are displayed in different presentation forms.
That is, new labels, deletion labels, and modification labels are displayed in the first and second clinical guidelines in different presentation forms, so that the three kinds of labels can be told apart.
For example, the three kinds of labels may be displayed as text highlighted in different colors: the modified portions in the first and second clinical guidelines are highlighted in yellow (i.e., modification labels are shown as yellow-highlighted text); the portions newly added in the first clinical guideline relative to the second are highlighted in blue (i.e., new labels are shown as blue-highlighted text); and the portions of the second clinical guideline that were deleted in the first are highlighted in red (i.e., deletion labels are shown as red-highlighted text).
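One possible front-end rendering of the three label types, sketched in Python as HTML `<mark>` elements with inline background colors following the example color scheme above. The `Label` enum, the `render` helper, and the choice of HTML output are assumptions for illustration only.

```python
from enum import Enum
from html import escape


class Label(Enum):
    NEW = "new"
    DELETION = "deletion"
    MODIFICATION = "modification"


# Color scheme taken from the example above; any distinct scheme would do.
HIGHLIGHT = {Label.MODIFICATION: "yellow", Label.NEW: "blue", Label.DELETION: "red"}


def render(sentences):
    """Render (text, Label or None) pairs as an HTML fragment with highlighted labels."""
    parts = []
    for text, label in sentences:
        if label is None:
            parts.append(escape(text))  # unlabeled text is only HTML-escaped
        else:
            parts.append(f'<mark style="background:{HIGHLIGHT[label]}">{escape(text)}</mark>')
    return " ".join(parts)


print(render([("Unchanged step.", None), ("Biopsy is optional.", Label.NEW)]))
```

Keeping the color table separate from the rendering loop makes it easy to swap in another presentation form, as the next paragraph allows.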
This way of displaying new, deletion, and modification labels is merely illustrative; in practical applications, other display forms may be used, and the embodiments of the present invention do not specifically limit how these labels are displayed.
Step S103: if the first clinical guideline and the second clinical guideline belong to different sources, or if they belong to the same source but the first clinical guideline has no update description, the first guideline modules are matched against the second guideline modules; each matched first guideline module and second guideline module are taken as a first to-be-processed guideline module and a second to-be-processed guideline module, respectively, and each first guideline module that matches none of the second guideline modules is taken as a third to-be-processed guideline module.
In implementing step S103, if the first and second clinical guidelines belong to different sources, or if they belong to the same source but the first clinical guideline has no update description relative to the second clinical guideline, the first guideline modules are matched against the second guideline modules; the matched first and second guideline modules are taken as first and second to-be-processed guideline modules, respectively, each first guideline module that matches none of the second guideline modules is taken as a third to-be-processed guideline module, and steps S104 and S105 are then executed.
In a specific implementation, each first guideline module is matched against all the second guideline modules. If a matching second guideline module is found, the first guideline module is taken as a first to-be-processed guideline module, and the second guideline module matched with it is taken as the corresponding second to-be-processed guideline module; if the first guideline module matches none of the second guideline modules, it is taken as a third to-be-processed guideline module.
Thus each pair of first and second to-be-processed guideline modules consists of a first guideline module and the second guideline module matched with it.
It should be noted that the inventors have found through research that the title of a guideline module is a concise summary of the module's content, is short and clearly defined, and generally does not change much as the clinical guideline is revised and updated.
Therefore, the first and second guideline modules are matched as follows: for each first guideline module, a preset deep semantic matching model (a DSSM, i.e., Deep Structured Semantic Model) is used to compute the title similarity between the title of that first guideline module and the title of each second guideline module.
For each first guideline module, if its title similarities with all the second guideline modules are below the title similarity threshold, the first guideline module is determined to match none of the second guideline modules; if its title similarity with at least one second guideline module is greater than or equal to the title similarity threshold, the first guideline module is determined to match the second guideline module with the highest title similarity.
That is, if the title similarities between a first guideline module and all the second guideline modules are below the title similarity threshold, the first guideline module matches none of the second guideline modules and becomes a third to-be-processed guideline module.
If the title similarity between a first guideline module and one or more second guideline modules is greater than or equal to the title similarity threshold, the first guideline module is matched with the second guideline module that has the highest title similarity; the first guideline module then becomes a first to-be-processed guideline module, and that second guideline module becomes the corresponding second to-be-processed guideline module.
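The matching rule above (take the most similar title, and accept it only at or above a threshold) can be sketched as follows. A trained DSSM is out of scope for a short example, so `difflib.SequenceMatcher` is swapped in as a purely lexical stand-in; the threshold of 0.8 and the helper names are assumptions, and in practice the `sim` argument would be replaced by the semantic model's score.

```python
from difflib import SequenceMatcher


def title_similarity(a, b):
    """Stand-in for the DSSM title model: difflib's character-level ratio."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_modules(first_titles, second_titles, threshold=0.8, sim=title_similarity):
    """Match each first-guideline title to its most similar second-guideline title.

    Returns (matches, unmatched): `matches` maps a first title to the second title
    with the highest similarity, provided that similarity >= threshold; `unmatched`
    lists first titles below the threshold for every second title (these become
    the third to-be-processed modules).
    """
    matches, unmatched = {}, []
    for t1 in first_titles:
        # max() keeps the first title among ties.
        best = max(second_titles, key=lambda t2: sim(t1, t2), default=None)
        if best is not None and sim(t1, best) >= threshold:
            matches[t1] = best
        else:
            unmatched.append(t1)
    return matches, unmatched


matches, unmatched = match_modules(
    ["Principles of Surgery", "Follow-up"],
    ["Principles of surgery", "Systemic Therapy"],
)
print(matches)    # {'Principles of Surgery': 'Principles of surgery'}
print(unmatched)  # ['Follow-up']
```

Note that this lets two first modules match the same second module; a stricter variant would enforce one-to-one assignment.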
Next, second difference information between the first and second clinical guidelines is determined from all the first to-be-processed guideline modules and their corresponding second to-be-processed guideline modules, and corresponding labels are marked at the positions corresponding to the second difference information in the first and second clinical guidelines. Then the third to-be-processed guideline modules are used to determine third difference information between the first and second clinical guidelines, and corresponding labels are marked at the positions corresponding to the third difference information in the two guidelines. The details are described below.
Step S104: second difference information between the first clinical guideline and the second clinical guideline is determined from the first and second to-be-processed guideline modules, and corresponding labels are marked at the positions corresponding to the second difference information in the first clinical guideline and the second clinical guideline, respectively.
In implementing step S104, for each matched pair of first and second to-be-processed guideline modules, the pair is used to determine second difference information between the first and second clinical guidelines, and corresponding labels are marked at the positions corresponding to the second difference information in the two guidelines. The second difference information indicates the parts that differ between the matched first and second to-be-processed guideline modules, for example the parts of the first to-be-processed guideline module that are modified, newly added, or deleted compared with the second to-be-processed guideline module.
For how the labels are marked at the positions corresponding to the second difference information in the first and second clinical guidelines, refer to step S102; the details are not repeated here.
When the second difference information is determined from a matched pair of first and second to-be-processed guideline modules, the difference between them (i.e., the second difference information) is determined sentence by sentence. The two modules are therefore split into sentences, the sentence similarity between each sentence of the first to-be-processed guideline module and the sentences of the second to-be-processed guideline module is calculated, and the differences between the two modules (i.e., the second difference information) are finally determined from the calculated sentence similarities.
Note that the sentence similarity is calculated with a specified algorithm, for example by using the Universal Sentence Encoder; the embodiments of the present invention do not specifically limit how sentence similarity is calculated.
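A minimal sketch of the sentence-level comparison for one matched module pair. It assumes a naive sentence splitter, uses `difflib.SequenceMatcher` as a lexical stand-in for an encoder such as the Universal Sentence Encoder, and applies an assumed 0.8 threshold to separate modified from new sentences; the function names are hypothetical.

```python
import re
from difflib import SequenceMatcher


def split_sentences(text):
    """Naive splitter: break after '.', '?' or '!' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.?!])\s+", text.strip()) if s]


def sentence_similarity(a, b):
    """Stand-in for an encoder-based score such as the Universal Sentence Encoder."""
    return SequenceMatcher(None, a, b).ratio()


def diff_modules(first_text, second_text, threshold=0.8):
    """Label the sentences of a matched module pair.

    Each first sentence is compared with every second sentence: an identical best
    match means unchanged (None), a best score >= threshold means 'modification',
    anything lower means 'new'. Second sentences never matched are 'deletion'.
    """
    first, second = split_sentences(first_text), split_sentences(second_text)
    second_labels = ["deletion"] * len(second)  # assume deleted until matched
    first_labeled = []
    for s1 in first:
        scores = [sentence_similarity(s1, s2) for s2 in second]
        j = max(range(len(second)), key=scores.__getitem__, default=None)
        if j is not None and scores[j] == 1.0:
            first_labeled.append((s1, None))
            second_labels[j] = None
        elif j is not None and scores[j] >= threshold:
            first_labeled.append((s1, "modification"))
            second_labels[j] = "modification"
        else:
            first_labeled.append((s1, "new"))
    return first_labeled, list(zip(second, second_labels))


first_labeled, second_labeled = diff_modules(
    "Perform CT of the abdomen. Consider biopsy in select patients. Offer genetic counseling.",
    "Perform CT of the abdomen. Consider biopsy in selected patients. Repeat imaging yearly.",
)
```

The assignment is greedy: if two first sentences pick the same second sentence, the label written last wins, which a fuller implementation would resolve with a proper alignment.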
Step S105: third difference information between the first clinical guideline and the second clinical guideline is determined from the third to-be-processed guideline modules and all the second guideline modules, and corresponding labels are marked at the positions corresponding to the third difference information in the first clinical guideline and the second clinical guideline, respectively.
As described above, a third to-be-processed guideline module is a first guideline module that matches none of the second guideline modules. It should be noted that there are three situations in which the title similarity between a first guideline module and a second guideline module falls below the title similarity threshold (i.e., the two modules are not similar).
Case 1: the module content of the second guideline module was deleted in the first clinical guideline relative to the second clinical guideline, i.e., the content exists in the second clinical guideline but not in the first clinical guideline.
Case 2: the module content of the first guideline module was newly added in the first clinical guideline relative to the second clinical guideline, i.e., the content exists in the first clinical guideline but not in the second clinical guideline.
Case 3: the first guideline module is a third to-be-processed guideline module whose module content has been merged into other guideline modules of the other version of the clinical guideline. Here, "the other version of the clinical guideline" means the second clinical guideline if the content of the third to-be-processed guideline module is in the first clinical guideline, and vice versa.
For case 3, step S105 is implemented as follows: for each third to-be-processed guideline module, third difference information between the first and second clinical guidelines is determined from that module and all the second guideline modules, and corresponding labels are marked at the positions corresponding to the third difference information in the two guidelines. For how these labels are marked, refer to step S102; the details are not repeated here.
The third difference information is determined from a third to-be-processed guideline module and all the second guideline modules as follows: a first-sentence similarity is calculated between the first P% (e.g., the first 20%) of the first sentence of the third to-be-processed guideline module and several (e.g., 5) second sentences of each second guideline module. If a first-sentence similarity exceeds the first-sentence similarity threshold, the third to-be-processed guideline module is matched with the second guideline module that yields the highest first-sentence similarity. That is, if only one first-sentence similarity exceeds the threshold, the third to-be-processed guideline module is matched with the corresponding second guideline module; if several exceed the threshold, it is matched with the second guideline module corresponding to the highest one.
It should be noted that calculating the first-sentence similarity between the first P% (e.g., the first 20%) of the first sentence of the third to-be-processed guideline module and several (e.g., 5) second sentences of each second guideline module specifically means comparing that prefix with at most a given number of second sentences of each second guideline module, for example with at most 5 second sentences.
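A minimal sketch of this prefix matching with P = 20% and at most 5 sentences per second module. Several choices here are assumptions, not the method itself: the first k sentences of each module are the ones compared, the threshold is 0.8, and the score is a stand-in "containment" measure (share of the prefix's characters that `difflib.SequenceMatcher` matches in order), chosen because a symmetric ratio would penalize comparing a short prefix with a full sentence.

```python
from difflib import SequenceMatcher


def containment(prefix, sentence):
    """Stand-in score: share of the prefix's characters matched, in order, in `sentence`."""
    blocks = SequenceMatcher(None, prefix, sentence).get_matching_blocks()
    return sum(block.size for block in blocks) / len(prefix)


def relocate_unmatched(first_sentence, second_modules, p=0.2, k=5, threshold=0.8):
    """Find where an unmatched module's opening text went in the other guideline.

    Compares the first `p` fraction of `first_sentence` with at most `k` sentences
    (assumed here: the first k) of every second module. `second_modules` maps a
    module name to its list of sentences. Returns (module_name, sentence_index)
    for the highest score strictly above `threshold`, or None.
    """
    if not first_sentence:
        return None
    prefix = first_sentence[: max(1, int(len(first_sentence) * p))]
    best, best_score = None, threshold
    for name, sentences in second_modules.items():
        for i, sentence in enumerate(sentences[:k]):
            score = containment(prefix, sentence)
            if score > best_score:
                best, best_score = (name, i), score
    return best


second_modules = {
    "Management of Localized Disease": [
        "Nephrectomy is preferred.",
        "Active surveillance may be considered for small masses.",
    ],
    "Follow-up": ["Imaging every 6 months."],
}
print(relocate_unmatched(
    "Active surveillance is an option for small renal masses under 2 cm.", second_modules
))
```

Returning the sentence index as well as the module name is what lets the relabeling step start from the matched second sentence.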
As described in step S104, labels have already been marked at the positions corresponding to the second difference information in the first and second clinical guidelines, so both guidelines may already carry some labels. After a third to-be-processed guideline module and the second guideline module matched with it are determined, the existing labels in the first clinical guideline from the first P% of the first sentence of the third to-be-processed guideline module onward are changed to modification labels; likewise, the existing labels in the second clinical guideline from the second sentence with the highest first-sentence similarity in the matched second guideline module onward are changed to modification labels.
That is, if the first P% of the first sentence of the third to-be-processed guideline module can be matched to a sentence of a second guideline module (one whose first-sentence similarity exceeds the threshold and is the highest), then in the first clinical guideline, starting from that first P% of the first sentence, the existing labels in the subsequent content (i.e., the new labels marked in step S104) are overwritten with modification labels; and in the second clinical guideline, starting from the second sentence with the highest first-sentence similarity in the matched second guideline module, the existing labels in the subsequent content (i.e., the deletion labels marked in step S104) are overwritten with modification labels.
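The relabeling rule can be sketched as a small list transformation. The per-sentence label list, the string label names, and the assumption that the relabeled span runs to the end of the module's sentences are illustrative choices; `override_with_modification` is a hypothetical helper.

```python
def override_with_modification(labels, start):
    """Overwrite existing labels from index `start` onward with 'modification'.

    `labels` holds one entry per sentence, in document order, for the span being
    relabeled (assumed here: the rest of the module); None means unlabeled and is
    left as is. Returns a new list.
    """
    return [
        label if i < start or label is None else "modification"
        for i, label in enumerate(labels)
    ]


# Second guideline: sentences 1 and 2 were marked as deletions in step S104.
print(override_with_modification([None, "deletion", "deletion", None], 1))
# [None, 'modification', 'modification', None]
```

Only sentences that already carry a label are touched, which mirrors the text: the rule covers existing labels rather than labeling new content.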
Preferably, after steps S104 and S105 are performed, the labels marked in these steps are displayed at the front end, with different types of labels displayed in different presentation forms.
In the embodiments of the present invention, the first clinical guideline and the second clinical guideline are each parsed and structurally extracted according to a module hierarchy tree built in advance from the headings at each level of the clinical guideline, yielding at least the first guideline modules corresponding to the first clinical guideline and the second guideline modules corresponding to the second clinical guideline. The difference information between the first and second guideline modules is then determined, and corresponding labels are marked at the positions corresponding to the difference information in the two guidelines. Differences and changes between clinical guidelines can therefore be found without manually reading and comparing the two guidelines, which improves the efficiency and accuracy of identifying them.
For the process, referred to in step S104 of fig. 1 in the above embodiment, of marking labels at the positions corresponding to the second difference information in the first and second clinical guidelines, refer to fig. 2, which shows a flowchart of labeling the first and second clinical guidelines according to an embodiment of the present invention. The process includes the following steps:
Step S201: the first to-be-processed guideline module and the second to-be-processed guideline module are each split into sentences, yielding a plurality of first sentences of the first to-be-processed guideline module and a plurality of second sentences of the second to-be-processed guideline module.
It should be noted that the sentences of a guideline module may contain knowledge features. Preferably, after the first and second to-be-processed guideline modules are obtained, each is preprocessed and the knowledge features in the preprocessed modules are extracted. The knowledge features of a guideline module (here standing for either the first or the second to-be-processed guideline module) are extracted as follows: using predefined knowledge feature types, the knowledge features in the guideline module are identified and extracted with rule-based or machine learning methods.
Knowledge features in a guideline module include, but are not limited to, medical entity features, grade features, quantity features, and time features, each of which is explained in detail below.
Medical entity features: these are the medical terms involved in disease diagnosis and treatment, such as clinical symptoms, drugs, and examination methods. They mainly describe the objects of the medical activities and actions in a clinical guideline, and are therefore the knowledge clinicians focus on most when studying clinical guidelines.
Grade features: for a medical intervention mentioned in a clinical guideline, the guideline committee assigns a level of evidence or grade of recommendation based on the evidence sources, the way the evidence was obtained, and the degree of expert consensus; the higher the grade, the stronger the evidence and the more strongly the intervention is recommended for clinical practice. For example, the texts of clinical guidelines for each tumor type generally use fixed grade categories and expressions, such as "category + level" and "LE: + level", which appear either directly in the sentence describing a medical intervention or in parentheses at the end of that sentence.
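A rule-based extractor for the two grade expressions mentioned above can be sketched with one regular expression. The accepted grade values ("1" to "4", optionally followed by a letter) are assumptions chosen for illustration; real guideline sources may use other scales.

```python
import re

# Assumed surface forms based on the examples above: "category 2A" and "LE: 1b",
# appearing inline or in parentheses at the end of a sentence.
GRADE = re.compile(
    r"\bcategory\s+(?P<cat>[1-4][AB]?)\b|\bLE:\s*(?P<le>[1-4][a-c]?)\b",
    re.IGNORECASE,
)


def extract_grades(sentence):
    """Return the grade strings found in one sentence, in order of appearance."""
    return [m.group("cat") or m.group("le") for m in GRADE.finditer(sentence)]


print(extract_grades("Partial nephrectomy is preferred (category 2A)."))  # ['2A']
```

Named groups keep the two expression styles distinguishable if a downstream step needs to know which scale a grade came from.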
Quantity features: these describe statistics related to diagnosis and treatment. Besides basic experimental statistics such as sample size, P values, and survival rates, they also describe clinical data such as medication dosage, in vivo substance concentrations, and tumor size. They mainly take two forms: basic words and compound words.
Basic words take three forms: English number words, Arabic numerals, and percentages. English number words take forms such as "sixty-four"; Arabic numerals may or may not contain decimal points, commas, and spaces; and percentages may be written either as an English number word plus "percent" (e.g., "seventy percent") or as Arabic numerals plus "%" (e.g., "70%").
Compound words attach modifiers to basic words to describe or qualify the quantities in a clinical guideline, such as "mg/dL" for dosage, "cm" for tumor size, and "times" for multiples.
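A rule-based sketch for the Arabic-numeral forms of quantity features, with an optional percent sign or unit. The unit list is a hypothetical placeholder, space-separated digit groups are not handled, and English number words are assumed to have been converted to numerals during normalization.

```python
import re

# Hypothetical unit list; a real system would draw on a controlled vocabulary.
UNITS = r"mg/dL|mg/kg|mg|cm|mm|times"
QUANTITY = re.compile(
    r"(?<![\w.])"                                             # not inside a word or number
    r"(?P<value>\d{1,3}(?:,\d{3})+(?:\.\d+)?|\d+(?:\.\d+)?)"  # 1,200 / 2.5 / 70
    r"\s*(?P<unit>%|" + UNITS + r")?"                         # optional unit or percent sign
    r"(?![\w/])"                                              # number/unit must end here
)


def extract_quantities(text):
    """Return (value, unit_or_None) pairs with thousands separators removed."""
    return [(m.group("value").replace(",", ""), m.group("unit"))
            for m in QUANTITY.finditer(text)]


print(extract_quantities(
    "Tumors larger than 4 cm occurred in 70% of 1,200 patients; dose 2.5 mg/kg."
))
```

Longer units are listed before their prefixes ("mg/dL" before "mg") because regex alternation takes the first alternative that matches.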
Time features: these express the time or duration of a medical event or procedure in a clinical guideline and mainly comprise two types: time points and time periods.
A time point is a specific moment, mainly used to state the year of a discovery or experiment in a clinical guideline; its basic form is four Arabic numerals. A time period is a span of time, generally used to express the duration of a medical procedure or a range of years, such as "6 months". Besides ordinary numerals (Arabic numerals or English number words) combined with time words, a time period may consist of time points joined by a connecting word or symbol (such as "2006 through 2015" or "2006-2015"), or of a numeral fused with a time word (such as "5-year").
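A regex sketch of the time-point and time-period forms described above. The year range (1900 to 2099), the list of connectors, and the time words are assumptions; years that fall inside a detected period are reported only as part of that period.

```python
import re

YEAR = r"(?:19|20)\d{2}"  # assumed: four-digit years from 1900 to 2099
TIME_PERIOD = re.compile(
    rf"\b{YEAR}\s*(?:-|to|through)\s*{YEAR}\b"      # 2006-2015, 2006 through 2015
    r"|\b\d+(?:\s+|-)(?:year|month|week|day)s?\b",  # 6 months, 5-year
    re.IGNORECASE,
)
TIME_POINT = re.compile(rf"\b{YEAR}\b")


def extract_time(text):
    """Return (periods, points); years inside a period are not reported again."""
    periods = list(TIME_PERIOD.finditer(text))
    spans = [(m.start(), m.end()) for m in periods]
    points = [m.group() for m in TIME_POINT.finditer(text)
              if not any(start <= m.start() < end for start, end in spans)]
    return [m.group() for m in periods], points


print(extract_time(
    "Patients treated 2006 through 2015 were followed for 5-year survival; "
    "a 2019 trial used 6 months of therapy."
))
```

Periods are matched first so that a range such as "2006-2015" is kept whole instead of being split into two separate time points.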
Only some of the knowledge features have been described in detail above; other types are not described here. Those skilled in the art can decide which knowledge features to extract according to the actual situation, and no specific limitation is imposed here.
In implementing step S201, the first and second to-be-processed guideline modules are split into sentences, and the knowledge features in each sentence undergo semantic expansion and normalization, yielding a plurality of first sentences of the first to-be-processed guideline module and a plurality of second sentences of the second to-be-processed guideline module.
The knowledge features in each sentence of a guideline module (standing for either the first or the second to-be-processed guideline module) are semantically expanded and normalized as follows: the guideline module is first split into sentences, and the knowledge features in its sentences are then semantically expanded and standardized based on specified disease codes (such as ICD-10) and controlled vocabularies. This resolves the lexical and structural heterogeneity of the knowledge features in the sentences and yields semantically expanded, normalized sentences. Lexical heterogeneity includes synonyms, abbreviations, acronyms, capitalization, and morphological variants; structural heterogeneity includes word order, separators, and omissions.
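A minimal sketch of vocabulary-based normalization that handles capitalization, separators, abbreviations, and synonyms in a single regex pass. The vocabulary is a hypothetical toy example, and the mapping of renal cell carcinoma to ICD-10 C64 (malignant neoplasm of kidney, except renal pelvis) is shown only to illustrate attaching a disease code.

```python
import re

# Illustrative controlled vocabulary: surface variant -> preferred term.
VOCAB = {
    "rcc": "renal cell carcinoma",
    "renal cell cancer": "renal cell carcinoma",
    "kidney cancer": "renal cell carcinoma",
    "ct": "computed tomography",
    "ct scan": "computed tomography",
}
# Illustrative mapping from preferred terms to disease codes.
CODES = {"renal cell carcinoma": "ICD-10 C64"}

# One alternation, longest variants first, so "ct scan" wins over "ct".
VARIANT = re.compile(
    r"\b(" + "|".join(re.escape(v) for v in sorted(VOCAB, key=len, reverse=True)) + r")\b"
)


def normalize_entities(sentence):
    """Fold case, unify separators, and map variants to preferred terms in one pass."""
    s = sentence.lower()                    # capitalization differences
    s = re.sub(r"[-_/]+", " ", s)           # separators, e.g. "CT-scan" -> "ct scan"
    s = re.sub(r"\s+", " ", s).strip()
    s = VARIANT.sub(lambda m: VOCAB[m.group(1)], s)
    codes = [code for term, code in CODES.items() if term in s]
    return s, codes


print(normalize_entities("CT-scan showed RCC; kidney cancer confirmed."))
```

A single pass avoids re-matching text that was just substituted. The separator rule is deliberately blunt and would also split tokens such as "5-year", so in practice it would be applied only to entity spans.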
Step S202: for the m-th first sentence of the first to-be-processed guideline module, calculate its sentence similarity with H second sentences of the second to-be-processed guideline module.
Here m is an integer from 1 to x, where x is the total number of first sentences in the first to-be-processed guideline module; m starts at 1 and increases by 1. H is an integer from 1 to y, where y is the total number of second sentences in the second to-be-processed guideline module.
As shown in fig. 1 of the above embodiment, the first and second to-be-processed guideline modules correspond to each other. In step S202, a traversal similarity calculation is performed for each first sentence of the first to-be-processed guideline module, based on the knowledge features of the sentences produced in step S201 and on the attributes of those features (such as their category and position); that is, the sentence similarity between the m-th first sentence and H second sentences of the second to-be-processed guideline module is calculated.
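This section does not state the similarity formula, so the sketch below is only one plausible instance, stated as an assumption: an equally weighted mix of the Jaccard index over normalized tokens and over (category, term) knowledge-feature pairs. Feature positions, which the source also uses, are ignored for brevity. Identical sentences with identical features score exactly 1.0, which is the case step S203 relies on.

```python
def jaccard(a, b):
    """Jaccard index of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def sentence_similarity(s1, s2, f1, f2, alpha=0.5):
    """
    s1, s2: normalized sentences; f1, f2: sets of (category, term) knowledge features.
    Assumed formula: alpha * token Jaccard + (1 - alpha) * feature Jaccard.
    """
    token_sim = jaccard(set(s1.split()), set(s2.split()))
    feature_sim = jaccard(set(f1), set(f2))
    return alpha * token_sim + (1 - alpha) * feature_sim
```

With this formula, two sentences that differ only in the recommended drug score about 0.33, below the 0.51 threshold used in process A4.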
Concretely, starting from the 1st first sentence (m=1) of the first to-be-processed guideline module, its sentence similarity with H second sentences of the second to-be-processed guideline module is calculated; the process then moves to the 2nd first sentence (m=2) and calculates its similarity with H second sentences, and so on until the similarity between the x-th first sentence (m=x) and H second sentences has been calculated.
To save computing resources and improve efficiency, a sentence iteration step H is set, so that the m-th first sentence is compared with only part (H) of the second sentences of the second to-be-processed guideline module rather than with all of them.
In other words, the m-th first sentence is compared with at most H second sentences of the second to-be-processed guideline module, namely the H positions that follow the second sentence whose similarity with the (m-1)-th first sentence reached the sentence similarity threshold.
For example, suppose the sentence iteration step is 5 (H=5). The 1st first sentence (m=1) of the first to-be-processed guideline module is compared with at most the first five second sentences (the 1st to the 5th inclusive) of the second to-be-processed guideline module; suppose its similarity with the 1st second sentence reaches the sentence similarity threshold. The 2nd first sentence (m=2) is then compared with at most five second sentences, here the 2nd to the 6th; suppose its similarity with the 3rd second sentence reaches the threshold. The 3rd first sentence (m=3) is then compared with at most five second sentences, here the 4th to the 8th, and the calculation continues in the same way.
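The bounded forward traversal in this example can be sketched as follows. Indices are 0-based, so the 1st to 5th second sentences in the text are indices 0 to 4. Two details are assumptions not stated in the source: the highest-scoring candidate in the window is taken as the match, and when a first sentence finds no match the next window starts where the previous one did. The name `bounded_traversal` is hypothetical.

```python
def bounded_traversal(sim, threshold, H):
    """
    sim[m][j]: similarity between first sentence m and second sentence j (0-based).
    Returns one (window, match) pair per first sentence, where window lists the
    compared second-sentence indices and match is the chosen index or None.
    """
    y = len(sim[0]) if sim else 0
    start = 0                                  # first candidate of the next window
    results = []
    for row in sim:
        window = list(range(start, min(start + H, y)))
        # Assumption: the highest-scoring candidate in the window is the match.
        best = max(window, key=lambda j: row[j], default=None)
        if best is not None and row[best] >= threshold:
            results.append((window, best))
            start = best + 1                   # next window begins after the match
        else:
            results.append((window, None))     # assumption: window start unchanged
    return results
```

Run on the example (matches at the 1st and 3rd second sentences, H=5), the windows are indices 0 to 4, 1 to 5 and 3 to 7, i.e. the 1st to 5th, 2nd to 6th and 4th to 8th second sentences.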
The value of the sentence iteration step H may be set according to the actual situation and is not specifically limited here.
Step S203: if the sentence similarity between the m-th first sentence and the n-th second sentence equals 1, the two sentences are determined to be identical and no label is applied.
Here n is an integer from 1 to y, and n lies within the range of sequence numbers of the H second sentences compared with the m-th first sentence.
In step S203, a sentence similarity of 1 between the m-th first sentence of the first to-be-processed guideline module and the n-th second sentence of the second to-be-processed guideline module indicates that the two sentences are identical. In that case, neither the position of the m-th first sentence in the first to-be-processed guideline module nor the position of the n-th second sentence in the second to-be-processed guideline module is labeled.
Step S204: if the sentence similarity between the m-th first sentence and the n-th second sentence is greater than or equal to the sentence similarity threshold and less than 1, mark a modification label at the position of the m-th first sentence in the first clinical guideline and at the position of the n-th second sentence in the second clinical guideline. In addition, when n is greater than m, determine each third sentence, i.e. each second sentence located before the n-th second sentence in the second to-be-processed guideline module whose similarity with the first m first sentences is below the sentence similarity threshold and which has not yet been labeled, and mark a deletion label at the position of the third sentence in the second clinical guideline.
In step S204, a sentence similarity that reaches the threshold but is less than 1 indicates that the m-th first sentence in the first to-be-processed guideline module contains modified content relative to the n-th second sentence in the second to-be-processed guideline module. A modification label is then marked at the position of the m-th first sentence in the first to-be-processed guideline module of the first clinical guideline, and at the position of the n-th second sentence in the second to-be-processed guideline module of the second clinical guideline.
In addition, when n is greater than m, the third sentences are determined: the second sentences located before the n-th second sentence in the second to-be-processed guideline module whose similarity with each of the first m first sentences is below the sentence similarity threshold and which have not been labeled. A deletion label is marked at the position of each third sentence in the second to-be-processed guideline module of the second clinical guideline.
For example, suppose the similarity between the 4th first sentence and the 5th second sentence reaches the threshold but is less than 1. Since n=5 is greater than m=4, the third sentences are determined among the second sentences before the 5th second sentence in the second to-be-processed guideline module: those whose similarities with the first four first sentences are all below the threshold and which have not been labeled (suppose this is the 2nd second sentence). A deletion label is then marked at the position of the 2nd second sentence in the second clinical guideline.
When the m-th first sentence in the first to-be-processed guideline module is found to contain modified content relative to the n-th second sentence in the second to-be-processed guideline module, this may also mean that the knowledge features of the m-th first sentence have been updated relative to those of the n-th second sentence. In that case, the differences between the knowledge features of the two sentences must be determined.
Preferably, after step S204, the knowledge features in the m-th first sentence and the n-th second sentence are compared, based on the knowledge features of the sentences produced in step S201 and their attributes (such as category and position), to obtain knowledge feature difference information, and corresponding labels are marked at the positions of that difference information in the m-th first sentence and in the n-th second sentence.
That is, labels are marked at the positions of the knowledge features that were updated in the m-th first sentence and the n-th second sentence, indicating that these knowledge features changed between the first to-be-processed guideline module of the first clinical guideline and the second to-be-processed guideline module of the second clinical guideline.
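A hedged sketch of this comparison: each sentence's knowledge features are assumed to be available as (category, term) pairs, grouped by category and compared as sets, with the first sentence taken to come from the newer version. The category names and the function name `feature_differences` are illustrative, and feature positions are omitted.

```python
from collections import defaultdict

def feature_differences(features_m, features_n):
    """
    features_m: (category, term) pairs of the m-th first sentence (assumed newer);
    features_n: (category, term) pairs of the n-th second sentence (assumed older).
    Returns {category: {"added": [...], "removed": [...]}} for changed categories only.
    """
    def by_category(features):
        grouped = defaultdict(set)
        for category, term in features:
            grouped[category].add(term)
        return grouped

    new, old = by_category(features_m), by_category(features_n)
    diff = {}
    for category in sorted(set(new) | set(old)):
        added = sorted(new[category] - old[category])
        removed = sorted(old[category] - new[category])
        if added or removed:
            diff[category] = {"added": added, "removed": removed}
    return diff
```

Grouping by category lets each kind of difference be shown in its own display form, as described next.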
The different kinds of labels indicating knowledge feature difference information are displayed in different forms, for example: entity knowledge feature differences in red underline and bold font, level knowledge feature differences in cyan underline and bold font, quantity knowledge feature differences in blue underline and bold font, and time knowledge feature differences in green underline. The specific display forms may be set according to actual conditions and are not particularly limited here.
Preferably, the knowledge feature difference information may also be counted and presented as a time series ordered by the release dates of the clinical guidelines, with the different kinds of labels indicating knowledge feature difference information displayed at the front end in different forms.
Step S205: if the sentence similarity between the m-th first sentence and each of the H second sentences is below the sentence similarity threshold, mark a new label at the position of the m-th first sentence in the first clinical guideline.
In step S205, if the similarities between the m-th first sentence and all H second sentences are below the threshold, the m-th first sentence in the first to-be-processed guideline module is new content relative to the second to-be-processed guideline module, and a new label is marked at its position in the first to-be-processed guideline module of the first clinical guideline.
By applying steps S201 to S205 to every first sentence in every first to-be-processed guideline module in turn (m from 1 to x), the second difference information between the first and second clinical guidelines is obtained, and corresponding labels are marked at the positions of that difference information in both guidelines.
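Steps S203 to S205 can be summarized in a short sketch that turns a similarity matrix and a list of matches into labels. It is a hedged reading of the text, not the patent's implementation: indices are 0-based, `matches[m]` is the index of the second sentence chosen for the m-th first sentence (or None when no candidate in its window reached the threshold), a similarity of exactly 1.0 means "identical", and, following step S204, the deletion check runs only when a modification is found with n greater than m. The name `label_sentences` is hypothetical.

```python
def label_sentences(sim, matches, threshold):
    """
    sim[m][n]: similarity between first sentence m and second sentence n (0-based).
    matches[m]: matched second-sentence index for first sentence m, or None.
    Returns (first_labels, second_labels) as {index: label} dicts.
    """
    first_labels, second_labels = {}, {}
    for m, n in enumerate(matches):
        if n is None:                               # S205: nothing reached the threshold
            first_labels[m] = "new"
            continue
        if sim[m][n] == 1.0:                        # S203: identical, no label
            continue
        first_labels[m] = second_labels[n] = "modified"   # S204
        if n > m:
            # Unlabeled second sentences before n that stay below the threshold
            # against first sentences 0..m are treated as deleted.
            for k in range(n):
                if k in second_labels:
                    continue
                if all(sim[i][k] < threshold for i in range(m + 1)):
                    second_labels[k] = "deleted"
    return first_labels, second_labels
```

In the step S204 example (4th first sentence modified against the 5th second sentence), this marks the unmatched 2nd second sentence as deleted.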
In this embodiment, the second difference information between the first and second clinical guidelines is determined at three levels (guideline module, sentence and knowledge feature), and corresponding labels are marked at its positions in both guidelines. Differences and changes between clinical guidelines can thus be found without manually reading the two guidelines side by side, which improves the efficiency and accuracy of identifying them. In addition, by counting knowledge feature difference information as a time series, the temporal evolution of knowledge features in clinical guidelines is mined, and static, unstructured clinical guidelines are converted into structured, knowledge-based and visual representations. This reveals updates to clinical guidelines at multiple levels and in multiple dimensions, helping clinicians see the differences and changes between guidelines intuitively and improving their learning efficiency.
To further illustrate figs. 1 and 2 of the above embodiment, fig. 3 is described below; in fig. 3, the first and second clinical guidelines come from the same source. Referring to fig. 3, which is another flowchart of the method for automatically identifying updated content of clinical guidelines provided by the embodiment of the present invention, the method includes:
Step S301: parse and structurally extract the first clinical guideline and the second clinical guideline to obtain at least a first guideline module corresponding to the first clinical guideline and a second guideline module corresponding to the second clinical guideline.
Step S302: normalize the first guideline module and the second guideline module.
For the normalization of guideline modules in step S302, refer to step S101 of fig. 1 in the above embodiment; it is not described again here.
Step S303: extract the knowledge features in the first guideline module and the second guideline module.
Step S304: if the first clinical guideline has an update description, execute step S305; otherwise, execute step S306.
Step S305: determine first difference information between the first and second clinical guidelines using the update description of the first clinical guideline, mark corresponding labels at the positions of the first difference information in both guidelines, and then execute step S309.
Step S306: match the first guideline modules with the second guideline modules. Each matched pair serves as a first to-be-processed guideline module and a second to-be-processed guideline module, and each first guideline module that matches none of the second guideline modules serves as a third to-be-processed guideline module.
Step S307: determine second difference information between the first and second clinical guidelines from the first and second to-be-processed guideline modules, and mark corresponding labels at the positions of the second difference information in both guidelines.
Step S308: determine third difference information between the first and second clinical guidelines from the third to-be-processed guideline modules and all the second guideline modules, and mark corresponding labels at the positions of the third difference information in both guidelines.
Step S309: display the labels marked in step S305, or in steps S307 and S308, at the front end in different display forms.
The execution principles of steps S301 to S309 are described with figs. 1 and 2 and are not repeated here.
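The matching criterion used in step S306 is described with fig. 1 and is not part of this section. Purely as a hedged illustration, the sketch below matches modules whose title paths are equal once section numbers, letter case and extra whitespace are removed; both this criterion and the names `normalize_title` and `match_modules` are assumptions.

```python
import re

def normalize_title(title):
    """Drop a leading section number, extra whitespace and case, so renumbered headings still match."""
    title = re.sub(r"^\s*\d+(?:\.\d+)*\.?\s*", "", title)
    return re.sub(r"\s+", " ", title).strip().lower()

def match_modules(first_modules, second_modules):
    """
    Each argument maps a title path (tuple of headings, root to leaf) to module text.
    Returns (pairs, unmatched): matched (first path, second path) pairs, and the
    first-guideline paths with no counterpart (third to-be-processed modules).
    """
    index = {tuple(map(normalize_title, path)): path for path in second_modules}
    pairs, unmatched = [], []
    for path in first_modules:
        key = tuple(map(normalize_title, path))
        if key in index:
            pairs.append((path, index[key]))
        else:
            unmatched.append(path)
    return pairs, unmatched
```

Exact matching on normalized titles is the simplest choice; renamed sections would fall into the unmatched list and be handled as third to-be-processed guideline modules.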
To further illustrate the embodiments shown in figs. 1 to 3, they are explained below through processes A1 to A7, using specific clinical guidelines as examples; processes A1 to A7 are for illustration only.
A module hierarchy tree for the kidney tumor clinical guideline is built from its titles at each level, and mapping rules are defined between the module hierarchy tree and the content of each guideline module.
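Building the module hierarchy tree from the titles can be sketched as below. The sketch assumes numbered headings such as "1", "1.1" or "1.1.2" followed by a capitalized title without a period, and treats every other line as body text; the heading pattern and the name `build_module_tree` are hypothetical, and real guideline PDFs may also need font or layout cues.

```python
import re

# Assumed heading shape: "3", "3.1" or "3.1.2", then a capitalized title with no period.
HEADING = re.compile(r"^(\d+(?:\.\d+)*)\.?\s+([A-Z][^.]*)$")

def build_module_tree(lines):
    """
    Returns {title path (root to leaf): body text} for every heading that
    directly owns body text, i.e. the lowest-level guideline modules.
    """
    modules, path, body = {}, [], []

    def flush():
        # Store the body collected under the current heading, if any.
        if path and body:
            modules[tuple(path)] = " ".join(body)

    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        match = HEADING.match(line)
        if match:
            flush()
            depth = match.group(1).count(".") + 1      # "3.1" -> depth 2
            path = path[: depth - 1] + [match.group(2)]
            body = []
        else:
            body.append(line)
    flush()
    return modules
```

Keying each module by its full title path keeps the hierarchy (the tree) and the module text (the leaves) in one structure that can be stored in a database.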
A1: parse and structurally extract the unstructured first and second clinical guidelines according to the module hierarchy tree of the kidney tumor clinical guideline, obtaining at least a first guideline module corresponding to the first clinical guideline and a second guideline module corresponding to the second clinical guideline.
The Java open-source tools Spire.PDF, Spire.Doc and PDFBox are used to process the text and picture content of the renal cell carcinoma clinical guideline documents, and the Python open-source library pdfplumber is used to process the tables in those documents.
The extracted text content of the clinical guidelines (i.e., the first and second clinical guidelines) is stored in TXT and DOCX formats, picture content in PNG format, and table content in CSV format. To make it easy to distinguish the titles at each level from the body text, the titles of each level are stored in a separate TXT file, and the references cited in the guideline are also stored separately. The extracted first and second guideline modules are stored in a database, together with the hierarchical relationships among all first guideline modules and among all second guideline modules.
A2: apply text cleaning, stop-word removal, stemming, conversion of English number words to Arabic numerals, expansion of abbreviations to their full forms and similar processing to each first and second guideline module, converting them into normalized text.
Text cleaning removes all reference marks from the first and second guideline modules using regular-expression matching rules.
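A regular expression for this cleaning step might look as follows. The citation formats it targets (bracketed numbers such as "[12]", "[3, 5]" or "[7-9]") are an assumption about the guidelines' reference style, and `remove_reference_marks` is a hypothetical name.

```python
import re

# Assumed reference styles: "[12]", "[3, 5]", "[7-9]", or a range written with an en dash.
REFERENCE_MARK = re.compile(r"\s*\[\d+(?:\s*[,\u2013-]\s*\d+)*\]")

def remove_reference_marks(text):
    """Delete bracketed numeric citations together with the whitespace before them."""
    return REFERENCE_MARK.sub("", text)
```

Because the pattern requires square brackets, numeric ranges in the running text such as "2-3%" are left untouched.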
Stop words are words that occur frequently in text but carry little meaning, such as "the", "of", "is" and "are", together with punctuation marks; they are removed from the first and second guideline modules by dictionary matching.
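Stop-word removal and the conversion of English number words to Arabic numerals (both part of A2) can be sketched together. The word lists below are deliberately tiny and hypothetical; the source's dictionary is not given, and `normalize_tokens` is an illustrative name.

```python
import re

# Tiny, hypothetical word lists; a real dictionary would be much larger.
STOP_WORDS = {"the", "of", "is", "are", "a", "an", "in", "and", "to", "for"}
NUMBER_WORDS = {"one": "1", "two": "2", "three": "3", "four": "4", "five": "5",
                "six": "6", "seven": "7", "eight": "8", "nine": "9", "ten": "10"}

def normalize_tokens(text):
    """Lower-case, tokenize (dropping punctuation), map number words, drop stop words."""
    tokens = re.findall(r"[a-z0-9]+(?:[.-][a-z0-9]+)*", text.lower())
    tokens = [NUMBER_WORDS.get(token, token) for token in tokens]
    return [token for token in tokens if token not in STOP_WORDS]
```

Tokenizing with a regular expression drops punctuation in the same pass, which covers the punctuation part of the stop-word list.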
Because the renal cell carcinoma clinical guidelines are written in English, words in the first and second clinical guidelines undergo complex morphological changes, such as verb tenses, plural nouns and comparative or superlative adjectives. Stemming therefore unifies the variants of each word and keeps only its stem, which improves the reliability of the similarity calculations between the first and second clinical guidelines.
An abbreviation-to-full-form mapping table is built with a rule-based abbreviation recognition method and right-to-left reverse scanning. Abbreviations in the first and second clinical guidelines are then replaced according to this table, i.e. expanded to the full forms they refer to; for example, the abbreviation "RCC" is replaced with the full form "Renal Cell Carcinoma".
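The right-to-left scan can be illustrated with a sketch in the style of the Schwartz-Hearst abbreviation algorithm, which pairs a parenthesized short form with the words before it and matches the short form's characters from last to first. Whether the patent's rules are exactly these is not stated; the candidate window, the uppercase filter and all function names are assumptions.

```python
import re

# Parenthesized short-form candidates such as "(RCC)" or "(CT)".
PAREN = re.compile(r"\(([A-Za-z][A-Za-z0-9-]{1,9})\)")

def find_long_form(short, candidate):
    """Match the short form's characters right to left inside the candidate text."""
    s_idx, l_idx = len(short) - 1, len(candidate) - 1
    while s_idx >= 0:
        c = short[s_idx].lower()
        if not c.isalnum():                  # ignore hyphens etc. in the short form
            s_idx -= 1
            continue
        # Move left until the character matches; the first short-form
        # character must additionally sit at the start of a word.
        while l_idx >= 0 and (
            candidate[l_idx].lower() != c
            or (s_idx == 0 and l_idx > 0 and candidate[l_idx - 1].isalnum())
        ):
            l_idx -= 1
        if l_idx < 0:
            return None
        l_idx -= 1
        s_idx -= 1
    # Extend to the beginning of the word in which the last match occurred.
    start = candidate.rfind(" ", 0, l_idx + 1) + 1
    return candidate[start:]

def build_abbreviation_map(text):
    """Collect {short form: long form} pairs written as "long form (SF)"."""
    mapping = {}
    for match in PAREN.finditer(text):
        short = match.group(1)
        if not any(ch.isupper() for ch in short):
            continue                         # skip ordinary words in parentheses
        words = text[: match.start()].split()
        max_words = min(len(short) + 5, 2 * len(short))
        candidate = " ".join(words[-max_words:])
        long_form = find_long_form(short, candidate)
        if long_form and len(long_form) > len(short):
            mapping[short] = long_form
    return mapping

def expand_abbreviations(text, mapping):
    """Replace every whole-word short form with its long form in one pass."""
    if not mapping:
        return text
    alternatives = "|".join(map(re.escape, sorted(mapping, key=len, reverse=True)))
    return re.sub(rf"\b({alternatives})\b", lambda m: mapping[m.group(1)], text)
```

Building the table once per guideline and expanding in a single regex pass keeps an expanded long form from being rewritten by another entry.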
A3: one version of the first and second clinical guidelines is selected at random, its knowledge features are identified and extracted, and the results are checked manually to evaluate the effect. Five classes of knowledge features are identified in the renal cell carcinoma clinical guidelines (the category to which the first and second clinical guidelines belong): "clinical manifestation", "treatment method", "therapeutic drug" (excluding second-category combination drugs), "examination method" and "disease" (excluding kidney tumors). These knowledge features are identified and extracted by a combined dictionary- and rule-based method. The identification and extraction are evaluated by precision (P), recall (R) and their harmonic mean (F-measure, F1):
P = (number of correctly identified and extracted knowledge features / total number of identified and extracted knowledge features) × 100%,
R = (number of correctly identified and extracted knowledge features / number of knowledge features that should be identified and extracted) × 100%,
F1 = 2PR / (P + R).
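The three evaluation measures can be computed directly from the set of extracted features and the manually annotated reference set. This is a minimal sketch; `evaluate_extraction` is an illustrative name, and features may be any hashable values such as (category, term) pairs.

```python
def evaluate_extraction(extracted, gold):
    """
    extracted: knowledge features produced by the system;
    gold: knowledge features that should have been extracted (manual annotation).
    Returns precision, recall and F1, each as a percentage.
    """
    extracted, gold = set(extracted), set(gold)
    correct = len(extracted & gold)
    p = correct / len(extracted) if extracted else 0.0
    r = correct / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return 100 * p, 100 * r, 100 * f1
```

For example, 3 correct features out of 4 extracted and 5 expected give P = 75%, R = 60% and F1 of about 66.7%.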
A4: the update description attached to the NCCN kidney cancer clinical guideline covers only the key-point summary section. It summarizes, page by page, the changes to the figures and tables on different pages and does not directly address the body text. The update description is therefore split only according to the key-point summary pages, and no further detailed processing is needed.
In the EAU renal cell carcinoma clinical guideline, the main body of renal cell carcinoma clinical knowledge describes renal cell carcinoma pathology and diagnosis- and treatment-related knowledge section by section; in addition, some sections summarize the key evidence and recommendations in tables, marking the level of evidence and the strength of recommendation, which forms a section key-point summary. Its content is shown in fig. 4, a schematic diagram of the section key-point summary of the EAU renal cell carcinoma clinical guideline. The update description of the EAU renal cell carcinoma clinical guideline consists of tables listing, section by section, the evidence and recommendations newly added to each section key-point summary compared with the previous version.
The above describes the update descriptions of the NCCN kidney cancer and EAU renal cell carcinoma clinical guidelines.
When the PDF documents of the clinical guidelines are parsed, the update description and the key-point summary of each section are extracted and dumped into CSV tables. The update description can then be traversed and each of its entries searched for and located in the section key-point summaries, which determines the content newly added in the first clinical guideline (here, the new version) compared with the second clinical guideline (here, the old version); a new label is marked at the position of that content in the first clinical guideline. Content that was deleted or modified in the first clinical guideline compared with the second must be determined from similarity calculation results in combination with the update description and labeled accordingly; for how the labels are applied, see step S102 of fig. 1 in the above embodiment, which is not repeated here.
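Locating update-description entries in the section key-point summaries could be done with a fuzzy string match. The sketch uses the ratio from Python's standard `difflib.SequenceMatcher` and a hypothetical cut-off of 0.8; the source does not say which matching method or threshold it uses, and `locate_update_entries` is an illustrative name.

```python
from difflib import SequenceMatcher

def locate_update_entries(update_entries, summary_rows, min_ratio=0.8):
    """
    For each update-description entry, find the most similar row of the section
    key-point summaries. Returns {entry index: row index} for entries whose best
    similarity ratio reaches min_ratio (an assumed cut-off).
    """
    located = {}
    for i, entry in enumerate(update_entries):
        scores = [SequenceMatcher(None, entry.lower(), row.lower()).ratio()
                  for row in summary_rows]
        if scores:
            best = max(range(len(scores)), key=scores.__getitem__)
            if scores[best] >= min_ratio:
                located[i] = best
    return located
```

A fuzzy ratio tolerates small wording differences (for example a hyphen versus a space) between the update table and the summary text; entries that locate no row would need manual review.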
The above determines and labels the difference information between the first and second clinical guidelines using the update description; the following determines and labels it without the update description (corresponding to steps S103 to S105 of fig. 1 and to fig. 2 in the above embodiment).
Two versions of the NCCN kidney cancer clinical guideline and two versions of the EAU renal cell carcinoma clinical guideline were selected for comparison (determining the differences and changes between two versions without using the update description), with the sentence similarity threshold set to 0.51.
A5: with the sentence similarity threshold of 0.51 set in process A4, and following steps S103 to S105 of fig. 1 and the steps of fig. 2, the difference information between the two versions of the NCCN kidney cancer clinical guideline, and between the two versions of the EAU renal cell carcinoma clinical guideline, is compared and labeled without the update description, using a bounded forward-order traversal.
The comparisons between the two versions of the NCCN kidney cancer clinical guideline and between the two versions of the EAU renal cell carcinoma clinical guideline can be evaluated with indicators such as precision, recall and F1.
A6: during the comparison in process A5, each third to-be-processed guideline module mentioned in step S103 (a first guideline module that matches none of the second guideline modules) is processed and labeled as described in step S105 of fig. 1, which is not repeated here.
A7: after the two versions are compared, the modified parts (modification labels) in both versions are marked with yellow text highlighting, and the yellow-highlighted passages in the two versions correspond one to one; the newly added parts (new labels) in the new version are marked with blue text highlighting; and the deleted parts (deletion labels) in the old version are marked with red text highlighting.
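The highlight colors of A7 and the underline styles described for knowledge-feature differences could, for example, be rendered as HTML spans. The sketch assumes inline CSS and first-occurrence string replacement of each changed term; the style strings, the dictionary names and `render_sentence` are illustrative, not the front end the source used.

```python
from html import escape

# Sentence-level labels (process A7); light shades stand in for blue and red.
SENTENCE_STYLE = {
    "modified": "background:yellow",
    "new": "background:lightblue",
    "deleted": "background:lightcoral",
}
# Knowledge-feature differences, following the display forms described earlier.
FEATURE_STYLE = {
    "entity": "text-decoration:underline;text-decoration-color:red;font-weight:bold",
    "level": "text-decoration:underline;text-decoration-color:cyan;font-weight:bold",
    "quantity": "text-decoration:underline;text-decoration-color:blue;font-weight:bold",
    "time": "text-decoration:underline;text-decoration-color:green",
}

def render_sentence(text, label=None, features=()):
    """Wrap changed feature terms and, if labeled, the whole sentence in styled spans."""
    html = escape(text)
    for category, term in features:
        term_html = escape(term)
        styled = f'<span style="{FEATURE_STYLE[category]}">{term_html}</span>'
        html = html.replace(term_html, styled, 1)   # first occurrence only
    if label is not None:
        html = f'<span style="{SENTENCE_STYLE[label]}">{html}</span>'
    return html
```

Nesting the feature spans inside the sentence span reproduces the figures' layout: underlined terms inside a yellow-highlighted modified sentence.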
Fig. 5, a schematic diagram of labeling update labels, illustrates how the difference information between two versions of a clinical guideline is highlighted.
Fig. 5 shows the difference information between the body text of the "Epidemiology" guideline module in two versions of the EAU renal cell carcinoma clinical guideline: the 2016 version on the left and the 2018 version on the right.
In fig. 5, yellow-highlighted text is similar but not identical in the two versions of the EAU renal cell carcinoma clinical guideline. In the 2016 version on the left, red-highlighted text marks content that was deleted: it has no corresponding description in the 2018 version. In the 2018 version on the right, blue-highlighted text marks content that was newly added: it does not appear in the 2016 version.
As described for step S204 of fig. 2, for modified content between the two versions (the first and second clinical guidelines in step S204), i.e. the corresponding yellow-highlighted passages, the knowledge features may also have changed. The knowledge feature difference information between the modified passages must then be determined and labeled, with the different kinds of labels indicating knowledge feature difference information displayed in different forms.
As noted above, the renal cell carcinoma clinical guidelines are written in English, so the guideline content in fig. 5 and the text in figs. 6 to 9 below are in English.
In figs. 6 to 8, the left side is part of the 2016 version of the EAU renal cell carcinoma clinical guideline and the right side is part of the 2018 version.
As shown in fig. 6, within the yellow-highlighted passages of the two versions, differences in entity knowledge features are shown with red underline and bold font; that is, the entity knowledge features marked this way changed between the two versions.
As shown in fig. 7, within the yellow-highlighted passages of the two versions, differences in level knowledge features are shown with cyan underline and bold font; that is, the level knowledge features marked this way changed between the two versions.
As shown in fig. 8, within the yellow-highlighted passages of the two versions, differences in quantity knowledge features are shown with blue underline and bold font, and differences in time knowledge features with green underline and bold font; that is, the quantity and time knowledge features marked this way changed between the two versions.
Displaying the update timeline organizes the entity knowledge feature differences among versions of a clinical guideline along the time dimension: by browsing how the entity knowledge features of each version changed compared with the previous or next version, readers can quickly trace how the guideline's knowledge has evolved. Taking the content of the guideline module "targeted therapy for relapsed, advanced or metastatic renal cell carcinoma (Target Therapy of Relapsed or Advanced or Metastatic RCC)" as an example, the update timeline is illustrated by the display schematic in fig. 9.
In the illustration of fig. 9, the "targeted therapy for recurrent, progressive or metastatic renal cell carcinoma" guideline module of the 2016 version of the clinical guideline is newly augmented with three drugs, as compared to the 2015 version of the clinical guideline; compared with the clinical guideline of 2016, the guideline module of 2017 for the targeted treatment of recurrent, progressive or metastatic renal cell carcinoma adds one drug and deletes five drugs; compared to the clinical guideline of 2017, the guideline module of the clinical guideline of 2018 is newly added with one drug.
The content shown bolded in fig. 9 represents the titles of the respective stages from which the clinical guideline is derived.
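Computing such an update timeline reduces to set differences between consecutive versions of the same guideline module. A minimal sketch, assuming each version's module has already been reduced to a set of drug names (that extraction step and the function name `update_timeline` are assumptions); the example data are placeholders, not the contents of fig. 9.

```python
def update_timeline(versions: dict[int, set[str]]) -> list[tuple[int, int, set[str], set[str]]]:
    """Return (previous_year, year, added, deleted) for each version after the earliest one."""
    years = sorted(versions)
    timeline = []
    for prev, cur in zip(years, years[1:]):
        added = versions[cur] - versions[prev]    # present now, absent before
        deleted = versions[prev] - versions[cur]  # present before, absent now
        timeline.append((prev, cur, added, deleted))
    return timeline


# Placeholder drug names for illustration only.
versions = {
    2015: {"drug_a"},
    2016: {"drug_a", "drug_b", "drug_c", "drug_d"},
    2017: {"drug_a", "drug_e"},
}
for prev, cur, added, deleted in update_timeline(versions):
    print(f"{prev} -> {cur}: +{sorted(added)} -{sorted(deleted)}")
```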
Corresponding to the method for automatically identifying updated content of a clinical guideline provided above, and referring to fig. 10, an embodiment of the present invention further provides a block diagram of an automatic identification system for updated content of a clinical guideline. The automatic identification system includes a parsing unit 100, a first processing unit 110, a second processing unit 120, a third processing unit 130 and a fourth processing unit 140;
the parsing unit 100 is configured to parse and structurally extract a first clinical guideline and a second clinical guideline, respectively, according to a module hierarchy tree established in advance from the headings at each level of clinical guidelines, so as to obtain at least a first guideline module corresponding to the first clinical guideline and a second guideline module corresponding to the second clinical guideline, where the first guideline module is the text content under a lowest-level heading of the first clinical guideline, and the second guideline module is the text content under a lowest-level heading of the second clinical guideline.
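One way to build the module hierarchy tree is to treat numbered headings as tree nodes and collect the body text under each lowest-level heading as a guideline module. The sketch assumes headings are numbered like `1`, `1.1`, `1.1.1` at the start of a line; the patent does not specify the heading format, and the names `build_tree` and `leaf_modules` are hypothetical. Under this assumption, a body line that happens to start with a number followed by a space would be misread as a heading.

```python
import re
from dataclasses import dataclass, field

# Assumed heading format: a dotted number such as "2" or "2.1.3", then the title.
HEADING = re.compile(r"^(\d+(?:\.\d+)*)\s+(\S.*)$")


@dataclass
class Node:
    title: str
    level: int
    text: list[str] = field(default_factory=list)
    children: list["Node"] = field(default_factory=list)


def build_tree(lines: list[str]) -> Node:
    """Build the module hierarchy tree; heading depth = number of dotted parts."""
    root = Node("ROOT", 0)
    stack = [root]
    for raw in lines:
        line = raw.strip()
        match = HEADING.match(line)
        if match:
            number, title = match.groups()
            node = Node(title, number.count(".") + 1)
            while stack[-1].level >= node.level:
                stack.pop()  # climb back up to this heading's parent
            stack[-1].children.append(node)
            stack.append(node)
        elif line:
            stack[-1].text.append(line)  # body text belongs to the innermost open heading
    return root


def leaf_modules(node: Node, path: tuple[str, ...] = ()):
    """Yield (heading path, text) for each lowest-level heading, i.e. each guideline module."""
    here = path + (node.title,) if node.level else path
    if node.level and not node.children:
        yield here, " ".join(node.text)
    for child in node.children:
        yield from leaf_modules(child, here)
```

Keeping the full heading path with each module preserves the hierarchy that later steps (title matching, storage) can use.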
The first processing unit 110 is configured to, if the first clinical guideline and the second clinical guideline belong to the same source and the first clinical guideline has an update description relative to the second clinical guideline, determine first difference information between the two guidelines by using that update description, and attach corresponding labels at the positions corresponding to the first difference information in the first clinical guideline and the second clinical guideline, respectively, wherein each label is a newly-added label, a deletion label or a modification label.
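When the same-source guideline ships an update description, the first difference information can be turned into labels by locating each described change in both texts. A minimal sketch, assuming the update description has already been parsed into records holding the kind of change and the affected wording in each version; the record type `UpdateItem`, the label strings and the exact-substring search are hypothetical simplifications, not the patent's procedure.

```python
from typing import NamedTuple

# Hypothetical label strings for the three label kinds.
LABEL_FOR_KIND = {"added": "NEW", "deleted": "DELETE", "modified": "MODIFY"}


class UpdateItem(NamedTuple):
    kind: str         # "added", "deleted" or "modified"
    first_text: str   # wording in the first (newer) guideline, "" if absent
    second_text: str  # wording in the second (older) guideline, "" if absent


def label_from_update_description(first: str, second: str, items: list[UpdateItem]):
    """Return (guideline, start, end, label) for every described change found in either text."""
    labels = []
    for item in items:
        label = LABEL_FOR_KIND[item.kind]
        for name, doc, text in (("first", first, item.first_text), ("second", second, item.second_text)):
            start = doc.find(text) if text else -1  # exact-substring lookup (simplification)
            if start != -1:
                labels.append((name, start, start + len(text), label))
    return labels
```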
The second processing unit 120 is configured to match the first guideline modules with the second guideline modules if the first clinical guideline and the second clinical guideline belong to different sources, or if they belong to the same source but the first clinical guideline has no update description; each matched first guideline module and second guideline module are used as a first to-be-processed guideline module and a second to-be-processed guideline module, respectively, and each first guideline module that matches none of the second guideline modules is used as a third to-be-processed guideline module.
In a specific implementation, the second processing unit 120 is specifically configured to: for each first guideline module, determine the title similarity between the title of the first guideline module and the title of each second guideline module by using a preset deep semantic matching model; and, for each first guideline module, determine that it matches none of the second guideline modules if all of its title similarities are smaller than a title similarity threshold, or, if at least one title similarity is greater than or equal to the title similarity threshold, determine that it matches the second guideline module with the maximum title similarity.
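The matching rule can be sketched with a pluggable similarity function. Here `difflib.SequenceMatcher` is only a lexical stand-in for the preset deep semantic matching model, which the text does not specify; the threshold value 0.8 and the function names are assumptions.

```python
from difflib import SequenceMatcher
from typing import Callable


def lexical_similarity(a: str, b: str) -> float:
    """Stand-in for the deep semantic matching model: a character-level ratio in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_modules(
    first_titles: list[str],
    second_titles: list[str],
    threshold: float = 0.8,
    similarity: Callable[[str, str], float] = lexical_similarity,
) -> tuple[dict[int, int], list[int]]:
    """Return ({first index: matched second index}, [first indices matching nothing])."""
    matched: dict[int, int] = {}
    unmatched: list[int] = []
    for i, title in enumerate(first_titles):
        scores = [similarity(title, other) for other in second_titles]
        if scores and max(scores) >= threshold:
            # Pair with the second module of maximum title similarity (first one on ties).
            matched[i] = max(range(len(scores)), key=scores.__getitem__)
        else:
            unmatched.append(i)  # becomes a third to-be-processed guideline module
    return matched, unmatched


print(match_modules(["Targeted therapy", "Follow-up"], ["Targeted Therapy", "Surgery"]))
```

Passing the similarity as a parameter lets a real semantic model replace the lexical stand-in without changing the threshold logic.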
The third processing unit 130 is configured to determine second difference information between the first clinical guideline and the second clinical guideline according to the first to-be-processed guideline module and the second to-be-processed guideline module, and to attach corresponding labels at the positions corresponding to the second difference information in the first clinical guideline and the second clinical guideline, respectively.
The fourth processing unit 140 is configured to determine third difference information between the first clinical guideline and the second clinical guideline according to the third to-be-processed guideline module and all the second guideline modules, and to attach corresponding labels at the positions corresponding to the third difference information in the first clinical guideline and the second clinical guideline, respectively.
In a specific implementation, the fourth processing unit 140 is specifically configured to: calculate first-sentence similarities between the first P% of the first sentence in the third to-be-processed guideline module and the second sentences of each second guideline module; if at least one first-sentence similarity is greater than a first-sentence similarity threshold, determine that the third to-be-processed guideline module matches the second guideline module with the maximum first-sentence similarity; starting from the first P% of the first sentence of the third to-be-processed guideline module, change the existing labels located after that point in the first clinical guideline into modification labels; and, starting from the second sentence with the maximum first-sentence similarity in the matched second guideline module, change the existing labels located after that sentence in the second clinical guideline into modification labels.
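The fallback for an unmatched module can be sketched as follows. Several details are assumptions, not taken from the text: the prefix is compared with the same-length start of each candidate sentence using `difflib`, P defaults to 50 and the threshold to 0.8, and relabelling is done at sentence granularity (each sentence carries at most one label, with `None` meaning unlabelled). All function names are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Optional


def prefix_similarity(prefix: str, candidate: str) -> float:
    """Assumed measure: compare the prefix with the same-length start of the candidate sentence."""
    return SequenceMatcher(None, prefix.lower(), candidate[: len(prefix)].lower()).ratio()


def match_orphan_module(
    orphan_sentences: list[str],
    second_modules: list[list[str]],
    p: int = 50,
    threshold: float = 0.8,
) -> Optional[tuple[int, int]]:
    """Return (second module index, sentence index) best matching the first P% of the orphan's first sentence."""
    first = orphan_sentences[0]
    prefix = first[: max(1, len(first) * p // 100)]
    best, best_score = None, threshold
    for mi, sentences in enumerate(second_modules):
        for si, sentence in enumerate(sentences):
            score = prefix_similarity(prefix, sentence)
            if score > best_score:  # must be strictly greater than the threshold
                best, best_score = (mi, si), score
    return best


def relabel_from(labels: list[Optional[str]], start: int) -> list[Optional[str]]:
    """Turn every existing label at or after `start` into a modification label; unlabelled entries stay None."""
    return [lab if i < start or lab is None else "MODIFY" for i, lab in enumerate(labels)]


second_modules = [
    ["Surgery is preferred.", "Follow up yearly."],
    ["Sunitinib is recommended first line.", "Monitor blood pressure."],
]
orphan = ["Sunitinib is recommended as first-line therapy.", "Check thyroid."]
print(match_orphan_module(orphan, second_modules))
```

On a match, the orphan module's labels would be relabelled from sentence 0 and the matched module's labels from the returned sentence index.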
According to the embodiment of the present invention, the first clinical guideline and the second clinical guideline are each parsed and structurally extracted according to a module hierarchy tree established in advance from the headings at each level of clinical guidelines, yielding at least a first guideline module corresponding to the first clinical guideline and a second guideline module corresponding to the second clinical guideline; difference information between the first and second guideline modules is then determined, and corresponding labels are attached at the positions corresponding to that difference information in the first and second clinical guidelines, respectively. Differences and changes between clinical guidelines can therefore be found without manually reading through both guidelines, which improves the efficiency and accuracy of identifying them.
Preferably, with reference to fig. 10, the third processing unit 130 includes a sentence splitting subunit, a calculation subunit, a first labeling subunit, a second labeling subunit and a third labeling subunit, which operate as follows:
the sentence splitting subunit is configured to split the first to-be-processed guideline module and the second to-be-processed guideline module into sentences, respectively, to obtain a plurality of first sentences corresponding to the first to-be-processed guideline module and a plurality of second sentences corresponding to the second to-be-processed guideline module.
The calculation subunit is configured to calculate, for the mth first sentence of the first to-be-processed guideline module, the sentence similarity between the mth first sentence and each of H second sentences of the second to-be-processed guideline module, where m is an integer with 1 ≤ m ≤ x, x is the total number of first sentences in the first to-be-processed guideline module, m starts from 1 and increases in steps of 1, H is an integer with 1 ≤ H ≤ y, and y is the total number of second sentences in the second to-be-processed guideline module.
The first labeling subunit is configured to determine, if the sentence similarity between the mth first sentence and the nth second sentence equals 1, that the two sentences are identical, in which case no label is attached, where n is an integer with 1 ≤ n ≤ y.
The second labeling subunit is configured to, if the sentence similarity between the mth first sentence and the nth second sentence is greater than or equal to the sentence similarity threshold and smaller than 1, attach a modification label at the position corresponding to the mth first sentence in the first clinical guideline and a modification label at the position corresponding to the nth second sentence in the second clinical guideline; and, when n is greater than m, determine each third sentence that is located before the nth second sentence in the second to-be-processed guideline module, whose sentence similarity with the first sentence is smaller than the sentence similarity threshold and which has not yet been labeled, and attach a deletion label at the position corresponding to that third sentence in the second clinical guideline.
The third labeling subunit is configured to attach a newly-added label at the position corresponding to the mth first sentence in the first clinical guideline if the sentence similarities between the mth first sentence and all H second sentences are smaller than the sentence similarity threshold.
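The five subunits can be sketched as one loop over first sentences. This is an illustration under assumptions: the regex sentence splitter, the `difflib` ratio used as sentence similarity, the threshold 0.75, the label strings, and the reading of the "H second sentences" as the second sentences not yet paired or labeled are all hypothetical choices, not the patent's specification. Indices are 0-based here; the n > m test is unaffected.

```python
import re
from difflib import SequenceMatcher
from typing import Callable, Optional


def split_sentences(text: str) -> list[str]:
    """Naive splitter (assumption): break after . ! ? or ; followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?;])\s+", text.strip()) if s]


def ratio(a: str, b: str) -> float:
    """Lexical stand-in for the sentence similarity; identical strings give exactly 1.0."""
    return SequenceMatcher(None, a, b).ratio()


def label_sentences(
    first: list[str],
    second: list[str],
    threshold: float = 0.75,
    similarity: Callable[[str, str], float] = ratio,
) -> tuple[list[Optional[str]], list[Optional[str]]]:
    first_labels: list[Optional[str]] = [None] * len(first)
    second_labels: list[Optional[str]] = [None] * len(second)
    handled: set[int] = set()  # second sentences already paired or labeled
    for m, sentence in enumerate(first):
        candidates = [k for k in range(len(second)) if k not in handled]
        if not candidates:
            first_labels[m] = "NEW"
            continue
        # Best remaining second sentence; ties go to the earlier sentence.
        score, n = max(((similarity(sentence, second[k]), k) for k in candidates),
                       key=lambda pair: (pair[0], -pair[1]))
        if score == 1.0:
            handled.add(n)  # identical sentences: no label
        elif score >= threshold:
            first_labels[m] = second_labels[n] = "MODIFY"
            handled.add(n)
            if n > m:  # earlier unmatched second sentences count as deleted
                for k in range(n):
                    if k not in handled and similarity(sentence, second[k]) < threshold:
                        second_labels[k] = "DELETE"
                        handled.add(k)
        else:
            first_labels[m] = "NEW"
    return first_labels, second_labels


first = split_sentences("Sunitinib is first line. Pazopanib is an option. Check thyroid function.")
second = split_sentences("Interferon is an option. Sunitinib is first line. Pazopanib is a first-line option.")
print(label_sentences(first, second))
```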
Preferably, in combination with the content shown in fig. 10, the automatic identification system further includes:
a preprocessing unit, configured to preprocess the first guideline module and the second guideline module, respectively, and to extract knowledge features from the preprocessed first guideline module and the preprocessed second guideline module, respectively.
Correspondingly, the second labeling subunit is further configured to: compare the knowledge features in the mth first sentence with those in the nth second sentence to obtain knowledge feature difference information; and attach corresponding labels at the positions corresponding to the knowledge feature difference information in the mth first sentence and the nth second sentence, respectively.
In the embodiment of the present invention, the second difference information between the first clinical guideline and the second clinical guideline is determined along three dimensions (guideline module, sentence and knowledge feature), and corresponding labels are attached at the positions corresponding to the second difference information in the first and second clinical guidelines, respectively, so that differences and changes between clinical guidelines are found without manually reading through both guidelines, improving the efficiency and accuracy of identifying them. In addition, the knowledge feature difference information is aggregated along a timeline to mine how the knowledge features in a clinical guideline change over time. This converts a static, unstructured clinical guideline into structured, knowledge-level and visual representations and reveals the updated content of the guideline at multiple levels and along multiple dimensions, helping clinicians understand the differences and changes between guidelines intuitively and learn more efficiently.
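The knowledge-feature comparison inside a modified sentence pair can be illustrated for the quantitative and temporal feature types. The regular expressions below are toy assumptions standing in for whatever extractor the preprocessing unit actually uses (the text does not specify one); entity and level features would need their own extractors, and the function names are hypothetical.

```python
import re

# Toy patterns (assumptions): doses/percentages and durations only.
FEATURE_PATTERNS = {
    "quantitative": re.compile(r"\b\d+(?:\.\d+)?\s*(?:mg|ml|g|%)", re.IGNORECASE),
    "temporal": re.compile(r"\b\d+\s*(?:day|week|month|year)s?\b", re.IGNORECASE),
}


def extract_features(sentence: str) -> dict[str, set[str]]:
    """Collect the matched values of each feature type, lower-cased."""
    return {kind: {m.group(0).lower() for m in pattern.finditer(sentence)}
            for kind, pattern in FEATURE_PATTERNS.items()}


def feature_diff(first_sentence: str, second_sentence: str) -> dict[str, tuple[set[str], set[str]]]:
    """Per changed feature type: (values only in the first sentence, values only in the second)."""
    f1, f2 = extract_features(first_sentence), extract_features(second_sentence)
    return {kind: (f1[kind] - f2[kind], f2[kind] - f1[kind])
            for kind in FEATURE_PATTERNS if f1[kind] != f2[kind]}


print(feature_diff("Sunitinib 50 mg daily for 4 weeks.", "Sunitinib 37.5 mg daily for 6 weeks."))
```

Each changed feature type maps naturally onto one of the display styles described for figs. 6 to 8.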
Preferably, in combination with the content shown in fig. 10, the automatic identification system further includes:
a display unit, configured to display labels of different categories in different display forms, respectively.
Preferably, in combination with the content shown in fig. 10, the automatic identification system further includes:
a normalization unit, configured to normalize the first guideline module and the second guideline module.
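The text does not define what normalization involves. As one plausible assumption, the sketch applies Unicode NFKC normalization, which folds full-width Latin letters, digits and spaces (common in Chinese-language documents) to their standard forms, and then collapses whitespace; the function name is hypothetical.

```python
import re
import unicodedata


def normalize_module_text(text: str) -> str:
    """Assumed normalization: NFKC folding plus whitespace collapsing."""
    text = unicodedata.normalize("NFKC", text)  # e.g. full-width "５０ｍｇ" -> "50mg", ideographic space -> " "
    text = re.sub(r"\s+", " ", text)            # collapse runs of spaces and line breaks
    return text.strip()


print(normalize_module_text("  Ｓｕｎｉｔｉｎｉｂ　５０ｍｇ\n daily "))
```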
Preferably, in combination with the content shown in fig. 10, the automatic identification system further includes:
a storage unit, configured to store the first guideline modules and the second guideline modules in a database, together with the hierarchical relationships among all first guideline modules and the hierarchical relationships among all second guideline modules.
In summary, the embodiments of the present invention provide a method and a system for automatically identifying updated content of a clinical guideline. A first clinical guideline and a second clinical guideline are each parsed and structurally extracted according to a module hierarchy tree established in advance from the headings at each level of clinical guidelines, yielding at least a first guideline module corresponding to the first clinical guideline and a second guideline module corresponding to the second clinical guideline; difference information between the first and second guideline modules is then determined, and corresponding labels are attached at the positions corresponding to that difference information in the first and second clinical guidelines, respectively. Differences and changes between clinical guidelines can therefore be found without manually reading through both guidelines, which improves the efficiency and accuracy of identifying them.
In this specification, the embodiments are described progressively; identical or similar parts of the embodiments may be cross-referenced, and each embodiment focuses on its differences from the others. In particular, the system embodiment is substantially similar to the method embodiment, so it is described relatively briefly; for relevant details, refer to the corresponding description of the method embodiment. The system embodiments described above are merely illustrative: units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units, that is, they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the invention without inventive effort.
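A minimal storage sketch, assuming SQLite and an adjacency-list schema in which each heading row points to its parent, so the hierarchy between modules is kept alongside the module text. The table layout and function name are assumptions; the patent does not name a database. Modules are given as (heading path, text) pairs.

```python
import sqlite3


def store_modules(conn: sqlite3.Connection, guideline: str, modules) -> dict:
    """Store (heading path, text) modules; every heading becomes a row linked to its parent."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS module ("
        " id INTEGER PRIMARY KEY,"
        " guideline TEXT NOT NULL,"
        " title TEXT NOT NULL,"
        " parent_id INTEGER REFERENCES module(id),"
        " body TEXT)"  # body stays NULL for non-leaf headings
    )
    ids: dict[tuple, int] = {}
    for path, text in modules:
        parent = None
        for depth in range(1, len(path) + 1):
            key = path[:depth]
            if key not in ids:  # insert each heading only once
                body = text if depth == len(path) else None
                cur = conn.execute(
                    "INSERT INTO module (guideline, title, parent_id, body) VALUES (?, ?, ?, ?)",
                    (guideline, path[depth - 1], parent, body),
                )
                ids[key] = cur.lastrowid
            parent = ids[key]
    conn.commit()
    return ids


conn = sqlite3.connect(":memory:")
store_modules(conn, "2017", [(("Treatment", "Targeted therapy"), "..."), (("Treatment", "Surgery"), "...")])
```

Storing each guideline version under its own `guideline` value lets modules of both versions share one table.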
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative elements and steps are described above generally in terms of functionality in order to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (7)
1. A method for automatically identifying updated content of a clinical guideline, the method comprising:
according to a module hierarchy tree established in advance from the headings at each level of clinical guidelines, parsing and structurally extracting a first clinical guideline and a second clinical guideline, respectively, to obtain at least a first guideline module corresponding to the first clinical guideline and a second guideline module corresponding to the second clinical guideline, wherein the first guideline module is the text content under a lowest-level heading in the first clinical guideline, and the second guideline module is the text content under a lowest-level heading in the second clinical guideline;
If the first clinical guideline and the second clinical guideline belong to the same source and the first clinical guideline has an update description relative to the second clinical guideline, determining first difference information between the first clinical guideline and the second clinical guideline by using the update description of the first clinical guideline, and marking corresponding labels at positions corresponding to the first difference information in the first clinical guideline and the second clinical guideline, respectively, wherein each label is a newly-added label, a deletion label or a modification label: if the first difference information is a part of the first clinical guideline modified relative to the second clinical guideline, marking modification labels at the positions corresponding to the modified part in the first and second clinical guidelines, respectively; if the first difference information is a part newly added to the first clinical guideline relative to the second clinical guideline, marking newly-added labels at the positions corresponding to the newly added part in the first and second clinical guidelines, respectively; and if the first difference information is a part deleted from the first clinical guideline relative to the second clinical guideline, marking deletion labels at the positions corresponding to the deleted part in the first and second clinical guidelines, respectively;
If the first clinical guideline and the second clinical guideline belong to different sources, or if they belong to the same source and the first clinical guideline has no update description, matching the first guideline modules with the second guideline modules, wherein each matched first guideline module and second guideline module are used as a first to-be-processed guideline module and a second to-be-processed guideline module, respectively, and each first guideline module that matches none of the second guideline modules is used as a third to-be-processed guideline module;
determining second difference information between the first clinical guideline and the second clinical guideline according to the first to-be-processed guideline module and the second to-be-processed guideline module, and marking corresponding labels at positions corresponding to the second difference information in the first clinical guideline and the second clinical guideline, respectively;
determining third difference information between the first clinical guideline and the second clinical guideline according to the third to-be-processed guideline module and all the second guideline modules, and marking corresponding labels at positions corresponding to the third difference information in the first clinical guideline and the second clinical guideline respectively;
The process of matching the first guideline module and the second guideline module includes:
for each first guideline module, determining the title similarity between the title of the first guideline module and the title of each second guideline module by using a preset deep semantic matching model;
for each first guideline module, if all of the title similarities are smaller than a title similarity threshold, determining that the first guideline module matches none of the second guideline modules, and if at least one of the title similarities is greater than or equal to the title similarity threshold, determining that the first guideline module matches the second guideline module corresponding to the maximum title similarity;
the determining of second difference information between the first clinical guideline and the second clinical guideline according to the first to-be-processed guideline module and the second to-be-processed guideline module, and the labeling of corresponding labels at positions corresponding to the second difference information in the first clinical guideline and the second clinical guideline, respectively, includes:
splitting the first to-be-processed guideline module and the second to-be-processed guideline module into sentences, respectively, to obtain a plurality of first sentences corresponding to the first to-be-processed guideline module and a plurality of second sentences corresponding to the second to-be-processed guideline module;
For the mth first sentence of the first to-be-processed guideline module, calculating the sentence similarity between the mth first sentence and each of H second sentences of the second to-be-processed guideline module, wherein m is an integer greater than or equal to 1 and less than or equal to x, x is the total number of first sentences contained in the first to-be-processed guideline module, m starts from 1 and increases by 1, H is an integer greater than or equal to 1 and less than or equal to y, and y is the total number of second sentences contained in the second to-be-processed guideline module;
if the sentence similarity between the mth first sentence and the nth second sentence is equal to 1, determining that the mth first sentence and the nth second sentence are identical and performing no labeling, wherein n is an integer greater than or equal to 1 and less than or equal to y;
if the sentence similarity between the mth first sentence and the nth second sentence is greater than or equal to a sentence similarity threshold and less than 1, marking a modification label at the position corresponding to the mth first sentence in the first clinical guideline and a modification label at the position corresponding to the nth second sentence in the second clinical guideline, and, when n is greater than m, determining each third sentence that is located before the nth second sentence in the second to-be-processed guideline module, whose sentence similarity with the first sentence is smaller than the sentence similarity threshold and which has not yet been labeled, and marking a deletion label at the position corresponding to the third sentence in the second clinical guideline;
If the sentence similarities between the mth first sentence and all H second sentences are smaller than the sentence similarity threshold, marking a newly-added label at the position corresponding to the mth first sentence in the first clinical guideline;
the determining third difference information between the first clinical guideline and the second clinical guideline according to the third to-be-processed guideline module and all the second guideline modules, and labeling corresponding labels at positions corresponding to the third difference information in the first clinical guideline and the second clinical guideline, respectively, includes:
calculating first-sentence similarities between the first P% of the first sentence in the third to-be-processed guideline module and a plurality of second sentences of each second guideline module;
if at least one first-sentence similarity is greater than a first-sentence similarity threshold, determining that the third to-be-processed guideline module matches the second guideline module corresponding to the maximum first-sentence similarity;
starting from the first P% of the first sentence of the third to-be-processed guideline module, changing the existing labels located after that point in the first clinical guideline into modification labels;
And starting from the second sentence corresponding to the maximum first-sentence similarity in the second guideline module matched with the third to-be-processed guideline module, changing the existing labels located after that second sentence in the second clinical guideline into modification labels.
2. The method of claim 1, wherein after the first clinical guideline and the second clinical guideline are respectively parsed and structurally extracted to obtain at least a first guideline module corresponding to the first clinical guideline and a second guideline module corresponding to the second clinical guideline, the method further comprises:
preprocessing the first guideline module and the second guideline module, respectively, and extracting knowledge features from the preprocessed first guideline module and the preprocessed second guideline module, respectively.
3. The method of claim 2, wherein marking the modification label at the position corresponding to the mth first sentence in the first clinical guideline and marking the modification label at the position corresponding to the nth second sentence in the second clinical guideline further comprises:
comparing the knowledge features in the mth first sentence with those in the nth second sentence to obtain knowledge feature difference information;
And labeling corresponding labels at the positions corresponding to the knowledge feature difference information in the mth first sentence and the nth second sentence, respectively.
4. The method according to any one of claims 1-2, further comprising:
displaying labels of different categories in different display forms, respectively.
5. The method of claim 2, wherein after the first clinical guideline and the second clinical guideline are respectively parsed and structurally extracted to obtain at least a first guideline module corresponding to the first clinical guideline and a second guideline module corresponding to the second clinical guideline, the method further comprises:
normalizing the first guideline module and the second guideline module.
6. The method of claim 1, wherein after the first clinical guideline and the second clinical guideline are respectively parsed and structurally extracted to obtain at least a first guideline module corresponding to the first clinical guideline and a second guideline module corresponding to the second clinical guideline, the method further comprises:
storing the first guideline modules and the second guideline modules in a database, storing the hierarchical relationships among all first guideline modules in the database, and storing the hierarchical relationships among all second guideline modules in the database.
7. An automatic identification system for updated content of a clinical guideline, characterized in that the automatic identification system is configured to perform the method for automatically identifying updated content of a clinical guideline according to any one of claims 1 to 6, the system comprising:
a parsing unit, configured to parse and structurally extract a first clinical guideline and a second clinical guideline, respectively, according to a module hierarchy tree established in advance from the headings at each level of clinical guidelines, to obtain at least a first guideline module corresponding to the first clinical guideline and a second guideline module corresponding to the second clinical guideline, wherein the first guideline module is the text content under a lowest-level heading in the first clinical guideline, and the second guideline module is the text content under a lowest-level heading in the second clinical guideline;
the first processing unit, configured to, if the first clinical guideline and the second clinical guideline belong to the same source and the first clinical guideline has an update description relative to the second clinical guideline, determine first difference information between the first clinical guideline and the second clinical guideline by using the update description of the first clinical guideline, and label corresponding labels at positions corresponding to the first difference information in the first clinical guideline and the second clinical guideline, respectively, wherein each label is a newly-added label, a deletion label or a modification label: if the first difference information is a part of the first clinical guideline modified relative to the second clinical guideline, marking modification labels at the positions corresponding to the modified part in the first and second clinical guidelines, respectively; if the first difference information is a part newly added to the first clinical guideline relative to the second clinical guideline, marking newly-added labels at the positions corresponding to the newly added part in the first and second clinical guidelines, respectively; and if the first difference information is a part deleted from the first clinical guideline relative to the second clinical guideline, marking deletion labels at the positions corresponding to the deleted part in the first and second clinical guidelines, respectively;
The second processing unit, configured to match the first guideline modules with the second guideline modules if the first clinical guideline and the second clinical guideline belong to different sources, or if they belong to the same source and the first clinical guideline has no update description, wherein each matched first guideline module and second guideline module are used as a first to-be-processed guideline module and a second to-be-processed guideline module, respectively, and each first guideline module that matches none of the second guideline modules is used as a third to-be-processed guideline module;
the third processing unit, configured to determine second difference information between the first clinical guideline and the second clinical guideline according to the first to-be-processed guideline module and the second to-be-processed guideline module, and to label corresponding labels at positions corresponding to the second difference information in the first clinical guideline and the second clinical guideline, respectively;
and the fourth processing unit is used for determining third difference information between the first clinical guideline and the second clinical guideline according to the third to-be-processed guideline module and all the second guideline modules, and labeling corresponding labels at positions corresponding to the third difference information in the first clinical guideline and the second clinical guideline respectively.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110418664.XA CN112908487B (en) | 2021-04-19 | 2021-04-19 | Automatic identification method and system for updated content of clinical guideline |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110418664.XA CN112908487B (en) | 2021-04-19 | 2021-04-19 | Automatic identification method and system for updated content of clinical guideline |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112908487A CN112908487A (en) | 2021-06-04 |
CN112908487B true CN112908487B (en) | 2023-09-22 |
Family
ID=76110634
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110418664.XA Active CN112908487B (en) | 2021-04-19 | 2021-04-19 | Automatic identification method and system for updated content of clinical guideline |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112908487B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113488180B * | 2021-07-28 | 2023-07-18 | Institute of Medical Information, Chinese Academy of Medical Sciences | Clinical guideline knowledge modeling method and system |
CN114398402A * | 2021-12-31 | 2022-04-26 | Beijing Huabin Licheng Technology Co., Ltd. | Structured information extraction and retrieval method, device, electronic equipment and storage medium |
CN114580392B * | 2022-04-29 | 2022-07-29 | Zhongke Yuchen Technology Co., Ltd. | Data processing system for identifying entity |
Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP0747836A1 * | 1995-06-05 | 1996-12-11 | Hitachi, Ltd. | Method and apparatus for comparison of structured documents |
JP2009211480A * | 2008-03-05 | 2009-09-17 | Nec Corp | Structured document processing system, structured document processing method, and structured document processing program |
WO2011107893A1 * | 2010-03-04 | 2011-09-09 | Koninklijke Philips Electronics N.V. | Clinical decision support system with temporal context |
RU2592396C1 * | 2015-02-03 | 2016-07-20 | ABBYY InfoPoisk LLC | Method and system for machine extraction and interpretation of text information |
WO2017163346A1 * | 2016-03-23 | 2017-09-28 | Nomura Research Institute, Ltd. | Text analysis system and program |
CN108491487A * | 2018-03-14 | 2018-09-04 | Chongqing Institute of Green and Intelligent Technology, Chinese Academy of Sciences | A kind of clinical guidelines knowledge encoding method and system |
CN110008304A * | 2019-04-03 | 2019-07-12 | NetEase (Hangzhou) Network Co., Ltd. | The difference visible processing method and device of behavior tree |
WO2020084734A1 * | 2018-10-25 | 2020-04-30 | NEC Corporation | Knowledge generation system, method, and program |
CN111145052A * | 2019-12-26 | 2020-05-12 | Beijing Fayi Technology Co., Ltd. | Structured analysis method and system of judicial documents |
EP3719805A1 * | 2019-04-04 | 2020-10-07 | IQVIA Inc. | Predictive system for generating clinical queries |
KR20210040862A * | 2020-03-31 | 2021-04-14 | Beijing Baidu Netcom Science and Technology Co., Ltd. | Method and apparatus for constructing document heading tree, electronic device and storage medium |
Family Cites Families (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7260773B2 * | 2002-03-28 | 2007-08-21 | Uri Zernik | Device system and method for determining document similarities and differences |
US7788084B2 * | 2006-09-19 | 2010-08-31 | Xerox Corporation | Labeling of work of art titles in text for natural language processing |
US9892111B2 * | 2006-10-10 | 2018-02-13 | Abbyy Production Llc | Method and device to estimate similarity between documents having multiple segments |
US20140129246A1 * | 2012-11-07 | 2014-05-08 | Koninklijke Philips Electronics N.V. | Extension of clinical guidelines based on clinical expert recommendations |
US9921731B2 * | 2014-11-03 | 2018-03-20 | Cerner Innovation, Inc. | Duplication detection in clinical documentation |
US9792549B2 * | 2014-11-21 | 2017-10-17 | International Business Machines Corporation | Extraction of semantic relations using distributional relation detection |
US10878962B2 * | 2016-11-02 | 2020-12-29 | COTA, Inc. | System and method for extracting oncological information of prognostic significance from natural language |
US10839161B2 * | 2017-06-15 | 2020-11-17 | Oracle International Corporation | Tree kernel learning for text classification into classes of intent |
US20190236102A1 * | 2018-01-29 | 2019-08-01 | Planet Data Solutions | System and method for differential document analysis and storage |
US10838996B2 * | 2018-03-15 | 2020-11-17 | International Business Machines Corporation | Document revision change summarization |
US20200027567A1 * | 2018-07-17 | 2020-01-23 | Petuum Inc. | Systems and Methods for Automatically Generating International Classification of Diseases Codes for a Patient Based on Machine Learning |
US20200152336A1 * | 2018-11-10 | 2020-05-14 | International Business Machines Corporation | Automated personalized annotation of clinical guidelines |
CA3122070A1 * | 2018-12-03 | 2020-06-11 | Tempus Labs, Inc. | Clinical concept identification, extraction, and prediction system and related methods |
US10977292B2 * | 2019-01-15 | 2021-04-13 | International Business Machines Corporation | Processing documents in content repositories to generate personalized treatment guidelines |
US11210472B2 * | 2019-05-08 | 2021-12-28 | Tata Consultancy Services Limited | Automated extraction of message sequence chart from textual description |
KR102261078B1 * | 2019-07-08 | 2021-06-03 | Kyung Hee University Industry-Academic Cooperation Foundation | System and method for converting clinical practice guidelines to computer interpretable model |
2021
- 2021-04-19: CN202110418664.XA (CN112908487B), CN, Active
Non-Patent Citations (4)
Title |
---|
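Guidance for updating clinical practice guidelines: a systematic review of methodological handbooks; Robin W. M. Vernooij, et al.; Implementation Science (No. 3); pp. 1-9 * |
Structuring the Chinese disjointed literature-based knowledge discovery system: the key technologies to success; Qian Qing, et al.; Journal of Information Science; Vol. 38, No. 6; pp. 532-539 * |
Research on the structuring of clinical guidelines (临床指南结构化研究); Cui Jiawei (崔佳伟), et al.; Chinese Journal of Medical Library and Information Science (中华医学图书情报杂志); Vol. 29, No. 1; pp. 35-41 * |
Research on knowledge aggregation of user-generated content in multi-source academic new media (多源学术新媒体用户生成内容的知识聚合研究); Tao Xing (陶兴); Information Science and Technology (信息科技), No. 08; I141-6 * |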
Also Published As
Publication number | Publication date |
---|---|
CN112908487A | 2021-06-04 |
Similar Documents
Publication | Title |
---|---|
US10706228B2 | Heuristic domain targeted table detection and extraction technique |
Yang et al. | Automatic detection of protected health information from clinic narratives |
CN107408156B | System and method for semantic search and extraction of relevant concepts from clinical documents |
CN109192255B | Medical record structuring method |
CN112908487B | Automatic identification method and system for updated content of clinical guideline |
US9886427B2 | Suggesting relevant terms during text entry |
JP6022239B2 | System and method for processing data |
Wang | Annotating and recognising named entities in clinical notes |
US20200234801A1 | Methods and systems for healthcare clinical trials |
US10339143B2 | Systems and methods for relation extraction for Chinese clinical documents |
JP4865526B2 | Data mining system, data mining method, and data search system |
Hammami et al. | Automated classification of cancer morphology from Italian pathology reports using Natural Language Processing techniques: A rule-based approach |
US11270073B2 | Method and system for extracting entity information from target data |
Bretschneider et al. | Identifying pathological findings in German radiology reports using a syntacto-semantic parsing approach |
CN113488180B | Clinical guideline knowledge modeling method and system |
Kaur et al. | Comparative analysis of algorithmic approaches for auto-coding with ICD-10-AM and ACHI |
Fang et al. | Human gene name normalization using text matching with automatically extracted synonym dictionaries |
Shin et al. | Natural language processing for large-scale medical image analysis using deep learning |
CN112287664B | Text index data analysis method and system, corresponding equipment and storage medium |
Cotik et al. | Spanish named entity recognition in the biomedical domain |
CN113343680B | Structured information extraction method based on multi-type medical record text |
Žabokrtský et al. | Towards universal segmentations: UniSegments 1.0 |
US11544304B2 | System and method for parsing user query |
Gros et al. | Determining negation scope in german and english medical diagnoses |
Berge et al. | Combining unsupervised, supervised, and rule-based algorithms for text mining of electronic health records-a clinical decision support system for identifying and classifying allergies of concern for anesthesia during surgery |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |