CN118428471B - Atlas relation extraction method based on pre-training model enhancement - Google Patents
- Publication number
- CN118428471B (application CN202410876214.9A)
- Authority
- CN
- China
- Prior art keywords
- suspension
- sequence
- mark
- entity
- relationship
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computing arrangements using knowledge-based models
- G06N5/02—Knowledge representation; Symbolic representation
- G06N5/022—Knowledge engineering; Knowledge acquisition
- G06N5/025—Extracting rules from data
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
Abstract
The invention discloses a graph relation extraction method based on pre-training model enhancement, which comprises the following steps: constructing a relation extraction model comprising grouped suspension marks, a pre-training language model and a relation prediction layer; preprocessing text data and initializing the grouped suspension marks to obtain a feature sequence of the text and suspension marks; calculating an attention mask; using the attention mask to control the feature propagation direction of the pre-training language model and extracting the features of suspension mark pairs; inputting the features of the suspension mark pairs into the relation prediction layer to obtain relation probability vectors; and calculating a loss function over the relation probability vectors, optimizing the loss function to train the relation extraction model, and performing relation extraction with the trained model. The invention proposes an entity pair representation method based on grouped suspension marks: the suspension marks are grouped, each group reuses the features of one head entity, and a dedicated attention mask is designed, so that entity pair features are aggregated efficiently and high-precision relation extraction is achieved at a small computational cost.
Description
Technical Field
The invention relates to the field of deep learning and natural language processing, in particular to a graph relation extraction method based on pre-training model enhancement.
Background
Relation extraction is a natural language processing task that aims to identify and extract relationships between entities in text. Given a piece of text and a labeled pair of entities, the goal of the task is to determine the type or class of relationship between those entities. Relation extraction has important applications and value in natural language processing and information extraction, including but not limited to knowledge graph construction, information retrieval and recommendation, event extraction and intelligence analysis, social network analysis, automatic question answering, and intelligent assistants.
Most current relation extraction methods for medical knowledge graphs require complex relation extraction modules that perform heavy processing on the text features output by a language model, resulting in a large amount of computation and low computational efficiency. A smaller number of methods reduce computation to some extent by introducing suspension marks; however, existing suspension mark methods represent entities inefficiently, which hinders both research on and deployment of such algorithms. Designing a relation extraction method that represents entity features efficiently, by improving the entity representation itself, therefore has both academic and industrial significance.
Disclosure of Invention
The present invention aims to solve at least one of the technical problems in the prior art. To this end, the invention discloses a graph relation extraction method based on pre-training model enhancement. Compared with existing methods, it innovatively proposes an entity pair representation method based on grouped suspension marks: the suspension marks are grouped, each group reuses the features of one head entity, and a dedicated attention mask is designed for the grouped suspension marks, so that entity pair features are aggregated efficiently and high-precision relation extraction is achieved at a small computational cost.
This aim is achieved by a graph relation extraction method based on pre-training model enhancement, which comprises the following steps:
Step 1, constructing a relation extraction model, wherein the relation extraction model comprises grouped suspension marks, a pre-training language model and a relation prediction layer;
Step 2, preprocessing text data and initializing the grouped suspension marks to obtain a feature sequence of the text and suspension marks;
Step 3, calculating an attention mask;
Step 4, using the attention mask to control the feature propagation direction of the pre-training language model and extracting the features of suspension mark pairs;
Step 5, inputting the features of the suspension mark pairs into the relation prediction layer to obtain relation probability vectors;
Step 6, calculating a loss function over the relation probability vectors, optimizing the loss function to train the relation extraction model, and performing relation extraction with the trained model.
Preprocessing the text data and initializing the grouped suspension marks to obtain the feature sequence of the text and suspension marks comprises the following steps:
Step 201, performing word segmentation (tokenization) on the input text to obtain a token sequence;
Step 202, inserting a "<e>" mark before each entity of the token sequence and a "</e>" mark after each entity to mark the entity positions, inserting the start mark "<CLS>" at the head of the token sequence, and inserting the end mark "<SEP>" at the tail of the token sequence;
Step 203, mapping the token sequence into a word vector sequence using the word embedding model of the pre-training language model Roberta-large; with the total number of word tokens denoted $n$ and the total number of entities denoted $m$, the word vector sequence obtained by the mapping is

$E_w = [\,e_{\mathrm{CLS}},\ v_1,\ v_2,\ \ldots,\ v_{n+2m},\ e_{\mathrm{SEP}}\,]$,

where $e_{\mathrm{CLS}}$ denotes the word vector of the start mark "<CLS>", $e_{\mathrm{SEP}}$ the word vector of the end mark "<SEP>", and each $v_k$ is the word vector $e_{w_i}$ of the $i$-th word, the word vector $e^{(i)}_{\langle e\rangle}$ of the $i$-th "<e>" mark, or the word vector of a "</e>" mark; the content of every "<e>" mark is fixed, so the word vectors of all "<e>" marks are identical;
Step 204, obtaining the position embedding sequence of the token sequence using the position embedding model of the pre-training language model Roberta-large; for the token sequence of step 203, the position embedding sequence is

$P = [\,p_{\mathrm{CLS}},\ u_1,\ u_2,\ \ldots,\ u_{n+2m},\ p_{\mathrm{SEP}}\,]$,

where $p_{\mathrm{CLS}}$ denotes the position embedding of the start mark "<CLS>", $p_{\mathrm{SEP}}$ the position embedding of the end mark "<SEP>", and each $u_k$ is the position embedding $p_{w_i}$ of the $i$-th word, the position embedding $p^{(i)}_{\langle e\rangle}$ of the $i$-th "<e>" mark, or the position embedding of a "</e>" mark; every "<e>" mark occupies a different position, so the position embeddings of the "<e>" marks all differ;
Step 205, adding the word vector sequence $E_w$ of the token sequence and the position embedding sequence $P$ of the token sequence element-wise to obtain the feature embedding sequence $H_{\mathrm{text}}$ of the token sequence:

$H_{\mathrm{text}} = E_w + P$;
Step 206, generating the suspension mark features; the feature of the $i$-th suspension mark is the sum of the word vector $e^{(i)}_{\langle e\rangle}$ of the $i$-th "<e>" mark and the position embedding $p^{(i)}_{\langle e\rangle}$ of the $i$-th "<e>" mark:

$s_i = e^{(i)}_{\langle e\rangle} + p^{(i)}_{\langle e\rangle}$,

where $s_i$ denotes the feature of the $i$-th suspension mark;
Step 207, generating the suspension mark feature sequence; since there are $m$ entities and hence $m$ suspension marks, a suspension mark feature sequence containing $m$ groups of suspension marks is generated, the $i$-th group being formed as follows: the feature $s_i$ of the $i$-th suspension mark is placed at the beginning of the $i$-th group, and the other suspension marks are arranged behind it in their order of appearance in the text, where $i = 1, 2, 3, \ldots, m$; the $m$ groups are concatenated in order to obtain a suspension mark feature sequence $S$ of length $m^2$;
Step 208, concatenating the feature embedding sequence $H_{\mathrm{text}}$ of the token sequence and the suspension mark feature sequence $S$:

$H_0 = [\,H_{\mathrm{text}};\ S\,]$,

where $H_0$ denotes the feature sequence of the text and suspension marks.
Calculating the attention mask comprises the following steps:

With the feature embedding sequence $H_{\mathrm{text}}$ of the token sequence having length $L_t$, the suspension mark feature sequence $S$ having length $L_s = m^2$, and the number of entities being $m$, a matrix $A$ of size $(L_t+L_s)\times(L_t+L_s)$ is generated, whose elements are assigned by

$A_{ij} = \begin{cases} 1, & i \le L_t \text{ and } j \le L_t \\ 1, & i > L_t \text{ and } j \le L_t \\ 1, & i > L_t,\ j > L_t,\ \text{and } i,\ j \text{ belong to the same group} \\ 0, & \text{otherwise,} \end{cases}$

where $A$ is the attention mask and $A_{ij}$ denotes the element in row $i$, column $j$.
Using the attention mask to control the feature propagation direction of the pre-training language model and extracting the features of the suspension mark pairs comprises the following steps:
Step 401, inputting the feature sequence $H_0$ of the text and suspension marks into the pre-training language model Roberta-large, with the attention mask $A$ used as the mask for Roberta-large's forward propagation:

$H = \mathrm{Roberta}(H_0, A) \in \mathbb{R}^{(L_t+L_s)\times d}$,

where $H$ denotes the features of the last hidden layer output by Roberta-large, $d$ is the hidden layer dimension of Roberta-large, $L_t$ is the sequence length of the feature embedding sequence $H_{\mathrm{text}}$ of the token sequence, and $L_s$ is the sequence length of the suspension mark feature sequence $S$;
Step 402, selecting the feature of each entity pair from the last-hidden-layer features $H$ output by Roberta-large:

$z_{ij} = \mathrm{index}(H,\ k_{ij})$,

where $z_{ij}$ denotes the feature of the suspension mark pair for the $i$-th and $j$-th entities, $k_{ij}$ is the position in $H$ of the suspension mark of the $j$-th entity within the $i$-th group, and $\mathrm{index}(\cdot)$ denotes the operation of indexing along dimension 0 of the target tensor.
Inputting the features of the suspension mark pairs into the relation prediction layer to obtain the relation probability vector comprises the following steps:
The feature $z_{ij}$ of the suspension mark pair for the $i$-th and $j$-th entities is input into a fully connected layer to obtain the relation prediction vector of the $i$-th and $j$-th entities:

$p_{ij} = \mathrm{softmax}(W z_{ij} + b)$,

where $p_{ij} \in \mathbb{R}^{C}$ denotes the relation prediction vector of the $i$-th and $j$-th entities, $W \in \mathbb{R}^{C\times d}$ the weight matrix of the fully connected layer, $b \in \mathbb{R}^{C}$ the bias vector of the fully connected layer, $C$ the number of relation categories, $d$ the dimension of the suspension mark pair feature, and $\mathrm{softmax}$ is an activation function that normalizes the vector into a probability distribution.
Calculating the loss function over the relation probability vectors, optimizing the loss function to train the relation extraction model, and performing relation extraction with the trained model comprises the following steps:
The cross entropy loss between the relation prediction vector $p_{ij}$ of the $i$-th and $j$-th entities and the true label $y_{ij}$ is calculated as

$\mathcal{L}_{ij} = -\sum_{k=1}^{C} y_{ij}^{(k)} \log p_{ij}^{(k)}$,

where $y_{ij}^{(k)}$ is the true relation label of the $i$-th and $j$-th entities: $y_{ij}^{(k)} = 1$ if the $i$-th and $j$-th entities have the $k$-th relation, otherwise $y_{ij}^{(k)} = 0$; and $p_{ij}^{(k)}$, the $k$-th index value of the relation prediction vector $p_{ij}$, is the model-predicted probability that the $i$-th and $j$-th entities have the $k$-th relation;
The cross entropy losses of all entity pairs are then summed:

$\mathcal{L} = \sum_{(i,j)} \mathcal{L}_{ij}$,

the sum running over all entity pairs, where $\mathcal{L}$ denotes the total cross entropy loss;
$\mathcal{L}$ is optimized using the Adam optimization algorithm to train the relation extraction model.
Compared with the prior art, this technique has the following advantages: it provides a graph relation extraction method based on pre-training model enhancement that innovatively proposes an entity pair representation method based on grouped suspension marks; by grouping the suspension marks, reusing the features of one head entity for each group, and designing a dedicated attention mask for the grouped suspension marks, entity pair features are aggregated efficiently and high-precision relation extraction is achieved at a small computational cost.
Drawings
Fig. 1 shows a schematic flow chart of an embodiment of the invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail below with reference to the accompanying drawings, and it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
A knowledge graph is a collection of knowledge and its links described in structured form: a knowledge representation that organizes entities, attributes and relationships into a graph structure in order to better describe and understand the knowledge and concepts of the world, and that can be used to store, query, reason over and analyze knowledge. Designing a knowledge graph involves not only representing knowledge as a graph structure, but also deciding how to define attributes and relationships, how to connect them, and how to support operations such as querying, reasoning and analysis. This design makes the knowledge graph a powerful tool for storing and manipulating large amounts of complex knowledge.
In this embodiment, the internet holds massive medical knowledge that can be used for medical consultation and health care, but traditional search engines cannot make reasonable judgments according to a patient's actual condition and so cannot meet this need. Building a large-scale medical knowledge graph requires crawling and structuring huge amounts of text data from the internet, and relation extraction is a key link in text structuring; in the process of extracting relations from text, the graph relation extraction method based on pre-training model enhancement is applied to the medical field to extract medically relevant relations. A reliable Chinese medical knowledge system built in this way can help meet people's needs for knowledge about everyday diseases and has high application value.
The medical knowledge graph (Medical Knowledge Graph), as the core of medical artificial intelligence, is essentially a semantic network that reveals relationships between medical entities and can formally describe real-world things and their correlations. In general, a medical knowledge graph is constructed by continuously expanding entities and relationships, starting from manually constructed expert knowledge, through algorithms combined with expert auditing, and contains medical concepts such as diseases, symptoms, drugs and operations together with the various medical relationships among them. Across a wide range of medical scenarios, medical knowledge graphs have proven effective in providing knowledge support for algorithms and medical interpretations of the algorithms' predictions. In the foreseeable future, knowledge graphs will play a vital role in medicine, a field with strong knowledge demands. The graph relation extraction method based on pre-training model enhancement can therefore provide very important support for relation extraction in medical knowledge graphs.
Thus, as shown in Fig. 1, a graph relation extraction method based on pre-training model enhancement comprises:
Step 1, constructing a relation extraction model, wherein the relation extraction model comprises grouped suspension marks, a pre-training language model and a relation prediction layer;
Step 2, preprocessing text data and initializing the grouped suspension marks to obtain a feature sequence of the text and suspension marks;
Step 3, calculating an attention mask;
Step 4, using the attention mask to control the feature propagation direction of the pre-training language model and extracting the features of suspension mark pairs;
Step 5, inputting the features of the suspension mark pairs into the relation prediction layer to obtain relation probability vectors;
Step 6, calculating a loss function over the relation probability vectors, optimizing the loss function to train the relation extraction model, and performing relation extraction with the trained model.
The graph is a medical knowledge graph; its entities comprise diseases, symptoms, drugs and operations, and its relationships comprise disease-symptom, disease-drug, disease-disease, symptom-symptom and disease-operation relationships.
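For concreteness, this schema can be written down as plain data. The following sketch is illustrative only; the identifier names and the extra "no relation" class are assumptions, not part of the invention:

```python
# A minimal sketch of the medical knowledge-graph schema described above,
# encoded as plain Python data. Names are illustrative assumptions.
ENTITY_TYPES = ["disease", "symptom", "drug", "operation"]

RELATION_TYPES = [
    ("disease", "symptom"),    # disease-symptom relationship
    ("disease", "drug"),       # disease-drug relationship
    ("disease", "disease"),    # disease-disease relationship
    ("symptom", "symptom"),    # symptom-symptom relationship
    ("disease", "operation"),  # disease-operation relationship
]

# Assumed setup: the relation prediction layer classifies each entity pair
# into one of C categories - the five relations above plus a "no relation" class.
C = len(RELATION_TYPES) + 1
```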
Preprocessing the text data and initializing the grouped suspension marks to obtain the feature sequence of the text and suspension marks comprises the following steps:
Step 201, performing word segmentation (tokenization) on the input text to obtain a token sequence;
Step 202, inserting a "<e>" mark before each entity of the token sequence and a "</e>" mark after each entity to mark the entity positions, inserting the start mark "<CLS>" at the head of the token sequence, and inserting the end mark "<SEP>" at the tail of the token sequence;
Step 203, mapping the token sequence into a word vector sequence using the word embedding model of the pre-training language model Roberta-large; with the total number of word tokens denoted $n$ and the total number of entities denoted $m$, the word vector sequence obtained by the mapping is

$E_w = [\,e_{\mathrm{CLS}},\ v_1,\ v_2,\ \ldots,\ v_{n+2m},\ e_{\mathrm{SEP}}\,]$,

where $e_{\mathrm{CLS}}$ denotes the word vector of the start mark "<CLS>", $e_{\mathrm{SEP}}$ the word vector of the end mark "<SEP>", and each $v_k$ is the word vector $e_{w_i}$ of the $i$-th word, the word vector $e^{(i)}_{\langle e\rangle}$ of the $i$-th "<e>" mark, or the word vector of a "</e>" mark; the content of every "<e>" mark is fixed, so the word vectors of all "<e>" marks are identical;
Step 204, obtaining the position embedding sequence of the token sequence using the position embedding model of the pre-training language model Roberta-large; for the token sequence of step 203, the position embedding sequence is

$P = [\,p_{\mathrm{CLS}},\ u_1,\ u_2,\ \ldots,\ u_{n+2m},\ p_{\mathrm{SEP}}\,]$,

where $p_{\mathrm{CLS}}$ denotes the position embedding of the start mark "<CLS>", $p_{\mathrm{SEP}}$ the position embedding of the end mark "<SEP>", and each $u_k$ is the position embedding $p_{w_i}$ of the $i$-th word, the position embedding $p^{(i)}_{\langle e\rangle}$ of the $i$-th "<e>" mark, or the position embedding of a "</e>" mark; every "<e>" mark occupies a different position, so the position embeddings of the "<e>" marks all differ;
Step 205, adding the word vector sequence $E_w$ of the token sequence and the position embedding sequence $P$ of the token sequence element-wise to obtain the feature embedding sequence $H_{\mathrm{text}}$ of the token sequence:

$H_{\mathrm{text}} = E_w + P$;
Step 206, generating the suspension mark features; the feature of the $i$-th suspension mark is the sum of the word vector $e^{(i)}_{\langle e\rangle}$ of the $i$-th "<e>" mark and the position embedding $p^{(i)}_{\langle e\rangle}$ of the $i$-th "<e>" mark:

$s_i = e^{(i)}_{\langle e\rangle} + p^{(i)}_{\langle e\rangle}$,

where $s_i$ denotes the feature of the $i$-th suspension mark;
Step 207, generating the suspension mark feature sequence; since there are $m$ entities and hence $m$ suspension marks, a suspension mark feature sequence containing $m$ groups of suspension marks is generated, the $i$-th group being formed as follows: the feature $s_i$ of the $i$-th suspension mark is placed at the beginning of the $i$-th group, and the other suspension marks are arranged behind it in their order of appearance in the text, where $i = 1, 2, 3, \ldots, m$; the $m$ groups are concatenated in order to obtain a suspension mark feature sequence $S$ of length $m^2$;
Step 208, concatenating the feature embedding sequence $H_{\mathrm{text}}$ of the token sequence and the suspension mark feature sequence $S$:

$H_0 = [\,H_{\mathrm{text}};\ S\,]$,

where $H_0$ denotes the feature sequence of the text and suspension marks.
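Steps 201-208 can be sketched in code as follows. This is a minimal illustration, assuming a Hugging Face RoBERTa checkpoint whose tokenizer is extended with the "<e>"/"</e>" marks; the helper name, the omission of the embedding layer's LayerNorm/dropout and of RoBERTa's position-id offset, and the exclusion of each head mark from the "other marks" of its own group are assumptions:

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Sketch of steps 201-208. Helper names are illustrative; LayerNorm/dropout
# inside the embedding layer and RoBERTa's position-id offset are omitted.
tokenizer = AutoTokenizer.from_pretrained("roberta-large")
tokenizer.add_special_tokens({"additional_special_tokens": ["<e>", "</e>"]})
model = AutoModel.from_pretrained("roberta-large")
model.resize_token_embeddings(len(tokenizer))

def build_input_features(token_ids, e_mark_positions):
    """token_ids: LongTensor for the marked sequence <CLS> ... <e> ... </e> ... <SEP>;
    e_mark_positions: positions of the m "<e>" marks, in order of appearance."""
    emb = model.embeddings
    E_w = emb.word_embeddings(token_ids)                        # step 203: word vectors
    P = emb.position_embeddings(torch.arange(len(token_ids)))   # step 204: position embeddings
    H_text = E_w + P                                            # step 205: element-wise sum

    # Step 206: the i-th suspension mark feature s_i reuses the word vector
    # and position embedding of the i-th "<e>" mark.
    s = H_text[e_mark_positions]                                # (m, d)

    # Step 207: m groups, each of length m; group i starts with s_i, followed
    # by the other marks in their order of appearance in the text.
    m = len(e_mark_positions)
    groups = [torch.stack([s[i]] + [s[j] for j in range(m) if j != i])
              for i in range(m)]
    S = torch.cat(groups, dim=0)                                # length m * m

    # Step 208: concatenate text features and suspension mark features.
    return torch.cat([H_text, S], dim=0)                        # H_0
```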
RoBERTa-large is one of the variants of the BERT (Bidirectional Encoder Representations from Transformers) model, developed by Facebook AI (now Meta AI). RoBERTa, short for "A Robustly Optimized BERT Pretraining Approach", modifies and optimizes BERT. Some key features of RoBERTa-large: (1) Model scale: RoBERTa-large is larger than BERT-large, with 24 Transformer encoder layers and 1024 hidden units per layer, for a total of 355M parameters; in contrast, BERT-large has 24 layers of 1024 hidden units each, for a total of 340M parameters. (2) Amount of pre-training data: RoBERTa used a much larger pre-training dataset, about 160GB of data versus BERT's 16GB, including BookCorpus, English Wikipedia, CC-News, OpenWebText, Stories, and others. (3) Pre-training strategy: RoBERTa further optimizes the pre-training process, for example by eliminating BERT's Next Sentence Prediction (NSP) task and training on longer sequences. (4) Training time: RoBERTa is pre-trained for longer to ensure that the model captures language patterns and context better. (5) Improved results: thanks to these optimizations, RoBERTa outperforms BERT on multiple natural language processing tasks, including text classification, question answering and text generation.
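As a quick illustration (assuming the Hugging Face `transformers` library and the public `roberta-large` checkpoint), the configuration described above can be checked directly:

```python
from transformers import AutoModel

# Illustrative check of the RoBERTa-large configuration described above
# (24 layers, 1024 hidden units); exact parameter counts vary slightly
# depending on which heads the checkpoint includes.
model = AutoModel.from_pretrained("roberta-large")
print(model.config.num_hidden_layers)               # 24
print(model.config.hidden_size)                     # 1024
print(sum(p.numel() for p in model.parameters()))   # ~355M
```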
Calculating the attention mask comprises the following steps:

With the feature embedding sequence $H_{\mathrm{text}}$ of the token sequence having length $L_t$, the suspension mark feature sequence $S$ having length $L_s = m^2$, and the number of entities being $m$, a matrix $A$ of size $(L_t+L_s)\times(L_t+L_s)$ is generated, whose elements are assigned by

$A_{ij} = \begin{cases} 1, & i \le L_t \text{ and } j \le L_t \\ 1, & i > L_t \text{ and } j \le L_t \\ 1, & i > L_t,\ j > L_t,\ \text{and } i,\ j \text{ belong to the same group} \\ 0, & \text{otherwise,} \end{cases}$

where $A$ is the attention mask and $A_{ij}$ denotes the element in row $i$, column $j$.
One of the roles of the attention mask in the Transformer model is to control the propagation of information, i.e. to determine which positions can affect one another.
When computing attention weights, the attention mechanism lets each position interact with the others and assigns weights according to their relevance. By masking out certain positions in the attention mask, one can control whether the model takes those positions into account when computing the attention weights.
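A sketch of how such a mask might be built for the grouped suspension marks is shown below. It encodes the assignment rule given above, under the assumed design that text positions attend only to text, while each suspension mark attends to the whole text and to the marks of its own group:

```python
import torch

# Sketch of the grouped-suspension-mark attention mask A, assuming:
# text <-> text is allowed, marks -> text is one-way, and marks attend
# only within their own group of m positions.
def build_attention_mask(L_t: int, m: int) -> torch.Tensor:
    L_s = m * m
    L = L_t + L_s
    A = torch.zeros(L, L, dtype=torch.long)
    A[:L_t, :L_t] = 1                   # text positions attend to text
    A[L_t:, :L_t] = 1                   # suspension marks attend to text (one-way)
    for g in range(m):                  # marks within the same group attend to each other
        lo = L_t + g * m
        A[lo:lo + m, lo:lo + m] = 1
    return A
```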
Using the attention mask to control the feature propagation direction of the pre-training language model and extracting the features of the suspension mark pairs comprises the following steps:
Step 401, inputting the feature sequence $H_0$ of the text and suspension marks into the pre-training language model Roberta-large, with the attention mask $A$ used as the mask for Roberta-large's forward propagation:

$H = \mathrm{Roberta}(H_0, A) \in \mathbb{R}^{(L_t+L_s)\times d}$,

where $H$ denotes the features of the last hidden layer output by Roberta-large, $d$ is the hidden layer dimension of Roberta-large, $L_t$ is the sequence length of the feature embedding sequence $H_{\mathrm{text}}$ of the token sequence, and $L_s$ is the sequence length of the suspension mark feature sequence $S$;
Step 402, selecting the feature of each entity pair from the last-hidden-layer features $H$ output by Roberta-large:

$z_{ij} = \mathrm{index}(H,\ k_{ij})$,

where $z_{ij}$ denotes the feature of the suspension mark pair for the $i$-th and $j$-th entities, $k_{ij}$ is the position in $H$ of the suspension mark of the $j$-th entity within the $i$-th group, and $\mathrm{index}(\cdot)$ denotes the operation of indexing along dimension 0 of the target tensor.
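Steps 401 and 402 might look as follows in code. The direct encoder call and the in-group ordering are assumptions based on the description above (`model`, `build_input_features` and `build_attention_mask` are the illustrative helpers sketched earlier); a real implementation would follow the patent's exact indexing formula:

```python
import torch

# Sketch of steps 401-402: run RoBERTa over H_0 with the grouped attention
# mask, then index out one pair feature per (head, tail) entity pair.
def extract_pair_features(h0, attn_mask, L_t, m):
    # A full (L, L) mask cannot be passed via the model's padding-style
    # attention_mask, so this sketch calls the encoder directly with an
    # extended additive float mask (0 = keep, large negative = blocked).
    ext = (1.0 - attn_mask.float()) * torch.finfo(torch.float32).min
    out = model.encoder(h0.unsqueeze(0), attention_mask=ext[None, None])
    H = out.last_hidden_state.squeeze(0)            # (L_t + m*m, d)

    # Pair (i, j): the mark of entity j inside group i. Group i starts at
    # L_t + i*m; slot 0 holds the head mark s_i, the rest follow text order.
    pair = {}
    for i in range(m):
        base = L_t + i * m
        order = [i] + [j for j in range(m) if j != i]   # assumed in-group layout
        for slot, j in enumerate(order):
            if i != j:
                pair[(i, j)] = H[base + slot]           # index along dim 0
    return pair
```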
Inputting the features of the suspension mark pairs into the relation prediction layer to obtain the relation probability vector comprises the following steps:
The feature $z_{ij}$ of the suspension mark pair for the $i$-th and $j$-th entities is input into a fully connected layer to obtain the relation prediction vector of the $i$-th and $j$-th entities:

$p_{ij} = \mathrm{softmax}(W z_{ij} + b)$,

where $p_{ij} \in \mathbb{R}^{C}$ denotes the relation prediction vector of the $i$-th and $j$-th entities, $W \in \mathbb{R}^{C\times d}$ the weight matrix of the fully connected layer, $b \in \mathbb{R}^{C}$ the bias vector of the fully connected layer, $C$ the number of relation categories, $d$ the dimension of the suspension mark pair feature, and $\mathrm{softmax}$ is an activation function that normalizes the vector into a probability distribution.
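A minimal sketch of this prediction layer follows; the class name and default sizes are illustrative (d = 1024 matches Roberta-large's hidden size, and the class count is an assumption):

```python
import torch.nn as nn

# Relation prediction layer: one fully connected layer followed by softmax,
# mapping a d-dimensional pair feature to C relation probabilities.
class RelationPredictionLayer(nn.Module):
    def __init__(self, d: int = 1024, num_classes: int = 6):
        super().__init__()
        self.fc = nn.Linear(d, num_classes)    # weight W (C x d), bias b (C)

    def forward(self, z_ij):
        return self.fc(z_ij).softmax(dim=-1)   # relation probability vector p_ij
```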
Calculating the loss function over the relation probability vectors, optimizing the loss function to train the relation extraction model, and performing relation extraction with the trained model comprises the following steps:
The cross entropy loss between the relation prediction vector $p_{ij}$ of the $i$-th and $j$-th entities and the true label $y_{ij}$ is calculated as

$\mathcal{L}_{ij} = -\sum_{k=1}^{C} y_{ij}^{(k)} \log p_{ij}^{(k)}$,

where $y_{ij}^{(k)}$ is the true relation label of the $i$-th and $j$-th entities: $y_{ij}^{(k)} = 1$ if the $i$-th and $j$-th entities have the $k$-th relation, otherwise $y_{ij}^{(k)} = 0$; and $p_{ij}^{(k)}$, the $k$-th index value of the relation prediction vector $p_{ij}$, is the model-predicted probability that the $i$-th and $j$-th entities have the $k$-th relation;
The cross entropy losses of all entity pairs are then summed:

$\mathcal{L} = \sum_{(i,j)} \mathcal{L}_{ij}$,

the sum running over all entity pairs, where $\mathcal{L}$ denotes the total cross entropy loss;
$\mathcal{L}$ is optimized using the Adam optimization algorithm to train the relation extraction model.
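A sketch of the loss and optimization step, assuming the prediction layer's pre-softmax logits are available (PyTorch's `F.cross_entropy` fuses log-softmax with the cross entropy above, so the softmax is not applied twice; the learning rate and the assumption that `model` bundles all trainable modules are illustrative):

```python
import torch
import torch.nn.functional as F

# Cross entropy over all entity pairs, summed, followed by one Adam update.
optimizer = torch.optim.Adam(model.parameters(), lr=3e-5)

def train_step(pair_logits, pair_labels):
    """pair_logits: (P, C) stacked pre-softmax scores for P entity pairs;
    pair_labels: (P,) class indices of the true relations."""
    loss = F.cross_entropy(pair_logits, pair_labels, reduction="sum")  # total loss L
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```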
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
Claims (3)
1. A graph relation extraction method based on pre-training model enhancement, characterized by comprising the following steps:
Step 1, constructing a relation extraction model, wherein the relation extraction model comprises grouped suspension marks, a pre-training language model and a relation prediction layer;
Step 2, preprocessing text data and initializing the grouped suspension marks to obtain a feature sequence of the text and suspension marks;
Step 3, calculating an attention mask;
Step 4, using the attention mask to control the feature propagation direction of the pre-training language model and extracting the features of suspension mark pairs;
Step 5, inputting the features of the suspension mark pairs into the relation prediction layer to obtain relation probability vectors;
Step 6, calculating a loss function over the relation probability vectors, optimizing the loss function to train the relation extraction model, and performing relation extraction with the trained model;
wherein the graph is a medical knowledge graph; its entities comprise diseases, symptoms, drugs and operations, and its relationships comprise disease-symptom, disease-drug, disease-disease, symptom-symptom and disease-operation relationships;
preprocessing the text data and initializing the grouped suspension marks to obtain the feature sequence of the text and suspension marks comprises the following steps:
Step 201, performing word segmentation (tokenization) on the input text to obtain a token sequence;
Step 202, inserting a "<e>" mark before each entity of the token sequence and a "</e>" mark after each entity to mark the entity positions, inserting the start mark "<CLS>" at the head of the token sequence, and inserting the end mark "<SEP>" at the tail of the token sequence;
Step 203, mapping the token sequence into a word vector sequence using the word embedding model of the pre-training language model Roberta-large; with the total number of word tokens denoted $n$ and the total number of entities denoted $m$, the word vector sequence obtained by the mapping is

$E_w = [\,e_{\mathrm{CLS}},\ v_1,\ v_2,\ \ldots,\ v_{n+2m},\ e_{\mathrm{SEP}}\,]$,

where $e_{\mathrm{CLS}}$ denotes the word vector of the start mark "<CLS>", $e_{\mathrm{SEP}}$ the word vector of the end mark "<SEP>", and each $v_k$ is the word vector $e_{w_i}$ of the $i$-th word, the word vector $e^{(i)}_{\langle e\rangle}$ of the $i$-th "<e>" mark, or the word vector of a "</e>" mark; the content of every "<e>" mark is fixed, so the word vectors of all "<e>" marks are identical;
Step 204, obtaining the position embedding sequence of the token sequence using the position embedding model of the pre-training language model Roberta-large; for the token sequence of step 203, the position embedding sequence is

$P = [\,p_{\mathrm{CLS}},\ u_1,\ u_2,\ \ldots,\ u_{n+2m},\ p_{\mathrm{SEP}}\,]$,

where $p_{\mathrm{CLS}}$ denotes the position embedding of the start mark "<CLS>", $p_{\mathrm{SEP}}$ the position embedding of the end mark "<SEP>", and each $u_k$ is the position embedding $p_{w_i}$ of the $i$-th word, the position embedding $p^{(i)}_{\langle e\rangle}$ of the $i$-th "<e>" mark, or the position embedding of a "</e>" mark; every "<e>" mark occupies a different position, so the position embeddings of the "<e>" marks all differ;
Step 205, adding the word vector sequence $E_w$ of the token sequence and the position embedding sequence $P$ of the token sequence element-wise to obtain the feature embedding sequence $H_{\mathrm{text}}$ of the token sequence:

$H_{\mathrm{text}} = E_w + P$;
Step 206, generating the suspension mark features; the feature of the $i$-th suspension mark is the sum of the word vector $e^{(i)}_{\langle e\rangle}$ of the $i$-th "<e>" mark and the position embedding $p^{(i)}_{\langle e\rangle}$ of the $i$-th "<e>" mark:

$s_i = e^{(i)}_{\langle e\rangle} + p^{(i)}_{\langle e\rangle}$,

where $s_i$ denotes the feature of the $i$-th suspension mark;
Step 207, generating the suspension mark feature sequence; since there are $m$ entities and hence $m$ suspension marks, a suspension mark feature sequence containing $m$ groups of suspension marks is generated, the $i$-th group being formed as follows: the feature $s_i$ of the $i$-th suspension mark is placed at the beginning of the $i$-th group, and the other suspension marks are arranged behind it in their order of appearance in the text, where $i = 1, 2, 3, \ldots, m$; the $m$ groups are concatenated in order to obtain a suspension mark feature sequence $S$ of length $m^2$;
Step 208, concatenating the feature embedding sequence $H_{\mathrm{text}}$ of the token sequence and the suspension mark feature sequence $S$:

$H_0 = [\,H_{\mathrm{text}};\ S\,]$,

where $H_0$ denotes the feature sequence of the text and suspension marks;
calculating the attention mask comprises the following steps:

with the feature embedding sequence $H_{\mathrm{text}}$ of the token sequence having length $L_t$, the suspension mark feature sequence $S$ having length $L_s = m^2$, and the number of entities being $m$, a matrix $A$ of size $(L_t+L_s)\times(L_t+L_s)$ is generated, whose elements are assigned by

$A_{ij} = \begin{cases} 1, & i \le L_t \text{ and } j \le L_t \\ 1, & i > L_t \text{ and } j \le L_t \\ 1, & i > L_t,\ j > L_t,\ \text{and } i,\ j \text{ belong to the same group} \\ 0, & \text{otherwise,} \end{cases}$

where $A$ is the attention mask and $A_{ij}$ denotes the element in row $i$, column $j$;
using the attention mask to control the feature propagation direction of the pre-training language model and extracting the features of the suspension mark pairs comprises the following steps:
Step 401, inputting the feature sequence $H_0$ of the text and suspension marks into the pre-training language model Roberta-large, with the attention mask $A$ used as the mask for Roberta-large's forward propagation:

$H = \mathrm{Roberta}(H_0, A) \in \mathbb{R}^{(L_t+L_s)\times d}$,

where $H$ denotes the features of the last hidden layer output by Roberta-large, $d$ is the hidden layer dimension of Roberta-large, $L_t$ is the sequence length of the feature embedding sequence $H_{\mathrm{text}}$ of the token sequence, and $L_s$ is the sequence length of the suspension mark feature sequence $S$;
Step 402, selecting the feature of each entity pair from the last-hidden-layer features $H$ output by Roberta-large:

$z_{ij} = \mathrm{index}(H,\ k_{ij})$,

where $z_{ij}$ denotes the feature of the suspension mark pair for the $i$-th and $j$-th entities, $k_{ij}$ is the position in $H$ of the suspension mark of the $j$-th entity within the $i$-th group, and $\mathrm{index}(\cdot)$ denotes the operation of indexing along dimension 0 of the target tensor.
2. The graph relation extraction method based on pre-training model enhancement according to claim 1, characterized in that inputting the features of the suspension mark pairs into the relation prediction layer to obtain the relation probability vector comprises the following steps:
the feature $z_{ij}$ of the suspension mark pair for the $i$-th and $j$-th entities is input into a fully connected layer to obtain the relation prediction vector of the $i$-th and $j$-th entities:

$p_{ij} = \mathrm{softmax}(W z_{ij} + b)$,

where $p_{ij} \in \mathbb{R}^{C}$ denotes the relation prediction vector of the $i$-th and $j$-th entities, $W \in \mathbb{R}^{C\times d}$ the weight matrix of the fully connected layer, $b \in \mathbb{R}^{C}$ the bias vector of the fully connected layer, $C$ the number of relation categories, and $d$ the dimension of the suspension mark pair feature, which equals the hidden layer dimension of Roberta-large; $\mathrm{softmax}$ is an activation function that normalizes the vector into a probability distribution.
3. The graph relation extraction method based on pre-training model enhancement according to claim 2, characterized in that calculating the loss function over the relation probability vectors, optimizing the loss function to train the relation extraction model, and performing relation extraction with the trained model comprises the following steps:
the cross entropy loss between the relation prediction vector $p_{ij}$ of the $i$-th and $j$-th entities and the true label $y_{ij}$ is calculated as

$\mathcal{L}_{ij} = -\sum_{k=1}^{C} y_{ij}^{(k)} \log p_{ij}^{(k)}$,

where $y_{ij}^{(k)}$ is the true relation label of the $i$-th and $j$-th entities: $y_{ij}^{(k)} = 1$ if the $i$-th and $j$-th entities have the $k$-th relation, otherwise $y_{ij}^{(k)} = 0$; and $p_{ij}^{(k)}$, the $k$-th index value of the relation prediction vector $p_{ij}$, is the model-predicted probability that the $i$-th and $j$-th entities have the $k$-th relation;
the cross entropy losses of all entity pairs are then summed:

$\mathcal{L} = \sum_{(i,j)} \mathcal{L}_{ij}$,

the sum running over all entity pairs, where $\mathcal{L}$ denotes the total cross entropy loss;
$\mathcal{L}$ is optimized using the Adam optimization algorithm to train the relation extraction model.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410876214.9A CN118428471B (en) | 2024-07-02 | 2024-07-02 | Atlas relation extraction method based on pre-training model enhancement |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410876214.9A CN118428471B (en) | 2024-07-02 | 2024-07-02 | Atlas relation extraction method based on pre-training model enhancement |
Publications (2)
Publication Number | Publication Date |
---|---|
CN118428471A CN118428471A (en) | 2024-08-02 |
CN118428471B true CN118428471B (en) | 2024-09-24 |
Family
ID=92326091
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202410876214.9A Active CN118428471B (en) | 2024-07-02 | 2024-07-02 | Atlas relation extraction method based on pre-training model enhancement |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN118428471B (en) |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116313121A (en) * | 2022-12-30 | 2023-06-23 | 北京邮电大学 | Standardized construction method for high-robustness medical knowledge graph of pipeline type |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114357176B (en) * | 2021-11-26 | 2023-11-21 | 永中软件股份有限公司 | Entity knowledge automatic extraction method, computer device and computer readable medium |
CN115392256A (en) * | 2022-08-29 | 2022-11-25 | 重庆师范大学 | Drug adverse event relation extraction method based on semantic segmentation |
CN116186277A (en) * | 2022-12-06 | 2023-05-30 | 同济大学 | Chinese knowledge graph construction method based on CasRel model |
CN115952284A (en) * | 2022-12-09 | 2023-04-11 | 昆明理工大学 | Medical text relation extraction method fusing density clustering and ERNIE |
CN116956940A (en) * | 2023-08-03 | 2023-10-27 | 杭州电子科技大学 | Text event extraction method based on multi-directional traversal and prompt learning |
CN118133785A (en) * | 2024-04-08 | 2024-06-04 | 云南律奥新技术开发有限公司 | Document Relation Extraction Method Based on Relation Template Evidence Extraction |
Also Published As
Publication number | Publication date |
---|---|
CN118428471A (en) | 2024-08-02 |
Legal Events
Code | Title
---|---
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant