CN118428471B - Atlas relation extraction method based on pre-training model enhancement - Google Patents

Atlas relation extraction method based on pre-training model enhancement Download PDF

Info

Publication number
CN118428471B
CN118428471B CN202410876214.9A CN202410876214A CN118428471B CN 118428471 B CN118428471 B CN 118428471B CN 202410876214 A CN202410876214 A CN 202410876214A CN 118428471 B CN118428471 B CN 118428471B
Authority
CN
China
Prior art keywords
suspension
sequence
mark
entity
relationship
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202410876214.9A
Other languages
Chinese (zh)
Other versions
CN118428471A (en
Inventor
关相承
修保新
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hunan Dongyin Information Technology Co ltd
Original Assignee
Hunan Dongyin Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hunan Dongyin Information Technology Co ltd filed Critical Hunan Dongyin Information Technology Co ltd
Priority to CN202410876214.9A priority Critical patent/CN118428471B/en
Publication of CN118428471A publication Critical patent/CN118428471A/en
Application granted granted Critical
Publication of CN118428471B publication Critical patent/CN118428471B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00Computing arrangements using knowledge-based models
    • G06N5/02Knowledge representation; Symbolic representation
    • G06N5/022Knowledge engineering; Knowledge acquisition
    • G06N5/025Extracting rules from data
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/284Lexical analysis, e.g. tokenisation or collocates

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a pre-training model-based enhanced graph relation extraction method, which comprises the following steps: constructing a relation extraction model, wherein the relation extraction model comprises a grouping suspension mark, a pre-training language model and a relation prediction layer; preprocessing text data and initializing grouping suspension marks to obtain characteristic sequences of the text and the suspension marks; calculating an attention mask; controlling the feature propagation direction of the pre-training language model by using the attention mask, and extracting the features of the suspension mark pairs; inputting the features of the suspension mark pairs into a relation prediction layer to obtain a relation probability vector; and calculating a loss function for the relation probability vector, optimizing the loss function, training a relation extraction model, and carrying out relation extraction by using the relation extraction model. The invention provides an entity pair representation method for grouping suspension marks, which is characterized in that the suspension marks are grouped, each group multiplexes the characteristics of head entities, a specific attention mask is designed, the high-efficiency aggregation of the entities on the characteristics is realized, and the high-precision relation extraction is realized under the condition of less calculation amount.

Description

Atlas relation extraction method based on pre-training model enhancement
Technical Field
The invention relates to the field of deep learning and natural language processing, in particular to a pre-training model-based enhanced atlas relation extraction method.
Background
Relationship extraction is a task in natural language processing that aims to identify and extract relationships between entities from text. Given a piece of text and a labeled pair of entities, the goal of a task is to determine the type of relationship or class of relationship between these entities. Relationship extraction has important applications and values in the fields of natural language processing and information extraction, including but not limited to the following: knowledge graph construction, information retrieval and recommendation, event extraction and intelligence analysis, social network analysis, automatic question and answer, intelligent assistant and the like.
Most of the current relation extraction methods of medical knowledge graphs need to design complex relation extraction modules, carry out complex processing on text features output by a language model, and have large calculated amount and low calculation efficiency. A small part of methods can reduce the calculation amount to a certain extent by designing the suspension mark, however, the existing suspension mark methods have the problem of low expression efficiency, which hinders the research and landing of the algorithm. Therefore, how to design an entity extraction method, by improving the entity representation method, the entity extraction method can efficiently represent the entity characteristics, and has academic research significance and industrial application significance.
Disclosure of Invention
The present invention aims to solve at least one of the technical problems existing in the prior art. Therefore, the invention discloses a pre-training model-based enhanced graph relation extraction method. Compared with the existing method, the method creatively provides an entity pair representation method of grouping suspension marks, the method groups suspension marks, multiplexes the characteristics of one head entity for each group, designs a specific attention mask for the grouping suspension marks, achieves efficient aggregation of the characteristics of the entities, and achieves high-precision relation extraction under the condition of less calculation amount.
The invention aims at realizing a graph relation extraction method based on pre-training model enhancement, which comprises the following steps:
step 1, constructing a relation extraction model, wherein the relation extraction model comprises a grouping suspension mark, a pre-training language model and a relation prediction layer;
Step 2, preprocessing text data and initializing grouping suspension marks to obtain characteristic sequences of the text and the suspension marks;
step 3, calculating an attention mask;
step 4, using the attention mask to control the feature propagation direction of the pre-training language model, and extracting the features of the suspension mark pairs;
step 5, inputting the characteristics of the suspension mark pairs into a relation prediction layer to obtain a relation probability vector;
and 6, calculating a loss function for the relation probability vector, optimizing the loss function, training a relation extraction model, and carrying out relation extraction by using the relation extraction model.
The text data preprocessing and grouping suspension mark initializing are carried out to obtain a characteristic sequence of the text and the suspension mark, and the method comprises the following steps:
Step 201, word segmentation is carried out on an input text to obtain a word segmentation sequence;
Step 202, inserting a "< e >" mark before each entity of the word segmentation sequence, inserting a "</e >" mark after each entity for marking the position of the entity, inserting a start mark "< CLS >" in the head of the word segmentation sequence, and inserting a stop mark "< SEP >" in the tail of the word segmentation sequence;
Step 203, mapping the word segmentation sequence into a word vector sequence by using a word embedding model of the pre-training language model Roberta-large, wherein the total word segmentation number is The total entity number isThe word vector sequence mathematical expression obtained by mapping the word sequence is as follows:
wherein, A word vector representing the start tag "< CLS >",A word vector representing the end tag "< SEP >",A word vector representing the i-th word,A word vector representing an i < e > "th tag, each < e >" tag content being fixed, so each < e > "tag word vector is identical;
step 204, obtaining a position embedded sequence of the word segmentation sequence by using a position embedded model of the pre-training language model Roberta-large, and for the word segmentation sequence in step 203, obtaining a mathematical expression of the position embedded sequence as follows:
wherein, The position of the start tag "< CLS >" is embedded,The location of the end mark "< SEP >" is embedded,The position of the i-th word is indicated to be embedded,The position embedding representing the i < e > "mark, each < e >" mark position being different, so the position embedding of each < e > "mark is different;
Step 205, mapping the word vector sequence obtained by word segmentation sequence And position embedded sequence of word segmentation sequenceAdding according to elements to obtain feature embedded sequence of word segmentation sequenceThe mathematical expression is:
Step 206, generating a suspension mark feature; the ith suspension mark is characterized by the word vector of the ith "< e >" mark The position of the i < e > "mark is embeddedThe mathematical expression is:
wherein, Features representing the ith suspension mark;
Step 207, generating a suspension mark feature sequence; the entity number is m, and m suspension marks are provided, so that a suspension mark characteristic sequence containing m groups of suspension marks is generated, and the generation mode of the ith group of suspension marks is as follows: features of the ith suspension mark Placed at the beginning of the i-th set of floating-point sequences, other floating-point sequences are arranged in the order of appearance in the text, from small to large, behind the i-th set of floating-point sequences, where i = 1,2,3, …, m; sequentially splicing m groups of suspension mark characteristic sequences to obtain a suspension mark characteristic sequence with the length ofSuspension marker signature sequences of (2)
Step 208, embedding the features of the word segmentation sequence into the sequenceAnd a suspension marker feature sequenceSpliced together, the mathematical expression is:
wherein, Characteristic sequences representing text and hover marks.
The calculating the attention mask comprises the following steps:
the characteristic embedded sequence of the word segmentation sequence Sequence length isSuspension marker feature sequencesSequence length isNumber of entitiesGenerates a size ofIs a matrix of (a)The mathematical expression of the element assignment in the matrix is:
wherein, Is the attention mask of the person,Representation ofElements of row i and column j.
The method for extracting the feature of the suspension mark pair by using the attention mask to control the feature propagation direction of the pre-training language model comprises the following steps:
Step 401, feature sequences of the text and the suspension mark Input into pre-trained language model Roberta-large and mask with the attentionAs a mask for Roberta-large forward propagation, the mathematical expression is:
wherein, Is thatThe last hidden layer of the output is characterized, d isIs used to determine the hidden layer dimension of the (c),Embedding sequences for features of said word segmentation sequencesThe length of the sequence is set to be,For said suspended tag feature sequencesSequence length;
step 402, slave Features of the last hidden layer of the outputThe characteristics of each entity pair are selected, and the mathematical expression is as follows:
wherein, Features representing pairs of floating marks for the i-th entity and the j-th entity,Representing an operation of indexing from the 0 th dimension of the target tensor.
The method inputs the characteristics of the suspension mark pairs into a relation prediction layer to obtain a relation probability vector, and comprises the following steps:
Features of pairs of floating marks for the i-th and j-th entities Inputting the full connection layer to obtain a relation prediction vector of the ith entity and the jth entity, wherein the mathematical expression is as follows:
wherein, Representing the relationship prediction vector of the i-th entity and the j-th entity,Representing the weight matrix of the fully connected layer,Representing the bias vector of the fully connected layer, C representing the number of relationship categories, d representing the dimension of the floating token pair feature,Is an activation function for normalizing the vector into a probability distribution.
The method comprises the following steps of calculating a loss function for the relation probability vector, optimizing the loss function, training a relation extraction model, and carrying out relation extraction by using the relation extraction model:
calculating a relationship prediction vector for an ith entity and a jth entity Tag associated with realityCross entropy loss between the two, the mathematical expression is:
wherein, A true relationship label representing an ith entity and a jth entity, if the ith entity and the jth entity have a kth type relationshipOtherwiseThe probability that the ith entity and the jth entity representing model prediction have the kth class is the relation prediction vectorIndex value of (2);
The cross entropy loss of all entity pairs is calculated and the mathematical expression is as follows:
wherein, Representing the total cross entropy loss;
pairs using Adam optimization algorithm And optimizing, and training a relation extraction model.
Compared with the prior art, the method has the advantages that: the technology provides a pre-training model-based enhanced graph relation extraction method. The method innovatively provides an entity pair representation method of grouping suspension marks, by grouping the suspension marks, multiplexing the characteristics of one head entity for each group, and designing a specific attention mask for the grouping suspension marks, the efficient aggregation of the entity pair characteristics is realized, and the high-precision relation extraction is realized under the condition of less calculation amount.
Drawings
Fig. 1 shows a schematic flow chart of an embodiment of the invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail below with reference to the accompanying drawings, and it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
Knowledge graph is a collection of knowledge and its links described in a structured form, which is a knowledge representation that organizes entities, attributes and relationships into a graphical structure, and aims to better describe and understand the knowledge and concepts of the world, and can be used to store, query, infer and analyze knowledge. The knowledge graph is designed not only by representing the knowledge in the form of a graph structure, but also by considering how to define the attribute and the relation, how to establish the relation between the attribute and the relation, how to perform the operations of inquiry, reasoning, analysis and the like. Such a design makes knowledge graph a powerful tool for storing and manipulating a large amount of complex knowledge.
In this embodiment, there is massive knowledge in the medical field in the internet that can be used for medical condition consultation and health care, but the traditional search engine cannot make reasonable judgment according to the actual condition of the patient, and cannot meet the use requirement. Assuming that a large-scale medical knowledge graph is built, a huge amount of text data needs to be crawled on the Internet and structured, and relationship extraction is used as a key link in text structuring, and in the process of carrying out relationship extraction on texts, a graph relationship extraction method based on pre-training model enhancement is used for the medical field and extracting medical related relationships. A reliable Chinese medical knowledge system is built, and the system can help to meet the demands of people on the knowledge related to daily diseases and has high application value.
The medical knowledge graph (Medical Knowledge Graph) serves as the core of medical artificial intelligence, is essentially a semantic network for revealing relationships between medical entities, and can formally describe things and correlations of things in the real world. In general, a medical knowledge graph is constructed by continuously expanding entities and relationships based on manually constructed expert knowledge through algorithms and expert auditing, and comprises medical concepts and various medical relationships such as diseases, symptoms, medicines, operations and the like. In a wide range of medical scenarios, medical knowledge maps have proven to be effective in providing medical knowledge support for algorithms and medical interpretation of predicted outcomes of algorithms. In the foreseeable future, knowledge maps will play a vital role in the field of medical treatment, which is a strong knowledge attribute. Therefore, the method for extracting the relationship between the patterns based on the enhancement of the pre-training model can provide very important support for extracting the relationship between the medical knowledge patterns.
Thus, as shown in fig. 1, a method for extracting a graph relationship based on pre-training model enhancement, the method comprising:
step 1, constructing a relation extraction model, wherein the relation extraction model comprises a grouping suspension mark, a pre-training language model and a relation prediction layer;
Step 2, preprocessing text data and initializing grouping suspension marks to obtain characteristic sequences of the text and the suspension marks;
step 3, calculating an attention mask;
step 4, using the attention mask to control the feature propagation direction of the pre-training language model, and extracting the features of the suspension mark pairs;
step 5, inputting the characteristics of the suspension mark pairs into a relation prediction layer to obtain a relation probability vector;
and 6, calculating a loss function for the relation probability vector, optimizing the loss function, training a relation extraction model, and carrying out relation extraction by using the relation extraction model.
The map is a medical knowledge map, the entities of the map comprise diseases, symptoms, medicines and operations, and the relationship of the map comprises disease-symptom relationship, disease-medicine relationship, disease-disease relationship, symptom-symptom relationship and disease-operation relationship.
The text data preprocessing and grouping suspension mark initializing are carried out to obtain a characteristic sequence of the text and the suspension mark, and the method comprises the following steps:
Step 201, word segmentation is carried out on an input text to obtain a word segmentation sequence;
Step 202, inserting a "< e >" mark before each entity of the word segmentation sequence, inserting a "</e >" mark after each entity for marking the position of the entity, inserting a start mark "< CLS >" in the head of the word segmentation sequence, and inserting a stop mark "< SEP >" in the tail of the word segmentation sequence;
Step 203, mapping the word segmentation sequence into a word vector sequence by using a word embedding model of the pre-training language model Roberta-large, wherein the total word segmentation number is The total entity number isThe word vector sequence mathematical expression obtained by mapping the word sequence is as follows:
wherein, A word vector representing the start tag "< CLS >",A word vector representing the end tag "< SEP >",A word vector representing the i-th word,A word vector representing an i < e > "th tag, each < e >" tag content being fixed, so each < e > "tag word vector is identical;
step 204, obtaining a position embedded sequence of the word segmentation sequence by using a position embedded model of the pre-training language model Roberta-large, and for the word segmentation sequence in step 203, obtaining a mathematical expression of the position embedded sequence as follows:
wherein, The position of the start tag "< CLS >" is embedded,The location of the end mark "< SEP >" is embedded,The position of the i-th word is indicated to be embedded,The position embedding representing the i < e > "mark, each < e >" mark position being different, so the position embedding of each < e > "mark is different;
Step 205, mapping the word vector sequence obtained by word segmentation sequence And position embedded sequence of word segmentation sequenceAdding according to elements to obtain feature embedded sequence of word segmentation sequenceThe mathematical expression is:
Step 206, generating a suspension mark feature; the ith suspension mark is characterized by the word vector of the ith "< e >" mark The position of the i < e > "mark is embeddedThe mathematical expression is:
wherein, Features representing the ith suspension mark;
Step 207, generating a suspension mark feature sequence; the entity number is m, and m suspension marks are provided, so that a suspension mark characteristic sequence containing m groups of suspension marks is generated, and the generation mode of the ith group of suspension marks is as follows: features of the ith suspension mark Placed at the beginning of the i-th set of floating-point sequences, other floating-point sequences are arranged in the order of appearance in the text, from small to large, behind the i-th set of floating-point sequences, where i = 1,2,3, …, m; sequentially splicing m groups of suspension mark characteristic sequences to obtain a suspension mark characteristic sequence with the length ofSuspension marker signature sequences of (2)
Step 208, embedding the features of the word segmentation sequence into the sequenceAnd a suspension marker feature sequenceSpliced together, the mathematical expression is:
wherein, Characteristic sequences representing text and hover marks.
RoBERTa-large is one of the variants based on the BERT (Bidirectional Encoder Representations from Transformers) model, developed by Facebook AI (now referred to as Meta AI). RoBERTa, collectively referred to as "A Robustly Optimized BERT Pretraining Approach", was modified and optimized based on BERT. The following are some key features of RoBERTa-large: (1) model scale: roBERTa-large is larger than BERT-large, having 24 layers of Transformer encoder, with 1024 hidden units per layer, for a total of 355M parameters. In contrast, BERT-large has 24 layers, each with 1024 hidden units, for a total of 340M parameters. (2) amount of pre-training data: roBERTa a larger pre-training dataset was used, about 160GB of data, much more than 16GB of BERT. This includes datasets BookCorpus, ENGLISH WIKIPEDIA, CC-News, openWebText, stories, and the like. (3) pretraining strategy: roBERTa are more optimized during the pre-training process. For example, the Next Sentence Prediction (NSP) task in BERT is eliminated and a longer training sequence (longer sentence) is used. (4) training time: roBERTa perform a longer pre-training to ensure that the model captures language patterns and context better. (5) effect improvement: due to the optimization described above, roBERTa performs better than BERT across multiple natural language processing tasks, including tasks such as text classification, question-answering, text generation, and the like.
The calculating the attention mask comprises the following steps:
the characteristic embedded sequence of the word segmentation sequence Sequence length isSuspension marker feature sequencesSequence length isNumber of entitiesGenerates a size ofIs a matrix of (a)The mathematical expression of the element assignment in the matrix is:
wherein, Is the attention mask of the person,Representation ofElements of row i and column j.
One of the roles of the attention mask in the transducer model is to control the propagation of information, i.e. information that determines which locations can affect each other.
The attention mechanism interacts each location with other locations in calculating the attention weight and assigns weights according to their relevance. By marking certain locations in the attention mask, we can control whether the model takes these locations into account when calculating the attention weight.
The method for extracting the feature of the suspension mark pair by using the attention mask to control the feature propagation direction of the pre-training language model comprises the following steps:
Step 401, feature sequences of the text and the suspension mark Input into pre-trained language model Roberta-large and mask with the attentionAs a mask for Roberta-large forward propagation, the mathematical expression is:
wherein, Is thatThe last hidden layer of the output is characterized, d isIs used to determine the hidden layer dimension of the (c),Embedding sequences for features of said word segmentation sequencesThe length of the sequence is set to be,For said suspended tag feature sequencesSequence length;
step 402, slave Features of the last hidden layer of the outputThe characteristics of each entity pair are selected, and the mathematical expression is as follows:
wherein, Features representing pairs of floating marks for the i-th entity and the j-th entity,Representing an operation of indexing from the 0 th dimension of the target tensor.
The method inputs the characteristics of the suspension mark pairs into a relation prediction layer to obtain a relation probability vector, and comprises the following steps:
Features of pairs of floating marks for the i-th and j-th entities Inputting the full connection layer to obtain a relation prediction vector of the ith entity and the jth entity, wherein the mathematical expression is as follows:
wherein, Representing the relationship prediction vector of the i-th entity and the j-th entity,Representing the weight matrix of the fully connected layer,Representing the bias vector of the fully connected layer, C representing the number of relationship categories, d representing the dimension of the floating token pair feature,Is an activation function for normalizing the vector into a probability distribution.
The method comprises the following steps of calculating a loss function for the relation probability vector, optimizing the loss function, training a relation extraction model, and carrying out relation extraction by using the relation extraction model:
calculating a relationship prediction vector for an ith entity and a jth entity Tag associated with realityCross entropy loss between the two, the mathematical expression is:
wherein, A true relationship label representing an ith entity and a jth entity, if the ith entity and the jth entity have a kth type relationshipOtherwiseThe probability that the ith entity and the jth entity representing model prediction have the kth class is the relation prediction vectorIndex value of (2);
The cross entropy loss of all entity pairs is calculated and the mathematical expression is as follows:
wherein, Representing the total cross entropy loss;
pairs using Adam optimization algorithm And optimizing, and training a relation extraction model.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

Claims (3)

1. The method for extracting the map relation based on the pre-training model enhancement is characterized by comprising the following steps:
step 1, constructing a relation extraction model, wherein the relation extraction model comprises a grouping suspension mark, a pre-training language model and a relation prediction layer;
Step 2, preprocessing text data and initializing grouping suspension marks to obtain characteristic sequences of the text and the suspension marks;
step 3, calculating an attention mask;
step 4, using the attention mask to control the feature propagation direction of the pre-training language model, and extracting the features of the suspension mark pairs;
step 5, inputting the characteristics of the suspension mark pairs into a relation prediction layer to obtain a relation probability vector;
step 6, calculating a loss function for the relation probability vector, optimizing the loss function, training a relation extraction model, and carrying out relation extraction by using the relation extraction model;
The map is a medical knowledge map, the entities of the map comprise diseases, symptoms, medicines and operations, and the relationship of the map comprises disease-symptom relationship, disease-medicine relationship, disease-disease relationship, symptom-symptom relationship and disease-operation relationship;
the text data preprocessing and grouping suspension mark initializing are carried out to obtain a characteristic sequence of the text and the suspension mark, and the method comprises the following steps:
Step 201, word segmentation is carried out on an input text to obtain a word segmentation sequence;
Step 202, inserting a "< e >" mark before each entity of the word segmentation sequence, inserting a "</e >" mark after each entity for marking the position of the entity, inserting a start mark "< CLS >" in the head of the word segmentation sequence, and inserting a stop mark "< SEP >" in the tail of the word segmentation sequence;
Step 203, mapping the word segmentation sequence into a word vector sequence by using a word embedding model of the pre-training language model Roberta-large, wherein the total word segmentation number is The total entity number isThe word vector sequence mathematical expression obtained by mapping the word sequence is as follows:
wherein, A word vector representing the start tag "< CLS >",A word vector representing the end tag "< SEP >",A word vector representing the i-th word,A word vector representing an i < e > "th tag, each < e >" tag content being fixed, so each < e > "tag word vector is identical;
Step 204, obtaining a position embedded sequence of a word segmentation sequence by using a position embedded model of a pre-training language model Roberta-large, wherein the mathematical expression of the position embedded sequence obtained for the word segmentation sequence is as follows:
wherein, The position of the start tag "< CLS >" is embedded,The location of the end mark "< SEP >" is embedded,The position of the i-th word is indicated to be embedded,The position embedding representing the i < e > "mark, each < e >" mark position being different, so the position embedding of each < e > "mark is different;
Step 205, mapping the word vector sequence obtained by word segmentation sequence And position embedded sequence of word segmentation sequenceAdding according to elements to obtain feature embedded sequence of word segmentation sequenceThe mathematical expression is:
Step 206, generating a suspension mark feature; the ith suspension mark is characterized by the word vector of the ith "< e >" mark The position of the i < e > "mark is embeddedThe mathematical expression is:
wherein, Features representing the ith suspension mark;
Step 207, generating a suspension mark feature sequence; the entity number is m, and m suspension marks are provided, so that a suspension mark characteristic sequence containing m groups of suspension marks is generated, and the generation mode of the ith group of suspension marks is as follows: features of the ith suspension mark Placed at the beginning of the i-th set of floating-point sequences, other floating-point sequences are arranged in the order of appearance in the text, from small to large, behind the i-th set of floating-point sequences, where i = 1,2,3, …, m; sequentially splicing m groups of suspension mark characteristic sequences to obtain a suspension mark characteristic sequence with the length ofSuspension marker signature sequences of (2)
Step 208, embedding the features of the word segmentation sequence into the sequenceAnd a suspension marker feature sequenceSpliced together, the mathematical expression is:
wherein, A feature sequence representing text and a hover mark;
The calculating the attention mask comprises the following steps:
the characteristic embedded sequence of the word segmentation sequence Sequence length isSuspension marker feature sequencesSequence length isNumber of entitiesGenerates a size ofIs a matrix of (a)The mathematical expression of the element assignment in the matrix is:
wherein, Is the attention mask of the person,Representation ofElements of the ith row and the jth column;
The method for extracting the feature of the suspension mark pair by using the attention mask to control the feature propagation direction of the pre-training language model comprises the following steps:
Step 401, feature sequences of the text and the suspension mark Input into pre-trained language model Roberta-large and mask with the attentionAs a mask for Roberta-large forward propagation, the mathematical expression is:
wherein, Is thatThe last hidden layer of the output is characterized, d isIs used to determine the hidden layer dimension of the (c),Embedding sequences for features of said word segmentation sequencesThe length of the sequence is set to be,For said suspended tag feature sequencesSequence length;
step 402, slave Features of the last hidden layer of the outputThe characteristics of each entity pair are selected, and the mathematical expression is as follows:
wherein, Features representing pairs of floating marks for the i-th entity and the j-th entity,Representing an operation of indexing from the 0 th dimension of the target tensor.
2. The method for extracting the relationship between the atlases based on the enhancement of the pre-training model according to claim 1, wherein the step of inputting the features of the suspension mark pairs into the relationship prediction layer to obtain the relationship probability vector comprises the following steps:
Features of pairs of floating marks for the i-th and j-th entities Inputting the full connection layer to obtain a relation prediction vector of the ith entity and the jth entity, wherein the mathematical expression is as follows:
wherein, Representing the relationship prediction vector of the i-th entity and the j-th entity,Representing the weight matrix of the fully connected layer,Representing bias vectors of the fully connected layers, C representing the number of relationship categories, d representing the dimension of the floating mark pair feature, the dimension of the floating mark pair feature andIs equal in the dimension of the hidden layer of (c),Is an activation function for normalizing the vector into a probability distribution.
3. The method for extracting the relationship between the patterns based on the enhancement of the pre-training model according to claim 2, wherein the steps of calculating the loss function for the relationship probability vector, optimizing the loss function, training the relationship extraction model, and extracting the relationship by using the relationship extraction model comprise the following steps:
calculating a relationship prediction vector for an ith entity and a jth entity Tag associated with realityCross entropy loss between the two, the mathematical expression is:
wherein, A true relationship label representing an ith entity and a jth entity, if the ith entity and the jth entity have a kth type relationshipOtherwiseThe probability that the ith entity and the jth entity representing model prediction have the kth class is the relation prediction vectorIndex value of (2);
The cross entropy loss of all entity pairs is calculated and the mathematical expression is as follows:
wherein, Representing the total cross entropy loss;
pairs using Adam optimization algorithm And optimizing, and training a relation extraction model.
CN202410876214.9A 2024-07-02 2024-07-02 Atlas relation extraction method based on pre-training model enhancement Active CN118428471B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410876214.9A CN118428471B (en) 2024-07-02 2024-07-02 Atlas relation extraction method based on pre-training model enhancement

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410876214.9A CN118428471B (en) 2024-07-02 2024-07-02 Atlas relation extraction method based on pre-training model enhancement

Publications (2)

Publication Number Publication Date
CN118428471A CN118428471A (en) 2024-08-02
CN118428471B true CN118428471B (en) 2024-09-24

Family

ID=92326091

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410876214.9A Active CN118428471B (en) 2024-07-02 2024-07-02 Atlas relation extraction method based on pre-training model enhancement

Country Status (1)

Country Link
CN (1) CN118428471B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116313121A (en) * 2022-12-30 2023-06-23 北京邮电大学 Standardized construction method for high-robustness medical knowledge graph of pipeline type

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114357176B (en) * 2021-11-26 2023-11-21 永中软件股份有限公司 Entity knowledge automatic extraction method, computer device and computer readable medium
CN115392256A (en) * 2022-08-29 2022-11-25 重庆师范大学 Drug adverse event relation extraction method based on semantic segmentation
CN116186277A (en) * 2022-12-06 2023-05-30 同济大学 Chinese knowledge graph construction method based on CasRel model
CN115952284A (en) * 2022-12-09 2023-04-11 昆明理工大学 Medical text relation extraction method fusing density clustering and ERNIE
CN116956940A (en) * 2023-08-03 2023-10-27 杭州电子科技大学 Text event extraction method based on multi-directional traversal and prompt learning
CN118133785A (en) * 2024-04-08 2024-06-04 云南律奥新技术开发有限公司 Document Relation Extraction Method Based on Relation Template Evidence Extraction

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116313121A (en) * 2022-12-30 2023-06-23 北京邮电大学 Standardized construction method for high-robustness medical knowledge graph of pipeline type

Also Published As

Publication number Publication date
CN118428471A (en) 2024-08-02

Similar Documents

Publication Publication Date Title
CN110298037B (en) Convolutional neural network matching text recognition method based on enhanced attention mechanism
CN110210037B (en) Syndrome-oriented medical field category detection method
Vlad et al. Sentence-level propaganda detection in news articles with transfer learning and BERT-BiLSTM-capsule model
CN111027595B (en) Double-stage semantic word vector generation method
CN110209822A (en) Sphere of learning data dependence prediction technique based on deep learning, computer
CN109657230A (en) Merge the name entity recognition method and device of term vector and part of speech vector
CN113128233B (en) Construction method and system of mental disease knowledge map
CN114077673B (en) Knowledge graph construction method based on BTBC model
CN114925157B (en) Nuclear power station maintenance experience text matching method based on pre-training model
CN112069825B (en) Entity relation joint extraction method for alert condition record data
CN117763363A (en) Cross-network academic community resource recommendation method based on knowledge graph and prompt learning
CN117217223A (en) Chinese named entity recognition method and system based on multi-feature embedding
CN117033423A (en) SQL generating method for injecting optimal mode item and historical interaction information
CN109189848A (en) Abstracting method, system, computer equipment and the storage medium of knowledge data
CN114254645A (en) Artificial intelligence auxiliary writing system
CN115510230A (en) Mongolian emotion analysis method based on multi-dimensional feature fusion and comparative reinforcement learning mechanism
CN112800244B (en) Method for constructing knowledge graph of traditional Chinese medicine and national medicine
Jin et al. Textual content prediction via fuzzy attention neural network model without predefined knowledge
CN117194682B (en) Method, device and medium for constructing knowledge graph based on power grid related file
CN117056459B (en) Vector recall method and device
CN118428471B (en) Atlas relation extraction method based on pre-training model enhancement
CN114881038B (en) Chinese entity and relation extraction method and device based on span and attention mechanism
CN117932066A (en) Pre-training-based &#39;extraction-generation&#39; answer generation model and method
Wang et al. A Named Entity Recognition Model Based on Entity Trigger Reinforcement Learning
CN118469006B (en) Knowledge graph construction method, device, medium and chip for electric power operation text

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant