CN113157946A - Entity linking method and device, electronic equipment and storage medium - Google Patents
- Publication number
- CN113157946A CN202110529673.6A
- Authority
- CN
- China
- Prior art keywords
- entity
- candidate
- candidate entity
- relevance
- text
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G06F16/367—Ontology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/284—Relational databases
- G06F16/288—Entity relationship models
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3344—Query execution using natural language analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
- G06F40/295—Named entity recognition
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Computational Linguistics (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Animal Behavior & Ethology (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- General Health & Medical Sciences (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Machine Translation (AREA)
Abstract
The invention provides an entity linking method, an entity linking device, electronic equipment and a storage medium. The entity linking method comprises the following steps: extracting an entity mention from the text to be analyzed; acquiring a candidate entity set for the entity mention from a knowledge graph, wherein the candidate entity set comprises at least one candidate entity, and each candidate entity comprises description information; acquiring a first association degree between the entity mention and the description information of each candidate entity, and acquiring a second association degree between each candidate entity and the text to be analyzed; and obtaining a target entity from the candidate entity set based on the first association degree and the second association degree, and associating the entity mention with the target entity. By using the co-occurrence relation between words, the method captures the degree of semantic association between them, so that entity mentions can be accurately linked to the corresponding entities in the knowledge graph, which improves the accuracy and reliability of entity linking and effectively expands the scale of the knowledge graph.
Description
Technical Field
The present invention relates to the field of computer technologies, and in particular, to an entity linking method and apparatus, an electronic device, and a storage medium.
Background
Building a knowledge graph requires entity linking, that is, associating an external entity mention with the corresponding entity in an existing knowledge graph. Current entity linking techniques compare the context of the entity mention with the description context of a candidate entity in the knowledge graph, at the level of semantics or text structure, compute a similarity, and then decide whether to link. However, in many cases the context of the entity mention and the description context of the candidate entity share no words with similar semantics. For example, in the text "Sun Wukong wreaks havoc in the Heavenly Palace", "Sun Wukong" is an entity mention, and the knowledge graph contains a candidate entity "Hou Ge" whose description is "Hou Ge journeys to the West to fetch the scriptures". In this case, entity linking techniques based on semantic calculation or on word comparison over the text structure cannot link the entity mention "Sun Wukong" to the candidate entity "Hou Ge" in the knowledge graph.
Disclosure of Invention
The invention provides an entity linking method, an entity linking device, electronic equipment and a storage medium, which can improve the accuracy and reliability of entity linking and effectively expand the scale of a knowledge graph.
The invention provides an entity linking method, which comprises the following steps:
extracting entity mentions of the text to be analyzed;
acquiring a candidate entity set for the entity mention from a knowledge graph, wherein the candidate entity set comprises at least one candidate entity, and each candidate entity comprises description information;
acquiring a first association degree between the entity mention and the description information of each candidate entity, and acquiring a second association degree between each candidate entity and the text to be analyzed;
and obtaining a target entity from the candidate entity set based on the first association degree and the second association degree, and associating the entity mention with the target entity.
According to an entity linking method provided by the present invention, acquiring the first association degree between the entity mention and the description information of each candidate entity, and acquiring the second association degree between each candidate entity and the text to be analyzed, includes:
acquiring the co-occurrence probability of the entity mention and the description information of each candidate entity, and acquiring the co-occurrence probability of each candidate entity and the text to be analyzed;
and obtaining the first association degree between the entity mention and the description information of each candidate entity according to the co-occurrence probability of the entity mention and the description information of each candidate entity, and obtaining the second association degree between each candidate entity and the text to be analyzed according to the co-occurrence probability of each candidate entity and the text to be analyzed.
According to an entity linking method provided by the present invention, acquiring the co-occurrence probability of the entity mention and the description information of each candidate entity, and acquiring the co-occurrence probability of each candidate entity and the text to be analyzed, includes:
acquiring the frequency of each binary vocabulary group formed by combining the entity mention with a word in the description information of each candidate entity, and obtaining the co-occurrence probability of the entity mention and the description information of each candidate entity based on these frequencies;
and acquiring the frequency of each binary vocabulary group formed by combining each candidate entity with a word in the text to be analyzed, and obtaining the co-occurrence probability of each candidate entity and the text to be analyzed based on these frequencies.
According to the entity linking method provided by the invention, the frequency of the binary vocabulary group is obtained based on text statistics in a preset basic corpus.
According to the entity linking method provided by the invention, acquiring the candidate entity set for the entity mention from the knowledge graph comprises the following steps:
acquiring alternative names of the entity mention;
and matching the candidate entity set for the entity mention from the knowledge graph based on the entity mention and the alternative names.
According to an entity linking method provided by the present invention, obtaining the target entity from the candidate entity set based on the first association degree and the second association degree includes:
acquiring a weighted average of the first association degree and the second association degree, and taking the weighted average as a comprehensive association metric value;
and obtaining the target entity based on the comprehensive association metric value and the first association degree between the entity mention and the description information of each candidate entity.
According to an entity linking method provided by the present invention, obtaining the target entity based on the comprehensive association metric value and the first association degree between the entity mention and the description information of each candidate entity includes:
acquiring the first association degrees that are greater than the comprehensive association metric value;
and taking, as the target entity, the candidate entity corresponding to the largest of the first association degrees that are greater than the comprehensive association metric value.
The present invention also provides an entity linking apparatus, comprising:
the extraction module is used for extracting entity mentions of the text to be analyzed;
a candidate entity obtaining module, configured to obtain a candidate entity set for the entity mention from a knowledge graph, where the candidate entity set includes at least one candidate entity, and each candidate entity includes description information;
an association degree calculation module, configured to acquire a first association degree between the entity mention and the description information of each candidate entity and a second association degree between each candidate entity and the text to be analyzed;
and an entity linking module, configured to obtain a target entity from the candidate entity set based on the first association degree and the second association degree and to associate the entity mention with the target entity.
The present invention also provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of any of the above-mentioned entity linking methods when executing the program.
The present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the entity linking method as described in any one of the above.
The entity linking method, device, electronic equipment and storage medium provided by the invention first acquire, from a knowledge graph, a plurality of candidate entities related to the name of an entity mention. Then, based on big data or the like, the co-occurrence of the entity mention with each word appearing in the description information of each candidate entity is determined to obtain a first association degree, and the co-occurrence of each candidate entity with each word appearing in the text to be analyzed that contains the entity mention is determined to obtain a second association degree. The first and second association degrees together capture the association between the entity mention and the plurality of candidate entities, and the entity to be linked is determined from that association. This ensures that the entity mention can be accurately linked to the corresponding entity in the knowledge graph, improves the accuracy and reliability of entity linking, and effectively expands the scale of the knowledge graph.
Drawings
In order to more clearly illustrate the technical solutions of the present invention or the prior art, the drawings needed for the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
FIG. 1 is a schematic flow chart of an entity linking method provided by the present invention;
FIG. 2 is a schematic diagram of an entity linking method provided by the present invention;
FIG. 3 is a schematic structural diagram of an entity linking apparatus provided by the present invention;
FIG. 4 is a schematic structural diagram of an electronic device provided by the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
An entity linking method, an apparatus, an electronic device, and a storage medium according to embodiments of the present invention are described below with reference to the accompanying drawings.
Entity linking associates an external entity mention with the corresponding entity in an existing knowledge graph. On the one hand, this expands the semantic information of the entity mention; on the other hand, it expands the scale of the knowledge graph more accurately.
Here, an entity mention is entity information extracted from the text to be analyzed, such as a subject in the text, and serves as the object to be linked, that is, the object to be associated with the corresponding entity in the existing knowledge graph.
Fig. 1 is a flowchart of an entity linking method according to an embodiment of the present invention. As shown in fig. 1, the entity linking method according to the embodiment of the present invention includes the following steps:
S101: Extract entity mentions from the text to be analyzed.
As shown in FIG. 2, entity mentions may be extracted from the text to be analyzed by means of entity recognition. For example, extraction may be automated with a named entity recognition model (NER model for short) prepared in advance, which applies a named entity recognition algorithm to the text to be analyzed. The text is first preprocessed at the character level to obtain the input data of the NER model; the model then performs named-entity sequence labeling automatically, and the entity mentions are extracted from the predicted label sequence.
For example, if the text to be analyzed is "Zhang San graduated from the performance department of the Central Academy of Drama and is an actor from Liaoning, China", the entity mentions extracted by the named entity recognition model may include "Zhang San", "China", "actor", and the like.
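The last part of this step, turning a predicted label sequence into entity mentions, can be sketched as follows. This is a minimal illustration that assumes a character-level BIO tagging scheme; the patent does not name the tagging scheme, and the function name `decode_mentions` and the type labels `PER` and `TITLE` are hypothetical.

```python
from typing import List, Tuple


def decode_mentions(chars: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Collect (mention text, entity type) spans from a BIO tag sequence.

    Assumption: tags look like "B-PER", "I-PER", or "O" (BIO scheme).
    """
    mentions: List[Tuple[str, str]] = []
    current: List[str] = []
    current_type = None
    for ch, tag in zip(chars, tags):
        if tag.startswith("B-"):
            # A new span starts; flush any span that was still open.
            if current:
                mentions.append(("".join(current), current_type))
            current, current_type = [ch], tag[2:]
        elif tag.startswith("I-") and current and tag[2:] == current_type:
            # Continue the open span of the same type.
            current.append(ch)
        else:
            # "O" or an inconsistent tag closes the open span.
            if current:
                mentions.append(("".join(current), current_type))
            current, current_type = [], None
    if current:
        mentions.append(("".join(current), current_type))
    return mentions


text = "Zhang San is an actor"
# Hypothetical model output: one tag per character.
tags = ["B-PER"] + ["I-PER"] * 8 + ["O"] * 7 + ["B-TITLE"] + ["I-TITLE"] * 4
print(decode_mentions(list(text), tags))
```

A real NER model would supply the tag sequence; only the decoding step is shown here.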
S102: Acquire a candidate entity set for the entity mention from the knowledge graph, wherein the candidate entity set comprises at least one candidate entity, and each candidate entity comprises description information.
The knowledge graph includes a plurality of entities. The candidate entity set is a set of one or more entities selected from them, and the entity to be associated with the mention is one of the entities in the candidate entity set.
As shown in FIG. 2, generating candidate entities, that is, acquiring the candidate entity set for the entity mention from the knowledge graph, comprises: acquiring alternative names of the entity mention; and matching the candidate entity set for the entity mention from the knowledge graph based on the entity mention and the alternative names. Each candidate entity in the set includes description information, which is an entity attribute of the candidate entity and is typically a piece of text. For example, suppose the text "The plot of Havoc in the Heavenly Palace is wonderful" has already been linked to the knowledge graph. For this text, the entity in the knowledge graph may be "Havoc in the Heavenly Palace", and the corresponding description information is "The plot of Havoc in the Heavenly Palace is wonderful". The entity attributes of each candidate entity can also be acquired separately, in which case the candidate entity set corresponds to a set of candidate entity attributes.
In one embodiment of the invention, the candidate entities are generated from names matched in the knowledge graph, using regular expressions, against the entity mention and its alternative names. The resulting set of names corresponds to the names of entities in the knowledge graph. The specific process is as follows:
A pre-stored alternative name library, which records entity mentions together with their alternative names, is queried with the entity mention, so that the possible names of the entity mention are expanded. Because an alternative name may share no Chinese characters with the name of the entity mention, it cannot be found by regular expression matching on the mention alone. The alternative names of the entity mention are therefore acquired first, and matching queries are then performed separately for the entity mention and for each alternative name. For example, the film entity named "auditorium movie theater" has the alternative name "starlight with my heart". The alternative name library records the alternative names corresponding to each entity in a format such as, but not limited to, a two-element tuple, for example: <theater, starlight with my heart>.
After the alternative names of the entity mention are obtained, the candidate entity set is obtained from the knowledge graph by applying regular expressions, edit distance, or the like to the entity mention and its alternative names. Regular expression matching includes, but is not limited to, the following cases:
the name of an entity in the knowledge graph contains the entity mention or one of its alternative names, or the entity mention or one of its alternative names contains the name of an entity in the knowledge graph.
The edit distance is the number of character-level changes needed to turn one character sequence into another. The edit distance between the names of two entities is calculated and compared with a set threshold, so that a set of possible entities is screened out of the knowledge graph as the candidate entity set.
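The candidate generation described above can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the patented implementation: the alternative name library is modeled as a plain dictionary, containment is checked with substring tests instead of compiled regular expressions, the edit distance is the standard Levenshtein distance, and the names `generate_candidates`, `alias_library`, `kg_names` and `max_distance` are hypothetical.

```python
from typing import Dict, List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of character insertions,
    deletions, and substitutions that turn `a` into `b`."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def generate_candidates(mention: str,
                        alias_library: Dict[str, List[str]],
                        kg_names: List[str],
                        max_distance: int = 1) -> List[str]:
    """Return knowledge-graph entity names that match the mention or an alias."""
    # Expand the mention with its alternative names before matching.
    names = {mention} | set(alias_library.get(mention, []))
    candidates = []
    for kg_name in kg_names:
        for name in names:
            contains = name in kg_name or kg_name in name
            if contains or edit_distance(name, kg_name) <= max_distance:
                candidates.append(kg_name)
                break  # one matching name is enough for this entity
    return candidates


aliases = {"Sun Wukong": ["Monkey King"]}  # hypothetical library content
kg = ["Monkey King", "Sun Wukong (character)", "Zhang San"]
print(generate_candidates("Sun Wukong", aliases, kg))
```

The threshold `max_distance` plays the role of the "set threshold" in the text; its value here is arbitrary.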
S103: Acquire a first association degree between the entity mention and the description information of each candidate entity, and acquire a second association degree between each candidate entity and the text to be analyzed.
In one embodiment of the invention, acquiring a first degree of association of the entity mention and the description information of each candidate entity and acquiring a second degree of association of each candidate entity and the text to be analyzed comprises: acquiring the co-occurrence probability of the entity mention and the description information of each candidate entity, and acquiring the co-occurrence probability of each candidate entity and the text to be analyzed; and obtaining a first association degree of the entity mention and the description information of each candidate entity according to the co-occurrence probability of the entity mention and the description information of each candidate entity, and obtaining a second association degree of each candidate entity and the text to be analyzed according to the co-occurrence probability of each candidate entity and the text to be analyzed.
In this example, obtaining the co-occurrence probability of the entity mention and the description information of each candidate entity and obtaining the co-occurrence probability of each candidate entity and the text to be analyzed includes: acquiring the frequency of a binary word group obtained by combining the entity mention and each word in the description information of each candidate entity, and obtaining the co-occurrence probability of the entity mention and the description information of each candidate entity based on the frequency of the binary word group obtained by combining the entity mention and each word in the description information of each candidate entity; and obtaining the frequency of the binary word group obtained by combining each candidate entity with each word in the text to be analyzed, and obtaining the co-occurrence probability of each candidate entity and the text to be analyzed based on the frequency of the binary word group obtained by combining each candidate entity with each word in the text to be analyzed.
In the above example, the frequency of the binary vocabulary group is obtained, for example, from text statistics over a preset base corpus. The base corpus contains a plurality of text records, which may be collected in advance, for example manually or over a network.
Here, the first association degree represents how closely the entity mention is associated with each word in the description information of each candidate entity. For example, this closeness can be determined by the probability of co-occurrence in big data: the greater the co-occurrence probability, the closer the association, and the stronger the association between the entity mention and the candidate entity. Similarly, the second association degree represents how closely each candidate entity is associated with the words in the text to be analyzed.
Both association degrees depend on the co-occurrence probability, that is, the probability that two words occur together, so it is determined first. Specifically, as shown in FIG. 2, the co-occurrence probability is generated from the base corpus (i.e., big data): the frequencies of the different binary vocabulary groups are counted over the text records in the base corpus, converted into co-occurrence probabilities, and stored. The generation process is as follows:
First, a text is segmented into words and the occurrences of each word are counted. Each word is then taken in turn as the head word, and the occurrences of each binary vocabulary group formed by the head word and any other word are counted, yielding statistical information for each word in the text record of the form <vocabulary i, vocabulary j>: <number of occurrences of vocabulary i, number of co-occurrences of vocabulary i and vocabulary j>. Here, co-occurrence means that the two words occur in the same text. The text may first be segmented and passed through entity recognition, and stop words are then removed from the segmentation result; stop words include, but are not limited to, function words such as auxiliary words, adverbs, and prepositions. This yields a de-duplicated vocabulary set and entity set. For a single text, the statistics are therefore typically one occurrence of each entity and one co-occurrence of each entity with each word.
For example, take the text S1: "Zhang San graduated from the performance department of the Central Academy of Drama and is an actor from Liaoning, China." The operation is divided into steps 1 and 2. Step 1 is as follows:
The word segmentation result of S1 is: "Zhang San", "graduated", "Central", "Drama", "Academy", "performance department", "China", "Liaoning", "native place", "actor".
The entity recognition result of S1 is: "Zhang San".
The frequencies are then counted, first for each entity:
Zhang San: 1
Each entity is then combined with each word into a binary co-occurrence group, giving:
<Zhang San, graduated>: <1, 1>;
<Zhang San, Central>: <1, 1>;
<Zhang San, Drama>: <1, 1>;
<Zhang San, Academy>: <1, 1>;
<Zhang San, performance department>: <1, 1>;
<Zhang San, China>: <1, 1>;
<Zhang San, Liaoning>: <1, 1>;
<Zhang San, native place>: <1, 1>;
<Zhang San, actor>: <1, 1>.
Step 2: Traverse every text record in the base corpus. Step 1 is repeated for each text record and the results are merged; during merging, the occurrence counts of matching words and binary vocabulary groups are added together to give the overall statistics. If an entity or a binary vocabulary group has no existing record, a new record is added; otherwise, the counts in the existing record are updated by addition.
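Steps 1 and 2 can be sketched as follows. This is a minimal sketch that assumes each text record has already been segmented and passed through entity recognition, so a record is given as a pair of token list and entity list; the function names `count_record` and `build_statistics` are hypothetical, and `collections.Counter` is used as the merge structure.

```python
from collections import Counter
from typing import FrozenSet, Iterable, List, Tuple

Record = Tuple[List[str], List[str]]  # (segmented words, recognized entities)


def count_record(tokens: List[str], entities: List[str],
                 stop_words: FrozenSet[str] = frozenset()) -> Tuple[Counter, Counter]:
    """Step 1: statistics for one text record.

    Words and entities are de-duplicated, so each entity occurs once and
    co-occurs once with every other remaining word of the record.
    """
    vocab = {t for t in tokens if t not in stop_words}
    ents = set(entities)
    occurrences = Counter({e: 1 for e in ents})
    cooccurrences = Counter({(e, w): 1 for e in ents for w in vocab if w != e})
    return occurrences, cooccurrences


def build_statistics(records: Iterable[Record]) -> Tuple[Counter, Counter]:
    """Step 2: traverse the corpus and merge the per-record counts."""
    occurrences, cooccurrences = Counter(), Counter()
    for tokens, entities in records:
        occ, cooc = count_record(tokens, entities)
        # Counter.update adds counts: new keys are created, existing ones grow.
        occurrences.update(occ)
        cooccurrences.update(cooc)
    return occurrences, cooccurrences


s1 = (["Zhang San", "graduated", "Central", "Drama", "Academy",
       "performance department", "China", "Liaoning", "native place", "actor"],
      ["Zhang San"])
occ, cooc = build_statistics([s1])
print(occ["Zhang San"], cooc[("Zhang San", "actor")])
```

Using `Counter.update` covers both merge cases in the text: a missing record is created, and an existing record has its counts added.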
The binary vocabulary group frequency matrix in the statistical result is then converted into probability form, where the co-occurrence probability p is the corresponding frequency, namely:
where count() denotes the number of occurrences obtained from the statistics.
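The formula itself does not survive in this text. One reading that is consistent with the stored record format <number of occurrences of vocabulary i, number of co-occurrences of vocabulary i and vocabulary j> is p(i, j) = count(i, j) / count(i). The sketch below implements that reading; the normalization is an assumption, not a formula confirmed by the source, and the function name is hypothetical.

```python
from typing import Dict, Tuple


def cooccurrence_probability(head: str, word: str,
                             occurrences: Dict[str, int],
                             cooccurrences: Dict[Tuple[str, str], int]) -> float:
    """Assumed form: p(head, word) = count(head, word) / count(head)."""
    total = occurrences.get(head, 0)
    if total == 0:
        return 0.0  # unseen head word: no evidence of co-occurrence
    return cooccurrences.get((head, word), 0) / total


occ = {"Zhang San": 4}                    # illustrative counts, not real data
cooc = {("Zhang San", "actor"): 3}
print(cooccurrence_probability("Zhang San", "actor", occ, cooc))
```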
The association degree calculation is divided into calculating the first association degree and the second association degree, that is, calculating the association degree between the entity mention and the attribute set of each candidate entity, and calculating the association degree between each candidate entity and the context of the entity mention (namely, the text to be analyzed excluding the entity mention). The association degree r is calculated as follows:
where n represents the number of words obtained after the text is segmented. This yields the first association degree and the second association degree, and their weighted average is taken as the comprehensive association metric value of each candidate entity. The co-occurrence probability of each binary vocabulary group is obtained by query.
The specific calculation process of the association degree is as follows:
Entity mention: Zhang San. The text to be analyzed, S2, is: "In 2011, Zhang San played Xiao Li Guang Hua Rong in New Water Margin."
Word segmentation of S2 yields: Zhang San, New Water Margin, played, Xiao Li Guang, Hua Rong.
Binary vocabulary groups are formed from the entity mention and the segmentation result: <Zhang San, New Water Margin>, <Zhang San, played>, <Zhang San, Xiao Li Guang>, <Zhang San, Hua Rong>.
For these binary vocabulary groups, the co-occurrence probabilities p1 (Zhang San, New Water Margin), p2 (Zhang San, played), p3 (Zhang San, Xiao Li Guang) and p4 (Zhang San, Hua Rong) are obtained by query, and the association degree is then obtained as follows:
In the embodiment of the present invention, the comprehensive association metric R is calculated as a weighted average of the association degrees between the entity mention and the attribute set of each candidate entity (usually spliced into a single text) and the association degrees between each candidate entity and the context of the entity mention, where m represents the number of candidate entities. Namely:
where w1 and w2 are weights, each taking a value between 0 and 1, which can be set as required.
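The formulas for r and R are not reproduced in this text, so the following sketch rests on two assumptions: r is taken to be the mean of the n co-occurrence probabilities between a head term and the n segmented words, and the comprehensive metric is taken per candidate as the normalized weighted average (w1·r1 + w2·r2) / (w1 + w2). The function names and the probability values are hypothetical.

```python
from typing import Dict, List, Tuple


def association_degree(head: str, words: List[str],
                       prob: Dict[Tuple[str, str], float]) -> float:
    """Assumed form: r = (p_1 + ... + p_n) / n over the n segmented words."""
    if not words:
        return 0.0
    return sum(prob.get((head, w), 0.0) for w in words) / len(words)


def comprehensive_metric(r1: float, r2: float,
                         w1: float = 0.5, w2: float = 0.5) -> float:
    """Assumed form: weighted average of the first and second association degrees."""
    if w1 + w2 <= 0:
        raise ValueError("weights must sum to a positive value")
    return (w1 * r1 + w2 * r2) / (w1 + w2)


# Illustrative probabilities for the S2 example; the numbers are made up.
prob = {
    ("Zhang San", "New Water Margin"): 0.5,
    ("Zhang San", "played"): 0.25,
    ("Zhang San", "Xiao Li Guang"): 0.125,
    ("Zhang San", "Hua Rong"): 0.125,
}
words = ["New Water Margin", "played", "Xiao Li Guang", "Hua Rong"]
r = association_degree("Zhang San", words, prob)
print(r, comprehensive_metric(r, 0.75))
```

Dividing by w1 + w2 keeps the result a true weighted average even when the chosen weights do not sum to 1.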
S104: Obtain the target entity from the candidate entity set based on the first association degree and the second association degree, and associate the entity mention with the target entity.
In an embodiment of the present invention, obtaining the target entity from the candidate entity set based on the first association degree and the second association degree includes: acquiring a weighted average of the first association degree and the second association degree, and taking the weighted average as a comprehensive association metric value; and obtaining the target entity based on the comprehensive association metric value and the first association degree between the entity mention and the description information of each candidate entity.
In this example, obtaining the target entity based on the comprehensive association metric value and the first association degree between the entity mention and the description information of each candidate entity includes: acquiring the first association degrees that are greater than the comprehensive association metric value; and taking, as the target entity, the candidate entity corresponding to the largest of those first association degrees. That is, after the comprehensive association metric value is obtained for the candidate entity set corresponding to the entity mention, the first association degrees are compared with it. If the largest first association degree exceeds the comprehensive association metric value, linking is possible, that is, the entity mention is associated with the corresponding entity, thereby accurately expanding the scale of the knowledge graph.
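The selection rule can be sketched as follows. This sketch assumes the comprehensive association metric value is available per candidate, which is one reading of the text, and it returns `None` when no first association degree exceeds its metric value, so that no link is made; the function name `select_target` and the scores are hypothetical.

```python
from typing import Dict, Optional


def select_target(first_degree: Dict[str, float],
                  metric: Dict[str, float]) -> Optional[str]:
    """Pick the candidate with the largest first association degree among
    those whose first association degree exceeds the comprehensive metric."""
    eligible = {c: r for c, r in first_degree.items() if r > metric[c]}
    if not eligible:
        return None  # nothing passes the threshold: do not link
    return max(eligible, key=eligible.get)


first = {"A": 0.6, "B": 0.8, "C": 0.1}     # made-up first association degrees
metric = {"A": 0.5, "B": 0.9, "C": 0.05}   # made-up comprehensive metric values
print(select_target(first, metric))
```

Candidate "B" has the largest first association degree but is excluded because it does not exceed its metric value, which illustrates why the filter is applied before the maximum is taken.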
According to the entity linking method provided by the embodiment of the invention, a plurality of candidate entities related to the name of the entity mention are first obtained from the knowledge graph. Then, the co-occurrence of the entity mention with each word appearing in the description information of each candidate entity can be determined, for example based on big data, to obtain the first association degree, and the co-occurrence of each candidate entity with each word appearing in the text to be analyzed that contains the entity mention can be determined to obtain the second association degree. The first association degree and the second association degree together accurately capture the relevance between the entity mention and the plurality of candidate entities, and the entity to be linked is determined according to that relevance. The entity mention can therefore be accurately linked to the corresponding entity in the knowledge graph, which improves the accuracy and reliability of entity linking and effectively expands the scale of the knowledge graph.
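As a rough illustration of how an association degree might be derived from co-occurrence statistics, the following Python sketch counts word pairs in a small tokenized corpus. The function names are hypothetical, and three choices are assumptions made only for this example: a pair counts as co-occurring when both words appear in the same sentence, the co-occurrence probability is the pair count divided by the number of sentences, and the association degree is the mean probability over the words compared.

```python
from collections import Counter
from itertools import combinations
from typing import List, Sequence


def count_pairs(corpus: Sequence[List[str]]) -> Counter:
    """Count unordered word pairs that appear in the same tokenized sentence."""
    counts: Counter = Counter()
    for sentence in corpus:
        # sorted(set(...)) gives each pair once per sentence, in a fixed order
        for a, b in combinations(sorted(set(sentence)), 2):
            counts[(a, b)] += 1
    return counts


def cooccurrence_probability(a: str, b: str, counts: Counter,
                             n_sentences: int) -> float:
    """Fraction of sentences in which both words appear (assumed definition)."""
    if n_sentences == 0 or a == b:
        return 0.0
    key = (a, b) if a < b else (b, a)
    return counts[key] / n_sentences


def association_degree(term: str, words: Sequence[str], counts: Counter,
                       n_sentences: int) -> float:
    """Mean co-occurrence probability between `term` and each word in `words`."""
    others = [w for w in words if w != term]
    if not others:
        return 0.0
    total = sum(cooccurrence_probability(term, w, counts, n_sentences)
                for w in others)
    return total / len(others)
```

Under these assumptions, the first association degree would be `association_degree(mention, description_words, ...)` and the second would be `association_degree(candidate_name, text_words, ...)`, both computed against the same corpus counts.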
Compared with existing entity linking technology, consider the following example. In the context "Sun Wukong wreaks havoc in the Heavenly Palace", "Sun Wukong" is an entity mention, and the knowledge graph contains a candidate entity "Brother Monkey" whose description is "Brother Monkey journeys to the West to fetch the scriptures". Prior-art entity linking based on semantic calculation, or on word comparison based on text structure, cannot link the entity mention "Sun Wukong" with the candidate entity "Brother Monkey" in the knowledge graph, although the two can in fact be linked. With the entity linking method provided by the embodiment of the invention, the relevance between "Sun Wukong" and "journeying to the West to fetch the scriptures" can be analyzed, and the relevance between "Brother Monkey" and "wreaking havoc in the Heavenly Palace" can be determined, so that the entity mention can be accurately linked to the corresponding entity in the knowledge graph.
The entity linking device provided by the present invention is described below. The entity linking device described below and the entity linking method described above may be read in correspondence with each other.
As shown in Fig. 3, the entity linking device according to an embodiment of the present invention includes: an extraction module 310, a candidate entity acquisition module 320, an association degree calculation module 330, and an entity linking module 340, wherein:
the extraction module 310 is configured to extract an entity mention from a text to be analyzed;
the candidate entity acquisition module 320 is configured to acquire, from a knowledge graph, a candidate entity set for the entity mention, where the candidate entity set includes at least one candidate entity, and the candidate entity includes description information;
the association degree calculation module 330 is configured to acquire a first association degree between the entity mention and the description information of each candidate entity, and acquire a second association degree between each candidate entity and the text to be analyzed;
the entity linking module 340 is configured to obtain a target entity from the candidate entity set based on the first association degree and the second association degree, and associate the entity mention with the target entity.
According to the entity linking device provided by the embodiment of the invention, a plurality of candidate entities related to the name of the entity mention are first obtained from the knowledge graph. Then, the co-occurrence of the entity mention with each word appearing in the description information of each candidate entity can be determined, for example based on big data, to obtain the first association degree, and the co-occurrence of each candidate entity with each word appearing in the text to be analyzed that contains the entity mention can be determined to obtain the second association degree. The first association degree and the second association degree together accurately capture the relevance between the entity mention and the plurality of candidate entities, and the entity to be linked is determined according to that relevance. The entity mention can therefore be accurately linked to the corresponding entity in the knowledge graph, which improves the accuracy and reliability of entity linking and effectively expands the scale of the knowledge graph.
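One possible way to wire the four modules together is sketched below in Python. The `EntityLinker` class and its field names are hypothetical; each module is represented as an injected callable so that the sketch makes no assumption about how extraction, candidate retrieval, scoring, or selection is actually implemented.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

# (first association degree, second association degree) for one candidate
Scores = Tuple[float, float]


@dataclass
class EntityLinker:
    """Hypothetical composition of modules 310, 320, 330 and 340."""
    extract: Callable[[str], List[str]]                # extraction module 310
    get_candidates: Callable[[str], List[str]]         # candidate entity acquisition module 320
    score: Callable[[str, str, str], Scores]           # association degree calculation module 330
    choose: Callable[[Dict[str, Scores]], Optional[str]]  # entity linking module 340

    def link(self, text: str) -> Dict[str, Optional[str]]:
        """Map each entity mention in `text` to a target entity, or None."""
        links: Dict[str, Optional[str]] = {}
        for mention in self.extract(text):
            # score(mention, candidate, text) returns both association degrees
            scores = {c: self.score(mention, c, text)
                      for c in self.get_candidates(mention)}
            links[mention] = self.choose(scores) if scores else None
        return links
```

Keeping the modules as separate callables mirrors the statement that the units may or may not be physically separate: each one can be replaced or deployed independently without changing `link`.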
Fig. 4 illustrates the physical structure of an electronic device. As shown in Fig. 4, the electronic device may include: a processor 410, a communication interface 420, a memory 430 and a communication bus 440, wherein the processor 410, the communication interface 420 and the memory 430 communicate with each other via the communication bus 440. The processor 410 may invoke logic instructions in the memory 430 to perform an entity linking method comprising: extracting an entity mention from the text to be analyzed; acquiring a candidate entity set for the entity mention from a knowledge graph, wherein the candidate entity set includes at least one candidate entity, and the candidate entity includes description information; acquiring a first association degree between the entity mention and the description information of each candidate entity, and acquiring a second association degree between each candidate entity and the text to be analyzed; and obtaining a target entity from the candidate entity set based on the first association degree and the second association degree, and associating the entity mention with the target entity.
In addition, the logic instructions in the memory 430 may be implemented in the form of software functional units and, when sold or used as independent products, stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
In another aspect, the present invention also provides a computer program product comprising a computer program stored on a non-transitory computer-readable storage medium, the computer program comprising program instructions which, when executed by a computer, enable the computer to perform the entity linking method provided above, the method comprising: extracting an entity mention from the text to be analyzed; acquiring a candidate entity set for the entity mention from a knowledge graph, wherein the candidate entity set includes at least one candidate entity, and the candidate entity includes description information; acquiring a first association degree between the entity mention and the description information of each candidate entity, and acquiring a second association degree between each candidate entity and the text to be analyzed; and obtaining a target entity from the candidate entity set based on the first association degree and the second association degree, and associating the entity mention with the target entity.
In yet another aspect, the present invention also provides a non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the entity linking method provided above, the method comprising: extracting an entity mention from the text to be analyzed; acquiring a candidate entity set for the entity mention from a knowledge graph, wherein the candidate entity set includes at least one candidate entity, and the candidate entity includes description information; acquiring a first association degree between the entity mention and the description information of each candidate entity, and acquiring a second association degree between each candidate entity and the text to be analyzed; and obtaining a target entity from the candidate entity set based on the first association degree and the second association degree, and associating the entity mention with the target entity.
The above-described embodiments of the apparatus are merely illustrative. The units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.
Claims (10)
1. An entity linking method, comprising:
extracting entity mentions of the text to be analyzed;
acquiring a candidate entity set for the entity mention from a knowledge graph, wherein the candidate entity set comprises at least one candidate entity, and the candidate entity comprises description information;
acquiring a first association degree between the entity mention and the description information of each candidate entity, and acquiring a second association degree between each candidate entity and the text to be analyzed;
and obtaining a target entity from the candidate entity set based on the first association degree and the second association degree, and associating the entity mention with the target entity.
2. The entity linking method according to claim 1, wherein the obtaining a first degree of association of the entity mention with description information of each candidate entity and obtaining a second degree of association of each candidate entity with the text to be analyzed comprises:
acquiring the co-occurrence probability of the entity mention and the description information of each candidate entity, and acquiring the co-occurrence probability of each candidate entity and the text to be analyzed;
and obtaining the first association degree between the entity mention and the description information of each candidate entity according to the co-occurrence probability of the entity mention and the description information of each candidate entity, and obtaining the second association degree between each candidate entity and the text to be analyzed according to the co-occurrence probability of each candidate entity and the text to be analyzed.
3. The entity linking method according to claim 2, wherein the obtaining of the co-occurrence probability of the entity mention and the description information of each candidate entity and the co-occurrence probability of each candidate entity and the text to be analyzed comprises:
acquiring the frequency of each two-word group obtained by combining the entity mention with each word in the description information of each candidate entity, and acquiring the co-occurrence probability of the entity mention and the description information of each candidate entity based on the frequency of each such two-word group;
and acquiring the frequency of each two-word group obtained by combining each candidate entity with each word in the text to be analyzed, and acquiring the co-occurrence probability of each candidate entity and the text to be analyzed based on the frequency of each such two-word group.
4. The entity linking method according to claim 3, wherein the frequency of each two-word group is obtained statistically from the text in a predetermined basic corpus.
5. The entity linking method according to claim 1, wherein the acquiring a candidate entity set for the entity mention from the knowledge graph comprises:
acquiring alternative names of the entity mention;
and matching the candidate entity set for the entity mention from the knowledge graph based on the entity mention and the alternative names.
6. The entity linking method according to any one of claims 1 to 5, wherein the obtaining a target entity from the candidate entity set based on the first association degree and the second association degree comprises:
acquiring a weighted average of the first association degree and the second association degree, and taking the weighted average as a comprehensive association metric value;
and obtaining the target entity based on the comprehensive association metric value and the first association degree between the entity mention and the description information of each candidate entity.
7. The entity linking method according to claim 6, wherein the obtaining the target entity based on the comprehensive association metric value and the first association degree between the entity mention and the description information of each candidate entity comprises:
acquiring the first association degrees that are greater than the comprehensive association metric value;
and taking the candidate entity corresponding to the largest of the first association degrees that are greater than the comprehensive association metric value as the target entity.
8. An entity linking apparatus, comprising:
an extraction module, configured to extract an entity mention from a text to be analyzed;
a candidate entity acquisition module, configured to acquire a candidate entity set for the entity mention from a knowledge graph, wherein the candidate entity set comprises at least one candidate entity, and the candidate entity comprises description information;
an association degree calculation module, configured to acquire a first association degree between the entity mention and the description information of each candidate entity, and acquire a second association degree between each candidate entity and the text to be analyzed;
and an entity linking module, configured to obtain a target entity from the candidate entity set based on the first association degree and the second association degree, and associate the entity mention with the target entity.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the steps of the entity linking method according to any one of claims 1 to 7 are implemented when the program is executed by the processor.
10. A non-transitory computer readable storage medium, on which a computer program is stored, the computer program, when being executed by a processor, implementing the steps of the entity linking method according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110529673.6A CN113157946B (en) | 2021-05-14 | 2021-05-14 | Entity linking method, device, electronic equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113157946A (en) | 2021-07-23 |
CN113157946B (en) | 2024-09-27 |
Family
ID=76875992
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110529673.6A Active CN113157946B (en) | 2021-05-14 | 2021-05-14 | Entity linking method, device, electronic equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113157946B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115795051A (en) * | 2022-12-02 | 2023-03-14 | 中科雨辰科技有限公司 | Data processing system for obtaining link entity based on entity relationship |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2017123168A (en) * | 2016-01-05 | 2017-07-13 | 富士通株式会社 | Method for making entity mention in short text associated with entity in semantic knowledge base, and device |
CN107506486A (en) * | 2017-09-21 | 2017-12-22 | 北京航空航天大学 | A kind of relation extending method based on entity link |
CN111428507A (en) * | 2020-06-09 | 2020-07-17 | 北京百度网讯科技有限公司 | Entity chain finger method, device, equipment and storage medium |
CN112507715A (en) * | 2020-11-30 | 2021-03-16 | 北京百度网讯科技有限公司 | Method, device, equipment and storage medium for determining incidence relation between entities |
CN112585596A (en) * | 2018-06-25 | 2021-03-30 | 易享信息技术有限公司 | System and method for investigating relationships between entities |
Non-Patent Citations (2)
Title |
---|
WENBO ZHAO;TAGYOUNG CHUNG;ANUJ GOYAL;ANGELIKI METALLINOU: "Simple Question Answering with Subgraph Ranking and Joint-Scoring", 《STATISTICS》, 31 December 2019 (2019-12-31) * |
刘波: "面向企业图谱的实体链接技术的研究", 《中国优秀硕士论文电子期刊网(信息科技辑)》, 30 June 2020 (2020-06-30), pages 138 - 1241 * |
Also Published As
Publication number | Publication date |
---|---|
CN113157946B (en) | 2024-09-27 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110727880B (en) | Sensitive corpus detection method based on word bank and word vector model | |
CN109388795B (en) | Named entity recognition method, language recognition method and system | |
CN108287858B (en) | Semantic extraction method and device for natural language | |
CN106909655B (en) | The knowledge mapping entity discovery excavated based on production alias and link method | |
US8983826B2 (en) | Method and system for extracting shadow entities from emails | |
CN106570180B (en) | Voice search method and device based on artificial intelligence | |
US11907659B2 (en) | Item recall method and system, electronic device and readable storage medium | |
TWI554896B (en) | Information Classification Method and Information Classification System Based on Product Identification | |
CN116628173B (en) | Intelligent customer service information generation system and method based on keyword extraction | |
CN112948596B (en) | Knowledge graph construction method and device, computer equipment and computer storage medium | |
CN112149386A (en) | Event extraction method, storage medium and server | |
CN110457707B (en) | Method and device for extracting real word keywords, electronic equipment and readable storage medium | |
CN109992651B (en) | Automatic identification and extraction method for problem target features | |
CN113157946A (en) | Entity linking method and device, electronic equipment and storage medium | |
CN113051384B (en) | User portrait extraction method based on dialogue and related device | |
CN114298048A (en) | Named entity identification method and device | |
CN117235137B (en) | Professional information query method and device based on vector database | |
CN113032371A (en) | Database grammar analysis method and device and computer equipment | |
CN112148837A (en) | Maintenance scheme acquisition method, device, equipment and storage medium | |
CN116738979A (en) | Power grid data searching method and system based on core data identification and electronic equipment | |
CN116662557A (en) | Entity relation extraction method and device in network security field | |
CN112613304A (en) | Question answering method, electronic device and storage device | |
CN112115237A (en) | Method and device for constructing tobacco scientific and technical literature data recommendation model | |
CN114462364B (en) | Method and device for inputting information | |
CN113609391B (en) | Event recognition method and device, electronic equipment, medium and program |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||