CN113254665A - Knowledge graph expansion method and device, electronic equipment and storage medium - Google Patents
- Publication number
- CN113254665A (application CN202110610082.1A)
- Authority
- CN
- China
- Prior art keywords
- entity
- data
- added
- type
- target
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G06F16/367—Ontology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/284—Relational databases
- G06F16/288—Entity relationship models
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Animal Behavior & Ethology (AREA)
- Computational Linguistics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Embodiments of the invention provide a knowledge graph expansion method and apparatus, an electronic device, and a storage medium, relating to the technical field of data processing. The method includes: acquiring data to be processed; identifying, as a target entity type, the entity type of the to-be-added entity to which the data to be processed belongs; extracting, from the data to be processed and based on a first preset relation between entity types and attribute types, attribute information of the attribute types corresponding to the target entity type, as entity data of the to-be-added entity; and expanding the created knowledge graph based on the entity data of the to-be-added entity. Applying the scheme provided by the embodiments of the invention can improve the efficiency of expanding the knowledge graph.
Description
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to a method and an apparatus for expanding a knowledge graph, an electronic device, and a storage medium.
Background
With the advent of the information age, the volume of information of all kinds has grown explosively. The entities to which this information belongs may be related directly or indirectly, and applications can provide services to users based on these relations. For example, some applications push merchandise information, movie information, and the like to a user based on such direct or indirect relations.
In the prior art, information about directly or indirectly related entities is stored by constructing a knowledge graph. A worker constructs the knowledge graph manually from the existing information, and as information accumulates, the constructed knowledge graph can be expanded with the newly added information to enrich its content. In the related art, this expansion is also generally performed manually by workers, so the efficiency of expanding the knowledge graph is low.
Disclosure of Invention
The embodiment of the invention aims to provide a method and a device for expanding a knowledge graph, electronic equipment and a storage medium, so as to improve the efficiency of expanding the knowledge graph. The specific technical scheme is as follows:
In a first aspect of the present invention, there is provided a knowledge graph expansion method, the method including:
acquiring data to be processed;
identifying the entity type of the entity to be added to which the data to be processed belongs as a target entity type;
extracting attribute information of an attribute type corresponding to the target entity type from the to-be-processed data based on a first preset relation between the entity type and the attribute type, wherein the attribute information is used as entity data of the to-be-added entity;
and expanding the created knowledge graph based on the entity data of the entity to be added.
In an embodiment of the application, the expanding the created knowledge-graph based on the entity data of the entity to be added includes:
searching whether a target entity which is the same as the entity to be added exists in the created knowledge graph or not;
if the target entity exists, merging the entity data of the entity to be added with the entity data of the target entity to realize the expansion of the knowledge graph;
and if the target entity does not exist, creating the entity to be added in the knowledge graph, and adding the entity data of the entity to be added to the knowledge graph.
In an embodiment of the application, the searching whether a target entity that is the same as the entity to be added exists in the created knowledge graph includes:
determining a searching mode for searching a target entity which is the same as the entity to be added in an original entity library based on a second preset relation between the entity type and the searching mode, wherein the original entity library comprises entity data of the entity in the knowledge graph;
and searching whether a target entity which is the same as the entity to be added exists in the original entity library according to the determined searching mode, and if so, determining that the target entity exists in the knowledge graph.
In one embodiment of the present application, the method further comprises:
and adding the entity data of the entity to be added into the original entity library.
In an embodiment of the application, the searching, according to the determined searching manner, whether a target entity that is the same as the entity to be added exists in the original entity library or not includes:
searching an entity similar to the entity to be added from the original entity library by using a preset fuzzy searching mode to serve as a candidate entity;
and searching whether a target entity which is the same as the entity to be added exists in the candidate entities according to the determined searching mode.
In an embodiment of the application, the searching, as the candidate entity, an entity similar to the entity to be added from the original entity library by using a preset fuzzy search manner includes:
searching the original entity library for entities whose index information matches target attribute information, as candidate entities, wherein the index information of each entity comprises an attribute field of preset attribute information in the entity data of the entity, and the target attribute information is the name information in the attribute information of the entity to be added.
In an embodiment of the present application, the search mode corresponding to each entity type is:
searching by matching, one to one, the attribute information of the attribute types corresponding to the entity type.
In an embodiment of the application, the expanding the created knowledge-graph based on the entity data of the entity to be added includes:
carrying out format conversion on the entity data of the entity to be added to obtain triple information of the entity data of the entity to be added, wherein the triple information comprises name information, attribute type information and attribute information of the entity to be added;
and importing the triple information into a graph database of the knowledge graph, and adding the entity data of the entity to be added into the knowledge graph based on the graph database.
In an embodiment of the application, the extracting, from the to-be-processed data, attribute information of an attribute type corresponding to the target entity type based on a first preset relationship between an entity type and the attribute type, as the entity data of the to-be-added entity, includes:
determining the format type of the data to be processed as a target format;
selecting a script for performing data extraction on the data in the target format from pre-designed data extraction scripts as a target script;
determining an attribute type corresponding to the target entity type as a target attribute type based on a first preset relation between the entity type and the attribute type;
and extracting attribute information of the target attribute type from the data to be processed by using the target script to serve as entity data of the entity to be added.
In a second aspect of the present invention, there is also provided a knowledge-graph augmenting apparatus, comprising:
the data acquisition module is used for acquiring data to be processed;
the type identification module is used for identifying the entity type of the entity to be added to which the data to be processed belongs as a target entity type;
the data extraction module is used for extracting attribute information of the attribute type corresponding to the target entity type from the to-be-processed data based on a first preset relation between the entity type and the attribute type, and the attribute information is used as entity data of the to-be-added entity;
and the graph expansion module is used for expanding the created knowledge graph based on the entity data of the entity to be added.
In a third aspect of the present invention, there is also provided an electronic device, including a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory complete communication with each other through the communication bus;
a memory for storing a computer program;
a processor for implementing the method of any of the first aspects when executing a program stored in the memory.
In yet another aspect of the present invention, there is also provided a computer-readable storage medium, in which a computer program is stored, and the computer program, when executed by a processor, implements any one of the above-mentioned methods for knowledge-graph expansion.
In yet another aspect of the present invention, there is also provided a computer program product containing instructions which, when run on a computer, cause the computer to perform any of the above-described methods of knowledge-graph augmentation.
In the knowledge graph expansion scheme provided by the embodiment of the invention, data to be processed can be obtained; identifying the entity type of an entity to be added to which the data to be processed belongs as a target entity type; extracting attribute information of an attribute type corresponding to a target entity type from the data to be processed based on a first preset relation between the entity type and the attribute type, wherein the attribute information is used as entity data of an entity to be added; and expanding the created knowledge graph based on the entity data of the entity to be added. After the data to be processed is obtained, the entity type of the entity to which the data to be processed belongs can be determined firstly, and then the entity data of the entity is extracted from the data to be processed, so that the entity data of the entity is added into the knowledge graph, the knowledge graph is expanded, and the knowledge graph does not need to be expanded manually. Therefore, by applying the scheme provided by the embodiment of the invention, the efficiency of expanding the knowledge graph can be improved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below.
FIG. 1 is a flow chart of a method for expanding a knowledge graph according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a process for obtaining entity data according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a process of entity matching according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a process of merging entity data according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a knowledge graph expansion process according to an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of a knowledge graph expansion apparatus according to an embodiment of the present invention;
FIG. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described below with reference to the drawings in the embodiments of the present invention.
In order to improve the efficiency of expanding the knowledge graph, the embodiment of the invention provides a knowledge graph expanding method, a knowledge graph expanding device, electronic equipment and a storage medium.
In one embodiment of the invention, a knowledge graph expanding method is provided, and the method comprises the following steps:
acquiring data to be processed;
identifying the entity type of an entity to be added to which the data to be processed belongs as a target entity type;
extracting attribute information of an attribute type corresponding to a target entity type from the data to be processed based on a first preset relation between the entity type and the attribute type, wherein the attribute information is used as entity data of an entity to be added;
and expanding the created knowledge graph based on the entity data of the entity to be added.
After the data to be processed is obtained, the entity type of the entity to which the data to be processed belongs can be determined firstly, and then the entity data of the entity is extracted from the data to be processed, so that the entity data of the entity is added into the knowledge graph, the knowledge graph is expanded, and the knowledge graph does not need to be expanded manually. Therefore, by applying the scheme provided by the embodiment of the invention, the efficiency of expanding the knowledge graph can be improved.
The method, the apparatus, the electronic device, and the storage medium for expanding a knowledge graph according to embodiments of the present invention are described in detail below with reference to specific embodiments.
Referring to fig. 1, fig. 1 is a schematic flow chart of a knowledge graph expansion method according to an embodiment of the present invention, where the method includes the following steps S101 to S104:
S101, obtaining data to be processed.
The data to be processed may be: data describing attribute information of entities, such as stars, athletes, movies, games, merchandise, albums, etc. For example, in the case that the entity is a star, the attribute information may include: name information, age information, height information, work information, and the like, and in the case where the entity is an album, the attribute information may include: album name information, issue time information, publisher information, etc.
In an embodiment of the present invention, the data to be processed may be text type data, image type data, voice type data, or the like.
In one embodiment of the present invention, the data to be processed may be obtained from a preset database. Alternatively, data may be acquired from a public data platform and used as the data to be processed.
The data platform can be an encyclopedia data platform, such as an encyclopedia data platform, a dog encyclopedia data platform, a Google encyclopedia data platform and the like. Encyclopedia data is highly accurate, rich in information, and clearly laid out, which makes it convenient to extract the entity data of an entity from it later. Obtaining the data to be processed from an encyclopedia data platform therefore helps improve the accuracy of the expanded knowledge graph.
Besides, the data platform can be a news data platform, a video data platform, an image data platform, and the like.
S102, identifying the entity type of the entity to be added to which the data to be processed belongs as the target entity type.
Here, the entity type is the type to which an entity belongs, and may include: star, athlete, movie, game, merchandise, album, etc. For example, when the entity to be added to which the data to be processed belongs is "Liu Dehua", the target entity type is "star"; when the entity to be added is the film "Shi Mian Mai Fu" ("House of Flying Daggers"), the target entity type is "movie".
In an embodiment of the present invention, when identifying the entity type to which the entity to be added belongs, multiple identification manners may be included, which are respectively described below:
In one mode, some data fields may be extracted from the data to be processed and matched against preset classification rules to obtain, for each entity type, a score indicating how likely the entity to be added belongs to that type. The entity type with the highest score is selected as the target entity type of the entity to be added.
Here, the classification rules are preset rules for distinguishing, according to fields, the type of the entity to which data belongs. The extracted data fields may be the title field, a field at the beginning, a field at the end, and the like, of the data to be processed.
In this way, the entity to which the data to be processed belongs can be classified according to the classification rules. This identification mode is simple, so the target entity type is obtained efficiently.
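The rule-based mode above can be illustrated with a minimal sketch. The keyword rules, the `edge_chars` window and all function names are assumptions made for illustration; the source does not specify the form of the classification rules or how scores are computed.

```python
from typing import Dict, List

# Hypothetical classification rules: each keyword found in a field adds
# one point to the score of the corresponding entity type.
CLASSIFICATION_RULES: Dict[str, List[str]] = {
    "star": ["actor", "singer", "born"],
    "movie": ["film", "directed by", "release date"],
    "game": ["game", "publisher", "platform"],
}


def extract_fields(text: str, edge_chars: int = 80) -> List[str]:
    """Return the title (first line), the beginning and the end of the text."""
    lines = text.strip().splitlines()
    title = lines[0] if lines else ""
    body = " ".join(lines[1:])
    return [title, body[:edge_chars], body[-edge_chars:]]


def score_entity_types(text: str) -> Dict[str, int]:
    """Score every entity type by counting keyword hits in the fields."""
    fields = [f.lower() for f in extract_fields(text)]
    return {
        entity_type: sum(1 for kw in keywords for f in fields if kw in f)
        for entity_type, keywords in CLASSIFICATION_RULES.items()
    }


def classify_by_rules(text: str) -> str:
    """Pick the entity type with the highest score (first one wins ties)."""
    scores = score_entity_types(text)
    return max(scores, key=scores.get)
```

A production rule set would likely weight fields differently and treat an all-zero score as an identification failure rather than returning the first type.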
In another mode, the entity type of the entity to be added to which the data to be processed belongs can be identified using pre-trained binary classification models.
Specifically, each binary classification model classifies data to be processed for one entity type. For example, if a binary classification model is used for the "star" type, then after the data to be processed is input into it, the model outputs a result representing "yes" or "no", indicating whether the entity type of the entity to which the input data belongs is "star".
Multiple binary classification models can be designed, each for a different entity type, so that after the data to be processed is obtained, it can be input into each binary classification model in turn until the entity type of the entity to which the data belongs is determined.
In an embodiment of the application, each binary classification model may be trained as follows. Sample data is obtained in advance, and whether the entity type of the entity described by the sample data is the target type is judged manually to obtain label information. The sample data is input into the binary classification model to obtain a model output, and if the model output is inconsistent with the label information, the parameters of the model are adjusted. Sample data is then input into the adjusted model again, and this is repeated until a preset training end condition is reached, finally yielding a binary classification model for classifying data of the target type.
The target type can be understood as the entity type that the trained binary classification model is expected to recognize.
The training end condition may be that the number of training iterations reaches a preset threshold, for example 50000 or 100000, or that the model output is consistent with the label information for a preset number of consecutive samples, where the preset number may be 20, 30, 50, or the like.
Identifying the entity type of the entity to which the data to be processed belongs with an artificial intelligence model in this way yields a more accurate identification result.
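The sequential use of per-type binary classifiers can be sketched as follows. Keyword predicates stand in for trained models, and the names `BINARY_CLASSIFIERS` and `identify_entity_type` are hypothetical; the source does not describe the model architecture or its input features.

```python
from typing import Callable, Dict, Optional

# Each binary classifier answers "does this data describe an entity of type X?".
# Simple keyword predicates stand in for trained models in this sketch.
BinaryClassifier = Callable[[str], bool]

BINARY_CLASSIFIERS: Dict[str, BinaryClassifier] = {
    "star": lambda text: "singer" in text.lower() or "actor" in text.lower(),
    "game": lambda text: "game" in text.lower(),
}


def identify_entity_type(text: str) -> Optional[str]:
    """Feed the data to each classifier in turn and stop at the first 'yes'."""
    for entity_type, classifier in BINARY_CLASSIFIERS.items():
        if classifier(text):
            return entity_type
    return None  # no classifier accepted the data
```

Returning `None` when every classifier answers "no" leaves room for a caller to fall back to another identification mode.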
In a third mode, the entity type of the entity to which each piece of obtained data to be processed belongs may be identified manually.
In an embodiment of the present invention, the entity type of the entity to which the data to be processed belongs may be identified by combining the above three modes. For example, the entity type may first be identified using the preset classification rules; if that fails, it may be identified using the binary classification models; and if that also fails, it may be identified manually.
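The combination of the three modes amounts to a fallback chain, sketched below under stated assumptions: the three identifier functions are trivial hypothetical stand-ins, and manual identification is modelled as appending the item to a review queue, which the source does not prescribe.

```python
from typing import Callable, Iterable, List, Optional

Identifier = Callable[[str], Optional[str]]

# Items that no automatic mode could identify (hypothetical review queue).
MANUAL_QUEUE: List[str] = []


def by_rules(text: str) -> Optional[str]:
    return "movie" if "film" in text.lower() else None


def by_binary_models(text: str) -> Optional[str]:
    return "star" if "singer" in text.lower() else None


def by_manual_review(text: str) -> Optional[str]:
    # A person would label the item later; nothing is returned immediately.
    MANUAL_QUEUE.append(text)
    return None


def identify_with_fallback(text: str, identifiers: Iterable[Identifier]) -> Optional[str]:
    """Try each identification mode in order; return the first non-None result."""
    for identify in identifiers:
        result = identify(text)
        if result is not None:
            return result
    return None


CHAIN = [by_rules, by_binary_models, by_manual_review]
```

Ordering the chain from cheapest to most expensive mode means manual work is only needed for data that both automatic modes fail on.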
S103, extracting attribute information of the attribute type corresponding to the target entity type from the to-be-processed data based on the first preset relation between the entity type and the attribute type, and using the attribute information as entity data of the to-be-added entity.
Specifically, each entity type corresponds to a set of attribute types. For example, for the entity type "movie", the corresponding attribute types include movie name, release time, lead actor, director, movie category, and the like; for the entity type "game", they include game name, issuer, game classification, and the like; for the entity type "star", they include star name, stage name, birth date, works, relatives and friends, and the like.
For example, if the entity type of the entity to be added is identified as "star", and the attribute types corresponding to "star" in the first preset relation include star name, stage name, birth date, works, and relatives and friends, then the attribute information of these attribute types may be extracted from the data to be processed as the entity data of the star.
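As a minimal sketch, the first preset relation can be held in a dictionary from entity type to attribute types. Only a subset of the attribute types named above is listed, and the assumption that the data to be processed is already a flat key-value record is a simplification made for illustration.

```python
from typing import Dict, List

# First preset relation: entity type -> attribute types (subset of the examples in the text).
ENTITY_TYPE_TO_ATTRIBUTES: Dict[str, List[str]] = {
    "movie": ["movie name", "release time", "director"],
    "game": ["game name", "issuer", "game classification"],
    "star": ["star name", "birth date", "work"],
}


def extract_entity_data(record: Dict[str, str], entity_type: str) -> Dict[str, str]:
    """Keep only the attributes that the preset relation lists for this entity type."""
    wanted = ENTITY_TYPE_TO_ATTRIBUTES[entity_type]
    return {attr: record[attr] for attr in wanted if attr in record}
```

Fields that are not listed for the target entity type, such as page statistics, are simply dropped.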
In an embodiment of the present invention, when extracting entity data of an entity to be added, the data to be processed and the entity type of the entity to which the data belongs may be input into a data extraction model that is trained in advance, so as to obtain the entity data of the entity to be added, which is output by the model.
In addition, the data to be processed can be converted into a text format, then the data to be processed in the text format is subjected to semantic analysis, and an analysis result used for describing the information of the attribute type corresponding to the target entity type is selected from the analysis result and used as entity data of the entity to be added.
In one embodiment of the invention, the format type of the data to be processed can be determined as a target format; selecting a script for performing data extraction on data in a target format from pre-designed data extraction scripts as a target script; determining an attribute type corresponding to a target entity type as a target attribute type based on a first preset relation between the entity type and the attribute type; and extracting attribute information of the target attribute type from the data to be processed by using the target script to serve as entity data of the entity to be added.
The format types include structured data, semi-structured data, text data, and the like. Because the data extraction modes corresponding to the data with different formats are different, different data extraction scripts can be designed for extracting the data to be processed with different formats.
Specifically, the target format of the data to be processed may be obtained first, then a target script for performing data extraction on the data in the target format is determined, and finally, the target script may be used to extract information of the target attribute type in the data to be processed, which is used as entity data of the entity to be added.
In an embodiment of the application, the target script may perform semantic analysis on the to-be-processed data in the target format to obtain information described by the to-be-processed data, and then search information for describing a target attribute type from the information to obtain an extraction result, so as to obtain entity data of the to-be-added entity.
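Selecting a target script by format type can be sketched as a dispatch table. The two extractor functions, the JSON and "key: value" input conventions, and the format labels are assumptions for illustration; the source does not define the scripts themselves.

```python
import json
from typing import Callable, Dict, List

Extractor = Callable[[str, List[str]], Dict[str, str]]


def extract_structured(raw: str, attribute_types: List[str]) -> Dict[str, str]:
    """Structured data: assumed here to be a JSON object keyed by attribute type."""
    data = json.loads(raw)
    return {a: str(data[a]) for a in attribute_types if a in data}


def extract_semi_structured(raw: str, attribute_types: List[str]) -> Dict[str, str]:
    """Semi-structured data: assumed here to be 'key: value' lines."""
    pairs: Dict[str, str] = {}
    for line in raw.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            pairs[key.strip()] = value.strip()
    return {a: pairs[a] for a in attribute_types if a in pairs}


# Pre-designed data extraction scripts, indexed by format type.
EXTRACTION_SCRIPTS: Dict[str, Extractor] = {
    "structured": extract_structured,
    "semi-structured": extract_semi_structured,
}


def extract(raw: str, target_format: str, target_attribute_types: List[str]) -> Dict[str, str]:
    """Select the target script for the format, then extract the target attributes."""
    target_script = EXTRACTION_SCRIPTS[target_format]
    return target_script(raw, target_attribute_types)
```

Free-text data would need a further script based on semantic analysis, which is outside the scope of this sketch.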
Referring to FIG. 2, FIG. 2 is a schematic diagram of a process for obtaining entity data according to an embodiment of the present invention. As shown in FIG. 2, encyclopedia data may be acquired from an encyclopedia data platform through public data interfaces (OpenApi) and used as the data to be processed. The entity type of the entity to which the encyclopedia data belongs is then identified using pre-trained binary classifiers, that is, the binary classification models described above. These may include a star classifier, a game classifier, and the like, used respectively to identify entities of the star type and entities of the game type. After the entity type is identified, the entity data of the entity may be extracted from the encyclopedia data using an Extractor script, where each entity type corresponds to one Extractor script. The entity data (Raw Data) of entities belonging to different entity types is finally obtained.
S104, expanding the created knowledge graph based on the entity data of the entity to be added.
Specifically, a knowledge graph may be created in advance based on the entity data of the existing entity, and the created knowledge graph includes entity types, corresponding attribute types, attribute information, and the like of different entities. After the entity data of the entity to be added is obtained, the entity data of the entity to be added can be added into the knowledge graph, so that the knowledge graph is expanded.
In one embodiment of the invention, format conversion can be carried out on the entity data of the entity to be added to obtain the triple information of the entity data of the entity to be added, the triple information is led into the graph database of the knowledge graph, and the entity data of the entity to be added is added into the knowledge graph based on the graph database.
The triple information comprises the name information, attribute type information and attribute information of the entity to be added. The triples may be in JSON-LD format.
Assuming that the entity to be added is "zhou jeren", that its entity type is "star", and that the attribute types corresponding to "star" include star name, alias, birth date, work, and spouse, the triple information of the entity to be added may be as shown in Table 1 below:
TABLE 1
Specifically, the entity data of the entity to be added can be processed to obtain triple information in JSON-LD format, and the triple information is then imported into the graph database JanusGraph. The graph database adds the triple information to the knowledge graph, thereby expanding it, and can provide services such as entity query and graph traversal.
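The format conversion into triples can be sketched as follows. The tuple layout (entity name, attribute type, attribute value) follows the description above; the function name `to_triples` is hypothetical, and both the JSON-LD serialization and the import into the graph database are deliberately left out.

```python
from typing import Dict, List, Tuple, Union

Triple = Tuple[str, str, str]  # (entity name, attribute type, attribute value)


def to_triples(name: str, entity_data: Dict[str, Union[str, List[str]]]) -> List[Triple]:
    """Turn entity data into one triple per attribute value."""
    triples: List[Triple] = []
    for attribute_type, value in entity_data.items():
        # Multi-valued attributes (e.g. several works) yield several triples.
        values = value if isinstance(value, list) else [value]
        triples.extend((name, attribute_type, v) for v in values)
    return triples
```

Emitting one triple per value keeps multi-valued attributes representable without nesting.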
In the knowledge graph expansion scheme provided by the embodiment, data to be processed can be obtained; identifying the entity type of an entity to be added to which the data to be processed belongs as a target entity type; extracting attribute information of an attribute type corresponding to a target entity type from the data to be processed based on a first preset relation between the entity type and the attribute type, wherein the attribute information is used as entity data of an entity to be added; and expanding the created knowledge graph based on the entity data of the entity to be added. After the data to be processed is obtained, the entity type of the entity to which the data to be processed belongs can be determined firstly, and then the entity data of the entity is extracted from the data to be processed, so that the entity data of the entity is added into the knowledge graph, the knowledge graph is expanded, and the knowledge graph does not need to be expanded manually. Therefore, the scheme provided by the embodiment can improve the efficiency of expanding the knowledge graph.
In one embodiment of the invention, when the knowledge graph is expanded, whether a target entity which is the same as the entity to be added exists in the created knowledge graph or not can be searched, if so, the entity data of the entity to be added and the entity data of the target entity are merged to realize the expansion of the knowledge graph; and if not, creating an entity to be added in the knowledge graph, and adding entity data of the entity to be added in the knowledge graph.
Specifically, if a target entity that is the same entity as the entity to be added exists in the knowledge graph, the entity to be added does not need to be created in the knowledge graph again; the knowledge graph can be expanded simply by merging the entity data of the entity to be added into the entity data of the target entity in the knowledge graph.
If the knowledge graph does not contain a target entity that is the same entity as the entity to be added, the entity to be added needs to be created in the knowledge graph, and the entity data of that entity is added.
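The merge-or-create decision can be sketched as follows. This is only an illustration under stated assumptions: the knowledge graph is modeled as an in-memory dictionary, merging is done per attribute type with duplicate values dropped, and `find_by_name` is a hypothetical stand-in for whatever search is used to locate the target entity.

```python
def merge_entity_data(existing, incoming):
    """Merge values per attribute type, keeping order and dropping duplicates."""
    merged = {attr: list(values) for attr, values in existing.items()}
    for attr_type, values in incoming.items():
        bucket = merged.setdefault(attr_type, [])
        for v in values:
            if v not in bucket:
                bucket.append(v)
    return merged


def expand_graph(graph, entity_id, entity_data, find_target):
    """Merge into the target entity if one exists, otherwise create a new entity.
    Returns the id of the entity that now holds the data."""
    target_id = find_target(graph, entity_data)
    if target_id is not None:
        graph[target_id] = merge_entity_data(graph[target_id], entity_data)
        return target_id
    graph[entity_id] = {attr: list(values) for attr, values in entity_data.items()}
    return entity_id


def find_by_name(graph, entity_data):
    """Hypothetical search: two entities are treated as the same if they share a name."""
    names = set(entity_data.get("name", []))
    for existing_id, data in graph.items():
        if names & set(data.get("name", [])):
            return existing_id
    return None


graph = {"e1": {"name": ["Example Star"], "work": ["Work A"]}}
merged_id = expand_graph(graph, "e2", {"name": ["Example Star"], "work": ["Work B"]}, find_by_name)
created_id = expand_graph(graph, "e3", {"name": ["Another Star"]}, find_by_name)
```

Passing the search as a parameter keeps the merge-or-create logic independent of how the target entity is actually found.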
In an embodiment of the present invention, when searching for whether a target entity exists in the knowledge graph, a search manner for finding, in the original entity library, a target entity that is the same entity as the entity to be added may be determined based on a second preset relationship between entity types and search manners. The original entity library is then searched, according to the determined search manner, for a target entity that is the same entity as the entity to be added; if such an entity is found, it is determined that the target entity exists in the knowledge graph.
The original entity library comprises the entity data of the entities in the knowledge graph. The original entity library may be the non-relational distributed database HBase.
Specifically, because the data of different entity types have different attribute types, the data of different attribute types can be searched according to different matching modes. Different entity types correspond to different lookup approaches.
First, according to the entity type of the entity to be added, a search mode corresponding to the entity type is determined from the second preset relationship, then a target entity which is the same as the entity to be added is searched from the original entity library according to the determined search mode, if the target entity can be searched in the original entity library, the target entity exists in the knowledge graph, and if the target entity cannot be searched in the original entity library, the target entity does not exist in the knowledge graph.
In an embodiment of the present invention, the search manner corresponding to each entity type is: matching, one by one, the attribute information of the attribute types corresponding to that entity type.
Therefore, when the target entity is searched, the entity type of the entity to be added can be determined firstly, the attribute type corresponding to the entity type is further determined, then the attribute information of each attribute type is matched one by one, and finally the target entity which is the same entity as the entity to be added is searched.
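A minimal sketch of this type-dependent, attribute-by-attribute search is given below. The contents of `SECOND_PRESET_RELATION`, the choice of exact equality as the matching rule, and the in-memory `library` dictionary are all assumptions made for illustration; the embodiment does not specify them.

```python
# Hypothetical second preset relation: entity type -> attribute types to match.
SECOND_PRESET_RELATION = {
    "star": ["name", "birth_date"],
    "game": ["name", "publisher"],
}


def same_entity(entity_type, to_add, existing):
    """Match the attribute information of each listed attribute type one by one.
    Exact equality on every listed attribute is an assumed matching rule."""
    for attr_type in SECOND_PRESET_RELATION[entity_type]:
        if attr_type not in to_add or attr_type not in existing:
            return False
        if to_add[attr_type] != existing[attr_type]:
            return False
    return True


def find_target(entity_type, to_add, library):
    """Return the id of the first library entity judged to be the same entity, else None."""
    for entity_id, data in library.items():
        if data.get("type") == entity_type and same_entity(entity_type, to_add, data):
            return entity_id
    return None


library = {
    "s1": {"type": "star", "name": "Example Star", "birth_date": "1980-01-01"},
    "g1": {"type": "game", "name": "Example Star", "publisher": "Example Studio"},
}
hit = find_target("star", {"name": "Example Star", "birth_date": "1980-01-01"}, library)
miss = find_target("star", {"name": "Example Star", "birth_date": "1999-09-09"}, library)
```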
In an embodiment of the present invention, the entity data of the entity to be added may also be added to the original entity library. This ensures that the original entity library contains the entity data of all the entities in the knowledge graph, so that the entity data of entities in the knowledge graph can conveniently be looked up in the original entity library subsequently.
In an embodiment of the present invention, when searching for a target entity, an entity similar to an entity to be added may be searched for from an original entity library by using a preset fuzzy search manner to serve as a candidate entity, and then, according to the determined search manner, whether a target entity that is the same as the entity to be added exists is searched for from the candidate entity.
Specifically, firstly, a fuzzy search mode is utilized to search whether an entity similar to the entity to be added exists in an original entity library, if so, the target entity which is the same as the entity to be added may exist in the original entity library, so that the searched entity can be used as a candidate entity, and the target entity can be further searched in the candidate entity;
if the candidate entity does not exist, it indicates that the original entity library does not have a target entity which is the same as the entity to be added, so that further search is not needed.
Therefore, the candidate entities are searched firstly, and then the target entities are determined from the candidate entities, and the method for searching the candidate entities is simple, so that the number of searched objects can be reduced when the target entities are searched, and the efficiency of searching the target entities can be improved.
In an embodiment of the present invention, when searching for a candidate entity, an entity whose index information matches the target attribute information in the original entity library may be searched for as the candidate entity.
The index information of each entity comprises attribute fields of preset attribute information in the entity data of that entity, and the target attribute information is the name information in the attribute information of the entity to be added.
Specifically, an index may be established for each entity in the original entity library, and the index information of each entity includes attribute fields of the preset attribute information, where the preset attribute information may include name attribute information, relationship attribute information, and the like. The index information of each entity can be established using Elasticsearch.
For example, assuming that the entity is "Liu Xiang" and the preset attribute information includes the name attribute information "Liu Xiang" and the relationship attribute information "spouse: wusha", the index information of the entity can be "Liu Xiang; spouse: wusha".
Since the target attribute information is the name information of the entity to be added, it can be understood that a fuzzy search mode in which the name information of the entity to be added is matched with the name information and relationship information of each entity in the original entity library is adopted to search the candidate entity associated with the entity to be added from the original entity library.
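The candidate search can be sketched without a search engine. The example below uses a plain in-memory index and case-insensitive substring matching as a simplified stand-in for an Elasticsearch index; the field names, the `"; "` separator, and the sample entities are hypothetical.

```python
def build_index(library, indexed_fields=("name", "spouse")):
    """Build index information per entity from the preset attribute fields."""
    index = {}
    for entity_id, data in library.items():
        parts = [str(data[field]) for field in indexed_fields if field in data]
        index[entity_id] = "; ".join(parts).lower()
    return index


def find_candidates(index, target_name):
    """Return ids of entities whose index information contains the target name.
    Substring matching is an assumed, deliberately loose notion of 'fuzzy'."""
    needle = target_name.strip().lower()
    if not needle:
        return []
    return [entity_id for entity_id, info in index.items() if needle in info]


library = {
    "a": {"name": "Example Star", "spouse": "Example Partner"},
    "b": {"name": "Example Partner", "spouse": "Example Star"},
    "c": {"name": "Unrelated Person"},
}
index = build_index(library)
candidates = find_candidates(index, "Example Star")
```

Because relationship fields are indexed together with the name, entity "b" is returned as a candidate even though its own name differs, which is why a stricter second-stage match is still needed.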
In an embodiment of the present invention, when searching for a target entity from candidate entities, the target entity may be searched by using an entity matching model trained in advance.
Specifically, referring to fig. 3, fig. 3 is a schematic diagram of a process of entity matching according to an embodiment of the present invention. As shown in fig. 3, the entity data of the entity to be added and the entity data of the candidate entity may be input into the entity matching model, the entity matching model may first perform structure transformation on the input entity data by using a structure transformation module to obtain entity data in a vector form, then perform matching degree analysis on the entity data after structure transformation by using a trained matching analysis module, and determine whether the candidate entity and the entity to be added are the same entity according to a matching degree analysis result.
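The two stages of the entity matching model can be illustrated with a toy stand-in. The sketch below replaces the structure transformation module with a bag-of-tokens vector and the trained matching analysis module with cosine similarity against a fixed threshold; both substitutions, and the threshold value of 0.8, are assumptions for illustration and do not represent the trained model of the embodiment.

```python
import math
from collections import Counter


def to_vector(entity_data):
    """Structure transformation (stand-in): turn entity data into a sparse
    token-count vector keyed by 'attribute=token'."""
    tokens = []
    for attr_type, value in sorted(entity_data.items()):
        for word in str(value).lower().split():
            tokens.append(f"{attr_type}={word}")
    return Counter(tokens)


def matching_degree(vec_a, vec_b):
    """Matching degree analysis (stand-in): cosine similarity of two sparse vectors."""
    dot = sum(vec_a[t] * vec_b[t] for t in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def is_same_entity(to_add, candidate, threshold=0.8):
    """Decide whether the candidate is the same entity; the threshold is assumed."""
    return matching_degree(to_vector(to_add), to_vector(candidate)) >= threshold


entity_to_add = {"name": "Example Star", "birth_date": "1980-01-01"}
same_candidate = {"name": "Example Star", "birth_date": "1980-01-01"}
other_candidate = {"name": "Other Person", "birth_date": "1990-05-05"}
```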
Referring to fig. 4, fig. 4 is a schematic diagram of a process of merging entity data according to an embodiment of the present invention. As shown in fig. 4, after the entity data of the entity to be added is obtained, a candidate entity similar to the entity to be added may be searched from the original entity library in a fuzzy search manner by using index information of each entity in the original entity library, then a target entity that is the same entity as the entity to be added is determined from the candidate entity by using the entity matching model, and then the entity data of the entity to be added and the data belonging to the same attribute type in the entity data of the target entity in the knowledge graph are merged according to the attribute type, thereby implementing the expansion of the knowledge graph.
Referring to fig. 5, fig. 5 is a schematic diagram of a process of expanding a knowledge graph according to an embodiment of the present invention, as shown in fig. 5:
First, based on iQIYI data, such as data of entities like roles, albums, stars and games, the entity type of the entity to which the iQIYI data belongs is identified with manual intervention, and the entity data of the entity is extracted from the iQIYI data, so that a knowledge graph is constructed.
Next, encyclopedia data is acquired from an encyclopedia data platform through a plurality of public data interfaces (OpenApi) to serve as the data to be processed. The entity type of the entity to which the encyclopedia data belongs is identified using pre-trained binary classifiers, which may include a star classifier, a game classifier and the like. After the entity type is identified, the entity data of that entity is extracted from the encyclopedia data using an Extractor script, finally yielding the entity data (Raw Data) of entities belonging to different entity types.
Then, for the entity data of the different entity types, a deduplicator can be used to identify entity data in the entity library that belongs to the same entity as the entity to be added; the entity data belonging to the same entity is merged, and the deduplicated entity data is imported into the graph database to expand the knowledge graph.
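The type identification step of this flow, in which one binary classifier per entity type is applied to the data, can be sketched as below. The keyword-based scoring functions are trivial stand-ins for pre-trained classifiers, and the rule of picking the highest-scoring type above a threshold is an assumed combination policy that the embodiment does not specify.

```python
def keyword_classifier(keywords):
    """Build a stand-in binary classifier that scores 1.0 if any keyword occurs."""
    def score(text):
        words = text.lower().split()
        return 1.0 if any(k in words for k in keywords) else 0.0
    return score


def identify_entity_type(text, classifiers, threshold=0.5):
    """Run every binary classifier and return the best-scoring entity type,
    or None when no classifier reaches the threshold."""
    scores = {entity_type: clf(text) for entity_type, clf in classifiers.items()}
    if not scores:
        return None
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else None


# Hypothetical classifiers for two entity types.
classifiers = {
    "star": keyword_classifier({"actor", "singer"}),
    "game": keyword_classifier({"gameplay", "publisher"}),
}
star_type = identify_entity_type("a singer and actor born in 1980", classifiers)
game_type = identify_entity_type("gameplay details and publisher notes", classifiers)
unknown_type = identify_entity_type("a recipe for soup", classifiers)
```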
Referring to fig. 6, fig. 6 is a schematic structural diagram of a knowledge-graph extending apparatus according to an embodiment of the present invention, the apparatus includes:
a data obtaining module 601, configured to obtain data to be processed;
a type identification module 602, configured to identify an entity type of an entity to be added to which the to-be-processed data belongs, as a target entity type;
a data extraction module 603, configured to extract, based on a first preset relationship between an entity type and an attribute type, attribute information of the attribute type corresponding to the target entity type from the to-be-processed data, where the attribute information is used as entity data of the to-be-added entity;
a graph expansion module 604, configured to expand the created knowledge graph based on the entity data of the entity to be added.
In an embodiment of the present application, the graph expansion module 604 includes:
the target entity searching submodule is used for searching whether a target entity which is the same as the entity to be added exists in the created knowledge graph or not, if yes, the first expansion submodule is triggered, and if not, the second expansion submodule is triggered;
the first expansion submodule is used for merging the entity data of the entity to be added with the entity data of the target entity to realize the expansion of the knowledge graph;
the second expansion submodule is used for creating the entity to be added in the knowledge graph and adding entity data of the entity to be added in the knowledge graph.
In an embodiment of the present application, the target entity searching sub-module includes:
the searching mode determining unit is used for determining a searching mode for searching a target entity which is the same as the entity to be added in an original entity library based on a second preset relation between the entity type and the searching mode, wherein the original entity library comprises entity data of the entity in the knowledge graph;
and the target entity searching unit is used for searching whether a target entity which is the same as the entity to be added exists in the original entity library according to the determined searching mode, and if so, determining that the target entity exists in the knowledge graph.
In an embodiment of the present application, the apparatus further includes a data adding unit, configured to:
add the entity data of the entity to be added into the original entity library.
In an embodiment of the application, the target entity searching unit includes:
a candidate entity searching subunit, configured to search, by using a preset fuzzy searching manner, an entity similar to the entity to be added from the original entity library, as a candidate entity;
and the target entity searching subunit is used for searching whether a target entity which is the same as the entity to be added exists in the candidate entities according to the determined searching mode.
In an embodiment of the present application, the candidate entity searching subunit is specifically configured to:
search the original entity library for entities whose index information matches target attribute information, as candidate entities, wherein the index information of each entity comprises attribute fields of preset attribute information in the entity data of that entity, and the target attribute information is the name information in the attribute information of the entity to be added.
In an embodiment of the present application, the search manner corresponding to each entity type is:
matching, one by one, the attribute information of the attribute types corresponding to that entity type.
In an embodiment of the present application, the graph expansion module 604 is specifically configured to:
carrying out format conversion on the entity data of the entity to be added to obtain triple information of the entity data of the entity to be added, wherein the triple information comprises name information, attribute type information and attribute information of the entity to be added;
and importing the triple information into a graph database of the knowledge graph, and adding the entity data of the entity to be added into the knowledge graph based on the graph database.
In an embodiment of the present application, the data extraction module 603 is specifically configured to:
determining the format type of the data to be processed as a target format;
selecting a script for performing data extraction on the data in the target format from pre-designed data extraction scripts as a target script;
determining an attribute type corresponding to the target entity type as a target attribute type based on a first preset relation between the entity type and the attribute type;
and extracting attribute information of the target attribute type from the data to be processed by using the target script to serve as entity data of the entity to be added.
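The four steps performed by the data extraction module can be sketched as follows. The supported formats (`json` and a simple `key: value` text format), the format detection rule, the script registry `EXTRACTION_SCRIPTS`, and the contents of `FIRST_PRESET_RELATION` are hypothetical choices made only to keep the example self-contained.

```python
import json

# Hypothetical first preset relation: entity type -> attribute types.
FIRST_PRESET_RELATION = {
    "star": ["name", "alias", "birth_date", "work", "spouse"],
}


def extract_from_json(raw, attr_types):
    """Extraction script for JSON objects."""
    record = json.loads(raw)
    return {a: record[a] for a in attr_types if a in record}


def extract_from_key_value(raw, attr_types):
    """Extraction script for plain 'key: value' lines."""
    record = {}
    for line in raw.splitlines():
        if ":" in line:
            key, value = line.split(":", 1)
            record[key.strip()] = value.strip()
    return {a: record[a] for a in attr_types if a in record}


# Pre-designed extraction scripts, keyed by format type.
EXTRACTION_SCRIPTS = {"json": extract_from_json, "key_value": extract_from_key_value}


def detect_format(raw):
    """Treat input as JSON only if it parses to an object (assumed rule)."""
    try:
        return "json" if isinstance(json.loads(raw), dict) else "key_value"
    except ValueError:
        return "key_value"


def extract_entity_data(raw, target_entity_type):
    target_format = detect_format(raw)                               # step 1: target format
    target_script = EXTRACTION_SCRIPTS[target_format]                # step 2: target script
    target_attr_types = FIRST_PRESET_RELATION[target_entity_type]    # step 3: target attribute types
    return target_script(raw, target_attr_types)                     # step 4: extract attribute information


from_json = extract_entity_data('{"name": "Example Star", "alias": "ES", "height": "n/a"}', "star")
from_text = extract_entity_data("name: Example Star\nwork: Work A\nnote: ignored", "star")
```

Fields that are not attribute types of the target entity type, such as `height` and `note` in the sample inputs, are dropped by both scripts.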
An embodiment of the present invention further provides an electronic device, as shown in fig. 7, including a processor 701, a communication interface 702, a memory 703 and a communication bus 704, where the processor 701, the communication interface 702, and the memory 703 communicate with one another through the communication bus 704;
a memory 703 for storing a computer program;
the processor 701 is configured to implement the method for expanding a knowledge graph when executing the program stored in the memory 703.
The communication bus mentioned for the above electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown in the figure, but this does not mean that there is only one bus or one type of bus.
The communication interface is used for communication between the electronic device and other equipment.
The Memory may include a Random Access Memory (RAM) or a non-volatile Memory (non-volatile Memory), such as at least one disk Memory. Optionally, the memory may also be at least one memory device located remotely from the processor.
The Processor may be a general-purpose Processor, and includes a Central Processing Unit (CPU), a Network Processor (NP), and the like; the device can also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, a discrete Gate or transistor logic device, or a discrete hardware component.
In yet another embodiment of the present invention, a computer-readable storage medium is further provided, in which a computer program is stored, and the computer program, when executed by a processor, implements the method for expanding a knowledge graph according to any one of the above embodiments.
In yet another embodiment of the present invention, there is also provided a computer program product containing instructions which, when run on a computer, cause the computer to perform the method of knowledge-graph augmentation described in any of the above embodiments.
In the above embodiments, the implementation may be realized wholly or partially by software, hardware, firmware, or any combination thereof. When implemented in software, the embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in accordance with the embodiments of the invention are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center via wired (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, radio, microwave, etc.) means. The computer-readable storage medium can be any available medium that can be accessed by a computer, or a data storage device, such as a server or a data center, that integrates one or more available media. The usable medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
All the embodiments in the present specification are described in a related manner; the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on its differences from the other embodiments. In particular, the apparatus embodiments, electronic device embodiments, computer-readable storage medium embodiments, and computer program product embodiments are substantially similar to the method embodiments and are therefore described relatively briefly; for relevant details, refer to the corresponding description of the method embodiments.
The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.
Claims (12)
1. A method of knowledge-graph augmentation, the method comprising:
acquiring data to be processed;
identifying the entity type of the entity to be added to which the data to be processed belongs as a target entity type;
extracting attribute information of an attribute type corresponding to the target entity type from the to-be-processed data based on a first preset relation between the entity type and the attribute type, wherein the attribute information is used as entity data of the to-be-added entity;
and expanding the created knowledge graph based on the entity data of the entity to be added.
2. The method of claim 1, wherein the augmenting the created knowledge-graph based on the entity data of the entity to be added comprises:
searching whether a target entity which is the same as the entity to be added exists in the created knowledge graph or not;
if the target entity exists, merging the entity data of the entity to be added with the entity data of the target entity to realize the expansion of the knowledge graph;
and if the target entity does not exist, creating the entity to be added in the knowledge graph, and adding the entity data of the entity to be added in the knowledge graph.
3. The method of claim 1, wherein the finding whether the target entity exists in the created knowledge-graph, wherein the target entity is the same entity as the entity to be added comprises:
determining a searching mode for searching a target entity which is the same as the entity to be added in an original entity library based on a second preset relation between the entity type and the searching mode, wherein the original entity library comprises entity data of the entity in the knowledge graph;
and searching whether a target entity which is the same as the entity to be added exists in the original entity library according to the determined searching mode, and if so, determining that the target entity exists in the knowledge graph.
4. The method of claim 3, further comprising:
and adding the entity data of the entity to be added into the original entity library.
5. The method according to claim 3, wherein said searching whether the original entity library has a target entity that is the same as the entity to be added according to the determined search manner comprises:
searching an entity similar to the entity to be added from the original entity library by using a preset fuzzy searching mode to serve as a candidate entity;
and searching whether a target entity which is the same as the entity to be added exists in the candidate entities according to the determined searching mode.
6. The method according to claim 5, wherein the searching for the entity similar to the entity to be added from the original entity library by using a preset fuzzy search manner as the candidate entity comprises:
searching the original entity library for entities whose index information matches target attribute information, as candidate entities, wherein the index information of each entity comprises attribute fields of preset attribute information in the entity data of that entity, and the target attribute information is the name information in the attribute information of the entity to be added.
7. The method of claim 3, wherein the search manner corresponding to each entity type is:
matching, one by one, the attribute information of the attribute types corresponding to that entity type.
8. The method according to any one of claims 1-7, wherein the augmenting the created knowledge-graph based on the entity data of the entity to be added comprises:
carrying out format conversion on the entity data of the entity to be added to obtain triple information of the entity data of the entity to be added, wherein the triple information comprises name information, attribute type information and attribute information of the entity to be added;
and importing the triple information into a graph database of the knowledge graph, and adding the entity data of the entity to be added into the knowledge graph based on the graph database.
9. The method according to any one of claims 1 to 7, wherein the extracting attribute information of an attribute type corresponding to the target entity type from the to-be-processed data as the entity data of the to-be-added entity based on a first preset relationship between an entity type and an attribute type includes:
determining the format type of the data to be processed as a target format;
selecting a script for performing data extraction on the data in the target format from pre-designed data extraction scripts as a target script;
determining an attribute type corresponding to the target entity type as a target attribute type based on a first preset relation between the entity type and the attribute type;
and extracting attribute information of the target attribute type from the data to be processed by using the target script to serve as entity data of the entity to be added.
10. A knowledge-graph augmenting apparatus, said apparatus comprising:
the data acquisition module is used for acquiring data to be processed;
the type identification module is used for identifying the entity type of the entity to be added to which the data to be processed belongs as a target entity type;
the data extraction module is used for extracting attribute information of the attribute type corresponding to the target entity type from the to-be-processed data based on a first preset relation between the entity type and the attribute type, and the attribute information is used as entity data of the to-be-added entity;
and the graph expansion module is used for expanding the created knowledge graph based on the entity data of the entity to be added.
11. An electronic device, characterized by comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with one another through the communication bus;
a memory for storing a computer program;
a processor for implementing the method of any one of claims 1 to 9 when executing a program stored in a memory.
12. A computer-readable storage medium, characterized in that a computer program is stored in the computer-readable storage medium, which computer program, when being executed by a processor, carries out the method steps of any one of the claims 1-9.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110610082.1A CN113254665A (en) | 2021-06-01 | 2021-06-01 | Knowledge graph expansion method and device, electronic equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113254665A true CN113254665A (en) | 2021-08-13 |
Family
ID=77185724
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110610082.1A Pending CN113254665A (en) | 2021-06-01 | 2021-06-01 | Knowledge graph expansion method and device, electronic equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113254665A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109635120A (en) * | 2018-10-30 | 2019-04-16 | 百度在线网络技术(北京)有限公司 | Construction method, device and the storage medium of knowledge mapping |
CN111026874A (en) * | 2019-11-22 | 2020-04-17 | 海信集团有限公司 | Data processing method and server of knowledge graph |
CN111858962A (en) * | 2020-07-27 | 2020-10-30 | 腾讯科技(成都)有限公司 | Data processing method, device and computer readable storage medium |
CN112100343A (en) * | 2020-08-17 | 2020-12-18 | 深圳数联天下智能科技有限公司 | Method for expanding knowledge graph, electronic equipment and storage medium |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114969385A (en) * | 2022-08-03 | 2022-08-30 | 北京长河数智科技有限责任公司 | Knowledge graph optimization method and device based on document attribute assignment entity weight |
CN116303625A (en) * | 2023-05-17 | 2023-06-23 | 之江实验室 | Data query method and device, storage medium and electronic equipment |
CN116303625B (en) * | 2023-05-17 | 2023-07-21 | 之江实验室 | Data query method and device, storage medium and electronic equipment |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108804532B (en) | Query intention mining method and device and query intention identification method and device | |
CN103984738B (en) | Role labelling method based on search matching | |
US8145648B2 (en) | Semantic metadata creation for videos | |
CN111831911B (en) | Query information processing method and device, storage medium and electronic device | |
US8577882B2 (en) | Method and system for searching multilingual documents | |
KR20130142121A (en) | Multi-modal approach to search query input | |
CN105956053B (en) | A kind of searching method and device based on the network information | |
CN111274442B (en) | Method for determining video tag, server and storage medium | |
CN110991187A (en) | Entity linking method, device, electronic equipment and medium | |
CN113806588B (en) | Method and device for searching video | |
CN111008321A (en) | Recommendation method and device based on logistic regression, computing equipment and readable storage medium | |
CN109582847B (en) | Information processing method and device and storage medium | |
JP6932360B2 (en) | Object search method, device and server | |
CN112434533B (en) | Entity disambiguation method, entity disambiguation device, electronic device, and computer-readable storage medium | |
JP6829740B2 (en) | Data search method and its data search system | |
KR100896336B1 (en) | System and Method for related search of moving video based on visual content | |
CN113934869A (en) | Database construction method, multimedia file retrieval method and device | |
CN113254665A (en) | Knowledge graph expansion method and device, electronic equipment and storage medium | |
EP3905060A1 (en) | Artificial intelligence for content discovery | |
RU2568276C2 (en) | Method of extracting useful content from mobile application setup files for further computer data processing, particularly search | |
CN112836126A (en) | Recommendation method and device based on knowledge graph, electronic equipment and storage medium | |
CN110209781B (en) | Text processing method and device and related equipment | |
CN112911331B (en) | Music identification method, device, equipment and storage medium for short video | |
JP7395377B2 (en) | Content search methods, devices, equipment, and storage media | |
CN113065018A (en) | Audio and video index library creating and retrieving method and device and electronic equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||