CN111026874A - Data processing method and server of knowledge graph - Google Patents

Publication number
CN111026874A
Authority
CN
China
Prior art keywords
entity
knowledge graph
processed
data
original data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911155243.1A
Other languages
Chinese (zh)
Inventor
陈维强
高雪松
蒋鹏民
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hisense Group Co Ltd
Hisense Co Ltd
Original Assignee
Hisense Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hisense Co Ltd
Priority to CN201911155243.1A
Publication of CN111026874A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application provides a data processing method and a server of a knowledge graph, wherein the method comprises the following steps: acquiring original data to be processed; determining whether the original data to be processed has a corresponding entity in the current knowledge graph or not according to the original data to be processed and the current knowledge graph; if so, performing knowledge completion updating processing on the corresponding entity in the current knowledge graph according to the original data to be processed to obtain a new knowledge graph; if not, establishing a corresponding entity in the current knowledge graph according to the original data to be processed, and performing knowledge completion processing on the established entity according to the original data to be processed to obtain a new knowledge graph. The dynamic completion and updating of the knowledge graph are realized, the availability and the effectiveness of the original data are improved, the work efficiency is effectively improved, and the time and the labor cost are reduced.

Description

Data processing method and server of knowledge graph
Technical Field
The application relates to the technical field of internet, in particular to a data processing method and a server of a knowledge graph.
Background
With the rapid development of big data and artificial intelligence, knowledge graphs have become an important component of artificial intelligence technology. Owing to their strong capabilities in semantic processing, interconnected organization, information retrieval and knowledge reasoning, they have been widely applied in fields such as finance, agriculture, e-commerce, medical electronics and transportation. A knowledge graph is a large semantic network graph in which nodes (or vertices) represent entities or concepts and edges represent relationships, so that it describes the entities or concepts that exist in the real world and the relationships among them.
With the in-depth application of big data technologies, the public security field has also entered a new era. By effectively integrating various kinds of data and constructing multi-dimensional analysis models, capabilities in intelligence insight, analysis and judgment, investigation and crime fighting, and command management are improved. At present, public security organizations already hold a large amount of multi-source heterogeneous data, but the association relationships among the data are difficult to sort out, so the data is of limited effectiveness.
Therefore, how to improve the effectiveness of public security data, improve work efficiency, and reduce time and labor costs has become a technical problem that urgently needs to be solved.
Disclosure of Invention
The application provides a data processing method and a server of a knowledge graph, which aim to overcome the defects of poor validity of original data and the like in the prior art.
In a first aspect, the present application provides a data processing method of a knowledge graph, including:
acquiring original data to be processed;
determining whether the original data to be processed has a corresponding entity in the current knowledge graph or not according to the original data to be processed and the current knowledge graph;
if so, performing knowledge completion updating processing on the corresponding entity in the current knowledge graph according to the original data to be processed to obtain a new knowledge graph;
if not, establishing a corresponding entity in the current knowledge graph according to the original data to be processed, and performing knowledge completion processing on the established entity according to the original data to be processed to obtain a new knowledge graph.
A second aspect of the present application provides a server comprising:
the acquisition module is used for acquiring original data to be processed;
the determining module is used for determining whether the original data to be processed has a corresponding entity in the current knowledge graph according to the original data to be processed and the current knowledge graph;
a processing module to:
if so, performing knowledge completion updating processing on the corresponding entity in the current knowledge graph according to the original data to be processed to obtain a new knowledge graph; if not, establishing a corresponding entity in the current knowledge graph according to the original data to be processed, and performing knowledge completion processing on the established entity according to the original data to be processed to obtain a new knowledge graph.
A third aspect of the present application provides an electronic device, comprising: at least one processor and memory;
the memory stores computer-executable instructions;
the at least one processor executes computer-executable instructions stored by the memory to cause the at least one processor to perform the method as set forth in the first aspect above and in various possible designs of the first aspect.
A fourth aspect of the present application provides a computer-readable storage medium having stored thereon computer-executable instructions that, when executed by a processor, implement a method as set forth in the first aspect and various possible designs of the first aspect.
According to the data processing method and the server of the knowledge graph, original data to be processed (namely newly added original data) is obtained, and whether the original data to be processed has a corresponding entity in the current knowledge graph is determined according to the original data to be processed and the current knowledge graph. If so, knowledge completion updating processing is performed on the corresponding entity in the current knowledge graph according to the original data to be processed to obtain a new knowledge graph; if not, a corresponding entity is established in the current knowledge graph according to the original data to be processed, and knowledge completion processing is performed on the established entity according to the original data to be processed to obtain a new knowledge graph. Dynamic completion and updating of the knowledge graph are thereby realized, the availability and effectiveness of the original data are improved, work efficiency is improved, and time and labor costs are reduced.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to these drawings without inventive exercise.
FIG. 1 is a schematic diagram of an architecture of a processing system according to an embodiment of the present application;
FIG. 2 is a flow chart of a data processing method of a knowledge graph according to an embodiment of the present application;
FIG. 3 is a schematic flow chart diagram of a data processing method for a knowledge-graph according to another embodiment of the present application;
FIG. 4 is an exemplary diagram of a knowledge-graph taxonomy integration provided by an embodiment of the present application;
FIG. 5 is a schematic illustration of a visualization effect of a public security knowledge graph provided in an embodiment of the present application;
FIG. 6A is the entity portion of an overall architecture diagram of public security knowledge graph construction provided in an embodiment of the present application;
FIG. 6B is the edge portion of the overall architecture diagram of public security knowledge graph construction provided in an embodiment of the present application;
FIG. 7 is a schematic structural diagram of a server according to an embodiment of the present application;
FIG. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
With the above figures, there are shown specific embodiments of the present application, which will be described in more detail below. These drawings and written description are not intended to limit the scope of the disclosed concepts in any way, but rather to illustrate the concepts of the disclosure to those skilled in the art by reference to specific embodiments.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The terms referred to in this application are explained first:
Elasticsearch: a Lucene-based search server. It provides a distributed, multi-user-capable full-text search engine with a RESTful web interface. Elasticsearch is developed in Java and was released as open source under the terms of the Apache License; it is a popular enterprise-level search engine. It is used in cloud computing, supports real-time search, and is stable, reliable, fast, and easy to install and use.
JanusGraph: an open-source distributed graph database. It scales well and, through a multi-machine cluster, can store and query graph data with hundreds of billions of vertices and edges. JanusGraph is a transactional database that supports a large number of concurrent users performing complex real-time graph traversals.
Cassandra: an open-source distributed NoSQL database system; it is a hybrid non-relational database.
The data processing method of the knowledge graph provided by the embodiments of the present application is suitable for application scenarios in which a knowledge graph is supplemented or updated based on multi-source heterogeneous data, for example constructing and refining a complete public security knowledge graph from the large amount of multi-source heterogeneous data held by a public security organization. Fig. 1 is a schematic diagram of the architecture of a processing system according to an embodiment of the present application. The processing system may comprise a server and a terminal. Through interaction between the terminal and the server, the relevant personnel can carry out knowledge graph modeling and the related definitions, such as entity label definition, edge label definition, entity attribute definition and edge attribute definition. After the basic knowledge graph model has been constructed, the server can subsequently perform knowledge completion updating on the knowledge graph according to newly added original data, based on that model. The server obtains the original data to be processed (namely the newly added original data) and determines, according to the original data to be processed and the current knowledge graph, whether the original data to be processed has a corresponding entity in the current knowledge graph. If so, knowledge completion updating processing is performed on the corresponding entity in the current knowledge graph according to the original data to be processed to obtain a new knowledge graph; if not, a corresponding entity is established in the current knowledge graph according to the original data to be processed, and knowledge completion processing is performed on the established entity according to the original data to be processed to obtain a new knowledge graph.
The dynamic completion and updating of the knowledge graph are realized, the availability and the effectiveness of the original data are improved, the work efficiency is effectively improved, and the time and the labor cost are reduced.
Optionally, the server may perform the above processing in real time or at regular time, and supplement the newly added original data to the knowledge graph in time.
Furthermore, the terms "first", "second", etc. are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. In the description of the following examples, "plurality" means two or more unless specifically limited otherwise.
The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments. Embodiments of the present invention will be described below with reference to the accompanying drawings.
An embodiment of the present application provides a data processing method of a knowledge graph, which is used for completing or updating the knowledge graph. The execution subject of the present embodiment is a server.
As shown in fig. 2, a schematic flow chart of a data processing method of a knowledge graph provided in this embodiment is provided, where the method includes:
step 101, obtaining original data to be processed.
Specifically, the original data to be processed is preprocessed text data. The server can read the original data to be processed and acquire the relevant data in it. Preprocessing may include null filling and format conversion of the original table data, among other operations.
Take the original table data of a public security organization as an example: it includes, for instance, case.xlsx and a community residents .xlsx file. Table 1 shows an example of case.xlsx, and Table 2 shows an example of the community residents table.
TABLE 1
Case type | Case number | Suspect  | Contact phone | Case location  | Victim
case_type | case_id     | sus_name | sus_phone     | case_place     | vic_name
Robbery   | 123***789   | Zhang    | 152****5741   | Zhejiang river | Ma x
TABLE 2
Name        | Sex    | ID card number | Contact phone 1 | Contact phone 2 | Community name
person_name | gender | person_id      | phone1          | phone2          | res_name
Korean      | Male   |                | 152****5741     |                 | Garden
As the original table data above shows, many fields in the tables are empty. Empty fields frequently cause errors to be reported when a knowledge graph is created with the JanusGraph database, so empty field content must be handled, for example by filling it with a word such as "none" or "empty". When multi-source heterogeneous original table data is used to construct the knowledge graph, it must also be converted into text data in a uniform format. For example, the content of each column in the table is processed by replacing the tab character between fields with a separator. Because the original table data contains many symbols, the choice of separator matters: to avoid collisions with symbols in the original table data, the & symbol is selected as the delimiter. The effect is as follows:
robbery 123***789 Zhang 152****5741 Zhejiang river Ma x
Converting the original table data into txt text content gives:
Robbery&123***789&Zhang&152****5741&Zhejiang river&Ma x&none
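The preprocessing described above (null filling followed by conversion to a delimited text line) can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name row_to_text and the constants are hypothetical, and the sample row is taken from Table 2.

```python
EMPTY_PLACEHOLDER = "none"  # assumed filler word for empty cells
DELIMITER = "&"             # separator chosen in the text above


def row_to_text(cells):
    """Fill empty cells and join one table row into a delimited text line."""
    cleaned = []
    for cell in cells:
        value = "" if cell is None else str(cell).strip()
        cleaned.append(value if value else EMPTY_PLACEHOLDER)
    return DELIMITER.join(cleaned)


# Row from Table 2: the ID card number and the second phone are empty.
row = ["Korean", "male", None, "152****5741", "", "Garden"]
line = row_to_text(row)
print(line)
```

The sketch does not escape a delimiter that happens to occur inside a cell; a real pipeline would need to reject or escape such values.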
Step 102, determining whether the original data to be processed has a corresponding entity in the current knowledge graph according to the original data to be processed and the current knowledge graph.
Specifically, after the original data to be processed is obtained, it is necessary to determine whether it contains an entity that does not exist in the current knowledge graph. If it does, the corresponding entity needs to be created. If all of its entities already exist, it is necessary to determine whether the original data to be processed contains other information that is not yet in the current knowledge graph; if so, that information is completed or updated.
The current knowledge graph is a knowledge graph constructed according to the existing original data before, and the original data to be processed is newly added original data generated currently.
The construction process of the current knowledge graph specifically comprises the following steps:
1. Preprocessing the original data.
2. Predefining the knowledge graph data model, which specifically includes defining entity labels, defining edge labels, defining entity attributes, defining edge attributes, building indexes, and so on.
3. Modeling the knowledge graph.
The knowledge graph data modeling predefinition can be set according to actual requirements and lays the foundation for the subsequent knowledge graph modeling.
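The predefinition step can be pictured as plain data before any graph database is involved. The sketch below is an assumption-laden illustration: GraphSchema, define_entity, define_edge and the label names are hypothetical, and this is not JanusGraph's schema management API.

```python
from dataclasses import dataclass, field


@dataclass
class GraphSchema:
    """Hypothetical in-memory record of the predefined graph model."""
    entity_labels: set = field(default_factory=set)
    edge_labels: set = field(default_factory=set)
    entity_attributes: dict = field(default_factory=dict)  # label -> attribute keys
    edge_attributes: dict = field(default_factory=dict)    # label -> attribute keys
    indexes: list = field(default_factory=list)            # (label, attribute key)

    def define_entity(self, label, attributes, unique_key):
        # The unique attribute must be one of the declared attributes.
        if unique_key not in attributes:
            raise ValueError("unique_key must be one of the attributes")
        self.entity_labels.add(label)
        self.entity_attributes[label] = list(attributes)
        self.indexes.append((label, unique_key))  # index the unique attribute

    def define_edge(self, label, attributes=()):
        self.edge_labels.add(label)
        self.edge_attributes[label] = list(attributes)


schema = GraphSchema()
schema.define_entity("person", ["person_name", "gender", "person_id"], unique_key="person_id")
schema.define_entity("phone", ["number"], unique_key="number")
schema.define_edge("uses_phone", ["source_file"])
```

Indexing the unique attribute at definition time is a design choice of this sketch; it reflects the later use of that attribute for existence queries.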
And performing knowledge expansion, completion and updating on the current knowledge graph based on the newly added original data to be processed.
For example, the original data to be processed, such as a txt document obtained through preprocessing, may be read. Taking the case text case.txt as an example, case.txt is traversed line by line and the relevant data in each line is extracted by splitting on the separator. Header content that is not data can be removed according to actual requirements: for example, the first two rows of the original table data (Table 1 and Table 2) are also converted into the text, but this non-data header content is skipped when the text data is read.
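A minimal sketch of this line-by-line read, assuming two header rows of which the second holds the field names (as in Table 1). The function name read_records is hypothetical, and an in-memory text stream stands in for case.txt.

```python
import io

DELIMITER = "&"
HEADER_ROWS = 2  # display header row plus field-name row, as in Table 1


def read_records(lines, header_rows=HEADER_ROWS):
    """Yield one dict per data line, keyed by the field names in the header."""
    field_names = []
    seen = 0
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        parts = line.split(DELIMITER)
        seen += 1
        if seen <= header_rows:
            field_names = parts  # the last header row holds the field names
            continue
        yield dict(zip(field_names, parts))


sample = io.StringIO(
    "Case type&Case number&Suspect&Contact phone&Case location&Victim\n"
    "case_type&case_id&sus_name&sus_phone&case_place&vic_name\n"
    "Robbery&123***789&Zhang&152****5741&Zhejiang&Ma\n"
)
records = list(read_records(sample))
```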
The items of the original data to be processed, which need to establish an entity, can be determined according to preset entity rules, for example, a person needs to establish a person entity, a mobile phone number needs to establish a mobile phone number entity, an identification card number needs to establish an identification card number entity, a vehicle needs to establish a vehicle entity, and the like.
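The preset entity rules can be thought of as a mapping from field names to entity types. The sketch below assumes the field names of Table 2 and a "none" placeholder for empty cells; ENTITY_RULES and items_needing_entities are hypothetical names.

```python
# Hypothetical rule table: which fields require an entity, and of which type.
ENTITY_RULES = {
    "person_id": "id_card",
    "phone1": "phone",
    "phone2": "phone",
}
EMPTY_PLACEHOLDER = "none"  # assumed filler written during preprocessing


def items_needing_entities(record):
    """Return (entity_type, value) pairs for the fields covered by a rule."""
    items = []
    for field_name, value in record.items():
        entity_type = ENTITY_RULES.get(field_name)
        if entity_type is None or value == EMPTY_PLACEHOLDER:
            continue  # no rule for this field, or nothing to build an entity from
        items.append((entity_type, value))
    return items


record = {
    "person_name": "Korean", "gender": "male", "person_id": "none",
    "phone1": "152****5741", "phone2": "none", "res_name": "Garden",
}
items = items_needing_entities(record)
```

Fields holding only the placeholder are skipped here because an entity's unique attribute cannot be empty.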
Take establishing a mobile phone number entity as an example. If, while the original data to be processed is traversed, one item in a line is a mobile phone number, a mobile phone number entity needs to be established. First, however, it must be determined whether the mobile phone number already has a corresponding entity in the current knowledge graph. If it does, no new mobile phone number entity is established; if it does not, one is established. This ensures the uniqueness of entities in the knowledge graph.
For a mobile phone number entity, the specific mobile phone number is its unique attribute. To ensure the uniqueness of the entity, the unique attribute of an entity cannot be null. When the mobile phone number is not empty, it can be used to query whether the mobile phone number entity exists in the current knowledge graph.
In a knowledge graph, a unique identifier, such as an entity ID, is generated for each entity when it is created, and the entity is then given its related content. For example, the unique identifier of person entity 1 is 0001, and person entity 1 is given attributes such as a name, sex (male), mobile phone number 1 and identity card number 1, all of which are attributes of person entity 1. Among them, identity card number 1 is the unique attribute of person entity 1. Each attribute may also have a corresponding attribute identifier, such as an attribute ID; that is, attributes may exist in the form of key-value pairs.
A mobile phone number can exist either as an attribute of a person entity or as the unique attribute of a mobile phone number entity. The same applies to other attribute content, which can be set according to the actual situation.
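The entity model described above (a generated unique identifier, key-value attributes, and a non-empty unique attribute) might look like the following sketch. The Entity class, its field names and the four-digit ID format are assumptions made for illustration.

```python
import itertools
from dataclasses import dataclass, field

_id_counter = itertools.count(1)  # source of sequential entity IDs


@dataclass
class Entity:
    entity_type: str
    unique_key: str  # name of the attribute that identifies the entity
    attributes: dict = field(default_factory=dict)  # key-value pairs
    entity_id: str = field(default_factory=lambda: f"{next(_id_counter):04d}")

    def __post_init__(self):
        # The unique attribute of an entity must not be empty.
        if not self.attributes.get(self.unique_key):
            raise ValueError("the unique attribute must not be empty")


person = Entity("person", "id_card",
                {"id_card": "ID-1", "gender": "male", "phone": "152****5741"})
phone = Entity("phone", "number", {"number": "152****5741"})
```

Here the same number appears twice: once as an ordinary attribute of the person entity and once as the unique attribute of the phone entity.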
Step 103, if so, performing knowledge completion updating processing on the corresponding entity in the current knowledge graph according to the original data to be processed to obtain a new knowledge graph.
Specifically, if the original data to be processed has a corresponding entity in the current knowledge graph, knowledge completion updating is performed on that entity according to the original data to be processed to obtain a new knowledge graph, which can serve as the current knowledge graph for the next round of processing.
Knowledge completion updating refers to performing knowledge completion on the attributes of an entity. For example, suppose traversal reaches a row of the original data to be processed whose content is: Robbery&123***789&Zhang&152****5741&Zhejiang river&Ma x&none. Mobile phone number entity 1 then needs to be established, together with its corresponding attributes. If mobile phone number entity 1 already exists in the current knowledge graph but its attributes contain only the mobile phone number 152****5741, attribute knowledge completion is performed on mobile phone number entity 1: the other attributes are added in the current knowledge graph to obtain a new knowledge graph. As another example, suppose the attributes of mobile phone number entity 1 include the mobile phone number 152****5741 and an age of 3* years, and the age needs to be updated based on the original data to be processed. The update may replace the age of 3* years with 4* years, may keep 3* years and add 4* years, or may mark the update time.
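The update options mentioned above (replace the old value, keep it and add the new one, and optionally mark the update time) can be sketched as one small function. The strategy names, the "_updated_at" suffix and the sample date are hypothetical.

```python
VALID_STRATEGIES = ("replace", "append")


def update_attribute(attributes, key, new_value, strategy="replace", updated_at=None):
    """Return a copy of the attributes with one attribute completed or updated."""
    if strategy not in VALID_STRATEGIES:
        raise ValueError(f"unknown strategy: {strategy}")
    updated = dict(attributes)  # leave the caller's dict untouched
    old_value = updated.get(key)
    if old_value is None or strategy == "replace":
        updated[key] = new_value  # completion of a missing value, or replacement
    else:
        # Keep the old value(s) and add the new one.
        history = list(old_value) if isinstance(old_value, list) else [old_value]
        if new_value not in history:
            history.append(new_value)
        updated[key] = history
    if updated_at is not None:
        updated[key + "_updated_at"] = updated_at  # mark the update time
    return updated


phone_entity = {"number": "152****5741", "age": "3*"}
replaced = update_attribute(phone_entity, "age", "4*")
kept = update_attribute(phone_entity, "age", "4*", strategy="append",
                        updated_at="2020-01-01")
completed = update_attribute({"number": "152****5741"}, "age", "3*", strategy="append")
```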
And step 104, if not, establishing a corresponding entity in the current knowledge graph according to the original data to be processed, and performing knowledge completion processing on the established entity according to the original data to be processed to obtain a new knowledge graph.
Specifically, if the original data to be processed does not have a corresponding entity in the current knowledge graph, a corresponding entity is newly built in the current knowledge graph according to the original data to be processed, and the newly built entity is subjected to knowledge completion processing according to the original data to be processed, so that a new knowledge graph is obtained.
For example, if mobile phone number entity 1 does not exist in the current knowledge graph, a new mobile phone number entity 1 needs to be created, and attribute knowledge completion updating processing is performed on mobile phone number entity 1 based on the original data to be processed. The specific attribute knowledge completion updating process is similar to that described above and is not repeated here.
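Steps 102 to 104 together amount to a create-or-complete operation. The sketch below illustrates that flow with an in-memory dictionary standing in for the graph store; the KnowledgeGraph class, its method names and the "holder" attribute are hypothetical.

```python
class KnowledgeGraph:
    """Toy in-memory stand-in for the current knowledge graph."""

    def __init__(self):
        self._entities = {}  # (entity_type, unique_value) -> attribute dict

    def __len__(self):
        return len(self._entities)

    def find(self, entity_type, unique_value):
        return self._entities.get((entity_type, unique_value))

    def upsert(self, entity_type, unique_value, attributes):
        """Create the entity if absent, otherwise complete its attributes."""
        if not unique_value:
            raise ValueError("the unique attribute must not be empty")
        key = (entity_type, unique_value)
        existing = self._entities.get(key)
        if existing is None:
            self._entities[key] = dict(attributes)  # step 104: create
            return "created"
        for name, value in attributes.items():
            existing.setdefault(name, value)  # step 103: fill missing attributes
        return "completed"


graph = KnowledgeGraph()
first = graph.upsert("phone", "152****5741", {"number": "152****5741"})
second = graph.upsert("phone", "152****5741",
                      {"number": "152****5741", "holder": "Zhang"})
```

Only missing attributes are filled in this sketch; replacing or accumulating existing values would need an explicit update policy.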
In the data processing method of the knowledge graph provided by this embodiment, original data to be processed (i.e., newly added original data) is obtained, and whether the original data to be processed has a corresponding entity in the current knowledge graph is determined according to the original data to be processed and the current knowledge graph, and if yes, the corresponding entity in the current knowledge graph is subjected to completion-of-knowledge updating according to the original data to be processed, so that a new knowledge graph is obtained; if not, establishing a corresponding entity in the current knowledge graph according to the original data to be processed, and performing knowledge completion processing on the established entity according to the original data to be processed to obtain a new knowledge graph. The dynamic completion and updating of the knowledge graph are realized, the availability and the effectiveness of the original data are improved, the work efficiency is effectively improved, and the time and the labor cost are reduced.
The method provided by the above embodiment is further described in an additional embodiment of the present application.
As shown in fig. 3, a flow chart of the data processing method of the knowledge graph provided in this embodiment is schematically illustrated.
As an implementable manner, on the basis of the foregoing embodiment, optionally, determining whether the raw data to be processed has a corresponding entity in the current knowledge graph according to the raw data to be processed and the current knowledge graph, includes:
step 1021, traversing the raw data to be processed according to rows, and judging items of entities to be established according to preset entity rules for each row of data.
Step 1022, for each item that needs to establish an entity, determine whether the item has a corresponding entity in the current knowledge-graph.
Specifically, after the raw data to be processed is obtained, traversal operation may be performed on the raw data to be processed according to rows, and for each row of data, an item in which an entity needs to be established may be determined according to a preset entity rule. The preset entity rules can be that a person entity needs to be established for a person, a mobile phone number entity needs to be established for a mobile phone number, a vehicle entity needs to be established for a vehicle, and the like. The method can be specifically set according to actual requirements.
After determining the item needing to establish the entity, judging the item, judging whether the item has a corresponding entity in the current knowledge graph, if so, not establishing the entity, only performing attribute knowledge completion updating processing on the entity, otherwise, establishing the corresponding entity in the current knowledge graph, and performing attribute knowledge completion updating processing on the established corresponding entity based on the original data to be processed of the item.
For example, take establishing a mobile phone number entity. If, while the original data to be processed is traversed, one item in a line is a mobile phone number, a mobile phone number entity needs to be established. First, however, it must be determined whether the mobile phone number already has a corresponding entity in the current knowledge graph. If it does, no new mobile phone number entity is established; if it does not, one is established. This ensures the uniqueness of entities in the knowledge graph.
Optionally, for each item that needs to establish an entity, determining whether the item has a corresponding entity in the current knowledge-graph includes:
step 2011, obtain the unique attribute corresponding to the item.
Step 2012, according to the unique attribute corresponding to the item, querying whether an entity having the same attribute as the unique attribute and belonging to the same type as the item exists in the current knowledge graph.
Step 2013, if yes, the item has a corresponding entity in the current knowledge graph.
If not, the item does not have a corresponding entity in the current knowledge graph.
Specifically, in the knowledge graph, a unique identifier, such as an entity ID, is generated for each entity when it is created, and the entity is then given its related content. For example, the unique identifier of person entity 1 is 0001, and person entity 1 is given attributes such as a name, sex (male), mobile phone number 1 and identity card number 1, all of which are attributes of person entity 1. Among them, identity card number 1 is the unique attribute of person entity 1. Each attribute may also have a corresponding attribute identifier, such as an attribute ID; that is, attributes may exist in the form of key-value pairs.
When determining whether an item for which an entity needs to be established has a corresponding entity in the current knowledge graph, the unique attribute corresponding to the item is obtained, and the current knowledge graph is queried for an entity that has the same attribute as that unique attribute and belongs to the same type as the item. If such an entity exists, the item has a corresponding entity in the current knowledge graph; if not, it does not. The type refers to the classification of an entity, which can be determined by defining an entity label. Types may include a person entity, a mobile phone number entity, a vehicle entity, an identity card number entity and so on, and can be set according to actual requirements.
For example, for a mobile phone number entity, the specific mobile phone number is its unique attribute, and the current knowledge graph can be queried by that number to see whether the mobile phone number entity exists. Suppose traversal reaches a row of the original data to be processed whose content is: Robbery&123***789&Zhang&152****5741&Zhejiang river&Ma x&none. The item for which a mobile phone number entity needs to be established is determined to be 152****5741. The unique attribute corresponding to the item (namely 152****5741) is then obtained, and the current knowledge graph is searched by 152****5741 for a mobile phone number entity with the same attribute. If one exists, the mobile phone number entity corresponding to the item exists in the current knowledge graph; if not, it does not.
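The query in steps 2011 to 2013 matches on both the entity type and the unique attribute. The sketch below shows why the type check matters: a number stored only as an attribute of a person entity does not count as an existing mobile phone number entity. The dictionary layout, the function name find_entity and the second phone number are made up for illustration.

```python
def find_entity(entities, entity_type, unique_key, unique_value):
    """Return the first entity of the given type whose unique attribute matches."""
    for entity in entities:
        if entity["type"] != entity_type:
            continue  # same value on another entity type is not a match
        if entity["attributes"].get(unique_key) == unique_value:
            return entity
    return None


entities = [
    {"id": "0001", "type": "person",
     "attributes": {"id_card": "ID-1", "phone": "152****5741"}},
    {"id": "0002", "type": "phone",
     "attributes": {"number": "138****0000"}},
]

# The number exists only as a person attribute, so no phone entity is found.
missing = find_entity(entities, "phone", "number", "152****5741")
existing = find_entity(entities, "phone", "number", "138****0000")
```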
Optionally, the method further comprises:
Step 2021, acquiring index information corresponding to the unique attribute corresponding to the item.
Querying, according to the unique attribute corresponding to the item, whether an entity that has the same attribute as the unique attribute and belongs to the same type as the item exists in the current knowledge graph includes:
Step 2022, querying, according to the unique attribute corresponding to the item and the index information, whether an entity that has the same attribute as the unique attribute and belongs to the same type as the item exists in the current knowledge graph.
Specifically, indexes can be established for the knowledge graph to improve retrieval and query efficiency. JanusGraph supports indexing to speed up query processing. Graph queries often start their traversal from a list of entities or edges identified by their attributes, and indexes make such global retrieval efficient in large graphs. If no index is available, JanusGraph may fall back to a full graph scan to find the desired vertices (i.e., entities). This still returns correct results, but a full graph scan is very inefficient and degrades overall system performance in a production environment.
The index of an entity is declared on its entity attributes, which speeds up retrieval and query.
For example, when an item corresponds to a mobile phone number entity and it must be determined whether the item has a corresponding mobile phone number entity in the current knowledge graph, the current knowledge graph is queried according to the unique attribute of the item (i.e., the specific mobile phone number). If an index has been established, for example a correspondence between the unique attribute of mobile phone number entities and a specific storage region in the current knowledge graph, only the region corresponding to that unique attribute needs to be queried, without traversing the regions of other entities. For example, if the current knowledge graph is stored in a database and the unique attribute of mobile phone number entities forms one column, the query can be made within that column only, without querying other columns.
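The difference between an indexed lookup and a full scan can be illustrated with a small Python sketch. It assumes a plain dictionary keyed by (entity type, unique attribute) as the index; this stands in for, and is not, the JanusGraph index mechanism, and the class and method names are hypothetical.

```python
class IndexedGraph:
    """Toy graph store with an index on (entity type, unique attribute)."""

    def __init__(self):
        self.entities = {}  # entity_id -> (entity_type, unique_attr)
        self.index = {}     # (entity_type, unique_attr) -> entity_id

    def add(self, entity_id, entity_type, unique_attr):
        self.entities[entity_id] = (entity_type, unique_attr)
        self.index[(entity_type, unique_attr)] = entity_id

    def lookup_indexed(self, entity_type, unique_attr):
        # A single dictionary probe; no other entity is visited.
        return self.index.get((entity_type, unique_attr))

    def lookup_scan(self, entity_type, unique_attr):
        # Full scan fallback: same answer, but every entity may be visited.
        for entity_id, key in self.entities.items():
            if key == (entity_type, unique_attr):
                return entity_id
        return None


g = IndexedGraph()
g.add("0001", "person", "id-card-1")
g.add("0002", "phone", "152****5741")
```

Both lookups return the same entity; only the amount of data touched differs, which is the reason for declaring the index on the unique attribute.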
As another practicable manner, on the basis of the foregoing embodiment, optionally, the method further includes:
Step 2031, if it is determined according to the original data to be processed that a first entity and a second entity have a preset relationship, querying the new knowledge graph for the third entity pointed to by each edge connected to the first entity.
Step 2032, determining whether a corresponding edge exists between the first entity and the second entity according to the unique identifier of each third entity and the unique identifier of the second entity.
Step 2033, if so, performing knowledge completion updating on the edge between the first entity and the second entity according to the original data to be processed.
Step 2034, if not, establishing an edge between the first entity and the second entity according to the original data to be processed, and performing knowledge completion updating on it.
Specifically, once the entities have been established and their uniqueness ensured, the relationships between entities (represented by edges in the knowledge graph) also need to be supplemented and updated. It must be determined whether a certain relationship exists between two entities. If it does, an edge between the two entities needs to be established, and before the edge is established it must be judged whether a corresponding edge already exists in the knowledge graph (this refers to the new knowledge graph, if a new knowledge graph has been obtained). If the corresponding edge does not exist, the edge between the two entities is established. Taking the first entity and the second entity as an example, the third entities pointed to by the edges connected to the first entity are queried from the new knowledge graph, and whether a corresponding edge exists between the first entity and the second entity is determined according to the unique identifiers of the third entities and the unique identifier of the second entity.
For example, take establishing the edge phone_person between a mobile phone number entity and a person entity. Uniqueness processing and knowledge completion updating have already been performed on the mobile phone number entity, and the edge can be established only when both entities exist in the knowledge graph. Specifically, it is first judged whether the person entity exists in the knowledge graph; the unique attribute of the person entity, the ID card number, cannot be empty. It is then judged whether an edge phone_person pointing to the person entity (i.e., the first entity) exists in the knowledge graph. If not, the edge phone_person is established directly between the person entity and the mobile phone number entity. If so, it is judged whether mobile phone number entities (i.e., third entities) connected to the person entity by the edge phone_person exist, and all mobile phone number entities meeting the condition are retrieved from the knowledge graph as a list. Each third entity is traversed, and its unique identifier is compared with that of the second entity. If the unique identifier of some third entity is the same as that of the second entity, that third entity is the second entity, i.e., the edge between the first entity and the second entity already exists. If the unique identifier of every third entity differs from that of the second entity, the edge between the first entity and the second entity does not exist in the knowledge graph and is established directly, which ensures the uniqueness of the edge.
Optionally, the determining whether a corresponding edge exists between the first entity and the second entity according to the unique identifier of each third entity and the unique identifier of the second entity includes:
Step 2041, if the unique identifier of a third entity is the same as the unique identifier of the second entity, determining that a corresponding edge exists between the first entity and the second entity.
If it is determined that the corresponding edge exists between the first entity and the second entity, knowledge completion updating further needs to be performed on that edge according to the original data to be processed. If it is determined that no corresponding edge exists between the first entity and the second entity, the edge between the first entity and the second entity is established, and knowledge completion is performed on the established edge.
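The edge uniqueness check of steps 2031 to 2034 and 2041 can be sketched in Python as follows. This is a minimal in-memory illustration under stated assumptions: edges are plain dictionaries, the edge is stored with the first entity as its head, only edges with the same label are compared, and the function name `ensure_edge` and the attribute names are hypothetical.

```python
def ensure_edge(edges, first_id, second_id, label, attrs):
    """Create the edge only if it does not exist; otherwise update its attributes.

    edges is a list of dicts with the keys 'label', 'head', 'tail', 'attrs'.
    Returns True if a new edge was created and False if an existing one was updated.
    """
    for edge in edges:
        # Each edge connected to the first entity points to a "third entity".
        if edge["head"] == first_id and edge["label"] == label:
            third_id = edge["tail"]
            if third_id == second_id:        # step 2041: identifiers are equal
                edge["attrs"].update(attrs)  # step 2033: completion update
                return False
    # Step 2034: no third entity matched, so the edge is established.
    edges.append({"label": label, "head": first_id, "tail": second_id,
                  "attrs": dict(attrs)})
    return True


edges = []
created_first = ensure_edge(edges, "0002", "0001", "phone_person", {"created": "t1"})
created_again = ensure_edge(edges, "0002", "0001", "phone_person", {"source": "case.txt"})
```

The second call finds the existing edge and only merges the new attribute, so the graph still contains a single phone_person edge between the two entities.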
Knowledge completion here specifically means completing and updating the attribute knowledge of the edge.
The attributes of an edge are the specific contents given to the relationship between the two entities, such as the label of the edge (colleague, friend, classmate, belonging, trip, brother, and the like), the head and tail of the edge, the time at which the edge was established, and the like, and can be set according to actual requirements.
Illustratively, for mobile phone number entity 1 - edge 1 -> person entity 1, the attributes of edge 1 may include: the edge label is "belongs to" (i.e., mobile phone number entity 1 belongs to person entity 1), the establishment time is 12:20:40 on November 11, 2019, the head is mobile phone number entity 1, and the tail is person entity 1. This is only an example; the attributes can be set according to actual requirements and are not limited to the above.
Optionally, as with entities, an index can also be established for edges, which speeds up queries when edges are searched and improves processing efficiency. The index can likewise be declared on the attributes of the edge, which is not described again here.
As another implementable manner, on the basis of the foregoing embodiment, optionally, traversing the original data to be processed by rows and determining, for each row of the original data, the items for which an entity needs to be established according to a preset entity rule includes:
Step 2051, traversing the original data to be processed row by row and removing the unneeded header content.
Step 2052, for each remaining row of data, determining the items for which an entity needs to be established according to the preset entity rule.
Specifically, when the original table data is converted into the original data to be processed, the header of the original table is converted as well. The header is not data and needs to be removed.
Illustratively, the txt document obtained through data preprocessing is read, taking the case text case.txt as an example. case.txt is traversed line by line, the fields of each line are extracted using & as the separator, and the Chinese and English header lines, which are not data, are removed, as follows:
new File("data/public_search_KG/case.txt").eachLine { line -> // traverse the file line by line
    def p = line.split("&") // p[i] denotes the data of the i-th column
    if ((p[0] == "case_type") || (p[0] == "case type")) { // skip the first two lines of the document (the headers)
        return
    }
}
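For comparison, the same header-skipping traversal can be sketched in Python. This is an illustrative equivalent rather than the code of this embodiment; the function name `parse_rows` and the sample rows are hypothetical, and the two header markers are taken from the listing above.

```python
HEADER_FIRST_FIELDS = {"case_type", "case type"}  # first field of the two header lines


def parse_rows(lines, sep="&"):
    """Split each line on the separator and drop the header lines."""
    rows = []
    for line in lines:
        p = line.rstrip("\n").split(sep)  # p[i] is the data of the i-th column
        if p[0] in HEADER_FIRST_FIELDS:
            continue                      # header line, not data
        rows.append(p)
    return rows


# Hypothetical three-column sample: two header lines followed by one data row.
sample = [
    "case_type&case_id&phone\n",
    "case type&case id&phone\n",
    "robbery&123***789&152****5741\n",
]
rows = parse_rows(sample)
```

Matching on the first field instead of on the line number keeps the filter correct even if a source file carries only one of the two header lines.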
As another implementable manner, on the basis of the foregoing embodiment, optionally, acquiring the original data to be processed includes:
Step 1011, acquiring original table data.
Step 1012, preprocessing the original table data to obtain text data in a preset format.
Step 1013, using the text data in the preset format as the original data to be processed.
Optionally, preprocessing the original table data to obtain text data in a preset format includes:
Step 2061, filling the fields of the original table data whose content is empty with a preset special word to obtain processed table data.
Step 2062, converting the processed table data into text data in the preset format using the separator &.
Specifically, many fields in the original table data are usually empty, and errors are frequently reported when the knowledge graph is created with the JanusGraph database, so fields with empty content need to be processed, for example by filling them with a word such as "none" or "empty". When multi-source heterogeneous original table data is used to construct the knowledge graph, it needs to be converted into text data in a uniform format. For example, the content of each column in the table is processed and the tab is replaced with a separator. Since the original table data contains many symbols, the choice of separator is also important. To avoid collisions with symbols in the original table data, the & symbol is selected as the separator. The effect achieved is as follows:
robbery 123***789 Zhang 152****5741 Zhejiang river Ma x
Converting the original table data into text txt content as follows:
robbery &123 × 789& 152 × 5741& zhejiang × and none
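The preprocessing of steps 2061 and 2062 can be sketched in Python as follows. This is a minimal sketch assuming the table is already available as a list of rows; the function name `to_preset_format` and the sample row are hypothetical, while the filler word "none" and the separator & follow the description above.

```python
def to_preset_format(table, filler="none", sep="&"):
    """Fill empty cells with the filler word and join each row with the separator."""
    lines = []
    for row in table:
        cells = []
        for cell in row:
            text = "" if cell is None else str(cell).strip()
            if sep in text:
                # The separator must not occur inside the data itself.
                raise ValueError(f"cell contains the separator {sep!r}: {text!r}")
            cells.append(text if text else filler)
        lines.append(sep.join(cells))
    return lines


# Hypothetical row with one empty string and one missing value.
table = [["robbery", "123***789", "", "152****5741", None]]
lines = to_preset_format(table)
```

Rejecting cells that already contain the separator makes a collision visible at conversion time, rather than silently shifting columns during graph construction.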
As another practicable manner, on the basis of the foregoing embodiment, optionally, the method may further include:
Step 2071, classifying and integrating the knowledge graph according to preset classification rules to obtain a classified knowledge graph.
Here, the knowledge graph may be the current knowledge graph, the new knowledge graph, or the knowledge graph at any time or stage; it is not specifically limited.
Illustratively, taking a public security knowledge graph as an example, the entities of the constructed public security knowledge graph have no hierarchy, and the overall framework of the knowledge graph cannot be grasped intuitively. The public security knowledge graph can therefore be integrated by dividing it into a layered structure, for example a 6-layer structure of people, places, events, objects, organizations and identities. Entities of the same type may be aggregated. Taking car entities as an example, for entity 1 (license number: lu a1 x 654), entity 2 (license number: lu B1 x 674), entity 3 (license number: lu a1 x 554), and so on, all of these entities can be gathered through their label or another special attribute, a class entity can be created for them, and they can be connected to it through the relationship belonging 2.
Fig. 4 shows an exemplary diagram of the knowledge graph classification and integration provided by this embodiment.
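The classification and integration idea can be sketched in Python as a two-level grouping: each entity is linked to a class entity of its type, and each class entity is linked to one layer of the layered structure. This is a minimal sketch; the function name `integrate`, the edge label "belongs_to", and the sample identifiers are placeholders chosen for illustration, not names from this embodiment.

```python
def integrate(entities, layer_of_type):
    """Return the set of (source, label, target) edges that build the hierarchy.

    entities maps an entity id to its type label.
    layer_of_type maps a type label to its layer in the layered structure.
    """
    edges = set()
    for entity_id, type_label in entities.items():
        # Entity -> class entity of its type.
        edges.add((entity_id, "belongs_to", type_label))
        layer = layer_of_type.get(type_label)
        if layer is not None:
            # Class entity -> layer entity; the set keeps this edge unique.
            edges.add((type_label, "belongs_to", layer))
    return edges


entities = {"car1": "car", "car2": "car", "p1": "person"}
layer_of_type = {"car": "object", "person": "people"}
hierarchy = integrate(entities, layer_of_type)
```

Because the edges are collected in a set, the class-to-layer edge is created once no matter how many entities of that type exist.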
As another practicable manner, on the basis of the foregoing embodiment, optionally, the method may further include:
Step 2081, performing visual display processing on the knowledge graph.
Specifically, displaying the knowledge graph visually allows relevant personnel to understand the relationships among entities. A public security knowledge graph, for example, mainly comprises entities such as cases, people, vehicles, mobile phones and hotels, and relationships (edges) such as residence, travel, colleagues and post-residents, so that public security personnel can clearly see the relationships among people, cases, places and the like. Fig. 5 is a schematic view of the visualization effect of the public security knowledge graph provided in this embodiment.
As another practicable manner, on the basis of the foregoing embodiment, optionally, the method may further include:
Step 2091, mining implicit relationships between entities according to the current knowledge graph and preset mining rules.
Step 2092, performing completion updating on the current knowledge graph according to the mined implicit relationships between entities.
Specifically, an implicit relationship may exist between two entities in the knowledge graph without being directly discoverable. Mining rules can therefore be preset, implicit relationships between entities can be mined based on the current knowledge graph, and the current knowledge graph can be completed and updated according to the mined relationships.
Illustratively, case 2 involves suspect 4, suspect 4 stayed in hotel 1, and resident 1 stayed in hotel 1 at the same time, so it can be inferred that resident 1 is likely to be related to the case. Case 1 has no apparent connection with case 2, but case 1 involves suspect 2, case 2 involves vehicle 4, and vehicle 4 belongs to suspect 2, which indicates that the two cases are related. Person 4 is a friend of victim 1 of case 1 and also the owner of vehicle 1 involved in case 1, which indicates that person 4 is related to case 1. And so on.
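The first mining example (a resident staying in the same hotel as a suspect) can be sketched in Python as a simple rule over two relation sets. This is a minimal sketch assuming relations are stored as pairs; it ignores the time dimension mentioned in the example, and the function name `mine_same_hotel` and all identifiers are hypothetical.

```python
def mine_same_hotel(involves, stays):
    """Infer (person, case) pairs for people who stayed in a suspect's hotel.

    involves is a set of (case, suspect) pairs.
    stays is a set of (person, hotel) pairs.
    """
    hotels_of = {}
    for person, hotel in stays:
        hotels_of.setdefault(person, set()).add(hotel)

    inferred = set()
    for case, suspect in involves:
        suspect_hotels = hotels_of.get(suspect, set())
        for person, hotels in hotels_of.items():
            # A shared hotel suggests a possible, not a proven, relation.
            if person != suspect and hotels & suspect_hotels:
                inferred.add((person, case))
    return inferred


involves = {("case2", "suspect4")}
stays = {("suspect4", "hotel1"), ("resident1", "hotel1"), ("resident2", "hotel9")}
inferred = mine_same_hotel(involves, stays)
```

The inferred pairs are candidates for the completion update of step 2092, for example as edges marking a possible relation between a person and a case.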
As an exemplary implementation manner, optionally, FIG. 6A shows the entity portion of the overall architecture diagram for constructing the public security knowledge graph provided in this embodiment, and FIG. 6B shows the edge portion of the same architecture diagram. FIG. 6B continues FIG. 6A, and together they form the overall architecture of public security knowledge graph construction. The construction mainly comprises: preprocessing the original data; predefining the public security knowledge graph model by establishing entity labels, edge labels, entity attributes, edge attributes and indexes; establishing entities; completing and updating entity attributes; establishing edges between entities; and completing and updating edge attributes. The constructed public security knowledge graph ensures the uniqueness of entities and of edges, and allows knowledge completion and updating of the existing knowledge graph. The specific steps of public security knowledge graph construction are as follows:
1. Preprocessing of original data
The original data (i.e., original table data) is shown in Tables 1 and 2 above. The public security data shows that many fields in the tables are empty, and errors are frequently reported when the knowledge graph is created with the JanusGraph database, so fields with empty content need to be processed and can be replaced by the word "none". The content of each column in the table is then processed. Which separator to use depends on the specific application scenario; for example, if the table contains no commas, it can be processed directly as a csv file. However, public security data contains many symbols, and after in-depth analysis the & symbol was chosen as the separator. The final effect is as follows:
robbery 123***789 Zhang 152****5741 Zhejiang river Ma x
The content of the table converted into the text txt is as follows:
robbery &123 × 789& 152 × 5741& zhejiang × and none
2. Knowledge graph data modeling predefinition
Each JanusGraph graph has a schema (which may be called a graph model) consisting of edge labels, property keys, and vertex labels. The schema of JanusGraph may be defined either explicitly or implicitly.
2.1 Defining entity labels
The entity label is used to determine the type of an entity. For example, the entity label of Zhang San, Li Si and the like is person, i.e., they are person entities; the entity label of LuA 1 x 1, Lu 1 x 1 and the like is car, i.e., they are car entities; the entity label of signal, force and the like is unit; and so on. The labels can be set according to actual requirements.
To create an entity label, makeVertexLabel(String).make() is called on an open graph or management transaction, with the name of the vertex label (i.e., the entity label) provided as a parameter. The vertex label name must be unique in the graph.
2.2 Defining edge labels
The edge label is used to determine the type of an edge, such as classmate, colleague, place of residence, brother, and so on.
To define an edge label, makeEdgeLabel(String) is called on an open graph or management transaction, with the name of the edge label provided as a parameter. The edge label name must be unique in the graph. The method returns a builder for the edge label, through which its multiplicity can be defined. The multiplicity of an edge label defines a constraint on all edges with that label, such as the maximum number of edges between a pair of vertices (i.e., entities). JanusGraph supports the following multiplicity settings:
MULTI: allows multiple edges with the same label between any pair of vertices.
SIMPLE: allows at most one edge with this label between any pair of vertices.
MANY2ONE: allows at most one outgoing edge with this label on any vertex in the graph, but places no limit on incoming edges.
ONE2MANY: allows at most one incoming edge with this label on any vertex in the graph, but places no limit on outgoing edges.
ONE2ONE: allows at most one incoming edge and one outgoing edge with this label on any vertex in the graph.
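The five multiplicity settings can be made concrete with a small Python checker that decides whether adding a new edge would break the constraint. This is an illustrative model of the rules listed above, not JanusGraph code; edges are (outgoing vertex, incoming vertex) pairs that share one edge label, and the function name `violates` is hypothetical.

```python
def violates(multiplicity, edges, new_edge):
    """Return True if adding new_edge would break the multiplicity constraint."""
    out_v, in_v = new_edge
    same_pair = any(e == new_edge for e in edges)  # same directed vertex pair
    has_out = any(e[0] == out_v for e in edges)    # out_v already has an outgoing edge
    has_in = any(e[1] == in_v for e in edges)      # in_v already has an incoming edge
    if multiplicity == "MULTI":
        return False
    if multiplicity == "SIMPLE":
        return same_pair
    if multiplicity == "MANY2ONE":
        return has_out
    if multiplicity == "ONE2MANY":
        return has_in
    if multiplicity == "ONE2ONE":
        return has_out or has_in
    raise ValueError(f"unknown multiplicity: {multiplicity}")


existing = [("phone1", "person1")]
simple_duplicate = violates("SIMPLE", existing, ("phone1", "person1"))
many2one_second_owner = violates("MANY2ONE", existing, ("phone1", "person2"))
```

With MANY2ONE, for instance, a mobile phone number that already has one outgoing edge of this label cannot receive a second one, while a person may still receive edges from several phone numbers.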
2.3 Definition of entity and edge attributes
Cardinality defines how many values may be associated with a property key on any given vertex.
Cardinality settings:
SINGLE: at most one value per element is allowed for such a key.
LIST: any number of values per element is allowed for such a key.
SET: multiple values per element are allowed for such a key, but no duplicate values.
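The effect of the three cardinality settings can be illustrated in Python with a helper that writes a value to a property dictionary. This is a simplified model of the semantics described above, not the JanusGraph API; the function name `set_property` is hypothetical.

```python
def set_property(props, key, value, cardinality="SINGLE"):
    """Store value under key according to the given cardinality."""
    if cardinality == "SINGLE":
        props[key] = value                       # the new value replaces the old one
    elif cardinality == "LIST":
        props.setdefault(key, []).append(value)  # duplicates are allowed
    elif cardinality == "SET":
        values = props.setdefault(key, [])
        if value not in values:                  # duplicates are ignored
            values.append(value)
    else:
        raise ValueError(f"unknown cardinality: {cardinality}")
    return props


single = {}
set_property(single, "name", "a")
set_property(single, "name", "b")

as_list = {}
set_property(as_list, "alias", "x", "LIST")
set_property(as_list, "alias", "x", "LIST")

as_set = {}
set_property(as_set, "alias", "x", "SET")
set_property(as_set, "alias", "x", "SET")
```

A unique attribute such as a mobile phone number fits SINGLE, while an attribute whose history should be retained fits LIST or SET.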
2.4 Creation of indexes
JanusGraph supports indexing to speed up query processing. Most graph queries start their traversal from a list of vertices or edges identified by their attributes, and indexes make such global retrieval efficient in large graphs. Without an index, JanusGraph performs a full graph scan to find the desired vertices. This still returns correct results, but a full graph scan is very inefficient and degrades overall system performance in a production environment.
Indexes are declared on entity attributes and edge attributes.
3. Public security knowledge graph modeling
3.1 Modeling preprocessing
The txt document obtained through data preprocessing is read, taking the case text case.txt as an example. case.txt is traversed line by line, the fields of each line are extracted using the separator, and the Chinese and English header content, which is not data, is then removed.
Figure RE-GDA0002391679140000141
3.2 Entity uniqueness and knowledge completion and updating
The unique identifier of the entity is used to judge whether the entity exists in the knowledge graph database. If it exists, attribute completion and attribute updating are performed as required, retaining both the original and the current attributes; if not, the entity is newly created.
The construction of a mobile phone number entity is taken as an example of entity uniqueness processing, entity knowledge completion and entity knowledge updating:
First, the mobile phone number must not be empty, because the unique attribute of the mobile phone number entity is the mobile phone number, i.e., if (p[6] != "none") { }. Then, the Gremlin query language is used to query, by the indexed unique attribute value of the mobile phone number entity, whether the entity exists in the graph database:
Figure RE-GDA0002391679140000142
If no mobile phone number entity is found, no mobile phone number entity with the attribute phone_number = p[6] exists in the graph database, and a new mobile phone number entity is created:
v11 = graph.addVertex(label, 'phone', 'vertex_label', 'phone', 'vertex_date', t1, 'phone_number', p[6]);
It should be noted that entity uniqueness processing is basically the same for all entity types, with special handling required in particular cases according to actual needs. For example, when a person entity is established, the ID card number is the unique identifier of the person entity; however, with later knowledge fusion in mind, a new entity is also created when the ID card number is missing but the name is present. The only difference is that such an entity is created directly, without duplicate checking.
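The create-or-complete logic for an entity can be sketched in Python as follows. This is a minimal in-memory sketch, not the Gremlin code of this embodiment: the graph is a dictionary keyed by (entity type, unique attribute value), each attribute keeps a list of the values seen so that original and current values are both retained, and the function name `upsert_entity` and the attribute "carrier" are hypothetical.

```python
def upsert_entity(graph, entity_type, unique_key, row):
    """Create the entity if it is missing, then complete and update its attributes.

    graph maps (entity_type, unique value) to {'id': str, 'attrs': dict}.
    Returns the entity, or None when the unique attribute is empty.
    """
    unique_value = row.get(unique_key)
    if unique_value in (None, "", "none"):
        return None                       # no unique attribute, nothing to match on
    key = (entity_type, unique_value)
    entity = graph.get(key)
    if entity is None:
        entity = {"id": f"{len(graph) + 1:04d}", "attrs": {}}
        graph[key] = entity               # the entity did not exist, so it is created
    for name, value in row.items():
        if value == "none":
            continue                      # filler word, carries no knowledge
        history = entity["attrs"].setdefault(name, [])
        if value not in history:
            history.append(value)         # keep original values and add the current one
    return entity


graph = {}
first = upsert_entity(graph, "phone", "phone_number",
                      {"phone_number": "152****5741", "carrier": "none"})
second = upsert_entity(graph, "phone", "phone_number",
                       {"phone_number": "152****5741", "carrier": "X"})
skipped = upsert_entity(graph, "phone", "phone_number", {"phone_number": "none"})
```

The second row reaches the entity created by the first one and only fills in the attribute that was previously missing, which corresponds to knowledge completion rather than to creating a duplicate.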
3.3 Edge uniqueness and knowledge completion and updating
For example, the edge phone_person is established between a mobile phone number entity and a person entity, to illustrate the uniqueness processing of the edge between two entities and the completion and updating of the edge's knowledge. Uniqueness processing, knowledge completion and knowledge updating have already been performed on the mobile phone number entity. The premise of edge uniqueness processing is that both entities exist in the graph database (i.e., the database in which the knowledge graph is stored). Only the case where the person's ID card number is not empty, the person entity exists in the graph database, and knowledge completion and knowledge updating have been performed on the attributes of the person entity is considered here. The following check is still required:
Figure RE-GDA0002391679140000151
First, it is judged whether an edge phone_person pointing to the person entity exists. If not, the edge phone_person is established directly between the person entity and the mobile phone number entity. If so, it is judged whether mobile phone number entities connected to the person entity by the edge phone_person exist, and if so, all mobile phone number entities meeting the condition are retrieved as a list.
Figure RE-GDA0002391679140000152
Second, all mobile phone number entities meeting the condition are traversed. If the unique identifier id of a mobile phone number entity reached from the person entity through the edge phone_person is equal to the unique identifier id of the mobile phone number entity with the attribute phone_number = p[6] in the graph database, the edge phone_person already exists between the person entity and that mobile phone number entity, and all edges meeting the condition are retrieved from the graph database as a list. If, after all the mobile phone number entities have been traversed, none is equal, no edge meeting the condition exists, and the edge phone_person is established between the two entities. The specific steps are as follows:
Figure RE-GDA0002391679140000153
Figure RE-GDA0002391679140000161
Finally, all edges meeting the condition are traversed. If the unique identifier id of the person entity found through the edge phone_person is equal to the unique identifier id of the current person entity, knowledge completion and knowledge updating are performed on the edge attributes.
Figure RE-GDA0002391679140000162
3.4 Graph integration with the 6-layer structure of people, places, events, etc.
The entities of the constructed public security knowledge graph have no hierarchy, and the overall framework of the knowledge graph cannot be grasped intuitively, so entities of the same type need to be aggregated. Taking car entities as an example, for entity 1 (license number: lu a1 x 654), entity 2 (license number: lu B1 x 674), entity 3 (license number: lu a1 x 554), and so on, all of these entities can be gathered through their label or another special attribute, a class entity can be created for them, and they can be connected to it through the relationship belonging 2. The details are as follows:
Figure RE-GDA0002391679140000163
If the entity does not exist, it is newly created: addVertex(label, 'car', 'vertex_date', t1);
Similarly, the car entity belongs to the object layer of the 6-layer structure; a new object entity is created and connected to the car entities through the edge belonging 2.
Figure RE-GDA0002391679140000171
If the entity does not exist, it is newly created:
addVertex(label, 'Material', 'NAME', 'substance', 'vertex_date', t1);
4. Graph-based inference query
4.1 Visualization of the public security knowledge graph
The public security knowledge graph mainly comprises entities such as cases, people, vehicles, mobile phones and hotels, and relationships such as residence, travel, colleagues and passing households. The public security knowledge graph can be displayed visually, so that relevant personnel can understand all the associations of a case more conveniently and clearly. The visualization effect has been illustrated above and is not described again here.
4.2 Knowledge inference technique
An implicit relationship between two entities may not be directly discoverable from the knowledge graph, so the hidden relationships between entities can be mined through knowledge inference, further clarifying how the entities are connected. For example, case 2 involves suspect 4, suspect 4 stayed in hotel 1, and resident 1 stayed in hotel 1 at the same time, so it can be inferred that resident 1 is likely to be related to the case. Case 1 has no apparent connection with case 2, but case 1 involves suspect 2, case 2 involves vehicle 4, and vehicle 4 belongs to suspect 2, which indicates that the two cases are related. Person 4 is a friend of victim 1 of case 1 and also the owner of vehicle 1 involved in case 1, which indicates that person 4 is related to case 1.
4.3 Graph function analysis
The knowledge graph can help public security organs draw a relationship network across the three dimensions of people, vehicles and cases, so that criminal investigators can quickly grasp the explicit and implicit relationships between suspects and quickly find leads for solving a case, helping criminal investigators and other police officers achieve twice the result with half the effort. The system can help public security criminal investigators implement residence analysis, peer analysis, call record relationship analysis, logistics relationship analysis, case association analysis, vehicle relationship analysis, multidimensional relationship analysis, and the like.
It should be noted that the implementable manners in this embodiment may be implemented individually or, where they do not conflict, in any combination, and the present application is not limited in this respect.
In the data processing method of the knowledge graph provided by this embodiment, original data to be processed (i.e., newly added original data) is acquired, and whether the original data to be processed has a corresponding entity in the current knowledge graph is determined according to the original data to be processed and the current knowledge graph. If so, knowledge completion updating is performed on the corresponding entity in the current knowledge graph according to the original data to be processed, to obtain a new knowledge graph; if not, a corresponding entity is established in the current knowledge graph according to the original data to be processed, and knowledge completion is performed on the established entity according to the original data to be processed, to obtain a new knowledge graph. Dynamic completion and updating of the knowledge graph are thereby realized, the availability and effectiveness of the original data are improved, work efficiency is improved, and time and labor costs are reduced. The relationships among entities can be supplemented, making the knowledge graph more complete. Indexes can be established to improve query efficiency. The original table data can be preprocessed so that multi-source heterogeneous data can be used to construct the knowledge graph, improving the availability and effectiveness of the data. The knowledge graph can be classified and integrated, which effectively reflects the hierarchy of the entities in it. The knowledge graph can be displayed visually, so that relevant personnel can understand the associations among entities more conveniently, improving work efficiency. In addition, implicit relationships among entities can be mined from the current knowledge graph, further helping relevant personnel discover the associations among entities.
Yet another embodiment of the present application provides a server for executing the method of the above embodiment.
As shown in fig. 7, a schematic structural diagram of the server provided in this embodiment is shown. The server 30 comprises an acquisition module 31, a determination module 32 and a processing module 33.
The acquisition module is used for acquiring original data to be processed; the determining module is used for determining whether the original data to be processed has a corresponding entity in the current knowledge map according to the original data to be processed and the current knowledge map; a processing module to: if so, performing knowledge completion updating processing on the corresponding entity in the current knowledge graph according to the original data to be processed to obtain a new knowledge graph; if not, establishing a corresponding entity in the current knowledge graph according to the original data to be processed, and performing knowledge completion processing on the established entity according to the original data to be processed to obtain a new knowledge graph.
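The division of work among the three modules can be sketched in Python as follows. This is a structural illustration under simplifying assumptions: the graph is a dictionary keyed by a unique attribute called "id", each row describes one entity, and all class and method names are hypothetical rather than taken from this embodiment.

```python
class AcquisitionModule:
    """Acquires the original data to be processed."""

    def __init__(self, rows):
        self._rows = rows

    def acquire(self):
        return list(self._rows)


class DeterminingModule:
    """Determines whether a row has a corresponding entity in the current graph."""

    def has_entity(self, graph, row):
        return row["id"] in graph


class ProcessingModule:
    """Updates an existing entity or creates a new one, then completes its attributes."""

    def process(self, graph, row, exists):
        if exists:
            graph[row["id"]].update(row)  # knowledge completion update
        else:
            graph[row["id"]] = dict(row)  # establish the entity with its attributes
        return graph


class Server:
    def __init__(self, rows):
        self.acquisition = AcquisitionModule(rows)
        self.determining = DeterminingModule()
        self.processing = ProcessingModule()

    def run(self, graph):
        for row in self.acquisition.acquire():
            exists = self.determining.has_entity(graph, row)
            graph = self.processing.process(graph, row, exists)
        return graph


server = Server([{"id": "p1", "name": "Zhang San"}, {"id": "p1", "sex": "male"}])
new_graph = server.run({})
```

The second row is recognized as referring to the existing entity, so the resulting graph holds one entity whose attributes combine both rows.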
The specific manner in which each module performs operations has been described in detail in the embodiment of the method with respect to the server in the present embodiment, and will not be elaborated here.
According to the server provided by this embodiment, original data to be processed (i.e., newly added original data) is acquired, and whether the original data to be processed has a corresponding entity in the current knowledge graph is determined according to the original data to be processed and the current knowledge graph. If so, knowledge completion updating is performed on the corresponding entity in the current knowledge graph according to the original data to be processed to obtain a new knowledge graph; if not, a corresponding entity is established in the current knowledge graph according to the original data to be processed, and knowledge completion is performed on the established entity according to the original data to be processed to obtain a new knowledge graph. This realizes dynamic completion and updating of the knowledge graph, improves the availability and validity of the original data, effectively improves work efficiency, and reduces time and labor costs.
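The update-or-create flow described above can be illustrated with a minimal sketch. It assumes a toy in-memory graph; the `Entity` and `KnowledgeGraph` classes, the `upsert` method, and the sample values are hypothetical names chosen for illustration and are not taken from the application.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """A graph node: a type, a unique attribute value, and free-form attributes."""
    entity_type: str
    unique_value: str
    attributes: dict = field(default_factory=dict)


class KnowledgeGraph:
    """Toy in-memory graph keyed by (entity type, unique attribute value)."""

    def __init__(self):
        self.entities = {}

    def find(self, entity_type, unique_value):
        # Determining step: does a corresponding entity already exist?
        return self.entities.get((entity_type, unique_value))

    def upsert(self, entity_type, unique_value, attributes):
        """Update the matching entity, or create it, then complete its attributes."""
        entity = self.find(entity_type, unique_value)
        created = entity is None
        if created:
            entity = Entity(entity_type, unique_value)
            self.entities[(entity_type, unique_value)] = entity
        # Knowledge completion: merge the new attribute values into the entity.
        entity.attributes.update(attributes)
        return entity, created


graph = KnowledgeGraph()
_, first_created = graph.upsert("Person", "ID-001", {"name": "Zhang San"})
person, second_created = graph.upsert("Person", "ID-001", {"phone": "555-0100"})
```

In this sketch the second call finds the existing entity and only completes its attributes, so the graph still holds a single `Person` node.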
The present application further provides a supplementary description of the server provided in the foregoing embodiment.
As an implementable manner, on the basis of the foregoing embodiment, optionally, the determining module is specifically configured to:
traversing the original data to be processed row by row and, for each row of data, determining, according to a preset entity rule, the items for which an entity needs to be established;
for each item that requires an entity to be established, it is determined whether the item has a corresponding entity in the current knowledge-graph.
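As a rough illustration of the row traversal, the sketch below assumes that the preset entity rule is a mapping from column name to entity type. The application does not specify the rule format, so `ENTITY_RULE`, the column names, and the sample rows are all hypothetical.

```python
# Hypothetical entity rule: column name -> entity type to create from that column.
ENTITY_RULE = {"id_number": "Person", "plate": "Vehicle"}


def items_needing_entities(rows, rule=ENTITY_RULE):
    """Yield (row_index, entity_type, value) for each field the rule maps to an entity."""
    for index, row in enumerate(rows):
        for column, entity_type in rule.items():
            value = row.get(column)
            if value:  # skip missing or empty fields
                yield index, entity_type, value


rows = [
    {"id_number": "ID-001", "plate": "A12345", "note": "first record"},
    {"id_number": "ID-002", "plate": "", "note": "no vehicle"},
]
items = list(items_needing_entities(rows))
```

Each yielded item would then be checked against the current knowledge graph to decide whether a corresponding entity already exists.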
Optionally, the determining module is specifically configured to:
acquiring a unique attribute corresponding to the item;
inquiring whether an entity which has the same attribute as the unique attribute and belongs to the same type as the item exists in the current knowledge graph or not according to the unique attribute corresponding to the item;
if so, the item has a corresponding entity in the current knowledge graph;
if not, the item has no corresponding entity in the current knowledge graph.
Optionally, the acquisition module is further configured to acquire index information corresponding to the unique attribute of the item;
the determining module is specifically configured to:
query, according to the unique attribute of the item and the index information, whether an entity that has the same attribute as the unique attribute and belongs to the same type as the item exists in the current knowledge graph.
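The benefit of index information can be sketched as follows. The example assumes the index is a plain dictionary keyed by entity type and unique attribute value; the application does not describe the index structure, so `UniqueAttributeIndex`, `find_by_scan`, and the sample entities are illustrative only.

```python
class UniqueAttributeIndex:
    """Maps (entity type, unique attribute value) to an entity id for direct lookup."""

    def __init__(self):
        self._index = {}

    def add(self, entity_type, unique_value, entity_id):
        self._index[(entity_type, unique_value)] = entity_id

    def lookup(self, entity_type, unique_value):
        # Both the type and the unique attribute must match.
        return self._index.get((entity_type, unique_value))


def find_by_scan(entities, entity_type, unique_value):
    """Fallback without an index: linear scan over all entities."""
    for entity_id, entity in entities.items():
        if entity["type"] == entity_type and entity["unique"] == unique_value:
            return entity_id
    return None


entities = {
    "e1": {"type": "Person", "unique": "ID-001"},
    "e2": {"type": "Vehicle", "unique": "A12345"},
}
index = UniqueAttributeIndex()
for entity_id, entity in entities.items():
    index.add(entity["type"], entity["unique"], entity_id)

hit = index.lookup("Person", "ID-001")
miss = index.lookup("Vehicle", "ID-001")  # same value, different type
```

Both lookups return the same answers as the scan, but the indexed version avoids visiting every entity in the graph.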
Optionally, the processing module is further configured to:
if it is determined according to the original data to be processed that a preset relationship exists between a first entity and a second entity, querying, from the new knowledge graph, the third entity pointed to by each edge connected to the first entity;
determining whether a corresponding edge exists between the first entity and the second entity according to the unique identifier of each third entity and the unique identifier of the second entity;
if the edge exists, performing knowledge completion updating on the edge between the first entity and the second entity according to the original data to be processed;
if not, establishing an edge between the first entity and the second entity according to the original data to be processed, and performing knowledge completion on the established edge.
Optionally, the processing module is specifically configured to:
and if the unique identifier of the third entity is the same as the unique identifier of the second entity, determining that a corresponding edge exists between the first entity and the second entity.
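The edge check can be sketched with an adjacency list in which each edge stores the identifier of the entity it points to. The data layout, the `upsert_edge` function, and the sample identifiers are assumptions made for illustration. As in the text, an edge is matched only by comparing the unique identifier of its target (the "third entity") with that of the second entity.

```python
def upsert_edge(edges, first_id, second_id, properties):
    """Update the edge first -> second if one exists, otherwise create it.

    `edges` maps a source entity id to a list of edge dicts, each holding the
    unique identifier of its target and a dict of edge properties.
    """
    outgoing = edges.setdefault(first_id, [])
    for edge in outgoing:
        # edge["target"] is the third entity's unique identifier.
        if edge["target"] == second_id:
            edge["properties"].update(properties)  # completion update
            return edge, False
    edge = {"target": second_id, "properties": dict(properties)}
    outgoing.append(edge)
    return edge, True


edges = {}
_, created_first = upsert_edge(edges, "p1", "c1", {"relation": "works_at"})
edge, created_second = upsert_edge(edges, "p1", "c1", {"since": "2019"})
```

The second call finds the existing edge and merges the new property instead of adding a duplicate edge.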
Optionally, the determining module is specifically configured to:
traversing the original data to be processed row by row, and removing unneeded header content;
and for each remaining row of required data, determining, according to the preset entity rule, the items for which an entity needs to be established.
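Header removal can be illustrated with a small filter. How a header row is recognized is not specified in the application; the `is_header` predicate below, which treats comment lines and the column-name line as headers, is a hypothetical convention.

```python
def data_rows(lines, is_header):
    """Yield only the rows that `is_header` does not flag as header content."""
    for line in lines:
        if not is_header(line):
            yield line


# Hypothetical convention: header lines start with "#" or repeat the column names.
def is_header(line):
    return line.startswith("#") or line.startswith("id_number&")


lines = ["# export header", "id_number&name", "ID-001&Zhang San", "ID-002&Li Si"]
rows = list(data_rows(lines, is_header))
```

Only the two data rows remain for the entity rule to process.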
As another implementable manner, on the basis of the foregoing embodiment, optionally, the acquisition module is specifically configured to:
acquire original table data;
preprocess the original table data to obtain text data in a preset format;
and take the text data in the preset format as the original data to be processed.
Optionally, when preprocessing the original table data to obtain text data in a preset format, the acquisition module is specifically configured to:
fill the fields whose content is empty in the original table data with a preset special word to obtain processed table data;
and convert the processed table data into text data in the preset format using a separator ("&").
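The two preprocessing steps can be sketched as below. The placeholder word `NULL` is an assumption, since the application only says "a preset special word"; the separator follows the "&" that appears in the text. The function name and sample table are hypothetical.

```python
EMPTY_WORD = "NULL"  # assumed placeholder for empty fields
SEPARATOR = "&"      # separator character appearing in the text


def table_to_text(table, empty_word=EMPTY_WORD, separator=SEPARATOR):
    """Fill empty cells with a placeholder, then join each row with the separator."""
    lines = []
    for row in table:
        cells = []
        for cell in row:
            text = "" if cell is None else str(cell).strip()
            cells.append(text if text else empty_word)
        lines.append(separator.join(cells))
    return "\n".join(lines)


table = [["ID-001", "Zhang San", None], ["ID-002", "", "A12345"]]
text = table_to_text(table)
```

Filling empty fields first keeps every row at the same number of separator-delimited columns, which makes the later row-by-row traversal simpler. A real implementation would also need to handle cell values that contain the separator itself.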
As another implementable manner, on the basis of the foregoing embodiment, optionally, the processing module is further configured to:
and classifying and integrating the knowledge graph according to a preset classification rule to obtain the classified knowledge graph.
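One possible reading of classification and integration is grouping entities into a hierarchy by type. The sketch below assumes the preset classification rule maps each entity type to a higher-level category; the rule contents and names are hypothetical.

```python
from collections import defaultdict

# Hypothetical classification rule: entity type -> higher-level category.
CLASSIFICATION_RULE = {"Person": "Agent", "Company": "Agent", "Vehicle": "Asset"}


def classify(entities, rule=CLASSIFICATION_RULE, default="Other"):
    """Group entity ids into a category -> type -> [ids] hierarchy."""
    hierarchy = defaultdict(lambda: defaultdict(list))
    for entity_id, entity_type in entities.items():
        category = rule.get(entity_type, default)
        hierarchy[category][entity_type].append(entity_id)
    # Convert to plain dicts so the result compares and prints cleanly.
    return {category: dict(types) for category, types in hierarchy.items()}


entities = {"p1": "Person", "c1": "Company", "v1": "Vehicle", "x1": "Event"}
classified = classify(entities)
```

Types not covered by the rule fall into the `Other` category rather than being dropped.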
As another implementable manner, on the basis of the foregoing embodiment, optionally, the processing module is further configured to:
and carrying out visual display processing on the knowledge graph.
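The application does not say how the visual display is produced. As one assumed option, the graph could be exported to Graphviz DOT text and rendered by an external tool; the `to_dot` helper and sample triple below are hypothetical.

```python
def to_dot(triples):
    """Render (head, relation, tail) triples as a Graphviz DOT digraph."""
    lines = ["digraph knowledge_graph {"]
    for head, relation, tail in triples:
        # Names are assumed not to contain double quotes; escape them otherwise.
        lines.append(f'  "{head}" -> "{tail}" [label="{relation}"];')
    lines.append("}")
    return "\n".join(lines)


triples = [("Alice", "works_at", "Acme Co")]
dot = to_dot(triples)
```

The resulting text can be saved to a file and rendered with the Graphviz `dot` command.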
As another implementable manner, on the basis of the foregoing embodiment, optionally, the processing module is further configured to:
mining implicit relations among the entities according to the current knowledge graph and preset mining rules;
and performing completion updating processing on the current knowledge graph according to the mined implicit relationship among the entities.
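Implicit-relationship mining can be sketched with a simple composition rule: if A relates to B and B relates to C, a third relationship between A and C is inferred. The application does not define the mining rules, so `MINING_RULES`, the relation names, and the sample triples are hypothetical. The sketch makes a single pass and does not iterate to a fixed point.

```python
# Hypothetical mining rule: if A -r1-> B and B -r2-> C, infer A -r3-> C.
MINING_RULES = {("works_at", "located_in"): "works_in"}


def mine_implicit_relations(triples, rules=MINING_RULES):
    """Return inferred (head, relation, tail) triples not already in the graph."""
    existing = set(triples)
    inferred = set()
    for head, r1, middle in triples:
        for middle2, r2, tail in triples:
            if middle == middle2 and (r1, r2) in rules:
                candidate = (head, rules[(r1, r2)], tail)
                if candidate not in existing:
                    inferred.add(candidate)
    return inferred


triples = [
    ("Alice", "works_at", "Acme Co"),
    ("Acme Co", "located_in", "City X"),
]
new_triples = mine_implicit_relations(triples)
triples.extend(sorted(new_triples))  # completion update of the current graph
```

The nested loop is quadratic in the number of triples; a larger graph would need an index from entity to outgoing edges.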
For the server in this embodiment, the specific manner in which each module performs its operations has been described in detail in the method embodiments and is not elaborated here.
It should be noted that the implementable manners in this embodiment may be implemented individually or, where no conflict arises, in any combination; the present application is not limited in this respect.
According to the server of this embodiment, original data to be processed (i.e., newly added original data) is acquired, and whether the original data to be processed has a corresponding entity in the current knowledge graph is determined according to the original data to be processed and the current knowledge graph. If so, knowledge completion updating is performed on the corresponding entity in the current knowledge graph according to the original data to be processed to obtain a new knowledge graph; if not, a corresponding entity is established in the current knowledge graph according to the original data to be processed, and knowledge completion is performed on the established entity according to the original data to be processed to obtain a new knowledge graph. This realizes dynamic completion and updating of the knowledge graph, improves the availability and validity of the original data, effectively improves work efficiency, and reduces time and labor costs. Relationships among entities can be supplemented, making the knowledge graph more complete. Indexes can be established to improve query efficiency. Original table data can be preprocessed so that multi-source heterogeneous data can be used to construct the knowledge graph, improving the availability and validity of the data. The knowledge graph can be classified and integrated, so that the hierarchy of the entities in the knowledge graph is effectively reflected. The knowledge graph can be displayed visually, so that the relevant personnel can more easily understand the associations among entities, improving work efficiency. In addition, implicit relationships among entities can be mined from the current knowledge graph, further helping the relevant personnel discover associations among entities.
Yet another embodiment of the present application provides an electronic device for performing the method provided by the foregoing embodiment. The electronic device may be a server.
Fig. 8 is a schematic structural diagram of the electronic device provided in this embodiment. The electronic device 50 includes at least one processor 51 and a memory 52.
The memory stores computer-executable instructions; the at least one processor executes the computer-executable instructions stored in the memory, so that the at least one processor performs the method provided by any of the above embodiments.
According to the electronic device of this embodiment, original data to be processed (i.e., newly added original data) is acquired, and whether the original data to be processed has a corresponding entity in the current knowledge graph is determined according to the original data to be processed and the current knowledge graph. If so, knowledge completion updating is performed on the corresponding entity in the current knowledge graph according to the original data to be processed to obtain a new knowledge graph; if not, a corresponding entity is established in the current knowledge graph according to the original data to be processed, and knowledge completion is performed on the established entity according to the original data to be processed to obtain a new knowledge graph. This realizes dynamic completion and updating of the knowledge graph, improves the availability and validity of the original data, effectively improves work efficiency, and reduces time and labor costs.
Yet another embodiment of the present application provides a computer-readable storage medium in which computer-executable instructions are stored; when a processor executes the computer-executable instructions, the method provided in any one of the above embodiments is implemented.
According to the computer-readable storage medium of this embodiment, original data to be processed (i.e., newly added original data) is acquired, and whether the original data to be processed has a corresponding entity in the current knowledge graph is determined according to the original data to be processed and the current knowledge graph. If so, knowledge completion updating is performed on the corresponding entity in the current knowledge graph according to the original data to be processed to obtain a new knowledge graph; if not, a corresponding entity is established in the current knowledge graph according to the original data to be processed, and knowledge completion is performed on the established entity according to the original data to be processed to obtain a new knowledge graph. This realizes dynamic completion and updating of the knowledge graph, improves the availability and validity of the original data, effectively improves work efficiency, and reduces time and labor costs.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus (e.g., server) and method may be implemented in other manners. The apparatus embodiments described above are merely illustrative. For example, the division of the units is only a logical functional division, and other divisions are possible in actual implementation: a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional unit.
The integrated unit implemented in the form of a software functional unit may be stored in a computer-readable storage medium. The software functional unit is stored in a storage medium and includes several instructions that enable a computer device (which may be a personal computer, a server, or a network device) or a processor to execute some of the steps of the methods according to the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
It is obvious to those skilled in the art that, for convenience and simplicity of description, the foregoing division of the functional modules is merely used as an example, and in practical applications, the above function distribution may be performed by different functional modules according to needs, that is, the internal structure of the device is divided into different functional modules to perform all or part of the above described functions. For the specific working process of the device described above, reference may be made to the corresponding process in the foregoing method embodiment, which is not described herein again.
Finally, it should be noted that: the above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present application.

Claims (13)

1. A data processing method of a knowledge graph is characterized by comprising the following steps:
acquiring original data to be processed;
determining whether the original data to be processed has a corresponding entity in the current knowledge graph or not according to the original data to be processed and the current knowledge graph;
if so, performing knowledge completion updating processing on the corresponding entity in the current knowledge graph according to the original data to be processed to obtain a new knowledge graph;
if not, establishing a corresponding entity in the current knowledge graph according to the original data to be processed, and performing knowledge completion processing on the established entity according to the original data to be processed to obtain a new knowledge graph.
2. The method of claim 1, wherein determining whether the raw data to be processed has a corresponding entity in the current knowledge-graph according to the raw data to be processed and the current knowledge-graph comprises:
traversing the original data to be processed row by row and, for each row of data, determining, according to a preset entity rule, the items for which an entity needs to be established;
for each item requiring an entity to be established, determining whether the item has a corresponding entity in the current knowledge-graph.
3. The method of claim 2, wherein determining, for each item for which an entity needs to be established, whether the item has a corresponding entity in the current knowledge-graph comprises:
acquiring a unique attribute corresponding to the item;
inquiring whether an entity which has the same attribute as the unique attribute and belongs to the same type as the item exists in the current knowledge graph or not according to the unique attribute corresponding to the item;
if yes, indicating that the item has a corresponding entity in the current knowledge graph;
and if not, indicating that the item does not have a corresponding entity in the current knowledge-graph.
4. The method of claim 3, further comprising:
acquiring index information corresponding to the unique attribute corresponding to the item;
the querying whether an entity which has the same attribute as the unique attribute and belongs to the same type as the item exists in the current knowledge graph according to the unique attribute corresponding to the item includes:
and inquiring whether an entity which has the same attribute as the unique attribute and belongs to the same type as the item exists in the current knowledge graph or not according to the unique attribute corresponding to the item and the index information.
5. The method of claim 2, further comprising:
if it is determined according to the original data to be processed that a preset relationship exists between a first entity and a second entity, querying, from the new knowledge graph, the third entity pointed to by each edge connected to the first entity;
determining whether a corresponding edge exists between the first entity and the second entity according to the unique identifier of each third entity and the unique identifier of the second entity;
if the edge exists, performing knowledge completion updating on the edge between the first entity and the second entity according to the original data to be processed;
if not, establishing an edge between the first entity and the second entity according to the original data to be processed, and performing knowledge completion on the established edge.
6. The method of claim 5, wherein the determining whether the corresponding edge exists between the first entity and the second entity according to the unique identifier of each third entity and the unique identifier of the second entity comprises:
and if the unique identifier of the third entity is the same as the unique identifier of the second entity, determining that a corresponding edge exists between the first entity and the second entity.
7. The method according to claim 2, wherein traversing the raw data to be processed row by row, and for each row of data, determining an item in which an entity needs to be established according to a preset entity rule comprises:
traversing the original data to be processed row by row, and removing unneeded header content;
and for each remaining row of required data, determining, according to the preset entity rule, the items for which an entity needs to be established.
8. The method according to claim 1, wherein the obtaining raw data to be processed comprises:
acquiring original table data;
preprocessing the original table data to obtain text data in a preset format;
and taking the text data in the preset format as the original data to be processed.
9. The method according to claim 8, wherein preprocessing the original table data to obtain text data in a preset format comprises:
filling the fields whose content is empty in the original table data with a preset special word to obtain processed table data;
and converting the processed table data into the text data in the preset format using a separator ("&").
10. The method of claim 1, further comprising:
and classifying and integrating the knowledge graph according to a preset classification rule to obtain the classified knowledge graph.
11. The method of claim 1, further comprising:
and carrying out visual display processing on the knowledge graph.
12. The method according to any one of claims 1-11, further comprising:
mining implicit relations among the entities according to the current knowledge graph and preset mining rules;
and performing completion updating processing on the current knowledge graph according to the mined implicit relationship among the entities.
13. A server, comprising:
the acquisition module is used for acquiring original data to be processed;
the determining module is used for determining, according to the original data to be processed and the current knowledge graph, whether the original data to be processed has a corresponding entity in the current knowledge graph;
a processing module, configured to:
if so, perform knowledge completion updating processing on the corresponding entity in the current knowledge graph according to the original data to be processed to obtain a new knowledge graph; if not, establish a corresponding entity in the current knowledge graph according to the original data to be processed, and perform knowledge completion processing on the established entity according to the original data to be processed to obtain a new knowledge graph.
CN201911155243.1A 2019-11-22 2019-11-22 Data processing method and server of knowledge graph Pending CN111026874A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911155243.1A CN111026874A (en) 2019-11-22 2019-11-22 Data processing method and server of knowledge graph

Publications (1)

Publication Number Publication Date
CN111026874A true CN111026874A (en) 2020-04-17

Family

ID=70206888

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911155243.1A Pending CN111026874A (en) 2019-11-22 2019-11-22 Data processing method and server of knowledge graph

Country Status (1)

Country Link
CN (1) CN111026874A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107330125A (en) * 2017-07-20 2017-11-07 云南电网有限责任公司电力科学研究院 The unstructured distribution data integrated approach of magnanimity of knowledge based graphical spectrum technology
CN107908637A (en) * 2017-09-26 2018-04-13 北京百度网讯科技有限公司 The entity update method and system in a kind of knowledge based storehouse
WO2018072563A1 (en) * 2016-10-18 2018-04-26 中兴通讯股份有限公司 Knowledge graph creation method, device, and system
CN109739939A (en) * 2018-12-29 2019-05-10 颖投信息科技(上海)有限公司 The data fusion method and device of knowledge mapping
CN109857917A (en) * 2018-12-21 2019-06-07 中国科学院信息工程研究所 Towards the security knowledge map construction method and system for threatening information

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
夏丹等: "《中国图书情报知识图谱研究》", 31 August 2018, pages: 89 - 97 *
李继光,杨迪著: "《大数据背景下数据挖掘及处理分析》", 31 January 2019, pages: 186 - 187 *
陆国栋等: "图学应用教程", 高等教育出版社, pages: 48 *

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113488034A (en) * 2020-04-27 2021-10-08 海信集团有限公司 Voice information processing method, device, equipment and medium
CN111710771A (en) * 2020-05-13 2020-09-25 浙江云科智造科技有限公司 Rubber powder ratio recommendation method for LED product
CN111984643A (en) * 2020-06-29 2020-11-24 联想(北京)有限公司 Knowledge graph construction method and device, knowledge graph system and equipment
CN112015916A (en) * 2020-09-01 2020-12-01 中国银行股份有限公司 Completion method and device of knowledge graph, server and computer storage medium
CN112015916B (en) * 2020-09-01 2023-07-21 中国银行股份有限公司 Knowledge graph completion method, knowledge graph completion device, server and computer storage medium
CN112579797A (en) * 2021-02-20 2021-03-30 支付宝(杭州)信息技术有限公司 Service processing method and device for knowledge graph
CN112579797B (en) * 2021-02-20 2021-05-18 支付宝(杭州)信息技术有限公司 Service processing method and device for knowledge graph
US20220335086A1 (en) * 2021-04-15 2022-10-20 Vesoft Inc. Full-text indexing method and system based on graph database
CN113157944A (en) * 2021-04-30 2021-07-23 携程旅游网络技术(上海)有限公司 Interaction-based knowledge graph expanding method, system, equipment and storage medium
CN113254665A (en) * 2021-06-01 2021-08-13 北京爱奇艺科技有限公司 Knowledge graph expansion method and device, electronic equipment and storage medium
WO2023065545A1 (en) * 2021-10-19 2023-04-27 平安科技(深圳)有限公司 Risk prediction method and apparatus, and device and storage medium
CN114491078B (en) * 2022-02-16 2022-08-02 松立控股集团股份有限公司 Community project personnel foothold and peer personnel analysis method based on knowledge graph
CN114491078A (en) * 2022-02-16 2022-05-13 松立控股集团股份有限公司 Community project personnel foothold and peer personnel analysis method based on knowledge graph
CN114328977A (en) * 2022-03-09 2022-04-12 北京有生博大软件股份有限公司 Personnel migration map spectrum construction method based on map database

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200417