CN112363996A - Method, system, and medium for building a physical model of a power grid knowledge graph - Google Patents
- Publication number: CN112363996A
- Authority: CN (China)
- Legal status: Granted (as listed by Google Patents, which notes that the status is an assumption and not a legal conclusion)
Classifications
- G06F16/212: Schema design and management with details for data modelling support (databases for structured data, e.g. relational data)
- G06F16/367: Ontology (creation of semantic tools for unstructured textual data)
- G06Q50/06: Energy or water supply (ICT specially adapted for business processes of specific business sectors)
Abstract
The invention provides a method, a system and a medium for building a physical model for a power grid knowledge graph. The method comprises: determining a table schema that defines table objects and their fields; generating, based on a first data source, table information for all table objects according to the table schema, so as to generate a physical table set of the physical model; determining a relationship schema that defines a relationship between a source table object and a target table object; generating, based on a second data source, table relationship information according to the relationship schema for each pair of source and target table objects remaining after de-duplication, so as to generate a relationship set of the physical model; and building, based on the physical table set and the relationship set, a physical model comprising table objects, fields and relationships. With this solution, knowledge can be extracted from different data sources, and gaps and omissions in the existing model can be detected and remedied to compensate for shortcomings in its design, so that users are provided with a more suitable model for management and control and information matching against a unified data model is supported.
Description
Technical Field
The present invention relates to knowledge-graph technology, and more particularly, to a method for building a physical model for a power grid knowledge-graph, and a corresponding system and computer-readable storage medium.
Background
With the continued development of knowledge graph technology, knowledge graphs, owing to their strong semantic processing and knowledge organization capabilities, have laid a foundation for organizing large-scale knowledge bases and building intelligent applications. A knowledge graph consists of a large number of entities and the associations between them. Through a knowledge graph, entities such as landmarks, people, cities, sports teams, buildings, geographic features, movies, celestial bodies and works of art can be retrieved together with related information. This is key to building intelligent applications that draw on the collective intelligence of the network and understand the world in a more human-like way. In a specific application setting, a domain knowledge graph must be built on a domain-specific ontology base to support intelligent information retrieval and intelligent applications for that domain. Building a domain-oriented knowledge graph requires not only general knowledge but, above all, the integration of professional domain knowledge. Because a domain knowledge graph must support practical engineering applications, it places higher demands than a general knowledge graph on recognition rate, accuracy and related metrics. To support large-scale domain knowledge bases and intelligent applications, information extraction techniques adapted to the characteristics of the domain, and methods for constructing domain knowledge graphs, need to be researched.
In recent years, many Chinese-language knowledge graphs have been released in China. They are mainly built from the structured information in online encyclopedias and Wikipedia, and aim to use community effort to maintain the schema of an open-domain knowledge graph. Knowledge graphs are constructed by manual editing and by automatic extraction, but automatic extraction has mainly relied on the structured information in online encyclopedias and ignored unstructured text, even though most information on the Internet is presented precisely as unstructured free text. In parallel with the development of linked data, a number of knowledge acquisition methods based on information extraction have been proposed for building open-domain knowledge graphs from free text. In 2007, Banko et al. of the University of Washington first proposed open information extraction (OIE), which extracts entity-relationship triples, consisting of a head entity, a relation phrase and a tail entity, directly from large-scale free text. Many free-text-oriented information extraction methods existed before OIE, but their main idea was to train a dedicated extractor for each target relation. Such conventional methods cannot work efficiently with the enormous number of relation types in Internet text: training an extractor for each target relation is unrealistic, and, more seriously, for massive web text the relation types often cannot be determined in advance.
In addition, knowledge resource classification, intelligent search, and cross-domain knowledge fusion and representation based on enterprise-level data models are still at an early stage: there is no intuitive, accessible model interface for the relevant managers and business staff, and both the ability to search logical links in data models and the ability to perform static semantic analysis and evaluation of them are severely limited. A data model such as the State Grid Corporation of China enterprise common data model (SG-CIM) is a comprehensive abstraction of the company's enterprise-level data on the power grid, assets, finance and other areas. It is not only very large but also spans a great many professional categories, so problems remain in three respects, namely model results, application and support: (1) model design quality still needs improvement: current design results still suffer from practical problems such as inconsistent levels of abstraction among some data objects, inaccurate entity relationships, incomplete data objects and attributes, incomplete de-duplication, incomplete data lineage, and standard codes that do not correspond to the codes of the source business systems; (2) the mapping rate of the model is low: each unit performs mapping and comparison against a different version of the physical model, so the average mapping rate is low; (3) tool support is lacking: data model management and control is mostly done offline, the process is complicated, communication is inefficient, and the abstract design results are hard for staff at all levels to understand, so application capability is insufficient and the quality of model application and iterative improvement cannot be guaranteed.
Therefore, there is a need to provide an improved solution to overcome the drawbacks of the existing data models.
Disclosure of Invention
The present invention is directed to solving the above technical problems.
Specifically, according to a first aspect of the present invention, there is provided a method for building a physical model for a power grid knowledge-graph, comprising:
determining a table schema for defining table objects and fields thereof;
receiving a first data source comprising a plurality of table objects, the first data source comprising table object related information, field related information, table object source related information, and/or field source related information;
for each table object, generating corresponding table information according to the table schema based on the first data source, thereby obtaining a table information set of all table objects included in the first data source and generating a physical table set of the physical model that includes the table information set, wherein the table information indicates at least the table name, fields, table object source and field sources of the table object;
determining a relationship schema for defining a relationship between a source table object and a target table object;
receiving a second data source comprising information on relationships between source table objects and target table objects, the second data source comprising a plurality of pairs of source and target table objects, and, for each pair, generating table relationship information for the pair according to the relationship schema based on the second data source, thereby obtaining a table relationship information set of all relationships included in the second data source and generating a relationship set of the physical model that includes the table relationship information set; and
establishing, based on the physical table set and the relationship set of the physical model, a physical model comprising table objects, fields and relationships.
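As a rough illustration of the final step, the following Python sketch assembles a physical model from an already generated physical table set and relationship set. It assumes, hypothetically, that each record is a plain dict keyed like the JSON schemas in the detailed description ("name", "fields", "entity1", "entity2") and that tables are indexed by their English table name. Setting aside relationships whose tables are missing from the table set is an addition of this sketch, not a step of the claimed method; the names `PhysicalModel` and `build_physical_model` are illustrative.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple

Record = Dict[str, Any]  # one table or relationship record, keyed as in the JSON schemas


@dataclass
class PhysicalModel:
    """Hypothetical container: tables keyed by English table name, plus relationships."""
    tables: Dict[str, Record] = field(default_factory=dict)
    relationships: List[Record] = field(default_factory=list)

    def fields_of(self, table_name: str) -> List[Any]:
        # the fields of a table travel inside its table record
        return list(self.tables[table_name].get("fields", []))


def build_physical_model(table_set: List[Record],
                         relationship_set: List[Record]) -> Tuple[PhysicalModel, List[Record]]:
    model = PhysicalModel()
    for info in table_set:
        model.tables[info["name"][0]] = info  # "name" = [English name, Chinese name]
    dangling: List[Record] = []
    for rel in relationship_set:
        source, target = rel["entity1"][0], rel["entity2"][0]
        if source in model.tables and target in model.tables:
            model.relationships.append(rel)
        else:
            # assumption: relationships pointing at unknown tables are set aside for review
            dangling.append(rel)
    return model, dangling
```

Returning the unmatched relationships separately, rather than silently dropping them, keeps them available for the kind of gap checking the invention aims at.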
In one embodiment, the fields are determined according to a predefined field schema based on the field related information and the field source related information in the first data source, the field schema including the field name, field data type, field description, standard code, data storage format, hash column and responsible department of the field, and the name, table name, field name and field type in the data source system.
In one embodiment, the table schema includes, for a table object, a table name, a subject area, a secondary subject area, a table type, a table description, a responsible department, the name of the data source system, the table name in the data source system, and a field list.
In one embodiment, the relationship schema includes the table name of the source table object, the table name of the target table object, the association relationship between them, the association field between them, a subject area and a secondary subject area.
In one embodiment, generating the table relationship information for a pair of source and target table objects according to the relationship schema based on the second data source comprises: for two pairs of source and target table objects in the second data source, if their source table names, target table names, association relationships and association fields are all identical, determining that the two pairs represent the same relationship; and, for each group of identical relationships, normalizing only one of them and generating the table relationship information of the corresponding source and target table objects according to the relationship schema.
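The identity rule above can be sketched as a key built from the four compared members, keeping only the first relationship seen for each key. The dict keys ("entity1", "entity2", "relationship", "field") follow the relationship schema in the detailed description; keeping the first occurrence, rather than for example merging differing subject areas, is an assumption of this sketch.

```python
from typing import Any, Dict, Hashable, List, Tuple

Relation = Dict[str, Any]


def _freeze(value: Any) -> Hashable:
    # lists such as ["ENGLISH_NAME", "Chinese name"] must become tuples to be hashable
    return tuple(value) if isinstance(value, list) else value


def relation_key(rel: Relation) -> Tuple[Hashable, ...]:
    # the four members compared by the rule: source table name, target table name,
    # association relationship and association field
    return tuple(_freeze(rel.get(k)) for k in ("entity1", "entity2", "relationship", "field"))


def deduplicate(relations: List[Relation]) -> List[Relation]:
    seen = set()
    unique: List[Relation] = []
    for rel in relations:
        key = relation_key(rel)
        if key not in seen:  # first occurrence wins; later identical relationships are dropped
            seen.add(key)
            unique.append(rel)
    return unique
```

Because the subject areas are not part of the key, two records that differ only in subject area count as the same relationship, which matches the four-member comparison described above.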
In one embodiment, a library of table schemas for table objects and their fields is provided, from which the table schema defining table objects and their fields is determined.
In one embodiment, a library of relationship schemas representing relationships between table objects is provided, from which the relationship schema defining relationships between source and target table objects is determined.
In one embodiment, an alias library is provided for table objects, their fields and the relationships among table objects. The alias library contains previously recorded aliases together with their occurrence frequencies; the table objects, fields and relationships appearing in the first and second data sources are recorded into the alias library and their occurrence frequencies are accumulated. Each table object, field or relationship is displayed under the alias with the highest occurrence frequency.
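A minimal sketch of such an alias library, assuming each object is already identified by a canonical key (for instance an upper-cased English table name) and that aliases are plain strings; the class and method names are hypothetical. `collections.Counter` accumulates the occurrence frequencies, and `most_common(1)` selects the alias to display, with ties going to the alias recorded first.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, Iterable


class AliasLibrary:
    """Hypothetical alias library: occurrence counts of aliases per canonical object key."""

    def __init__(self) -> None:
        self._counts: DefaultDict[str, Counter] = defaultdict(Counter)

    def record(self, key: str, aliases: Iterable[str]) -> None:
        # accumulate frequencies on top of everything recorded in the past
        self._counts[key].update(aliases)

    def frequency(self, key: str, alias: str) -> int:
        # guard avoids creating an empty entry for an unknown key
        return self._counts[key][alias] if key in self._counts else 0

    def display_name(self, key: str) -> str:
        counts = self._counts.get(key)
        if not counts:
            raise KeyError(key)
        # the alias with the largest occurrence frequency is the one displayed
        return counts.most_common(1)[0][0]
```

Recording every data source load into the same instance lets the displayed name shift toward whichever alias the sources use most often.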
According to a second aspect of the present invention, there is provided a system for building a physical model for a power grid knowledge graph, comprising a physical table set generation unit, a relationship set generation unit and a processing unit,
wherein the physical table set generation unit is configured to:
determining a table schema for defining table objects and fields thereof;
receiving a first data source comprising a plurality of table objects, the first data source comprising table object related information, field related information, table object source related information, and/or field source related information;
for each table object, generating corresponding table information according to the table schema based on the first data source, thereby obtaining a table information set of all table objects included in the first data source and generating a physical table set of the physical model that includes the table information set, wherein the table information indicates at least the table name, fields, table object source and field sources of the table object;
wherein the relationship set generation unit is configured to:
determining a relationship schema for defining a relationship between a source table object and a target table object;
receiving a second data source comprising information on relationships between source table objects and target table objects, the second data source comprising a plurality of pairs of source and target table objects, and, for each pair, generating table relationship information for the pair according to the relationship schema based on the second data source, thereby obtaining a table relationship information set of all relationships included in the second data source and generating a relationship set of the physical model that includes the table relationship information set;
wherein the processing unit is configured to:
establishing, based on the physical table set and the relationship set of the physical model, a physical model comprising table objects, fields and relationships.
In one embodiment, the fields are determined according to a predefined field schema based on the field related information and the field source related information in the first data source, the field schema including the field name, field data type, field description, standard code, data storage format, hash column and responsible department of the field, and the name, table name, field name and field type in the data source system.
In one embodiment, the table schema includes, for a table object, a table name, a subject area, a secondary subject area, a table type, a table description, a responsible department, the name of the data source system, the table name in the data source system, and a field list.
In one embodiment, the relationship schema includes the table name of the source table object, the table name of the target table object, the association relationship between them, the association field between them, a subject area and a secondary subject area.
In one embodiment, generating the table relationship information for a pair of source and target table objects according to the relationship schema based on the second data source comprises: for two pairs of source and target table objects in the second data source, if their source table names, target table names, association relationships and association fields are all identical, determining that the two pairs represent the same relationship; and, for each group of identical relationships, normalizing only one of them and generating the table relationship information of the corresponding source and target table objects according to the relationship schema.
In one embodiment, a library of table schemas for table objects and their fields is provided, from which the table schema defining table objects and their fields is determined.
In one embodiment, a library of relationship schemas representing relationships between table objects is provided, from which the relationship schema defining relationships between source and target table objects is determined.
In one embodiment, an alias library is provided for table objects, their fields and the relationships among table objects. The alias library contains previously recorded aliases together with their occurrence frequencies; the table objects, fields and relationships appearing in the first and second data sources are recorded into the alias library and their occurrence frequencies are accumulated. Each table object, field or relationship is displayed under the alias with the highest occurrence frequency.
According to a third aspect of the invention, a computer-readable storage medium is provided, on which a computer program is stored which, when executed by a processor, causes the above-described method for establishing a physical model for a power grid knowledge graph to be performed.
According to the solution of the invention, data on table objects, fields and relationships is obtained from multiple data sources and normalized, and a unified and complete data model for the power grid knowledge graph is built according to the predefined table schema, field schema and relationship schema. With the method and system, knowledge can be extracted from different data sources, and the existing model can be checked for gaps and omissions so as to remedy weaknesses in its design; at the same time, a more suitable management and control model can be provided to managers and business staff, and information matching and sharing based on the company's unified data model is supported. In addition, building on the existing data model, the invention can further promote the implementation of model standards and the construction of a complete system, lay a solid foundation for further improving data quality management, support the construction of a data middle platform and a business middle platform, and yield direct or indirect benefits in practical applications.
Drawings
Non-limiting and non-exhaustive embodiments of the present invention are described by way of example with reference to the following drawings, in which:
FIG. 1 is a flow diagram schematically illustrating a method for building a physical model for a power grid knowledge graph according to an embodiment of the present invention;
FIG. 2 is a flow diagram schematically illustrating the building of the physical table set of a physical model according to an embodiment of the present invention;
FIG. 3 is a flow diagram schematically illustrating the building of the relationship set of a physical model according to an embodiment of the present invention; and
FIG. 4 is a schematic diagram illustrating a system for building a physical model for a power grid knowledge graph according to an embodiment of the present invention.
Detailed Description
In order to make the above and other features and advantages of the present invention more apparent, the present invention is further described below with reference to the accompanying drawings. It is understood that the specific embodiments described herein are for purposes of illustration only and are not intended to be limiting.
As a first aspect of the invention, a method is provided for building a physical model for a power grid knowledge graph. Fig. 1 schematically shows a method S100 for building a physical model for a power grid knowledge graph according to an embodiment of the invention. As shown in fig. 1, S100 may include step S101, step S102, step S103, step S104, step S105, and step S106.
In step S101, a table schema defining the table object and its fields is determined. A table schema, also referred to herein as a table definition, defines the constituent members of a table object and may include any suitable members that distinguish one table object from the others.
In one embodiment, the table schema may include, for a table object, a table name, a subject area, a secondary subject area, a table type, a table description, a responsible department, the name of the data source system, the table name in the data source system, and a field list. The table name may include at least one of the English and Chinese table names of the table object. For example, the table schema may be expressed in JSON format as follows:
{
  "name": ["<table name (English)>", "<table name (Chinese)>"],
  "area": "<subject area>",
  "second area": "<secondary subject area>",
  "type": "<table type>",
  "description": "<table description>",
  "department": "<responsible department>",
  "source system": "<data source system name>",
  "source table": ["<data source system table name (English)>", "<data source system table name (Chinese)>"],
  "more": "<remarks>",
  "fields": ["<field list>"]
}
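To illustrate how a table record might be checked against this schema, the sketch below fills in every schema member in schema order and rejects unknown keys. The key names follow the JSON keys of the schema, with the key for the secondary subject area ("second area") being an assumption; treating every member except the table name as optional is also an assumption, and `make_table_info` is a hypothetical helper.

```python
import json
from typing import Any, Dict

# JSON keys of the table schema; "second area" is an assumed key name.
TABLE_SCHEMA_KEYS = ("name", "area", "second area", "type", "description",
                     "department", "source system", "source table", "more", "fields")


def make_table_info(values: Dict[str, Any]) -> Dict[str, Any]:
    unknown = sorted(set(values) - set(TABLE_SCHEMA_KEYS))
    if unknown:
        raise ValueError(f"keys not in the table schema: {unknown}")
    if not values.get("name"):
        raise ValueError("a table name is required")
    # emit every schema member in schema order; absent members become null in JSON
    info = {key: values.get(key) for key in TABLE_SCHEMA_KEYS}
    info["fields"] = list(values.get("fields") or [])
    return info
```

Serializing with `json.dumps(info, ensure_ascii=False)` keeps Chinese table names readable rather than escaping them.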
In one embodiment, the field list in the table schema is determined according to a predefined field schema based on the field related information and the field source related information in the first data source. The field schema may include the field name, field data type, field description, standard code, data storage format, hash column and responsible department of the field, and the name, table name, field name and field type in the data source system. A field schema, also referred to herein as a field definition, defines the constituent members of a field. For example, the field schema may be expressed in JSON format as follows:
{
  "name": ["<field name (English)>", "<field name (Chinese)>"],
  "datatype": "<field data type>",
  "description": "<field description>",
  "standard code": "<standard code>",
  "storage format": "<data storage format>",
  "hash column": "<hash column>",
  "department": "<responsible department>",
  "source system": ["<data source system name (English)>", "<data source system name (Chinese)>"],
  "source table": ["<data source system table name (English)>", "<data source system table name (Chinese)>"],
  "source field": ["<data source system field (English)>", "<data source system field (Chinese)>"],
  "source data type": "<data source field type>",
  "more": "<remarks>"
}
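One way to hold a field record in code is a dataclass whose `to_json_dict` method emits the schema's keys in order. The attribute names, the use of (English, Chinese) tuples for paired names, and the key names "standard code" and "hash column" are assumptions made for this sketch; `FieldInfo` is a hypothetical class.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Tuple

Pair = Tuple[str, str]  # (English name, Chinese name)


def _as_list(pair: Optional[Pair]) -> Optional[List[str]]:
    return list(pair) if pair is not None else None


@dataclass
class FieldInfo:
    name: Pair
    datatype: str
    description: str = ""
    standard_code: Optional[str] = None
    storage_format: Optional[str] = None
    hash_column: Optional[str] = None
    department: Optional[str] = None
    source_system: Optional[Pair] = None
    source_table: Optional[Pair] = None
    source_field: Optional[Pair] = None
    source_data_type: Optional[str] = None
    more: str = ""

    def to_json_dict(self) -> Dict[str, Any]:
        # keys in the order of the field schema; paired names become two-element lists
        return {
            "name": list(self.name),
            "datatype": self.datatype,
            "description": self.description,
            "standard code": self.standard_code,
            "storage format": self.storage_format,
            "hash column": self.hash_column,
            "department": self.department,
            "source system": _as_list(self.source_system),
            "source table": _as_list(self.source_table),
            "source field": _as_list(self.source_field),
            "source data type": self.source_data_type,
            "more": self.more,
        }
```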
In step S102, a first data source comprising a plurality of table objects is received, the first data source comprising table object related information, field related information, table object source related information and/or field source related information. The first data source is to be understood broadly as covering data sources in any possible form, whether structured, semi-structured or unstructured, such as relational databases, data warehouses, non-relational databases, document libraries and various reports. Preferably, the first data source of the present invention comprises a data source in the form of an Excel document.
In step S103, for each table object, corresponding table information is generated according to the table schema based on the first data source, thereby obtaining a table information set of all table objects included in the first data source and generating a physical table set of the physical model that includes the table information set, where the table information indicates at least the table name, fields, table object source and field sources of the table object. It should be understood that, for each table object, the table name may serve as the identification criterion of the table object: for example, "English table name + Chinese table name", the English table name alone, or the Chinese table name alone may be used as the identification criterion. The physical table set of the physical model may be stored in any suitable file format as required, such as a JSON storage file. In one embodiment, the JSON storage file of the physical table set of the physical model is as follows:
Step S103 is described in detail below with reference to FIG. 2.
As shown in fig. 2, the Excel document serving as the first data source contains three parts: "data table information", which gives the table names of all table objects and information about their fields; a "data table comparison table", which gives source information about the table objects in their source systems; and a "field comparison table", which gives source information about the fields in their source systems. The source systems may include, for example, various data platforms related to electric power knowledge. Because table names are capitalized inconsistently in this information, a case-insensitive principle is adopted when building the model, and all English table names are converted to uppercase. The specific process is as follows. First, "table English name + table Chinese name" is taken as the criterion for distinguishing table objects, and the rows are aggregated by this combination to obtain the table name information of each table object. Second, for each table object, the table name is normalized, for example by removing spaces and line breaks. Third, the field names, field data types, and other attributes of all fields of the table object are normalized (for example, spaces and line breaks are removed) and organized according to the field definition to obtain normalized fields; each field is then supplemented from the field comparison table, for example with the name of its source system, its table name in the source system, and its field type in the source system, and this supplementary information is likewise normalized and merged into the field information. After all fields of the table object have been processed, the table of the table object and its field information are generated according to the table definition. Next, the table and its field information are supplemented from the data table comparison table, for example with the name of the source system of the table object and its table name in that source system; this information is normalized and merged into the information of the table object. Finally, all merged information of the table object is organized into the corresponding table information, which may be represented as a json string. These steps are repeated until all table objects have been processed, yielding a physical table set containing all table objects, which may be exported as a json storage file.
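The table-set procedure of step S103 can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the patented implementation: the three sheets are assumed to have already been read into lists and dicts, and the column keys (`table_en`, `table_cn`, `field_name`, `field_type`) and the keys of the two comparison tables are hypothetical names chosen for illustration.

```python
import json
import re
from collections import OrderedDict


def normalize(text):
    """Remove spaces, tabs and line breaks from a cell value (None -> "")."""
    return re.sub(r"\s+", "", text or "")


def build_physical_table_set(table_rows, table_sources, field_sources):
    """Sketch of step S103. All dict keys below are hypothetical column names.

    table_rows:    rows of the "data table information" sheet
    table_sources: "data table comparison table", {TABLE_EN: {attr: value}}
    field_sources: "field comparison table", {(TABLE_EN, FIELD_NAME): {attr: value}}
    """
    # Aggregate rows by "table English name + table Chinese name"; the English
    # name is uppercased so that grouping is case-insensitive.
    groups = OrderedDict()
    for row in table_rows:
        key = (normalize(row["table_en"]).upper(), normalize(row["table_cn"]))
        groups.setdefault(key, []).append(row)

    tables = []
    for (table_en, table_cn), rows in groups.items():
        fields = []
        for row in rows:
            name = normalize(row["field_name"])
            field = {"field_name": name, "field_type": normalize(row["field_type"])}
            # Supplement the field with its normalized source-system information.
            extra = field_sources.get((table_en, name.upper()), {})
            field.update({k: normalize(v) for k, v in extra.items()})
            fields.append(field)
        table = {"table_en": table_en, "table_cn": table_cn, "fields": fields}
        # Supplement the table with its normalized source-system information.
        extra = table_sources.get(table_en, {})
        table.update({k: normalize(v) for k, v in extra.items()})
        tables.append(table)
    return tables


def export_table_set(tables, path):
    """Write the physical table set as a json storage file."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(tables, fh, ensure_ascii=False, indent=2)
```

Grouping on the uppercased English name is what makes `sg_meter` and `SG_METER` land in the same table object.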
In step S104, a relationship schema for defining the relationship between a source table object and a target table object is determined. A relationship schema, also referred to herein as a relationship definition, defines the constituent members of a relationship between a pair of table objects, for example an association between a source table object and a target table object. The table name of the source table object and the table name of the target table object may each include at least one of an English table name and a Chinese table name. For example, the relationship schema may be defined in json format as follows:
{
'entity1': [source table object table name (English), source table object table name (Chinese)],
'entity2': [target table object table name (English), target table object table name (Chinese)],
'relationship': association relationship,
'field': association field,
'area': [subject field, secondary subject field]
}
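The relationship schema can be mirrored by a small Python data class. The key names `entity1`, `entity2`, `relationship`, `field` and `area` follow the example above; treating `area` as a two-element list holding the subject field and the secondary subject field is an assumption, because the source does not clearly show the key used for the secondary subject field. The class name `RelationshipRecord` is hypothetical.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class RelationshipRecord:
    """One table relationship in the shape of the relationship schema (sketch)."""
    entity1: List[str]   # [source table name (English), source table name (Chinese)]
    entity2: List[str]   # [target table name (English), target table name (Chinese)]
    relationship: str    # association relationship between the two tables
    field: str           # association field
    area: List[str]      # assumed: [subject field, secondary subject field]

    def to_json(self) -> str:
        # ensure_ascii=False keeps Chinese table names readable in the output
        return json.dumps(asdict(self), ensure_ascii=False)
```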
In step S105, a second data source containing information about the relationships between source table objects and target table objects is received. The second data source includes a plurality of pairs of source and target table objects. For each pair, table relationship information is generated from the second data source according to the relationship schema, yielding a table relationship information set of all relationships included in the second data source, from which the relationship set of the physical model is generated. The second data source is to be understood broadly and may take structured, semi-structured, or unstructured forms, such as relational databases, data warehouses, non-relational databases, document libraries, and various reports. Preferably, the second data source of the present invention is an Excel file. The first and second data sources of the invention may include data from the business systems of the power grid and time-series data acquired on the smart grid, mainly company marketing data, metering and acquisition data, operation and inspection data, and some graphic, image, and web page data; data in all three forms (structured, semi-structured, and unstructured) can be processed, subjected to knowledge extraction, and fused. The relationship set of the physical model may be stored in any suitable file form as required; in one embodiment, it is stored as a json storage file.
In one embodiment, step S105 may include the following: for a pair of source and target table objects and another pair of source and target table objects in the second data source, if the table names of the respective source table objects, the table names of the target table objects, the association relationships between the source and target table objects, and the association fields between the source and target table objects are all the same, the relationship of the first pair is determined to be the same as the relationship of the other pair; for identical relationships, only one is normalized and used to generate the table relationship information of the corresponding source and target table objects according to the relationship schema. In other words, the source table name, the target table name, the association relationship, and the association field together serve as the identifier of a relationship.
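The identity rule amounts to a four-part key with first-occurrence deduplication. A minimal sketch, assuming each relationship record is a dict with the hypothetical keys `source_table`, `target_table`, `relationship` and `field`:

```python
from typing import Dict, Iterable, List, Tuple

RelationKey = Tuple[str, str, str, str]


def relation_key(rec: Dict[str, str]) -> RelationKey:
    """Identifier of a relationship: source table, target table, association, association field."""
    return (rec["source_table"], rec["target_table"], rec["relationship"], rec["field"])


def dedupe_relations(records: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep only the first record of each identical relationship, preserving input order."""
    seen = set()
    unique = []
    for rec in records:
        key = relation_key(rec)
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique
```

Any other attributes of a record (notes, subject fields) do not affect identity; two records that differ only there collapse into the first one.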
Step S105 is described in detail below with reference to fig. 3.
As shown in fig. 3, the Excel document serving as the second data source is read and its "association relationship" data is obtained. Because table names are capitalized inconsistently in this data, a case-insensitive principle is adopted when building the model, and all English table names are converted to uppercase. The specific process is as follows. Six items, namely the English table name of the source table object, the Chinese table name of the source table object, the English table name of the target table object, the Chinese table name of the target table object, the association relationship, and the association field, are taken as the criterion for identifying each relationship. These six items are aggregated for each relationship to obtain aggregation identifiers, and table relationships with the same aggregation identifier are placed in the same aggregation group. Within each aggregation group, only the first record is retained (which deduplicates the table relationships and avoids redundancy), and its table names, association relationship, association field, and so on are normalized, for example by removing spaces, line feeds, redundant hyphens, and equals signs. The normalized information is organized into the corresponding table relationship information according to the relationship definition, which may be represented as a json string. These steps are repeated until all table relationships have been processed, yielding a relationship set containing all table relationships, which may be exported as a json storage file.
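A sketch of the fig. 3 procedure, combining the six-item aggregation identifier, case-insensitive English table names, and the clean-up of the retained record. The input column names (`src_en`, `src_cn`, `tgt_en`, `tgt_cn`, `relationship`, `field`, `area`, `area2`) are hypothetical, and the exact treatment of "redundant hyphens" is not specified in the text; collapsing runs of hyphens and stripping leading and trailing hyphens is an assumption made here.

```python
import re
from collections import OrderedDict


def clean(text):
    """Normalize a cell: drop whitespace/line feeds and equals signs, tidy hyphens."""
    text = re.sub(r"\s+", "", text or "")
    text = text.replace("=", "")
    text = re.sub(r"-{2,}", "-", text)   # assumed meaning of "redundant hyphens"
    return text.strip("-")


def build_relation_set(rows):
    """Sketch of step S105; row keys are hypothetical column names."""
    groups = OrderedDict()
    for row in rows:
        # Six-item aggregation identifier; English names are uppercased (case-insensitive).
        key = (row["src_en"].upper(), row["src_cn"], row["tgt_en"].upper(),
               row["tgt_cn"], row["relationship"], row["field"])
        groups.setdefault(key, row)  # only the first row of each group is kept

    relations = []
    for row in groups.values():
        relations.append({
            "entity1": [clean(row["src_en"]).upper(), clean(row["src_cn"])],
            "entity2": [clean(row["tgt_en"]).upper(), clean(row["tgt_cn"])],
            "relationship": clean(row["relationship"]),
            "field": clean(row["field"]),
            "area": [clean(row.get("area")), clean(row.get("area2"))],
        })
    return relations
```

As in the description, normalization is applied only to the record kept for each group, so grouping itself relies on the raw values apart from the uppercased English names.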
In step S106, a physical model including table objects, fields and relationships is built based on the physical table set of the physical model and the relationship set of the physical model.
In one embodiment, the method of the present invention further comprises: calculating the similarity between pairs of table objects in the physical model based on the physical table set of the physical model, and deduplicating table object pairs whose similarity exceeds a preset threshold, so as to generate a physical table set with lower redundancy. The physical table set of the physical model can be matched against the corresponding logical model to check model consistency. This improves the rationality and completeness of the static semantics of existing models (for example, the State Grid Corporation of China enterprise common data model SG-CIM4.0) and effectively reduces redundancy. It also converts non-spatial knowledge data that is difficult to observe into a spatial graph, which helps practitioners in related fields understand it and provides an effective way to associate and connect entities across domains. At the same time, it makes good use of the strong semantic capability of knowledge graph technology for describing entities, attributes, and relationships.
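The description does not say how the similarity between table objects is measured. As one plausible choice, purely an assumption for illustration, the sketch below uses the Jaccard similarity of the uppercased field-name sets of two tables and keeps the first table of any pair whose similarity exceeds the threshold; the default threshold of 0.9 is likewise hypothetical.

```python
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B| (two empty sets count as identical)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def dedupe_tables(tables: List[Dict], threshold: float = 0.9) -> List[Dict]:
    """Drop a table if its field-name set is more similar than `threshold` to a kept table."""
    kept: List[Dict] = []
    kept_fields: List[Set[str]] = []
    for table in tables:
        names = {f["field_name"].upper() for f in table["fields"]}
        if any(jaccard(names, other) > threshold for other in kept_fields):
            continue  # near-duplicate of an earlier table: skip it
        kept.append(table)
        kept_fields.append(names)
    return kept
```

Comparing field-name sets ignores column order and case; a real deployment might also weigh table names or data types, which the source leaves open.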
As a second aspect of the invention, a system for building a physical model for a power grid knowledge graph is provided. Fig. 4 schematically illustrates a system 200 for building a physical model for a power grid knowledge graph according to one embodiment of the invention. The system 200 may include a physical table set generation unit 201, a relationship set generation unit 202, and a processing unit 203. The processing unit 203 is communicatively coupled with the physical table set generation unit 201 and the relationship set generation unit 202.
The physical table set generating unit 201 may be configured to:
determine a table schema for defining table objects and their fields;
receive a first data source comprising a plurality of table objects, the first data source comprising table object related information, field related information, table object source related information, and/or field source related information;
for each table object, generate corresponding table information according to the table schema based on the first data source, thereby obtaining a table information set of all table objects included in the first data source, so as to generate a physical table set of the physical model including the table information set, wherein the table information indicates at least a table name, fields, a table object source, and a field source of the table object.
The relationship set generation unit 202 may be configured to:
determine a relationship schema for defining a relationship between a source table object and a target table object;
receive a second data source comprising information about relationships between source table objects and target table objects, wherein the second data source comprises a plurality of pairs of source table objects and target table objects, and, for each pair, generate table relationship information of the pair according to the relationship schema based on the second data source, thereby obtaining a table relationship information set of all relationships included in the second data source, so as to generate a relationship set of the physical model including the table relationship information set.
In one embodiment, for a pair of source and target table objects and another pair of source and target table objects in the second data source, if the table names of the respective source table objects, the table names of the target table objects, the association relationships between the source and target table objects, and the association fields between them are all the same, the relationship of the pair is determined to be the same as the relationship of the other pair; for identical relationships, only one is normalized and used to generate the table relationship information of the corresponding source and target table objects according to the relationship schema.
The processing unit 203 may be configured to establish, based on the physical table set of the physical model and the relationship set of the physical model, a physical model including table objects, fields, and relationships.
In one embodiment, the field is determined according to a predefined field schema based on the field related information and the field source related information in the first data source, the field schema including a field name, a field data type, a field description, a standard code, a data storage format, a hash column, a responsible department, a name of the data source system, a table name in the data source system, a field name in the data source system, and a field type in the data source system.
In one embodiment, the table schema includes a table name, a subject field, a secondary subject field, a table type, a table description, a responsible department, a name of the data source system, a table name in the data source system, and a field list of the table object.
In one embodiment, the relationship schema includes a table name of the source table object, a table name of the target table object, an association between the source table object and the target table object, an association field between the source table object and the target table object, a subject field, and a secondary subject field.
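The field schema and table schema listed above map naturally onto record types. A sketch in Python; the class and attribute names are English renderings chosen here (hypothetical), and typing every attribute, including the hash column, as a string is an assumption since the source does not give types.

```python
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class FieldSchema:
    """Field schema: eleven attributes, all kept as strings in this sketch."""
    field_name: str
    field_data_type: str
    field_description: str = ""
    standard_code: str = ""
    data_storage_format: str = ""
    hash_column: str = ""                 # representation not specified in the source
    responsible_department: str = ""
    source_system_name: str = ""
    source_system_table_name: str = ""
    source_system_field_name: str = ""
    source_system_field_type: str = ""


@dataclass
class TableSchema:
    """Table schema: eight descriptive attributes plus the field list."""
    table_name: str
    subject_field: str = ""
    secondary_subject_field: str = ""
    table_type: str = ""
    table_description: str = ""
    responsible_department: str = ""
    source_system_name: str = ""
    source_system_table_name: str = ""
    fields: List[FieldSchema] = field(default_factory=list)
```

`asdict` converts a `TableSchema` with its nested `FieldSchema` list into plain dicts, ready for the json string form of the table information.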
It will be appreciated that the specific features described herein for the method of the first aspect for building a physical model for a power grid knowledge graph may likewise be applied, with corresponding extensions, to the system of the second aspect. For brevity, they are not described again.
It should be understood that the various elements of the system 200 for building a physical model for a power grid knowledge graph of the present invention may be implemented in whole or in part by software, hardware, firmware, or a combination thereof. The units may be embedded in a processor of the computer device in a hardware or firmware form or independent of the processor, or may be stored in a memory of the computer device in a software form for being called by the processor to execute operations of the units. Each of the units may be implemented as a separate component or module, or two or more units may be implemented as a single component or module.
It will be appreciated by those of ordinary skill in the art that the schematic diagram of the system 200 shown in fig. 4 is merely an illustrative block diagram of portions of structure associated with aspects of the present invention and does not constitute a limitation of the computer device, processor or computer program embodying aspects of the present invention. A particular computer device, processor or computer program may include more or fewer components or modules than shown in the figures, or may combine or split certain components or modules, or may have a different arrangement of components or modules.
In the present invention, a library of table schemas for table objects and their fields is provided, from which the table schema defining table objects and their fields is determined.
In the present invention, a library of relationship schemas representing relationships between table objects is provided, from which the relationship schema defining the relationship between a source table object and a target table object is determined.
In the invention, alias set libraries are provided for table objects, for their fields, and for the relationships between table objects. Each library contains the aliases recorded so far together with their occurrence frequencies. Every table object, field, and relationship between table objects that appears in the first data source and the second data source is recorded in the corresponding alias library, and its occurrence frequency is incremented. The name displayed for a table object, a field, or a relationship between table objects is the alias with the highest occurrence frequency.
In a preferred embodiment, each record in the alias set libraries for table objects, for their fields, and for the relationships between table objects carries a tag that distinguishes different acquisitions. In this way, alias libraries from different sources, such as different departments, can be merged: if two records have the same tag, they are regarded as coming from the same acquisition and are not accumulated. The tag includes, for example, a date, a time, and a random sequence. The date is an 8-digit string such as 20201030, the time is accurate to the minute or second, such as 1830 or 183025, and the random sequence is a 6- to 10-digit random number used for verification. By recording acquisition dates, changes in the names of table objects, their fields, and the relationships between table objects can be tracked; the displayed names are generally the most popular and most widely used ones, which helps standardize naming.
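A minimal sketch of one alias set library; one instance would be kept for table objects, one for fields, and one for relationships. The source does not state how different aliases are linked to the same object, so this sketch assumes a canonical identifier (for example, the uppercased English table name) is available for each item; the class and method names are hypothetical.

```python
from collections import Counter, defaultdict


class AliasLibrary:
    """Alias set library for one kind of item (table objects, fields or relationships)."""

    def __init__(self):
        # canonical item id (assumed to exist) -> Counter mapping alias -> occurrence frequency
        self._counts = defaultdict(Counter)

    def record(self, item_id, alias):
        """Record one occurrence of an alias seen in the first or second data source."""
        self._counts[item_id][alias] += 1

    def frequencies(self, item_id):
        """Snapshot of alias frequencies for one item."""
        return dict(self._counts.get(item_id, {}))

    def display_name(self, item_id):
        """The displayed name is the alias with the highest occurrence frequency."""
        counts = self._counts.get(item_id)
        if not counts:
            return None
        return counts.most_common(1)[0][0]
```

On a tie, `Counter.most_common` returns the alias that was recorded first, which gives a stable display name.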
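The tagging and merge rule can be sketched as follows. The tag combines the three parts named above; joining them with "-" and storing counts per acquisition tag are assumptions of this sketch, and `make_tag` and `TaggedAliasLibrary` are hypothetical names. Because counts are keyed by tag, merging a library that contains an already-seen acquisition does not accumulate it twice.

```python
import random
from collections import Counter
from datetime import datetime


def make_tag(now=None, digits=8):
    """Acquisition tag: 8-digit date, time to the second, 6-10 random digits.

    The "-" separator is an assumption; the text only lists the three parts.
    """
    if not 6 <= digits <= 10:
        raise ValueError("random sequence must have 6-10 digits")
    now = now or datetime.now()
    rand = "".join(random.choices("0123456789", k=digits))
    return f"{now:%Y%m%d}-{now:%H%M%S}-{rand}"


class TaggedAliasLibrary:
    """Alias library whose counts are stored per acquisition tag."""

    def __init__(self):
        self.acquisitions = {}  # tag -> Counter of (item_id, alias)

    def add_acquisition(self, tag, observations):
        """Record one acquisition; a tag that is already present is not counted again."""
        if tag in self.acquisitions:
            return False
        self.acquisitions[tag] = Counter(observations)
        return True

    def merge(self, other):
        """Merge a library from another source (e.g. another department)."""
        for tag, counts in other.acquisitions.items():
            self.add_acquisition(tag, counts)

    def display_name(self, item_id):
        """Alias with the highest total frequency across all acquisitions."""
        totals = Counter()
        for counts in self.acquisitions.values():
            for (iid, alias), n in counts.items():
                if iid == item_id:
                    totals[alias] += n
        return totals.most_common(1)[0][0] if totals else None

    def history(self, item_id):
        """Sorted (date, alias) pairs, to trace how an item's name changed over time."""
        return sorted({(tag[:8], alias)
                       for tag, counts in self.acquisitions.items()
                       for (iid, alias) in counts if iid == item_id})
```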
As a third aspect of the invention, a computer-readable storage medium is provided, having stored thereon a computer program which, when being executed by a processor, carries out the steps of the method of the first aspect of the invention. In one embodiment, the computer program is distributed across a plurality of computer devices or processors coupled by a network such that the computer program is stored, accessed, and executed by one or more computer devices or processors in a distributed fashion. A single method step/operation, or two or more method steps/operations, may be performed by a single computer device or processor or by two or more computer devices or processors. One or more method steps/operations may be performed by one or more computer devices or processors, and one or more other method steps/operations may be performed by one or more other computer devices or processors. One or more computer devices or processors may perform a single method step/operation, or perform two or more method steps/operations.
It will be appreciated by those skilled in the art that all or part of the steps of the method for building a physical model for a power grid knowledge graph of the present invention may be carried out by a computer program instructing associated hardware, such as a computer device or a processor. The computer program may be stored in a non-transitory computer-readable storage medium and, when executed, performs the steps of the method of the present invention. Any reference herein to memory, storage, databases, or other media may include non-volatile and/or volatile memory, as appropriate. Examples of non-volatile memory include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), flash memory, magnetic tape, floppy disk, magneto-optical data storage devices, hard disks, solid state disks, and the like. Examples of volatile memory include random access memory (RAM), external cache memory, and the like.
The respective technical features described above may be arbitrarily combined. Although not all possible combinations of features are described, any combination of features should be considered to be covered by the present specification as long as there is no contradiction between such combinations.
While the present invention has been described in connection with the embodiments, it will be understood by those skilled in the art that the foregoing description and drawings are merely illustrative and not restrictive of the broad invention, and that the invention is not limited to the disclosed embodiments. Various modifications and variations are possible without departing from the spirit of the invention.
Claims (10)
1. A method for building a physical model for a power grid knowledge graph, comprising:
determining a table schema for defining table objects and fields thereof;
receiving a first data source comprising a plurality of table objects, the first data source comprising table object related information, field related information, table object source related information, and/or field source related information;
for each table object, generating corresponding table information according to the table schema based on the first data source, thereby obtaining a table information set of all table objects included in the first data source, so as to generate a physical table set of the physical model including the table information set, wherein the table information indicates at least a table name, fields, a table object source, and a field source of the table object;
determining a relationship schema for defining a relationship between a source table object and a target table object;
receiving a second data source comprising information about relationships between source table objects and target table objects, wherein the second data source comprises a plurality of pairs of source table objects and target table objects, and, for each pair, generating table relationship information of the pair according to the relationship schema based on the second data source, thereby obtaining a table relationship information set of all relationships included in the second data source, so as to generate a relationship set of the physical model including the table relationship information set;
establishing, based on the physical table set of the physical model and the relationship set of the physical model, a physical model including table objects, fields, and relationships.
2. The method of claim 1, wherein the field is determined according to a predefined field schema based on the field related information and the field source related information in the first data source, the field schema including a field name, a field data type, a field description, a standard code, a data storage format, a hash column, a responsible department, a name of the data source system, a table name in the data source system, a field name in the data source system, and a field type in the data source system.
3. The method of claim 1, wherein the table schema comprises a table name, a subject field, a secondary subject field, a table type, a table description, a responsible department, a name of the data source system, a table name in the data source system, and a field list of the table object.
4. The method of claim 1, wherein the relationship schema comprises a table name of the source table object, a table name of the target table object, an association relationship between the source table object and the target table object, an association field between the source table object and the target table object, a subject field, and a secondary subject field.
5. The method of claim 4, wherein generating table relationship information of a pair of source and target table objects according to the relationship schema based on the second data source comprises: for a pair of source and target table objects and another pair of source and target table objects in the second data source, if the table names of the respective source table objects, the table names of the target table objects, the association relationships between the source and target table objects, and the association fields between the source and target table objects are all the same, determining that the relationship of the pair is the same as the relationship of the other pair, and, for identical relationships, normalizing only one of them and generating the table relationship information of the corresponding source and target table objects according to the relationship schema.
6. A system for building a physical model for a power grid knowledge graph, comprising: a physical table set generating unit, a relation set generating unit and a processing unit,
wherein the physical table set generation unit is configured to:
determine a table schema for defining table objects and their fields;
receive a first data source comprising a plurality of table objects, the first data source comprising table object related information, field related information, table object source related information, and/or field source related information;
for each table object, generate corresponding table information according to the table schema based on the first data source, thereby obtaining a table information set of all table objects included in the first data source, so as to generate a physical table set of the physical model including the table information set, wherein the table information indicates at least a table name, fields, a table object source, and a field source of the table object;
wherein the relationship set generation unit is configured to:
determine a relationship schema for defining a relationship between a source table object and a target table object;
receive a second data source comprising information about relationships between source table objects and target table objects, wherein the second data source comprises a plurality of pairs of source table objects and target table objects, and, for each pair, generate table relationship information of the pair according to the relationship schema based on the second data source, thereby obtaining a table relationship information set of all relationships included in the second data source, so as to generate a relationship set of the physical model including the table relationship information set;
wherein the processing unit is configured to:
establish, based on the physical table set of the physical model and the relationship set of the physical model, a physical model including table objects, fields, and relationships.
7. The system of claim 6, wherein the field is determined according to a predefined field schema based on the field related information and the field source related information in the first data source, the field schema including a field name, a field data type, a field description, a standard code, a data storage format, a hash column, a responsible department, a name of the data source system, a table name in the data source system, a field name in the data source system, and a field type in the data source system.
8. The system of claim 6, wherein the table schema comprises a table name, a subject field, a secondary subject field, a table type, a table description, a responsible department, a name of the data source system, a table name in the data source system, and a field list of the table object, and the relationship schema comprises a table name of the source table object, a table name of the target table object, an association relationship between the source table object and the target table object, an association field between the source table object and the target table object, a subject field, and a secondary subject field.
9. The system of claim 8, wherein generating table relationship information of a pair of source and target table objects according to the relationship schema based on the second data source comprises: for a pair of source and target table objects and another pair of source and target table objects in the second data source, if the table names of the respective source table objects, the table names of the target table objects, the association relationships between the source and target table objects, and the association fields between the source and target table objects are all the same, determining that the relationship of the pair is the same as the relationship of the other pair, and, for identical relationships, normalizing only one of them and generating the table relationship information of the corresponding source and target table objects according to the relationship schema.
10. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the method of any one of claims 1 to 5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011197189.XA CN112363996B (en) | 2020-10-30 | 2020-10-30 | Method, system and medium for establishing physical model of power grid knowledge graph |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112363996A true CN112363996A (en) | 2021-02-12 |
CN112363996B CN112363996B (en) | 2023-10-24 |
Family
ID=74512400
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011197189.XA Active CN112363996B (en) | 2020-10-30 | 2020-10-30 | Method, system and medium for establishing physical model of power grid knowledge graph |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112363996B (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110019825A (en) * | 2017-07-25 | 2019-07-16 | 华为技术有限公司 | A kind of method and device for analyzing data semantic |
US20190354544A1 (en) * | 2011-02-22 | 2019-11-21 | Refinitiv Us Organization Llc | Machine learning-based relationship association and related discovery and search engines |
CN111159365A (en) * | 2019-11-26 | 2020-05-15 | 国网湖南省电力有限公司 | Method, system and storage medium for implementing intelligent question-answering system of scheduling model body |
Non-Patent Citations (1)
Title |
---|
刘俊楠; 刘海砚; 陈晓慧; 郭漩; 郭文月; 朱新铭; 赵清波: "Knowledge graph construction for multi-source geospatial data", Journal of Geo-information Science (地球信息科学学报), no. 07 * |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113535739A (en) * | 2021-09-16 | 2021-10-22 | 国网浙江省电力有限公司信息通信分公司 | Data mart layer table establishing method based on power grid energy data |
CN113535739B (en) * | 2021-09-16 | 2021-12-07 | 国网浙江省电力有限公司信息通信分公司 | Data mart layer table establishing method based on power grid energy data |
CN114168608A (en) * | 2021-12-16 | 2022-03-11 | 中科雨辰科技有限公司 | Data processing system for updating knowledge graph |
Also Published As
Publication number | Publication date |
---|---|
CN112363996B (en) | 2023-10-24 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110223168B (en) | Label propagation anti-fraud detection method and system based on enterprise relationship map | |
CN111967761A (en) | Monitoring and early warning method and device based on knowledge graph and electronic equipment | |
CN111191125A (en) | Data analysis method based on tagging | |
CN113760891B (en) | Data table generation method, device, equipment and storage medium | |
CN109726393B (en) | Policy analysis system and method based on natural language processing technology | |
CN112000773B (en) | Search engine technology-based data association relation mining method and application | |
CN103605651A (en) | Data processing showing method based on on-line analytical processing (OLAP) multi-dimensional analysis | |
CN104573130A (en) | Entity resolution method based on group calculation and entity resolution device based on group calculation | |
CN108241867B (en) | Classification method and device | |
CN113204603B (en) | Category labeling method and device for financial data assets | |
CN111680506A (en) | Foreign key mapping method and device for database tables, electronic equipment and storage medium | |
CN116881430B (en) | Industrial chain identification method and device, electronic equipment and readable storage medium | |
CN114547346B (en) | Knowledge graph construction method and device, electronic equipment and storage medium | |
CN111192176A (en) | Online data acquisition method and device supporting education informatization assessment | |
CN111737477A (en) | Intellectual property big data-based intelligence investigation method, system and storage medium | |
CN106980639B (en) | Short text data aggregation system and method | |
CN112363996B (en) | Method, system and medium for establishing physical model of power grid knowledge graph | |
CN115794803B (en) | Engineering audit problem monitoring method and system based on big data AI technology | |
Widad et al. | Quality Anomaly Detection Using Predictive Techniques: An Extensive Big Data Quality Framework for Reliable Data Analysis | |
Paraschiv et al. | A unified graph-based approach to disinformation detection using contextual and semantic relations | |
CN111190880A (en) | Database detection method and device and computer readable storage medium | |
CN113505117A (en) | Data quality evaluation method, device, equipment and medium based on data indexes | |
CN116049376B (en) | Method, device and system for retrieving and replying information and creating knowledge | |
CN112214615A (en) | Policy document processing method and device based on knowledge graph and storage medium | |
CN117876083A (en) | Intelligent analysis method and device for business opportunity, computer equipment and storage medium | |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |