CN115115227A

CN115115227A - Method for constructing product quality knowledge graph in papermaking field

Info

Publication number: CN115115227A
Application number: CN202210753820.2A
Authority: CN
Inventors: 满奕; 李继庚; 张欢欢; 洪蒙纳
Original assignee: South China University of Technology SCUT
Current assignee: South China University of Technology SCUT
Priority date: 2022-06-28
Filing date: 2022-06-28
Publication date: 2022-09-27

Abstract

The invention discloses a method for constructing a product quality knowledge graph in the papermaking field. Product-quality-related data is generated from existing structured data and internet data in the papermaking field; after collection, screening, analysis and summarization, the acquired data forms the basic data of product quality in the papermaking field. Word segmentation is performed on the acquired data to form a product quality corpus for the papermaking field. Part of the corpus is selected and manually labeled to serve as training data, and a named entity recognition model is trained iteratively on the labeled data to extract knowledge. By acquiring information from relevant books, webpages, forums and other sources, the method obtains comprehensive product-quality-related data for the papermaking field, establishes a product quality knowledge classification system for the papermaking field, and stores the knowledge organized under that system in a graph database. The technical scheme can also be generalized from the papermaking industry to other industries.

Description

Method for constructing product quality knowledge graph in papermaking field

Technical Field

The invention relates to the technical field of knowledge graph construction, in particular to a method for constructing a product quality knowledge graph in the field of papermaking.

Background

A knowledge graph is centered on concepts and entities and expresses the relationships among concepts and among entities. It can represent knowledge with complex relationships: the associations among influencing factors, the propagation paths between variables and the hierarchy of the data can all be read directly from the graph. In addition, fault delay can be detected from changes in the state of the knowledge graph. These properties make the knowledge graph well suited to the difficulty of expressing relationships in complex industrial processes such as papermaking.

The defects and shortcomings of the prior art are as follows. At present, knowledge graphs have not yet been applied in the papermaking industry. The papermaking production process is large in scale and complex in structure, with extremely strong coupling between production units, so fault diagnosis in papermaking production is increasingly difficult. Because an accurate mathematical model of the papermaking process cannot be established, diagnosis methods based on a mathematical model are not applicable. The many equipment variables in papermaking production have complicated associations that data-based methods cannot express well, so their diagnostic capability is insufficient. Diagnosing papermaking product quality problems and faults in the papermaking production process therefore requires a knowledge base that can express complex associations and help the relevant personnel carry out fault diagnosis. In view of this, a method is needed for extracting knowledge and building a knowledge graph for complex industrial processes such as papermaking.

Disclosure of Invention

The invention provides a method for constructing a product quality knowledge graph in the papermaking field, with the aim of solving the above technical problem.

The purpose of the invention is realized by the following technical scheme:

A knowledge extraction and knowledge graph construction method for the papermaking field comprises the following steps:

Step (1), collecting data:

Relevant data of the papermaking industry is generated from existing structured data of the papermaking industry, internet data and book data. The data comprises structured data on existing equipment and processes, and document information related to papermaking product quality problems that is collected by a crawler from the websites of papermaking enterprises, papermaking fault websites and websites on papermaking product quality problems. The document information related to papermaking product quality comprises papermaking product quality standard documents, policy standards, patents, reports and encyclopedia entries. After collection, screening, analysis and summarization, the acquired data forms the basic data of product quality in the papermaking field.
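The screening part of this step could look roughly like the sketch below. It is only an illustration: the keyword list, the minimum length and the `source`/`text` record layout are assumptions made here, not details given in the text, and crawling itself is left out.

```python
import hashlib
import re

# Hypothetical screening keywords; a real project would curate its own list.
QUALITY_KEYWORDS = ("纸病", "质量", "故障", "缺陷")


def normalize(text):
    """Collapse whitespace so trivially different copies hash identically."""
    return re.sub(r"\s+", " ", text).strip()


def screen_documents(docs, keywords=QUALITY_KEYWORDS, min_chars=10):
    """Keep documents that are long enough, mention a keyword and are not duplicates.

    Each document is a dict with "source" and "text" keys (an assumed layout).
    """
    seen = set()
    kept = []
    for doc in docs:
        text = normalize(doc["text"])
        if len(text) < min_chars:
            continue  # too short to carry useful knowledge
        if not any(keyword in text for keyword in keywords):
            continue  # not related to product quality
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # exact duplicate of an earlier document
        seen.add(digest)
        kept.append({"source": doc["source"], "text": text})
    return kept
```

Hashing the normalized text only removes exact duplicates; near-duplicate pages from different websites would need a fuzzier comparison.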

Step (2), data word segmentation:

Word segmentation is performed on the data acquired in step (1) using a word segmentation model, finally forming a product quality corpus for the papermaking field.

Step (3), data annotation:

Part of the corpus from step (2) is selected and manually labeled, with each Chinese character taken as a recognition unit; the labeled data is then used as the training set.

In the manual labeling process, the label categories comprise fault type, fault name, faulty equipment name, fault description (phenomenon), fault reason and fault solution.

Step (4), knowledge extraction

A named entity recognition model is established and trained on the training set from step (3), and the trained model is used to extract knowledge from all documents.

Step (5), constructing a product quality knowledge graph classification system

A concept and relationship classification system for the product quality knowledge graph in the papermaking field is constructed in a top-down manner by manual construction.

Constructing this concept and relationship classification system comprises the following steps:

5.1, defining a knowledge classification system for papermaking product quality problems and designing 6 fault concepts, namely product quality problem, generation reason, phenomenon, solution, position and detection;

5.2, based on the 6 fault concepts of papermaking product quality defined in step 5.1, further subdividing the general concept co-occurrence relationship by semantic type into concept relationships of 7 major classes and 14 minor classes.

The 7 major classes of concept relationships are: fault diagnosis, fault expression, action part, quality detection, fault part, detection result and diagnosis basis. The 14 minor classes of concept relationships are grouped as follows: the relationships between "product quality problem" and "phenomenon", between "phenomenon" and "generation reason", and between "fault decision" and "phenomenon" are defined as "fault expression"; the relationships between "product quality problem" and "position" and between "position" and "phenomenon" are defined as "fault part"; the relationships between "product quality problem" and "detection" and "phenomenon" are defined as "quality detection"; the relationships between "product quality problem" and "generation reason", between "generation reason" and "solution", and between "solution" and "product quality problem" are defined as "fault diagnosis"; the relationships between "detection" and "position" and between "position" and "solution" are defined as "action part"; the relationship between "detection" and "phenomenon" is defined as "detection result"; and the relationship between "phenomenon" and "product quality problem" is defined as "diagnosis basis".
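One way to make this classification system machine-checkable is a lookup table from concept pairs to relationship classes, sketched below. The identifiers are illustrative, and treating the first-named concept of each pair as the head of a directed relationship is an assumption. Only 12 of the 14 minor classes are encoded: the pair involving "fault decision" is left out because that name is not one of the 6 concepts of step 5.1, and the second "quality detection" pair is left out because the wording does not identify it unambiguously.

```python
# The 6 fault concepts of step 5.1. "position" is taken here to cover the
# part/site of the fault (an assumption about the translated wording).
CONCEPTS = {
    "product_quality_problem",
    "generation_reason",
    "phenomenon",
    "solution",
    "position",
    "detection",
}

# (head concept, tail concept) -> major relationship class.
RELATION_SCHEMA = {
    ("product_quality_problem", "phenomenon"): "fault_expression",
    ("phenomenon", "generation_reason"): "fault_expression",
    ("product_quality_problem", "position"): "fault_part",
    ("position", "phenomenon"): "fault_part",
    ("product_quality_problem", "detection"): "quality_detection",
    ("product_quality_problem", "generation_reason"): "fault_diagnosis",
    ("generation_reason", "solution"): "fault_diagnosis",
    ("solution", "product_quality_problem"): "fault_diagnosis",
    ("detection", "position"): "action_part",
    ("position", "solution"): "action_part",
    ("detection", "phenomenon"): "detection_result",
    ("phenomenon", "product_quality_problem"): "diagnosis_basis",
}


def relation_type(head, tail):
    """Return the relationship class for a concept pair, or None if undefined."""
    for concept in (head, tail):
        if concept not in CONCEPTS:
            raise ValueError(f"unknown concept: {concept!r}")
    return RELATION_SCHEMA.get((head, tail))
```

A table like this lets extracted entity pairs be validated against the schema before they are written to the graph database.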

Step (6), knowledge storage

The extracted knowledge is classified according to the constructed product quality concept and relationship classification system for the papermaking field and stored accordingly in a Neo4j database.

A knowledge graph describes the entities or concepts that exist in the real world and the relationships among them; nodes represent entities or concepts, and edges represent attributes or relationships. An entity is something distinguishable that exists independently. A concept is a collection of entities with the same properties, such as product quality, equipment or process. A concept relationship describes the semantic relationship between two concepts and is an important constituent element of structured knowledge.
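The storage step could be sketched as follows: each extracted triple becomes two nodes and one edge, written with Cypher `MERGE` so that repeated facts are not duplicated. The node labels, the `name` property and the `run` callback are assumptions for illustration; with the official Neo4j Python driver, a session's `run` method accepts a query string and a parameter dictionary and could be passed in as `run`.

```python
import re

# Labels and relationship types are interpolated into the query, so they are
# restricted to plain identifiers.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def merge_statement(head_label, rel_type, tail_label):
    """Build a parameterized Cypher MERGE for one (head)-[rel]->(tail) triple.

    Cypher does not allow labels or relationship types to be passed as
    parameters, so they are validated and interpolated; the entity names are
    supplied separately as $head and $tail.
    """
    for ident in (head_label, rel_type, tail_label):
        if not _IDENT.match(ident):
            raise ValueError(f"unsafe identifier: {ident!r}")
    return (
        f"MERGE (h:{head_label} {{name: $head}}) "
        f"MERGE (t:{tail_label} {{name: $tail}}) "
        f"MERGE (h)-[:{rel_type}]->(t)"
    )


def store_triples(run, triples):
    """Send each (head_label, head, rel_type, tail_label, tail) through `run`."""
    for head_label, head, rel_type, tail_label, tail in triples:
        query = merge_statement(head_label, rel_type, tail_label)
        run(query, {"head": head, "tail": tail})
```

Passing `run` in as a callable keeps the sketch independent of any particular driver version and makes it easy to inspect the generated statements without a running database.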

Compared with the prior art, the technical scheme of the invention has the following beneficial effects:

By adopting the technical scheme of the invention, basic data of product quality in the papermaking field is obtained by acquiring information from relevant websites, books and other sources; a concept and relationship classification system for the product quality knowledge graph in the papermaking field is constructed; and the extracted knowledge is classified and stored accordingly in a Neo4j graph database to form a product quality knowledge graph for the papermaking field. The technical scheme can also be generalized from the papermaking industry to other complex process industries.

Drawings

FIG. 1 is an overall operational schematic of the present invention;

FIG. 2 is a diagram of a product quality failure knowledge graph concept and classification architecture of the present invention;

FIG. 3 is a schematic flow chart of an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be described clearly and completely with reference to the accompanying drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

As shown in fig. 1 and fig. 3, the present invention provides a specific example of a method for constructing a product quality knowledge graph in the papermaking field, which comprises the following steps:

Step (1), collecting data:

Step (2), data word segmentation:

The THULAC word segmentation model is taken as an example. Its specific steps are as follows: first, import the THULAC toolkit; then read the collected basic data of product quality in the papermaking field; then call the thulac module of the thulac toolkit to segment the collected basic data; then save the segmented data; finally, remove punctuation and illegal characters from the saved data and save it again as text.
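The segmentation and clean-up steps above can be sketched as follows. The segmenter is passed in as a callable so that the clean-up logic runs without THULAC installed; `make_thulac_segmenter` shows how the `thulac` Python package would typically be plugged in. What counts as an "illegal character" is not specified in the text, so the rule used here (keep Chinese characters, ASCII letters and digits) is an assumption.

```python
import re

# Keep CJK characters, ASCII letters and digits; everything else is treated as
# punctuation or an illegal character (an assumed definition).
_DROP = re.compile(r"[^\u4e00-\u9fffA-Za-z0-9]")


def make_thulac_segmenter():
    """Return a callable that segments text with THULAC (needs `pip install thulac`)."""
    import thulac  # imported lazily so the rest of the sketch runs without it

    model = thulac.thulac(seg_only=True)  # segmentation only, no POS tags
    # With text=True, cut() returns the words joined by spaces.
    return lambda text: model.cut(text, text=True).split()


def clean_tokens(tokens):
    """Strip punctuation and illegal characters; drop tokens that become empty."""
    cleaned = (_DROP.sub("", token) for token in tokens)
    return [token for token in cleaned if token]


def build_corpus(lines, segment):
    """Segment each line and return one space-joined corpus line per non-empty input."""
    corpus = []
    for line in lines:
        tokens = clean_tokens(segment(line.strip()))
        if tokens:
            corpus.append(" ".join(tokens))
    return corpus
```

With THULAC installed, `build_corpus(lines, make_thulac_segmenter())` would yield the corpus lines, which can then be written to a text file.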

Step (3), data annotation:

Related words appearing in the corpus, such as fault types, fault names, faulty equipment names, fault descriptions (phenomena), fault reasons and fault solutions, are labeled as fault type entities, fault name entities, faulty equipment name entities, fault description (phenomenon) entities, fault reason entities and fault solution entities, respectively. Manual labeling with the four-tag BMES sequence labeling scheme is taken as an example: B marks the beginning of a word, M marks the middle of a word, E marks the end of a word, and S marks a single-character word.
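As an illustration of BMES labeling at the character level, the sketch below turns manually marked entity spans into one tag per character. The span format, the label names and the extra "O" tag for characters outside any entity are assumptions added here for the example; the text itself only names the four BMES tags.

```python
def bmes_tags(text, entities):
    """Label every character of `text` with a BMES tag.

    `entities` is a list of (start, end, label) spans with `end` exclusive.
    Characters outside any entity get "O" (an assumed addition to plain BMES).
    """
    tags = ["O"] * len(text)
    for start, end, label in entities:
        if not 0 <= start < end <= len(text):
            raise ValueError(f"bad span: {(start, end, label)}")
        if end - start == 1:
            tags[start] = f"S-{label}"  # single-character entity
            continue
        tags[start] = f"B-{label}"  # first character
        for i in range(start + 1, end - 1):
            tags[i] = f"M-{label}"  # middle characters
        tags[end - 1] = f"E-{label}"  # last character
    return tags


# Illustrative sentence and spans; the label names are hypothetical.
sentence = "烘缸表面结垢导致纸面孔洞"
spans = [(0, 2, "EQUIP"), (2, 6, "CAUSE"), (10, 12, "NAME")]
for char, tag in zip(sentence, bmes_tags(sentence, spans)):
    print(char, tag)
```

Each printed line pairs one character with its tag, which is the per-character format commonly used for training sequence labeling models.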

Step (4), knowledge extraction

Knowledge extraction is described by taking as an example a named entity recognition model based on a bidirectional long short-term memory network and a conditional random field (Bi-LSTM + CRF model for short): first, the labeled Chinese characters are mapped to character vectors that serve as the model input; the vectors are then fed into the Bi-LSTM layer, which outputs a score for each label for every character; the CRF layer, which learns the sequential dependencies among labels, then produces the final prediction and outputs the predicted label for each character; finally, the document data is fed into the trained Bi-LSTM + CRF model for knowledge extraction.
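The role of the CRF layer at prediction time can be illustrated with a small Viterbi decoder. This is a simplified sketch, not the trained model: the per-character emission scores stand in for the Bi-LSTM output, the transition scores stand in for what the CRF layer learns, and the `B-`/`M-`/`E-`/`S-`/`O` tag names are assumptions. The second function turns a decoded tag sequence back into entities.

```python
def viterbi(emissions, transitions, tags):
    """Return the highest-scoring tag sequence.

    emissions:   one dict per character, tag -> score (stand-in for Bi-LSTM output).
    transitions: dict (prev_tag, tag) -> score (stand-in for learned CRF weights);
                 missing pairs default to 0.0.
    """
    if not emissions:
        return []
    score = {t: emissions[0][t] for t in tags}
    backpointers = []
    for emit in emissions[1:]:
        new_score, pointers = {}, {}
        for t in tags:
            # Best previous tag for ending in tag t at this position.
            best_prev = max(tags, key=lambda p: score[p] + transitions.get((p, t), 0.0))
            new_score[t] = score[best_prev] + transitions.get((best_prev, t), 0.0) + emit[t]
            pointers[t] = best_prev
        score = new_score
        backpointers.append(pointers)
    # Walk the backpointers from the best final tag to recover the path.
    path = [max(tags, key=lambda t: score[t])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return path[::-1]


def extract_entities(text, path):
    """Collect (entity_text, label) pairs from B-/M-/E-/S- tags."""
    entities, start = [], None
    for i, tag in enumerate(path):
        prefix, _, label = tag.partition("-")
        if prefix == "S":
            entities.append((text[i], label))
            start = None
        elif prefix == "B":
            start = i
        elif prefix == "E" and start is not None:
            entities.append((text[start:i + 1], label))
            start = None
        elif prefix == "O":
            start = None
    return entities
```

Because the decoder scores whole sequences, a strongly negative transition score (for example from a `B-` tag directly to `O`) can overrule a locally attractive but inconsistent label, which is the benefit of adding the CRF layer on top of the per-character scores.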

Step (5), constructing a product quality knowledge graph classification system

5.2, based on the 6 fault concepts of papermaking product quality defined in step 5.1, the general concept co-occurrence relationship is further subdivided by semantic type into concept relationships of 7 major classes and 14 minor classes. As shown in fig. 2, the 7 major classes of concept relationships are: fault diagnosis, fault expression, action part, quality detection, fault part, detection result and diagnosis basis. As shown in fig. 2, the 14 minor classes of concept relationships are grouped as follows: the relationships between "product quality problem" and "phenomenon", between "phenomenon" and "generation reason", and between "fault decision" and "phenomenon" are defined as "fault expression"; the relationships between "product quality problem" and "position" and between "position" and "phenomenon" are defined as "fault part"; the relationships between "product quality problem" and "detection" and "phenomenon" are defined as "quality detection"; the relationships between "product quality problem" and "generation reason", between "generation reason" and "solution", and between "solution" and "product quality problem" are defined as "fault diagnosis"; the relationships between "detection" and "position" and between "position" and "solution" are defined as "action part"; the relationship between "detection" and "phenomenon" is defined as "detection result"; and the relationship between "phenomenon" and "product quality problem" is defined as "diagnosis basis".

Step (6), knowledge storage

The technical scheme disclosed by the invention yields the following beneficial effects: basic data of product quality in the papermaking field is obtained by acquiring information from relevant websites, books and other sources; a concept and relationship classification system for the product quality knowledge graph in the papermaking field is constructed; and the extracted knowledge is classified and stored accordingly in a Neo4j graph database to form a product quality knowledge graph for the papermaking field. The technical scheme can also be generalized from the papermaking industry to other complex process industries.

The principle and embodiments of the present invention have been described herein by way of specific examples, which are provided only to help in understanding the method and core idea of the invention. The above is only a preferred embodiment of the present invention. It should be noted that, because written description is limited while the possible specific structures are objectively unlimited, those skilled in the art can make modifications, refinements or changes without departing from the principle of the present invention, and the above technical features can also be combined in any suitable manner; such modifications, variations and combinations, and applications of the invention to other situations, are intended to fall within the protection scope of the present invention.

Claims

1. A method for constructing a product quality knowledge graph in the papermaking field is characterized by comprising the following steps:

step (1), collecting data:

generating relevant data of the papermaking industry based on existing structured data of the papermaking industry, internet data and book data; the acquired data forms basic data of the product quality in the papermaking field after collection, screening, analysis and summarization;

step (2), data word segmentation:

step (3), data annotation:

in the manual marking process, the marked classification comprises a fault type, a fault name, a fault equipment name, a fault description, a fault reason and a fault solution;

step (4), knowledge extraction

Establishing a named entity recognition model according to the training set in the step (3), carrying out model training, and extracting knowledge from all documents by using the trained model;

step (5) constructing a product quality knowledge graph classification system

Constructing a classification system of product quality concepts and relationships in the papermaking field in a top-down manner by manual construction;

step (6), knowledge storage

2. The method for constructing a product quality knowledge graph in the papermaking field according to claim 1, wherein the relevant data in step (1) comprises structured data on existing equipment and processes, and document information related to papermaking product quality problems that is collected by a crawler from the websites of papermaking enterprises, papermaking fault websites and websites on papermaking product quality problems; the document information related to papermaking product quality comprises papermaking product quality standard documents, policy standards, patents, reports and encyclopedia entries.

3. The method for constructing a product quality knowledge graph in the papermaking field according to claim 1, characterized in that step (5) comprises the following sub-steps:

5.1, defining a knowledge classification system for papermaking product quality problems and designing 6 fault concepts, namely product quality problem, generation reason, phenomenon, solution, position and detection;

5.2, based on the 6 fault concepts defined in step 5.1, further subdividing the general concept co-occurrence relationship by semantic type into concept relationships of 7 major classes and 14 minor classes.

4. The method for constructing a product quality knowledge graph in the papermaking field according to claim 3, wherein the 7 major classes of concept relationships comprise: fault diagnosis, fault expression, action part, quality detection, fault part, detection result and diagnosis basis;

the 14 minor classes of concept relationships are grouped as follows: the relationships between "product quality problem" and "phenomenon", between "phenomenon" and "generation reason", and between "fault decision" and "phenomenon" are defined as "fault expression"; the relationships between "product quality problem" and "position" and between "position" and "phenomenon" are defined as "fault part"; the relationships between "product quality problem" and "detection" and "phenomenon" are defined as "quality detection"; the relationships between "product quality problem" and "generation reason", between "generation reason" and "solution", and between "solution" and "product quality problem" are defined as "fault diagnosis"; the relationships between "detection" and "position" and between "position" and "solution" are defined as "action part"; the relationship between "detection" and "phenomenon" is defined as "detection result"; and the relationship between "phenomenon" and "product quality problem" is defined as "diagnosis basis".