(Igor Melnyk, Pierre Dognin, Payel Das)
- end-to-end, multi-stage Knowledge Graph (KG) generation system from textual inputs
Two stages
- the graph nodes are generated using a pretrained language model
- the relationships (edges) are generated using the available entity (node) information
Challenges
- non-unique graph representation
- complex node and edge structure
- use SOTA language models pretrained on large textual corpora (node generation is the key to algorithm performance)
- fine-tune the pretrained Transformer-based language model
- partitioning the graph construction process into two steps ensures efficiency (each node and edge is generated only once: unique model)
- end-to-end trainable (avoids the need for any external NLP pipelines)
1. Node Generation: Text Nodes
- objective: generate a set of unique nodes
- use a pre-trained encoder-decoder language model (PLM, sequence-to-sequence)
- this module additionally supplies node features for the downstream task of edge generation
- greedily decode the output string, use the separator token <NODE_SEP> to delineate node boundaries, and mean-pool the decoder's last-layer hidden states within each boundary (sketched after this list)
- <NODE_SEP>: the token that delineates node boundaries
- fix the number of generated nodes upfront and fill the missing ones with a special <NO_NODE> token
- this approach ignores that graph nodes are permutation invariant (the result is the same no matter the order in which the elements of a finite set, here the nodes, are arranged)
- node generation using a traditional S2S model
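A minimal sketch of the feature-extraction step above, assuming a HuggingFace-style seq2seq PLM whose tokenizer contains <NODE_SEP>; the function name and arguments are illustrative, not from the paper:

```python
import torch

def extract_node_features(hidden_states, token_ids, sep_id):
    """Mean-pool decoder last-layer hidden states between <NODE_SEP> tokens.

    hidden_states: (seq_len, d_model) last decoder layer for one sample
    token_ids:     (seq_len,) greedy-decoded token ids for the same sample
    sep_id:        vocabulary id of the <NODE_SEP> token
    Returns one (d_model,) feature vector per delineated node.
    """
    sep_positions = (token_ids == sep_id).nonzero(as_tuple=True)[0].tolist()
    features, start = [], 0
    for pos in sep_positions + [len(token_ids)]:  # final span has no trailing sep
        span = hidden_states[start:pos]
        if span.numel() > 0:
            features.append(span.mean(dim=0))  # mean-pool the node's tokens
        start = pos + 1  # skip the separator token itself
    return features
```

These pooled vectors are exactly the node features consumed later by the edge-generation module.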
2. Node Generation: Query Nodes
2.1 LEARNABLE NODE QUERIES
- the decoder receives as input a set of learnable node queries, represented as an embedding matrix
- causal masking is disabled so that the Transformer attends to all the queries simultaneously (see the sketch below)
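A minimal PyTorch sketch of this idea: one learnable query embedding per node slot, decoded in parallel with no causal mask. The class and dimensions are assumptions for illustration, and the single per-query logits vector is a simplification (the actual system decodes a token sequence per node):

```python
import torch
import torch.nn as nn

class QueryNodeDecoder(nn.Module):
    """S learnable node queries attended to simultaneously (no causal mask)."""

    def __init__(self, num_queries=8, d_model=512, nhead=8,
                 num_layers=3, vocab_size=32128):
        super().__init__()
        # one learnable query embedding per node slot
        self.node_queries = nn.Parameter(torch.randn(num_queries, d_model))
        layer = nn.TransformerDecoderLayer(d_model, nhead, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, encoder_memory):
        # encoder_memory: (batch, src_len, d_model) from the text encoder
        batch = encoder_memory.size(0)
        queries = self.node_queries.unsqueeze(0).expand(batch, -1, -1)
        # tgt_mask is left as None: every query attends to all others at once
        features = self.decoder(tgt=queries, memory=encoder_memory)
        return self.lm_head(features), features  # node logits and node features
```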
2.2 PERMUTATION MATRIX
- avoids memorizing the particular target node order and enables permutation invariance
- the logits and features for node slots s = 1, ..., S are reordered by a permutation matrix P, obtained from a bipartite matching algorithm between the target nodes and the greedy-decoded nodes, before the loss is computed (a matching sketch follows this section)
- node generation using learned query vectors
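A sketch of the matching step using SciPy's Hungarian-algorithm solver; the 0/1 exact-string-match cost is a simplification (a real system could match on token-level logits instead):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def node_permutation(pred_nodes, target_nodes):
    """Permutation matrix P aligning target nodes to greedy-decoded nodes.

    pred_nodes, target_nodes: equal-length lists of node strings,
    padded with <NO_NODE> where necessary.
    """
    S = len(pred_nodes)
    cost = np.array([[0.0 if p == t else 1.0
                      for t in target_nodes] for p in pred_nodes])
    rows, cols = linear_sum_assignment(cost)  # bipartite matching
    P = np.zeros((S, S))
    P[rows, cols] = 1.0
    return P
```

The targets are then reordered with P before computing the cross-entropy loss, so the model is never penalized for producing the right nodes in a different order.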
3. Edge Generation
- use the generated set of node features in this module for edge generation
- given a pair of node features, a prediction head decides whether an edge exists
- Option 1: use a head similar to the one in Node Generation: Query Nodes to generate edges as a sequence of tokens
- this can generate edges unseen during training, at the risk of not matching the target edge token sequence exactly
- Option 2: use a classification head to predict the edges (sketched below)
- if the set of possible relationships is fixed and known, this option is efficient and accurate
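A sketch of Option 2, assuming a fixed relation vocabulary that includes <NO_EDGE> as one of the classes; the head architecture is an assumption, kept deliberately simple:

```python
import torch
import torch.nn as nn

class EdgeClassifier(nn.Module):
    """Classify the relation (or <NO_EDGE>) for every ordered pair of nodes."""

    def __init__(self, d_model=512, num_relations=100):
        super().__init__()
        # num_relations counts <NO_EDGE> as one of the classes
        self.head = nn.Sequential(
            nn.Linear(2 * d_model, d_model),
            nn.ReLU(),
            nn.Linear(d_model, num_relations),
        )

    def forward(self, node_features):
        # node_features: (batch, S, d_model) from the node-generation module
        b, s, d = node_features.shape
        heads = node_features.unsqueeze(2).expand(b, s, s, d)  # edge source
        tails = node_features.unsqueeze(1).expand(b, s, s, d)  # edge target
        pairs = torch.cat([heads, tails], dim=-1)              # (b, S, S, 2d)
        return self.head(pairs)  # (b, S, S, num_relations) relation logits
```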
4. Imbalanced Edge Distribution
- small savings: ignoring self-edges and ignoring edges incident to <NO_NODE> nodes
- <NO_EDGE>: the token emitted when there is no edge between two nodes
- the number of actual edges is small while the number of <NO_EDGE> entries is large
- the generation/classification task is therefore imbalanced towards the <NO_EDGE> token/class
- to address this, modify the cross-entropy loss or change the training paradigm
4.1 FOCAL LOSS
- replace the cross-entropy (CE) loss with the focal loss
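A minimal focal-loss sketch (Lin et al.'s formulation without the class-weighting term); gamma = 2 is the usual default, not a value taken from this paper:

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0):
    """Down-weights easy, confidently classified pairs (mostly <NO_EDGE>),
    focusing the gradient on the rare real edges.

    logits:  (N, num_classes) flattened pairwise edge logits
    targets: (N,) class indices, <NO_EDGE> included as a class
    """
    log_probs = F.log_softmax(logits, dim=-1)
    log_pt = log_probs.gather(1, targets.unsqueeze(1)).squeeze(1)  # log p_t
    pt = log_pt.exp()
    # standard CE scaled by the (1 - p_t)^gamma modulating factor
    return (-((1.0 - pt) ** gamma) * log_pt).mean()
```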
4.2 SPARSE LOSS
- sparsify the adjacency matrix to remove most of the <NO_EDGE> edges, artificially re-balancing the classes (a subsampling sketch follows)
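One way to realize this, sketched under the assumption that the sparsification is a random subsampling of <NO_EDGE> pairs (keep_ratio is an illustrative knob, not a value from the paper):

```python
import torch
import torch.nn.functional as F

def sparse_edge_loss(logits, targets, no_edge_id, keep_ratio=0.1):
    """Cross-entropy over all real edges but only a fraction of <NO_EDGE> pairs.

    logits:  (N, num_classes) flattened pairwise edge logits
    targets: (N,) class indices over the flattened adjacency matrix
    """
    is_edge = targets != no_edge_id
    keep = torch.rand(targets.shape, device=targets.device) < keep_ratio
    mask = is_edge | (keep & ~is_edge)  # all real edges + sampled <NO_EDGE>
    if mask.sum() == 0:  # degenerate batch: fall back to the full loss
        return F.cross_entropy(logits, targets)
    return F.cross_entropy(logits[mask], targets[mask])
```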
- the computational complexity of edge generation grows quadratically, since every pair of nodes must be scored
- since nodes are generated using Transformer-based models, the complexity of the attention mechanism limits the size of the input text the system can handle
- only applied to the English domain