(Igor Melnyk, Pierre Dognin, Payel Das)
- end-to-end, multi-stage Knowledge Graph (KG) generation system from textual inputs
Two stages
- the graph nodes are generated using a pretrained language model
- the relationships (edges) are generated using the available entity (node) information
Challenges
- non-unique graph representation
- complex node and edge structure
- use SOTA language models pretrained on large textual corpora (node generation is the key to algorithm performance)
- fine-tune the pretrained Transformer-based language model
- partitioning the graph construction process into two steps ensures efficiency (each node and edge is generated only once: unique model)
- end-to-end trainable (avoids the need for any external NLP pipelines)
1. Node Generation: Text Nodes
- objective: generate a set of unique nodes
- use a pre-trained encoder-decoder language model (PLM, sequence-to-sequence)
- this module additionally supplies node features for the downstream task of edge generation
- greedily decode the output string, use the separator token <NODE_SEP> to delineate node boundaries, and mean-pool the decoder's last-layer hidden states within each boundary (sketched after this list)
- <NODE_SEP>: the token that delineates node boundaries
- fix the number of generated nodes upfront and fill the missing ones with a special <NO_NODE> token
- this approach ignores that graph nodes are permutation invariant (the result is the same no matter the order in which the elements of a finite set, here the nodes, are arranged)
- node generation using a traditional S2S model
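A minimal sketch of the feature-extraction step above, assuming a HuggingFace-style seq2seq PLM whose tokenizer contains <NODE_SEP>; the function name and arguments are illustrative, not from the paper:

```python
import torch

def extract_node_features(hidden_states, token_ids, sep_id):
    """Mean-pool decoder last-layer hidden states between <NODE_SEP> tokens.

    hidden_states: (seq_len, d_model) last decoder layer for one sample
    token_ids:     (seq_len,) greedy-decoded token ids for the same sample
    sep_id:        vocabulary id of the <NODE_SEP> token
    Returns one (d_model,) feature vector per delineated node.
    """
    sep_positions = (token_ids == sep_id).nonzero(as_tuple=True)[0].tolist()
    features, start = [], 0
    for pos in sep_positions + [len(token_ids)]:  # final span has no trailing sep
        span = hidden_states[start:pos]
        if span.numel() > 0:
            features.append(span.mean(dim=0))  # mean-pool the node's tokens
        start = pos + 1  # skip the separator token itself
    return features
```

These pooled vectors are exactly the node features consumed later by the edge-generation module.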
2. Node Generation: Query Nodes
2.1 LEARNABLE NODE QUERIES
- the decoder receives as input a set of learnable node queries, represented as an embedding matrix
- causal masking is disabled so that the Transformer attends to all the queries simultaneously (see the sketch below)
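A minimal PyTorch sketch of this idea: one learnable query embedding per node slot, decoded in parallel with no causal mask. The class and dimensions are assumptions for illustration, and the single per-query logits vector is a simplification (the actual system decodes a token sequence per node):

```python
import torch
import torch.nn as nn

class QueryNodeDecoder(nn.Module):
    """S learnable node queries attended to simultaneously (no causal mask)."""

    def __init__(self, num_queries=8, d_model=512, nhead=8,
                 num_layers=3, vocab_size=32128):
        super().__init__()
        # one learnable query embedding per node slot
        self.node_queries = nn.Parameter(torch.randn(num_queries, d_model))
        layer = nn.TransformerDecoderLayer(d_model, nhead, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, encoder_memory):
        # encoder_memory: (batch, src_len, d_model) from the text encoder
        batch = encoder_memory.size(0)
        queries = self.node_queries.unsqueeze(0).expand(batch, -1, -1)
        # tgt_mask is left as None: every query attends to all others at once
        features = self.decoder(tgt=queries, memory=encoder_memory)
        return self.lm_head(features), features  # node logits and node features
```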
2.2 PERMUTATION MATRIX
- avoids memorizing the particular target node order and enables permutation invariance
- the logits and features for node slots s = 1, ..., S are reordered by a permutation matrix P, obtained from a bipartite matching algorithm between the target nodes and the greedy-decoded nodes, before the loss is computed (a matching sketch follows this section)
- node generation using learned query vectors
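A sketch of the matching step using SciPy's Hungarian-algorithm solver; the 0/1 exact-string-match cost is a simplification (a real system could match on token-level logits instead):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def node_permutation(pred_nodes, target_nodes):
    """Permutation matrix P aligning target nodes to greedy-decoded nodes.

    pred_nodes, target_nodes: equal-length lists of node strings,
    padded with <NO_NODE> where necessary.
    """
    S = len(pred_nodes)
    cost = np.array([[0.0 if p == t else 1.0
                      for t in target_nodes] for p in pred_nodes])
    rows, cols = linear_sum_assignment(cost)  # bipartite matching
    P = np.zeros((S, S))
    P[rows, cols] = 1.0
    return P
```

The targets are then reordered with P before computing the cross-entropy loss, so the model is never penalized for producing the right nodes in a different order.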
3. Edge Generation
- use the generated set of node features in this module for edge generation
- given a pair of node features, a prediction head decides whether an edge exists
- Option 1: use a head similar to the one in Node Generation: Query Nodes to generate edges as a sequence of tokens
- this can generate edges unseen during training, at the risk of not matching the target edge token sequence exactly
- Option 2: use a classification head to predict the edges (sketched below)
- if the set of possible relationships is fixed and known, this option is efficient and accurate
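A sketch of Option 2, assuming a fixed relation vocabulary that includes <NO_EDGE> as one of the classes; the head architecture is an assumption, kept deliberately simple:

```python
import torch
import torch.nn as nn

class EdgeClassifier(nn.Module):
    """Classify the relation (or <NO_EDGE>) for every ordered pair of nodes."""

    def __init__(self, d_model=512, num_relations=100):
        super().__init__()
        # num_relations counts <NO_EDGE> as one of the classes
        self.head = nn.Sequential(
            nn.Linear(2 * d_model, d_model),
            nn.ReLU(),
            nn.Linear(d_model, num_relations),
        )

    def forward(self, node_features):
        # node_features: (batch, S, d_model) from the node-generation module
        b, s, d = node_features.shape
        heads = node_features.unsqueeze(2).expand(b, s, s, d)  # edge source
        tails = node_features.unsqueeze(1).expand(b, s, s, d)  # edge target
        pairs = torch.cat([heads, tails], dim=-1)              # (b, S, S, 2d)
        return self.head(pairs)  # (b, S, S, num_relations) relation logits
```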
4. Imbalanced Edge Distribution
- small savings: ignoring self-edges and ignoring edges incident to <NO_NODE> nodes
- <NO_EDGE>: the token emitted when there is no edge between two nodes
- the number of actual edges is small while the number of <NO_EDGE> entries is large
- the generation/classification task is therefore imbalanced towards the <NO_EDGE> token/class
- to address this, modify the cross-entropy loss or change the training paradigm
4.1 FOCAL LOSS
- replace the cross-entropy (CE) loss with the focal loss
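A minimal focal-loss sketch (Lin et al.'s formulation without the class-weighting term); gamma = 2 is the usual default, not a value taken from this paper:

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0):
    """Down-weights easy, confidently classified pairs (mostly <NO_EDGE>),
    focusing the gradient on the rare real edges.

    logits:  (N, num_classes) flattened pairwise edge logits
    targets: (N,) class indices, <NO_EDGE> included as a class
    """
    log_probs = F.log_softmax(logits, dim=-1)
    log_pt = log_probs.gather(1, targets.unsqueeze(1)).squeeze(1)  # log p_t
    pt = log_pt.exp()
    # standard CE scaled by the (1 - p_t)^gamma modulating factor
    return (-((1.0 - pt) ** gamma) * log_pt).mean()
```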
4.2 SPARSE LOSS
- sparsify the adjacency matrix to remove most of the <NO_EDGE> edges, artificially re-balancing the classes (a subsampling sketch follows)
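One way to realize this, sketched under the assumption that the sparsification is a random subsampling of <NO_EDGE> pairs (keep_ratio is an illustrative knob, not a value from the paper):

```python
import torch
import torch.nn.functional as F

def sparse_edge_loss(logits, targets, no_edge_id, keep_ratio=0.1):
    """Cross-entropy over all real edges but only a fraction of <NO_EDGE> pairs.

    logits:  (N, num_classes) flattened pairwise edge logits
    targets: (N,) class indices over the flattened adjacency matrix
    """
    is_edge = targets != no_edge_id
    keep = torch.rand(targets.shape, device=targets.device) < keep_ratio
    mask = is_edge | (keep & ~is_edge)  # all real edges + sampled <NO_EDGE>
    if mask.sum() == 0:  # degenerate batch: fall back to the full loss
        return F.cross_entropy(logits, targets)
    return F.cross_entropy(logits[mask], targets[mask])
```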
- the computational complexity of edge generation grows quadratically, since every pair of nodes must be scored
- since nodes are generated using Transformer-based models, the complexity of the attention mechanism limits the size of the input text the system can handle
- only applied to the English domain