From f516a3623545e3897d994241bdb7a90735c60970 Mon Sep 17 00:00:00 2001
From: wkdtjrrb <103736979+wkdtjrrb@users.noreply.github.com>
Date: Wed, 15 Nov 2023 22:35:07 +0900
Subject: [PATCH 1/4] Delete README.md

---
 README.md | 96 -------------------------------------------------------
 1 file changed, 96 deletions(-)
 delete mode 100644 README.md

diff --git a/README.md b/README.md
deleted file mode 100644
index f88ce58..0000000
--- a/README.md
+++ /dev/null
@@ -1,96 +0,0 @@
-
-
-
-The graph nodes are generated first, using a pretrained language model such as T5. The input text is transformed into a sequence of text entities. The features corresponding to each entity (node) are extracted and then sent to the edge-generation module.
-
-
-Edge construction, using generation (e.g., a GRU) or a classifier head. Blue circles represent the features corresponding to the actual graph edges (solid lines), and the white circles are the features that are decoded into ⟨NO_EDGE⟩ (dashed line).
-
-
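As a rough illustration of the classifier edge head described in the caption (the generation head would instead decode relation strings with, e.g., a GRU), the sketch below scores every ordered node pair against a fixed relation set and drops pairs whose best class is the no-edge label. The relation names, the concatenated pair feature, and the linear weights are all illustrative assumptions, not Grapher's actual implementation:

```python
import itertools

NO_EDGE = "<NO_EDGE>"  # class standing in for "no edge between this pair"
RELATIONS = ["birthPlace", "runtime", NO_EDGE]  # hypothetical label set

def classify_edges(node_feats, weights):
    """Classify every ordered node pair into a relation or <NO_EDGE>.

    node_feats: dict mapping node name -> feature vector (list of floats)
    weights:    dict mapping relation  -> weight vector over the
                concatenated pair feature (a stand-in for a learned
                linear classifier head)
    Pairs whose highest-scoring class is <NO_EDGE> are discarded,
    mirroring the dashed white circles in the figure.
    """
    edges = []
    for (a, fa), (b, fb) in itertools.permutations(node_feats.items(), 2):
        pair = fa + fb  # direction-sensitive feature: concat head and tail
        scores = {r: sum(w * x for w, x in zip(weights[r], pair))
                  for r in RELATIONS}
        best = max(scores, key=scores.get)
        if best != NO_EDGE:
            edges.append((a, best, b))
    return edges
```

In the trained model the node features come from the node-generation stage and the head's weights are learned jointly; here both are placeholders.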
-
-## Environment
-To run this code, please install PyTorch and PyTorch Lightning (we tested the code on PyTorch 1.13 and PyTorch Lightning 1.8.1).
-
-
-## Setup
-Install dependencies:
-```bash
-# clone project
-git clone git@github.com:IBM/Grapher.git
-
-# navigate to the directory
-cd Grapher
-
-# clone an external repository for reading the data
-git clone https://gitlab.com/webnlg/corpus-reader.git corpusreader
-
-# clone another external repository for scoring the results
-git clone https://github.com/WebNLG/WebNLG-Text-to-triples.git WebNLG_Text_to_triples
-```
-
-## Data
-
-WebNLG 3.0 dataset
-```bash
-# download the dataset
-git clone https://gitlab.com/shimorina/webnlg-dataset.git
-```
-
-## How to train
-There are two scripts to run the two versions of the algorithm:
-```bash
-# navigate to the scripts directory
-cd scripts
-
-# run Grapher with the edge generation head
-bash train_gen.sh
-
-# run Grapher with the classifier edge head
-bash train_class.sh
-```
-
-## How to test
-```bash
-# run the test on experiment "webnlg_version_1" using the latest checkpoint last.ckpt
-python main.py --run test --version 1 --default_root_dir output --data_path webnlg-dataset/release_v3.0/en
-
-# run the test on experiment "webnlg_version_1" using the checkpoint at iteration 5000
-python main.py --run test --version 1 --default_root_dir output --data_path webnlg-dataset/release_v3.0/en --checkpoint_model_id 5000
-```
-
-## How to run inference
-```bash
-# run inference on experiment "webnlg_version_1" using the latest checkpoint last.ckpt
-python main.py --run inference --version 1 --default_root_dir output --inference_input_text "Danielle Harris had a main role in Super Capers, a 98 minute long movie."
-```
-
-## Results
-Results can be visualized in TensorBoard:
-```bash
-tensorboard --logdir output
-```
-
-### Citation
-```
-@inproceedings{grapher2022,
-  title={Knowledge Graph Generation From Text},
-  author={Igor Melnyk, Pierre Dognin, Payel Das},
-  booktitle={Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (Findings of EMNLP)},
-  year={2022}
-}
-```
\ No newline at end of file

From 4bda71d397877ed10453b0b0c460d5bf4c8e1fa1 Mon Sep 17 00:00:00 2001
From: wkdtjrrb <103736979+wkdtjrrb@users.noreply.github.com>
Date: Wed, 15 Nov 2023 22:38:04 +0900
Subject: [PATCH 2/4] Create README.md

---
 README.md | 125 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 125 insertions(+)
 create mode 100644 README.md

diff --git a/README.md b/README.md
new file mode 100644
index 0000000..653ab9d
--- /dev/null
+++ b/README.md
@@ -0,0 +1,125 @@
+# Knowledge Graph Generation From Text: Grapher (2022)
+(Igor Melnyk, Pierre Dognin, Payel Das)
+
+## [Abstract]
+
+- An end-to-end, multi-stage knowledge graph (KG) generation system from textual inputs
+
+**Two stages**
+
+- The graph nodes are generated using a pretrained language model
+
+- The relationships are generated using the available entity information
+
+**Challenges**
+
+- Non-unique graph representations
+
+- Complex node and edge structures
+
+## [Properties]
+
+- Uses state-of-the-art language models pretrained on large textual corpora (node generation is the key to the algorithm's performance)
+  - fine-tunes the pretrained Transformer-based language model
+
+- Partitioning the graph-construction process into two steps ensures efficiency (each node and edge is generated only once: unique model)
+
+- End-to-end trainable (avoids the need for any external NLP pipelines)
+
+## [Method]
+
+**1. Node Generation: Text Nodes**
+
+- Objective: generate a set of unique nodes
+
+- Uses a pretrained encoder-decoder (sequence-to-sequence) language model (PLM)
+
+- This module additionally supplies node features for the downstream task of edge generation
+
+- Greedily decodes the generated string and splits it on the separation tokens
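The decoding step above can be sketched in plain Python. This is an illustrative assumption, not the repository's code: the `<sep>` marker and the `parse_nodes` helper are hypothetical stand-ins for Grapher's actual separation tokens and parsing logic.

```python
def parse_nodes(decoded: str, sep: str = "<sep>") -> list[str]:
    """Split a greedily decoded PLM output string into unique node strings.

    The language model emits all entities as one sequence joined by
    separation tokens; we split on `sep`, strip whitespace, and drop
    duplicates while preserving the order of first appearance, so each
    node appears exactly once in the graph.
    """
    seen: set[str] = set()
    nodes: list[str] = []
    for part in decoded.split(sep):
        name = part.strip()
        if name and name not in seen:
            seen.add(name)
            nodes.append(name)
    return nodes

# e.g. parse_nodes("Danielle Harris<sep>Super Capers<sep>98 minutes")
# yields ["Danielle Harris", "Super Capers", "98 minutes"]
```

The deduplication is what makes the node set unique even when the decoder repeats an entity.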