Knowledge Graph Generation From Text

Description

Grapher is an end-to-end multi-stage Knowledge Graph (KG) construction system, that separates the overall generation process into two stages.

The graph nodes are generated first using pretrained language model, such as T5.The input text is transformed into a sequence of text entities. The features corresponding to each entity (node) is extracted and then sent to the edge generation module.

Edge construction, using generation (e.g.,GRU) or a classifier head. Blue circles represent the features corresponding to the actual graph edges (solid lines) and the white circles are the features that are decoded into ⟨NO_EDGE⟩ (dashed line).

Environment

To run this code, please install PyTorch and Pytorch Lightning (we tested the code on Pytorch 1.13 and Pytorch Lightning 1.8.1)

Setup

Install dependencies

# clone project   
git clone [email protected]:IBM/Grapher.git

# navigate to the directory
cd Grapher

# clone an external repository for reading the data
git clone https://gitlab.com/webnlg/corpus-reader.git corpusreader

# clone another external repositories for scoring the results
git clone https://github.com/WebNLG/WebNLG-Text-to-triples.git WebNLG_Text_to_triples

Data

WebNLG 3.0 dataset

# download the dataset   
git clone https://gitlab.com/shimorina/webnlg-dataset.git

How to train

There are two scripts to run two versions of the algorithm

# naviagate to scripts directory
cd scripts

# run Grapher with the edge generation head
bash train_gen.sh

# run Grapher with the classifier edge head
bash train_class.sh

How to test

# run the test on experiment "webnlg_version_1" using latest checkpoint last.ckpt
python main.py --run test --version 1 --default_root_dir output --data_path webnlg-dataset/release_v3.0/en

# run the test on experiment "webnlg_version_1" using checkpoint at iteration 5000
python main.py --run test --version 1 --default_root_dir output --data_path webnlg-dataset/release_v3.0/en --checkpoint_model_id 5000

Results

Results can be visualized in Tensorboard

tensorboard --logdir output

Citation

@inproceedings{grapher2022,
  title={Knowledge Graph Generation From Text},
  author={Igor Melnyk, Pierre Dognin, Payel Das},
  booktitle={Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (Findings of EMNLP)},
  year={2022}
}

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
data		data
imgs		imgs
misc		misc
model		model
scripts		scripts
LICENSE		LICENSE
README.md		README.md
main.py		main.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Knowledge Graph Generation From Text

Description

Environment

Setup

Data

How to train

How to test

Results

Citation

About

Releases

Packages

Languages

License

Cemberk/Grapher

Folders and files

Latest commit

History

Repository files navigation

Knowledge Graph Generation From Text

Description

Environment

Setup

Data

How to train

How to test

Results

Citation

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages