ASSET: A Semi-supervised Approach for Entity Typing in Knowledge Graphs

This repository contains the implementation and datasets of our paper ASSET: A Semi-supervised Approach for Entity Typing in Knowledge Graphs.

Fig. 1: Pipeline of our Semantification Process

Summary

  • In this work, we propose a novel approach for entity typing in knowledge graphs (KGs) that leverages semi-supervised learning from massive unlabeled data.
  • Our approach follows a teacher-student learning paradigm that combines a small amount of labeled data with a large amount of unlabeled data to boost the performance of downstream tasks: the teacher model annotates the unlabeled data with pseudo labels; the student is then trained on the pseudo-labeled data together with the small amount of high-quality labeled data (a minimal sketch of this loop follows this list).
  • We conduct several experiments on two benchmarking datasets (FB15k-ET and YAGO43k-ET); the results demonstrate that our approach outperforms state-of-the-art baselines on the entity typing task.
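
The following is a minimal self-training sketch of the teacher-student loop described above, shown as single-label classification for brevity (entity typing is multi-label in practice). The function name, model choices, and confidence threshold are illustrative assumptions, not the paper's exact setup; X_labeled, y_labeled, and X_unlabeled stand for entity embeddings and type annotations.

import numpy as np
from sklearn.neural_network import MLPClassifier

def teacher_student_round(X_labeled, y_labeled, X_unlabeled, confidence=0.9):
    # 1. Train the teacher on the small, high-quality labeled set.
    teacher = MLPClassifier(hidden_layer_sizes=(256,), max_iter=200)
    teacher.fit(X_labeled, y_labeled)

    # 2. Annotate unlabeled entities with pseudo labels, keeping only
    #    confident predictions.
    probs = teacher.predict_proba(X_unlabeled)
    mask = probs.max(axis=1) >= confidence
    pseudo_X = X_unlabeled[mask]
    pseudo_y = teacher.classes_[probs[mask].argmax(axis=1)]

    # 3. Train the student on the labeled plus pseudo-labeled data.
    student = MLPClassifier(hidden_layer_sizes=(256,), max_iter=200)
    student.fit(np.vstack([X_labeled, pseudo_X]),
                np.concatenate([y_labeled, pseudo_y]))
    return student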

Requirements

python 3.6
scikit-learn 0.24
tensorflow 2.0
scikit-multilearn 0.2.0 
pickle5 0.0.11
pykeen 1.5.0 

Installation

You can install all requirements via pip install -r requirements.txt

Datasets

  • In the data folder, you can find the benchmarking datasets FB15k-ET and YAGO43k-ET.

  • For ConnectE embeddings, we use the source code from its GitHub repository. For further details, we refer users to the installation instructions on the package's website.

  • If you want to experiment with another knowledge graph embedding model, we recommend PyKEEN or GraphVite for generating embeddings for the datasets (see the sketch after this list).

  • Furthermore, we provide the preprocessed files used in our experiments in the data/preprocessed files folder; these can be used directly to evaluate the baselines and our approach.
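
As a rough illustration of the PyKEEN route, the following sketch trains a TransE model with PyKEEN's pipeline API (pykeen 1.5.0, as listed above) and saves the result. The dataset name, model, embedding dimension, and output path are illustrative assumptions; check the PyKEEN documentation for the exact API of your installed version.

from pykeen.pipeline import pipeline

result = pipeline(
    dataset='FB15k',                 # or a TriplesFactory over your own triples
    model='TransE',
    model_kwargs=dict(embedding_dim=100),
    training_kwargs=dict(num_epochs=100, batch_size=128),
)
result.save_to_directory('embeddings/fb15k_transe')  # trained model + metadata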

How to run

  • We provide the source code in Python in the src folder. Users can download the code, use it in their favorite IDE, and configure it with different models and datasets.

  • As examples, we provide two Jupyter notebooks with descriptions for FB15k-ET and YAGO43k-ET in the notebook folder. First, users should install the required libraries, then locate the data files (e.g., the pre-trained embedding models and the ground-truth labels); a loading sketch follows.
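
A minimal sketch of this loading step, assuming pickle5 (listed above); the file names are hypothetical, so substitute the actual files from the data/preprocessed files folder.

import pickle5 as pickle

with open('data/preprocessed files/fb15k_et_embeddings.pkl', 'rb') as f:
    entity_embeddings = pickle.load(f)   # mapping/array: entity -> embedding
with open('data/preprocessed files/fb15k_et_labels.pkl', 'rb') as f:
    type_labels = pickle.load(f)         # ground-truth type annotations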

Hyper-parameters

The following are our optimal values for the hyper-parameters used in the experiments:

epochs = 100        # Maximum number of training epochs
patience = 3        # Epochs without improvement before early stopping
batch_size = 128    # Number of sequences per batch during training
lr = 0.001          # Learning rate of the Adam optimizer
dropout = 0.20      # Dropout rate in the deep neural model
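
The following sketch shows how these values could be wired into a TensorFlow 2.x Keras training setup. The layer sizes, input dimension, and placeholder data are illustrative assumptions, not the exact architecture from the paper.

import numpy as np
import tensorflow as tf

num_types, emb_dim = 50, 200                      # hypothetical sizes
X_train = np.random.rand(1000, emb_dim).astype('float32')        # placeholder data
y_train = np.random.randint(0, 2, (1000, num_types)).astype('float32')
X_val, y_val = X_train[:100], y_train[:100]

model = tf.keras.Sequential([
    tf.keras.layers.Dense(512, activation='relu', input_shape=(emb_dim,)),
    tf.keras.layers.Dropout(0.20),                # dropout = 0.20
    tf.keras.layers.Dense(num_types, activation='sigmoid'),  # multi-label output
])
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),  # lr = 0.001
    loss='binary_crossentropy',
)
early_stop = tf.keras.callbacks.EarlyStopping(
    patience=3, restore_best_weights=True)        # patience = 3
model.fit(X_train, y_train,
          epochs=100,                             # epochs = 100
          batch_size=128,                         # batch_size = 128
          validation_data=(X_val, y_val),
          callbacks=[early_stop])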

Contact

If you have any feedback or suggestions, feel free to send an email to [email protected].

Cite

TBD