# ASSET: A Semi-supervised Approach for Entity Typing in Knowledge Graphs

This repository contains the implementation and dataset of our paper *ASSET: A Semi-supervised Approach for Entity Typing in Knowledge Graphs*.

## Summary

- In this work, we propose a novel approach for entity typing in knowledge graphs (KGs) that leverages semi-supervised learning from massive unlabeled data.
- Our approach follows a teacher-student learning paradigm that combines a small amount of labeled data with a large amount of unlabeled data to boost performance: the teacher model annotates the unlabeled data with pseudo-labels; the student is then trained on the pseudo-labeled data together with the small amount of high-quality labeled data.
- We conduct several experiments on two benchmark datasets (FB15k-ET and YAGO43k-ET).

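The teacher-student paradigm described above can be sketched in a few lines. This is a minimal illustration with scikit-learn on synthetic data, not the paper's actual models: the classifier choice, synthetic dataset, split sizes, and the 0.8 confidence threshold are all illustrative assumptions.

```python
# Minimal teacher-student pseudo-labelling sketch (illustrative only).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
# Small labeled set, large unlabeled pool, held-out test set (assumed split).
X_lab, y_lab = X[:50], y[:50]
X_unlab = X[50:800]
X_test, y_test = X[800:], y[800:]

# 1) Train the teacher on the small labeled set.
teacher = LogisticRegression(max_iter=1000).fit(X_lab, y_lab)

# 2) The teacher annotates the unlabeled pool with pseudo-labels,
#    keeping only confident predictions (threshold is an assumption).
probs = teacher.predict_proba(X_unlab)
confident = probs.max(axis=1) >= 0.8
pseudo_X = X_unlab[confident]
pseudo_y = teacher.predict(X_unlab)[confident]

# 3) Train the student on labeled + pseudo-labeled data.
student = LogisticRegression(max_iter=1000).fit(
    np.vstack([X_lab, pseudo_X]),
    np.concatenate([y_lab, pseudo_y]),
)
acc = student.score(X_test, y_test)
```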
## Requirements

```
python 3.6
scikit-learn 0.24
tensorflow 2.0
scikit-multilearn 0.2.0
pickle5 0.0.11
pykeen 1.5.0
```

## Installation

You can install all requirements via `pip install -r requirements.txt`.

## Datasets

* In the `data` folder, you can find the benchmark datasets `FB15k-ET` and `YAGO43k-ET`.

* For the ConnectE embeddings, we use the source code from its [Github repository](https://github.com/Adam1679/ConnectE). For further details, we refer users to the installation instructions on the packages' websites.

* If you want to experiment with another knowledge graph embedding model, we recommend [Pykeen](https://pykeen.github.io/) or [Graphvite](https://graphvite.io/) for generating embeddings for the datasets.

* Furthermore, we provide the preprocessed files used in our experiments in the `data/preprocessed files` folder; they can be used directly to evaluate the baselines and our approach.

## How to run

- We provide a Jupyter notebook with a description for each dataset in the `notebook` folder. First, users should install the required libraries, then locate the data files (e.g., the pre-trained embedding models and the ground-truth labels).
- We also provide the Python source code in the `src` folder. Users can download the code and use it in their favorite IDE.
- To reproduce our results quickly, we provide scripts that users can run from the command line:

```
python scripts/FB15K_ET.py  # for evaluation on the FB15k dataset
python scripts/YAGO_ET.py   # for evaluation on the YAGO43k dataset
```

## Hyper-parameters

The following are the optimal values for the hyper-parameters used in our experiments:

```
epochs = 100      # Maximum number of training epochs
patience = 3      # Epochs without improvement before early stopping
batch_size = 128  # Number of sequences in each batch during training
lr = 0.001        # Learning rate of the Adam optimizer
dropout = 0.25    # Dropout rate in the deep neural model
```

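To illustrate how the `patience` value interacts with the epoch limit, here is a minimal sketch of patience-based early stopping in plain Python. The helper name and the validation-loss curve are hypothetical; in practice this logic is typically delegated to the training framework's early-stopping callback.

```python
def early_stop_epoch(val_losses, patience=3):
    """Return the 1-based epoch at which training stops, or None.

    Training stops once the validation loss has not improved for
    `patience` consecutive epochs.
    """
    best = float("inf")
    bad_epochs = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            bad_epochs = 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                return epoch
    return None  # ran to completion without triggering early stopping

# Hypothetical validation-loss curve: improves, then stalls for 3 epochs.
losses = [0.9, 0.7, 0.6, 0.65, 0.64, 0.66]
print(early_stop_epoch(losses, patience=3))  # -> 6
```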
## Contact

If you have any feedback or suggestions, feel free to send an email to [email protected].

## Cite

TBD