# ASSET: A Semi-supervised Approach for Entity Typing in Knowledge Graphs
This repository contains the implementation and datasets of our paper *ASSET: A Semi-supervised Approach for Entity Typing in Knowledge Graphs*.

## Summary
- In this work, we propose a novel approach for entity typing in knowledge graphs that leverages semi-supervised learning from massive unlabeled data.
- Our approach follows a teacher-student learning paradigm that combines a small amount of labeled data with a large amount of unlabeled data to boost performance: the teacher model annotates the unlabeled data with pseudo labels; the student is then trained on the pseudo-labeled data together with the small amount of high-quality labeled data (a minimal sketch follows this list).
- We conduct several experiments on two benchmark datasets (FB15k-ET and YAGO43k-ET).
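
As an illustration of this paradigm, here is a minimal, self-contained sketch of teacher-student pseudo-labeling using scikit-learn; the classifier, confidence threshold, and toy data are assumptions for exposition, not our exact implementation:
```
# Illustrative teacher-student pseudo-labeling loop (toy data, assumed threshold).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.RandomState(42)

# Toy stand-ins: entity embeddings with a small labeled set and a large unlabeled pool.
X_labeled = rng.randn(100, 16)
y_labeled = (X_labeled[:, 0] > 0).astype(int)  # toy binary "type" labels
X_unlabeled = rng.randn(5000, 16)

# 1) Train the teacher on the small, high-quality labeled set.
teacher = LogisticRegression(max_iter=1000).fit(X_labeled, y_labeled)

# 2) The teacher annotates the unlabeled pool; keep only confident pseudo labels.
proba = teacher.predict_proba(X_unlabeled)
confident = proba.max(axis=1) > 0.9  # confidence threshold is an assumed value
X_pseudo = X_unlabeled[confident]
y_pseudo = proba[confident].argmax(axis=1)

# 3) Train the student on the pseudo-labeled data plus the labeled data.
X_train = np.vstack([X_labeled, X_pseudo])
y_train = np.concatenate([y_labeled, y_pseudo])
student = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(f"Student trained on {len(X_train)} examples ({int(confident.sum())} pseudo-labeled)")
```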

## Requirements
```
python 3.6
scikit-learn 0.24
tensorflow 2.0
scikit-multilearn 0.2.0
pickle5 0.0.11
pykeen 1.5.0
```
## Installation
You can install all requirements via ```pip install -r requirements.txt```

## Datasets

* The `data` folder contains the benchmark datasets `FB15k-ET` and `YAGO43k-ET`.

* For ConnectE embeddings, we use the source code from its [GitHub repository](https://github.com/Adam1679/ConnectE). For installation details, we refer users to the instructions on the respective packages' websites.

* If you want to experiment with another knowledge graph embedding model, we recommend [Pykeen](https://pykeen.github.io/) or [Graphvite](https://graphvite.io/) for generating embeddings for the datasets (see the sketch after this list).

* Furthermore, the ```data/preprocessed files``` folder contains the preprocessed files used in our experiments; they can be used directly to evaluate the baselines and our approach.
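
For illustration, a minimal sketch of training embeddings with PyKEEN's `pipeline` API (pykeen 1.5); the model choice, epoch count, and output directory are placeholder assumptions:
```
# Minimal PyKEEN sketch; model, epochs, and output path are illustrative choices.
from pykeen.pipeline import pipeline

result = pipeline(
    dataset="FB15k",        # built-in PyKEEN dataset
    model="TransE",         # any PyKEEN model can be swapped in
    training_kwargs=dict(num_epochs=100),
)

# Persist the trained model; entity embeddings can then be extracted from it.
result.save_to_directory("embeddings/transe_fb15k")
```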

## How to run
- We provide a Jupyter notebook with a description for each dataset in the ```notebook``` folder. First, install the required libraries, then set the paths to the data files (e.g., the pre-trained embedding files and the ground-truth labels).
- We also provide the Python source code in the ```src``` folder. Users can download the code and use it in their favorite IDE.
- To reproduce our results quickly, we provide scripts that can be run from the command line:
```
python scripts/FB15K_ET.py # for evaluation on the FB15k-ET dataset
python scripts/YAGO_ET.py  # for evaluation on the YAGO43k-ET dataset
```
## Hyper-parameters
The following are our optimal values for the hyper-parameters used in the experiments:

```
epochs = 100      # Maximum number of training epochs
patience = 3      # Early stopping: epochs without improvement before training stops
batch_size = 128  # Number of sequences per training batch
lr = 0.001        # Learning rate of the Adam optimizer
dropout = 0.25    # Dropout rate in the deep neural model
```
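
For illustration, a minimal sketch of how these values could be wired into a TensorFlow 2 / Keras classifier; the input dimension, layer widths, and number of types are placeholder assumptions, not our exact architecture:
```
# Illustrative wiring of the hyper-parameters above into a Keras model;
# shapes and layer sizes are assumptions, not our exact model.
import tensorflow as tf

epochs, patience, batch_size, lr, dropout = 100, 3, 128, 0.001, 0.25

model = tf.keras.Sequential([
    # Input dimension (entity embedding size) and layer width are assumed values.
    tf.keras.layers.Dense(256, activation="relu", input_shape=(200,)),
    tf.keras.layers.Dropout(dropout),
    tf.keras.layers.Dense(50, activation="sigmoid"),  # one score per type (50 is a placeholder)
])
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=lr),
    loss="binary_crossentropy",  # multi-label entity typing
)

# Early stopping implements the `patience` setting.
early_stop = tf.keras.callbacks.EarlyStopping(patience=patience, restore_best_weights=True)

# model.fit(X_train, y_train, validation_data=(X_val, y_val),
#           epochs=epochs, batch_size=batch_size, callbacks=[early_stop])
```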

## Contact
If you have any feedback or suggestions, feel free to send an email to [email protected].

## Cite
TBD