GEEER

Code supporting the paper Graph-Embedding Empowered Entity Retrieval

This repository contains resources developed within the following paper:

Graph-Embedding Empowered Entity Retrieval, Emma Gerritse, Faegheh Hasibi and Arjen de Vries

This repository is structured in the following way:

Code/ : Contains the code for computing scores (entity_score.py), a notebook for the visualisation (Embedding_quality.ipynb), and two scripts for scoring (rankscore.sh and ranklib_to_trec.py)
Data/ : Contains the linked entities used and the wikipedia redirects used
Runs/ : Contains all the runs used in the paper

Requirements

Running the code requires Python 3.

Embeddings

If one simply wants to download the embeddings, they can be accessed:

Wikipedia2vec embeddings with graph component, result of the following command:

wikipedia2vec train --min-entity-count 0 --disambi enwiki-20190701-pages-articles-multistream.xml.bz2 wikipedia2vec_trained

and Wikipedia2vec embeddings without graph component , result of the following command:

wikipedia2vec train --min-entity-count 0 --disambi --no-link-graph enwiki-20190701-pages-articles-multistream.xml.bz2 wikipedia2vec_trained

The embeddings can then be loaded in Python with Gensim:

import gensim
model = gensim.models.KeyedVectors.load("WKN-vectors.bin", mmap='r')

Quickstart

To download all the auxilary files (Ranklib, DBpedia Entity V2 and the embeddings), please use the following command:

bash build.sh

To then reproduce the results, first make sure to install all the neccessary packages with:

pip install -r requirements.txt

And then run

bash Code/reproduce.sh

The results will be stored in the folder /Output

Code for computing the embedding based score

To compute just the embedding based score, use the following function:

python Code/entity_score.py embeddingfile outputfile [pathtodbpedia]

python Code/entity_score.py src/WKN-vectors/WKN-vectors.bin output.txt src/DBpedia-Entity/runs/v2/bm25f-ca_v2.run

Code for scoring with Ranklib

If you want to run Ranklib with 5 folds afterwards, use the following function:

python Code/entity_score_folds.py embeddingfile outputfolder outputfile [pathtodbpedia]

So for example

python Code/entity_score_folds.py src/WKN-vectors/WKN-vectors.bin Outputfolder output.txt src/DBpedia-Entity/runs/v2/bm25f-ca_v2.run

To do the coordinate ascent and ranking of these files, please run the following script with the Outputfolder from the previous line:

bash Code/train_ranklib.sh Outputfolder

bash Code/score_ranklib.sh Outputfolder

The first script will train Ranklib, and the second script will score according to Ranklib and will result in the ranking and the trec_eval scores of the ranking.

Cite

@inproceedings{Gerritse:2020:GEEER, 
   author =    {Gerritse, Emma and Hasibi, Faegheh and De Vries, Arjen},
   title =     {Graph-Embedding Empowered Entity Retrieval},
   booktitle={European Conference on Information Retrieval},
   series =    {ECIR '20},
   year =      {2020},
   publisher = {Springer},
}

Contact

If you have any questions, please contact Emma Gerritse at [email protected]

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

GEEER

Requirements

Embeddings

Quickstart

Code for computing the embedding based score

Code for scoring with Ranklib

Cite

Contact

About

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 14 Commits
Code		Code
Data		Data
Runs		Runs
README.md		README.md
build.sh		build.sh
requirements.txt		requirements.txt

informagi/GEEER

Folders and files

Latest commit

History

Repository files navigation

GEEER

Requirements

Embeddings

Quickstart

Code for computing the embedding based score

Code for scoring with Ranklib

Cite

Contact

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages