What is TaxoRef?

TaxoRef is a methodology for taxonomy refinement via word embeddings. To select the best embedding for the refinement task, several vector models are generated and evaluated on their ability to represent taxonomic similarity relations. TaxoRef is presented in the article "TaxoRef: Embeddings Evaluation for AI-driven Taxonomy Refinement" (ECML 2021).

Please cite TaxoRef as: Malandri, L., Mercorio, F., Mezzanzanica, M., & Nobani, N. TaxoRef: Embeddings Evaluation for AI-driven Taxonomy Refinement. Joint European Conference on Machine Learning and Knowledge Discovery in Databases (2021).

@inproceedings{malandri2021taxoref,
  title={TaxoRef: Embeddings Evaluation for AI-driven Taxonomy Refinement},
  author={Malandri, Lorenzo and Mercorio, Fabio and Mezzanzanica, Mario and Nobani, Navid},
  booktitle={Joint European Conference on Machine Learning and Knowledge Discovery in Databases},
  year={2021},
  organization={Springer}
}

TaxoRef for Labour Market

TaxoRef is a general tool that can be applied to any taxonomy. The only prerequisite is a text corpus on which to train word embeddings. In our case, it has been used in the Labour Market domain to propose a refinement of the European Skills, Competences, Qualifications and Occupations (ESCO) taxonomy, training the embeddings on a corpus of more than 2.2M Online Job Vacancies (OJVs) related to the ICT market and collected in the UK in 2018.

The TaxoRef framework

Quick start

The notebook TaxoRef.ipynb contains the complete, commented code. To run it, clone this repository and open TaxoRef.ipynb.

The embedding generation and selection phase, thoroughly described in the article, is not included because it can be built for any corpus-taxonomy pair by following the description in the paper.
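Since the selection phase is not part of the notebook, here is a minimal sketch of what it could look like. It assumes that each candidate model is scored by correlating the cosine similarity of term pairs with a taxonomy-based similarity score, and that the model with the highest correlation is kept. The function names, the use of Pearson correlation, and the toy similarity scores are illustrative assumptions, not the procedure from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def score_model(vectors, taxonomic_similarity):
    """Correlate embedding similarity with taxonomy-based similarity."""
    # Only term pairs covered by the model's vocabulary can be compared.
    pairs = [p for p in taxonomic_similarity
             if p[0] in vectors and p[1] in vectors]
    embedding_sims = [cosine(vectors[a], vectors[b]) for a, b in pairs]
    gold_sims = [taxonomic_similarity[p] for p in pairs]
    return pearson(embedding_sims, gold_sims)


def select_best_model(models, taxonomic_similarity):
    """Return the name of the best-scoring model and all scores."""
    scores = {name: score_model(vectors, taxonomic_similarity)
              for name, vectors in models.items()}
    return max(scores, key=scores.get), scores


# Toy data (hypothetical): taxonomy-based similarity for three term pairs
# and two candidate embedding models with 2-dimensional vectors.
taxonomic_similarity = {
    ("java", "python"): 0.9,
    ("java", "teamwork"): 0.1,
    ("python", "teamwork"): 0.1,
}
models = {
    "model_a": {"java": [1.0, 0.0], "python": [0.9, 0.1], "teamwork": [0.0, 1.0]},
    "model_b": {"java": [1.0, 0.0], "python": [0.0, 1.0], "teamwork": [0.9, 0.1]},
}

best_name, scores = select_best_model(models, taxonomic_similarity)
print(best_name, scores)
```

In this toy setup, model_a places "java" and "python" close together, as the taxonomy does, so it obtains the higher correlation and is selected.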

Step1: data preprocessing

The vector model and the taxonomic terms, together with the ESCO group each term belongs to (denoted IscoGroup), are imported. A new table is created with one row for each taxonomic term in the embedding vocabulary. The other columns are, in order, "IscoGroup", "word vector" and "sample"; the last one is the representation used as input for the refinement, obtained by concatenating the vector representation of the term with its class (IscoGroup).
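The table described above could be built roughly as follows. This is a sketch under stated assumptions: the vectors are assumed to be stored in the textual .vec layout (a "count dimension" header followed by one "term v1 ... vd" line per term), and `load_vec`, `build_table`, the toy vectors and the term-to-group mapping are hypothetical names and data, not code taken from the notebook.

```python
import io


def load_vec(handle):
    """Parse a textual .vec file: a 'count dim' header, then one term per line."""
    _count, dim = map(int, handle.readline().split())
    vectors = {}
    for line in handle:
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            continue  # skip malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def build_table(vectors, term_to_group):
    """One row per taxonomic term that appears in the embedding vocabulary."""
    rows = []
    for term, group in term_to_group.items():
        if term not in vectors:
            continue  # term has no embedding, so it cannot be refined
        vector = vectors[term]
        rows.append({
            "term": term,
            "IscoGroup": group,
            "word vector": vector,
            # "sample": the word vector concatenated with its class label
            "sample": vector + [group],
        })
    return rows


# Toy input (hypothetical): three 2-dimensional vectors and a mapping from
# terms to group codes; "welding" is not in the embedding vocabulary.
vec_file = io.StringIO("3 2\njava 0.9 0.1\npython 0.8 0.2\nteamwork 0.1 0.9\n")
term_to_group = {"java": "2512", "python": "2512", "welding": "7212"}

table = build_table(load_vec(vec_file), term_to_group)
for row in table:
    print(row)
```

Terms without an embedding are dropped because the refinement step needs a vector for every row.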

Step2: Refinement

We perform the refinement by computing class probabilities through Bayes' formula, and we produce a table comparing the origin class (the one in ESCO) with the destination class (the one with the highest class probability, i.e. the TaxoRef suggestion).
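To make the idea concrete, the sketch below applies Bayes' formula, P(class | vector) ∝ P(class) · P(vector | class), to a toy table. The likelihood model is an assumption made for illustration: each class is modelled as a Gaussian with independent dimensions, which may differ from the estimator used in the notebook. The function names, the variance floor and the toy rows are likewise hypothetical.

```python
import math
from collections import defaultdict


def fit_classes(rows, var_floor=1e-3):
    """Estimate a prior, per-dimension means and variances for each class."""
    by_class = defaultdict(list)
    for row in rows:
        by_class[row["IscoGroup"]].append(row["word vector"])
    params = {}
    for group, vecs in by_class.items():
        n, dim = len(vecs), len(vecs[0])
        means = [sum(v[i] for v in vecs) / n for i in range(dim)]
        # The floor keeps variances strictly positive for tiny classes.
        variances = [sum((v[i] - means[i]) ** 2 for v in vecs) / n + var_floor
                     for i in range(dim)]
        params[group] = (n / len(rows), means, variances)
    return params


def class_probabilities(vector, params):
    """Posterior P(class | vector) via Bayes' formula, computed in log space."""
    log_joint = {}
    for group, (prior, means, variances) in params.items():
        log_lik = sum(
            -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)
            for x, mu, var in zip(vector, means, variances)
        )
        log_joint[group] = math.log(prior) + log_lik
    top = max(log_joint.values())  # subtract the max for numerical stability
    unnorm = {g: math.exp(v - top) for g, v in log_joint.items()}
    total = sum(unnorm.values())
    return {g: p / total for g, p in unnorm.items()}


def refine(rows):
    """Compare each term's origin class with its most probable class."""
    params = fit_classes(rows)
    result = []
    for row in rows:
        probs = class_probabilities(row["word vector"], params)
        destination = max(probs, key=probs.get)
        result.append({"term": row["term"], "origin": row["IscoGroup"],
                       "destination": destination,
                       "probability": probs[destination]})
    return result


# Toy table (hypothetical): "negotiation" sits in group 2512 although its
# vector lies close to the terms of group 2431.
rows = [
    {"term": "java", "IscoGroup": "2512", "word vector": [0.9, 0.1]},
    {"term": "python", "IscoGroup": "2512", "word vector": [0.8, 0.2]},
    {"term": "c++", "IscoGroup": "2512", "word vector": [0.85, 0.15]},
    {"term": "negotiation", "IscoGroup": "2512", "word vector": [0.1, 0.9]},
    {"term": "marketing", "IscoGroup": "2431", "word vector": [0.15, 0.85]},
    {"term": "sales", "IscoGroup": "2431", "word vector": [0.05, 0.95]},
    {"term": "advertising", "IscoGroup": "2431", "word vector": [0.1, 0.88]},
]

refinement_table = refine(rows)
moved = [r for r in refinement_table if r["origin"] != r["destination"]]
print(moved)
```

Rows whose destination differs from their origin are the candidate refinements; in the toy data only "negotiation" is suggested to move.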

Step3: Example of Refinement

In this section we select a single ESCO group to better observe the refinement suggested by TaxoRef.
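Inspecting one group amounts to filtering the origin/destination table. The snippet below is a small illustration; the column names "origin" and "destination", the helper `refinements_for_group` and the sample rows are assumptions, not the notebook's actual output.

```python
def refinements_for_group(refinement_table, group):
    """Terms of one origin group that TaxoRef would move to another group."""
    return [row for row in refinement_table
            if row["origin"] == group and row["destination"] != group]


# Hypothetical refinement table with illustrative group codes.
example_table = [
    {"term": "java", "origin": "2512", "destination": "2512"},
    {"term": "negotiation", "origin": "2512", "destination": "2431"},
    {"term": "sales", "origin": "2431", "destination": "2431"},
]

suggested = refinements_for_group(example_table, "2512")
for row in suggested:
    print(f'{row["term"]}: {row["origin"]} -> {row["destination"]}')
```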
