What is this repo for?

This repository contains code and data to reproduce the results presented in the paper: Unsupervised Word Polysemy Quantification with Multiresolution Grids of Contextual Embeddings (Xypolopoulos et al. 2020). link to arXiv

Method

The number of senses of a given word, or polysemy, is a very subjective notion, which varies widely across annotators and resources.
We propose a novel method to estimate polysemy, based on simple geometry in the contextual embedding space. Our approach is fully unsupervised and purely data-driven, which makes it applicable to any language.
We show through rigorous experiments that our rankings are well correlated (with strong statistical significance) with 6 different rankings derived from famous human-constructed resources such as WordNet, OntoNotes, Oxford, Wikipedia etc., for 6 different standard metrics. We also visualize and analyze the correlation between the human rankings.
A valuable by-product of our method is the ability to sample, at no extra cost, sentences containing different senses of a given word.

External Resources

WordNet
Wikipedia
Ontonotes (requires application)
WordNet-Domains (requires application)
Deep contextualized word embeddings

We used AllenAI's Embeddings from Language Models (ELMo) and specifically the command line tool provided by the authors.

Cite 👍

If you use some of the code in this repository, or simply if you want to refer to our paper, please cite:

@article{xypolopoulos2020unsupervised,
    title={Unsupervised Word Polysemy Quantification with Multiresolution Grids of Contextual Embeddings},
    author={Christos Xypolopoulos and Antoine J. -P. Tixier and Michalis Vazirgiannis},
    year={2020},
    eprint={2003.10224},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

Xypolopoulos, Christos, Antoine J-P. Tixier, and Michalis Vazirgiannis. "Unsupervised Word Polysemy Quantification with Multiresolution Grids of Contextual Embeddings." arXiv preprint arXiv:2003.10224 (2020).

Acknowledgements 👍

We thank Giannis Nikolentzos for helpful discussions about pyramid matching.
This work was supported by the LinTo project.
We gratefully acknowledge the support of NVidia Corporation that donated the Titan V GPU used for this research.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

What is this repo for?

Method

External Resources

Cite 👍

Acknowledgements 👍

Files

README.md

Latest commit

History

README.md

File metadata and controls

What is this repo for?

Method

External Resources

Cite 👍

Acknowledgements 👍