GloVe/experimental at master · kaarinaaho/GloVe


assets
corpora
embeddings
multi_fold
README.md
utils.py


GloVe/experimental

Additional functionality for the GloVe repo, used to compare embedding initialisations across a range of text corpora. The focus of this project is child-directed speech, so many of the corpora of interest are child-oriented; others (enwik8, text8) serve as a point of comparison. Scripts for processing corpora into a format suitable for the GloVe algorithm (text pre-processing, one document per line, etc.) are in experimental/corpora.
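As a rough illustration of the "one document per line" format, the sketch below lowercases each document, keeps only runs of letters and apostrophes, and writes each document as a single line of space-separated tokens. The helper names `preprocess_document` and `write_corpus` and the tokenisation rule are hypothetical assumptions for illustration; the actual scripts in experimental/corpora may clean text differently.

```python
import re

# Hypothetical tokeniser (an assumption, not the repo's rule):
# lowercase the text and keep runs of letters and apostrophes.
_TOKEN = re.compile(r"[a-z']+")


def preprocess_document(text: str) -> str:
    """Return one document as a single line of space-separated tokens.

    Any newlines inside the document are dropped, so the whole document
    ends up on one line of the output corpus.
    """
    tokens = _TOKEN.findall(text.lower())
    return " ".join(tokens)


def write_corpus(documents, path: str) -> None:
    """Write an iterable of raw documents to path, one document per line."""
    with open(path, "w", encoding="utf-8") as f:
        for doc in documents:
            line = preprocess_document(doc)
            if line:  # skip documents that are empty after cleaning
                f.write(line + "\n")
```

For example, `preprocess_document("Look at the DOGGY!\nWhere's it gone?")` yields `look at the doggy where's it gone`, so a multi-utterance transcript collapses into one corpus line.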

Much of this code runs repeated instantiations of the GloVe algorithm on different subsets of each corpus, to estimate how stable the embeddings are across corpus samples. The sampling and embedding scripts are in experimental/multi_fold.
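One way to picture the repeated sampling is to split a corpus's documents (one per line) into disjoint folds, write each fold out as its own corpus file, and train GloVe on each separately. The sketch below only produces the fold indices. It assumes disjoint, roughly equal folds drawn with a fixed seed; `make_folds` is a hypothetical helper, not a function from experimental/multi_fold, and the real scripts may sample differently (for example with replacement or with overlapping subsets).

```python
import random


def make_folds(n_documents: int, n_folds: int, seed: int = 0):
    """Split document indices 0..n_documents-1 into n_folds disjoint folds.

    Hypothetical helper: shuffles the indices with a seeded RNG so the
    split is reproducible, then deals them out round-robin so fold sizes
    differ by at most one.
    """
    indices = list(range(n_documents))
    rng = random.Random(seed)
    rng.shuffle(indices)
    # Fold i takes every n_folds-th shuffled index, starting at position i.
    return [sorted(indices[i::n_folds]) for i in range(n_folds)]
```

Each fold's lines would then be written to a separate corpus file and passed through the usual GloVe pipeline; comparing the resulting embeddings across folds gives the stability estimate. Fixing the seed keeps the samples reproducible between runs.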

Comparing the resulting embeddings to the pre-trained embeddings from Stanford NLP is also central to this work; these are included as text files in experimental/assets. Because we were interested in how these statistics differ between early- and late-acquired concepts, we also use age-of-acquisition data from Frank et al., which is likewise included in experimental/assets.
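Because independently trained embeddings live in different vector spaces, a comparison has to look at relationships between words rather than raw coordinates. The sketch below assumes one such measure, the Jaccard overlap of a word's top-k cosine nearest neighbours in two embeddings (restricted to their shared vocabulary), averaged separately over early- and late-acquired words using a hypothetical age threshold. It reads the plain-text GloVe format (a word followed by its vector components on each line). All function names, the overlap measure and the threshold split are illustrative assumptions, not the statistics computed in this repo.

```python
import math


def load_glove_text(lines):
    """Parse GloVe text format: each line is a word followed by its floats."""
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def _cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_k_neighbours(word, vectors, vocab, k):
    """The k words in vocab (excluding word) most cosine-similar to word."""
    scored = [(w, _cosine(vectors[word], vectors[w])) for w in vocab if w != word]
    scored.sort(key=lambda pair: (-pair[1], pair[0]))  # ties broken alphabetically
    return {w for w, _ in scored[:k]}


def neighbour_overlap(word, emb_a, emb_b, k=10):
    """Jaccard overlap of word's top-k neighbours in two embeddings,
    searched only over the vocabulary the two embeddings share."""
    shared = sorted(set(emb_a) & set(emb_b))
    a = top_k_neighbours(word, emb_a, shared, k)
    b = top_k_neighbours(word, emb_b, shared, k)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def mean_overlap_by_aoa(emb_a, emb_b, aoa, threshold, k=10):
    """Mean neighbour overlap for early (age <= threshold) and late words.

    aoa maps word -> age of acquisition, in whatever units the AoA data
    uses; words missing from either embedding are skipped.
    """
    groups = {"early": [], "late": []}
    for word, age in aoa.items():
        if word in emb_a and word in emb_b:
            key = "early" if age <= threshold else "late"
            groups[key].append(neighbour_overlap(word, emb_a, emb_b, k))
    return {g: (sum(v) / len(v) if v else float("nan")) for g, v in groups.items()}
```

With real data, `emb_a` would come from one fold's trained vectors and `emb_b` from a pre-trained file in experimental/assets; the pure-Python neighbour search is fine for small vocabularies but would need a vectorised version for full-size embeddings.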
