The optimality of word lengths

This repository supports two articles that grew out of the master's-level course Introduction to Quantitative Linguistics (IQL) at the Universitat Politècnica de Catalunya (spring semester, 2022). Specifically:

Direct and indirect evidence of compression of word lengths. Zipf's law of abbreviation revisited (arXiv:2303.10128)
The optimality of word lengths. Theoretical foundations and an empirical study (arXiv:2208.10384)

Authors

Sonia Petrini
Antoni Casas-i-Muñoz
Jordi Cluet-i-Martinell
Mengxue Wang
Christian Bentz
Ramon Ferrer-i-Cancho

Repository organization

The repository contains the following folders:

code: all the R and Python code developed to preprocess and analyze the data (R scripts must be run with the parent directory of code, i.e. the repository root, as the working directory)
data: the Common Voice Forced Alignments and Parallel Universal Dependencies datasets, both filtered (filtered subfolder) and unfiltered (non_filtered subfolder), as described in the paper. The other subfolder contains additional material used throughout the project
figures: figures produced for the paper, both using the filtered data (filtered subfolder) and the non-filtered data (non_filtered subfolder)
latex_tables: LaTeX tables produced for the paper, both using the filtered data (filtered subfolder) and the non-filtered data (non_filtered subfolder)
results: csv files obtained from the analysis, both using the filtered data (filtered subfolder) and the non-filtered data (non_filtered subfolder)
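Because data, figures, latex_tables and results all share the same filtered/non_filtered split, switching between the two variants only changes one path component. The following Python function is a minimal sketch of that convention, assuming only the folder and subfolder names listed above; variant_dir is a hypothetical helper for illustration and is not part of the repository.

```python
from pathlib import Path

# Top-level folders that, per this README, each contain a filtered/ and a
# non_filtered/ subfolder. (Hypothetical constant, for illustration only.)
VARIANT_FOLDERS = ("data", "figures", "latex_tables", "results")


def variant_dir(repo_root, folder, filtered=True):
    """Return the filtered or non-filtered subfolder of a top-level folder.

    Hypothetical helper; it only joins path components and does not check
    that the directory exists on disk.
    """
    if folder not in VARIANT_FOLDERS:
        raise ValueError(f"unknown folder: {folder!r}")
    subfolder = "filtered" if filtered else "non_filtered"
    return Path(repo_root) / folder / subfolder


# Example: locate both variants of the results folder from the repository root.
root = Path(".")
print(variant_dir(root, "results"))
print(variant_dir(root, "results", filtered=False))
```

Keeping the variant as a single boolean argument makes it easy to rerun the same analysis on both versions of the data and compare the outputs.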

Branches

The two branches correspond to the first and the second article, respectively. The pud data differ slightly between the branches because we improved their preprocessing after the first article was published. However, the changes are minimal, affect only a few languages, and do not change the qualitative results.

Notes

Throughout the repository, pud stands for the Parallel Universal Dependencies collection and cv stands for the Common Voice Forced Alignments collection.
