fuzheado/plural-singular-mapping

Testing plural to singular mapping for art thesaurus terms that need OpenRefine reconciliation (by Andrew Lih, User:Fuzheado)

In the GLAM (galleries, libraries, archives, and museums) domain, popular art vocabularies typically list terms in plural and/or capitalized form, such as "Mice" rather than "mouse" or "Tapestries" rather than "tapestry."

When these plural, capitalized terms are reconciled against Wikidata in OpenRefine, the results are often unsatisfactory: they tend to match proper nouns, such as titles of artworks or geographical locations, rather than the general term.

As an experiment, I took a museum keyword set in plural form and ran a Python script that used the Pattern library from CLiPS (https://www.clips.uantwerpen.be/pattern) to convert those plurals to singular, lowercase terms. The reconciliation results were much improved. Ideally, OpenRefine would offer a module or setting that makes this adjustment automatically for better matching.

This might also be useful for Wikidata Mix'n'match, where a similar problem arises: importing a data set of plural, capitalized terms yields suboptimal recommended matches.
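
The plural-to-singular, lowercase normalization described above can be sketched roughly as follows. This is not the script used in the experiment, which relied on Pattern; it is a self-contained stand-in that uses only the standard library. The function name `normalize_term`, the `IRREGULAR` table, and the suffix rules are all hypothetical and far from exhaustive, so a real run would likely do better with a full singularizer such as the one Pattern provides.

```python
import re

# Hypothetical, deliberately small table of irregular plurals.
IRREGULAR = {
    "mice": "mouse",
    "geese": "goose",
    "men": "man",
    "women": "woman",
    "children": "child",
    "feet": "foot",
    "teeth": "tooth",
}

# Naive suffix rules, tried in order; the first match wins.
# These are illustrative assumptions, not a complete model of English.
SUFFIX_RULES = [
    (re.compile(r"ies$"), "y"),               # tapestries -> tapestry
    (re.compile(r"(ch|sh|ss|x)es$"), r"\1"),  # churches -> church
    (re.compile(r"(?<!s)s$"), ""),            # vases -> vase, glass unchanged
]


def normalize_term(term):
    """Lowercase a vocabulary term and singularize its final word."""
    word = term.strip().lower()
    if word in IRREGULAR:
        return IRREGULAR[word]
    for pattern, replacement in SUFFIX_RULES:
        if pattern.search(word):
            return pattern.sub(replacement, word)
    return word


if __name__ == "__main__":
    for keyword in ["Mice", "Tapestries", "Churches", "Glass"]:
        print(keyword, "->", normalize_term(keyword))
```

The normalized values could then be loaded into OpenRefine as a separate column and reconciled in place of the original plural terms. Because the rules only look at the end of the string, a multi-word term has just its last word singularized, and words the rules do not cover will come out wrong or unchanged.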
