Give usage stats for typical properties and sort them #139
Comments
"Typical" in SQID really means something quite different from absolute counts. It shows properties that are significantly more relevant to a class of things than to other things on Wikidata. For example, Freebase ID is one of the most common properties overall, and across all classes, but it is not particularly typical for anything. SQID orders properties by "average typicality" (across all classes), which is -- I agree -- not the best approach since it shuffles properties from page to page. To give an example, the most typical properties for "lighthouses" are "light characteristic of lighthouse", "Admiralty number", and "focal height" (https://tools.wmflabs.org/sqid/#/view?id=Q39715) but these are surely not the most frequent (supposedly, every single lighthouse has coordinates). So it does work well, but is not the best heuristic for ordering. In fact, I think ordering should be more manually controlled still, e.g., you want birth and death to end up close to one another and in some fixed order, but I don't think mere statistics would ever achieve this. To answer to the actual issue report: sorting typical properties by usage would put things first that are not "typical" at all, and would make the same properties be the top ranking ones across large parts of data. |
Thanks for the detailed explanation of what "typical" actually means in SQID. But what's the use case for this information? Given the "typical" properties, one could infer which class an item without a P31 statement best belongs to (duck typing). My use case is the creation or extension of items with a known class. An editor curating a lighthouse item should first know that almost every lighthouse has a country and a coordinate. I'd propose renaming "typical properties" to "distinguishing" or "designating" properties, and introducing the most used properties as "typical properties" or "frequent properties".
The "typical properties" would be more helpful if they were sorted by how often they are used. What percentage of instances actually use these properties? See the SPARQL query in this thread to find out: https://twitter.com/fagerving/status/1068229258491846656
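As a rough sketch of the requested output, the following Python snippet turns per-property counts into usage percentages and sorts them. It is not the query from the linked thread. The function name `usage_percentages` is hypothetical, and the counts are invented placeholders, not real Wikidata figures.

```python
def usage_percentages(instance_count, property_counts):
    """Return (property, percent of instances using it), most used first."""
    if instance_count <= 0:
        raise ValueError("instance_count must be positive")
    ranked = sorted(property_counts.items(), key=lambda kv: kv[1], reverse=True)
    return [(name, round(100 * n / instance_count, 1)) for name, n in ranked]


# Placeholder numbers for a class with 1000 instances (not real data).
counts = {"focal height": 450, "coordinate location": 980, "country": 950}
report = usage_percentages(1000, counts)

for name, percent in report:
    print(f"{name}: {percent}%")
```

With real data, the counts would come from a query over the instances of the class, and the percentage column could be shown next to each property.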