This repository has been archived by the owner on Mar 10, 2023. It is now read-only.

Missing UID because Sakha (Yakuti(y)a) Republic is inconsistently spelled #2603

Closed
sollyucko opened this issue Jun 2, 2020 · 3 comments

Comments

@sollyucko

The location is spelled as "Sakha (Yakutiya) Republic, Russia" in https://github.com/CSSEGISandData/COVID-19/blob/web-data/data/cases.csv, but spelled as "Sakha (Yakutia) Republic, Russia" in https://github.com/CSSEGISandData/COVID-19/blob/master/csse_covid_19_data/UID_ISO_FIPS_LookUp_Table.csv. The difference is very minor ("Yakutiya" vs "Yakutia") but it causes the UID lookup to fail and return an empty value, which causes my data processing to fail.
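For anyone hitting the same failure, here is a minimal sketch of how the unmatched key can be surfaced before it breaks a pipeline. It assumes both files expose a `Combined_Key` column; the inline CSV snippets, the `Confirmed` counts, and the `UID` values are made-up stand-ins, and `unmatched_keys` is a hypothetical helper, not anything from the repository.

```python
import csv
import io

# Tiny inline stand-ins for cases.csv and the lookup table.
# The counts and UID values below are invented for illustration only.
CASES_CSV = """Combined_Key,Confirmed
"Sakha (Yakutiya) Republic, Russia",10
"Moscow, Russia",20
"""

LOOKUP_CSV = """UID,Combined_Key
9001,"Sakha (Yakutia) Republic, Russia"
9002,"Moscow, Russia"
"""


def unmatched_keys(cases_text, lookup_text, key="Combined_Key"):
    """Return the sorted keys present in the cases data but absent from the lookup."""
    lookup_keys = {row[key] for row in csv.DictReader(io.StringIO(lookup_text))}
    case_keys = {row[key] for row in csv.DictReader(io.StringIO(cases_text))}
    return sorted(case_keys - lookup_keys)


missing = unmatched_keys(CASES_CSV, LOOKUP_CSV)
print(missing)
```

Running a check like this right after loading the data turns a silent empty UID into an explicit list of keys to investigate.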

The cases.csv entry was added in commit d9cccd6, and the UID_ISO_FIPS_LookUp_Table.csv entry was introduced in commit b844f57.

Could you please fix one of them? According to Wikipedia, either spelling is acceptable.

Related: #2509

@Lucas-Czarnecki

Agreed, this should be fixed. More specifically, the problem appears to be in the Lookup Table's Combined_Key, which reads "Sakha (Yakutia) Republic, Russia", whereas the Lookup Table's Province_State spells the same place differently ("Sakha (Yakutiya) Republic"). As far as the Province_State variable is concerned, however, JHU's spelling is consistent across the Lookup Table, the web-data, and the daily reports.

I've seen this issue occur before, and I would wager it will come up again. I work around it by always recreating Combined_Key from the relevant string values.
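A rough sketch of that workaround, under some assumptions: the key is rebuilt by joining the non-empty `Admin2`, `Province_State`, and `Country_Region` values with ", " (the `Admin2` and `Country_Region` column names and the join format are my assumptions about the table layout), the UID value is made up, and `rebuild_combined_key` is a hypothetical helper.

```python
def rebuild_combined_key(row):
    """Join the non-empty location parts into a key, ignoring any stored Combined_Key."""
    parts = [
        row.get("Admin2", ""),
        row.get("Province_State", ""),
        row.get("Country_Region", ""),
    ]
    return ", ".join(p.strip() for p in parts if p and p.strip())


# One lookup row whose stored Combined_Key disagrees with its Province_State.
# The UID is an invented placeholder.
lookup_rows = [
    {
        "UID": "9001",
        "Province_State": "Sakha (Yakutiya) Republic",
        "Country_Region": "Russia",
        "Combined_Key": "Sakha (Yakutia) Republic, Russia",
    },
]

# Index the lookup by the rebuilt key rather than the stored one.
uid_by_key = {rebuild_combined_key(r): r["UID"] for r in lookup_rows}

case_row = {"Province_State": "Sakha (Yakutiya) Republic", "Country_Region": "Russia"}
uid = uid_by_key.get(rebuild_combined_key(case_row))
print(uid)
```

Because both sides derive the key from the same columns, a typo in the stored Combined_Key no longer affects the join; it only helps as long as Province_State itself stays consistent across files.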

@CSSEGISandData
Owner

Corrected. Apologies for the typo.

@Lucas-Czarnecki

Thank you! Y'all are rock stars. :)


3 participants