NOTICE: Updates and (non-structural) Changes are Coming #2
The update applies multiple data cleaning operations to daily reports and time-series data. See #2
I have addressed inconsistencies in JHU's older daily reports that contained both states and counties in Province_State. However, JHU used to report on various municipalities before committing to a consistent level of reporting. Note that other countries present similar problems. With Canada, for example, JHU used to report data on municipalities/provinces (e.g., Calgary, AB and Edmonton, AB) before committing to provinces (e.g., Alberta). As with US data, I am keeping the data in a format that records JHU's original intentions. Note that if you want to aggregate data at the provincial level, you must combine daily cases from cities like Calgary and Edmonton.
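The provincial aggregation mentioned above could look roughly like the following sketch. It is not the repository's own code: the rows, the counts, the Date column, the ABBREVIATIONS lookup, and the helper names are all hypothetical, and only the "City, XX" value pattern comes from the discussion.

```python
from collections import defaultdict

# Hypothetical rows shaped like an older daily report; the counts are
# invented for illustration and are not JHU figures.
rows = [
    {"Date": "2020-03-10", "Country_Region": "Canada",
     "Province_State": "Calgary, AB", "Confirmed": 7},
    {"Date": "2020-03-10", "Country_Region": "Canada",
     "Province_State": "Edmonton, AB", "Confirmed": 5},
    {"Date": "2020-03-10", "Country_Region": "Canada",
     "Province_State": "Ontario", "Confirmed": 30},
]

# Assumed lookup from abbreviation to province name (only what this example needs).
ABBREVIATIONS = {"AB": "Alberta"}


def province_of(value):
    """Map 'City, XX' to its province; pass other values through unchanged."""
    if "," in value:
        _, abbrev = (part.strip() for part in value.rsplit(",", 1))
        return ABBREVIATIONS.get(abbrev, value)
    return value


def aggregate_by_province(rows):
    """Sum Confirmed per (date, country, province)."""
    totals = defaultdict(int)
    for row in rows:
        key = (row["Date"], row["Country_Region"],
               province_of(row["Province_State"]))
        totals[key] += row["Confirmed"]
    return dict(totals)


provincial = aggregate_by_province(rows)
```

Here the Calgary and Edmonton rows collapse into a single Alberta entry, while rows that are already provincial pass through untouched.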
Edit: These changes are now in effect.
Starting next week (~April 20th) I will be introducing some changes to the daily report CSVs and cleaned data (i.e., CSSE_DailyReports). Most of these changes are intended to address frequently mentioned issues pertaining to CSSEGISandData's COVID-19 data. The changes WILL NOT affect variable names and SHOULD NOT break anyone's code. The guiding philosophy here is to provide an update that addresses obvious issues while ensuring a minimal amount of change to data structure. Incoming changes are documented below as a heads up.
Daily Reports (CSVs):
- Active cases will be recalculated (i.e., Active = Confirmed - Deaths - Recoveries) to correct for errors and to replace missing values in older daily reports. A sanity check will also ensure that active cases are never below zero; where JHU reports negative active cases, the value will be reported as missing.
- Country_Region and Province_State will be made consistent so that each location has a unique name. For example, "Korea, South" and "Republic of Korea" will become "South Korea" across all CSVs.
- Province_State inconsistencies will be addressed, such as values that mix provinces and states with cities and counties (e.g., "Los Angeles, CA"). For US data these values will be split into Admin2 (e.g., "Los Angeles") and Province_State (e.g., "California").
- A Combined_Key will be provided that addresses various inconsistencies (e.g., "France" and ",,France").
- Latitude and Longitude will be matched to regions, replacing missing values for older daily reports and ensuring that coordinates are consistent for each region (addressing known issues with countries having conflicting coordinates).
- FIPS codes in JHU's Lookup Table will be fixed (to address known issues pertaining to leading zeros) and then mapped to daily reports.
Cleaned Data:
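As a rough illustration of the Daily Reports changes listed above, here is a minimal sketch of four of the operations. It is not the repository's implementation: the function names and lookup tables are hypothetical, the handling of missing inputs is an assumption, and padding FIPS to five digits assumes county-level codes.

```python
# Assumed lookups; only the entries needed for the examples in the list above.
COUNTRY_NAMES = {"Korea, South": "South Korea", "Republic of Korea": "South Korea"}
US_STATES = {"CA": "California"}


def recompute_active(confirmed, deaths, recovered):
    """Active = Confirmed - Deaths - Recoveries; negative results become missing.

    Assumption: if any input is missing (None), the result is missing too.
    """
    if confirmed is None or deaths is None or recovered is None:
        return None
    active = confirmed - deaths - recovered
    return active if active >= 0 else None


def standardize_country(name):
    """Map known name variants to one unique name; leave other names as-is."""
    return COUNTRY_NAMES.get(name, name)


def split_us_location(value):
    """Split 'Los Angeles, CA' into (Admin2, Province_State).

    Values without a comma are treated as state-level: (None, value).
    """
    if "," not in value:
        return None, value
    admin2, abbrev = (part.strip() for part in value.rsplit(",", 1))
    return admin2, US_STATES.get(abbrev, abbrev)


def pad_fips(value):
    """Restore leading zeros, assuming a 5-digit county FIPS code."""
    return str(int(float(value))).zfill(5)
```

For instance, `pad_fips(6037)` gives `"06037"` (Los Angeles County), and `recompute_active(10, 5, 20)` gives `None` because the difference is negative.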