From the 4th of June onwards, the cases linelist is accessible as a single file via Amazon S3 in parquet format. Prior to the 4th of June, the cases linelist was split into chunks of 500,000 cases each to manage file size; it has been ported over to Amazon S3 so that this is no longer necessary. The S3 URLs function in the same way as GitHub raw endpoints, and can be accessed directly from (for example) a Jupyter Notebook.
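As a rough illustration of reading the file straight from a notebook, the sketch below wraps `pandas.read_parquet`, which accepts an HTTPS URL as well as a local path. The function name `load_linelist`, its `reader` parameter, and the URL in the comment are placeholders introduced here, not names from this repository. It assumes pandas is installed together with a parquet engine such as pyarrow.

```python
def load_linelist(source, columns=None, reader=None):
    """Load a parquet linelist from a URL or a local path.

    `reader` defaults to pandas.read_parquet. It is a parameter only so
    that the function can be tried offline with a stand-in reader.
    """
    if reader is None:
        import pandas as pd  # needs a parquet engine, e.g. pyarrow
        reader = pd.read_parquet
    # Passing `columns` limits the read to the listed fields.
    return reader(source, columns=columns)


# Hypothetical usage; replace the placeholder with the actual S3 URL:
# cases = load_linelist("<S3 URL of the cases linelist>", columns=["date", "state", "age"])
# cases.head()
```

Requesting only the columns needed keeps memory use down, which matters for a linelist that previously had to be split into chunks.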
All aggregated data on cases is derived from this linelist.
date: yyyy-mm-dd format; date of case
days_doseN: number of days between the positive sample date and the individual's first/second/third dose (if any); values 0 or less are nulled
brandN: p = Pfizer, s = Sinovac, a = AstraZeneca, c = Cansino, m = Moderna, h = Sinopharm, j = Janssen, u = unverified (pending sync with VMS)
import: binary variable with 1 denoting an imported case and 0 denoting local transmission
cluster: binary variable with 1 denoting cluster-based transmission and 0 denoting an unlinked case
symptomatic: binary variable with 1 denoting an individual presenting with symptoms at the point of testing
state: state of residence, coded as an integer (refer to param_geo.csv)
district: district of residence, coded as an integer (refer to param_geo.csv)
age: age as an integer, with -1 denoting missing data
male: binary variable with 1 denoting male and 0 denoting female
malaysian: binary variable with 1 denoting Malaysian and 0 denoting non-Malaysian
comorb: binary variable with 1 denoting that the individual has comorbidities and 0 denoting no comorbidities declared
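Because most fields in the cases linelist are integer or single-letter codes, a small decoding step can make individual records easier to read. The following is a minimal sketch built only from the codes listed above. It assumes that the `brandN` columns are named `brand1`, `brand2`, `brand3` and that an absent dose appears as `None`; neither is confirmed here. `decode_case` is a hypothetical helper name.

```python
# Vaccine brand codes as listed in the field descriptions.
BRANDS = {
    "p": "Pfizer", "s": "Sinovac", "a": "AstraZeneca", "c": "Cansino",
    "m": "Moderna", "h": "Sinopharm", "j": "Janssen", "u": "unverified",
}

# Fields documented as 0/1 binary variables.
FLAGS = ("import", "cluster", "symptomatic", "male", "malaysian", "comorb")


def decode_case(row):
    """Return a copy of one cases-linelist record with codes made readable."""
    out = dict(row)
    # age uses -1 as the missing-data marker.
    if out.get("age") == -1:
        out["age"] = None
    # Assumed column naming: brand1, brand2, brand3.
    for key, value in row.items():
        if key.startswith("brand") and value is not None:
            out[key] = BRANDS.get(value, value)  # unknown codes pass through
    # Turn 0/1 flags into booleans; other values are left untouched.
    for flag in FLAGS:
        if out.get(flag) in (0, 1):
            out[flag] = bool(out[flag])
    return out
```

The `state` and `district` integers are left as they are, since their meaning comes from `param_geo.csv`, whose layout is not described in this section.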
All aggregated data on deaths is derived from this linelist, which can also be accessed via Amazon S3 in parquet format.
Note: The deaths linelist was released prior to the cases linelist. As such, it is formatted differently, because several optimisations had to be made to reduce the size of the cases linelist (in particular, coding as many things as possible as integers). In order not to break anyone's scripts, we are not changing the original format of the deaths linelist.
date: yyyy-mm-dd format; date of death
date_announced: date on which the death was announced to the public (i.e. registered in the public linelist)
date_positive: date of positive sample
date_doseN: date of the individual's first/second/third dose (if any)
brandN: p = Pfizer, s = Sinovac, a = AstraZeneca, c = Cansino, m = Moderna, h = Sinopharm, j = Janssen, u = unverified (pending sync with VMS)
state: state of residence
age: age as an integer; note that it is possible for age to be 0, denoting infants less than 6 months old
male: binary variable with 1 denoting male and 0 denoting female
bid: binary variable with 1 denoting brought-in-dead and 0 denoting an inpatient death
malaysian: binary variable with 1 denoting Malaysian and 0 denoting non-Malaysian
comorb: binary variable with 1 denoting that the individual has comorbidities and 0 denoting no comorbidities declared
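The deaths linelist stores dose dates, whereas the cases linelist stores `days_doseN` as a day count. To compare the two, the count can be derived from `date_positive` and `date_doseN`. The sketch below follows the rule stated for the cases linelist, where values of 0 or less are nulled. It assumes that both date fields use the same yyyy-mm-dd format as `date` and that a missing dose appears as `None` or an empty string. `days_since_dose` is a hypothetical helper name.

```python
from datetime import date


def days_since_dose(date_positive, date_dose):
    """Days from a dose to the positive sample, or None if not applicable."""
    # A missing date on either side means there is nothing to compute.
    if not date_positive or not date_dose:
        return None
    delta = date.fromisoformat(date_positive) - date.fromisoformat(date_dose)
    # Mirror the cases linelist: values of 0 or less are nulled.
    return delta.days if delta.days > 0 else None
```

For example, a positive sample on 2021-08-15 and a dose on 2021-08-01 gives 14, while a dose received on or after the sample date gives `None`.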