Authors: Jonathan de Bruin and Frans de Liagre Böhl
This repository contains the material for the workshop "Data Engineering: Clean and Integrate Your Data!". The workshop is part of the Data Science Day at Utrecht University. It covers the basics of data cleaning and data integration. Examples of data cleaning, imputation, and integration are demonstrated with OpenRefine, Python, and R. In the workshop, we study the relation between mortality and diseases by linking a mortality register dataset with a dataset containing medical (diagnosis) records. The datasets are fictitious. An example research question is: Is there a relation between (surviving) a Salmonella infection and the age of death?
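The linking step described above can be sketched in plain Python. This is only a minimal illustration, not the workshop's own code: the field names (`person_id`, `age_at_death`, `diagnosis`), the toy records, and the helper functions are all hypothetical, and the sketch assumes both datasets share a person identifier that needs light cleaning before it can be matched.

```python
# Hypothetical toy records; field names and values are invented for
# illustration and are not taken from the workshop datasets.
mortality = [
    {"person_id": " P001", "age_at_death": "81"},
    {"person_id": "p002 ", "age_at_death": "67"},
    {"person_id": "P003", "age_at_death": ""},  # missing value
]
diagnoses = [
    {"person_id": "P001", "diagnosis": "Salmonella"},
    {"person_id": "P002", "diagnosis": "Influenza"},
    {"person_id": "P004", "diagnosis": "Salmonella"},
]


def clean_id(raw):
    """Normalise an identifier: trim whitespace and upper-case it."""
    return raw.strip().upper()


def parse_age(raw):
    """Convert an age string to int, or None when the value is missing."""
    raw = raw.strip()
    return int(raw) if raw else None


def link_records(mortality_rows, diagnosis_rows):
    """Inner-join the two datasets on the cleaned person identifier."""
    ages = {
        clean_id(row["person_id"]): parse_age(row["age_at_death"])
        for row in mortality_rows
    }
    linked = []
    for row in diagnosis_rows:
        key = clean_id(row["person_id"])
        if key in ages:  # keep only people present in both datasets
            linked.append(
                {
                    "person_id": key,
                    "diagnosis": row["diagnosis"],
                    "age_at_death": ages[key],
                }
            )
    return linked


linked = link_records(mortality, diagnoses)
for record in linked:
    print(record)
```

Cleaning the identifiers before joining matters here: without it, `" P001"` and `"p002 "` would not match their counterparts in the diagnosis records. Only people who appear in both datasets are kept, so records without a match on either side are dropped from the linked result.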