Contains all software and notebooks to reproduce the figures in Wiedermann et al. (2023). Unfortunately, the input data is not yet publicly available. Please see the data availability statement in that paper for details. The package structure is based on an earlier version of my personal data science template.
Parse wearable sensor and survey data collected within the Corona Data Donation Project (Corona Datenspende). The Corona Data Donation Project is one of the largest citizen science initiatives worldwide. From 2020 to 2022, more than 120,000 German residents donated continuous daily measurements of resting heart rate, physical activity and sleep timing for the advancement of public health research. These data streams were collected passively through a dedicated smartphone app that connects with participants' fitness trackers and smartwatches. Additionally, participants took part in regular surveys, sharing insights into their health and well-being during the COVID-19 pandemic.
This package analyzes the positive effects of vaccinations against COVID-19 in mitigating long-term effects of an infection with SARS-CoV-2. For this purpose, it parses daily resting heart rate, step count and sleep duration measured by wearable devices (e.g., Apple Watch, Fitbit, Garmin) and survey data on COVID-19 test results, test dates and vaccination status.
The package requires poetry. Make sure it is installed, then run make install after cloning this repository to install all dependencies.
.
├── Makefile                      # setup, download data and run analysis
├── README.md                     # README file as displayed on GitHub
├── data
│   └── 00_external               # required external input data
│       └── ...
├── long_covid                    # package source code to be used in notebooks
│   ├── __init__.py
│   ├── colors.py                 # some custom colors
│   ├── compute.py                # compute results
│   ├── load_from_db.py           # helper functions for connecting to a PostgreSQL database
│   ├── load_raw_data.py          # load wearable data from database
│   ├── preprocess.py             # data cleaning and preprocessing
│   ├── styling.py                # custom styling for figures
│   └── surveydataIO.py           # load survey data from database
├── notebooks                     # notebooks for analysis
│   ├── 1.01-plot_example_timeseries.ipynb
│   ├── 1.02-plot_average_trajectories.ipynb
│   └── ...
├── output                        # store output
├── poetry.lock                   # poetry configurations
├── pyproject.toml
└── scripts                       # bash scripts
    └── execute_notebooks.sh      # run all jupyter notebooks from the command line
After gaining data access, you will receive instructions for setting up a VPN. You can then interact with the database by creating a file named .env in the root of the repository, using the following template and filling in your credentials. Do not add this file to your git repository.
HOST =
PORT =
DBNAME =
DBUSER =
PASSWORD =
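To illustrate how such a file might be consumed, here is a minimal sketch using only the Python standard library. It is not the code in long_covid/load_from_db.py; the function names parse_env and connection_params are hypothetical and chosen for this example only. The returned dictionary uses the keyword names that a PostgreSQL driver such as psycopg2 typically accepts, which is an assumption about how the package connects.

```python
from pathlib import Path

# Keys taken from the .env template above.
REQUIRED_KEYS = ("HOST", "PORT", "DBNAME", "DBUSER", "PASSWORD")


def parse_env(text):
    """Parse 'KEY = VALUE' lines into a dict, skipping blanks and comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def connection_params(env):
    """Map the template keys to connection keyword arguments (hypothetical helper)."""
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise ValueError("Missing credentials in .env: " + ", ".join(missing))
    return {
        "host": env["HOST"],
        "port": int(env["PORT"]),
        "dbname": env["DBNAME"],
        "user": env["DBUSER"],
        "password": env["PASSWORD"],
    }


if __name__ == "__main__":
    env_file = Path(".env")
    if env_file.exists():
        params = connection_params(parse_env(env_file.read_text()))
        # Never print the real password.
        print({**params, "password": "***"})
```

Failing early on empty values makes a forgotten credential visible before any connection attempt is made.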
You can run the entire analysis pipeline using make pipeline, which executes the following commands in order:
1. make download, which downloads all required input data from the database (make sure you have your credentials set as explained above). See long_covid/load_raw_data.py for details.
2. make preprocess, which performs the preprocessing steps needed for the later analysis. This includes dropping Apple users with invalid sleep data and certain devices with only a few users, as well as computing the age of all users from their approximate birth year. See long_covid/preprocess.py for details.
3. make compute, which computes all final data required for plotting the figures. This includes the computation of all baselines and the corresponding weekly deviations in vital data. See long_covid/compute.py for details.
4. make output, which executes all Jupyter notebooks in the notebooks folder that create figures for the final paper. Each notebook creates one specific figure or set of figures. See the contents of notebooks for details.
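The preprocessing and compute steps can be pictured with a small standard-library sketch. It is an illustration under stated assumptions, not the implementation in long_covid/preprocess.py or long_covid/compute.py: the function names, the choice of reference year, the 56-day baseline window before the test date, and the use of a plain mean are all hypothetical. The actual definitions are given in the paper and the package source.

```python
from datetime import date, timedelta
from statistics import mean


def approximate_age(birth_year, reference_year):
    """Age in years from an approximate birth year (reference year is an assumption)."""
    return reference_year - birth_year


def weekly_deviations(daily_values, test_date, baseline_days=56):
    """Average deviation from a pre-test baseline, grouped by week relative to the test.

    daily_values maps a date to one vital measurement (e.g. resting heart rate).
    Week 0 covers the test date and the six following days; negative weeks precede it.
    """
    # Assumed baseline: mean over the baseline_days days before the test date.
    baseline_window = [
        value
        for day, value in daily_values.items()
        if -baseline_days <= (day - test_date).days < 0
    ]
    if not baseline_window:
        raise ValueError("No measurements available in the baseline window")
    baseline = mean(baseline_window)

    deviations_by_week = {}
    for day, value in daily_values.items():
        week = (day - test_date).days // 7  # floor division keeps day -1 in week -1
        deviations_by_week.setdefault(week, []).append(value - baseline)
    return {week: mean(vals) for week, vals in sorted(deviations_by_week.items())}


if __name__ == "__main__":
    test_day = date(2021, 1, 15)
    # Synthetic example values, not real data.
    series = {test_day + timedelta(days=offset): 60 for offset in range(-14, 0)}
    series.update({test_day + timedelta(days=offset): 65 for offset in range(0, 7)})
    print(weekly_deviations(series, test_day, baseline_days=14))
```

Expressing each week as a deviation from the user's own baseline makes users with different typical values comparable, which is the general motivation for this kind of normalization.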
Afterwards, all figures necessary to reproduce the paper should be placed in output, and all corresponding input and processed data can be found in data.
The package relies on some external data that ships with this repository. All such data is found in data/00_external/. This folder holds one file, statistic_id1365_bevoelkerung-deutschlands-nach-relevanten-altersgruppen-2020.xlsx. It contains the age distribution of the German population as of December 2020 and was downloaded from https://de.statista.com/statistik/daten/studie/1365/umfrage/bevoelkerung-deutschlands-nach-altersgruppen/.
Downloading the data from this website again now yields a newer estimate from December 2022, so the older file shipped with this repository is no longer officially available. For the sake of reproducibility of the paper, we refrain from updating the file to the new version and keep the estimates from December 2020.