ProspectivePulse/datasharing

How to share data with a statistician

This is a guide for anyone who needs to share data with a statistician. The target audiences I have in mind are:

  • Scientific collaborators who need statisticians to analyze data for them
  • Students or postdocs in scientific disciplines looking for consulting advice
  • Junior statistics students whose job it is to collate/clean data sets

The goal of this guide is to provide some instruction on the best way to share data, so as to avoid the most common pitfalls and sources of delay in the transition from data collection to data analysis. The Leek group works with a large number of collaborators, and the number one source of variation in the speed to results is the state of the data when they arrive at the Leek group. Based on my conversations with other statisticians, this is true nearly universally.

My strong feeling is that statisticians should be able to handle the data in whatever state they arrive. It is important to see the raw data, understand the steps in the processing pipeline, and be able to incorporate hidden sources of variability in one's data analysis. On the other hand, for many data types the processing steps are well documented and standardized, so the work of converting the data from raw form to directly analyzable form can be performed before calling on a statistician. This can dramatically speed up the turnaround time, since the statistician doesn't have to work through all the pre-processing steps first.

What you should deliver to the statistician

For maximum speed in the analysis, this is the information you should pass to a statistician:

  1. The raw data.
  2. A tidy data set.
  3. An explicit and exact recipe you used to go from 1 to 2.
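
As a rough illustration of what an "explicit and exact recipe" could look like, the sketch below is a short Python script that turns a small raw table into a tidy one. Everything in it is hypothetical and not taken from the Leek group's own workflow: the column names (patient_id, visit_1, visit_2), the sample values, and the helper tidy_rows are made up for the example, and "tidy" is assumed here to mean one row per observation and one column per variable. The point is only that a script records every step, so the statistician can rerun it on the raw data and get the same tidy data set.

```python
import csv
import io

# Hypothetical raw export: one row per patient, one column per visit.
RAW_CSV = """patient_id,visit_1,visit_2
A01,5.1,4.8
A02,6.3,
"""


def tidy_rows(raw_text):
    """Reshape wide raw rows into one row per (patient, visit) measurement."""
    reader = csv.DictReader(io.StringIO(raw_text))
    rows = []
    for record in reader:
        for column, value in record.items():
            if column == "patient_id":
                continue
            # Step 1: the visit number is taken from the column name.
            visit = int(column.split("_")[1])
            # Step 2: empty cells are kept as explicit missing values (None).
            measurement = float(value) if value else None
            rows.append(
                {
                    "patient_id": record["patient_id"],
                    "visit": visit,
                    "measurement": measurement,
                }
            )
    return rows


tidy = tidy_rows(RAW_CSV)
for row in tidy:
    print(row)
```

Keeping the raw text untouched and producing the tidy rows from it in code means items 1, 2, and 3 of the list stay in sync: the raw data is never edited by hand, and the recipe is the script itself.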

Let's look at each part of the data package you will transfer.

What you should expect from a statistician

About

The Leek group guide to data sharing
