- Andy Teucher @ateucher
- Richie Cotton @richierocks
- Claudia Vitolo @cvitolo
- Jakub Nowosad @Nowosad
- Joe Stachelek @jsta
Most of us are involved in teaching R in some way, and it is always a struggle to find suitable datasets to teach with, especially ones that suit learners from different domains. Many packages contain data, but finding those datasets and knowing what is in them is hard because the documentation is often inadequate.
- Make it easy to discover suitable data
- Write some guidance on documenting data in packages
- A Google Doc that describes best practices for documenting data in packages.
Checklist of things to document.
Make sure your documentation answers as many of these questions as possible.
- What does the data represent?
- What format is the data in?
- How big is the dataset?
- Where does the data come from?
- How has the data been processed?
- What does the data look like?
- How do you analyze the data?
- Where is this data used?
- Is there a paper, or other external resource discussing this dataset?
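As a sketch of how the checklist can map onto a package's documentation, here is a roxygen2 block for a dataset. It follows the usual convention of documenting data in `R/data.R` with the quoted object name on the last line. The dataset `river_flow`, its columns, the URL and the processing script are hypothetical placeholders. `usethis::use_data_raw()` can set up the `data-raw/` folder for such a script.

```r
# R/data.R: a hypothetical template. "river_flow", its columns and its
# source are placeholders, not a real dataset.

#' Daily river flow at three gauging stations (hypothetical example)
#'
#' What the data represent: one row per station per day of measured flow.
#' How it was processed: see data-raw/river_flow.R, which downloads the raw
#' files, drops duplicate readings and converts units.
#'
#' @format A data frame with 1095 rows and 3 variables:
#' \describe{
#'   \item{station}{Gauging station identifier (character).}
#'   \item{date}{Date of measurement (Date).}
#'   \item{flow}{Mean daily flow in cubic metres per second (numeric).}
#' }
#' @source Where the data come from, e.g. \url{https://example.org/river-data}
#' @references A paper or other external resource discussing the dataset.
#' @seealso Packages, tutorials or courses where the data are used.
#' @examples
#' # What the data look like
#' head(river_flow)
#' str(river_flow)
#' # One way to start analysing them
#' aggregate(flow ~ station, data = river_flow, FUN = mean)
"river_flow"
```

Each checklist question gets a home: size and format in `@format`, origin in `@source`, processing in the description, appearance and analysis in `@examples`, and external resources in `@references` and `@seealso`.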
- A patch for `usethis::use_readme_rmd()` to display datasets in package README files.
- A flexdashboard with a searchable table that shows metadata on datasets from many CRAN packages. It has information on over 4000 datasets.
- Installing and loading packages
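A minimal sketch of the kind of table such a README could show, assuming the package is installed: base R's `utils::data(package = ...)` already lists the datasets a package ships, together with their titles. The helper `list_package_datasets()` is hypothetical and is not how the usethis patch is implemented.

```r
# Hypothetical helper (not part of usethis): list the datasets shipped with an
# installed package, one row per dataset.
list_package_datasets <- function(pkg) {
  # data(package = ) returns a "packageIQR" object whose $results matrix has
  # the columns "Package", "LibPath", "Item" and "Title"
  res <- utils::data(package = pkg)$results
  data.frame(
    # Objects stored in a file with a different name are listed as
    # "object (file)"; keep only the object name
    dataset = sub(" \\(.*\\)$", "", res[, "Item"]),
    title = res[, "Title"],
    stringsAsFactors = FALSE,
    row.names = NULL
  )
}

# Example using the datasets package that ships with R
datasets_tbl <- list_package_datasets("datasets")
head(datasets_tbl)

# In a package's README.Rmd the table could be rendered with, e.g.:
# knitr::kable(list_package_datasets("yourpackage"))
```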
https://docs.google.com/document/d/1xhJmt0v4p49jpwINNak9N7AMMb5yohTwwNOXH8WzqqQ/edit?usp=sharing
https://twitter.com/rstatsdata
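The metadata behind a searchable table like the flexdashboard's can be gathered for every installed package with base R's `utils::data()`. This is a sketch that assumes only installed packages are of interest; it is not the dashboard's actual code, and the `DT` call is left as a comment because it needs that package.

```r
# Collect dataset metadata across all installed packages
all_pkgs <- .packages(all.available = TRUE)

# data() warns that datasets have moved (e.g. from "base" to "datasets");
# those warnings are not relevant here
meta <- suppressWarnings(utils::data(package = all_pkgs)$results)

dataset_meta <- data.frame(
  package = meta[, "Package"],
  # Strip the "(file)" suffix that data() adds for some objects
  dataset = sub(" \\(.*\\)$", "", meta[, "Item"]),
  title = meta[, "Title"],
  stringsAsFactors = FALSE,
  row.names = NULL
)

# Number of datasets per package, largest first
head(sort(table(dataset_meta$package), decreasing = TRUE))

# A searchable HTML table, e.g. inside a flexdashboard, assuming DT is
# installed:
# DT::datatable(dataset_meta, filter = "top")
```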
- Crawl Bioconductor
- Examine `inst/extdata` folders
- Use GitHub URLs to pull the geolocation of package maintainers
- Score the quality of data in a package
- Create badges to advertise data quality
- Contact package authors about data quality deficiencies
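For the `inst/extdata` idea, a starting point might look like the following. Installing a package copies the contents of `inst/` into the package's top-level directory, so `system.file("extdata", package = ...)` finds the folder and returns `""` when it is absent. The helper `list_extdata()` is hypothetical.

```r
# Hypothetical helper: list the files a package ships in inst/extdata
list_extdata <- function(pkg) {
  dir <- system.file("extdata", package = pkg)
  if (!nzchar(dir)) {
    return(character(0))  # package not installed, or no extdata folder
  }
  list.files(dir, recursive = TRUE)
}

# Which installed packages ship raw data files?
pkgs <- .packages(all.available = TRUE)
extdata_files <- lapply(pkgs, list_extdata)
names(extdata_files) <- pkgs
has_extdata <- pkgs[lengths(extdata_files) > 0]
has_extdata
```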
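For the geolocation idea, one possible approach is to take GitHub owners from a package's `URL` or `BugReports` fields and then query GitHub's public `GET /users/{username}` REST endpoint, whose `location` field is free text. Both helpers are hypothetical. The owner may be an organisation rather than the maintainer, and the location would still need geocoding.

```r
# Hypothetical helper: extract GitHub owner names from URL-like fields,
# e.g. c(packageDescription(pkg)$URL, packageDescription(pkg)$BugReports)
github_owner <- function(urls) {
  urls <- as.character(urls)
  urls <- urls[!is.na(urls)]
  hits <- regmatches(urls, regexpr("github\\.com/[^/[:space:],]+", urls))
  unique(sub("github\\.com/", "", hits))
}

# Hypothetical helper: fetch the free-text location from a public GitHub
# profile. Needs network access and the jsonlite package; unauthenticated
# requests are rate-limited, and the result is NULL if no location is set.
github_location <- function(owner) {
  profile <- jsonlite::fromJSON(paste0("https://api.github.com/users/", owner))
  profile$location
}

# Example on a URL field (no network needed)
github_owner("https://github.com/someuser/somepkg, https://example.org")
```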
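Scoring could start from the checklist of things to document. One simple, hypothetical rule is the fraction of relevant Rd sections present on a help page. The sketch assumes help pages parsed with `tools::Rd_db()`; the choice of sections is an assumption, and a fuller tool would also restrict itself to pages marked `\docType{data}`.

```r
# Hypothetical scoring rule (not an established metric): the share of
# checklist-related sections that a parsed Rd help page contains
score_rd <- function(rd,
                     wanted = c("\\description", "\\format", "\\source",
                                "\\references", "\\examples")) {
  # Each top-level element of a parsed Rd object carries an "Rd_tag" attribute
  tags <- unlist(lapply(rd, attr, "Rd_tag"))
  mean(wanted %in% tags)
}

# Score every help page of an installed package, lowest scores first
rd_db <- tools::Rd_db("datasets")
scores <- vapply(rd_db, score_rd, numeric(1))
head(sort(scores))

# Whom to contact about low-scoring pages
utils::maintainer("datasets")
```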
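Badges could be generated as shields.io static badge URLs of the form `https://img.shields.io/badge/<label>-<message>-<color>`, where literal dashes and underscores are written as `--` and `__`. The score thresholds and colours below are arbitrary assumptions, and `quality_badge()` is a hypothetical helper.

```r
# Hypothetical helper: turn a score between 0 and 1 into a markdown badge
quality_badge <- function(score, label = "data quality") {
  stopifnot(is.numeric(score), length(score) == 1, score >= 0, score <= 1)
  colour <- if (score >= 0.8) "brightgreen" else if (score >= 0.5) "yellow" else "red"
  message <- paste0(round(score * 100), "%")
  # shields.io reads "-" and "_" as separators, so double them first, then
  # percent-encode the rest (spaces become %20, "%" becomes %25)
  esc <- function(x) {
    x <- gsub("-", "--", x, fixed = TRUE)
    x <- gsub("_", "__", x, fixed = TRUE)
    utils::URLencode(x, reserved = TRUE)
  }
  url <- sprintf("https://img.shields.io/badge/%s-%s-%s",
                 esc(label), esc(message), colour)
  sprintf("![%s](%s)", label, url)
}

quality_badge(0.8)
# "![data quality](https://img.shields.io/badge/data%20quality-80%25-brightgreen)"
```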