---
title : Organizing a Data Analysis
subtitle :
author : Roger D. Peng, Associate Professor of Biostatistics
job : Johns Hopkins Bloomberg School of Public Health
logo : bloomberg_shield.png
framework : io2012 # {io2012, html5slides, shower, dzslides, ...}
highlighter : highlight.js # {highlight.js, prettify, highlight}
hitheme : zenburn #
url:
  lib: ../../libraries
  assets: ../../assets
widgets : [mathjax] # {mathjax, quiz, bootstrap}
mode : selfcontained # {standalone, draft}
---
## Data analysis files
* Data
  * Raw data
  * Processed data
* Figures
  * Exploratory figures
  * Final figures
* R code
  * Raw / unused scripts
  * Final scripts
  * R Markdown files
* Text
  * README files
  * Text of analysis / report
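
One way to set up this layout is sketched below. The folder names and the project root are illustrative assumptions, not a required convention; the sketch writes into a temporary directory so it can be run without touching an existing project.

```r
# Hypothetical project root; a temporary directory keeps the sketch side-effect free.
project_root <- file.path(tempdir(), "my_analysis")

# Folder names are illustrative and mirror the outline above.
folders <- c(
  "data/raw", "data/processed",
  "figures/exploratory", "figures/final",
  "code/raw", "code/final", "code/rmd",
  "text"
)

# Create every folder, including intermediate ones.
for (folder in folders) {
  dir.create(file.path(project_root, folder),
             recursive = TRUE, showWarnings = FALSE)
}

# Start an empty README in the text folder.
readme_path <- file.path(project_root, "text", "README.md")
file.create(readme_path)
```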
---
## Raw Data
<img class=center src=../../assets/img/medicalrecord.png height='400'/>
* Should be stored in your analysis folder
* If accessed from the web, include the URL, a description, and the date accessed in the README
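
A minimal sketch of logging where web data came from. The helper name `record_raw_source`, the entry layout, the file paths, and the example URL are all hypothetical; the download call is commented out so the sketch runs offline.

```r
# Append a provenance entry for one raw data file to a README.
# The function name and entry layout are illustrative assumptions.
record_raw_source <- function(readme, file_name, url, description,
                              accessed = Sys.Date()) {
  entry <- c(
    paste0("File: ", file_name),
    paste0("URL: ", url),
    paste0("Description: ", description),
    paste0("Date accessed: ", format(accessed, "%Y-%m-%d")),
    ""
  )
  # Terminate every element with a newline so entries stay separated.
  cat(paste0(entry, "\n"), file = readme, sep = "", append = TRUE)
  invisible(entry)
}

# Placeholder values; substitute the real source of your data.
readme <- file.path(tempdir(), "README.md")
raw_url <- "https://example.com/data/records.csv"

# download.file(raw_url, destfile = "data/raw/records.csv")  # run once, then log it
record_raw_source(readme, "data/raw/records.csv", raw_url,
                  "Placeholder description of the raw records")
```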
---
## Processed data
<img class=center src=../../assets/img/excel.png height='400'/>
* Processed data should be named so it is easy to see which script generated the data.
* The mapping from each processing script to the processed data it produces should be documented in the README
* Processed data should be [tidy](https://vita.had.co.nz/papers/tidy-data.pdf)
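
The sketch below illustrates both points under stated assumptions: the script name `01_clean_records.R`, the `_processed` suffix, and the toy values are made up for illustration. It reshapes a small wide table into tidy form (one row per observation) and derives the output file name from the script name.

```r
# Hypothetical script name; in a real project this is the file you are running.
script_name <- "01_clean_records.R"

# Toy raw data in "wide" form: one column per year (values are made up).
raw <- data.frame(
  patient = c("a", "b"),
  y2011 = c(5.1, 6.3),
  y2012 = c(4.8, 6.0)
)

# Tidy form: one row per observation, one column per variable.
tidy <- reshape(
  raw,
  direction = "long",
  varying = c("y2011", "y2012"),
  v.names = "value",
  timevar = "year",
  times = c(2011, 2012),
  idvar = "patient"
)
rownames(tidy) <- NULL

# Derive the output name from the script name so the origin is obvious.
out_file <- file.path(
  tempdir(),
  paste0(tools::file_path_sans_ext(script_name), "_processed.csv")
)
write.csv(tidy, out_file, row.names = FALSE)
```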
---
## Exploratory figures
<img class=center src=../../assets/img/example10.png height='400'/>
* Figures made during the course of your analysis, not necessarily part of your final report.
* They do not need to be "pretty"
---
## Final Figures
<img class=center src=../../assets/img/figure1final.png height='400'/>
* Usually a small subset of the original figures
* Axes/colors set to make the figure clear
* Possibly multiple panels
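
As a rough illustration of these points, the sketch below builds a two-panel figure with labeled axes and a consistent color scheme using base graphics and the built-in `mtcars` data. The output file name and the color choices are assumptions made for the example.

```r
# Write the figure to a file; the name is a placeholder.
fig_path <- file.path(tempdir(), "figure1_final.pdf")
pdf(fig_path, width = 10, height = 5)

# Two panels side by side, with room for axis labels.
par(mfrow = c(1, 2), mar = c(5, 5, 3, 1))

# One color per cylinder count, reused in both panels.
cyl_cols <- c("4" = "steelblue", "6" = "darkorange", "8" = "firebrick")

# Panel 1: scatterplot with explicit axis labels and units.
plot(mtcars$wt, mtcars$mpg,
     col = cyl_cols[as.character(mtcars$cyl)], pch = 19,
     xlab = "Weight (1000 lbs)",
     ylab = "Fuel economy (miles per gallon)",
     main = "Fuel economy versus weight")
legend("topright", legend = paste(names(cyl_cols), "cylinders"),
       col = cyl_cols, pch = 19, bty = "n")

# Panel 2: the same outcome summarized by group.
boxplot(mpg ~ cyl, data = mtcars, col = cyl_cols,
        xlab = "Number of cylinders",
        ylab = "Fuel economy (miles per gallon)",
        main = "Fuel economy by cylinder count")

# Close the device so the file is written to disk.
invisible(dev.off())
```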
---
## Raw scripts
<img class=center src=../../assets/img/rawcode.png height='350'/>
* May be less commented (but comments help you!)
* May be multiple versions
* May include analyses that are later discarded
---
## Final scripts
<img class=center src=../../assets/img/finalscript2.png height='300'/>
* Clearly commented
  * Small comments liberally - what, when, why, how
  * Bigger commented blocks for whole sections
* Include processing details
* Only analyses that appear in the final write-up
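
A short sketch of what this commenting style might look like, using the built-in `airquality` data. The section banners and the particular analysis are illustrative, not a prescribed template.

```r
## ---------------------------------------------------------------
## Section 1: Processing
## What: remove rows of the built-in airquality data with no ozone
## Why:  the model in Section 2 needs an observed ozone value
## ---------------------------------------------------------------
data(airquality)
n_raw <- nrow(airquality)
aq <- airquality[!is.na(airquality$Ozone), ]  # drop rows with missing ozone
n_dropped <- n_raw - nrow(aq)                 # record how many rows were removed

## ---------------------------------------------------------------
## Section 2: Model reported in the write-up
## ---------------------------------------------------------------
# How: ordinary least squares of ozone on temperature
fit <- lm(Ozone ~ Temp, data = aq)

# Report the slope with a measure of uncertainty (95% interval)
ci <- confint(fit, "Temp", level = 0.95)
```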
---
## R markdown files
<img class=center src=../../assets/img/rmd.png height='400'/>
* [R markdown](https://www.rstudio.com/ide/docs/authoring/using_markdown) files can be used to generate reproducible reports
* Text and R code are integrated
* Very easy to create in [RStudio](https://www.rstudio.com/)
---
## Readme files
<img class=center src=../../assets/img/readme.png height='400'/>
* Not necessary if you use R markdown
* Should contain step-by-step instructions for analysis
* Here is an example: [https://github.com/jtleek/swfdr/blob/master/README](https://github.com/jtleek/swfdr/blob/master/README)
---
## Text of the document
<img class=center src=../../assets/img/swfdr.png height='350'/>
* It should include a title, introduction (motivation), methods (statistics you used), results (including measures of uncertainty), and conclusions (including potential problems)
* It should tell a story
* _It should not include every analysis you performed_
* References should be included for statistical methods
---
## Further resources
* Information about a non-reproducible study that led to cancer patients being mistreated: [The Duke Saga Starter Set](https://simplystatistics.org/2012/02/27/the-duke-saga-starter-set/)
* [Reproducible research and Biostatistics](https://biostatistics.oxfordjournals.org/content/10/3/405.full)
* [Managing a statistical analysis project guidelines and best practices](https://www.r-statistics.com/2010/09/managing-a-statistical-analysis-project-guidelines-and-best-practices/)
* [Project template](https://projecttemplate.net/) - a pre-organized set of files for data analysis