-
-
Notifications
You must be signed in to change notification settings - Fork 76
/
README.Rmd
473 lines (391 loc) · 17.3 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
---
output: github_document
---
```{r, warning=FALSE, message=FALSE,echo=FALSE}
knitr::opts_chunk$set(
dpi = 300,
collapse = TRUE,
warning = FALSE,
message = FALSE,
fig.path = "man/figures/",
fig.width = 8,
fig.height = 5
)
```
# *easystats*: An R Framework for Easy Statistical Modeling, Visualization, and Reporting
<!-- [![publication](https://img.shields.io/badge/Cite-Unpublished-yellow)](https://github.com/easystats/easystats/blob/master/inst/CITATION) -->
[![downloads](https://cranlogs.r-pkg.org/badges/easystats)](https://cran.r-project.org/package=easystats)
[![total](https://cranlogs.r-pkg.org/badges/grand-total/easystats)](https://cranlogs.r-pkg.org/)
## What is *easystats*?
*easystats* is a collection of R packages, which aims to provide a unifying
and consistent framework to tame, discipline, and harness the scary R statistics
and their pesky models.
However, there is not (yet) an *unique* "easystats" way of doing data
analysis. Instead, start with one package and, when you'll face a new
challenge, do check if there is an *easystats* answer for it in other packages.
You will slowly uncover how using them together facilitates your life. And, who
knows, you might even end up using them all.
```{r, echo=FALSE, out.width="433px", fig.align="center"}
knitr::include_graphics("man/figures/logo_wall.png")
```
## Installation
[![CRAN_Status_Badge](https://www.r-pkg.org/badges/version/easystats)](https://cran.r-project.org/package=easystats) [![insight status badge](https://easystats.r-universe.dev/badges/easystats)](https://easystats.r-universe.dev) [![R-CMD-check](https://github.com/easystats/easystats/workflows/R-CMD-check/badge.svg?branch=main)](https://github.com/easystats/easystats/actions)
Type | Source | Command
---|---|---
Release | CRAN | `install.packages("easystats")`
Development | r-universe | `install.packages("easystats", repos = "https://easystats.r-universe.dev")`
Development | GitHub | `remotes::install_github("easystats/easystats")`
Finally, as *easystats* sometimes depends on some additional packages for specific functions that are not downloaded by default. If you want to benefit from the **full *easystats* experience** without any hiccups, simply run the following:
```{r, eval=FALSE}
easystats::install_suggested()
```
## Citation
To cite the package, run the following command:
```{r, comment=""}
citation("easystats")
```
If you want to do this only for certain packages in the ecosystem,
have a look at this article on how you can do so!
<https://easystats.github.io/easystats/articles/citation.html>
## Getting started
Each *easystats* package has a different scope and purpose. This means your
**best way to start** is to explore and pick the one(s) that you feel might be
useful to you. However, as they are built with a "bigger picture" in mind, you
will realize that using more of them creates a smooth workflow, as these
packages are meant to work together. Ideally, these packages work in unison to
cover all aspects of statistical analysis and data visualization.
- [**report**](https://easystats.github.io/report/): 📜 🎉 Automated statistical
reporting of objects in R
- [**correlation**](https://easystats.github.io/correlation/): 🔗 Your all-in-one
package to run correlations
- [**modelbased**](https://easystats.github.io/modelbased/): 📈 Estimate effects,
group averages and contrasts between groups based on statistical models
- [**bayestestR**](https://easystats.github.io/bayestestR/): 👻 Great for
beginners or experts of Bayesian statistics
- [**effectsize**](https://easystats.github.io/effectsize/): 🐉 Compute, convert,
interpret and work with indices of effect size and standardized parameters
- [**see**](https://easystats.github.io/see/): 🎨 The plotting companion to
create beautiful results visualizations
- [**parameters**](https://easystats.github.io/parameters/): 📊 Obtain a table
containing all information about the parameters of your models
- [**performance**](https://easystats.github.io/performance/): 💪 Models' quality
and performance metrics (R2, ICC, LOO, AIC, BF, ...)
- [**insight**](https://easystats.github.io/insight/): 🔮 For developers, a
package to help you work with different models and packages
- [**datawizard**](https://easystats.github.io/datawizard/): 🧙 Magic potions to clean and transform your data
## Frequently Asked Questions
**How is *easystats* different from the *tidyverse*?**
You've probably already heard about the
[**tidyverse**](https://www.tidyverse.org/), another very popular collection of
packages (*ggplot*, *dplyr*, *tidyr*, ...) that also makes using R easier. So,
should you pick the *tidyverse* or *easystats*? **Pick both!**
Indeed, these two ecosystems have been designed with very different goals in
mind. The *tidyverse* packages are primarily made to create a new R experience,
where data manipulation and exploration is intuitive and consistent. On the
other hand, **easystats** focuses more on the final stretch of the analysis:
understanding and interpreting your results and reporting them in a manuscript
or a report, while following best practices. You can definitely use the
*easystats* functions in a *tidyverse* workflow!
> **easystats + tidyverse =** ❤️
**Can *easystats* be useful to advanced users and/or developers?**
Yes, definitely! **easystats** is built in terms of modules that are general
enough to be used inside other packages. For instance, the *insight* package is
made to easily implement support for post-processing of pretty much all
regression model packages under the sun. We use it in all the *easystats*
packages, but it is also used in other non-easystats packages, such as
[**ggstatsplot**](https://github.com/IndrajeetPatil/ggstatsplot), [**modelsummary**](https://github.com/vincentarelbundock/modelsummary/),
[**ggeffects**](https://github.com/strengejacke/ggeffects), and more.
**So why not in yours**?
Moreover, the *easystats* packages are very lightweight, with a minimal set of
dependencies, which again makes it great if you want to rely on them.
## Documentation
### Websites
Each `easystats` package has a dedicated website.
For example, website for `parameters` is
<https://easystats.github.io/parameters/>.
### Blog
In addition to the websites containing documentation for these packages, you can
also read posts from `easystats` blog:
<https://easystats.github.io/blog/posts/>.
### Other learning resources
In addition to these websites and blog posts, you can also check out the
following presentations and talks to learn more about this ecosystem:
<https://easystats.github.io/easystats/articles/resources.html>
## Dependencies
*easystats* packages are designed to be lightweight, *i.e.*, they don't have any third-party hard dependencies, other than base-R packages or other *easystats* packages!
If you develop R packages, this means that you can safely use *easystats* packages as dependencies in your own packages, without the risk of entering the [dependency hell](https://en.wikipedia.org/wiki/Dependency_hell).
```{r depnetwork, out.width="100%"}
library(deepdep)
plot_dependencies("easystats", depth = 2, show_stamp = FALSE)
```
As we can see, the only exception is the [`{see}`](https://easystats.github.io/see/) package, which is responsible for plotting and creating
figures and relies on `{ggplot2}`, which does have a substantial number of dependencies.
## Usage
```{r, include=TRUE, results="hide", echo=FALSE}
library(zoo)
library(cranlogs)
library(see)
library(datawizard)
# tidverse packages
library(lubridate)
library(ggplot2)
pkgsnms <- easystats::easystats_packages()
# Packages data
downloads <- lapply(pkgsnms, cranlogs::cran_downloads, from = "2019-02-26")
downloads_all <- do.call("rbind", downloads)
# Combine all data
data <- data_rename(downloads_all, "package", "Package")
tmp <- lapply(split(data, data$Package), function(x) {
x$cumulative_count <- cumsum(x$count)
x$average_count <- zoo::rollmean(x$count, k = 7, fill = NA)
x$day_num <- as.numeric(x$date) - min(as.numeric(x$date))
x$day <- weekdays(x$date)
x$month <- months(x$date)
x$quarters <- quarters(x$date)
x$year_month <- paste0(lubridate::year(x$date), "-", months(x$date))
x$month_day <- lubridate::mday(x$date)
as.data.frame(x)
})
data <- do.call(rbind, tmp)
data <- data[order(data$Package, data$date), ]
data$Package <- reorder(data$Package, data$count, sum, decreasing = TRUE)
# Set packages colours
packages_colours <- c(
insight = unname(see::material_colors("orange")),
datawizard = unname(see::material_colors("cyan")),
bayestestR = unname(see::material_colors("pink")),
effectsize = unname(see::pizza_colors("tomato")),
performance = unname(see::material_colors("purple")),
parameters = unname(see::material_colors("red")),
modelbased = unname(see::material_colors("indigo")),
correlation = unname(see::material_colors("amber")),
report = unname(see::material_colors("green")),
see = unname(see::material_colors("blue")),
easystats = unname(see::material_colors("brown"))
)
data$download_label <- NA
monnb <- function(d) {
lt <- as.POSIXlt(as.Date(d, origin = "1900-01-01"))
lt$year * 12 + lt$mon
}
mondf <- function(d1, d2) {
monnb(d2) - monnb(d1)
}
average_monthly_downloads <- lapply(split(data, data$Package), function(d) {
tmp <- d[d$count > 0, ]
months_on_cran <- mondf(tmp$date[1], tmp$date[nrow(tmp)])
if (!length(months_on_cran) || months_on_cran < 1) months_on_cran <- 1
data.frame(
package = d$Package[1],
monthly_downloads = round(sum(tmp$count) / months_on_cran)
)
})
average_monthly_downloads <- do.call(rbind, average_monthly_downloads)
```
### Total downloads
```{r, fig.align='center', echo=FALSE}
# TIME TREND
dl_table <- data
tmp <- lapply(split(dl_table, dl_table$Package), function(x) {
x$`Total Downloads` <- max(x$cumulative_count)
x[1, c("Package", "Total Downloads")]
})
dl_table <- do.call(rbind, tmp)
dl_table <- dl_table[order(dl_table$`Total Downloads`, decreasing = TRUE), ] |>
t() |>
as.data.frame()
dl_table <- dl_table[-1, ]
dl_table[] <- lapply(dl_table, as.numeric)
dl_table$Total <- rowSums(dl_table[1, ])
dl_table <- dl_table[, c(12, 1:11)]
rownames(dl_table) <- NULL
dl_table[] <- lapply(dl_table, format, big.mark = ",")
knitr::kable(dl_table)
```
### Trend
```{r, fig.align='center', echo=FALSE, out.width="100%"}
# TIME TREND - total downloads by month
tmp <- lapply(split(data, data$Package), function(x) {
x[-nrow(x), ]
})
tmp <- do.call(rbind, tmp)
tmp$Package <- relevel(tmp$Package, "bayestestR", names(packages_colours))
out <- data_summary(
tmp,
monthly_downloads = sum(count, na.rm = TRUE),
by = c("year_month", "Package")
)
out$date <- out$year_month
out$date <- gsub("Januar", "01-01", out$date, fixed = TRUE)
out$date <- gsub("Februar", "02-01", out$date, fixed = TRUE)
out$date <- gsub("März", "03-01", out$date, fixed = TRUE)
out$date <- gsub("April", "04-01", out$date, fixed = TRUE)
out$date <- gsub("Mai", "05-01", out$date, fixed = TRUE)
out$date <- gsub("Juni", "06-01", out$date, fixed = TRUE)
out$date <- gsub("Juli", "07-01", out$date, fixed = TRUE)
out$date <- gsub("August", "08-01", out$date, fixed = TRUE)
out$date <- gsub("September", "09-01", out$date, fixed = TRUE)
out$date <- gsub("Oktober", "10-01", out$date, fixed = TRUE)
out$date <- gsub("November", "11-01", out$date, fixed = TRUE)
out$date <- gsub("Dezember", "12-01", out$date, fixed = TRUE)
out$date <- as.Date(out$date)
out |>
ggplot(aes(x = date, color = Package)) +
geom_hline(yintercept = c(25000, 50000, 100000, 150000, 200000), color = "#EEEEEE") +
geom_line(aes(y = monthly_downloads), linewidth = 0.9, alpha = 0.6) +
geom_smooth(aes(y = monthly_downloads),
method = "lm",
linetype = "dotted",
linewidth = 0.75,
se = FALSE
) +
see::theme_modern() +
scale_x_date(
date_breaks = "4 months",
labels = scales::date_format("%Y-%m")
) +
scale_colour_manual(values = packages_colours) +
xlab("") +
ylab("Monthly CRAN Downloads\n") +
coord_cartesian(ylim = c(0, max(out$monthly_downloads) + 1000), expand = FALSE) +
scale_y_sqrt() +
theme(axis.text.x = element_text(angle = 90))
```
```{r, fig.align='center', echo=FALSE, eval=FALSE, out.width="100%"}
# TIME TREND - daily average per month
tmp <- lapply(split(data, data$Package), function(x) {
x[-nrow(x), ]
})
tmp <- do.call(rbind, tmp)
tmp$Package <- relevel(tmp$Package, "bayestestR", names(packages_colours))
out <- data_summary(
tmp,
daily_avg = mean(count, na.rm = TRUE),
by = c("year_month", "Package")
)
out$date <- out$year_month
out$date <- gsub("Januar", "01-01", out$date, fixed = TRUE)
out$date <- gsub("Februar", "02-01", out$date, fixed = TRUE)
out$date <- gsub("März", "03-01", out$date, fixed = TRUE)
out$date <- gsub("April", "04-01", out$date, fixed = TRUE)
out$date <- gsub("Mai", "05-01", out$date, fixed = TRUE)
out$date <- gsub("Juni", "06-01", out$date, fixed = TRUE)
out$date <- gsub("Juli", "07-01", out$date, fixed = TRUE)
out$date <- gsub("August", "08-01", out$date, fixed = TRUE)
out$date <- gsub("September", "09-01", out$date, fixed = TRUE)
out$date <- gsub("Oktober", "10-01", out$date, fixed = TRUE)
out$date <- gsub("November", "11-01", out$date, fixed = TRUE)
out$date <- gsub("Dezember", "12-01", out$date, fixed = TRUE)
out$date <- as.Date(out$date)
out |>
ggplot(aes(x = date, color = Package)) +
geom_hline(yintercept = c(1000, 2000, 3000, 4000, 5000, 6000), color = "#EEEEEE") +
geom_line(aes(y = daily_avg), linewidth = 0.9, alpha = 0.6) +
geom_smooth(aes(y = daily_avg),
method = "lm",
linetype = "dotted",
linewidth = 0.75,
se = FALSE
) +
see::theme_modern() +
scale_x_date(
date_breaks = "4 months",
labels = scales::date_format("%Y-%m")
) +
scale_colour_manual(values = packages_colours) +
xlab("") +
ylab("Daily CRAN Downloads\n") +
coord_cartesian(ylim = c(0, max(out$daily_avg) + 100), expand = FALSE) +
scale_y_sqrt() +
theme(axis.text.x = element_text(angle = 90))
```
```{r, fig.align='center', echo=FALSE, eval=FALSE, out.width="100%"}
# TIME TREND - daily counts
tmp <- lapply(split(data, data$Package), function(x) {
x[-nrow(x), ]
})
tmp <- do.call(rbind, tmp)
tmp$Package <- relevel(tmp$Package, "bayestestR", names(packages_colours))
tmp |>
ggplot(aes(x = date, color = Package)) +
geom_hline(yintercept = c(1000, 2000, 3000, 4000), color = "#EEEEEE") +
# geom_line(aes(y = count), linewidth = 0.5, alpha = 0.1) +
geom_line(aes(y = average_count), linewidth = 0.9, alpha = 0.6) +
geom_smooth(aes(y = count),
method = "lm",
linetype = "dotted",
linewidth = 0.75,
se = FALSE
) +
see::theme_modern() +
scale_x_date(
date_breaks = "4 months",
labels = scales::date_format("%Y-%m")
) +
scale_colour_manual(values = packages_colours) +
xlab("") +
ylab("Daily CRAN Downloads\n") +
coord_cartesian(ylim = c(0, max(data$count) + 100), expand = FALSE) +
scale_y_sqrt() +
theme(axis.text.x = element_text(angle = 90))
```
```{r, fig.align='center', echo=FALSE, eval=FALSE}
### Cumulative downloads
tmp <- lapply(split(data, list(data$Package, data$year_month)), function(x) {
x$download_label <- x$cumulative_count[[1]]
x
})
data <- do.call(rbind, tmp)
data$download_label <- sprintf("%.1fk", data$download_label / 1000)
data$download_label[duplicated(data$download_label)] <- NA
ggplot(
data,
aes(
x = date,
y = cumulative_count,
# label = download_label,
color = Package
)
) +
geom_line(linewidth = 0.75) +
# geom_label_repel(show.legend = FALSE) +
labs(x = "", y = "Cumulative CRAN Downloads", label = NULL) +
see::theme_modern() +
scale_x_date(date_breaks = "1 month", labels = scales::date_format("%Y-%m")) +
scale_colour_manual(values = packages_colours) +
scale_y_sqrt() +
theme(axis.text.x = element_text(angle = 90))
```
<!-- ### Average monthly downloads -->
```{r, fig.align='center', echo=FALSE, eval=FALSE, out.width="100%"}
average_monthly_downloads$package2 <- factor(average_monthly_downloads$package, levels = rev(sort(average_monthly_downloads$package)))
plot_range <- unique(sort(c(1000, 2500, 5000, pretty(average_monthly_downloads$monthly_downloads))))
tmp <- average_monthly_downloads
tmp$package2 <- reorder(tmp$package2, tmp$monthly_downloads, max)
tmp |>
ggplot(aes(x = package2, y = monthly_downloads, colour = package)) +
geom_point(size = 3) +
geom_linerange(mapping = aes(ymin = 0, ymax = monthly_downloads), linewidth = 1) +
see::theme_modern() +
scale_colour_manual(values = packages_colours) +
theme(axis.text.x = element_text(angle = 90)) +
labs(x = NULL, y = "Average monthly downloads", colour = "Package") +
scale_x_discrete(labels = NULL) +
scale_y_sqrt(breaks = plot_range, limits = c(0, max(plot_range))) +
theme(
axis.text.x = element_text(angle = 0),
panel.grid.major.x = element_line(colour = "#aaaaaa", linetype = "dotted")
) +
coord_flip()
```
## Contributing
We are happy to receive bug reports, suggestions, questions, and (most of all)
contributions to fix problems and add features. Pull Requests for contributions are encouraged.
Here are some simple ways in which you can contribute (in the increasing order
of commitment):
- Read and correct any inconsistencies in the documentation
- Raise issues about bugs or wanted features
- Review code
- Add new functionality
## Code of Conduct
Please note that the 'easystats' project is released with a [Contributor Code of Conduct](https://easystats.github.io/easystats/CODE_OF_CONDUCT.html). By contributing to this project, you agree to abide by its terms.