Merge pull request #49 from mayer79/structuring_vignette

Add headers to main vignette
mayer79 · May 26, 2023 · 9453a1e
2 parents 7059856 + 326ef31
Showing 1 changed file with 12 additions and 6 deletions.
diff --git a/vignettes/missRanger.Rmd b/vignettes/missRanger.Rmd
@@ -71,14 +71,18 @@ irisImputed <- missRanger(irisWithNA, num.trees = 100, verbose = 0)
 head(irisImputed)
 ```
 
+### Predictive mean matching
+
 It worked! Unfortunately, the new values look somewhat unnatural due to different rounding. If we would like to avoid this, we just set the `pmm.k` argument to a positive number. All imputations done during the process are then combined with a predictive mean matching (PMM) step, leading to more natural imputations and improved distributional properties of the resulting values:
 
 ``` {r}
 irisImputed <- missRanger(irisWithNA, pmm.k = 3, num.trees = 100, verbose = 0)
 head(irisImputed)
 ```
 
-Note that `missRanger()` offers a `...` argument to pass options to `ranger()`, e.g. `num.trees` or `min.node.size`. How would we use its "extremely randomized trees" variant with 50 trees?
+### Controlling the random forests
+
+`missRanger()` offers a `...` argument to pass options to `ranger()`, e.g. `num.trees` or `min.node.size`. How would we use its "extremely randomized trees" variant with 50 trees?
 
 ``` {r}
 irisImputed_et <- missRanger(
@@ -93,6 +97,8 @@ head(irisImputed_et)
 
 It is as simple!
 
+### Use in Pipe
+
 {missRanger} also plays well together with the pipe:
 
 ```r
@@ -102,6 +108,8 @@ iris |>
  head()
 ```
 
+### Formula interface
+
 By default `missRanger()` uses all columns in the data set to impute all columns with missings. To override this behaviour, you can use an intuitive formula interface: The left hand side specifies the variables to be imputed (variable names separated by a `+`), while the right hand side lists the variables used for imputation.
 
 ``` {r}
@@ -138,7 +146,7 @@ m <- missRanger(irisWithNA, . ~ 1, verbose = 0)
 head(m)
 ```
 
-## Imputation takes too much time. What can I do?
+### Imputation takes too much time. What can I do?
 
 `missRanger()` is based on iteratively fitting random forests for each variable with missing values. Since the underlying random forest implementation `ranger()` uses 500 trees per default, a huge number of trees might be calculated. For larger data sets, the overall process can take very long.
 
@@ -156,7 +164,7 @@ Here are tweaks to make things faster:
 
 - Use a low `max.iter`, e.g. 1 or 2.
 
-### Examples evaluated on a normal laptop (not run here)
+Evaluated on a normal laptop:
 
 ```r
 library(ggplot2) # for diamonds data
@@ -185,12 +193,10 @@ system.time(
 )
 ```
 
-## Trick: Use `case.weights` to weight down contribution of rows with many missings
+### Trick: Use `case.weights` to weight down contribution of rows with many missings
 
 Using the `case.weights` argument, you can pass case weights to the imputation models. This might be useful to weight down the contribution of rows with many missings.
 
-### Example
-
 ``` {r}
 # Count the number of non-missing values per row
 non_miss <- rowSums(!is.na(irisWithNA))