22 June 2019
I recently completed the Business Analysis With R online course, which focuses on applied data and business science with R and introduced me to a couple of new modelling concepts and approaches. One that especially captured my attention is parsnip and its attempt to implement a unified modelling and analysis interface (similar to Python's scikit-learn) to seamlessly access several modelling platforms in R.
parsnip is the brainchild of RStudio's Max Kuhn (of caret fame) and Davis Vaughan and forms part of tidymodels, a growing ensemble of tools to explore and iterate modelling tasks that shares a common philosophy (and a few libraries) with the tidyverse.
Although its packages are at different stages of development, I decided to take tidymodels "for a spin", so to speak, and create and execute a "tidy" modelling workflow to tackle a classification problem. My aim is to show how easy it is to fit a simple logistic regression with R's glm and quickly switch to a cross-validated random forest using the ranger engine by changing only a few lines of code.
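To give a flavour of that switch, here is a minimal sketch. It is not the code from the article: it assumes the parsnip and ranger packages are installed, and it uses a toy two-class data set built from mtcars (the am outcome and the hp and wt predictors are my own choice for illustration).

```r
library(parsnip)

# Toy two-class problem built from mtcars (an assumption for illustration,
# not the data used in the article)
df <- mtcars
df$am <- factor(df$am, labels = c("automatic", "manual"))

# Logistic regression fitted through the glm engine
logit_spec <- set_engine(logistic_reg(mode = "classification"), "glm")
logit_fit <- fit(logit_spec, am ~ hp + wt, data = df)

# Random forest fitted through the ranger engine:
# only the model specification changes, the fit() call stays the same
rf_spec <- set_engine(rand_forest(mode = "classification", trees = 100), "ranger")
rf_fit <- fit(rf_spec, am ~ hp + wt, data = df)

# Both fits return predictions in the same tidy format
logit_pred <- predict(logit_fit, new_data = df)
rf_pred <- predict(rf_fit, new_data = df)
```

The formula, the data and the predict() call are identical for both models; only the specification and the engine differ.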
For this post in particular I'm focusing on four different libraries from the tidymodels suite: rsample for data sampling and cross-validation, recipes for data preprocessing, parsnip for model setup and estimation, and yardstick for model assessment.
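As a rough illustration of how the four packages fit together, here is a minimal end-to-end sketch. It assumes all four packages are installed and reuses a toy data set derived from mtcars; the variable names and the normalisation step are my own choices for illustration, not the steps from the article.

```r
library(rsample)
library(recipes)
library(parsnip)
library(yardstick)

set.seed(42)

# Toy two-class data (an assumption for illustration)
df <- mtcars
df$am <- factor(df$am, labels = c("automatic", "manual"))

# rsample: train/test split, plus cross-validation folds on the training set
data_split <- initial_split(df, prop = 0.75)
train <- training(data_split)
test <- testing(data_split)
folds <- vfold_cv(train, v = 5)

# recipes: define the preprocessing, estimate it on the training set,
# then apply it to both sets
rec <- recipe(am ~ hp + wt, data = train)
rec <- step_normalize(rec, all_predictors())
rec <- prep(rec, training = train)
train_baked <- bake(rec, new_data = train)
test_baked <- bake(rec, new_data = test)

# parsnip: specify and fit the model
spec <- set_engine(logistic_reg(), "glm")
model_fit <- fit(spec, am ~ ., data = train_baked)

# yardstick: compare predicted and observed classes on the test set
preds <- predict(model_fit, new_data = test_baked)
results <- data.frame(truth = test_baked$am, estimate = preds$.pred_class)
acc <- accuracy(results, truth = truth, estimate = estimate)
```

The folds object is created here only to show where rsample's cross-validation comes in; fitting the model on each fold is left out to keep the sketch short.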
You can find the final article on my website.
I've also published the article on Towards Data Science.