Quick Crisp Code Tricks in R for Machine Learning modelling

Vaishali Saraswat
Published in Analytics Vidhya · Jun 16, 2020

If you work in data science, you will spend some, or even most, of your time training models. Though R is not everyone’s favourite language, it has a few really easy ways to train and test models. This article uses the recipes package for pre-processing and the tidyverse nest-and-map workflow (often shown alongside the broom package) to build machine learning models.

Problem Statement: build multiple ML models with y as the response variable and x1, x2, …, x70 as predictors, where a separate model is trained for each group of the variable x1.

First, the data needs pre-processing, which among other things removes unwanted predictors. Pre-processing should deal with missing values, near-zero-variance predictors, skewed distributions, normalization (centering and scaling) and any exact linear combinations of predictors. Coding each of these steps by hand would be tedious and error-prone!

This is where the recipes package helps: it reduces all of this to just a few lines of code.

The recipe needs to be created with recipe(), estimated on the training data with prep(), and finally applied with juice(). In current versions of recipes, juice() is superseded by bake(new_data = NULL). The output is the pre-processed data set. Beyond the steps listed above, recipes offers many other pre-processing steps.
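Below is a minimal sketch of such a recipe. It assumes recipes 0.1.16 or later, which provides step_impute_median() and all_numeric_predictors(), and R 4.1 or later for the native |> pipe. The toy data frame `raw` and the names `rec` and `prepped` are hypothetical stand-ins for the real y, x1 to x70 data. The step order is one sensible choice, not necessarily the author's exact code. Linear combinations are removed before the Yeo-Johnson transform because that nonlinear transform would otherwise hide them.

```r
library(recipes)

set.seed(42)
n <- 200

# Hypothetical toy data standing in for the article's y ~ x1 + ... + x70 data set
raw <- data.frame(
  x1 = factor(sample(c("A", "B"), n, replace = TRUE)),  # grouping variable
  x2 = rexp(n),    # right-skewed predictor
  x3 = rnorm(n),
  x4 = 1,          # constant (zero-variance) predictor
  x5 = rnorm(n)
)
raw$x6 <- raw$x3 + raw$x5                 # exact linear combination of x3 and x5
raw$y  <- 2 * raw$x2 + raw$x5 + rnorm(n)  # response
raw$x2[sample(n, 10)] <- NA               # inject some missing values

rec <- recipe(y ~ ., data = raw) |>
  step_impute_median(all_numeric_predictors()) |>  # fill missing values with medians
  step_nzv(all_predictors()) |>                     # drop (near-)zero-variance predictors
  step_lincomb(all_numeric_predictors()) |>         # drop exact linear combinations
  step_YeoJohnson(all_numeric_predictors()) |>      # reduce skewness
  step_normalize(all_numeric_predictors())          # center and scale

prepped   <- prep(rec, training = raw)  # estimate medians, lambdas, means, sds, ...
finaldata <- juice(prepped)             # same as bake(prepped, new_data = NULL)
```

The same prepped recipe can later be applied to test data with bake(prepped, new_data = test_data). This applies the training-set estimates, so the test data is never used to fit the pre-processing.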

The second part of the problem statement requires a separate model for each group of the predictor x1. One option is to take the long road: subset the data for each group and then model each subset individually. But isn’t that too tedious and repetitive?

Here the nest-and-map pattern comes in handy. The ‘nest’ function (from tidyr, not broom) and the ‘map’ function (from purrr) can be combined with mutate() from dplyr to train one model per group. The train() function from the caret package fits each model. The code uses an SVM, and other algorithms can be trained the same way by changing the method and its tuning parameters. ‘x1Group’ is the variable holding the categories obtained by grouping x1. ‘finaldata’ is the pre-processed data set produced earlier.
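Below is a minimal sketch of the grouped training. It assumes dplyr, tidyr, purrr and caret are installed; caret's method = "svmLinear" also needs kernlab. The simulated `finaldata` with two hypothetical groups G1 and G2, the `ctrl` object, and the linear-kernel SVM with 5-fold cross-validation are illustrative assumptions, not the author's original settings.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(caret)  # method = "svmLinear" also requires the kernlab package

set.seed(1)
n <- 120

# Hypothetical stand-in for the pre-processed 'finaldata'
finaldata <- data.frame(
  x1Group = rep(c("G1", "G2"), each = n / 2),
  x2 = rnorm(n),
  x3 = rnorm(n)
)
finaldata$y <- ifelse(finaldata$x1Group == "G1", 2, -2) * finaldata$x2 +
  finaldata$x3 + rnorm(n, sd = 0.1)

ctrl <- trainControl(method = "cv", number = 5)  # 5-fold cross-validation

# One row per group: 'data' holds that group's rows, 'fit' its trained model
model <- finaldata |>
  group_by(x1Group) |>
  nest() |>
  ungroup() |>
  arrange(x1Group) |>  # make model$fit[[1]] the G1 model
  mutate(fit = map(data, function(d) {
    train(y ~ ., data = d, method = "svmLinear", trControl = ctrl)
  }))
```

Because nest() moves the grouping column out of each nested data frame, the formula y ~ . only sees the predictors. Replacing "svmLinear" with another caret method name trains a different algorithm for every group with no other changes.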

This trains one model per group. But how do you see the model fit? Each fitted model sits in the fit list column, so it can be printed and inspected like any other caret model.
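Below is a sketch of inspecting the fits. It assumes dplyr, tidyr, purrr, caret and kernlab are installed, and uses simulated data for two hypothetical groups. Printing an element of model$fit shows caret's usual summary. caret's getTrainPerf() returns each model's cross-validated performance, and the `perf` table collecting them is an illustrative addition.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(caret)  # method = "svmLinear" also requires kernlab

# Minimal hypothetical setup: two groups, one linear SVM per group
set.seed(1)
finaldata <- data.frame(x1Group = rep(c("G1", "G2"), each = 60),
                        x2 = rnorm(120), x3 = rnorm(120))
finaldata$y <- finaldata$x2 - finaldata$x3 + rnorm(120, sd = 0.1)
model <- finaldata |>
  nest(data = -x1Group) |>  # like group_by(x1Group) |> nest(), but ungrouped
  arrange(x1Group) |>
  mutate(fit = map(data, function(d) {
    train(y ~ ., data = d, method = "svmLinear",
          trControl = trainControl(method = "cv", number = 5))
  }))

# caret summary (resampling scheme, tuning parameter, RMSE, R-squared) for group 1
print(model$fit[[1]])

# Cross-validated performance of every group's model in one table
perf <- model |>
  mutate(perf = map(fit, getTrainPerf)) |>
  select(x1Group, perf) |>
  unnest(perf)
print(perf)
```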

Predicting new data is just as easy. It works exactly as with any caret model, as long as the correct model is called. If, say, there are two groups, model$fit[[1]] is the model for the first group. ‘test_x1G1’ is the test data for that group.
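Below is a sketch of group-wise prediction, again on simulated data, assuming dplyr, tidyr, purrr, caret and kernlab are installed. The helper make_data() and the names `testdata`, `pred_G1` and `preds` are hypothetical. The first part mirrors the article's approach of calling model$fit[[1]] on test_x1G1. The second part is an optional extension that pairs every group's model with its own test rows using purrr::map2().

```r
library(dplyr)
library(tidyr)
library(purrr)
library(caret)  # method = "svmLinear" also requires kernlab

# Hypothetical data generator: groups G1 and G2 with opposite slopes on x2
set.seed(2)
make_data <- function(n) {
  d <- data.frame(x1Group = rep(c("G1", "G2"), each = n / 2),
                  x2 = rnorm(n), x3 = rnorm(n))
  d$y <- ifelse(d$x1Group == "G1", 2, -2) * d$x2 + rnorm(n, sd = 0.1)
  d
}
finaldata <- make_data(120)
testdata  <- make_data(40)

model <- finaldata |>
  nest(data = -x1Group) |>
  arrange(x1Group) |>  # model$fit[[1]] is the G1 model
  mutate(fit = map(data, function(d) {
    train(y ~ ., data = d, method = "svmLinear",
          trControl = trainControl(method = "cv", number = 5))
  }))

# One group: predict the G1 test rows with the G1 model
test_x1G1 <- filter(testdata, x1Group == "G1")
pred_G1 <- predict(model$fit[[1]], newdata = test_x1G1)

# All groups: join each group's test rows to its model, then predict pairwise
preds <- testdata |>
  nest(test = -x1Group) |>
  inner_join(select(model, x1Group, fit), by = "x1Group") |>
  mutate(pred = map2(fit, test, function(f, t) predict(f, newdata = t)))
```

Joining on x1Group rather than relying on row positions guards against calling the wrong group's model when the test data lists the groups in a different order.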

Thus, by using recipes together with the nest-and-map workflow, huge chunks of repetitive code can be avoided.

Happy Learning!!


Viewing things from Data & QA lens, sometimes I like to talk in Stats! Based in Cork.