Published in Nerd For Tech

Introduction to Deep Learning with R

Hands-on image classification using TensorFlow and Keras in R programming for beginners.

Photo by Green Chameleon on Unsplash

Deep learning is a subfield of machine learning that is based on artificial neural network architectures. Deep learning is used in many fields such as natural language processing, computer vision, and bioinformatics.

In short, I’ll cover the following topics:

  • What is deep learning?
  • What is Keras?
  • How to install TensorFlow?
  • How to do deep learning analyses with Keras?

Please don’t forget to follow my YouTube channel, where I create content about AI, data science, machine learning, and deep learning. 👇

Let’s get started.

What is Deep Learning?

Deep Learning is a subfield of machine learning inspired by the structure of the human brain. Deep learning tackles complex tasks such as classifying billions of images, recommending the best videos, or learning to beat the world champion at the game of Go.

Modern deep learning often involves tens or even hundreds of successive layers of representations, all learned automatically from exposure to training data.

Artificial Neural Network

How Deep Learning Networks Work

A deep learning model is composed of one input layer, two or more hidden layers, and one output layer. The input layer receives the input data and passes it to the first hidden layer. The hidden layers perform mathematical computations on the inputs.

The “Deep” in Deep Learning refers to having more than one hidden layer. The output layer returns the output data. Each connection between neurons is associated with a weight. This weight shows the importance of the input value. The initial weights are set randomly. Each neuron has an activation function. Once a batch of input data has passed through all the layers of the neural network, it returns the output data through the output layer.
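As a toy illustration of what a single neuron computes (my own example in plain R, with made-up numbers, not code from the original post):

```r
# One neuron: a weighted sum of the inputs plus a bias, passed through an activation
relu <- function(z) max(0, z)  # a common activation function

x <- c(0.5, 0.8)   # input values
w <- c(0.2, -0.4)  # weights (in a real network these start random)
b <- 0.1           # bias

relu(sum(w * x) + b)  # the neuron's output: relu(-0.12) = 0
```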

The loss (or cost) function measures the difference between the actual result and the predicted result. To build a good model, you want your loss to be as close to zero as possible. To find the minimum of the cost function, you use the gradient descent technique. Gradient descent works by changing the weights in small increments after each iteration over the data set.
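The idea behind gradient descent can be sketched in a few lines of plain R. This toy example (my addition, not the post's code) minimizes the simple quadratic loss (w − 3)²:

```r
# Minimize loss(w) = (w - 3)^2 with gradient descent
loss_grad <- function(w) 2 * (w - 3)  # derivative of the loss

w <- 0    # starting weight
lr <- 0.1 # learning rate: the size of each small increment
for (i in 1:100) {
  w <- w - lr * loss_grad(w)  # step against the gradient
}
w  # very close to 3, where the loss is minimal
```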

OK, we saw how deep learning works. Let’s take a look at the Keras API to implement a deep learning model.

What is Keras?

Keras is a high-level Deep Learning API that allows you to easily build, train, evaluate, and execute all sorts of neural networks.

Keras was released as an open-source project in March 2015. Due to its ease of use, flexibility, and beautiful design, it quickly gained popularity. You can run Keras on top of TensorFlow, Theano, and MXNet. Note that TensorFlow ships with its own Keras API.

How to Install Keras?

When you install TensorFlow, Keras comes with it. Installing TensorFlow in RStudio is easy. Let’s install it:
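The code here appeared as a screenshot in the original post; installing the tensorflow R package from CRAN looks like this:

```r
install.packages("tensorflow")
```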


Next, let’s load the tensorflow package.
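Presumably the screenshotted line is simply:

```r
library(tensorflow)
```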


Note that on Windows you need a working installation of Anaconda. Now we can use the install_tensorflow method:
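The call, shown as a screenshot in the original, is:

```r
install_tensorflow()
```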


After a few minutes, both TensorFlow and Keras are installed. To check the installation, let’s print hello world using TensorFlow.

tf$constant("Hello TensorFlow")

If the code runs without errors, TensorFlow was installed without any problem. If you would prefer to install the GPU version, you can use the gpu = TRUE argument as shown below.
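Based on the text, the screenshotted GPU install call would be:

```r
install_tensorflow(gpu = TRUE)
```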


Let’s take a look at the classification problem using the MNIST dataset.

Classification Problem

To show the classification problem, I’m going to use the classic MNIST data set. The MNIST data set contains handwritten digits: 70,000 grayscale images of 28×28 pixels each, spread across 10 classes. It has a set of 60,000 training images and 10,000 test images. The MNIST dataset comes preloaded in Keras. First, I’m going to import Keras.
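Loading the package (shown as a screenshot in the original):

```r
library(keras)
```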


Now let’s load the MNIST data set.

mnist <- dataset_mnist()

Next, let’s take a look at the structure of the dataset.
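The original shows the output as a screenshot; one way to inspect the structure is:

```r
str(mnist)
```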

Structure of our data set

Note that the MNIST dataset consists of a training set and a test set. The model is fitted using the training set and is evaluated using the test set. The values of the pixels are integers between 0 and 255. Let me convert them to floats between 0 and 1.

mnist$train$x <- mnist$train$x/255
mnist$test$x <- mnist$test$x/255

It’s time to build the model.


You’ll feed the training data to the neural network, and then you’ll make predictions using the test set. After that, you’ll verify whether these predictions match the labels of the test set.

To define the model, you can use the sequential API or the functional API. As it is easier to use, I’m going to use the sequential API.


Layers are sorted linearly in the sequential API. Let’s build a Keras model using the sequential API.

model <- keras_model_sequential() %>%               # (1)
  layer_flatten(input_shape = c(28, 28)) %>%        # (2)
  layer_dense(units = 128, activation = "relu") %>% # (3)
  layer_dropout(0.2) %>%                            # (4)
  layer_dense(10, activation = "softmax")           # (5)

(1) The pipe (%>%) operator is used to add layers to a network. This operator comes from the magrittr package. Using the pipe operator makes the code more readable. You can insert the pipe operator with the Ctrl+Shift+M keyboard shortcut.

(2) I’ve specified the input shape using the layer_flatten method. In our case, we have images of 28×28 pixels. This layer flattens each input into one dimension.

(3) I’ve added a dense (fully connected) hidden layer with 128 neurons to our network.

(4) You can choose the number of neurons in a layer based on your analysis. Deep learning models tend to overfit. If a model has an overfitting problem, it has difficulty accurately predicting new data. To tackle this problem, you can use the dropout technique, one of the most effective and most commonly used regularization techniques for neural networks. Dropout, applied to a layer, consists of randomly dropping out (setting to zero) a number of output features of the layer during training. Here, I’ve used layer_dropout with a dropout rate of 0.2.

(5) Lastly, I’ve added an output layer. As we have 10 classes, we use 10 neurons. For the activation function, I’ve set the softmax function; don’t forget that a multiclass classification network should end with a softmax activation. In our case, the softmax function returns an array of 10 probability scores. Each score is the probability that the current digit image belongs to one of our 10 digit classes.

After defining the model, you can see information about the layers, the number of parameters, etc. with the summary function.
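For example:

```r
summary(model)
```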

Summary of our model


After building the model, you must compile it. When you compile the model, you need to define the loss, optimizer, and metrics arguments.

model %>% compile(
  loss = "sparse_categorical_crossentropy", # (1)
  optimizer = "rmsprop",                    # (2)
  metrics = "accuracy"                      # (3)
)

(1) To handle labels in multiclass classification, you can use categorical_crossentropy or sparse_categorical_crossentropy. Use categorical_crossentropy when the labels are one-hot encoded and sparse_categorical_crossentropy when they are encoded as integers, as they are here.
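To illustrate the difference between the two encodings (a small example I’ve added, using the to_categorical helper from keras):

```r
y <- c(0, 2, 1)                     # integer labels: use sparse_categorical_crossentropy
to_categorical(y, num_classes = 3)  # one-hot matrix: use categorical_crossentropy
```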

(2) Here, I specify an optimizer. The optimizer determines how learning proceeds: it uses the loss value to update the network’s weights. There are various optimizers to choose from; rmsprop is generally a good enough choice, whatever your problem.

(3) Evaluation metrics are used to measure how well our model performs. As we’re handling a classification problem, I’ve used the accuracy metric.

Note that evaluation metrics to be used for regression differ from those used for classification. A common regression metric is mean absolute error (MAE).


Now we’re ready to train the network. Let’s call the fit() method to fit the model to the training data:

model %>% fit(
  x = mnist$train$x, y = mnist$train$y,
  epochs = 5,            # (1)
  validation_split = 0.3 # (2)
)

(1) Each iteration over all the training data is called an epoch. I’ve set 5 for epochs.

(2) A validation set is used to tune the hyperparameters of the model. You can split off such a set from the training data using the validation_split argument. I’ve set it to 0.3, so 30 percent of the data is used for validation and 70 percent for training.

Let’s execute this code.

Our model training

As you can see, the training loss decreases with every epoch and the training accuracy increases with every epoch. That’s what you would expect when running a gradient descent optimization.


The graph of the training is automatically plotted in RStudio. After training the model, you can make predictions with it using the predict function.

The predict method returns a probability distribution over all 10 classes.

predictions <- predict(model, mnist$test$x)

By default, predict returns the output of the last Keras layer. Let’s take a look at the first two rows of the predictions variable. To do this, let me use the head method.

head(predictions, 2)

Also, you can use the predict_classes method to generate the predicted classes directly.
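The screenshotted code is presumably along these lines (note that predict_classes was deprecated in later TensorFlow releases; the base-R alternative picks the most probable class by hand):

```r
model %>% predict_classes(mnist$test$x) %>% head()

# Equivalent with base R: take the index of the highest probability
# (minus 1, because the digit classes start at 0)
head(apply(predict(model, mnist$test$x), 1, which.max) - 1)
```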


You can assess the model’s performance on a different dataset using the evaluate function. Let’s evaluate our model on the test set.

model %>% evaluate(mnist$test$x, mnist$test$y)
Model evaluation on the test set

Our model has around 90% accuracy on the test set.


If you want, you can save your model. To do this, you can use the save_model_tf method.

save_model_tf(object = model, filepath = “model”)

To reload the model, you can use the load_model_tf method.

reloaded_model <- load_model_tf(“model”)

Now you can use the reloaded model.

That’s it. I hope you enjoyed this post. Thanks for reading.

Please clap 👏 if you like this blog post. Also, don’t forget to follow us on our Tirendaz Academy YouTube 📺, Twitter 😎, Medium 📚, LinkedIn 👍

See you in the next post …


