Neural Network on Beer Dataset
Introduction
Artificial neural networks (ANNs), usually simply called neural networks (NNs), are computing systems vaguely inspired by the biological neural networks that constitute animal brains.
An ANN is based on a collection of connected units or nodes called artificial neurons, which loosely model the neurons in a biological brain. Each connection, like the synapses in a biological brain, can transmit a signal to other neurons. An artificial neuron that receives a signal then processes it and can signal neurons connected to it. The “signal” at a connection is a real number, and the output of each neuron is computed by some non-linear function of the sum of its inputs. The connections are called edges. Neurons and edges typically have a weight that adjusts as learning proceeds. The weight increases or decreases the strength of the signal at a connection. Neurons may have a threshold such that a signal is sent only if the aggregate signal crosses that threshold. Typically, neurons are aggregated into layers. Different layers may perform different transformations on their inputs. Signals travel from the first layer (the input layer), to the last layer (the output layer), possibly after traversing the layers multiple times.
Neural networks learn (or are trained) by processing examples, each of which contains a known "input" and "result," forming probability-weighted associations between the two, which are stored within the data structure of the net itself. The training of a neural network from a given example is usually conducted by determining the difference between the processed output of the network (often a prediction) and a target output. This difference is the error. The network then adjusts its weighted associations according to a learning rule, using this error value. Successive adjustments cause the neural network to produce output that is increasingly similar to the target output. After a sufficient number of these adjustments, training can be terminated based on certain criteria. This is known as supervised learning.
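As a minimal, hedged illustration of this error-driven learning, here is a single artificial neuron (a logistic unit) trained by gradient descent in base R. The data, names, and settings below are invented for the example, not taken from any real dataset:

```r
# A single logistic "artificial neuron" trained by gradient descent.
# Toy data: classify points by whether x1 + x2 > 1 (illustrative only).
set.seed(1)
n <- 200
X <- cbind(1, matrix(runif(2 * n), ncol = 2))  # column of 1s acts as the bias input
y <- as.numeric(X[, 2] + X[, 3] > 1)           # known "result" for each example

sigmoid <- function(z) 1 / (1 + exp(-z))

w  <- rep(0, 3)   # connection weights, adjusted as learning proceeds
lr <- 0.5         # learning rate
for (epoch in 1:2000) {
  p    <- sigmoid(X %*% w)   # neuron output (the prediction)
  err  <- p - y              # difference between output and target: the error
  grad <- t(X) %*% err / n   # gradient of the cross-entropy loss
  w    <- w - lr * grad      # the learning rule: adjust weights using the error
}

pred <- as.numeric(sigmoid(X %*% w) > 0.5)
mean(pred == y)              # training accuracy after learning
```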
Let’s get to work.
Install Packages
packages <- c("xts", "zoo", "PerformanceAnalytics", "GGally", "ggplot2", "ellipse", "plotly")
newpack <- packages[!(packages %in% installed.packages()[, "Package"])]
if (length(newpack)) install.packages(newpack)
invisible(lapply(packages, library, character.only = TRUE))
Load dataset
beer <- read.csv("MyData.csv")
head(beer)
summary(beer)
    Clase               Color         BoilGravity         IBU
 Length:1000        Min.   : 1.99   Min.   : 1.0    Min.   :  0.00
 Class :character   1st Qu.: 5.83   1st Qu.:27.0    1st Qu.: 32.90
 Mode  :character   Median : 7.79   Median :33.0    Median : 47.90
                    Mean   :13.45   Mean   :33.8    Mean   : 51.97
                    3rd Qu.:12.57   3rd Qu.:39.0    3rd Qu.: 67.77
                    Max.   :50.00   Max.   :90.0    Max.   :144.53
      ABV
 Min.   : 2.390
 1st Qu.: 5.240
 Median : 5.990
 Mean   : 6.093
 3rd Qu.: 6.810
 Max.   :10.380
Visualization of the Beer Data Set
We can explore the data visually, for example with a pairs plot:
pairs(beer[2:5],
main = "Craft Beer Data -- 5 types",
pch = 21, bg = c("red", "green", "blue", "orange", "yellow"))
library(GGally)
pm <- ggpairs(beer,lower=list(combo=wrap("facethist",
binwidth=0.5)),title="Craft Beer", mapping=aes(color=Clase))
pm
library(PerformanceAnalytics)
chart.Correlation2 <- function(R, histogram = TRUE, method = NULL, ...)
{
  x <- checkData(R, method = "matrix")
  if (is.null(method))  # modified
    method <- "pearson"
  use.method <- method  # added
  panel.cor <- function(x, y, digits = 2, prefix = "",
                        use = "pairwise.complete.obs",
                        method = use.method, cex.cor, ...)
  {  # modified
    usr <- par("usr")
    on.exit(par(usr))
    par(usr = c(0, 1, 0, 1))
    r <- cor(x, y, use = use, method = method)
    txt <- format(c(r, 0.123456789), digits = digits)[1]
    txt <- paste(prefix, txt, sep = "")
    if (missing(cex.cor))
      cex <- 0.8 / strwidth(txt)
    test <- cor.test(as.numeric(x), as.numeric(y), method = method)
    Signif <- symnum(test$p.value, corr = FALSE, na = FALSE,
                     cutpoints = c(0, 0.001, 0.01, 0.05, 0.1, 1),
                     symbols = c("***", "**", "*", ".", " "))
    text(0.5, 0.5, txt, cex = cex * (abs(r) + 0.3) / 1.3)
    text(0.8, 0.8, Signif, cex = cex, col = 2)
  }
  f <- function(t)
  {
    dnorm(t, mean = mean(x), sd = sd.xts(x))
  }
  dotargs <- list(...)
  dotargs$method <- NULL
  rm(method)
  hist.panel <- function(x, ... = NULL)
  {
    par(new = TRUE)
    hist(x, col = "light gray", probability = TRUE, axes = FALSE,
         main = "", breaks = "FD")
    lines(density(x, na.rm = TRUE), col = "red", lwd = 1)
    rug(x)
  }
  if (histogram)
    pairs(x, gap = 0, lower.panel = panel.smooth,
          upper.panel = panel.cor, diag.panel = hist.panel)
  else pairs(x, gap = 0, lower.panel = panel.smooth, upper.panel = panel.cor)
}
# if the method option is not set, the default is 'pearson'
chart.Correlation2(beer[,2:5], histogram=TRUE, pch="21")
library(plotly)
pm <- GGally::ggpairs(beer, aes(color = Clase),
                      lower = list(combo = wrap("facethist", binwidth = 0.5)))
class(pm)
pm
[1] "gg"       "ggmatrix"
Setup and Train the Neural Network for Beer Data
A neural network emulates how the human brain works: a network of interconnected neurons sending stimulating signals to each other.
In this model, each neuron is equivalent to a logistic regression unit. Neurons are organized in multiple layers, where every neuron at layer i connects to every neuron at layer i+1 and to nothing else.
The tuning parameters of a neural network include the number of hidden layers, the number of neurons in each layer, and the learning rate.
There are no fixed rules for setting these parameters; good choices depend heavily on the problem domain. My default choice is a single hidden layer with as many neurons as there are input variables. The number of neurons in the output layer depends on how many binary outputs need to be learned; in a classification problem, this is typically the number of possible values of the output category.
The learning happens via an iterative feedback mechanism, where the error on the training data output is used to adjust the corresponding input weights. This adjustment is propagated back to the previous layers; the learning algorithm is known as back-propagation.
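The back-propagation mechanism described above can be sketched in a few lines of base R. This is an illustrative toy (two inputs, three logistic hidden units, one logistic output, invented XOR data), not the model trained below:

```r
# Minimal back-propagation through one hidden layer (base R sketch).
# Network: 2 inputs -> 3 logistic hidden units -> 1 logistic output.
set.seed(42)
sigmoid <- function(z) 1 / (1 + exp(-z))

X <- matrix(c(0,0, 0,1, 1,0, 1,1), ncol = 2, byrow = TRUE)  # toy inputs
y <- c(0, 1, 1, 0)                                          # XOR targets

W1 <- matrix(rnorm(6), 2, 3); b1 <- rep(0, 3)  # input  -> hidden weights
W2 <- matrix(rnorm(3), 3, 1); b2 <- 0          # hidden -> output weights
lr <- 1
loss <- numeric(5000)

for (epoch in 1:5000) {
  # forward pass
  H <- sigmoid(sweep(X %*% W1, 2, b1, "+"))  # hidden activations
  p <- sigmoid(H %*% W2 + b2)                # network output
  loss[epoch] <- mean((p - y)^2)
  # backward pass: the output error is propagated to earlier layers
  d2 <- p - y                                # error at the output
  d1 <- (d2 %*% t(W2)) * H * (1 - H)         # error pushed back to the hidden layer
  # weight adjustments (the learning rule)
  W2 <- W2 - lr * t(H) %*% d2 / nrow(X); b2 <- b2 - lr * mean(d2)
  W1 <- W1 - lr * t(X) %*% d1 / nrow(X); b1 <- b1 - lr * colMeans(d1)
}
loss[5000] < loss[1]  # the error decreases as training proceeds
```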
library(neuralnet)
library(dplyr)  # for %>% and select()
beer <- beer %>%
  select("IBU", "ABV", "Color", "BoilGravity", "Clase")
head(beer)
# Binarize the categorical output
beer <- cbind(beer, beer$Clase == 'ALE')
beer <- cbind(beer, beer$Clase == 'IPA')
beer <- cbind(beer, beer$Clase == 'PALE')
beer <- cbind(beer, beer$Clase == 'STOUT')
beer <- cbind(beer, beer$Clase == 'PORTER')
names(beer)[6] <- 'ALE'
names(beer)[7] <- 'IPA'
names(beer)[8] <- 'PALE'
names(beer)[9] <- 'STOUT'
names(beer)[10] <- 'PORTER'
head(beer)
set.seed(101)
beer.train.idx <- sample(x = nrow(beer), size = nrow(beer)*0.5)
beer.train <- beer[beer.train.idx,]
beer.valid <- beer[-beer.train.idx,]
Visualization of the Neural Network on Beer Data
Here is a plot of the neural network we learned.
Neural networks are very good at learning non-linear functions, and multiple outputs can be learned at the same time. However, training time is relatively long, and training is susceptible to local-minimum traps. This can be mitigated by running multiple rounds and picking the best learned model.
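One hedged sketch of the multiple-rounds idea, in base R: train the same tiny network from several random initializations and keep the run with the lowest final error. (The neuralnet package also supports this directly through its rep argument, and plot(nn, rep = "best") then shows the best repetition.) All data and names here are illustrative:

```r
# "Multiple rounds": train the same tiny net from several random starts
# and keep the run with the lowest final error (illustrative toy only).
sigmoid <- function(z) 1 / (1 + exp(-z))
X <- matrix(c(0,0, 0,1, 1,0, 1,1), ncol = 2, byrow = TRUE)
y <- c(0, 1, 1, 0)

train_once <- function(seed, epochs = 3000, lr = 1) {
  set.seed(seed)  # a different seed gives a different random initialization
  W1 <- matrix(rnorm(6), 2, 3); b1 <- rep(0, 3)
  W2 <- matrix(rnorm(3), 3, 1); b2 <- 0
  for (i in 1:epochs) {
    H  <- sigmoid(sweep(X %*% W1, 2, b1, "+"))
    p  <- sigmoid(H %*% W2 + b2)
    d2 <- p - y
    d1 <- (d2 %*% t(W2)) * H * (1 - H)
    W2 <- W2 - lr * t(H) %*% d2 / 4; b2 <- b2 - lr * mean(d2)
    W1 <- W1 - lr * t(X) %*% d1 / 4; b1 <- b1 - lr * colMeans(d1)
  }
  mean((p - y)^2)                  # final error of this round
}

errors <- sapply(1:5, train_once)  # five rounds, five random starts
best   <- which.min(errors)        # keep the best learned model
```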
nn <- neuralnet(ALE+IPA+PALE+STOUT+PORTER ~ IBU+ABV+Color+BoilGravity,
                data=beer.train, hidden=c(5))
plot(nn, rep = "best")
Result
beer.prediction <- compute(nn, beer.valid[-5:-10])
idx <- apply(beer.prediction$net.result, 1, which.max)
predicted <- c('ALE','IPA', 'PALE', 'STOUT', 'PORTER')[idx]
table(predicted, beer.valid$Clase)
predicted ALE IPA PALE PORTER STOUT
    ALE    17   3   12      0     0
    IPA     1 203   21      0     2
    PALE   29  26   84      1     0
    STOUT   0   4    0     30    67
The accuracy of the model is calculated as follows (the 0 term corresponds to PORTER, which this model never predicted):
((17+203+84+0+67)/nrow(beer.valid))*100
74.2
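The diagonal sum above can also be computed programmatically, which avoids alignment mistakes when a class is never predicted (note that no PORTER row appears in the table). A hedged sketch with toy labels, not the actual predictions:

```r
# Accuracy from a confusion matrix: correct predictions lie on the
# diagonal, so accuracy = sum(diag(tab)) / sum(tab). Fixing the factor
# levels keeps rows and columns aligned even when a class is never
# predicted. The labels below are toy values for illustration.
lv        <- c("ALE", "IPA", "PALE", "PORTER", "STOUT")
predicted <- c("ALE", "IPA", "IPA", "PALE", "STOUT")
actual    <- c("ALE", "IPA", "PALE", "PALE", "STOUT")
tab <- table(factor(predicted, levels = lv),
             factor(actual,    levels = lv))
accuracy <- sum(diag(tab)) / sum(tab) * 100
accuracy  # 80: four of the five toy predictions are correct
```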
# nn$result.matrix
str(nn)
List of 14
$ call : language neuralnet(formula = ALE + IPA + PALE + STOUT + PORTER ~ IBU + ABV + Color + BoilGravity, data = beer.train, hidden = c(5))
$ response : logi [1:500, 1:5] FALSE FALSE FALSE FALSE FALSE FALSE ...
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:500] "841" "825" "430" "95" ...
.. ..$ : chr [1:5] "ALE" "IPA" "PALE" "STOUT" ...
$ covariate : num [1:500, 1:4] 62.3 27.1 39 72.3 67.8 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:500] "841" "825" "430" "95" ...
.. ..$ : chr [1:4] "IBU" "ABV" "Color" "BoilGravity"
$ model.list :List of 2
..$ response : chr [1:5] "ALE" "IPA" "PALE" "STOUT" ...
..$ variables: chr [1:4] "IBU" "ABV" "Color" "BoilGravity"
$ err.fct :function (x, y)
..- attr(*, "type")= chr "sse"
$ act.fct :function (x)
..- attr(*, "type")= chr "logistic"
$ linear.output : logi TRUE
$ data :'data.frame': 500 obs. of 10 variables:
..$ IBU : num [1:500] 62.3 27.1 39 72.3 67.8 ...
..$ ABV : num [1:500] 5.9 5.07 6.57 5.7 6.86 5.21 4.22 5.57 5.76 7.76 ...
..$ Color : num [1:500] 5.61 32.07 39.92 9.62 8.29 ...
..$ BoilGravity: int [1:500] 37 25 40 37 31 28 19 27 30 44 ...
..$ Clase : chr [1:500] "IPA" "PORTER" "STOUT" "PALE" ...
..$ ALE : logi [1:500] FALSE FALSE FALSE FALSE FALSE FALSE ...
..$ IPA : logi [1:500] TRUE FALSE FALSE FALSE TRUE FALSE ...
..$ PALE : logi [1:500] FALSE FALSE FALSE TRUE FALSE TRUE ...
..$ STOUT : logi [1:500] FALSE FALSE TRUE FALSE FALSE FALSE ...
..$ PORTER : logi [1:500] FALSE TRUE FALSE FALSE FALSE FALSE ...
$ exclude : NULL
$ net.result :List of 1
..$ : num [1:500, 1:5] 0.00942 0.01859 0.01845 0.00916 0.00478 ...
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : chr [1:500] "841" "825" "430" "95" ...
.. .. ..$ : NULL
$ weights :List of 1
..$ :List of 2
.. ..$ : num [1:5, 1:5] -10.8295 0.0944 0.9985 -0.1776 0.0445 ...
.. ..$ : num [1:6, 1:5] 0.0576 -0.058 -0.4324 0.4371 -0.0437 ...
$ generalized.weights:List of 1
..$ : num [1:500, 1:20] -0.08239 -0.000124 -0.000822 -0.082905 -0.093232 ...
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : chr [1:500] "841" "825" "430" "95" ...
.. .. ..$ : NULL
$ startweights :List of 1
..$ :List of 2
.. ..$ : num [1:5, 1:5] -0.5 1.832 -0.329 0.261 -1.112 ...
.. ..$ : num [1:6, 1:5] 0.341 1.107 0.689 0.471 -1.64 ...
$ result.matrix : num [1:58, 1] 8.02e+01 8.76e-03 7.37e+04 -1.08e+01 9.44e-02 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:58] "error" "reached.threshold" "steps" "Intercept.to.1layhid1" ...
.. ..$ : NULL
- attr(*, "class")= chr "nn"
beer.net <- neuralnet(ALE+IPA+PALE+STOUT+PORTER ~ IBU+ABV+Color+BoilGravity,
                      data=beer.train, hidden=c(5), err.fct = "ce",
                      linear.output = F, lifesign = "minimal",
                      threshold = 0.1)
hidden: 5    thresh: 0.1    rep: 1/1    steps: 86036   error: 431.94881   time: 24.02 secs
plot(beer.net, rep="best")
Predicting Results
beer.prediction <- compute(beer.net, beer.valid[-5:-10])
idx <- apply(beer.prediction$net.result, 1, which.max)
predicted <- c('ALE','IPA', 'PALE', 'STOUT', 'PORTER')[idx]
table(predicted, beer.valid$Clase)
predicted ALE IPA PALE PORTER STOUT
    ALE    26   4    9      0     0
    IPA     0 197   30      1     3
    PALE   21  33   78      0     0
    PORTER  0   1    0     10     6
    STOUT   0   1    0     20    60
The accuracy of the model is calculated as follows:
((26+197+78+10+60)/nrow(beer.valid))*100
74.2
Conclusion
As you can see, both models reach the same accuracy of 74.2%!
I hope this helps you develop your own training.
Never give up!
See you on LinkedIn!