Neural Network on Beer Dataset
Introduction
Artificial neural networks (ANNs), usually simply called neural networks (NNs), are computing systems vaguely inspired by the biological neural networks that constitute animal brains.
An ANN is based on a collection of connected units or nodes called artificial neurons, which loosely model the neurons in a biological brain. Each connection, like the synapses in a biological brain, can transmit a signal to other neurons. An artificial neuron that receives a signal then processes it and can signal neurons connected to it. The “signal” at a connection is a real number, and the output of each neuron is computed by some non-linear function of the sum of its inputs. The connections are called edges. Neurons and edges typically have a weight that adjusts as learning proceeds. The weight increases or decreases the strength of the signal at a connection. Neurons may have a threshold such that a signal is sent only if the aggregate signal crosses that threshold. Typically, neurons are aggregated into layers. Different layers may perform different transformations on their inputs. Signals travel from the first layer (the input layer), to the last layer (the output layer), possibly after traversing the layers multiple times.
Neural networks learn (or are trained) by processing examples, each of which contains a known "input" and "result," forming probability-weighted associations between the two, which are stored within the data structure of the net itself. The training of a neural network from a given example is usually conducted by determining the difference between the processed output of the network (often a prediction) and a target output. This difference is the error. The network then adjusts its weighted associations according to a learning rule, using this error value. Successive adjustments cause the neural network to produce output that is increasingly similar to the target output. After a sufficient number of these adjustments, training can be terminated based on certain criteria. This is known as supervised learning.
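As a minimal, hedged illustration of this error-driven learning, here is a single artificial neuron (a logistic unit) trained by gradient descent in base R. The data, names, and settings below are invented for the example, not taken from any real dataset:

```r
# A single logistic "artificial neuron" trained by gradient descent.
# Toy data: classify points by whether x1 + x2 > 1 (illustrative only).
set.seed(1)
n <- 200
X <- cbind(1, matrix(runif(2 * n), ncol = 2))  # column of 1s acts as the bias input
y <- as.numeric(X[, 2] + X[, 3] > 1)           # known "result" for each example

sigmoid <- function(z) 1 / (1 + exp(-z))

w  <- rep(0, 3)   # connection weights, adjusted as learning proceeds
lr <- 0.5         # learning rate
for (epoch in 1:2000) {
  p    <- sigmoid(X %*% w)   # neuron output (the prediction)
  err  <- p - y              # difference between output and target: the error
  grad <- t(X) %*% err / n   # gradient of the cross-entropy loss
  w    <- w - lr * grad      # the learning rule: adjust weights using the error
}

pred <- as.numeric(sigmoid(X %*% w) > 0.5)
mean(pred == y)              # training accuracy after learning
```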
Let’s get to work.
Install Packages
packages <- c("xts", "zoo", "PerformanceAnalytics", "GGally", "ggplot2", "ellipse", "plotly")
newpack <- packages[!(packages %in% installed.packages()[, "Package"])]
if (length(newpack)) install.packages(newpack)
invisible(lapply(packages, library, character.only = TRUE))
Load dataset
beer <- read.csv("MyData.csv")
head(beer)
summary(beer)
    Clase               Color         BoilGravity         IBU
 Length:1000        Min.   : 1.99   Min.   : 1.0    Min.   :  0.00
 Class :character   1st Qu.: 5.83   1st Qu.:27.0    1st Qu.: 32.90
 Mode  :character   Median : 7.79   Median :33.0    Median : 47.90
                    Mean   :13.45   Mean   :33.8    Mean   : 51.97
                    3rd Qu.:12.57   3rd Qu.:39.0    3rd Qu.: 67.77
                    Max.   :50.00   Max.   :90.0    Max.   :144.53
      ABV
 Min.   : 2.390
 1st Qu.: 5.240
 Median : 5.990
 Mean   : 6.093
 3rd Qu.: 6.810
 Max.   :10.380
Visualization of the Beer Data Set
We can explore the data visually, for example with a pairs plot:
pairs(beer[2:5],
main = "Craft Beer Data -- 5 types",
pch = 21, bg = c("red", "green", "blue", "orange", "yellow"))
library(GGally)
pm <- ggpairs(beer,lower=list(combo=wrap("facethist",
binwidth=0.5)),title="Craft Beer", mapping=aes(color=Clase))
pm
library(PerformanceAnalytics)
chart.Correlation2 <- function(R, histogram = TRUE, method = NULL, ...)
{
  x <- checkData(R, method = "matrix")
  if (is.null(method))  # modified
    method <- "pearson"
  use.method <- method  # added
  panel.cor <- function(x, y, digits = 2, prefix = "",
                        use = "pairwise.complete.obs",
                        method = use.method, cex.cor, ...)
  {  # modified
    usr <- par("usr")
    on.exit(par(usr))
    par(usr = c(0, 1, 0, 1))
    r <- cor(x, y, use = use, method = method)
    txt <- format(c(r, 0.123456789), digits = digits)[1]
    txt <- paste(prefix, txt, sep = "")
    if (missing(cex.cor))
      cex <- 0.8 / strwidth(txt)
    test <- cor.test(as.numeric(x), as.numeric(y), method = method)
    Signif <- symnum(test$p.value, corr = FALSE, na = FALSE,
                     cutpoints = c(0, 0.001, 0.01, 0.05, 0.1, 1),
                     symbols = c("***", "**", "*", ".", " "))
    text(0.5, 0.5, txt, cex = cex * (abs(r) + 0.3) / 1.3)
    text(0.8, 0.8, Signif, cex = cex, col = 2)
  }
  f <- function(t)
  {
    dnorm(t, mean = mean(x), sd = sd.xts(x))
  }
  dotargs <- list(...)
  dotargs$method <- NULL
  rm(method)
  hist.panel <- function(x, ... = NULL)
  {
    par(new = TRUE)
    hist(x, col = "light gray", probability = TRUE, axes = FALSE,
         main = "", breaks = "FD")
    lines(density(x, na.rm = TRUE), col = "red", lwd = 1)
    rug(x)
  }
  if (histogram)
    pairs(x, gap = 0, lower.panel = panel.smooth,
          upper.panel = panel.cor, diag.panel = hist.panel)
  else pairs(x, gap = 0, lower.panel = panel.smooth, upper.panel = panel.cor)
}
# if the method option is not set, the default is 'pearson'
chart.Correlation2(beer[,2:5], histogram=TRUE, pch="21")
library(plotly)
pm <- GGally::ggpairs(beer, aes(color = Clase),
                      lower = list(combo = wrap("facethist", binwidth = 0.5)))
class(pm)
pm
[1] "gg"       "ggmatrix"
Setup and Train the Neural Network for Beer Data
A neural network emulates how the human brain works: a network of interconnected neurons sending stimulating signals to each other.
In this model, each neuron is equivalent to a logistic regression unit. Neurons are organized in multiple layers, where every neuron at layer i connects to every neuron at layer i+1 and to nothing else.
The tuning parameters of a neural network include the number of hidden layers, the number of neurons in each layer, and the learning rate.
There are no fixed rules for setting these parameters; good choices depend heavily on the problem domain. My default choice is a single hidden layer with as many neurons as there are input variables. The number of neurons in the output layer depends on how many binary outputs need to be learned; in a classification problem, this is typically the number of possible values of the output category.
The learning happens via an iterative feedback mechanism, where the error on the training data output is used to adjust the corresponding input weights. This adjustment is propagated back to the previous layers; the learning algorithm is known as back-propagation.
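The back-propagation mechanism described above can be sketched in a few lines of base R. This is an illustrative toy (two inputs, three logistic hidden units, one logistic output, invented XOR data), not the model trained below:

```r
# Minimal back-propagation through one hidden layer (base R sketch).
# Network: 2 inputs -> 3 logistic hidden units -> 1 logistic output.
set.seed(42)
sigmoid <- function(z) 1 / (1 + exp(-z))

X <- matrix(c(0,0, 0,1, 1,0, 1,1), ncol = 2, byrow = TRUE)  # toy inputs
y <- c(0, 1, 1, 0)                                          # XOR targets

W1 <- matrix(rnorm(6), 2, 3); b1 <- rep(0, 3)  # input  -> hidden weights
W2 <- matrix(rnorm(3), 3, 1); b2 <- 0          # hidden -> output weights
lr <- 1
loss <- numeric(5000)

for (epoch in 1:5000) {
  # forward pass
  H <- sigmoid(sweep(X %*% W1, 2, b1, "+"))  # hidden activations
  p <- sigmoid(H %*% W2 + b2)                # network output
  loss[epoch] <- mean((p - y)^2)
  # backward pass: the output error is propagated to earlier layers
  d2 <- p - y                                # error at the output
  d1 <- (d2 %*% t(W2)) * H * (1 - H)         # error pushed back to the hidden layer
  # weight adjustments (the learning rule)
  W2 <- W2 - lr * t(H) %*% d2 / nrow(X); b2 <- b2 - lr * mean(d2)
  W1 <- W1 - lr * t(X) %*% d1 / nrow(X); b1 <- b1 - lr * colMeans(d1)
}
loss[5000] < loss[1]  # the error decreases as training proceeds
```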
library(neuralnet)
library(dplyr)  # for %>% and select()
beer <- beer %>%
  select("IBU", "ABV", "Color", "BoilGravity", "Clase")
head(beer)
# Binarize the categorical output
beer <- cbind(beer, beer$Clase == 'ALE')
beer <- cbind(beer, beer$Clase == 'IPA')
beer <- cbind(beer, beer$Clase == 'PALE')
beer <- cbind(beer, beer$Clase == 'STOUT')
beer <- cbind(beer, beer$Clase == 'PORTER')
names(beer)[6] <- 'ALE'
names(beer)[7] <- 'IPA'
names(beer)[8] <- 'PALE'
names(beer)[9] <- 'STOUT'
names(beer)[10] <- 'PORTER'
head(beer)
set.seed(101)
beer.train.idx <- sample(x = nrow(beer), size = nrow(beer)*0.5)
beer.train <- beer[beer.train.idx,]
beer.valid <- beer[-beer.train.idx,]
Visualization of the Neural Network on Beer Data
Here is a plot of the neural network we learned.
Neural networks are very good at learning non-linear functions, and multiple outputs can be learned at the same time. However, training time is relatively long, and training is susceptible to local-minimum traps. This can be mitigated by running multiple rounds and picking the best learned model.
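One hedged sketch of the multiple-rounds idea, in base R: train the same tiny network from several random initializations and keep the run with the lowest final error. (The neuralnet package also supports this directly through its rep argument, and plot(nn, rep = "best") then shows the best repetition.) All data and names here are illustrative:

```r
# "Multiple rounds": train the same tiny net from several random starts
# and keep the run with the lowest final error (illustrative toy only).
sigmoid <- function(z) 1 / (1 + exp(-z))
X <- matrix(c(0,0, 0,1, 1,0, 1,1), ncol = 2, byrow = TRUE)
y <- c(0, 1, 1, 0)

train_once <- function(seed, epochs = 3000, lr = 1) {
  set.seed(seed)  # a different seed gives a different random initialization
  W1 <- matrix(rnorm(6), 2, 3); b1 <- rep(0, 3)
  W2 <- matrix(rnorm(3), 3, 1); b2 <- 0
  for (i in 1:epochs) {
    H  <- sigmoid(sweep(X %*% W1, 2, b1, "+"))
    p  <- sigmoid(H %*% W2 + b2)
    d2 <- p - y
    d1 <- (d2 %*% t(W2)) * H * (1 - H)
    W2 <- W2 - lr * t(H) %*% d2 / 4; b2 <- b2 - lr * mean(d2)
    W1 <- W1 - lr * t(X) %*% d1 / 4; b1 <- b1 - lr * colMeans(d1)
  }
  mean((p - y)^2)                  # final error of this round
}

errors <- sapply(1:5, train_once)  # five rounds, five random starts
best   <- which.min(errors)        # keep the best learned model
```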
nn <- neuralnet(ALE+IPA+PALE+STOUT+PORTER ~ IBU+ABV+Color+BoilGravity,
                data=beer.train, hidden=c(5))
plot(nn, rep = "best")
Result
beer.prediction <- compute(nn, beer.valid[-5:-10])
idx <- apply(beer.prediction$net.result, 1, which.max)
predicted <- c('ALE','IPA', 'PALE', 'STOUT', 'PORTER')[idx]
table(predicted, beer.valid$Clase)
predicted ALE IPA PALE PORTER STOUT
    ALE    17   3   12      0     0
    IPA     1 203   21      0     2
    PALE   29  26   84      1     0
    STOUT   0   4    0     30    67
The accuracy of the model is calculated as follows (the 0 term corresponds to PORTER, which this model never predicted):
((17+203+84+0+67)/nrow(beer.valid))*100
74.2
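The diagonal sum above can also be computed programmatically, which avoids alignment mistakes when a class is never predicted (note that no PORTER row appears in the table). A hedged sketch with toy labels, not the actual predictions:

```r
# Accuracy from a confusion matrix: correct predictions lie on the
# diagonal, so accuracy = sum(diag(tab)) / sum(tab). Fixing the factor
# levels keeps rows and columns aligned even when a class is never
# predicted. The labels below are toy values for illustration.
lv        <- c("ALE", "IPA", "PALE", "PORTER", "STOUT")
predicted <- c("ALE", "IPA", "IPA", "PALE", "STOUT")
actual    <- c("ALE", "IPA", "PALE", "PALE", "STOUT")
tab <- table(factor(predicted, levels = lv),
             factor(actual,    levels = lv))
accuracy <- sum(diag(tab)) / sum(tab) * 100
accuracy  # 80: four of the five toy predictions are correct
```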
# nn$result.matrix
str(nn)
List of 14
$ call : language neuralnet(formula = ALE + IPA + PALE + STOUT + PORTER ~ IBU + ABV + Color + BoilGravity, data = beer.train, hidden = c(5))
$ response : logi [1:500, 1:5] FALSE FALSE FALSE FALSE FALSE FALSE ...
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:500] "841" "825" "430" "95" ...
.. ..$ : chr [1:5] "ALE" "IPA" "PALE" "STOUT" ...
$ covariate : num [1:500, 1:4] 62.3 27.1 39 72.3 67.8 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:500] "841" "825" "430" "95" ...
.. ..$ : chr [1:4] "IBU" "ABV" "Color" "BoilGravity"
$ model.list :List of 2
..$ response : chr [1:5] "ALE" "IPA" "PALE" "STOUT" ...
..$ variables: chr [1:4] "IBU" "ABV" "Color" "BoilGravity"
$ err.fct :function (x, y)
..- attr(*, "type")= chr "sse"
$ act.fct :function (x)
..- attr(*, "type")= chr "logistic"
$ linear.output : logi TRUE
$ data :'data.frame': 500 obs. of 10 variables:
..$ IBU : num [1:500] 62.3 27.1 39 72.3 67.8 ...
..$ ABV : num [1:500] 5.9 5.07 6.57 5.7 6.86 5.21 4.22 5.57 5.76 7.76 ...
..$ Color : num [1:500] 5.61 32.07 39.92 9.62 8.29 ...
..$ BoilGravity: int [1:500] 37 25 40 37 31 28 19 27 30 44 ...
..$ Clase : chr [1:500] "IPA" "PORTER" "STOUT" "PALE" ...
..$ ALE : logi [1:500] FALSE FALSE FALSE FALSE FALSE FALSE ...
..$ IPA : logi [1:500] TRUE FALSE FALSE FALSE TRUE FALSE ...
..$ PALE : logi [1:500] FALSE FALSE FALSE TRUE FALSE TRUE ...
..$ STOUT : logi [1:500] FALSE FALSE TRUE FALSE FALSE FALSE ...
..$ PORTER : logi [1:500] FALSE TRUE FALSE FALSE FALSE FALSE ...
$ exclude : NULL
$ net.result :List of 1
..$ : num [1:500, 1:5] 0.00942 0.01859 0.01845 0.00916 0.00478 ...
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : chr [1:500] "841" "825" "430" "95" ...
.. .. ..$ : NULL
$ weights :List of 1
..$ :List of 2
.. ..$ : num [1:5, 1:5] -10.8295 0.0944 0.9985 -0.1776 0.0445 ...
.. ..$ : num [1:6, 1:5] 0.0576 -0.058 -0.4324 0.4371 -0.0437 ...
$ generalized.weights:List of 1
..$ : num [1:500, 1:20] -0.08239 -0.000124 -0.000822 -0.082905 -0.093232 ...
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : chr [1:500] "841" "825" "430" "95" ...
.. .. ..$ : NULL
$ startweights :List of 1
..$ :List of 2
.. ..$ : num [1:5, 1:5] -0.5 1.832 -0.329 0.261 -1.112 ...
.. ..$ : num [1:6, 1:5] 0.341 1.107 0.689 0.471 -1.64 ...
$ result.matrix : num [1:58, 1] 8.02e+01 8.76e-03 7.37e+04 -1.08e+01 9.44e-02 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:58] "error" "reached.threshold" "steps" "Intercept.to.1layhid1" ...
.. ..$ : NULL
- attr(*, "class")= chr "nn"
beer.net <- neuralnet(ALE+IPA+PALE+STOUT+PORTER ~ IBU+ABV+Color+BoilGravity,
                      data=beer.train, hidden=c(5), err.fct = "ce",
                      linear.output = F, lifesign = "minimal",
                      threshold = 0.1)
hidden: 5    thresh: 0.1    rep: 1/1    steps: 86036   error: 431.94881   time: 24.02 secs
plot(beer.net, rep="best")
Predicting Results
beer.prediction <- compute(beer.net, beer.valid[-5:-10])
idx <- apply(beer.prediction$net.result, 1, which.max)
predicted <- c('ALE','IPA', 'PALE', 'STOUT', 'PORTER')[idx]
table(predicted, beer.valid$Clase)
predicted ALE IPA PALE PORTER STOUT
    ALE    26   4    9      0     0
    IPA     0 197   30      1     3
    PALE   21  33   78      0     0
    PORTER  0   1    0     10     6
    STOUT   0   1    0     20    60
The accuracy of the model is calculated as follows:
((26+197+78+10+60)/nrow(beer.valid))*100
74.2
Conclusion
As you can see, both models reach the same accuracy of 74.2%!
I hope this helps you develop your own training.
Never give up!
See you on LinkedIn!