ANN Classification with ‘nnet’ Package in R

Rizka Yolanda
5 min read · Jun 26, 2019

Hello! Welcome back young data scientist❤

An Artificial Neural Network (ANN) is a network of small processing units modeled on the behavior of biological neural networks (Wikipedia). The idea goes back to Warren McCulloch and Walter Pitts, who in 1943 described how neurons might work and modeled a simple neural network with electrical circuits. In seismic exploration, ANN algorithms have become quite popular, with applications including noise identification, wavelet estimation, velocity analysis, shear wave analysis, reflector autotracking, hydrocarbon prediction, reservoir characterization, etc. In simple terms, an ANN architecture consists of an input layer, one or more hidden layers, and an output layer.

Applying ANN to the Fishing dataset using the nnet package

The Fishing dataset contains data on recreational fishing in the United States with several variables, including:
• mode: Recreational mode, one of beach, pier, boat, and charter
• price: Price for the chosen alternative
• catch: Catch rate for the chosen alternative
• pbeach: Price for beach mode
• ppier: Price for pier mode
• pboat: Price for private boat mode
• pcharter: Price for charter boat mode
• cbeach: Catch rate for beach mode
• cpier: Catch rate for pier mode
• cboat: Catch rate for private boat mode
• ccharter: Catch rate for charter boat mode
• income: The amount of income

From the data about recreational fishing options, we will use the following variables:
1. Mode as the dependent variable (Y)
2. Price, Catch, and Income as the independent variables (X)

The target consists of the four fishing modes: beach (1), boat (2), charter (3), and pier (4).

— — — — — — — — — — Let’s go to the tutorial! — — — — — — — — — —

1. Sort the data by mode in Excel (the split in step 4 relies on this ordering)
2. Loading the Package and Reading the Data
library("nnet")
fishing<- read.csv("/cloud/project/Fishing.csv")
dim(fishing)
## [1] 1182   13
## The data consists of 1182 rows and 13 columns.

3. Preprocessing the Data

First of all, we select the variables we are going to use. In this example, mode is the dependent variable (Y), and the independent variables are price, catch, and income. Each variable is converted to the appropriate data type and then collected in a new data frame.

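# The data was read into 'fishing' in step 2, so the columns are taken from there
mode <- factor(fishing$mode)          # target: categorical
price <- as.numeric(fishing$price)
catch <- as.numeric(fishing$catch)
income <- as.numeric(fishing$income)
fishing <- data.frame(mode, price, catch, income)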
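
Before moving on, it can help to confirm that the conversion worked and to count the rows in each mode, since the split in the next step relies on those counts. The sketch below uses a tiny made-up data frame (`raw` is a hypothetical stand-in for the CSV contents), not the real Fishing data.

```r
# Hypothetical stand-in for the data read from the CSV (made-up values)
raw <- data.frame(
  mode   = c("beach", "beach", "boat", "charter", "pier", "pier"),
  price  = c(35.7, 24.3, 157.9, 182.9, 15.1, 41.6),
  catch  = c(0.07, 0.11, 0.26, 0.54, 0.05, 0.45),
  income = c(7083, 1250, 3750, 2083, 4583, 4583)
)

# Same conversions as in the tutorial, applied to the toy data
toy <- data.frame(
  mode   = factor(raw$mode),
  price  = as.numeric(raw$price),
  catch  = as.numeric(raw$catch),
  income = as.numeric(raw$income)
)

str(toy)                   # mode should be a factor, the rest numeric
counts <- table(toy$mode)  # number of rows per mode
counts
cumsum(counts)             # last row index of each mode when sorted by mode
```

On the real data, the cumulative counts should match the row ranges used when drawing the training sample.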

4. Splitting the Data into Training and Testing Sets

The data is divided into 50% training data and 50% testing data within each mode: 67 of the 134 beach rows, 209 of the 418 boat rows, 226 of the 452 charter rows, and 89 of the 178 pier rows go to the training data, and the rest is testing data. The rows are drawn at random, so your results may differ from the ones shown here and from one run to the next.

sampel <- c(sample(1:134, 67), sample(135:552, 209), sample(553:1004, 226), sample(1005:1182, 89))
sampel
fishing.train <- fishing[sampel, ]
fishing.test <- fishing[-sampel, ]
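
The row ranges above are hard-coded and depend on the data being sorted by mode. A minimal sketch of the same 50% per-mode split that does not rely on row positions, and that fixes the random seed so the split is repeatable, could look like this. The toy data frame `toy` and the seed value 123 are assumptions for illustration only, not part of the original tutorial.

```r
# Toy stand-in for the fishing data (hypothetical values, illustration only)
set.seed(123)  # arbitrary seed so the random split is repeatable
toy <- data.frame(
  mode  = factor(rep(c("beach", "boat", "charter", "pier"),
                     times = c(6, 10, 12, 8))),
  price = runif(36, 10, 300)
)

# For each mode, draw half of its row indices at random
rows_by_mode <- split(seq_len(nrow(toy)), toy$mode)
train_idx <- unlist(lapply(rows_by_mode, function(idx) {
  idx[sample.int(length(idx), size = floor(length(idx) / 2))]
}))

toy.train <- toy[train_idx, ]
toy.test  <- toy[-train_idx, ]

table(toy.train$mode)  # half of each mode ends up in the training set
```

Calling `set.seed()` before the sampling step is what makes two runs produce the same training and testing rows.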

5. Building the ANN Model with the ‘nnet’ Package

The nnet package in R can be used to fit an ANN, check the accuracy of the model, and predict the class of new input data.

fishing1<-nnet(mode~.,data=fishing.train,size=5, decay=5e-4, maxit=200)

The ‘~ .’ notation in the formula enters all remaining variables as independent variables, i.e. price, catch, and income. The ‘nnet’ command is run on the training data. size is the number of nodes in the hidden layer; in this case 5 nodes are used. decay is the weight decay parameter, a penalty that keeps the weights from growing too large. maxit is the maximum number of iterations; in this case training stops after at most 200 iterations.

Based on the model above, we have built an ANN with 3 inputs, one hidden layer with 5 nodes, and 4 outputs, for a total of 44 weights. There are 3 variables that must be supplied as input, namely price, catch, and income, in order to produce the target output, namely mode. For more details, you can see the network structure in the picture below:
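
The weight count can be checked by hand: each hidden node receives the 3 inputs plus a bias, and each output node receives the 5 hidden nodes plus a bias, so (3 + 1) × 5 + (5 + 1) × 4 = 44. Here is a small sketch of that check, fitted on random toy data (an assumption for illustration, not the Fishing data):

```r
library(nnet)

n_in <- 3; n_hidden <- 5; n_out <- 4
# (inputs + bias) * hidden nodes + (hidden nodes + bias) * output nodes
n_weights <- (n_in + 1) * n_hidden + (n_hidden + 1) * n_out
n_weights  # 44

# Random toy data with 3 numeric inputs and a 4-level factor target
set.seed(1)  # arbitrary seed
toy <- data.frame(
  mode   = factor(rep(c("beach", "boat", "charter", "pier"), each = 10)),
  price  = runif(40),
  catch  = runif(40),
  income = runif(40)
)
fit <- nnet(mode ~ ., data = toy, size = n_hidden, decay = 5e-4,
            maxit = 50, trace = FALSE)

fit$n            # nodes per layer: input, hidden, output
length(fit$wts)  # number of weights in the fitted network
```

Running `summary()` on a fitted nnet model lists the individual weights by connection, which is another way to inspect the structure.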

In the plot we can see that the input layer has 3 neurons, one for each of the variables used, namely price, catch, and income. Then there is one hidden layer with 5 neurons, and an output layer with 4 neurons, one for each class: beach, boat, charter, and pier. The neurons labeled B1 and B2 in the plot are biases, each with a weight to every neuron in the next layer. The lines connecting the neurons are synapses, each with its own weight (shown in the plot).
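
The tutorial does not show how the accuracy of the model is computed. One common way, sketched here on the built-in `iris` data rather than the Fishing data (an assumption made so the example runs on its own), is to predict classes for the held-out rows and cross-tabulate them against the true labels.

```r
library(nnet)

set.seed(42)  # arbitrary seed for a repeatable split and fit
train_idx  <- sample(seq_len(nrow(iris)), size = 75)
iris.train <- iris[train_idx, ]
iris.test  <- iris[-train_idx, ]

fit <- nnet(Species ~ ., data = iris.train, size = 5, decay = 5e-4,
            maxit = 200, trace = FALSE)

# type = "class" returns the predicted label instead of probabilities
pred <- predict(fit, iris.test, type = "class")

# Confusion matrix: rows are true classes, columns are predicted classes
conf_mat <- table(actual = iris.test$Species,
                  predicted = factor(pred, levels = levels(iris$Species)))
conf_mat

# Accuracy is the share of test rows on the diagonal
accuracy <- sum(diag(conf_mat)) / sum(conf_mat)
accuracy
```

For the fishing model, the same pattern would use `fishing1`, `fishing.test`, and `fishing.test$mode` in place of the iris objects.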

Classifying new data

After getting the model, we use it to make a prediction for a new observation:
Price = 50.32
Catch = 0.0451
Income = 4583.332
For this new observation we will see which recreational fishing mode the model predicts.

price <- 50.32
catch <- 0.0451
income <- 4583.332
fishing.baru <- data.frame(price, catch, income)
prediksi <- predict(fishing1, fishing.baru)
prediksi

According to the prediction R produced for the new observation, the probability of beach mode is 0.0517 (5.17%), boat mode 0.1954 (19.54%), charter mode 0.6797 (67.97%), and pier mode 0.0731 (7.31%).

So we can conclude that a trip with price = 50.32, catch = 0.0451, and income = 4583.332 is classified as charter mode, which has the largest probability of the four modes, 0.6797 or 67.97%.
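
Picking the mode with the largest probability can also be done in code. The sketch below hard-codes the probabilities reported in the text in a one-row matrix shaped like a `predict()` result (the name `prob` is hypothetical), so it runs without the fitted model.

```r
# Probabilities reported in the text, arranged like a one-row predict() result
prob <- matrix(c(0.0517, 0.1954, 0.6797, 0.0731), nrow = 1,
               dimnames = list(NULL, c("beach", "boat", "charter", "pier")))

# Name of the column with the largest probability
best <- names(which.max(prob[1, ]))
best       # "charter"

sum(prob)  # the four probabilities should add up to about 1
```

With the fitted model at hand, `predict(fishing1, fishing.baru, type = "class")` returns the predicted label directly.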

That’s all about ANN classification for today, see u on my next post! 😉
