Using Learning Vector Quantization for Classification in R

Akshit Singh
Jun 20, 2019 · 3 min read

The Learning Vector Quantization algorithm (LVQ) is an artificial neural network algorithm that lets you choose how many training instances you wish to work with and learns exactly what those instances should look like.

In this article, we’ll look at the following key points:


  • The representation used by the LVQ algorithm that you actually save to a file.
  • The procedure that you can use to make predictions with a learned LVQ model.
  • How to learn an LVQ model from training data.

LVQ Model Representation

LVQ is best understood as a classification algorithm. It supports both binary and multi-class classification problems.

The representation for LVQ is a collection of codebook vectors. An LVQ model creates codebook vectors by learning from the training dataset, and these vectors represent class regions. For example, if your problem is a binary classification with classes 0 and 1, and the inputs are Car insurance, Health insurance and Home insurance, then a codebook vector would be composed of all four attributes: Car insurance, Health insurance, Home insurance and the class.

The model representation is a fixed pool of codebook vectors that look like training instances, but the values of each attribute have been adapted based on the learning procedure.
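To make that concrete, here is a purely illustrative sketch in R of what a tiny pool of codebook vectors for the insurance example might look like; the attribute values and class labels are made up and would, in practice, be produced by the learning procedure:

# Hypothetical pool of three codebook vectors for the insurance example.
# Each row mirrors a training instance: one value per input attribute plus a class.
codebook_example <- data.frame(
  car_insurance    = c(0.82, 0.15, 0.40),
  health_insurance = c(0.67, 0.22, 0.91),
  home_insurance   = c(0.30, 0.78, 0.55),
  class            = factor(c(1, 0, 1))
)
codebook_example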


Building an LVQ Model

Install the ‘class’ package: the ‘class’ library provides the functions required for this classification. There are several versions of the LVQ function, such as lvq1(), olvq1(), lvq2(), and lvq3(). We use olvq1(), the optimized LVQ function, in this tutorial.
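If you don’t have these packages yet, they can all be installed from CRAN; note that the train/test split below relies on sample.split() from the ‘caTools’ package and the confusion matrix on ‘caret’:

# One-time setup: install the packages used in this tutorial
install.packages(c("class", "caret", "caTools"))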

library(class)
library(caret)
library(caTools)   # provides sample.split() for the train/test split

# Preparing the dataset
set.seed(88)
n = 10000
a = sample(1:10, n, replace = T)
b = sample(10:20, n, replace = T)
f = ifelse(a > 5 & b > 10, "red",
ifelse(a < 3 | b < 4, "yellow", "green"))

df = data.frame(a = a, b = b, flag = as.factor(f))
head(df)

   a  b   flag
1  3 13  green
2  8 13    red
3  5 19  green
4  9 13    red
5 10 11    red
6  1 13 yellow
# Splitting the data into training and test sets
set.seed(88)
split <- sample.split(df$flag, SplitRatio = 0.8)
train_d <- subset(df, split == TRUE)
test_d <- subset(df, split == FALSE)

# Convert the split datasets into matrices
train = data.matrix(train_d[, c("a", "b")])
test = data.matrix(test_d[, c("a", "b")])

train_label = factor(train_d[, "flag"])
test_label = test_d$flag
# Building a codebook for LVQ
codeBook = lvqinit(train, train_label, size = 100)

olvq1() then trains this codebook so that it represents the training set.

buildCodeBook = olvq1(train, train_label, codeBook)

# Prediction phase
predict = lvqtest(buildCodeBook, test)
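As a side note, the codebook returned by lvqinit() and updated by olvq1() is, per the ‘class’ package documentation, a list with an x component (the codebook vectors) and a cl component (their class labels), so you can inspect what was learned:

# Peek at the learned codebook vectors and how they are spread across classes
head(buildCodeBook$x)
table(buildCodeBook$cl)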

Now follow the common practice of creating a confusion matrix to check the accuracy.

confusionMatrix(predict, test_label)
Confusion Matrix and Statistics

          Reference
Prediction green red yellow
    green    703   0      0
    red        0 896      0
    yellow     0   0    399

Overall Statistics

               Accuracy : 1
                 95% CI : (0.9982, 1)
    No Information Rate : 0.4484
    P-Value [Acc > NIR] : < 2.2e-16

                  Kappa : 1
 Mcnemar's Test P-Value : NA

Statistics by Class:

                     Class: green Class: red Class: yellow
Sensitivity                1.0000     1.0000        1.0000
Specificity                1.0000     1.0000        1.0000
Pos Pred Value             1.0000     1.0000        1.0000
Neg Pred Value             1.0000     1.0000        1.0000
Prevalence                 0.3519     0.4484        0.1997
Detection Rate             0.3519     0.4484        0.1997
Detection Prevalence       0.3519     0.4484        0.1997
Balanced Accuracy          1.0000     1.0000        1.0000
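The classifier separates the three classes perfectly on this synthetic data. If you just want the headline number, the same accuracy can also be computed directly by comparing predictions with the test labels:

# Overall accuracy: fraction of test rows predicted correctly
mean(as.character(predict) == as.character(test_label))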

In this brief article we worked through how to classify data in R using LVQ, and I hope you found it useful.
