Some easy R examples for support vector machines

Holger Aust
Feb 17, 2018

In this post I will explain the principles of support vector machines (SVM). The treatment is a bit simplified, as SVM is a complex topic with a lot of theory behind it.

There is some hype nowadays around SVM, as there is around many machine-learning concepts. Basically, SVM is a binary classifier, i.e. it sorts data points into two buckets. We can extend SVM to more buckets by running the algorithm several times in one-against-one comparisons and choosing the best fit.
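To see the multi-class extension in action, here is a minimal sketch using the built-in iris data set (three species). Note that e1071's svm() performs the one-against-one voting internally, so no extra code is needed for more than two classes:

```r
library(e1071)

# iris has three species; svm() builds one-against-one binary
# classifiers internally and lets them vote on the final class
data(iris)
set.seed(1)
model <- svm(Species ~ ., data = iris)
pred  <- predict(model, iris)
mean(pred == iris$Species)  # in-sample accuracy, well above 90%
```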

Nothing exciting so far. What is remarkable is how SVMs handle non-linearity: by projecting into a higher-dimensional space where the classification problem is actually linear.

So, how does SVM do the classification? Basically, it identifies the hyperplane, i.e. a plane of one dimension less than the data space, that separates the two classes best. This means maximizing the distance to the points closest to the separation border (this is where the name support vectors comes from). Now this works well for linearly separable classes, but that prerequisite is not very realistic. To overcome the problem, the data space is transformed via some kernel function into a higher-dimensional space where linear separability is possible.
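A minimal sketch of this idea, using a hand-made extra feature instead of an implicit kernel: two classes separated by a circle are not linearly separable in (x, y), but after adding the feature z = x² + y² they can be split by the plane z = 1, so a purely linear SVM on z already works:

```r
library(e1071)

# Two classes separated by the unit circle: not linearly
# separable in (x, y)
set.seed(1)
d <- data.frame(x = runif(200, -3, 3), y = runif(200, -3, 3))
d$Class <- as.factor(ifelse(d$x^2 + d$y^2 > 1, "red", "blue"))

# Project into a higher dimension by hand: in z the classes
# are separated by the plane z = 1
d$z <- d$x^2 + d$y^2

m <- svm(Class ~ z, data = d, kernel = "linear", cost = 100)
mean(predict(m, d) == d$Class)  # close to 1
```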

Enough theory, let the games begin. We will use the R package e1071, which is an interface to libsvm, a C++ implementation of SVM. Let's create a three-dimensional sample data set: two numeric dimensions plus a third one consisting of the two categories we want to classify.

Here is the R-code:

############################################
# Support vector machines: Examples
############################################

library(e1071)

n <- 1000
testSize <- 0.33

#create data set
set.seed(1)

df <- data.frame(x=runif(n,-3,3),y=runif(n,-3,3))

#Run only ONE of the following class definitions at a time
#(each assignment overwrites the previous one)

#Example 1a: linear split in 1 dimension
df$Class <- as.factor(ifelse(df$x>1,"red","blue"))
#Example 1b: linear split in 2 dimensions
#df$Class <- as.factor(ifelse(df$x+df$y>1,"red","blue"))
#Example 1c: polynomial split in 2 dimensions
#df$Class <- as.factor(ifelse(df$x^2+df$y^2>1,"red","blue"))

## split data into a train and test set
index <- 1:nrow(df)
testIndex <- sample(index, trunc(n*testSize))
testSet <- df[testIndex,]
trainSet <- df[-testIndex,]

plot(df$x,df$y,col=as.character(df$Class))

# svm (the kernel defaults to the radial basis function)
svm.model <- svm(Class ~x+y, data = trainSet, cost = 100, gamma = 1)
svm.pred <- predict(svm.model, testSet[,-3]) #drop the Class column
plot(testSet$x,testSet$y,col=as.character(svm.pred))

## compute svm confusion matrix
table(pred = svm.pred, true = testSet$Class)
sum(svm.pred==testSet$Class)/nrow(testSet)
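The values cost = 100 and gamma = 1 above are picked by hand. As an aside, e1071 also ships tune.svm(), which runs a cross-validated grid search over these hyperparameters. Here is a self-contained sketch on circle-type data; the grid values are just a guess at a sensible range, not tuned advice:

```r
library(e1071)

# Self-contained sample data (circle split, as in Example 1c)
set.seed(1)
df <- data.frame(x = runif(300, -3, 3), y = runif(300, -3, 3))
df$Class <- as.factor(ifelse(df$x^2 + df$y^2 > 1, "red", "blue"))

# Cross-validated grid search over gamma and cost
tuned <- tune.svm(Class ~ x + y, data = df,
                  gamma = 10^(-2:1), cost = 10^(0:2))
tuned$best.parameters  # the winning (gamma, cost) pair
```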

Let’s look at the different examples:

Example 1a: linear split in 1 dimension

df$Class <- as.factor(ifelse(df$x>1,"red","blue"))

Example 1a: The created two-dimensional dataset with classes blue and red
Example 1a: Here is the classified test-set

We see that the SVM is very good at this simple classification; the accuracy is ~99%.

Example 1b: linear split in 2 dimensions

df$Class <- as.factor(ifelse(df$x+df$y>1,"red","blue"))

Example 1b: original dataset
Example 1b: The predicted testset

Example 1c: non-linear split in 2 dimensions (circle)

df$Class <- as.factor(ifelse(df$x^2+df$y^2>1,"red","blue"))

Example 1c: The original dataset
Example 1c: The classified testset

Example 2:

Now let's get more complex and define a curved surface that splits the two groups:

df$z <- 3*df$x^3-2*df$y^2-1
df$Class <- as.factor(ifelse(df$z>0,"red","blue"))

In 3D the original dataset looks like this
Just looking at x, y and class in 2D
And the classified testset

Here is the R-code for the 3D-plot:

library(plot3D)
scatter3D(x = df$x,y = df$y,z = df$z,phi=20,theta=20,bty="b2")

So this gives a little impression of what SVMs are capable of. I hope to provide a more realistic setting soon.