[Tensorflow] Ch4: Support Vector Machines

PJ Wang
CS Note
Published in
7 min readApr 19, 2018

Previous

We introduce how to useVariable , Constant , Placeholder , and the operations of tensor. Finally, discuss theActivation functions.

[Tensorflow] CH1: Getting Started With Tensorflow

The Graph , Loss , Optimizer , Regression , Classificationwas discussed as link below.

[Tensorflow] Ch2: The Tensorflow Way

Discussed the different methods of Linear regression as like Matrix Inverse Method , Decopmosition Method , Deming Regression , Lasso and Ridge Regression , Elastic Net Regression , and Logistic Regression.

[Tensorflow] Ch3: Linear Regression

What is Support Vector Machines (SVM)?

Support vector machine(SVM) is supervised learning models with associated learning algorithms that analyze data used for classification and regression analysis.

SVM introduction

Computing the (soft-margin) SVM classifier amounts to minimizing an expression of the form.

SVM Expression

Working with a Linear SVM

Create the Graph and data.

sess = tf.Session()
iris = datasets.load_iris()
x_vals = np.array([[x[0], x[3]] for x in iris.data])
y_vals = np.array([1 if y==0 else -1 for y in iris.target])
x_vals

Build the setosa data.

setosa_x = [d[1] for i,d in enumerate(x_vals) if y_vals[i]==1]
setosa_y = [d[0] for i,d in enumerate(x_vals) if y_vals[i]==1]
not_setosa_x = [d[1] for i,d in enumerate(x_vals) if y_vals[i]==-1]
not_setosa_y = [d[0] for i,d in enumerate(x_vals) if y_vals[i]==-1]

Split the training data and testing data.

train_indices = np.random.choice(len(x_vals), round(len(x_vals)*0.8), replace=False)
test_indices = np.array(list(set(range(len(x_vals))) - set(train_indices)))
x_vals_train = x_vals[train_indices]
x_vals_test = x_vals[test_indices]
y_vals_train = y_vals[train_indices]
y_vals_test = y_vals[test_indices]

Output the shape of data.

x_vals_train:(120, 2), x_vals_test:(30, 2), y_vals_train:(120,), y_vals_test:(30,)

Create the placeholder and Variable .

batch_size = 100
x_data = tf.placeholder(shape=[None, 2], dtype=tf.float32)
y_target = tf.placeholder(shape=[None, 1], dtype=tf.float32)
A = tf.Variable(tf.random_normal(shape=[2,1]))
b = tf.Variable(tf.random_normal(shape=[1,1]))

Declare the model output.

model_output = tf.subtract(tf.matmul(x_data, A), b)

Declare the necessary components for the maximum margin loss.

l2_norm = tf.reduce_sum(tf.square(A))
alpha = tf.constant([0.1])
classification_term = tf.reduce_mean(tf.maximum(0., tf.subtract(1.,tf.multiply(model_output, y_target))))
loss = tf.add(classification_term, tf.multiply(alpha, l2_norm))

Declare the prediction and accuracy functions.

prediction = tf.sign(model_output)
accuracy = tf.reduce_mean(tf.cast(tf.equal(prediction, y_target),
tf.float32))

Declare the optimizer.

my_opt = tf.train.GradientDescentOptimizer(0.01)
train_step = my_opt.minimize(loss)
init = tf.initialize_all_variables()
sess.run(init)

Train.

loss_vec = []
train_accuracy = []
test_accuracy = []
for i in range(500):
rand_index = np.random.choice(len(x_vals_train), size=batch_size)
X = x_vals_train[rand_index]
Y = np.transpose([y_vals_train[rand_index]])
sess.run(train_step, feed_dict={x_data: X, y_target: Y})
temp_loss = sess.run(loss, feed_dict={x_data: X, y_target: Y})
loss_vec.append(temp_loss)
train_acc_temp = sess.run(accuracy, feed_dict={x_data: x_vals_train, y_target: np.transpose([y_vals_train])})
train_accuracy.append(train_acc_temp)
test_acc_temp = sess.run(accuracy, feed_dict={x_data: x_vals_test, y_target: np.transpose([y_vals_test])})
test_accuracy.append(test_acc_temp)

Output, 500 epochs

Output: 5000 epochs

Reduction to Linear Regression

Support vector machines can be used to t linear regression. The loss function will similar to

ε is half of the width of the margin.

Create the Graph and data.

sess = tf.Session()
iris = datasets.load_iris()
x_vals = np.array([x[3] for x in iris.data])
y_vals = np.array([y[0] for y in iris.data])
train_indices = np.random.choice(len(x_vals), round(len(x_vals)*0.8), replace=False)
test_indices = np.array(list(set(range(len(x_vals))) - set(train_indices)))
x_vals_train = x_vals[train_indices]
x_vals_test = x_vals[test_indices]
y_vals_train = y_vals[train_indices]
y_vals_test = y_vals[test_indices]
x_vals and y_vals

Declare the batch_size , placeholder , and Variable .

batch_size = 50
x_data = tf.placeholder(shape=[None, 1], dtype=tf.float32)
y_target = tf.placeholder(shape=[None, 1], dtype=tf.float32)
A = tf.Variable(tf.random_normal(shape=[1,1]))
b = tf.Variable(tf.random_normal(shape=[1,1]))
model_output = tf.add(tf.matmul(x_data, A), b)

Create the epsilon and set 0.5.

epsilon = tf.constant([0.5])
loss = tf.reduce_mean(tf.maximum(0., tf.subtract(tf.abs(tf.subtract(model_
output, y_target)), epsilon)))

Declare the optimizer.

my_opt = tf.train.GradientDescentOptimizer(0.075)
train_step = my_opt.minimize(loss)
init = tf.initialize_all_variables()
sess.run(init)

Train.

train_loss = []
test_loss = []
for i in range(200):
rand_index = np.random.choice(len(x_vals_train), size=batch_
size)
X = np.transpose([x_vals_train[rand_index]])
Y = np.transpose([y_vals_train[rand_index]])
sess.run(train_step, feed_dict={x_data: X, y_target: Y})
temp_train_loss = sess.run(loss, feed_dict={x_data:
np.transpose([x_vals_train]), y_target: np.transpose([y_vals_train])})
train_loss.append(temp_train_loss)
temp_test_loss = sess.run(loss, feed_dict={x_data:np.transpose([x_vals_test]), y_target: np.transpose([y_vals_test])})
test_loss.append(temp_test_loss)

Output

Output

Working with Kernels in TensorFlow

Introduce how to changer kernels and separate non-linear separable data. We will discuss how to implement the Gaussian kernel.

Create the data

(x_vals, y_vals) = datasets.make_circles(n_samples=500, factor=.5,noise=.1)
y_vals = np.array([1 if y==1 else -1 for y in y_vals])
class1_x = [x[0] for i,x in enumerate(x_vals) if y_vals[i]==1]
class1_y = [x[1] for i,x in enumerate(x_vals) if y_vals[i]==1]
class2_x = [x[0] for i,x in enumerate(x_vals) if y_vals[i]==-1]
class2_y = [x[1] for i,x in enumerate(x_vals) if y_vals[i]==-1]

Output

The distribution of the points

Declare the batch_size , placeholder , and Variable .

batch_size = 250
x_data = tf.placeholder(shape=[None, 2], dtype=tf.float32)
y_target = tf.placeholder(shape=[None, 1], dtype=tf.float32)
prediction_grid = tf.placeholder(shape=[None, 2], dtype=tf.float32)
b = tf.Variable(tf.random_normal(shape=[1,batch_size]))

Create the Gaussian kernel.

gamma = tf.constant(-50.0)
dist = tf.reduce_sum(tf.square(x_data), 1)
dist = tf.reshape(dist, [-1,1])
sq_dists = tf.add(tf.subtract(dist, tf.multiply(2., tf.matmul(x_data,tf.transpose(x_data)))), tf.transpose(dist))
my_kernel = tf.exp(tf.multiply(gamma, tf.abs(sq_dists)))

Declare the dual problem.

model_output = tf.matmul(b, my_kernel)
first_term = tf.reduce_sum(b)
b_vec_cross = tf.matmul(tf.transpose(b), b)
y_target_cross = tf.matmul(y_target, tf.transpose(y_target))
second_term = tf.reduce_sum(tf.multiply(my_kernel, tf.multiply(b_vec_cross,y_target_cross)))
loss = tf.negative(tf.subtract(first_term, second_term))

Create the prediction and accuracy function.

rA = tf.reshape(tf.reduce_sum(tf.square(x_data), 1),[-1,1])
rB = tf.reshape(tf.reduce_sum(tf.square(prediction_grid), 1),[-1,1])
pred_sq_dist = tf.add(tf.subtract(rA, tf.multiply(2., tf.matmul(x_data,tf.transpose(prediction_grid)))), tf.transpose(rB))
pred_kernel = tf.exp(tf.multiply(gamma, tf.abs(pred_sq_dist)))
prediction_output = tf.matmul(tf.multiply(tf.transpose(y_target),b),pred_kernel)
prediction = tf.sign(prediction_output-tf.reduce_mean(prediction_output))
accuracy = tf.reduce_mean(tf.cast(tf.equal(tf.squeeze(prediction),tf.squeeze(y_target)), tf.float32))

Declare the optimizer.

my_opt = tf.train.GradientDescentOptimizer(0.001)
train_step = my_opt.minimize(loss)
init = tf.initialize_all_variables()
sess.run(init)

Train.

loss_vec = []
batch_accuracy = []
for i in range(1000):
rand_index = np.random.choice(len(x_vals), size=batch_size)
X = x_vals[rand_index]
Y = np.transpose([y_vals[rand_index]])
sess.run(train_step, feed_dict={x_data: X, y_target: Y})
temp_loss = sess.run(loss, feed_dict={x_data: X, y_target: Y})
loss_vec.append(temp_loss)
acc_temp = sess.run(accuracy, feed_dict={x_data: X,y_target: Y,prediction_grid:X})
batch_accuracy.append(acc_temp)

Output

Implementing a Non-Linear SVM

Create the Graph and data.

sess = tf.Session()
iris = datasets.load_iris()
x_vals = np.array([[x[0], x[3]] for x in iris.data])
y_vals = np.array([1 if y==0 else -1 for y in iris.target])
class1_x = [x[0] for i,x in enumerate(x_vals) if y_vals[i]==1]
class1_y = [x[1] for i,x in enumerate(x_vals) if y_vals[i]==1]
class2_x = [x[0] for i,x in enumerate(x_vals) if y_vals[i]==-1]
class2_y = [x[1] for i,x in enumerate(x_vals) if y_vals[i]==-1]

Plot the points.

Declare the batch_size , placeholder , and Variable

batch_size = 100
x_data = tf.placeholder(shape=[None, 2], dtype=tf.float32)
y_target = tf.placeholder(shape=[None, 1], dtype=tf.float32)
prediction_grid = tf.placeholder(shape=[None, 2], dtype=tf.float32)
b = tf.Variable(tf.random_normal(shape=[1,batch_size]))

Declare the Gaussian kernel.

gamma = tf.constant(-10.0)
dist = tf.reduce_sum(tf.square(x_data), 1)
dist = tf.reshape(dist, [-1,1])
sq_dists = tf.add(tf.subtract(dist, tf.multiply(2., tf.matmul(x_data,tf.transpose(x_data)))), tf.transpose(dist))
my_kernel = tf.exp(tf.multiply(gamma, tf.abs(sq_dists)))

Compute the loss for the dual optimization problem.

model_output = tf.matmul(b, my_kernel)
first_term = tf.reduce_sum(b)
b_vec_cross = tf.matmul(tf.transpose(b), b)
y_target_cross = tf.matmul(y_target, tf.transpose(y_target))
second_term = tf.reduce_sum(tf.multiply(my_kernel, tf.multiply(b_vec_cross,y_target_cross)))
loss = tf.negative(tf.subtract(first_term, second_term))

Create the prediction function.

rA = tf.reshape(tf.reduce_sum(tf.square(x_data), 1),[-1,1])
rB = tf.reshape(tf.reduce_sum(tf.square(prediction_grid), 1),[-1,1])
pred_sq_dist = tf.add(tf.subtract(rA, tf.multiply(2., tf.matmul(x_data,
tf.transpose(prediction_grid)))), tf.transpose(rB))
pred_kernel = tf.exp(tf.multiply(gamma, tf.abs(pred_sq_dist)))
prediction_output = tf.matmul(tf.multiply(tf.transpose(y_target),b),pred_kernel)
prediction = tf.sign(prediction_output-tf.reduce_mean(prediction_output))
accuracy = tf.reduce_mean(tf.cast(tf.equal(tf.squeeze(prediction),tf.squeeze(y_target)), tf.float32))

Declare the optimizer function.

my_opt = tf.train.GradientDescentOptimizer(0.01)
train_step = my_opt.minimize(loss)
init = tf.initialize_all_variables()
sess.run(init)

Train.

loss_vec = []
batch_accuracy = []
for i in range(1000):
rand_index = np.random.choice(len(x_vals), size=batch_size)
X = x_vals[rand_index]
Y = np.transpose([y_vals[rand_index]])
sess.run(train_step, feed_dict={x_data: X, y_target:Y})
temp_loss = sess.run(loss, feed_dict={x_data: X, y_target: Y})
loss_vec.append(temp_loss)
acc_temp = sess.run(accuracy, feed_dict={x_data: X,y_target: Y,prediction_grid:X})
batch_accuracy.append(acc_temp)

Output

Implementing a Multi-Class SVM

Create the Graph and data.

sess = tf.Session()
iris = datasets.load_iris()
x_vals = np.array([[x[0], x[3]] for x in iris.data])
y_vals1 = np.array([1 if y==0 else -1 for y in iris.target])
y_vals2 = np.array([1 if y==1 else -1 for y in iris.target])
y_vals3 = np.array([1 if y==2 else -1 for y in iris.target])
y_vals = np.array([y_vals1, y_vals2, y_vals3])
class1_x = [x[0] for i,x in enumerate(x_vals) if iris.target[i]==0]
class1_y = [x[1] for i,x in enumerate(x_vals) if iris.target[i]==0]
class2_x = [x[0] for i,x in enumerate(x_vals) if iris.target[i]==1]
class2_y = [x[1] for i,x in enumerate(x_vals) if iris.target[i]==1]
class3_x = [x[0] for i,x in enumerate(x_vals) if iris.target[i]==2]
class3_y = [x[1] for i,x in enumerate(x_vals) if iris.target[i]==2]

Plot the points.

Declare the batch_size , placeholder , and Variable .

batch_size = 50
x_data = tf.placeholder(shape=[None, 2], dtype=tf.float32)
y_target = tf.placeholder(shape=[3, None], dtype=tf.float32)
prediction_grid = tf.placeholder(shape=[None, 2], dtype=tf.float32)
b = tf.Variable(tf.random_normal(shape=[3,batch_size]))

Declare the Gaussian kernel.

gamma = tf.constant(-10.0)
dist = tf.reduce_sum(tf.square(x_data), 1)
dist = tf.reshape(dist, [-1,1])
sq_dists = tf.add(tf.subtract(dist, tf.multiply(2., tf.matmul(x_data,tf.transpose(x_data)))), tf.transpose(dist))
my_kernel = tf.exp(tf.multiply(gamma, tf.abs(sq_dists)))

We will end up with three-dimensional matrices and we will want to broadcast matrix multiplication across the third index.

def reshape_matmul(mat):
v1 = tf.expand_dims(mat, 1)
v2 = tf.reshape(v1, [3, batch_size, 1])
return(tf.matmul(v2, v1))

Copmute the dual loss function.

model_output = tf.matmul(b, my_kernel)
first_term = tf.reduce_sum(b)
b_vec_cross = tf.matmul(tf.transpose(b), b)
y_target_cross = reshape_matmul(y_target)
second_term = tf.reduce_sum(tf.multiply(my_kernel, tf.multiply(b_vec_cross,y_target_cross)),[1,2])
loss = tf.reduce_sum(tf.negative(tf.subtract(first_term, second_term)))

Create the prediction kernel.

rA = tf.reshape(tf.reduce_sum(tf.square(x_data), 1),[-1,1])
rB = tf.reshape(tf.reduce_sum(tf.square(prediction_grid), 1),[-1,1])
pred_sq_dist = tf.add(tf.subtract(rA, tf.multiply(2., tf.matmul(x_data,tf.transpose(prediction_grid)))), tf.transpose(rB))
pred_kernel = tf.exp(tf.multiply(gamma, tf.abs(pred_sq_dist)))

Create the prediction.

prediction_output = tf.matmul(tf.multiply(y_target,b), pred_kernel)
prediction = tf.arg_max(prediction_output-tf.expand_dims(tf.reduce_mean(prediction_output,1), 1), 0)
accuracy = tf.reduce_mean(tf.cast(tf.equal(prediction,tf.argmax(y_target,0)), tf.float32))

Declare the optimizer function.

my_opt = tf.train.GradientDescentOptimizer(0.01)
train_step = my_opt.minimize(loss)
init = tf.initialize_all_variables()
sess.run(init)

Train.

loss_vec = []
batch_accuracy = []
for i in range(500):
rand_index = np.random.choice(len(x_vals), size=batch_size)
X = x_vals[rand_index]
Y = y_vals[:,rand_index]
sess.run(train_step, feed_dict={x_data: X, y_target:Y})
temp_loss = sess.run(loss, feed_dict={x_data: X, y_target: Y})
loss_vec.append(temp_loss)
acc_temp = sess.run(accuracy, feed_dict={x_data: X, y_target: Y, prediction_grid:X})
batch_accuracy.append(acc_temp)

Output

Reference

[1] https://en.wikipedia.org/wiki/Support_vector_machine

[2] https://www.tensorflow.org/

[3] Tensorflow machine learning cookbook

--

--

PJ Wang
CS Note

台大資工所碩畢 / 設計思考教練 / 系統思考顧問 / 資料科學家 / 新創 / 科技 + 商業 + 使用者