Week 4 - Breast Cancer Detection

Yahya Koçak
Published in bbm406f19
Dec 29, 2019

Hello everyone!

This is our fourth blog post about our Machine Learning Course Project on Breast Cancer Detection. This week we used the Support Vector Machine (SVM) classifier algorithm on our data set.

What is the Support Vector Machine (SVM) classifier algorithm?

Brief description: The objective of the support vector machine algorithm is to find a hyperplane in N-dimensional space (where N is the number of features) that distinctly classifies the data points.

Possible hyperplanes
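To make the hyperplane idea concrete, here is a minimal sketch of how a point is classified once a hyperplane is known: we check on which side of it the point falls. The weights w and the offset b below are made-up values for illustration, not something learned from our data.

```python
def side_of_hyperplane(w, b, x):
    """Return +1 or -1 depending on which side of the hyperplane w.x + b = 0 the point x lies."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1

# Hypothetical hyperplane in 2 dimensions (N = 2 features): x1 - x2 = 0
w = [1.0, -1.0]
b = 0.0

label_a = side_of_hyperplane(w, b, [3.0, 1.0])  # score = 3 - 1 = 2
label_b = side_of_hyperplane(w, b, [1.0, 3.0])  # score = 1 - 3 = -2
print(label_a, label_b)
```

Training an SVM amounts to choosing w and b so that the two classes end up on opposite sides with the largest possible margin.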

Kernel Trick:

SVM tries to separate the classes with a linear boundary, but in some cases this is not possible. To handle such cases, we use the Kernel Trick.

If we add a new dimension, the data may become linearly separable. For example, if we lift the red dots a bit along a new z-axis, creating a 3rd dimension, SVM can separate the two classes with a flat plane.
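The lifting idea can be sketched with a toy example. The points below are made up: one class sits on a small circle and the other on a larger circle around it, so no straight line separates them in 2D. Here we add the extra dimension explicitly as z = x² + y² (our own choice of mapping for illustration); a kernel SVM gets the same effect implicitly, without ever building the new coordinates.

```python
import math

def lift(x, y):
    """Map a 2D point to 3D by adding z = x^2 + y^2 (squared distance from the origin)."""
    return (x, y, x * x + y * y)

# Made-up toy data: 8 points on a circle of radius 0.5 and 8 on a circle of radius 2
angles = [2 * math.pi * k / 8 for k in range(8)]
inner = [(0.5 * math.cos(t), 0.5 * math.sin(t)) for t in angles]
outer = [(2.0 * math.cos(t), 2.0 * math.sin(t)) for t in angles]

# After lifting, the flat plane z = 1 separates the two classes
threshold = 1.0
inner_below = all(lift(x, y)[2] < threshold for x, y in inner)
outer_above = all(lift(x, y)[2] > threshold for x, y in outer)
print(inner_below, outer_above)
```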

We applied the support vector machine algorithm, but the accuracy was poor (0.61961). To get better results with this algorithm, we decided to standardize the data.

So what is standardization?

Standardization of a data set is a common requirement for many machine learning estimators: they may behave badly if the individual features do not more or less look like standard normally distributed data (zero mean and unit variance).

For example, many elements used in the objective function of a learning algorithm assume that all features are centered around 0 and have variances of the same order of magnitude. If one feature has a variance that is orders of magnitude larger than the others, it can dominate the objective function and prevent the estimator from learning from the other features as expected.
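As a small illustration of what standardization does, the sketch below rescales each feature column to zero mean and unit variance by computing z = (x - mean) / std. The feature names and values are made up for the example and are not taken from our data set.

```python
from statistics import fmean, pstdev

def standardize(column):
    """Rescale a list of numbers to zero mean and unit (population) standard deviation."""
    mu = fmean(column)
    sigma = pstdev(column)
    return [(x - mu) / sigma for x in column]

# Made-up values for two features on very different scales
radius = [12.0, 14.5, 20.1, 11.3]
area = [450.0, 660.0, 1260.0, 400.0]

radius_z = standardize(radius)
area_z = standardize(area)

# Both columns are now centered at 0 with a spread of 1
print(fmean(radius_z), pstdev(radius_z))
print(fmean(area_z), pstdev(area_z))
```

After this step, neither feature can dominate just because it is measured in larger numbers.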

We standardized the dataset using StandardScaler from sklearn and re-ran the Support Vector Machine algorithm on the standardized data. This time our accuracy was much better (0.964879).

See you in the next blog…
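The comparison can be sketched roughly as follows. This is not our exact project code: it assumes the breast cancer data set that ships with scikit-learn, a 70/30 train/test split, and the default SVC settings, so the numbers it prints will not match the ones reported above.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Assumption: scikit-learn's bundled data set stands in for the project data
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# SVM on the raw features
raw_svm = SVC().fit(X_train, y_train)
raw_acc = raw_svm.score(X_test, y_test)

# SVM on standardized features; the scaler is fitted on the training data only
scaled_svm = make_pipeline(StandardScaler(), SVC()).fit(X_train, y_train)
scaled_acc = scaled_svm.score(X_test, y_test)

print(f"accuracy without standardization: {raw_acc:.3f}")
print(f"accuracy with standardization:    {scaled_acc:.3f}")
```

Putting the scaler in a pipeline means the test data is scaled with the mean and variance learned from the training data, which avoids leaking information from the test set.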
