SVM — Breast Cancer— Start to Finished

A Complete Colab Notebook Using the Breast Cancer Data Set from UCI — AISeries — Episode #06

J3
Jungletronics
May 26, 2021


Hi! We are going to apply an SVM (Support Vector Machine) to the UCI breast cancer dataset that comes with scikit-learn: colab notebook link.

sklearn.datasets.load_breast_cancer

This is a copy of the Breast Cancer Wisconsin (Diagnostic) dataset from the University of California, Irvine (UCI) Machine Learning Repository.

Let’s get started!

01#Step — Open your Google colab and type this:
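The original notebook cell is not reproduced in this text, so here is only a minimal sketch of a typical first cell. The choice of imports is an assumption; the author's notebook may import more (or different) libraries.

```python
# Assumed setup cell: common imports for the steps that follow.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Quick sanity check that the libraries are available in the runtime.
print("numpy", np.__version__)
print("pandas", pd.__version__)
```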

02#Step — Let’s load the Breast Cancer dataset from scikit-learn:
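A minimal sketch of this step, using the `load_breast_cancer` loader named above. The variable name `cancer` is an assumption.

```python
from sklearn.datasets import load_breast_cancer

# load_breast_cancer returns a dictionary-like Bunch object.
cancer = load_breast_cancer()

print(cancer["data"].shape)    # (number of samples, number of features)
print(cancer["target_names"])  # names of the two classes
```

The dataset has 569 samples with 30 numeric features each, and the two classes are malignant (label 0) and benign (label 1).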

03#Step — Let’s see the dictionary keys available:
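One way to inspect what the loader returned might look like this. Printing only the first 500 characters of the description is an arbitrary choice made here to keep the output short.

```python
from sklearn.datasets import load_breast_cancer

cancer = load_breast_cancer()

# The Bunch behaves like a dict, so keys() lists what is inside.
print(cancer.keys())

# DESCR holds a long plain-text description of the dataset.
print(cancer["DESCR"][:500])
```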

04#Step — Let’s create a pandas DataFrame to work with:
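A possible version of this step is sketched below. The names `df_feat` and `df_target` and the column label `"Cancer"` are hypothetical, picked only for illustration.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer

cancer = load_breast_cancer()

# Features: one column per measurement, named after feature_names.
df_feat = pd.DataFrame(cancer["data"], columns=cancer["feature_names"])

# Target: 0 = malignant, 1 = benign.
df_target = pd.DataFrame(cancer["target"], columns=["Cancer"])

df_feat.info()
print(df_feat.head())
```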


05#Step — Train Test Split:
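A sketch of the split, under two assumptions: `test_size=0.30` is chosen because it produces the 171 test rows that the confusion matrices in this post add up to, and `random_state=101` is an arbitrary seed. With a different seed the exact counts in the results will differ.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

cancer = load_breast_cancer()
X = cancer["data"]
y = cancer["target"]

# Hold out 30% of the rows for testing (assumed ratio and seed).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=101
)

print(X_train.shape, X_test.shape)
```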

06#Step — Let’s grab & train the support vector classifier model:
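Training with default settings could look like the sketch below. It assumes the same hypothetical split as before (`test_size=0.30`, `random_state=101`), and the variable name `model` is also an assumption.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer["data"], cancer["target"], test_size=0.30, random_state=101
)

# Default SVC: RBF kernel with C=1.0.
model = SVC()
model.fit(X_train, y_train)

print(model.get_params())
```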

07#Step — Prediction:
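Predicting on the held-out rows might be written as follows, again under the assumed split and seed; `predictions` is a hypothetical variable name.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer["data"], cancer["target"], test_size=0.30, random_state=101
)
model = SVC().fit(X_train, y_train)

# One predicted label (0 = malignant, 1 = benign) per test row.
predictions = model.predict(X_test)
print(predictions[:10])
```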

08#Step — Confusion Matrix & Classification Report:
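The evaluation can be sketched with scikit-learn's `confusion_matrix` and `classification_report`. The split and seed are assumptions, so the numbers printed here need not match the ones quoted in this post.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer["data"], cancer["target"], test_size=0.30, random_state=101
)
predictions = SVC().fit(X_train, y_train).predict(X_test)

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, predictions)
report = classification_report(
    y_test, predictions, target_names=cancer["target_names"]
)

print(cm)
print(report)
```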

Let’s get the graph:
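The original plot is not available in this text. One way to draw a comparable figure is scikit-learn's `ConfusionMatrixDisplay`, which is an assumption here; the author may have used a different plotting helper. The colormap and title are arbitrary choices.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import ConfusionMatrixDisplay, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer["data"], cancer["target"], test_size=0.30, random_state=101
)
predictions = SVC().fit(X_train, y_train).predict(X_test)
cm = confusion_matrix(y_test, predictions)

# Draw the matrix as a labeled heatmap.
disp = ConfusionMatrixDisplay(
    confusion_matrix=cm, display_labels=cancer["target_names"]
)
disp.plot(cmap="Blues")
disp.ax_.set_title("SVC with default parameters")
# In Colab the figure renders inline; elsewhere call plt.show().
```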

Analyzing Confusion Matrix:

Of the 56 + 10 = 66 people with malignant tumors, 10 were misclassified (15%).

Of the 3 + 102 = 105 people with benign tumors, 3 were misclassified (3%).
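The percentages can be checked with a few lines of plain Python. The counts are the ones quoted above; the dictionary layout and variable names are made up for this check.

```python
# Counts read off the confusion matrix quoted above (per true class).
counts = {
    "malignant": {"correct": 56, "wrong": 10},
    "benign": {"correct": 102, "wrong": 3},
}

error_rates = {}
for label, c in counts.items():
    total = c["correct"] + c["wrong"]
    error_rates[label] = c["wrong"] / total
    print(f"{label}: {c['wrong']}/{total} misclassified ({error_rates[label]:.1%})")

# Overall accuracy: all correct predictions over all test rows.
n_correct = sum(c["correct"] for c in counts.values())
n_total = sum(c["correct"] + c["wrong"] for c in counts.values())
accuracy = n_correct / n_total
print(f"accuracy: {n_correct}/{n_total} ({accuracy:.1%})")
```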

Let’s see if we can do better…

09#Step — Let’s use GridSearchCV & Train:

A grid search helps you find good hyperparameters, such as which C or gamma values to use. Finding those values by hand is usually a tricky task.

But luckily we can be a little lazy and just try a bunch of combinations and see what works best.
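Trying "a bunch of combinations" can be sketched with `GridSearchCV`. The candidate values in `param_grid` are assumptions chosen for illustration, not necessarily the ones used in the original notebook; the split and seed are assumed as before.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer["data"], cancer["target"], test_size=0.30, random_state=101
)

# Hypothetical grid: every C is tried with every gamma.
param_grid = {
    "C": [0.1, 1, 10, 100, 1000],
    "gamma": [1, 0.1, 0.01, 0.001, 0.0001],
    "kernel": ["rbf"],
}

# Cross-validates each combination, then refits the best one
# on the whole training set (refit=True is the default).
grid = GridSearchCV(SVC(), param_grid)
grid.fit(X_train, y_train)

print(grid.best_params_)
print(grid.best_score_)
```

After fitting, `grid` itself can be used like a trained model, because it wraps the refitted best estimator.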

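A large C value gives you low bias and high variance in the model; a small C value does the opposite.

A large gamma value also leads to low bias and high variance in the model, and a small gamma to high bias and low variance.

The reason is that gamma is inversely related to the variance (width) of the Gaussian placed around each support vector. If gamma is large, that Gaussian variance is small, implying that the support vector does not have a widespread influence, so the decision boundary can bend tightly around individual training points.

So both parameters have to do with the bias-variance tradeoff.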
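To make the reach of a support vector concrete, here is a small sketch of the RBF kernel formula that scikit-learn's SVC uses, K(x, x') = exp(-gamma * ||x - x'||^2). The helper name `rbf_kernel_value` and the sample distances are made up for illustration.

```python
import math


def rbf_kernel_value(squared_distance, gamma):
    """RBF kernel: K(x, x') = exp(-gamma * ||x - x'||^2)."""
    return math.exp(-gamma * squared_distance)


# Similarity between two points at increasing distances, for three gammas.
distances = (0.5, 1.0, 2.0)
for gamma in (0.01, 1.0, 10.0):
    values = [rbf_kernel_value(d ** 2, gamma) for d in distances]
    print(f"gamma={gamma}: {[round(v, 4) for v in values]}")
```

With a large gamma the similarity falls to almost zero over a short distance, so each support vector only influences its close neighborhood; with a small gamma it stays near 1 much farther away.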

10#Step — Predictions and Confusion Matrix & Classification Report:
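Evaluating the tuned model follows the same pattern as before. This sketch repeats the assumed split, seed, and hypothetical parameter grid, so its output may differ from the numbers quoted in this post.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer["data"], cancer["target"], test_size=0.30, random_state=101
)

# Hypothetical grid of candidate values.
param_grid = {
    "C": [0.1, 1, 10, 100, 1000],
    "gamma": [1, 0.1, 0.01, 0.001, 0.0001],
    "kernel": ["rbf"],
}
grid = GridSearchCV(SVC(), param_grid).fit(X_train, y_train)

# predict() uses the best estimator found by the search.
grid_predictions = grid.predict(X_test)
grid_cm = confusion_matrix(y_test, grid_predictions)

print(grid_cm)
print(classification_report(y_test, grid_predictions))
```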

Now, the graph:

A bit better:/

Analyzing Confusion Matrix:

Of the 59 + 7 = 66 people with malignant tumors, 7 were misclassified (11%).

Of the 4 + 101 = 105 people with benign tumors, 4 were misclassified (4%).
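A quick side-by-side of the two runs, using only the counts quoted in this post. The variable names are made up for the comparison.

```python
n_test = 66 + 105  # malignant + benign rows in the test set

# Misclassified rows: malignant errors + benign errors.
errors_default = 10 + 3  # default SVC
errors_grid = 7 + 4      # after the grid search

acc_default = 1 - errors_default / n_test
acc_grid = 1 - errors_grid / n_test

print(f"default SVC: {errors_default} errors, accuracy {acc_default:.1%}")
print(f"grid search: {errors_grid} errors, accuracy {acc_grid:.1%}")
```

Under these counts the total number of errors drops from 13 to 11 out of 171. The gain comes from the malignant class (10 down to 7 misses), while the benign class gets one more miss (3 up to 4).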

OK! That’s all!

I hope you enjoyed that lecture.

If you find this post helpful, please click the applause button and subscribe to the page for more articles like this one.

Until next time!

I wish you an excellent day!

Download The File For This Project

28_breast_cancer_svm.ipynb

Credits & References

Based on: Python for Data Science and Machine Learning Bootcamp by Jose Portilla

sklearn.datasets.load_breast_cancer: The breast cancer dataset is a classic and very easy binary classification dataset. Download: scikit-learn page

Related Posts

00#Episode — AISeries — ML — Machine Learning Intro — What Is It and How It Evolves Over Time?

01#Episode — AISeries — Huawei ML FAQ — How do I get an HCIA certificate?

02#Episode — AISeries — Huawei ML FAQ Again — More annotation from Huawei Mock Exam

03#Episode — AISeries — AI In Graphics — Getting Intuition About Complex Math & More

04#Episode — AISeries — Huawei ML FAQ — Advanced — Even More annotation from Huawei Mock Exam

05#Episode — AISeries — SVM — Credit Card — Start to Finished — A Complete Colab Notebook Using the Default of Credit Card Clients Data Set from UCI

06#Episode — AISeries — SVM — Breast Cancer — Start to Finished — A Complete Colab Notebook Using the Breast Cancer Data Set from UCI (this one)

07#Episode — AISeries — SVM — Cupcakes or Muffins? — Start To Finished — Based on Alice Zhao post

Take the road less traveled!


Hi, Guys o/ I am J3! I am just a hobby-dev, playing around with Python, Django, Ruby, Rails, Lego, Arduino, Raspy, PIC, AI… Welcome! Join us!