Breast cancer classification

Shyamal Krishna Agrawal
Sep 16, 2021


Dataset description:-

Features are computed from a digitized image of a fine needle aspirate (FNA) of a breast mass. They describe characteristics of the cell nuclei present in the image. The linear program used to obtain the separating plane in the 3-dimensional space is described in: [K. P. Bennett and O. L. Mangasarian: “Robust Linear Programming Discrimination of Two Linearly Inseparable Sets”, Optimization Methods and Software 1, 1992, 23–34].

This database is also available through the UW CS ftp server:
ftp ftp.cs.wisc.edu
cd math-prog/cpo-dataset/machine-learn/WDBC/

It can also be found in the UCI Machine Learning Repository: https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+%28Diagnostic%29

Attribute Information:

1) ID number
2) Diagnosis (M = malignant, B = benign)
3–32) Ten real-valued features are computed for each cell nucleus:

a) radius (mean of distances from center to points on the perimeter)
b) texture (standard deviation of gray-scale values)
c) perimeter
d) area
e) smoothness (local variation in radius lengths)
f) compactness (perimeter² / area - 1.0)
g) concavity (severity of concave portions of the contour)
h) concave points (number of concave portions of the contour)
i) symmetry
j) fractal dimension (“coastline approximation” - 1)

The mean, standard error and “worst” or largest (mean of the three
largest values) of these features were computed for each image,
resulting in 30 features. For instance, field 3 is Mean Radius, field
13 is Radius SE, field 23 is Worst Radius.
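
The field layout described above can be sketched in a few lines of Python. The snake_case names and the `field_name` helper are illustrative choices, not part of the dataset; the only facts taken from the description are the order of the ten measurements and the three consecutive blocks (mean, standard error, worst).

```python
# Base measurements, in the order listed above (a-j).
BASE_FEATURES = [
    "radius", "texture", "perimeter", "area", "smoothness",
    "compactness", "concavity", "concave_points", "symmetry",
    "fractal_dimension",
]
# Three statistics per measurement, stored as three consecutive blocks of ten.
STATISTICS = ["mean", "se", "worst"]


def field_name(field: int) -> str:
    """Return a descriptive name for a 1-based field number (1 to 32)."""
    if field == 1:
        return "id"
    if field == 2:
        return "diagnosis"
    if not 3 <= field <= 32:
        raise ValueError(f"field must be in 1..32, got {field}")
    block, offset = divmod(field - 3, len(BASE_FEATURES))
    return f"{BASE_FEATURES[offset]}_{STATISTICS[block]}"


for f in (3, 13, 23):
    print(f, field_name(f))
```

Fields 3, 13 and 23 map to the mean, standard error and worst value of the radius, matching the example in the text.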

All feature values are recoded with four significant digits.

Class distribution: 357 benign, 212 malignant

Dataset link:- https://www.kaggle.com/uciml/breast-cancer-wisconsin-data

Code:-

Import

Visualization

Data preprocessing
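
The original preprocessing code is not reproduced here. As a minimal sketch of what this step typically involves for this dataset, the snippet below drops the ID column and encodes the diagnosis as a number. The rows are made-up placeholders, and the encoding direction (M as 1, B as 0) is an assumption, not something stated in the article.

```python
# Hypothetical rows in the documented field order: ID, diagnosis, then features.
# Only three made-up feature values per row are shown instead of the real 30.
raw_rows = [
    ["1001", "M", "17.9", "10.4", "122.8"],
    ["1002", "B", "13.5", "14.4", "87.5"],
    ["1003", "B", "12.0", "18.1", "76.2"],
]

# Assumed encoding: malignant -> 1, benign -> 0.
LABELS = {"M": 1, "B": 0}


def split_row(row):
    """Drop the ID, encode the diagnosis, and parse the features as floats."""
    _id, diagnosis, *features = row
    return [float(v) for v in features], LABELS[diagnosis]


X, y = [], []
for row in raw_rows:
    features, label = split_row(row)
    X.append(features)
    y.append(label)

print(X[0], y)
```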

Model

Model architecture

Model: "sequential_4"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
dense_24 (Dense)             (None, 60)                1860
_________________________________________________________________
dense_25 (Dense)             (None, 30)                1830
_________________________________________________________________
dense_26 (Dense)             (None, 15)                465
_________________________________________________________________
dense_27 (Dense)             (None, 8)                 128
_________________________________________________________________
dense_28 (Dense)             (None, 4)                 36
_________________________________________________________________
dense_29 (Dense)             (None, 1)                 5
=================================================================
Total params: 4,324
Trainable params: 4,324
Non-trainable params: 0
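
The parameter counts in the summary can be checked by hand: a fully connected layer has one weight per input-unit pair plus one bias per unit. The sketch below assumes an input of 30 features, which is what the first layer's count implies (60 × 30 + 60 = 1860); `dense_params` is a helper written for this check.

```python
def dense_params(n_inputs: int, n_units: int) -> int:
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units


# 30 input features, followed by the layer widths from the summary above.
layer_sizes = [30, 60, 30, 15, 8, 4, 1]

per_layer = [
    dense_params(n_in, n_out)
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
]
print(per_layer)       # [1860, 1830, 465, 128, 36, 5]
print(sum(per_layer))  # 4324
```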

Optimizer

Train model

Result

The model reached a validation accuracy of 98.25%, well above the roughly 62.7% that always predicting the majority class (benign, 357 of 569 samples) would score on the full dataset.
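
As a rough sanity check on that figure, the arithmetic below assumes the validation set is the 57 samples whose predictions are listed in the next section. Under that assumption, 98.25% corresponds to exactly one misclassified sample.

```python
def accuracy_percent(n_correct: int, n_total: int) -> float:
    """Accuracy as a percentage, rounded to two decimals."""
    return round(100 * n_correct / n_total, 2)


n_total = 57  # number of predictions listed in the next section
for n_wrong in range(3):
    print(n_wrong, accuracy_percent(n_total - n_wrong, n_total))
```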

Model predicted outcome

array([[9.99997735e-01],
[1.79545786e-02],
[9.89942928e-04],
[1.37593998e-02],
[1.67914943e-04],
[2.22653919e-03],
[7.36697344e-04],
[7.44805322e-04],
[1.09268774e-04],
[3.87035652e-05],
[2.21965760e-01],
[5.84700368e-02],
[6.61581216e-05],
[9.73413944e-01],
[1.08698592e-01],
[9.99996066e-01],
[2.22989227e-04],
[1.00000000e+00],
[1.00000000e+00],
[1.00000000e+00],
[9.99999166e-01],
[9.99958754e-01],
[7.25446967e-03],
[1.93601160e-03],
[1.00000000e+00],
[6.26074267e-04],
[7.31270484e-05],
[9.99980927e-01],
[1.93954387e-03],
[1.00000000e+00],
[1.21797944e-04],
[1.00000000e+00],
[8.00432637e-02],
[9.99995947e-01],
[9.39193978e-06],
[9.99996424e-01],
[8.69324803e-03],
[9.99999404e-01],
[7.68216467e-03],
[9.99996543e-01],
[9.98348832e-01],
[8.83970279e-05],
[9.99269903e-01],
[1.21040619e-04],
[1.90102197e-02],
[1.00000000e+00],
[4.84036855e-06],
[1.77725144e-02],
[5.51173231e-04],
[1.00000000e+00],
[1.00000000e+00],
[9.98476207e-01],
[9.99999881e-01],
[1.40011753e-03],
[3.16245668e-03],
[6.61303115e-04],
[1.05300581e-03]], dtype=float32)

Threshold the predicted probabilities at 0.5: each element of the array becomes True if it is greater than 0.5, otherwise False.

array([[ True],
[False],
[False],
[False],
[False],
[False],
[False],
[False],
[False],
[False],
[False],
[False],
[False],
[ True],
[False],
[ True],
[False],
[ True],
[ True],
[ True],
[ True],
[ True],
[False],
[False],
[ True],
[False],
[False],
[ True],
[False],
[ True],
[False],
[ True],
[False],
[ True],
[False],
[ True],
[False],
[ True],
[False],
[ True],
[ True],
[False],
[ True],
[False],
[False],
[ True],
[False],
[False],
[False],
[ True],
[ True],
[ True],
[ True],
[False],
[False],
[False],
[False]])
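
The thresholding step can be illustrated in plain Python on the first five probabilities from the model output. The `to_labels` helper is a name introduced for this sketch; with a NumPy array `pred`, the expression `pred > 0.5` performs the same element-wise comparison.

```python
# First five predicted probabilities from the model output above.
probabilities = [9.99997735e-01, 1.79545786e-02, 9.89942928e-04,
                 1.37593998e-02, 1.67914943e-04]

THRESHOLD = 0.5


def to_labels(probs, threshold=THRESHOLD):
    """Map each probability to True if it exceeds the threshold."""
    return [p > threshold for p in probs]


print(to_labels(probabilities))  # [True, False, False, False, False]
```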

Confusion matrix
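
The confusion matrix itself is not reproduced here. As a minimal sketch of how one is computed from true labels and thresholded predictions, the function below counts the four outcomes. The labels are hypothetical and chosen only for illustration; rows are actual classes and columns are predicted classes, the same layout scikit-learn uses.

```python
def confusion_matrix(y_true, y_pred):
    """Return [[TN, FP], [FN, TP]] for boolean or 0/1 labels."""
    matrix = [[0, 0], [0, 0]]
    for actual, predicted in zip(y_true, y_pred):
        matrix[int(actual)][int(predicted)] += 1
    return matrix


# Hypothetical labels for illustration only; not the article's data.
y_true = [1, 0, 0, 1, 1, 0, 0, 1]
y_pred = [1, 0, 0, 1, 0, 0, 1, 1]

(tn, fp), (fn, tp) = confusion_matrix(y_true, y_pred)
print(tn, fp, fn, tp)
```

Accuracy follows from the diagonal: (TN + TP) divided by the total number of samples.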


Shyamal Krishna Agrawal

Student at International Institute of Information Technology, Naya Raipur