Breast cancer classification

Sep 16, 2021

Dataset description:-

Features are computed from a digitized image of a fine needle aspirate (FNA) of a breast mass. They describe characteristics of the cell nuclei present in the image. The linear program used to obtain the separating plane in the 3-dimensional space is that described in: [K. P. Bennett and O. L. Mangasarian: “Robust Linear Programming Discrimination of Two Linearly Inseparable Sets”, Optimization Methods and Software 1, 1992, 23–34].

This database is also available through the UW CS ftp server:
ftp ftp.cs.wisc.edu
cd math-prog/cpo-dataset/machine-learn/WDBC/

It can also be found on the UCI Machine Learning Repository: https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+%28Diagnostic%29

Attribute Information:

1) ID number
2) Diagnosis (M = malignant, B = benign)
3–32) Ten real-valued features are computed for each cell nucleus:

a) radius (mean of distances from center to points on the perimeter)
b) texture (standard deviation of gray-scale values)
c) perimeter
d) area
e) smoothness (local variation in radius lengths)
f) compactness (perimeter² / area - 1.0)
g) concavity (severity of concave portions of the contour)
h) concave points (number of concave portions of the contour)
i) symmetry
j) fractal dimension (“coastline approximation” - 1)

The mean, standard error and “worst” or largest (mean of the three
largest values) of these features were computed for each image,
resulting in 30 features. For instance, field 3 is Mean Radius, field
13 is Radius SE, field 23 is Worst Radius.
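
The field layout described above (ten measurements, each stored as a mean, a standard error, and a "worst" value, in three consecutive groups of ten) can be sketched as a small lookup. The helper name `field_name` and the label strings are illustrative choices, not column names taken from the dataset files.

```python
# Ten base measurements, in the order listed in the attribute description.
BASE_FEATURES = [
    "radius", "texture", "perimeter", "area", "smoothness",
    "compactness", "concavity", "concave points", "symmetry",
    "fractal dimension",
]
# Three statistics per measurement, stored as three consecutive groups of ten.
STATISTICS = ["mean", "se", "worst"]


def field_name(field: int) -> str:
    """Return a label for a 1-indexed field number (1 = ID, 2 = diagnosis)."""
    if field == 1:
        return "id"
    if field == 2:
        return "diagnosis"
    if not 3 <= field <= 32:
        raise ValueError("field must be between 1 and 32")
    # Fields 3-12 are means, 13-22 standard errors, 23-32 worst values.
    group, offset = divmod(field - 3, len(BASE_FEATURES))
    return f"{BASE_FEATURES[offset]} {STATISTICS[group]}"


print(field_name(3), "|", field_name(13), "|", field_name(23))
# radius mean | radius se | radius worst
```

This reproduces the example in the text: fields 3, 13 and 23 all refer to the radius.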

All feature values are recoded with four significant digits.

Class distribution: 357 benign, 212 malignant

Dataset link:- https://www.kaggle.com/uciml/breast-cancer-wisconsin-data

Code:-

Import
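
A minimal sketch of the import and loading step, not necessarily the code used for this post. It assumes NumPy and scikit-learn are installed and uses scikit-learn's bundled copy of the same Wisconsin Diagnostic data instead of the Kaggle CSV, so that it runs without a download. Encoding malignant as 1 is an assumption made here for illustration.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer

# Assumption: scikit-learn's bundled copy of the WDBC data stands in for the
# Kaggle CSV linked above.
data = load_breast_cancer()
X = np.asarray(data.data)   # shape (569, 30): the 30 real-valued features

# scikit-learn encodes malignant as 0 and benign as 1; flip it so that
# 1 = malignant and 0 = benign (an arbitrary choice for this sketch).
y = 1 - data.target

n_malignant = int(y.sum())
n_benign = int((y == 0).sum())
print(X.shape, n_benign, n_malignant)
```

The counts should agree with the class distribution quoted in the dataset description (357 benign, 212 malignant).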

Visualization
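
As one possible visualization, the class balance can be drawn as a bar chart. This sketch assumes matplotlib is installed and plots the class counts quoted in the dataset description rather than reading the CSV. The output file name is hypothetical, and the plots in the original post may have been different.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs headless
import matplotlib.pyplot as plt

# Class counts as stated in the dataset description.
class_counts = {"benign": 357, "malignant": 212}

fig, ax = plt.subplots(figsize=(4, 3))
bars = ax.bar(list(class_counts), list(class_counts.values()))
ax.set_xlabel("diagnosis")
ax.set_ylabel("number of samples")
ax.set_title("WDBC class distribution")
fig.tight_layout()
fig.savefig("class_distribution.png")  # hypothetical output file name
```

In a notebook, the `matplotlib.use("Agg")` line and the `savefig` call can be dropped so the figure renders inline.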

Data preprocessing
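
A hedged sketch of a typical preprocessing pipeline for this data: encode the diagnosis as 0/1, hold out part of the data for validation, and standardize the features. The 10% hold-out is an assumption, chosen because it gives 57 validation samples out of 569, which matches the number of predictions printed later in this post. The stratified split, the random seed, and the use of `StandardScaler` are likewise illustrative choices.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Assumption: scikit-learn's bundled copy stands in for the Kaggle CSV.
data = load_breast_cancer()
X = data.data.astype("float32")
y = (1 - data.target).astype("float32")  # 1 = malignant, 0 = benign (assumed)

# Assumed 10% hold-out: 512 training samples and 57 validation samples.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.1, stratify=y, random_state=0
)

# Fit the scaler on the training split only, then apply it to both splits,
# so that no validation statistics leak into training.
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_val = scaler.transform(X_val)

print(X_train.shape, X_val.shape)
```

Standardizing matters here because the raw features span very different ranges (area versus smoothness, for example), which tends to slow down gradient-based training.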

Model
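
A sketch of a Keras model whose layer widths match the summary printed under "Model architecture" (60, 30, 15, 8, 4, 1 units on 30 input features). The activation functions are assumptions: the summary does not show them, and ReLU hidden layers with a sigmoid output are used here because the predictions lie between 0 and 1. The function name `build_model` is illustrative.

```python
from tensorflow import keras
from tensorflow.keras import layers


def build_model(n_features: int = 30) -> keras.Model:
    """Fully connected binary classifier with widths 60-30-15-8-4-1."""
    return keras.Sequential([
        keras.Input(shape=(n_features,)),
        layers.Dense(60, activation="relu"),    # hidden activations are assumed
        layers.Dense(30, activation="relu"),
        layers.Dense(15, activation="relu"),
        layers.Dense(8, activation="relu"),
        layers.Dense(4, activation="relu"),
        layers.Dense(1, activation="sigmoid"),  # single probability output
    ])


model = build_model()
print(model.count_params())  # 4324, the total reported in the summary
```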

Model architecture

Model: "sequential_4"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_24 (Dense)             (None, 60)                1860      
_________________________________________________________________
dense_25 (Dense)             (None, 30)                1830      
_________________________________________________________________
dense_26 (Dense)             (None, 15)                465       
_________________________________________________________________
dense_27 (Dense)             (None, 8)                 128       
_________________________________________________________________
dense_28 (Dense)             (None, 4)                 36        
_________________________________________________________________
dense_29 (Dense)             (None, 1)                 5         
=================================================================
Total params: 4,324
Trainable params: 4,324
Non-trainable params: 0
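
The parameter counts in the summary can be checked by hand: a Dense layer with n_in inputs and n_out units has n_in * n_out weights plus n_out biases. The first count (1860 = 30 * 60 + 60) implies 30 input features, which matches the dataset. A short check:

```python
# Layer widths: 30 input features followed by the six Dense layers.
widths = [30, 60, 30, 15, 8, 4, 1]

# Weights plus biases for each consecutive pair of widths.
params = [n_in * n_out + n_out for n_in, n_out in zip(widths, widths[1:])]

print(params)       # [1860, 1830, 465, 128, 36, 5]
print(sum(params))  # 4324
```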

Optimizer
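
The post does not state which optimizer or loss was used. The sketch below assumes Adam with binary cross-entropy, a common pairing for a single sigmoid output, and tracks accuracy because that is the metric reported in the result. The learning rate and the activations are also assumptions.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Compact rebuild of the 60-30-15-8-4-1 network so this snippet runs alone.
model = keras.Sequential(
    [keras.Input(shape=(30,))]
    + [layers.Dense(units, activation="relu") for units in (60, 30, 15, 8, 4)]
    + [layers.Dense(1, activation="sigmoid")]
)

model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=1e-3),  # assumed optimizer
    loss="binary_crossentropy",  # suits a single probability output
    metrics=["accuracy"],        # the metric quoted in the result
)
```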

Train model
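
An end-to-end training sketch under stated assumptions: scikit-learn's bundled copy of the data, a 10% stratified hold-out, standardized features, ReLU/sigmoid activations, the Adam optimizer, and arbitrary values for `EPOCHS` and the batch size. None of these settings are confirmed by the post, so the resulting accuracy will not necessarily match the figure reported below.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from tensorflow import keras
from tensorflow.keras import layers

# Load and encode the data (1 = malignant, 0 = benign; assumed encoding).
data = load_breast_cancer()
X = data.data.astype("float32")
y = (1 - data.target).astype("float32")

# Assumed 10% hold-out, with scaling fitted on the training split only.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.1, stratify=y, random_state=0
)
scaler = StandardScaler().fit(X_train)
X_train, X_val = scaler.transform(X_train), scaler.transform(X_val)

# Same layer widths as the model summary; activations are assumptions.
model = keras.Sequential(
    [keras.Input(shape=(30,))]
    + [layers.Dense(units, activation="relu") for units in (60, 30, 15, 8, 4)]
    + [layers.Dense(1, activation="sigmoid")]
)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

EPOCHS = 20  # arbitrary value for this sketch
history = model.fit(
    X_train, y_train,
    validation_data=(X_val, y_val),
    epochs=EPOCHS,
    batch_size=32,
    verbose=0,
)

val_loss, val_acc = model.evaluate(X_val, y_val, verbose=0)
print(f"validation accuracy: {val_acc:.4f}")
```

`history.history` holds the per-epoch loss and accuracy for both splits, which is useful for spotting overfitting on a dataset this small.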

Result

The model reached a validation accuracy of 98.25%, which is consistent with 56 correct predictions out of the 57 validation samples (56/57 ≈ 0.9825), i.e. a single misclassification.

Model predicted outcome

array([[9.99997735e-01],
       [1.79545786e-02],
       [9.89942928e-04],
       [1.37593998e-02],
       [1.67914943e-04],
       [2.22653919e-03],
       [7.36697344e-04],
       [7.44805322e-04],
       [1.09268774e-04],
       [3.87035652e-05],
       [2.21965760e-01],
       [5.84700368e-02],
       [6.61581216e-05],
       [9.73413944e-01],
       [1.08698592e-01],
       [9.99996066e-01],
       [2.22989227e-04],
       [1.00000000e+00],
       [1.00000000e+00],
       [1.00000000e+00],
       [9.99999166e-01],
       [9.99958754e-01],
       [7.25446967e-03],
       [1.93601160e-03],
       [1.00000000e+00],
       [6.26074267e-04],
       [7.31270484e-05],
       [9.99980927e-01],
       [1.93954387e-03],
       [1.00000000e+00],
       [1.21797944e-04],
       [1.00000000e+00],
       [8.00432637e-02],
       [9.99995947e-01],
       [9.39193978e-06],
       [9.99996424e-01],
       [8.69324803e-03],
       [9.99999404e-01],
       [7.68216467e-03],
       [9.99996543e-01],
       [9.98348832e-01],
       [8.83970279e-05],
       [9.99269903e-01],
       [1.21040619e-04],
       [1.90102197e-02],
       [1.00000000e+00],
       [4.84036855e-06],
       [1.77725144e-02],
       [5.51173231e-04],
       [1.00000000e+00],
       [1.00000000e+00],
       [9.98476207e-01],
       [9.99999881e-01],
       [1.40011753e-03],
       [3.16245668e-03],
       [6.61303115e-04],
       [1.05300581e-03]], dtype=float32)

Each predicted probability is then thresholded: an element of the array is set to True if it is greater than 0.5, and to False otherwise.

array([[ True],
       [False],
       [False],
       [False],
       [False],
       [False],
       [False],
       [False],
       [False],
       [False],
       [False],
       [False],
       [False],
       [ True],
       [False],
       [ True],
       [False],
       [ True],
       [ True],
       [ True],
       [ True],
       [ True],
       [False],
       [False],
       [ True],
       [False],
       [False],
       [ True],
       [False],
       [ True],
       [False],
       [ True],
       [False],
       [ True],
       [False],
       [ True],
       [False],
       [ True],
       [False],
       [ True],
       [ True],
       [False],
       [ True],
       [False],
       [False],
       [ True],
       [False],
       [False],
       [False],
       [ True],
       [ True],
       [ True],
       [ True],
       [False],
       [False],
       [False],
       [False]])
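
The thresholding step above can be sketched with NumPy. The example uses only the first five predicted probabilities from the output shown earlier. In practice the full array returned by the model's `predict` call would take the place of `probs`.

```python
import numpy as np

# First five predicted probabilities from the model output above.
probs = np.array(
    [[9.99997735e-01],
     [1.79545786e-02],
     [9.89942928e-04],
     [1.37593998e-02],
     [1.67914943e-04]],
    dtype=np.float32,
)

# Element-wise comparison yields a boolean array of the same shape.
labels = probs > 0.5
print(labels.ravel())  # [ True False False False False]
```

The 0.5 cut-off treats both kinds of error equally. A different threshold trades false positives against false negatives.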