Classifying Breast Cancer with 98.18% Accuracy Using Keras

Tayyip Gören
Sep 1, 2018 · 2 min read
Neural Network

The dataset I’m using in this project is

Breast Cancer Wisconsin (Original) Data Set

by

Dr. William H. Wolberg (physician)
University of Wisconsin Hospitals
Madison, Wisconsin, USA

Creating, reshaping, scaling and splitting the data

A link to the data can be found here.

Imports

import numpy as np
import pandas as pd
from sklearn import preprocessing
from sklearn.model_selection import train_test_split  # sklearn.cross_validation was removed in 0.20

Reading the data

df = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/breast-cancer-wisconsin.data', header=None)  # the file has no header row

Reshaping

Adding feature columns to the dataframe

df.columns = ['id','clump_thickness','unif_cell_size','unif_cell_shape','marg_adhesion','single_epith_size','bare_nuclei','bland_chrom','norm_nucleoli','mitoses','class']

Dropping the id column because it has no correlation with the class

df.drop(['id'], inplace=True, axis=1)

Replacing missing values, marked with '?', with -99999 so they act as outliers

df.replace('?', -99999, inplace=True)
df['bare_nuclei'] = df['bare_nuclei'].astype(int)  # the '?' entries made this a string column; convert it back to int

Mapping the class values to binary; in the data they are 2 (benign) and 4 (malignant).

df['class'] = df['class'].map(lambda x: 1 if x == 4 else 0)
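As a quick sanity check on the mapping above, here is a standalone sketch with made-up values (not rows from the actual dataset):

```python
import pandas as pd

# toy Series mimicking the dataset's class column: 2 = benign, 4 = malignant
s = pd.Series([2, 4, 4, 2])
mapped = s.map(lambda x: 1 if x == 4 else 0)
print(mapped.tolist())  # [0, 1, 1, 0]
```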

Final dataframe


Scaling the data

Creating X(features) and y(classes)

X = np.array(df.drop(['class'], axis=1))
y = np.array(df['class'])

Creating scaler instance

scaler = preprocessing.MinMaxScaler()

Finally scaling the data

X = scaler.fit_transform(X)
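MinMaxScaler rescales each feature column to [0, 1] via (x - min) / (max - min). A minimal standalone illustration on a toy column:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# toy single-feature data; each value becomes (x - 1) / (10 - 1)
X_toy = np.array([[1.0], [5.0], [10.0]])
scaled = MinMaxScaler().fit_transform(X_toy)
print(scaled.ravel())  # 0, ~0.444, 1
```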

Splitting the data

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2)
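For classification it can also help to pass stratify=y so the benign/malignant ratio is preserved in both splits. This is an addition, not part of the original code; a standalone sketch with toy data:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# toy data: 10 samples, balanced classes
X_toy = np.arange(20).reshape(10, 2)
y_toy = np.array([0] * 5 + [1] * 5)

# stratify keeps the class ratio the same in train and test
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.2, stratify=y_toy, random_state=42)
print(X_tr.shape, X_te.shape)  # (8, 2) (2, 2)
```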

Creating the model and training

Usual imports

import keras
from keras.models import Sequential
from keras.layers import Dense, Dropout

Creating the model

Creating the model instance

model = Sequential()

Adding Layers to the model

model.add(Dense(9, activation='sigmoid', input_shape=(9,)))
model.add(Dense(27, activation='sigmoid'))
model.add(Dropout(0.25))
model.add(Dense(54, activation='sigmoid'))
model.add(Dropout(0.25))
model.add(Dense(27, activation='sigmoid'))
model.add(Dropout(0.25))
model.add(Dense(1, activation='sigmoid'))
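Each Dense layer contributes n_in * n_out weights plus n_out biases, and Dropout adds no parameters, so the parameter count of the stack above can be tallied by hand (the same total model.summary() would report):

```python
# (inputs, units) for each Dense layer in the stack above
layers = [(9, 9), (9, 27), (27, 54), (54, 27), (27, 1)]
params = sum(n_in * n_out + n_out for n_in, n_out in layers)
print(params)  # 3385
```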

Compiling the model

model.compile(optimizer=keras.optimizers.Adam(), loss=keras.losses.mean_squared_logarithmic_error)

I’m using Adam as optimizer and mean squared logarithmic error as loss function.
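MSLE averages the squared difference of log(1 + y) between targets and predictions. A standalone NumPy sketch of that formula (for a binary target like this one, binary cross-entropy would be the more conventional choice, but MSLE is what the model above uses):

```python
import numpy as np

def msle(y_true, y_pred):
    # mean squared logarithmic error: mean((log(1 + y_true) - log(1 + y_pred))^2)
    return np.mean((np.log1p(y_true) - np.log1p(y_pred)) ** 2)

# perfect predictions give zero loss; any deviation makes it positive
print(msle(np.array([1.0, 0.0]), np.array([1.0, 0.0])))  # 0.0
```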

Training the model

model.fit(X_train, y_train, batch_size=30, epochs=2000, verbose=1, validation_data=(X_test, y_test))

Output:
Epoch 2000/2000
558/558 [==============================] - 0s 320us/step - loss: 0.0104 - val_loss: 0.0182

Evaluating results

loss = model.evaluate(X_test, y_test, verbose=1, batch_size=30)
print("Final result is {}".format(100 - loss*100))

Output:
Final result is 98.18395614690546

The final result is 98.18. Note that this figure is 100 minus the test loss expressed as a percentage, not a directly measured classification accuracy.
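Since MSLE is not an error rate, a direct way to measure classification accuracy is to threshold the sigmoid outputs at 0.5 and compare against the labels. A standalone sketch with made-up probabilities (y_prob and y_true here are illustrative, not the model's actual outputs):

```python
import numpy as np

# hypothetical sigmoid outputs and ground-truth labels, for illustration only
y_prob = np.array([0.92, 0.08, 0.61, 0.30])
y_true = np.array([1, 0, 1, 0])

y_pred = (y_prob > 0.5).astype(int)  # threshold at 0.5
accuracy = (y_pred == y_true).mean()
print(accuracy)  # 1.0
```

With the real model, y_prob would come from model.predict(X_test).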

All the code can be found in the notebook on Colab.

