Practical Deep Neural Network in Keras on PIMA Diabetes Data set

Vidit Shah
3 min read · Sep 4, 2018

Hey everyone, today we will learn how to code a neural network on a practical data set.

If you have no clue what neural networks are, I recommend watching this video by Brandon Rohrer (optional):

Now let's get started. You may have noticed that I used the word “practical”. As newbies learning Deep Neural Networks (DNNs), we usually get data that is already split into training and testing sets (for example, the MNIST handwritten digits data set) and pass it straight into our network. Here we will instead prepare the incoming data ourselves and then fit our neural network on it.

STEP-1:GET THE DATA

Here we will get the data, which is in CSV (comma-separated values) format. The data can be downloaded from here.

Now let's look at what this data is about:

This is a binary classification data set. Several constraints were placed on the selection of instances from a larger database. In particular, all patients here are females at least 21 years old of Pima Indian heritage. The data set is used as is from the UCI repository.

Basically, we are given a data set of women, and for each one we have to predict whether she has diabetes or not. Now let's dive into the fun part: THE CODE.

CODE

Here I will walk you through the code for building the DNN.

First we need to import all the libraries we will be working with:

import pandas as pd
from keras.models import Sequential
from keras.layers import Dense
import numpy as np
from sklearn.model_selection import train_test_split

Here we use pandas for reading in the data set and performing row and column operations, Keras to build our DNN, numpy for making arrays, and sklearn to split our data into training and testing sets.

Now let's take a look at our data:

dataframe = pd.read_csv("diabetes.csv")
dataframe.head()

The code above reads in the data and prints the top 5 rows of our data set:

Data set

We can see our data set above. Now we need to separate our features from our labels.

The Outcome column tells us whether the patient has diabetes or not (1 = diabetes, 0 = no diabetes). The remaining columns are the features we will use for predicting.

We will split our data into features [X] and labels [Y], and we will replace any missing values (marked as '?') with a placeholder value. The following code does the required task.

df_label = dataframe['Outcome']
df_features = dataframe.drop(columns='Outcome')
df_features.replace('?', -99999, inplace=True)
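To see what this step does, here is a minimal sketch on a tiny made-up table. Only the Outcome column name comes from the data set above; the `Glucose` and `BMI` column names and all the values are assumptions for illustration.

```python
import pandas as pd

# Made-up rows for illustration; one Glucose value is marked missing with '?'.
toy = pd.DataFrame({
    "Glucose": [148, 85, "?"],
    "BMI": [33.6, 26.6, 23.3],
    "Outcome": [1, 0, 1],
})

toy_label = toy["Outcome"]                         # labels [Y]
toy_features = toy.drop(columns="Outcome")         # features [X]
toy_features = toy_features.replace("?", -99999)   # flag missing values

print(toy_features.shape)                # (3, 2)
print(toy_label.tolist())                # [1, 0, 1]
print(toy_features["Glucose"].tolist())  # [148, 85, -99999]
```

The large negative placeholder keeps the row in the data set while making the missing entry stand out from any real measurement.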

Now we will one-hot encode the labels. For example, we will turn 1 into [1, 0] and 0 into [0, 1].

label = []
for lab in df_label:
    if lab == 1:
        label.append([1, 0]) # class 1
    elif lab == 0:
        label.append([0, 1]) # class 0
example of one hot encoding
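The same encoding can also be written without a loop. This is just an alternative sketch using numpy indexing, with made-up Outcome values; it is not the code used in the rest of this post.

```python
import numpy as np

outcomes = np.array([1, 0, 1, 1, 0])  # made-up Outcome values

# Row 0 of the identity matrix is [1, 0] and row 1 is [0, 1],
# so indexing with (1 - outcome) maps 1 -> [1, 0] and 0 -> [0, 1].
one_hot = np.eye(2, dtype=int)[1 - outcomes]

print(one_hot.tolist())  # [[1, 0], [0, 1], [1, 0], [1, 0], [0, 1]]
```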

Now we will split our data into training and testing sets. The following line converts the features and labels to numpy arrays and does the split for us:

x_train, x_test, y_train, y_test = train_test_split(np.array(df_features), np.array(label), test_size=0.2, random_state=42)
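If you are wondering what an 80/20 split actually does, the idea can be sketched by hand with numpy. The `holdout_split` helper below is a hypothetical illustration, not scikit-learn's implementation, and it will not pick the same rows as `train_test_split` even with the same seed.

```python
import numpy as np

def holdout_split(features, labels, test_size=0.2, seed=42):
    """Shuffle the row indices once, then use the first chunk as the test set."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(features))
    n_test = int(np.ceil(test_size * len(features)))
    test_idx, train_idx = order[:n_test], order[n_test:]
    return features[train_idx], features[test_idx], labels[train_idx], labels[test_idx]

# 10 made-up rows with 8 features each, and one-hot labels to match
toy_x = np.arange(80).reshape(10, 8)
toy_y = np.array([[1, 0], [0, 1]] * 5)

tx_train, tx_test, ty_train, ty_test = holdout_split(toy_x, toy_y)
print(tx_train.shape, tx_test.shape)  # (8, 8) (2, 8)
```

Features and labels are indexed with the same shuffled indices, so every row keeps its own label after the split.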

Building our Neural Network

Now let's look at the code for building our neural network:

model = Sequential()
model.add(Dense(500, input_dim=8, activation='sigmoid'))
model.add(Dense(100, activation='sigmoid'))
model.add(Dense(2, activation='softmax'))
model.compile(loss='mean_squared_error', optimizer='adam', metrics=['accuracy'])
model.fit(x_train,y_train, epochs=1000, batch_size=70, validation_data=(x_test, y_test))

Here we create a simple Keras neural network with 2 hidden layers, one with 500 units and the other with 100, taking 8 input features and predicting 2 labels.
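To get a feel for the size of this network, we can count its trainable parameters by hand. Each Dense layer has one weight per input-output pair plus one bias per output. This is a plain Python sketch of that arithmetic and does not need Keras.

```python
# Layer widths: 8 inputs -> 500 -> 100 -> 2 outputs
layer_sizes = [8, 500, 100, 2]

# weights (n_in * n_out) + biases (n_out) for each Dense layer
params_per_layer = [
    n_in * n_out + n_out
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
]

print(params_per_layer)       # [4500, 50100, 202]
print(sum(params_per_layer))  # 54802
```

Most of the parameters sit between the two hidden layers, which is worth keeping in mind when the data set is small.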

You can download the full source code with prediction from my GitHub account here
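Because the labels were encoded as [1, 0] for diabetes and [0, 1] for no diabetes, the network's two outputs have to be decoded back into an Outcome value. Here is a minimal sketch of that decoding, assuming the predictions come back as one row of two probabilities per patient; the numbers below are made up and are not real model output.

```python
import numpy as np

# Made-up softmax outputs, one row per patient: [P(diabetes), P(no diabetes)]
probabilities = np.array([
    [0.9, 0.1],
    [0.3, 0.7],
    [0.6, 0.4],
])

# Column 0 stands for Outcome 1 and column 1 for Outcome 0,
# so the index of the larger probability has to be flipped.
predicted_outcome = 1 - np.argmax(probabilities, axis=1)

print(predicted_outcome.tolist())  # [1, 0, 1]
```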

You can follow me on LinkedIn to get more insights.

Cheers! :)
