Classification Problem with TensorFlow

Mohammed Rampurawala
Published in MindOrks
4 min read · Nov 17, 2018
credits: wikipedia.org

What is classification?
Why is classification used?
When is classification used?
How to achieve classification in TensorFlow?

What is classification?

Classification is the process of predicting the class of given data points, using the labels of existing points. It falls under supervised learning: the model learns from labeled data points and then uses what it has learned to classify new observations. The data can be bi-class (spam or not spam) or multi-class (Grade A, Grade B, Grade C or Grade D). A classification problem is one where the output variable is a category, such as “green” or “red”, or “spam” or “not spam”.

Classification has applications in many domains, such as medical diagnosis, grading systems, score prediction, etc.
There are multiple algorithms for classifying data. Some of the best known are:

1. Linear Classifier.
2. Neural Networks.
3. Support Vector Machines.
4. Decision Trees, and many more.

Today I will discuss LinearClassifier and its implementation, and how to switch from LinearClassifier to a Dense Neural Network.

How to perform linear classification with TensorFlow using LinearClassifier?

I will use a diabetes dataset to check whether a patient is diabetic or not (bi-class), and I will use the pandas Python library to import the data.

Figure: head of the data after importing the diabetes dataset.
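Importing the data with pandas might look like the following sketch. It assumes the dataset is a CSV file with a Class column, as the article uses later. The other column names, the sample values, and the `load_diabetes` helper are made up for illustration, and a small in-memory CSV stands in for the real file.

```python
import io

import pandas as pd

# Made-up rows standing in for the real CSV file. Only the "Class" column
# name comes from the article; the other names and all values are hypothetical.
SAMPLE_CSV = """Glucose,BMI,Age,Class
120,30.0,45,1
90,25.0,30,0
160,35.0,50,1
100,22.0,25,0
"""


def load_diabetes(source):
    """Read the diabetes data from a file path or a file-like object."""
    return pd.read_csv(source)


# With the real dataset you would pass the path to the downloaded CSV instead.
diabetes = load_diabetes(io.StringIO(SAMPLE_CSV))
print(diabetes.head())
```

Calling `head()` prints the first rows, which is a quick way to confirm that the columns were parsed as expected.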

First things first: for our model to work correctly we need well-prepared data, so we will normalize the data first to make sure our features are on a similar scale.

Data Normalization

For data normalization we will use min-max normalization (rescaling), with the formula

𝑥′ = (𝑥 − min(𝑥)) ∕ (max(𝑥) − min(𝑥))

where 𝑥 is the original feature value and 𝑥′ is the normalized value, which falls between 0 and 1.
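The formula above can be applied column by column with pandas. This is a minimal sketch rather than the article's original code: the `min_max_normalize` helper, the column names, and the toy values are assumptions made for illustration.

```python
import pandas as pd


def min_max_normalize(df, columns):
    """Return a copy of df with the given columns rescaled to the range [0, 1]."""
    out = df.copy()
    for col in columns:
        col_min = out[col].min()
        col_max = out[col].max()
        # A constant column (max == min) would divide by zero; not handled here.
        out[col] = (out[col] - col_min) / (col_max - col_min)
    return out


# Hypothetical toy data; the label column "Class" is left untouched.
toy = pd.DataFrame({
    "Glucose": [100, 150, 200],
    "Age": [20, 30, 60],
    "Class": [0, 1, 1],
})
normalized = min_max_normalize(toy, ["Glucose", "Age"])
print(normalized)
```

Only the feature columns are passed to the helper, so the 0/1 labels keep their original values.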

After normalizing the data, we need feature columns to work with TensorFlow estimators, so we create a numeric column for every column of the data. If you have a large number of columns you can build them in a for loop, but here we create them manually, one per column.
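The for-loop alternative mentioned above could look like the sketch below. The feature names are hypothetical placeholders for the real column names of the data frame, and the code assumes a TensorFlow version that still ships the `tf.feature_column` module.

```python
import tensorflow as tf

# Hypothetical feature names; substitute the real column names of the data frame.
numeric_features = ["Glucose", "BMI", "Age"]

# Build one numeric feature column per feature name.
feature_columns = []
for name in numeric_features:
    feature_columns.append(tf.feature_column.numeric_column(name))
```

The resulting list has the same role as a list assembled by hand from individually created columns.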

A categorical column can also be created with a hash bucket, which assigns categories to buckets automatically for a given hash bucket size, so you do not have to list the possible categories yourself.

Now that we have normalized the data and converted all columns to TensorFlow feature columns for the estimator API, we collect all the feature columns in a single list, feature_columns.

With all the data in place, we can move on to model training. We will be predicting Class, which is either 0 or 1, so we drop the Class column from the data frame and keep it separately as the labels.

To split the data into training and test sets, use the sklearn library’s train_test_split method.

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(x_data, labels, test_size=0.3, random_state=101)

x_data is the data frame after dropping the Class column from the original data frame. labels is the Class column.
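The x_data and labels used in the split can be derived from the data frame roughly as follows. This is a sketch on a hypothetical toy frame; only the Class column name is taken from the article.

```python
import pandas as pd

# Hypothetical toy frame; with the real data this is the normalized data frame.
toy = pd.DataFrame({
    "Glucose": [0.0, 0.5, 1.0, 0.25],
    "Age": [0.0, 0.25, 1.0, 0.5],
    "Class": [0, 1, 1, 0],
})

# Features: every column except the label column.
x_data = toy.drop("Class", axis=1)
# Labels: the Class column on its own.
labels = toy["Class"]

print(x_data.shape, labels.shape)
```

`drop` returns a new data frame, so the original frame still contains the Class column afterwards.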

Now we have X_train, X_test, y_train and y_test. We will train the model with TensorFlow’s estimator API using LinearClassifier.

Let’s get started

I will create the LinearClassifier model with TensorFlow’s estimator API, passing the feature columns we created earlier and setting the number of classes to 2, which is also the default.

import tensorflow as tf
model = tf.estimator.LinearClassifier(feature_columns=feature_columns, n_classes=2)

To train the model, we need the training data from the train_test_split we performed. Because we are using a pandas data frame, the input function input_func is created with pandas_input_fn and passed to the LinearClassifier’s train method. The batch size is 10, the data is repeated for 100 epochs, and it is shuffled as it is read.

input_func = tf.estimator.inputs.pandas_input_fn(
    x=X_train, y=y_train, batch_size=10, num_epochs=100, shuffle=True)
model.train(input_fn=input_func)

Depending on your machine, training the model will take some time.

After the model has been trained, we need to evaluate how well it performs. To do this we create an evaluation input function that uses the X_test and y_test data from sklearn’s train_test_split.

The evaluation is performed on the model object.

eval_input_func = tf.estimator.inputs.pandas_input_fn(
    x=X_test, y=y_test, batch_size=10, num_epochs=1, shuffle=False)
results = model.evaluate(eval_input_func)
Figure: result of evaluating the trained model.
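Among the metrics an estimator's evaluation typically reports is accuracy, the fraction of test rows whose predicted class matches the true label. The sketch below computes that quantity by hand with NumPy. The `accuracy` helper and the label arrays are hypothetical and are not taken from the article's results.

```python
import numpy as np


def accuracy(y_true, y_pred):
    """Fraction of positions where the predicted class equals the true class."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return float(np.mean(y_true == y_pred))


# Hypothetical labels and predictions: 4 of the 5 predictions are correct.
y_true = [0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1]
print(accuracy(y_true, y_pred))
```

An accuracy close to 1 means most test rows were classified correctly. On imbalanced data it is worth looking at the other reported metrics as well.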

Our model is now trained on the diabetes dataset with the given feature columns. The same approach can be applied to any data that can be classified into categories; here there are just two, 0 and 1.

That’s it for now.

I will continue this series, and in the next article I will write about using a Dense Neural Network on the same diabetes dataset. You can download the dataset from here.

GitHub link: Linear Classification with TensorFlow

If you like it then put a clap (👏 ) on it.

Mohammed Rampurawala
MindOrks

Senior Android Engineer @DeliveryHero | Ex-Zalando | Machine learning https://mohammedr.me