Published in Analytics Vidhya

Getting Started with TensorFlow the Easy Way (Part 3)

This is part 3, "Implementing a Regression Example in TensorFlow", in a series of articles on how to get started with TensorFlow.

Linear regression is usually the first algorithm we learn. It seems to be the benchmark for getting into data science! And while more powerful algorithms have since taken precedence, linear regression continues to hold its ground.

One of the primary reasons linear models are widely adopted is that they are easy to fit and not computationally heavy.

In this article (part 3), we are going to implement a Linear Regression algorithm using the graphs, variables, and placeholders we covered in the last two posts. Here is a quick recap:

  1. Nodes collectively form the operational flow of the graph
  2. Each node is an operation that takes some inputs and supplies an output after execution
  3. Variables store the weights and bias of the trained model, while placeholders hold the data samples

If any of the above sounds new and unfamiliar, I encourage you to read the previous articles in this series:

Part 1: TensorFlow Installation and Setup, Syntax, and Graphs

Part 2: Variables and Placeholders in TensorFlow

For this article, we will use the famous House Prices: Advanced Regression Techniques dataset.

Problem Statement:

With 79 explanatory variables describing (almost) every aspect of residential homes in Ames, Iowa, this competition challenges you to predict the final price of each home.

Now, let us start by importing the data and deep diving into the coding part:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import tensorflow as tf
print("TensorFlow version: {}".format(tf.__version__))
train_df = pd.read_csv("train.csv")

As our target is to predict the SalePrice of the house, the most informative feature is likely the actual living area. Let us look at a scatter plot of GrLivArea against SalePrice.

train_df.plot(kind='scatter', x='GrLivArea', y='SalePrice', color='red', figsize=(15,5))

We can clearly see a linear relationship between these two. Now, let us pen down the linear model equation: y = mx + b, where m is the slope and b is the intercept.
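Before moving to TensorFlow, it helps to see what "fitting m and b" means in the simplest possible setting. Here is a minimal sketch using NumPy's least-squares fit on hypothetical toy data generated from a known line (the data and the `np.polyfit` shortcut are illustrative assumptions, not part of the article's pipeline):

```python
import numpy as np

# Toy data drawn from y = 2x + 1 with a little noise,
# purely to illustrate what fitting m and b means.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
y = 2.0 * x + 1.0 + rng.normal(0, 0.1, size=100)

# np.polyfit with degree 1 returns the least-squares slope and intercept
m, b = np.polyfit(x, y, 1)
print(m, b)  # close to the true slope 2 and intercept 1
```

TensorFlow will arrive at essentially the same m and b, but by iterative gradient descent rather than a closed-form solve.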

Here we are trying to predict SalePrice using the GrLivArea information. Our model equation should look like:

y_model = (m * GrLivArea) + b

Before we proceed with building the model, let us scale both the target and the feature.

from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()
train_df['GrLivArea'] = scaler.fit_transform(train_df['GrLivArea'].values.reshape(-1, 1))
train_df['SalePrice'] = scaler.fit_transform(train_df['SalePrice'].values.reshape(-1, 1))
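Under the hood, MinMaxScaler maps each column to the [0, 1] range via x' = (x - min) / (max - min). A minimal NumPy sketch on a hypothetical GrLivArea-like column (the sample values are made up for illustration):

```python
import numpy as np

# Min-max scaling: x' = (x - min) / (max - min), mapping values into [0, 1].
area = np.array([850.0, 1200.0, 1710.0, 2450.0])  # hypothetical areas
scaled = (area - area.min()) / (area.max() - area.min())
print(scaled)  # smallest value maps to 0.0, largest to 1.0
```

Scaling keeps the gradient descent steps well behaved; with raw dollar amounts and square footage, a learning rate of 0.001 would behave very differently.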

It’s finally time to code the model in TensorFlow! As we previously discussed, the weight m and bias b are stored in variables.

# Declaring Variables
m = tf.Variable(0.0)
b = tf.Variable(0.0)

# Declaring placeholders to hold the training data
batch_size = 300
xph = tf.placeholder(tf.float32, [batch_size])
yph = tf.placeholder(tf.float32, [batch_size])

# Linear Regression Equation
y_model = m * xph + b
error = tf.reduce_sum(tf.square(yph - y_model))

# Gradient Descent Optimizer
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.001)
train = optimizer.minimize(error)

We have coded the skeleton of our model. Let us discuss implementing Gradient Descent. TensorFlow has a built-in GradientDescentOptimizer which minimizes our error. In this context, we are reducing the residual sum of squares:

error = tf.reduce_sum(tf.square(yph-y_model))
train = optimizer.minimize(error)
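To see what GradientDescentOptimizer does at each step, here is a hand-written sketch of the same update rule in NumPy. For the error E = sum((y - (m*x + b))^2), the gradients are dE/dm = -2 * sum(x * residual) and dE/db = -2 * sum(residual), and each step moves m and b against the gradient. The toy data, learning rate, and step count below are illustrative assumptions:

```python
import numpy as np

# Hypothetical already-scaled data lying exactly on y = 0.7x + 0.2
x = np.array([0.0, 0.25, 0.5, 0.75, 1.0])
y = 0.7 * x + 0.2

m, b = 0.0, 0.0
learning_rate = 0.1
for _ in range(500):
    residual = y - (m * x + b)          # y - y_model
    m -= learning_rate * (-2 * np.sum(x * residual))  # dE/dm
    b -= learning_rate * (-2 * np.sum(residual))      # dE/db

print(m, b)  # converges toward the true 0.7 and 0.2
```

This is exactly the loop that `optimizer.minimize(error)` builds into the graph for us, with TensorFlow computing the gradients automatically.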

Now, let us code the important training part:

# Important! Always initialize the variables
init = tf.global_variables_initializer()
batch_size = 300

with tf.Session() as sess:
    sess.run(init)
    epochs = 1000
    for i in range(epochs):
        rand_ind = np.random.randint(len(train_df), size=batch_size)
        feed = {xph: train_df['GrLivArea'][rand_ind].values,
                yph: train_df['SalePrice'][rand_ind].values}
        sess.run(train, feed_dict=feed)
    model_m, model_b = sess.run([m, b])

The batch_size is the number of samples fed to the model in each training step. The batch_size and learning_rate parameters can vary depending on the problem.
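The sampling itself is just `np.random.randint` drawing row indices with replacement. A small sketch of the mechanics (the 1460 here is the row count of the House Prices training set, assumed for illustration):

```python
import numpy as np

n_rows = 1460      # assumed size of the House Prices training set
batch_size = 300

# Draw batch_size random row indices (with replacement) for one step
rand_ind = np.random.randint(n_rows, size=batch_size)

print(rand_ind.shape)  # (300,)
```

Because sampling is with replacement, a row can appear more than once in a batch; for a quick tutorial model that is perfectly fine.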

Let us now print the trained weight and bias:

print("The trained weight m is: {m}".format(m=model_m))
print("The trained bias b is: {b}".format(b=model_b))

Time to have a look at the results! Let us fit these values and see what the predicted y_hat looks like.

y_hat = train_df['GrLivArea'] * model_m + model_b

ax = train_df.plot(kind='scatter', x='GrLivArea', y='SalePrice')
ax.plot(train_df['GrLivArea'], y_hat, color='red')

Yay! Our model using one feature did a pretty good job of capturing the linear relationship between SalePrice and GrLivArea. Let us confirm the same by checking the RMSE value.

from sklearn.metrics import mean_squared_error

rmse = mean_squared_error(train_df['SalePrice'], y_hat) ** 0.5
print("Root Mean Squared Error:", rmse)

The RMSE value I got is:

Out[]: Root Mean Squared Error: 0.07797508645667454

Note that your error might be different from mine, but it shouldn't be a night-and-day difference.
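For completeness, RMSE is just the square root of the mean squared residual: rmse = sqrt(mean((y_true - y_hat)^2)). A tiny NumPy sketch on made-up numbers:

```python
import numpy as np

# RMSE = sqrt(mean((y_true - y_hat)^2))
y_true = np.array([0.2, 0.5, 0.9])   # hypothetical scaled targets
y_hat = np.array([0.25, 0.45, 0.95]) # hypothetical predictions
rmse = np.sqrt(np.mean((y_true - y_hat) ** 2))
print(rmse)  # each residual is 0.05, so rmse is 0.05
```

Because we scaled SalePrice to [0, 1], an RMSE of roughly 0.08 means the typical prediction error is about 8% of the full price range.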

That’s all for this article! And Congratulations on building your first model with TensorFlow!

In the next article, we will cover how to code a complete classification model in vanilla TensorFlow. Feel free to reach out to me if you have any queries implementing the above code. And as always, make sure you follow Analytics Vidhya and stay tuned for the upcoming parts of this series.



Mohammad Shahebaz

Kaggle Grandmaster 🏅| 👨🏻‍💻 Data Scientist | TensorFlow Dev 🔥