Published in Analytics Vidhya

# Getting Started with TensorFlow the Easy Way (Part 3)

This is part 3, “Implementing a Regression Example in TensorFlow”, in a series of articles on how to get started with TensorFlow.

Linear regression is usually the first algorithm we learn. It seems to be the benchmark for getting into data science! And while other, more powerful algorithms have taken precedence, linear regression continues to hold its ground.

One of the primary reasons linear models are so widely adopted is that they are easy to fit and not computationally heavy.

In this article (part 3), we are going to implement a linear regression algorithm using the graphs, variables, and placeholders we covered in the last two posts. Let us have a quick recap:

1. Nodes collectively form the operational flow of the graph
2. Each node is an operation that takes some inputs and supplies an output after execution
3. Variables store the weights and biases of the trained model, while placeholders hold the data samples

If any of the above sounds new and unfamiliar, I encourage you to read the previous articles in this series.

For this article, we will use the famous House Prices: Advanced Regression Techniques dataset.

## Problem Statement:

With 79 explanatory variables describing (almost) every aspect of residential homes in Ames, Iowa, this competition challenges you to predict the final price of each home.

Now, let us start by importing the data and deep diving into the coding part:

```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import tensorflow as tf

print("Tensorflow version: {}".format(tf.__version__))

train_df = pd.read_csv("train.csv")
train_df.head()
```

As our target is to predict the `SalePrice` of a house, the most informative feature is the actual living area. Let us look at the scatter plot between `GrLivArea` and `SalePrice`.

```python
train_df.plot(kind='scatter', x='GrLivArea', y='SalePrice', color='red', figsize=(15, 5))
```

We can clearly see a linear relationship between the two. Now, let us pen down the linear model equation: `y = mx + b`, where `m` is the slope and `b` is the intercept.

Here we are trying to predict `SalePrice` (our `y_true`) using the `GrLivArea` information. Our model equation should look like:

`y_true = (m * GrLivArea) + b`
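To make the equation concrete, here is a tiny numeric sketch. The slope, intercept, and area value below are made-up illustrative numbers, not the trained ones:

```python
# Plugging illustrative numbers into y = m*x + b
m, b = 0.8, 0.05        # hypothetical slope and intercept
gr_liv_area = 0.3       # a (scaled) GrLivArea value

predicted_price = m * gr_liv_area + b
print(predicted_price)  # ~0.29
```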

Before we proceed with building the model, let us scale both the target and the feature.

```python
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()
train_df['GrLivArea'] = scaler.fit_transform(train_df['GrLivArea'].values.reshape(-1, 1))
train_df['SalePrice'] = scaler.fit_transform(train_df['SalePrice'].values.reshape(-1, 1))
```
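For intuition, this is essentially what `MinMaxScaler` computes under the hood: each value is mapped to `(x - min) / (max - min)`, squashing the column into the range [0, 1]. A minimal sketch with made-up area values:

```python
import numpy as np

# Made-up stand-ins for a GrLivArea column
area = np.array([334.0, 1500.0, 5642.0])

# Min-max scaling: smallest value -> 0.0, largest -> 1.0
scaled = (area - area.min()) / (area.max() - area.min())
print(scaled)  # first element 0.0, last element 1.0
```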

It’s finally time to code the model in TensorFlow! As we previously discussed, the weight `m` and bias `b` are stored in variables.

```python
# Declaring Variables
m = tf.Variable(0.0)
b = tf.Variable(0.0)

# Declaring placeholders to hold the training data
batch_size = 300
xph = tf.placeholder(tf.float32, [batch_size])
yph = tf.placeholder(tf.float32, [batch_size])

# Linear Regression Equation
y_model = m * xph + b
error = tf.reduce_sum(tf.square(yph - y_model))

# Gradient Descent Optimizer
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.001)
train = optimizer.minimize(error)
```

We have coded the skeleton of our model. Let us discuss implementing gradient descent. TensorFlow has an inbuilt `GradientDescentOptimizer` which minimizes our error. In this context, we are reducing the residual sum of squares:

```python
error = tf.reduce_sum(tf.square(yph - y_model))
train = optimizer.minimize(error)
```
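What does the optimizer actually compute each step? For this exact loss, the gradients have a closed form, so we can sketch one update by hand in NumPy. The toy data and learning rate below are made up for illustration; TensorFlow derives these gradients automatically:

```python
import numpy as np

def gd_step(m, b, x, y, lr=0.1):
    """One gradient-descent update for E = sum((y - (m*x + b))**2)."""
    resid = y - (m * x + b)
    grad_m = -2.0 * np.sum(x * resid)  # dE/dm
    grad_b = -2.0 * np.sum(resid)      # dE/db
    return m - lr * grad_m, b - lr * grad_b

# Toy data lying exactly on y = 1.0*x + 0.05
x = np.array([0.2, 0.4, 0.6])
y = np.array([0.25, 0.45, 0.65])

m, b = 0.0, 0.0
for _ in range(500):
    m, b = gd_step(m, b, x, y)

print(m, b)  # approaches the exact fit m = 1.0, b = 0.05
```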

Now, let us code the important training part:

```python
# Important! Always initialize the variables
init = tf.global_variables_initializer()
batch_size = 300

with tf.Session() as sess:
    sess.run(init)
    epochs = 1000
    for i in range(epochs):
        rand_ind = np.random.randint(len(train_df), size=batch_size)
        feed = {xph: train_df['GrLivArea'][rand_ind].values,
                yph: train_df['SalePrice'][rand_ind].values}
        sess.run(train, feed_dict=feed)
    model_m, model_b = sess.run([m, b])
```

The `batch_size` is the number of samples fed to the model in one training step (here, each epoch runs a single randomly sampled batch). The parameters `batch_size` and `learning_rate` can vary depending on the problem.
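The random mini-batch sampling inside the training loop can be sketched in plain NumPy. The stand-in data below is made up; the real loop feeds `train_df` columns instead:

```python
import numpy as np

data = np.arange(10, 20)  # stand-in for a feature column's values
batch_size = 4

# Draw batch_size random row indices (with replacement), then slice
rand_ind = np.random.randint(len(data), size=batch_size)
batch = data[rand_ind]

print(batch.shape)  # (4,)
```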

Let us now print the trained weight and bias:

```python
print("The trained weight for m is : {m}".format(m=model_m))
print("The trained weight for b is : {b}".format(b=model_b))
```

Time to have a look at the results! Let us fit these values and see what the predicted `y_hat` looks like.

```python
y_hat = train_df['GrLivArea'] * model_m + model_b

train_df.plot(kind='scatter', x='GrLivArea', y='SalePrice')
plt.plot(train_df['GrLivArea'], y_hat, 'r')
```

Yay! Our model, using just one feature, did a pretty good job of capturing the linear relationship between `SalePrice` and `GrLivArea`. Let us confirm the same by checking the RMSE value.

```python
from sklearn.metrics import mean_squared_error

rmse = mean_squared_error(train_df['SalePrice'], y_hat) ** 0.5
print("Root Mean Squared Error:", rmse)
```

The RMSE value I got is:

`Out[]: Root Mean Squared Error: 0.07797508645667454`

Note that your error might differ from mine, but it shouldn't be a night-and-day difference.
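If you want to sanity-check the metric itself, RMSE is just the square root of the mean squared residual, which is exactly what `mean_squared_error(...) ** 0.5` computes above. A self-contained sketch with made-up numbers:

```python
import numpy as np

# Made-up targets and predictions, each off by 0.05
y_true = np.array([0.2, 0.5, 0.9])
y_hat = np.array([0.25, 0.45, 0.85])

# RMSE = sqrt(mean of squared residuals)
rmse = np.sqrt(np.mean((y_true - y_hat) ** 2))
print(rmse)  # 0.05
```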

In the next article, we will cover how to code a complete classification model in vanilla TensorFlow. Feel free to reach out to me if you have any queries implementing the above code. And as always, ensure you follow Analytics Vidhya and stay tuned for the upcoming parts of this series.

Part 4: Implementing Classification in TensorFlow (Next Up)
