Intro to TensorFlow: Solving a simple regression problem

Today I’ll try to explain how to hack TensorFlow to solve a simple regression problem. For this project, we’ll use the data of Boston housing prices which comes by default in scikit-learn. In this tutorial, I assume you are familiar with some concepts of machine learning such as linear regression, neural networks and back-propagation in computation graphs. I’ll remind you about some of these along the way.

You need to have the following installed on your machine to follow this tutorial:

Programming environments:

  • Python 3
  • TensorFlow (Python API)

Python packages:

  • scikit-learn
  • numpy
  • matplotlib (optional, for visualising the error)

Let’s start off simple. We will try to fit our data as a linear combination of the features. The Boston housing dataset has 506 observations, with each observation having 13 features. Since we want to predict housing prices, which can take a large range of values, we will treat this as a regression problem. And since we are restricting ourselves to a linear fit, this becomes a linear regression problem.

In TensorFlow, all computation is organised into a graph. A TensorFlow project is typically structured into two parts:

  1. a construction phase where you design the computational graph, and
  2. an analysis phase where you run the graph and perform calculations on it.
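
To make these two phases concrete before we get to the actual project, here is a tiny throwaway sketch (the constant values have nothing to do with our housing data):

import tensorflow as tf

# Construction phase: nothing is computed yet, we are only describing the graph.
a = tf.constant(2.0)
c = tf.constant(3.0)
total = tf.add(a, c)

# Analysis phase: a session actually runs the graph.
with tf.Session() as sess:
    print(sess.run(total))  # 5.0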

Let’s start off by importing the required packages.
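
If you want to follow along, the imports would look roughly like this (matplotlib is only needed if you want to plot the error later, and train_test_split is my own choice for splitting the data, not something the rest of the tutorial depends on):

import numpy as np
import tensorflow as tf
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt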

Now get the data and restructure it (split it into a training set, validation set and a test set).
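
A minimal sketch of this step, assuming we hold out roughly 20% of the observations for validation and another 20% for testing (the split ratios and the use of train_test_split are my own choices; newer scikit-learn versions may require the keyword form load_boston(return_X_y=True)):

features, prices = load_boston(True)
prices = prices.reshape(-1, 1)  # make prices a column vector so it lines up with the predictions later

# 60% training, 20% validation, 20% test
train_features, other_features, train_prices, other_prices = train_test_split(
    features, prices, test_size=0.4, random_state=42)
valid_features, test_features, valid_prices, test_prices = train_test_split(
    other_features, other_prices, test_size=0.5, random_state=42)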

The True passed to load_boston() lets it know that we want features and prices in separate numpy arrays.

Everything you compute in TensorFlow is represented by a node in the overall computational graph of your project. Nodes in TensorFlow are also called operations, or ops for short, and the data flowing between them is held in tensors. A tensor is basically an n-dimensional array.

Each op takes zero or more tensors as input, performs some calculations, and returns zero or more tensors as output. Notice the zero on both sides: an op can map zero inputs to n outputs (for example a constant tensor), n inputs to zero outputs (for example an op that just updates a value held in memory), and various other useful combinations.

There are various kinds of useful ops in TensorFlow. Some important ones are constant, placeholder and variable. Constants, as the name suggests, are nodes used to hold static values. Placeholders are containers in memory that can be fed different values at each execution of the program. Variables are ops whose values can change.

Placeholders are particularly useful for feeding data into the graph: at different executions of the program we can feed in different sets of training data. They can also be useful for implementing cross-validation. However, for our simple example we will be working with a single training set, so we won’t be using placeholders.
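
Just for illustration (we won’t actually use the placeholder below), the three kinds of ops look like this:

c = tf.constant(42.0, dtype=tf.float64)           # a fixed value baked into the graph
p = tf.placeholder(tf.float64, shape=[None, 13])  # a slot we could feed different data into at each run
v = tf.Variable(tf.zeros(1, dtype=tf.float64))    # a value the graph is allowed to change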

Let’s define our very first tensors!
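
Concretely, the weights and the bias could be defined roughly like this (the shapes assume our 13 features and a single predicted price; the standard deviation of the initial weights is my own choice):

w = tf.Variable(tf.truncated_normal([13, 1], stddev=1.0, dtype=tf.float64))
b = tf.Variable(tf.zeros(1, dtype=tf.float64))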

tf.Variable() defines a variable tensor. It takes an initial value as an argument. Here, we give tf.truncated_normal() as the initial value, which draws random numbers from a truncated normal distribution (values more than two standard deviations from the mean are re-drawn). Starting from small random weights is considered good practice in machine learning. tf.zeros() works just like numpy.zeros().

It’s also worth mentioning here that TensorFlow defines its own datatypes. You need to specify the data type of your choice while creating tensors. Here, I use tf.float64 because that is the type of the observations in our dataset.

I’ll now define a function in plain old Python which calculates predictions and error, so that we don’t have to write code to compute these time and again.

def calc(x, y):
    # Returns predictions and error
    predictions = tf.add(b, tf.matmul(x, w))
    error = tf.reduce_mean(tf.square(y - predictions))
    return [predictions, error]

tf.add() does what you’d expect, tf.matmul() multiplies matrices, tf.reduce_mean() computes the mean of all elements in the tensor passed to it, and tf.square() squares each element in the tensor passed to it.

Here, I am using the mean squared error as a measure of how well our model is doing. There are various other error measures we could have used. One very important advantage of the mean squared error is that it is differentiable, and a differentiable cost is exactly what back-propagation and gradient descent need.

Now I’ll do some housekeeping (nothing fancy here):

y, cost = calc(train_features, train_prices)
# Feel free to tweak these 2 values:
learning_rate = 0.025
epochs = 3000
points = [[], []] # You'll see later why I need this

Every graph in TensorFlow is run in an environment called a session. The session allocates the resources and takes care of actually executing the graph, so you don’t have to manage that yourself. You can take finer control of the graph to do more powerful things with it, but for now we’ll let TensorFlow handle that for us.

Before I start with a session, there are a couple of things you need to know:

When we have variable tensors in our neural network, we need to explicitly ask TensorFlow to initialise them for us before using the graph. tf.global_variables_initializer() returns an op that does exactly that. TensorFlow also provides an API that performs gradient descent for us. We just have to call tf.train.GradientDescentOptimizer(learning_rate) and ask it to minimize(cost) for us. We catch this op in optimizer.
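
Those two ops, roughly as I set them up:

init = tf.global_variables_initializer()
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)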

Now let’s dive into where the magic happens:
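
Here is a sketch of that part; the logging interval, the plot details and the valid_/test_ variable names are my own choices, carried over from the data split above:

with tf.Session() as sess:
    sess.run(init)  # initialise w and b before anything else

    for i in range(epochs):
        sess.run(optimizer)            # one step of gradient descent on the training set
        if i % 10 == 0:                # record the training error every few epochs
            points[0].append(i + 1)
            points[1].append(sess.run(cost))

    plt.plot(points[0], points[1], 'r--')
    plt.xlabel('Epochs')
    plt.ylabel('Training error')
    plt.show()

    valid_cost = calc(valid_features, valid_prices)[1]
    print('Validation error =', sess.run(valid_cost))

    # print('Test error =', sess.run(calc(test_features, test_prices)[1]))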

tf.Session provides a run() method that runs one computation of an op. Firstly, we run the variable initialiser. Then we run optimizer and capture the error at particular times. We then plot this error as a function of the number of iterations run. Finally, we print the error on our validation set.

If you feel you’ve done your best on the training set, you can test your model by uncommenting the last line and seeing how it performs on the test data. Remember not to cheat by peeking at this error earlier and effectively training on the test data; if you do, you lose the ability to tell how well your model generalises!

That’s it for today! Stay tuned for more interesting stuff.