Gentlest Introduction to Tensorflow #1

Summary: Tensorflow (TF) is Google’s attempt to put the power of Deep Learning into the hands of developers around the world. It comes with a beginner and an advanced tutorial, as well as a course on Udacity. However, those materials introduce ML and TF at the same time while solving a multi-feature problem (character recognition), which, although interesting, unnecessarily complicates understanding. In this series of articles, we present the gentlest introduction to TF, which starts by showing how to do linear regression for a single-feature problem and expands from there.

This is part of a series:

  • Part 1 (this article): Linear regression with Tensorflow for a single-feature, single-outcome model
  • Part 2: Tensorflow training illustrated in diagrams/code, and exploring training variations
  • Part 3: Matrices and multi-feature linear regression with Tensorflow
  • Part 4: Logistic regression with Tensorflow

Introduction

We are going to solve an overly simple and unrealistic problem, which has the upside of making the concepts of ML and TF easy to understand. We want to predict a single scalar outcome, house price (in $), based on a single feature, house size (in square meters, sqm). This removes the need to handle multi-dimensional data, so we can focus solely on defining a model, implementing it, and training it in TF.

Machine Learning (ML) In Brief

We start with a set of data points that we have collected (chart below), each representing the relationship between two values: an outcome (house price) and the influencing feature (house size).

Image for post

However, we cannot predict outcomes for feature values that we don’t have data points for (chart below).

Image for post

We can use ML to discover the relationship (the ‘best-fit prediction line’ in the chart below), such that given a feature value that is not part of the data points, we can predict the outcome accurately (the intersection between the feature value and the prediction line).

Image for post

Step 1: Choose a Model

To do prediction using ML, we need to choose a model that can best fit the data that we have collected.

We can choose a linear (straight line) model, and tweak it to match the data points by changing its steepness/gradient and position.

Image for post

We can also choose an exponential (curve) model, and tweak it to match the same set of data points by changing its curvature and position.

Image for post

To compare more rigorously which model is the better fit, we define best fit mathematically as a cost function that we need to minimize. A simple example of a cost function is the sum of the absolute differences between the actual outcome represented by each data point and the predicted outcome (the vertical projection of the actual outcome onto the best-fit line). Graphically, the cost is the sum of the lengths of the blue lines in the chart below.

Image for post

NOTE: More commonly, the cost function is the sum of the squares of the differences between actual and predicted outcomes, because a raw difference can be negative and squaring makes every term positive; minimizing this cost is known as least squares.
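To make the two cost functions concrete, here is a minimal sketch in plain Python. The function names and the three data points are hypothetical, chosen only for illustration; a candidate line with W = 1.5 and b = 0 is scored against points that lie on the line with W = 2.

```python
def absolute_cost(xs, ys, W, b):
    """Sum of absolute differences between actual and predicted outcomes."""
    return sum(abs(y - (W * x + b)) for x, y in zip(xs, ys))


def squared_cost(xs, ys, W, b):
    """Sum of squared differences between actual and predicted outcomes."""
    return sum((y - (W * x + b)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data points: house sizes and prices (price = 2 * size).
sizes = [1.0, 2.0, 3.0]
prices = [2.0, 4.0, 6.0]

abs_cost = absolute_cost(sizes, prices, W=1.5, b=0.0)
sq_cost = squared_cost(sizes, prices, W=1.5, b=0.0)
print(abs_cost)  # 3.0
print(sq_cost)   # 3.5
```

Both costs drop to zero when W = 2 and b = 0, which is what makes that line the best fit for these points.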

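In the spirit of keeping things simple, we will model our data points using a linear model. A linear model is represented mathematically as:

y = W.x + b

where x is the house size, y is the predicted house price, W is the gradient of the line, and b is its position (intercept).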

To tweak the model to best fit our data points, we can:

  • Tweak W to change the gradient of the linear model
Image for post
  • Tweak b to change the position of the linear model
Image for post
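The effect of the two tweaks can be seen in a few lines of plain Python. This is only an illustrative sketch: the function name predict and the sample house size of 50 sqm are assumptions, not part of the original material.

```python
def predict(x, W, b):
    """Linear model: predicted price for a house of size x."""
    return W * x + b


size = 50.0  # hypothetical house size in sqm

print(predict(size, W=2.0, b=0.0))   # 100.0  baseline line
print(predict(size, W=3.0, b=0.0))   # 150.0  larger W: steeper line
print(predict(size, W=2.0, b=10.0))  # 110.0  larger b: whole line shifted up
```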

By going through many values of W and b, we can eventually find the best-fit linear model, the one that minimizes the cost function. Besides randomly trying different values, is there a better way to explore the W and b values quickly?

If you are on an expansive plateau in the mountains and trying to descend to the lowest point, your view looks like this.

Image for post

The direction of descent is not obvious! The best way to descend is then to perform gradient descent:

  • Determine the direction with the steepest downward gradient at current position
  • Take a step of size X in that direction
  • Rinse and repeat; this is known as training

Minimizing the cost function is similar because the cost function undulates like the mountains (chart below), and we are trying to find its minimum point, which we can likewise reach through gradient descent.

Image for post
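Before bringing in TF, the three gradient descent steps can be sketched by hand in plain Python for the linear model and the squared-difference cost. The function name, the four data points, the step size of 0.01, and the 2000 steps are all assumptions chosen so that this toy example converges; they are not values from the TF code in this article.

```python
def gradient_descent(xs, ys, step_size, steps):
    """Fit y = W.x + b by repeatedly stepping down the cost gradient."""
    W, b = 0.0, 0.0
    for _ in range(steps):
        # 1. Direction of steepest change: partial derivatives of
        #    sum((y - (W*x + b))**2) with respect to W and b.
        grad_W = sum(-2.0 * x * (y - (W * x + b)) for x, y in zip(xs, ys))
        grad_b = sum(-2.0 * (y - (W * x + b)) for x, y in zip(xs, ys))
        # 2. Take a step of the chosen size in the downhill direction.
        W -= step_size * grad_W
        b -= step_size * grad_b
        # 3. Repeat.
    return W, b


# Hypothetical data points (price = 2 * size).
sizes = [1.0, 2.0, 3.0, 4.0]
prices = [2.0, 4.0, 6.0, 8.0]

W, b = gradient_descent(sizes, prices, step_size=0.01, steps=2000)
print(W, b)  # W should end up close to 2 and b close to 0
```

TF takes over steps 1 and 2 for us: it works out the gradients from the model definition, so we never write the derivatives by hand.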

With the concepts of linear model, cost function, and gradient descent in hand, we are ready to use TF.

Step 2: Create the Model in TF

The 2 basic TF components are:

Placeholder: Represents an entry point through which we feed actual data values into the model when performing gradient descent, i.e., the house sizes (x) and the house prices (y_).

Image for post

Variable: Represents a value that we want TF to find ‘good’ values for, i.e., values that minimize the cost function, e.g., W and b.

Image for post

The linear model (y = W.x + b) in TF then becomes:

Image for post

Similarly, to feed the actual house prices (y_) of the data points into the model, we create a placeholder.

Image for post

Our least-squares cost function becomes:

Image for post
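Putting the pieces described so far together, the model definition might look like the sketch below. It is a reconstruction under stated assumptions, not necessarily the exact code from the original images: it uses the TF 1.x-style graph API through tensorflow.compat.v1 so that it also runs on TF 2.x, and the shapes ([None, 1] for the placeholders) and the zero starting values for W and b are assumptions.

```python
import tensorflow.compat.v1 as tf

# Assumption: TF 1.x-style graph mode (needed for placeholders on TF 2.x).
tf.disable_v2_behavior()

# Placeholder for house sizes: any number of rows, one feature per row (assumed shape).
x = tf.placeholder(tf.float32, [None, 1])

# Variables that training will adjust; starting from zero is an assumption.
W = tf.Variable(tf.zeros([1, 1]))
b = tf.Variable(tf.zeros([1]))

# Linear model: y = W.x + b
y = tf.matmul(x, W) + b

# Placeholder for the actual house prices.
y_ = tf.placeholder(tf.float32, [None, 1])

# Least-squares cost: sum of squared differences between actual and predicted prices.
cost = tf.reduce_sum(tf.square(y_ - y))

print(y.shape.as_list())     # one prediction per fed row
print(cost.shape.as_list())  # the cost is a single scalar
```

Nothing is computed at this point; these lines only describe the model and its cost.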

Since we do not have actual data points for house price (y_) and house size (x), we generate them.

Image for post

We set the house price (ys) to always be 2 times the house size (xs) for simplicity.
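One way to generate such data is sketched below in plain Python. The number of points (100) and the range of house sizes (0 to 99 sqm) are assumptions made for illustration; only the rule that price is twice the size comes from the text. Each value is wrapped in its own list so that a data point forms one row with one column.

```python
# Assumption: 100 synthetic houses with sizes 0, 1, ..., 99 sqm.
xs = [[float(i)] for i in range(100)]

# House price is always 2 times the house size.
ys = [[2.0 * size[0]] for size in xs]

print(xs[10], ys[10])  # [10.0] [20.0]
```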

With the linear model, cost function, and data in place, we can start performing gradient descent to minimize the cost function and obtain ‘good’ values for W and b.

Image for post

The 0.00001 is the size of the ‘step’ we take in the direction of steepest descent each time we perform a training step; this is also called the learning rate.
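Why such a small number? A rough plain-Python sketch of a single gradient descent update on one data point shows what is at stake. The data point (a 99 sqm house priced at 198) and the function name are assumptions for illustration; the update rule is the hand-derived gradient of the squared difference, not TF code.

```python
def error_after_one_step(x, y_actual, W, b, learning_rate):
    """Apply one gradient descent update for a single point; return the new error."""
    error = y_actual - (W * x + b)
    # Gradient of error**2 is -2*x*error for W and -2*error for b,
    # so stepping downhill adds learning_rate times the negated gradient.
    W = W + learning_rate * 2.0 * x * error
    b = b + learning_rate * 2.0 * error
    return y_actual - (W * x + b)


x, y_actual = 99.0, 198.0  # hypothetical data point (price = 2 * size)

small_step = error_after_one_step(x, y_actual, 0.0, 0.0, learning_rate=0.00001)
large_step = error_after_one_step(x, y_actual, 0.0, 0.0, learning_rate=0.001)

print(abs(small_step))  # smaller than the starting error of 198
print(abs(large_step))  # far larger than 198: the step overshot the minimum
```

With large feature values, a step that looks small can still jump past the minimum and make the prediction worse, which is why the learning rate has to be tuned to the scale of the data.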

Step 3: Train the Model

Training involves performing gradient descent a pre-determined number of times or until the cost is below a pre-determined threshold.

All variables need to be initialized at the start of training; otherwise they may hold remnant values from a previous execution.

Image for post

Although TF is a Python library, and Python is an interpreted language, TF operations in the graph-based API used in this article are, for performance reasons, NOT executed at the point where they are defined. Thus, the init above has NOT been executed yet. Instead, TF executes operations within a session: we create a session (sess) and then run operations with sess.run().

Image for post

Similarly, we execute the train_step above within a loop by passing it to sess.run().

Image for post

The reason you need to feed actual data points through feed, which is composed of x and y_, is that TF resolves train_step into its dependencies:

Image for post

At the bottom of the dependencies are the placeholders x and y_; as we learned earlier, a tf.placeholder indicates where we will feed the actual data point values: house price (y_) and house size (x).

The print statement in the loop shows how TF learns the ‘good’ values for W and b over each iteration.

Image for post
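For reference, the whole of Steps 2 and 3 can be assembled into one runnable sketch. As before, this is a reconstruction under assumptions rather than the exact original code: it uses tensorflow.compat.v1 so that the TF 1.x-style API runs on TF 2.x, it assumes 100 training steps, and it feeds one generated data point per step (house sizes 0 to 99, price twice the size).

```python
import tensorflow.compat.v1 as tf

# Assumption: TF 1.x-style graph mode via compat.v1.
tf.disable_v2_behavior()

# Model and cost (shapes and zero initial values are assumptions).
x = tf.placeholder(tf.float32, [None, 1])   # house sizes
W = tf.Variable(tf.zeros([1, 1]))
b = tf.Variable(tf.zeros([1]))
y = tf.matmul(x, W) + b                     # y = W.x + b
y_ = tf.placeholder(tf.float32, [None, 1])  # actual house prices
cost = tf.reduce_sum(tf.square(y_ - y))

# One gradient descent step with learning rate 0.00001.
train_step = tf.train.GradientDescentOptimizer(0.00001).minimize(cost)

# Defining init does not run it; sess.run(init) does.
init = tf.global_variables_initializer()

steps = 100  # assumed number of training steps
with tf.Session() as sess:
    sess.run(init)
    for i in range(steps):
        # One generated data point per step: price = 2 * size.
        xs = [[float(i)]]
        ys = [[2.0 * i]]
        feed = {x: xs, y_: ys}
        sess.run(train_step, feed_dict=feed)
        # Show how W and b evolve after each training step.
        W_now, b_now, cost_now = sess.run([W, b, cost], feed_dict=feed)
        print("step %d: W=%f b=%f cost=%f" % (i, W_now[0][0], b_now[0], cost_now))
    W_final = float(sess.run(W)[0][0])
    b_final = float(sess.run(b)[0])
```

Because the generated prices are exactly twice the sizes, W should drift toward 2 while b stays close to 0 as the steps progress.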

Wrapping Up

We have learned about Machine Learning in its simplest form: predicting an outcome from a single feature. We chose a linear model (for simplicity) to fit our data points, defined a cost function to represent best fit, and trained our model by repeatedly tweaking its gradient variable, W, and its position variable, b, to minimize the cost function.

In the next article, we will:

  • Set up TensorBoard to visualize TF execution to detect problems in our model, cost function, or gradient descent
  • Feed data points in batches into the model during each training step (instead of just one data point at a time) to understand how it affects training

All of Us are Belong to Machines

Writings about machine learning, and artificial…

Soon Hin Khor, Ph.D.

Written by

Use tech to make the world more caring, and responsible. Nat. Univ. Singapore, Carnegie Mellon Univ, Univ. of Tokyo. IBM, 500Startups & Y-Combinator companies
