Introduction to TensorFlow — CPU vs GPU

Erik Hallström
Nov 11, 2016

Dear reader,

This article has been republished at Educaora and has also been open sourced. Unfortunately, TensorFlow 2.0 changed the API, so the code is broken for later versions. Any help in bringing the tutorials up to date is greatly appreciated. I also recommend looking into PyTorch.

In this tutorial we will do simple matrix multiplication in TensorFlow and compare the speed of the GPU to that of the CPU. This speed difference is the basis for why deep learning has become state of the art in recent years.

What is TensorFlow?

It’s a framework for performing computations very efficiently, and it can tap into the GPU (Graphics Processing Unit) to speed them up even further. This makes a huge difference, as we shall see shortly. TensorFlow can be controlled through a simple Python API, which we will be using in this tutorial.

Graphs and Tensors

In most programming languages, a computation is executed directly when it is written. If you type a = 3*4 + 2 in a Python console, you immediately have the result. Running a number of mathematical computations like this in an IDE also allows you to set breakpoints, stop the execution and inspect intermediate results. This is not possible in TensorFlow; what you actually do is specify the computations that will be done later. This is accomplished by creating a computational graph, which takes multidimensional arrays called “Tensors” and does computations on them. Each node in the graph denotes an operation. When creating the graph, you can explicitly specify where the computations should be done, on the GPU or on the CPU. By default, TensorFlow checks whether a GPU is available and uses it if so.
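To make the "specify first, run later" idea concrete, here is a minimal sketch of building a small graph with explicit device placement. It assumes a TensorFlow installation that exposes the 1.x-style graph API (in 2.x this lives under `tf.compat.v1`); the tensor names and values are made up for illustration.

```python
import tensorflow as tf

# Assumption: a TensorFlow build in which tf.Graph still provides graph mode.
graph = tf.Graph()
with graph.as_default():
    # Pin these operations to the CPU; "/gpu:0" would request the first GPU instead.
    with tf.device("/cpu:0"):
        a = tf.constant([[1.0, 2.0], [3.0, 4.0]], name="a")
        b = tf.constant([[5.0, 6.0], [7.0, 8.0]], name="b")
        product = tf.matmul(a, b, name="product")

# No multiplication has happened yet: the graph only records what to compute.
node_types = [op.type for op in graph.get_operations()]
print(node_types)          # the operation type of every node in the graph
print(product)             # a symbolic tensor, not a matrix of numbers
print(product.op.device)   # the device this node was pinned to
```

Printing `product` shows a handle with a shape and dtype rather than values, which is the practical difference from typing the same expression into a Python console.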


The graph is run in a Session, where you specify which operations to execute in the run function. Data from outside may also be supplied to placeholders in the graph, so you can run it multiple times with different input. Furthermore, intermediate results (such as model weights) can be incrementally updated in variables, which retain their values between runs.
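The three pieces described above (session, placeholder, variable) can be sketched together as follows. This is an illustration under the same assumption of a 1.x-style API reachable through `tf.compat.v1`; the names `x` and `running_sum` are hypothetical.

```python
import numpy as np
import tensorflow as tf

graph = tf.Graph()
with graph.as_default():
    # Placeholder: a slot in the graph that is filled with outside data on each run.
    x = tf.compat.v1.placeholder(tf.float32, shape=(2, 2), name="x")
    # Variable: state that lives in the session and keeps its value between runs.
    running_sum = tf.compat.v1.Variable(tf.zeros((2, 2)), name="running_sum")
    squared = tf.matmul(x, x)
    update = running_sum.assign_add(squared)
    init = tf.compat.v1.global_variables_initializer()

with tf.compat.v1.Session(graph=graph) as session:
    session.run(init)
    # Same graph, two runs, different input each time.
    session.run(update, feed_dict={x: np.eye(2, dtype=np.float32)})
    session.run(update, feed_dict={x: 2 * np.eye(2, dtype=np.float32)})
    total = session.run(running_sum)

print(total)
```

The first run adds the identity matrix to the variable and the second adds four times the identity, so the variable ends up holding five times the identity: the value from the first run was retained.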


This code example creates pairs of random matrices and times their multiplication for different matrix sizes and device placements.
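A benchmark along these lines could be sketched as follows. This is not the author's original listing but a minimal reconstruction, assuming the 1.x-style API under `tf.compat.v1`; the helper `time_matmul` and the matrix sizes are hypothetical choices.

```python
import time

import tensorflow as tf


def time_matmul(device_name, size):
    """Time one multiplication of two random size x size matrices on device_name."""
    graph = tf.Graph()
    with graph.as_default():
        with tf.device(device_name):
            a = tf.random.uniform((size, size), dtype=tf.float32)
            b = tf.random.uniform((size, size), dtype=tf.float32)
            product = tf.matmul(a, b)
    # Soft placement falls back to the CPU if the requested device does not exist,
    # so on a machine without a GPU the "/gpu:0" numbers are really CPU numbers.
    config = tf.compat.v1.ConfigProto(allow_soft_placement=True)
    with tf.compat.v1.Session(graph=graph, config=config) as session:
        start = time.perf_counter()
        session.run(product)  # the timing also includes generating the random inputs
        return time.perf_counter() - start


sizes = [250, 500, 1000]  # hypothetical sizes, chosen to keep the run short
timings = {
    device: [time_matmul(device, size) for size in sizes]
    for device in ("/cpu:0", "/gpu:0")
}
for device, seconds in timings.items():
    print(device, ["%.4f s" % s for s in seconds])
```

Each measurement is a single run, so the numbers are noisy; repeating each size several times and averaging would give a steadier comparison.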

You see that the GPU (a GTX 1080 in my case) is much faster than the CPU (Intel i7). Back-propagation is almost exclusively used today when training neural networks, and it can be stated as a number of matrix multiplications (backward and forward pass). That’s why using GPUs is so important for quickly training deep-learning models.

CPU time in green and GPU time in blue. The initial GPU delay at the first iteration is perhaps due to TensorFlow's one-time startup work.
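To illustrate the statement that the forward and backward pass reduce to matrix multiplications, here is a small sketch for a single linear layer Y = XW with the loss L = 0.5 * sum(Y^2). The layer, the loss and the numbers are assumptions picked for the example, and plain Python lists are used so that it runs without any dependencies.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    """Swap the rows and columns of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


X = [[1.0, 2.0]]                  # one input sample with two features
W = [[0.5, -1.0], [1.5, 2.0]]     # weights of the linear layer

# Forward pass: one matrix multiplication.
Y = matmul(X, W)

# Backward pass for L = 0.5 * sum(Y^2): dL/dY equals Y itself,
# and both remaining gradients are again matrix multiplications.
grad_Y = Y
grad_W = matmul(transpose(X), grad_Y)   # dL/dW = X^T * dL/dY
grad_X = matmul(grad_Y, transpose(W))   # dL/dX = dL/dY * W^T

print(Y)       # [[3.5, 3.0]]
print(grad_W)  # [[3.5, 3.0], [7.0, 6.0]]
print(grad_X)  # [[-1.25, 11.25]]
```

A real network stacks many such layers with much larger matrices, which is where the GPU's advantage in matrix multiplication pays off.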

Next step

In the next post we will use TensorFlow to create a recurrent neural network.