The Basics of Neural Networks with TensorFlow

Jun 5 · 7 min read

Today, we’re going to learn the basic ideas behind how a neural network functions, and then go over a short example of how to construct one in Python.

Coding a neural network can be as simple as instantiating a model and then feeding it data until the desired result is produced, but it’s important to understand the underlying concepts in order to make use of neural networks effectively.

Typically, when we code, we create a set of rules that take in data and then provide answers. This way of coding generally works, but some problems are too general or complicated for humans to devise rules for solving them. For these problems, machine learning might be used to reach an answer more easily. Machine learning inverts the conventional method: we provide a computer with data and the ‘answers’ to the problem we want to solve, and the computer generates the rules for classifying future data.

What is a Neural Network?

Image courtesy of Wikipedia

A neural network, like any function, takes in inputs and then returns a value. For this article, we’ll consider a neural network that classifies images. The network consists of several layers of nodes. The first layer consists of the input nodes, which take in the relevant information. A picture typically consists of a 2D array of pixels, each with an RGB value, so it can be expressed as a 3D array, which can then be flattened into a single dimension. A neural network looking at 32 by 32 pixel images will therefore have 32*32*3 = 3,072 input nodes. The input nodes then perform some mathematical operation on their inputs and output a value between 0 and 1. Each value is then passed into each node in the next layer of nodes, which is a ‘hidden’ layer. The hidden layer then passes the new values into other functions, and passes the output to another layer of hidden nodes until the output layer is reached. At the output layer, the values at the nodes represent the probability that a given piece of data falls into a certain category.
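To make the input-layer arithmetic concrete, here is a minimal sketch in plain Python that flattens a 32 by 32 RGB image into a single list. The function name flatten_image and the pixel values are made up for illustration; a real pipeline would normally use an array library rather than nested lists.

```python
def flatten_image(image):
    """Flatten a height x width x channels nested list into one flat list."""
    return [value for row in image for pixel in row for value in pixel]


# Hypothetical 32x32 image in which every pixel is the RGB triple (0, 128, 255)
image = [[(0, 128, 255) for _ in range(32)] for _ in range(32)]

inputs = flatten_image(image)
print(len(inputs))  # 3072, one value per input node
```

Each of the 3,072 numbers would feed one input node.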

A neural network can have any number of hidden layers. It can be difficult to conceptualize exactly what each hidden layer does, but a simple explanation is this: The values passed into a neural network all map to individual pixels in an image, while the output values are an evaluation of the entire image, so each tier of the neural network searches for increasingly complicated patterns in the image. For example, in order to detect a line in an image, a given hidden node might have a function whose output depends heavily on the RGB values in a certain area of that image, and on whether there are contrasting pixels in the surrounding area. A node in the hidden layer will output a value close to 1 if it detects a ‘match’ for the pattern it is looking for, and a value close to 0 if there is none. The values from each node in the hidden layer are then passed to the next layer, which essentially looks for patterns in the patterns from the previous layer. Once the final hidden layer has calculated its values, the output layer produces its results based on the prevalence of the largest and most significant patterns in the image.
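As a rough illustration of a node that ‘matches’ a pattern, the sketch below hand-writes a node that responds to a bright-left, dark-right edge. Everything here is an assumption made for the example: the function vertical_edge_node, the sharpness parameter, and the 0.5 threshold are invented, and a trained network would learn its own detectors rather than have them written by hand.

```python
import math


def sigmoid(x):
    """Squash any number into the range 0 to 1."""
    return 1.0 / (1.0 + math.exp(-x))


def vertical_edge_node(left_pixels, right_pixels, sharpness=10.0):
    """Return a value near 1 when the left side is much brighter than the right."""
    contrast = (sum(left_pixels) / len(left_pixels)
                - sum(right_pixels) / len(right_pixels))
    # Shift by 0.5 so that zero contrast maps to a value near 0
    return sigmoid(sharpness * (contrast - 0.5))


match = vertical_edge_node([1.0, 1.0], [0.0, 0.0])     # bright next to dark
no_match = vertical_edge_node([1.0, 1.0], [1.0, 1.0])  # uniformly bright
print(round(match, 3), round(no_match, 3))
```

The first call gives a value close to 1 and the second a value close to 0, which is the behavior described above.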

How exactly does a machine learn?

Thus far, we’ve gone over the basic structure of a neural network, but how does this network fit itself to data? Earlier we learned that each node outputs a single number, and that each node in the hidden and output layers takes in output from all nodes in the previous layer. However, there are also additional components to the data taken in by hidden and output nodes. Each output value has a ‘weight’ associated with it. The weight is a coefficient applied to the output of each node, and changes depending on which node is the destination for that output. A high weight for a given output with a specific destination means that that particular neuron is important for recognizing a certain feature. For example, an output-destination pair with a weight of 5 has 20 times more impact on determining the input of a certain neuron than an output-destination pair with a weight of .25.
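The weight example can be checked with a few lines of plain Python. This is only a sketch: the helper weighted_input is a hypothetical name, and both upstream nodes are assumed to output 1.0 so that the weights alone decide each contribution.

```python
def weighted_input(outputs, weights):
    """Sum each upstream output multiplied by the weight on its connection."""
    return sum(o * w for o, w in zip(outputs, weights))


outputs = [1.0, 1.0]   # two upstream nodes, both fully 'on' (assumed)
weights = [5.0, 0.25]  # the weights from the example above

contributions = [o * w for o, w in zip(outputs, weights)]
ratio = contributions[0] / contributions[1]

print(weighted_input(outputs, weights))  # 5.25
print(ratio)                             # 20.0
```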

Training the neural network does not change how the functions of its neurons operate. It changes the weights associated with the connections between neurons. Training searches for the best possible set of weights by minimizing a cost function. A perfect model should, in theory, output 1 for the neuron that corresponds to the correct classification for an image, and 0 for everything else. Cost functions vary, but a typical cost function might take the squared difference between each output and its true value and then sum them together. The cost function does this for all data in a training set, and the larger the number a cost function produces, the more wrong a given model is. A model will start by guessing, and then change the weights of the neuron connections in order to decrease the cost function, continuing until a local minimum has been reached. Once the model has been trained, it can be used to evaluate data.
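The idea of nudging weights to reduce a cost can be sketched with a model that has just one weight. This is a toy under stated assumptions, not how TensorFlow trains a network: the data, the learning rate of 0.02, and the sum-of-squared-differences cost are all chosen for illustration, and the update rule shown is plain gradient descent.

```python
def cost(weight, inputs, targets):
    """Sum of squared differences between predictions and true values."""
    return sum((weight * x - t) ** 2 for x, t in zip(inputs, targets))


def cost_gradient(weight, inputs, targets):
    """Derivative of the cost with respect to the weight."""
    return sum(2 * (weight * x - t) * x for x, t in zip(inputs, targets))


# Toy data generated by the rule target = 2 * input
inputs = [1.0, 2.0, 3.0]
targets = [2.0, 4.0, 6.0]

weight = 0.0          # the model starts by guessing
learning_rate = 0.02  # assumed step size
costs = []
for step in range(20):
    costs.append(cost(weight, inputs, targets))
    # Move the weight a small step in the direction that lowers the cost
    weight -= learning_rate * cost_gradient(weight, inputs, targets)

print(costs[0], costs[-1], weight)
```

The recorded costs shrink at every step and the weight approaches 2, the value that makes the cost zero for this data.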


Now that we have some idea of what a neural network is, we’ll look at a short example that can be found here:

This data consists of movie reviews tagged ‘positive’ or ‘negative’, based on the critic’s opinion conveyed in the article.

from __future__ import absolute_import, division, print_function, unicode_literals
import tensorflow as tf
from tensorflow import keras
import numpy as np

First, we need to import packages. We’ll be using TensorFlow, one of the most popular Python tools for machine learning. If any of these packages are missing, install them with ‘pip install’ first.

The data is currently expressed as lists of integers, where each integer corresponds to a word in the dictionary. So, it might be convenient if we had a means to convert those integers back into words.

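# Load the IMDB reviews, keeping only the 10,000 most frequent words
imdb = keras.datasets.imdb
(train_data, train_labels), (test_data, test_labels) = imdb.load_data(num_words=10000)
# Map each word to an integer; the first four indices are reserved
word_index = imdb.get_word_index()
word_index = {k: (v + 3) for k, v in word_index.items()}
word_index["<PAD>"] = 0
word_index["<START>"] = 1
word_index["<UNK>"] = 2
word_index["<UNUSED>"] = 3
# Invert the mapping so integers can be turned back into words
reverse_word_index = dict([(value, key) for (key, value) in word_index.items()])

def decode_review(text):
    return ' '.join([reverse_word_index.get(i, '?') for i in text])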

The decode_review function does just that.
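To see the decoding step without downloading anything, here is a small self-contained sketch that swaps the real IMDB dictionary for a made-up three-word vocabulary. The name toy_word_index and its contents are hypothetical; the shift by 3 and the reserved tokens follow the listing above.

```python
# Hypothetical vocabulary standing in for imdb.get_word_index()
toy_word_index = {"the": 1, "movie": 2, "great": 3}

# Shift every index by 3 to make room for the reserved tokens
word_index = {word: index + 3 for word, index in toy_word_index.items()}
word_index["<PAD>"] = 0
word_index["<START>"] = 1
word_index["<UNK>"] = 2
word_index["<UNUSED>"] = 3

reverse_word_index = {index: word for word, index in word_index.items()}


def decode_review(encoded):
    """Turn a list of integers back into words; unknown integers become '?'."""
    return " ".join(reverse_word_index.get(i, "?") for i in encoded)


decoded = decode_review([1, 4, 5, 6, 0, 0])
print(decoded)  # <START> the movie great <PAD> <PAD>
```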

train_data = keras.preprocessing.sequence.pad_sequences(train_data,
    value=word_index["<PAD>"], padding='post', maxlen=250)
test_data = keras.preprocessing.sequence.pad_sequences(test_data,
    value=word_index["<PAD>"], padding='post', maxlen=250)

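Now, because the neural network has a fixed number of nodes, and each node needs an input, we need to ensure that the reviews are all the same size. We do this by fixing the length of every review at 250 words: shorter reviews are filled out at the end with the <PAD> value, and longer reviews are cut down to 250 words. Note that pad_sequences truncates from the start of a sequence by default, so for a long review it is the last 250 words that are kept.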
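The effect of the padding step can be sketched in plain Python. This is not the Keras implementation, just a hypothetical helper (pad_post_truncate_pre) that imitates padding='post' combined with the default truncating='pre' on a single list, using a length of 5 instead of 250 to keep the output readable.

```python
def pad_post_truncate_pre(sequence, maxlen, pad_value=0):
    """Pad short sequences at the end; trim long ones from the beginning."""
    if len(sequence) > maxlen:
        # Keep only the last maxlen items
        return sequence[-maxlen:]
    return sequence + [pad_value] * (maxlen - len(sequence))


short_review = pad_post_truncate_pre([5, 6, 7], maxlen=5)
long_review = pad_post_truncate_pre([1, 2, 3, 4, 5, 6, 7], maxlen=5)

print(short_review)  # [5, 6, 7, 0, 0]
print(long_review)   # [3, 4, 5, 6, 7]
```

Both results have the same length, which is what the fixed-size input layer needs.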

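vocab_size = 10000
model = keras.Sequential()
# Turn each word index into a 16-dimensional vector
model.add(keras.layers.Embedding(vocab_size, 16))
# Average the word vectors so each review becomes one fixed-length vector
model.add(keras.layers.GlobalAveragePooling1D())
model.add(keras.layers.Dense(16, activation=tf.nn.relu))
# Single output node: the probability that the review is positive
model.add(keras.layers.Dense(1, activation=tf.nn.sigmoid))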

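This code actually creates the neural network. The call to keras.Sequential() creates the empty network, while each ‘add’ statement appends a layer to it, including an embedding layer that turns each word index into a list of 16 numbers, a hidden layer of 16 nodes, and a single output node whose sigmoid activation returns a value between 0 and 1.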


Now, we need an optimizer and a loss function. There are multiple available, but binary crossentropy is a natural choice here because the network outputs a single probability for a two-class problem (positive or negative).
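In Keras, the optimizer and loss are normally chosen in a compile step, for example model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy']). That call does not appear in the listings here, so treat those exact arguments as an assumption. To show what the loss itself measures, here is a plain-Python sketch of binary crossentropy; the function below and its eps clipping value are illustrative and not the library’s implementation.

```python
import math


def binary_crossentropy(labels, predictions, eps=1e-7):
    """Average penalty for predicted probabilities against 0/1 labels."""
    total = 0.0
    for y, p in zip(labels, predictions):
        # Clip so that log(0) can never occur
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(labels)


labels = [1, 0, 1]
mostly_right = binary_crossentropy(labels, [0.9, 0.1, 0.8])
mostly_wrong = binary_crossentropy(labels, [0.1, 0.9, 0.2])

print(mostly_right < mostly_wrong)  # True
```

Confident wrong answers are penalized far more heavily than confident right ones, which is what pushes the output toward the correct label during training.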

x_val = train_data[:10000]
partial_x_train = train_data[10000:]
y_val = train_labels[:10000]
partial_y_train = train_labels[10000:]

Here, we set aside the first 10000 data points to act as a validation set.

history = model.fit(partial_x_train, partial_y_train,
    epochs=40, batch_size=512,
    validation_data=(x_val, y_val))

Now, we’re going to train the model. The model processes the training data in batches of 512 data points and passes over the whole training set 40 times, where each full pass is called an epoch. The model’s neuron connection weights are adjusted after every batch, and accuracy generally increases from one epoch to the next. Run it in your own notebook to see how the model becomes more accurate over time.
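A quick back-of-the-envelope calculation shows how many weight updates this training run involves. The sketch assumes the standard IMDB training split of 25,000 reviews, of which 10,000 were set aside for validation; if your split differs, the numbers change accordingly.

```python
import math

# Assumption: 25,000 training reviews minus the 10,000 held out for validation
training_examples = 25000 - 10000
batch_size = 512
epochs = 40

# The last batch of each epoch is smaller than 512, hence the ceiling
batches_per_epoch = math.ceil(training_examples / batch_size)
total_updates = batches_per_epoch * epochs

print(batches_per_epoch)  # 30
print(total_updates)      # 1200
```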

Here, we can see the value of our cost function decreasing with each epoch.

Here, we see the accuracy of the neural network across epochs. Initially, the network’s accuracy is about 55 percent on the training set, very close to guessing. But as the model continues to optimize itself, the accuracy increases, and levels off for the validation set at about 88 percent. We can see that training accuracy continues to rise, but this increased ability to correctly predict the value of the training data does not significantly improve the model’s predictive power on data it has not seen. This might be indicative of over-fitting: a model becomes very good at predicting things about the data it is trained on, but so specific to that data that it is not useful for describing other data.
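One simple way to spot this pattern in numbers is to compare the two accuracy curves epoch by epoch. The sketch below uses made-up accuracy values, not results from the model in this article, and the helper names best_epoch and generalization_gap are hypothetical.

```python
def best_epoch(val_accuracy):
    """Return the 1-based epoch with the highest validation accuracy."""
    best_index = max(range(len(val_accuracy)), key=lambda i: val_accuracy[i])
    return best_index + 1


def generalization_gap(train_accuracy, val_accuracy):
    """Training accuracy minus validation accuracy for each epoch."""
    return [t - v for t, v in zip(train_accuracy, val_accuracy)]


# Made-up curves for illustration only
train_acc = [0.55, 0.75, 0.88, 0.93, 0.96, 0.98]
val_acc = [0.54, 0.74, 0.85, 0.88, 0.88, 0.87]

gaps = generalization_gap(train_acc, val_acc)
print(best_epoch(val_acc))      # 4
print(gaps[-1] > gaps[0])       # True: the gap widens as training continues
```

A gap that keeps widening after validation accuracy has peaked is a common sign that further training is only memorizing the training data.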

We’ve gone over the basic concepts of neural networks as well as an example of how to implement one. Hopefully this article serves as a nice introduction to the world of machine learning for anyone new to the concept.

Key Terms:

Neural Network: A network of nodes and weighted connections that outputs some prediction about its data.

Neuron: The ‘nodes’ of a neural network. They are functions that take in numeric input and return a value between 0 and 1.

Input Layer: The layer of nodes in a neural network that directly takes in data.

Hidden Layers: The intermediate layers in a neural network that do most of the processing.

Output Layer: The layer of nodes that returns one or more values.

Weight: A coefficient applied to the outputs of nodes. It can also be thought of as the relative importance of each input.
