Simplest Introduction to Neural Networks in Keras

Solving the non-linear XOR problem

Matheus Farias
Analytics Vidhya
Feb 21, 2020

Neural networks are a very useful tool for solving many different kinds of problems, since they are universal function approximators. A good way to start in this world is with a well-known, simple neural network library: Keras.

This text shows the simplest way to solve one of the easiest problems in machine learning and neural networks: the XOR problem.

The XOR problem

First of all, XOR is a logic operation that can be defined with the truth table below:

The XOR’s truth table:

A B | C
0 0 | 0
0 1 | 1
1 0 | 1
1 1 | 0
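As a quick cross-check, the same truth table can be generated in Python with the built-in bitwise XOR operator `^`. This is a minimal sketch; the name `truth_table` is only illustrative.

```python
from itertools import product

# Enumerate every combination of the two input bits A and B
# and compute C = A XOR B with Python's bitwise ^ operator.
truth_table = [(a, b, a ^ b) for a, b in product([0, 1], repeat=2)]

print("A B | C")
for a, b, c in truth_table:
    print(f"{a} {b} | {c}")
```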

The main motivation for using a neural network to learn how this function produces the answer C from the inputs A and B is that the problem is not linear.

Intuitively, to understand what it means for the XOR problem to be non-linear, it suffices to see geometrically that no single line in the Cartesian plane can separate the answers 0 and 1: the inputs (0, 0) and (1, 1) give 0, while (1, 0) and (0, 1) give 1, and the two pairs lie on opposite diagonals of the unit square.

Since more than a single line is needed to separate the answers, this is a non-linear problem.
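The geometric argument can also be checked algebraically. Suppose, for contradiction, that some line separated the classes, i.e. that there were weights w1, w2 and a bias b (symbols introduced only for this sketch) with w1·A + w2·B + b > 0 exactly when C = 1. The four rows of the truth table would then require:

```latex
\begin{aligned}
(0,0)\mapsto 0 &:\quad b \le 0\\
(1,0)\mapsto 1 &:\quad w_1 + b > 0\\
(0,1)\mapsto 1 &:\quad w_2 + b > 0\\
(1,1)\mapsto 0 &:\quad w_1 + w_2 + b \le 0\\
\text{rows 2 + 3} &:\quad w_1 + w_2 + 2b > 0 \;\Rightarrow\; w_1 + w_2 + b > -b \ge 0
\end{aligned}
```

The last line contradicts the fourth row, so no such line exists; the opposite orientation (C = 1 on the negative side of the line) fails in the same way.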

Thus, we will verify in practice that a single neuron (a Perceptron) can’t solve the problem, but a deep neural network (DNN) can.

Let’s code!

To solve this problem, we use the Keras neural network library and, of course, NumPy; both can be installed with pip as follows:

pip install keras
pip install numpy

We can start by importing these libraries:

import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Activation
from keras.optimizers import SGD

To train the neural network, we need to define the input vectors and the corresponding output vector that classifies each input.

For this problem, the input x holds every combination of A and B (1 bit each), and the output y holds the corresponding XOR answers (the C column of the truth table):

x = np.array([[1,0], [1,1], [0,1], [0,0]])
y = np.array([[1], [0], [1], [0]])
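Before training, it can be worth checking that y really matches the XOR of the two columns of x, and that both arrays have the (samples, features) shape Keras expects. A minimal NumPy sketch of that check:

```python
import numpy as np

x = np.array([[1, 0], [1, 1], [0, 1], [0, 0]])
y = np.array([[1], [0], [1], [0]])

# Compute A XOR B for every row and reshape it into a (4, 1) column like y.
expected = np.bitwise_xor(x[:, 0], x[:, 1]).reshape(-1, 1)

print(x.shape, y.shape)             # (4, 2) (4, 1): 4 samples, 2 inputs, 1 output
print(np.array_equal(expected, y))  # True
```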

Next, we describe the architecture to be trained. For simplicity, we compare a single neuron (the Perceptron model) with a DNN that has two ReLU hidden layers of 8 neurons each:

model = Sequential()
model.add(Dense(8, input_dim=2))
model.add(Activation('relu'))
model.add(Dense(8))
model.add(Activation('relu'))
model.add(Dense(1, activation='sigmoid'))
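To get a feel for the size of this network, its trainable parameters can be counted by hand: a Dense layer with n_in inputs and n_out units has n_in * n_out weights plus n_out biases, and the Activation layers add none. The helper `dense_params` below is a hypothetical name for that arithmetic; calling `model.summary()` in Keras should report the same total.

```python
def dense_params(n_in: int, n_out: int) -> int:
    """Weights (n_in * n_out) plus one bias per unit (n_out)."""
    return n_in * n_out + n_out

# (inputs, units) for the three Dense layers: 2 -> 8 -> 8 -> 1
layers = [(2, 8), (8, 8), (8, 1)]
counts = [dense_params(n_in, n_out) for n_in, n_out in layers]

print(counts)       # [24, 72, 9]
print(sum(counts))  # 105
```

By the same formula, the single-neuron model used later has only dense_params(2, 1) = 3 parameters.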

To compile the Keras model, we use binary cross-entropy as the loss function and stochastic gradient descent (SGD) as the optimizer, with a learning rate of 0.1:

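sgd = SGD(learning_rate=0.1)
# pass the SGD object itself; optimizer='sgd' would ignore it and use the default learning rate (0.01)
model.compile(loss='binary_crossentropy', optimizer=sgd)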
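Binary cross-entropy compares each predicted probability p with its target y as -[y log p + (1 - y) log(1 - p)], averaged over the samples. The NumPy function below is a minimal sketch of that formula; clipping by a small eps to avoid log(0) is an assumption of this sketch, not a copy of Keras's implementation.

```python
import numpy as np

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy between 0/1 targets and predicted probabilities."""
    y_true = np.asarray(y_true, dtype=float)
    # Keep probabilities away from exactly 0 and 1 so the logs stay finite.
    y_pred = np.clip(np.asarray(y_pred, dtype=float), eps, 1.0 - eps)
    losses = -(y_true * np.log(y_pred) + (1.0 - y_true) * np.log(1.0 - y_pred))
    return losses.mean()

y = np.array([[1], [0], [1], [0]])
print(binary_crossentropy(y, np.full((4, 1), 0.5)))               # ≈ 0.6931, i.e. ln 2
print(binary_crossentropy(y, [[0.99], [0.01], [0.99], [0.01]]))   # ≈ 0.01
```

A model that always answers 0.5 scores about ln 2 ≈ 0.693, which is a useful baseline when reading the training log.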

To finish the implementation, we fit the model for 1000 epochs with a batch size of 1:

model.fit(x, y, epochs=1000, batch_size=1)
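With 4 training samples and batch_size=1, each epoch performs 4 weight updates, so 1000 epochs amount to 4000 SGD steps. To illustrate what one step does, the sketch below applies a single update to a lone sigmoid neuron, using the fact that the gradient of binary cross-entropy with respect to the neuron's pre-activation is p - y. The function `sgd_step` is hypothetical and only illustrative; Keras computes gradients with automatic differentiation rather than a hand-written formula.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sgd_step(w, b, x, y, lr=0.1):
    """One SGD update (batch size 1) for a sigmoid neuron trained with binary cross-entropy."""
    p = sigmoid(np.dot(w, x) + b)  # predicted probability
    error = p - y                  # dLoss/dz for sigmoid + binary cross-entropy
    w_new = w - lr * error * x     # gradient w.r.t. the weights is (p - y) * x
    b_new = b - lr * error         # gradient w.r.t. the bias is (p - y)
    return w_new, b_new

# Start from zero weights and take one step on the sample A=1, B=0 (target C=1).
w, b = sgd_step(np.zeros(2), 0.0, np.array([1.0, 0.0]), 1.0)
print(w, b)  # weights move to about [0.05, 0.0], bias to about 0.05
```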

To check whether the implementation is correct, we can ask the trained model to predict the answers for the training inputs:

predictions = model.predict(x)
print(predictions)

So, the full code is:

"""
Created on Fri Feb 21 02:01:40 2020
@author: Matheus Farias
"""
# importing libraries
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Activation
from keras.optimizers import SGD
# defining the input and output training vectors
x = np.array([[1,0], [1,1], [0,1], [0,0]])
y = np.array([[1], [0], [1], [0]])
# defining the keras model
model = Sequential()
model.add(Dense(8, input_dim=2))
model.add(Activation('relu'))
model.add(Dense(8))
model.add(Activation('relu'))
model.add(Dense(1, activation='sigmoid'))
# compiling the keras model (passing the SGD object so the 0.1 learning rate is used)
sgd = SGD(learning_rate=0.1)
model.compile(loss='binary_crossentropy', optimizer=sgd)
# fitting the keras model on the training vectors
model.fit(x, y, epochs=1000, batch_size=1)
# predicting the desired answers
predictions = model.predict(x)
print(predictions)

Results

The model outputs the following predicted probabilities:

[[0.9876364 ]
[0.02138409]
[0.9333476 ]
[0.02856272]]

These values are close to the targets 1, 0, 1, 0, which shows that this non-linear model solves the problem.
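To turn these probabilities into class labels, a common convention (an assumption here, not something the original code does) is to threshold the sigmoid output at 0.5. Applying it to the DNN outputs printed above:

```python
import numpy as np

# Outputs of the two-hidden-layer model, as printed above.
predictions = np.array([[0.9876364], [0.02138409], [0.9333476], [0.02856272]])
y = np.array([[1], [0], [1], [0]])

# Label each sample 1 if its predicted probability exceeds 0.5, else 0.
labels = (predictions > 0.5).astype(int)
accuracy = np.mean(labels == y)

print(labels.ravel())  # [1 0 1 0]
print(accuracy)        # 1.0
```

Keras can also report this accuracy during training if metrics=['accuracy'] is passed to model.compile.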

Now, let’s see whether a single neuron, i.e., a model whose decision boundary is a single line, can solve the problem. For this, we use the following model, compiled and trained in the same way:

model = Sequential()
model.add(Dense(1, input_dim=2, activation='sigmoid'))

For this new model, the predicted probabilities are:

[[0.5178774 ]
[0.49577093]
[0.48331842]
[0.5054262 ]]

All four outputs stay close to 0.5, which confirms that this problem can’t be solved by a linear model.
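The same 0.5 threshold, together with the binary cross-entropy formula, quantifies the single neuron's failure. The sketch below uses the four outputs printed above; the threshold is again an assumed convention.

```python
import numpy as np

# Outputs of the single-neuron model, as printed above.
predictions = np.array([[0.5178774], [0.49577093], [0.48331842], [0.5054262]])
y = np.array([[1], [0], [1], [0]])

# Threshold at 0.5 and compare with the targets.
labels = (predictions > 0.5).astype(int)
accuracy = np.mean(labels == y)

# Mean binary cross-entropy of these outputs.
loss = -np.mean(y * np.log(predictions) + (1 - y) * np.log(1 - predictions))

print(labels.ravel())  # [1 0 0 1]
print(accuracy)        # 0.5
print(loss)            # close to ln 2 ≈ 0.693, the loss of always answering 0.5
```

An accuracy of 0.5 on four balanced samples is no better than guessing, consistent with the fact that no single line separates the XOR classes.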

Electronics Engineer interested in Artificial Intelligence