Machine Learning: Modeling OR and XOR with TensorFlow.js

Vladimir Topolev
Published in Geek Culture
6 min read · Jul 7, 2021

In this article, we will implement a model for the logical operations OR and XOR using TensorFlow.js. This task may be considered the “Hello World” program of the Machine Learning world, which everybody goes through when learning something new. It doesn’t take a lot of time, believe me.

We will follow all the steps, which will give you a basic idea of what this process looks like, and you can extrapolate this experience and approach to building new neural networks.

Modeling OR logical operation

Well, let’s start building a neural network for the logical operation OR. First of all, imagine that our model is a black box whose content will be discovered a little bit later. Right now we only know how many inputs and outputs our model (black box) should have: 2 inputs and 1 output (figure 1). For model training, of course, we also need a training set. In our case, it consists of 4 samples and one expected output for each sample:

  • inputs: [[0, 0], [0, 1], [1, 0], [1, 1]]
  • outputs: [[0], [1], [1], [1]]
Figure 1

Let’s depict the training set on the coordinate plane with axes X1 and X2. This allows us to better understand what the topology of the neural network should be.

Figure 2

We need to draw a line that divides the plane into 2 parts: all ‘TRUE’ values on the right side, all ‘FALSE’ values on the left side (figure 2, right). We also know that a single neuron in a neural network (a perceptron) is a perfect fit for this task. The output value of a perceptron depends on the input signals and is calculated as a weighted sum plus a bias:

y = w1·x1 + w2·x2 + b

But this is the equation of a line, which means that one neuron may theoretically divide our plane as we expect.

Since our output values belong to the range [0, 1], we also need to apply the sigmoid activation function. Well, our black box for this specific task may look like this:

Figure 3

Let’s convert our model prototype into a TensorFlow model. First of all, we need to convert our training set into tensors. A tensor is just a container of data which may have N axes and an arbitrary number of elements along each axis. Most of us are already familiar with tensors from math: vectors (tensors with one axis) and matrices (tensors with 2 axes: rows and columns).

In TensorFlow, the first axis (axis 0) of every tensor is responsible for holding all the available samples from the training set (figure 4).

Figure 4

In our case, we have 4 samples in our training set (figure 1), therefore the input tensor has 4 elements along the first axis (axis 0). Each element of the training set is a vector consisting of two elements: X1 and X2. Thus, the input tensor has 2 axes (a matrix): 4 elements along the first axis and 2 elements along the second axis.

Also, we need to create an output tensor for the model in the same way:

Let’s create our TensorFlow model, which has one layer with one neuron in it; this is intuitively understandable from the TensorFlow API itself:

Any model creation starts with the invocation of the tf.sequential method, which creates a skeleton for our model. The main model building block is a layer, and we can define as many layers as we want. In our case, we have only one layer, containing only one neuron. It is called a dense layer, which means that each neuron in the next layer has a connection to each neuron of the previous layer. Imagine that we have 2 dense layers, the first with N neurons and the second with M neurons; then the total count of connections will be (N + 1) × M, where the 1 is the bias.

Since we have only one neuron in the layer, units = 1. Also, for the first layer we need to define how many inputs the model has; in our case it’s 2, therefore inputShape = [2]. Note that setting inputShape for any layer other than the first one doesn’t make sense, since TensorFlow can calculate it by itself based on the topology of the previous layer.

Each layer may have its own activation function; since our output values should belong to the range [0, 1], we define a sigmoid activation function. All activation functions implemented in TensorFlow are available here.

We need to complete the last step: compile the model and define 2 mandatory parameters, the loss function and the optimizer. The loss function determines the error between the output of the algorithm and the given target value. The optimizer is responsible for calculating new weights for the model that minimize the value of the loss function.

We set up stochastic gradient descent with a constant learning rate of 0.1 as the optimizer.

A list of all optimizers implemented in TensorFlow: tf.train.sgd, tf.train.momentum, tf.train.adagrad, tf.train.adadelta, tf.train.adam, tf.train.adamax, tf.train.rmsprop.

The loss function is set up as the mean squared error:

The last step is the learning process; we just need to invoke the fit method of the created model and pass the training tensor for the inputs and the expected results (the output tensor):

We have set the learning process to consist of 100 learning steps (number of learning epochs); at each new epoch, the input data is shuffled in random order (shuffle = true), which speeds up model convergence, since there are few instances in our training dataset (only 4 samples).

After the completion of the training process, we can use the predict method, which calculates the output values for new input signals.

The generateInputs method simply generates a 10x10 sample dataset that divides the coordinate plane into 100 squares:

Here you may see the learning process in action:

Figure 5

Code in plunker here: https://plnkr.co/edit/FUYbevLR6PbyddIa

Modeling XOR logical operation

The training set for this operation is provided in figure 6; let’s also put this set on the coordinate plane in the same way as we did for the OR function.

Figure 6

Pay attention: unlike the OR operation, we can’t split the coordinate plane with one line so that all FALSE values lie on one side and all TRUE values on the other. But we can do it with two lines (figure 7, left):

Obviously, in this case one neuron isn’t enough to solve this task. We need at least one extra layer with 2 neurons, which would model the behavior of the 2 lines on the coordinate plane (figure 7, right).

Figure 7

We need to make only several changes in the previously implemented code. The first one is a new training set:

And let’s change model topology according to figure 7, right:

Here you may see the learning process in action:

Figure 8

Code in plunker here: https://plnkr.co/edit/pF8dCwQuzurgCgTu
