TensorFlow.js is a library built on deeplearn.js for creating deep learning models directly in the browser. With it you can create CNNs, RNNs, etc. in the browser and train these models using the client’s GPU processing power. Hence, a server GPU is not needed to train the neural network. This tutorial starts by explaining the basic building blocks of TensorFlow.js and the operations on them. Then, we describe how to create some more complicated models.
One Note or Two …
If you want to play with the code, I created an interactive coding session on Observable. I also created many mini-projects, including simple classification, style transfer, pose estimation and pix2pix translation.
Since TensorFlow.js runs in the browser, you just need to include the following script in the header of the HTML file
This will load the latest published version of the bundle.
Tensors (The building blocks)
If you are familiar with deep learning platforms like TensorFlow, you will recognize that tensors are n-dimensional arrays that are consumed by operators. Hence they are the building blocks of any deep learning application. Let us create a scalar tensor
This created a scalar tensor. We can also convert arrays to tensors
This creates a constant tensor from the array [2,2]. In other words, we converted the one-dimensional array to a tensor by applying the tensor function. We can use input.shape to retrieve the size of the tensor.
This has the shape [2]. We can also create a tensor with a specific size. For instance, here we create a tensor of zeros with a given shape
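The tensor-creation steps described in this section can be sketched as follows. This is a minimal sketch rather than the original snippet: it assumes tf is the namespace object exposed by the TensorFlow.js bundle (passed in as an argument here), the function name tensorBasics is hypothetical, and the zeros shape [2, 2] is an arbitrary illustrative choice.

```javascript
// Minimal sketch; `tf` is assumed to be the TensorFlow.js namespace object.
function tensorBasics(tf) {
  const s = tf.scalar(2);           // rank-0 tensor holding a single number
  const input = tf.tensor([2, 2]);  // 1-D tensor built from a plain array
  const zeros = tf.zeros([2, 2]);   // illustrative shape, filled with zeros
  return {
    scalarShape: s.shape,     // empty shape for a scalar
    inputShape: input.shape,  // one entry: the array length
    zerosShape: zeros.shape,  // the shape that was requested
  };
}
```

In a page that loaded the bundle through a script tag, calling tensorBasics(tf) would use the global tf object.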
In order to use tensors we need to create operations on them. Let us say we want to find the square of a tensor
The value of x2 will be [4,9,16]. TensorFlow.js also allows chaining operations. For example, to evaluate the 2nd power of a tensor we use
x2 tensor will have value
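To make the arithmetic concrete, here is a plain-JavaScript mirror of an element-wise square. It does not use TensorFlow.js at all; the input [2, 3, 4] is inferred from the result [4,9,16] quoted in the text, and the helper name square is hypothetical. In TensorFlow.js the corresponding tensor method is square().

```javascript
// Plain-JavaScript mirror of an element-wise square (no TensorFlow.js needed).
function square(values) {
  return values.map((v) => v * v);
}

const xValues = [2, 3, 4];
const squared = square(xValues);               // [4, 9, 16]
const squaredTwice = square(square(xValues));  // chaining: [16, 81, 256]
console.log(squared, squaredTwice);
```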
Usually we generate lots of intermediate tensors. For instance, in the previous example, after evaluating x2 we don’t need the value of x anymore. To dispose of it we call
Note that we can no longer use the tensor x in later operations. It would be inconvenient to do that by hand for every tensor, but tensors that are never disposed keep occupying memory. TensorFlow.js offers a special operator, tidy(), to dispose of intermediate tensors automatically
Notice that the tensor y will be disposed, since it is not needed once the result has been evaluated.
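A sketch of how tidy() can wrap a computation with an intermediate tensor is given below. It assumes tf is the TensorFlow.js namespace and x is an existing tensor; the function name cube and the variable z are hypothetical, and the computation (x multiplied by its square) is only an illustration.

```javascript
// Sketch: `tf` is assumed to be the TensorFlow.js namespace, `x` a tensor.
function cube(tf, x) {
  // tidy() runs the callback and disposes every tensor created inside it,
  // except the one that is returned.
  return tf.tidy(() => {
    const y = x.square();  // intermediate tensor, cleaned up by tidy()
    const z = x.mul(y);    // returned, so it survives the clean-up
    return z;
  });
}
```

The input x is created outside the callback, so tidy() leaves it alone; it still has to be disposed separately when it is no longer needed.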
Here we learn how to solve an optimization problem. Given a function f(x), we are asked to find the value x = a that minimizes f(x). To do that we need an optimizer. An optimizer is an algorithm that minimizes a function by following its gradient. There are many optimizers in the literature, like SGD, Adam, etc. These optimizers differ in their speed and accuracy. TensorFlow.js supports the most important optimizers.
We will take a simple example where f(x) = x⁶+2x⁴+3x²+x+1. The graph of the function is shown below. We see that the minimum of the function is in the interval [-0.5,0]. We will use an optimizer to find the exact value.
First we define the function to minimize
Now we can iteratively minimize the function to find the value of the minimum. We will start with an initial value of a = 2. The learning rate defines how large a step we take towards the minimum. We will use an Adam optimizer
Using a learning rate of 0.9, we find the value of the minimum after 200 iterations.
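To illustrate the idea of following the gradient without any library, the sketch below minimizes the same f(x) in plain JavaScript. Note that it swaps in plain gradient descent with a hand-derived derivative instead of Adam, and the learning rate 0.01 and the 2000 iterations are assumptions chosen so that this simpler method stays stable; they are not the settings used in the text.

```javascript
// f(x) = x^6 + 2x^4 + 3x^2 + x + 1 and its hand-derived derivative.
const f = (x) => x ** 6 + 2 * x ** 4 + 3 * x ** 2 + x + 1;
const df = (x) => 6 * x ** 5 + 8 * x ** 3 + 6 * x + 1;

// Plain gradient descent (not Adam): repeatedly step against the gradient.
function minimize(start, learningRate, iterations) {
  let a = start;
  for (let i = 0; i < iterations; i++) {
    a -= learningRate * df(a);
  }
  return a;
}

// Start from a = 2 as in the text; the result lands inside [-0.5, 0],
// at roughly -0.161, where the derivative vanishes.
const aMin = minimize(2, 0.01, 2000);
console.log(aMin, f(aMin));
```

Adam adapts the step size per iteration, which is why it can tolerate a much larger learning rate than this fixed-step version.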
A Simple Neural Network
Now we learn how to create a neural network that learns XOR, which is a nonlinear operation. The code is similar to a Keras implementation. We first create the training set, where each example has two inputs and one output. We will feed a batch of 4 items in each iteration
Then we create two dense layers with two different nonlinear activation functions. We use stochastic gradient descent with cross entropy loss. The learning rate is
Then we fit the model for
Finally we predict on the training set
The output should be [[0.0064339], [0.9836861], [0.9835356], [0.0208658]], which is what we expect.
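The XOR setup described in this section might look like the sketch below. It assumes tf is the TensorFlow.js namespace; the function name buildXorModel, the 8 hidden units, the tanh and sigmoid activations and the binary cross-entropy loss are illustrative assumptions, and the learning rate is left as a parameter because the value used in the text is not shown.

```javascript
// Sketch only: `tf` is assumed to be the TensorFlow.js namespace.
// Unit count, activations and loss name are illustrative assumptions.
function buildXorModel(tf, learningRate) {
  const model = tf.sequential();
  // Hidden layer: 2 inputs -> 8 units with a tanh nonlinearity.
  model.add(tf.layers.dense({ units: 8, inputShape: [2], activation: 'tanh' }));
  // Output layer: a single sigmoid unit giving a value between 0 and 1.
  model.add(tf.layers.dense({ units: 1, activation: 'sigmoid' }));
  model.compile({
    optimizer: tf.train.sgd(learningRate),
    loss: 'binaryCrossentropy',
  });
  return model;
}

// The four XOR examples: two inputs each, one expected output each.
const XOR_INPUTS = [[0, 0], [0, 1], [1, 0], [1, 1]];
const XOR_LABELS = [[0], [1], [1], [0]];
```

With the real library, the two arrays would be wrapped with tf.tensor2d(...) and passed to model.fit(...) inside an async function, followed by model.predict(...) on the same inputs.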
TensorFlow.js uses automatic differentiation using computational graphs. We just need to create the layers, optimizer and compile the model. Let us create a sequential model
Now we can add different layers for the model. Let us add the first convolutional layer with input
Here we created a conv layer that takes input of size [28,28,1]. The input will be a gray image of size 28 x 28. Then we apply 8 kernels of size 5x5 with stride equal to 1, initialized with VarianceScaling. After that, we apply an activation function which takes the negative values in the tensor and replaces them with zeros. Now we can add this conv layer to the model
What is nice about TensorFlow.js is that we don’t need to specify the input size for the next layer, as it will be inferred automatically after we compile the model. We can also add max-pooling layers, dense layers, and so on. Here is a simple model
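A reduced version of such a model could be assembled as sketched below. It assumes tf is the TensorFlow.js namespace; the function name buildDigitModel is hypothetical, the conv layer settings follow the description in the text, and the pooling size and the single conv/pool stage are illustrative assumptions rather than the original architecture.

```javascript
// Sketch: `tf` is assumed to be the TensorFlow.js namespace.
function buildDigitModel(tf) {
  const model = tf.sequential();
  // 8 kernels of size 5x5, stride 1, ReLU activation, gray 28x28 input.
  model.add(tf.layers.conv2d({
    inputShape: [28, 28, 1],
    kernelSize: 5,
    filters: 8,
    strides: 1,
    activation: 'relu',
    kernelInitializer: 'varianceScaling',
  }));
  // Only the first layer needs inputShape; later layers infer theirs.
  model.add(tf.layers.maxPooling2d({ poolSize: [2, 2], strides: [2, 2] }));
  model.add(tf.layers.flatten());
  // 10 output units with softmax, one per class.
  model.add(tf.layers.dense({ units: 10, activation: 'softmax' }));
  return model;
}
```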
We can apply any layer to a tensor to inspect the output tensor. But here is a catch: the input needs to be of shape [BATCH_SIZE,28,28,1], where BATCH_SIZE represents the number of dataset elements we apply to the model at a time. Here is an example of how to evaluate a convolutional layer
After inspecting the shape of the output tensor we see it has shape [1,24,24,8]. This is evaluated using the formula
output_size = floor((input_size - kernel_size) / stride) + 1
which results in 24 in our case. Returning to our model, we used flatten(), which converts the input from the shape [BATCH_SIZE,a,b,c] to the shape [BATCH_SIZE,a×b×c]. This is important because we cannot feed 2D arrays to the dense layers. Finally, we used a dense layer with 10 output units, which represents the number of classes in our recognition system. This model is in fact used for recognizing handwritten digits in the MNIST dataset.
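The shape arithmetic in this paragraph can be checked with a few lines of plain JavaScript. The sketch assumes a convolution without padding, and the helper names convOutputSize and flattenedSize are hypothetical.

```javascript
// Output width (or height) of a convolution without padding.
function convOutputSize(inputSize, kernelSize, stride) {
  return Math.floor((inputSize - kernelSize) / stride) + 1;
}

// Number of values per example after flattening [a, b, c] into a vector.
function flattenedSize(shape) {
  return shape.reduce((product, dim) => product * dim, 1);
}

const side = convOutputSize(28, 5, 1);         // 24
const flat = flattenedSize([side, side, 8]);   // 24 * 24 * 8 = 4608
console.log(side, flat);
```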
Optimization and Compilation
This will create an Adam optimizer using the specified learning rate. Now we are ready to compile the model (attaching the optimizer to the model)
Here we created a model that uses Adam to optimize the loss function, which evaluates the cross entropy between the predicted output and the true label.
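A compact sketch of the two steps (creating the optimizer, then compiling) is shown below. It assumes tf is the TensorFlow.js namespace and model is a model built with tf.sequential(); the function name compileWithAdam and the accuracy metric are assumptions added for illustration.

```javascript
// Sketch: `tf` is assumed to be the TensorFlow.js namespace.
function compileWithAdam(tf, model, learningRate) {
  const optimizer = tf.train.adam(learningRate);
  model.compile({
    optimizer,
    // Cross entropy between the predicted distribution and one-hot labels.
    loss: 'categoricalCrossentropy',
    // Also track accuracy so that it can be inspected after training.
    metrics: ['accuracy'],
  });
  return model;
}
```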
After compiling the model we are ready to train it on a dataset. We need to use the fit() function for that
Note that we are feeding the fit function one batch of the training set. The second argument of the fit function represents the true labels for that batch. Lastly, we have the configuration parameters like the epochs. Note that epochs represents how many times we iterate over the current batch, NOT the whole dataset. Hence we can, for example, wrap that code inside a for loop that iterates over all the batches of the training set.
Note that we used the special keyword await, which pauses the enclosing async function until fit() has finished. The browser itself is not blocked while waiting: the rest of the page keeps running, and the function resumes once training on the batch completes.
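The loop over batches mentioned in this section could be sketched as follows. It assumes model is a compiled TensorFlow.js model and that each batch is an object holding two tensors; the function name trainOnBatches and the field names xs and labels are hypothetical.

```javascript
// Sketch: `model` is assumed to be a compiled TensorFlow.js model and each
// batch a hypothetical { xs, labels } pair of tensors.
async function trainOnBatches(model, batches, epochsPerBatch) {
  const histories = [];
  for (const batch of batches) {
    // await pauses this function until training on the batch has finished.
    const history = await model.fit(batch.xs, batch.labels, {
      epochs: epochsPerBatch,
    });
    histories.push(history);
  }
  return histories;
}
```

Because the function is async, it returns a promise; callers either await it inside another async function or attach a then() callback.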
One Hot Encoding
Usually the given labels are numbers that represent the class. For instance, suppose we have two classes, an orange class and an apple class. Then we give the orange class the label 0 and the apple class the label 1. But our network accepts a tensor of size [BATCH_SIZE,NUM_CLASSES]. Hence we need to use what we call one-hot encoding
Hence we converted the 1D tensor of labels into a tensor of shape [BATCH_SIZE,NUM_CLASSES].
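To show what the encoding produces, here is a plain-JavaScript sketch that works on ordinary arrays instead of tensors. The helper name oneHot and the sample labels are illustrative; TensorFlow.js offers tf.oneHot for the tensor version.

```javascript
// Turn class indices into rows with a single 1 at the class position.
function oneHot(labels, numClasses) {
  return labels.map((label) => {
    const row = new Array(numClasses).fill(0);
    row[label] = 1;
    return row;
  });
}

// Orange = 0, apple = 1, as in the text.
const encoded = oneHot([0, 1, 1], 2);  // [[1, 0], [0, 1], [0, 1]]
console.log(encoded);
```

Three labels and two classes give an output with 3 rows and 2 columns, matching the [BATCH_SIZE,NUM_CLASSES] layout.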
Loss and Accuracy
To inspect the performance of our model we need to know the loss and the accuracy. We can fetch both from the history object returned by fit()
Note that we are evaluating the loss and accuracy on the validationData that was given as an input to the fit() function.
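Reading the validation metrics could look like the sketch below. It assumes the object returned by fit() stores one value per epoch under history.val_loss and history.val_acc; these key names are an assumption that depends on the metrics passed to compile() and on the library version, and the function name latestValidationMetrics is hypothetical.

```javascript
// Sketch: `history` is assumed to be the object returned by model.fit().
// The key names val_loss and val_acc are assumptions (see the note above).
function latestValidationMetrics(history) {
  const last = (values) => values[values.length - 1];
  return {
    loss: last(history.history.val_loss),
    accuracy: last(history.history.val_acc),
  };
}
```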
Suppose that we are done training a model and it gives good loss and accuracy. It is time to predict the class of an unseen data element. Suppose we are given an image that is in our browser or that we took directly from our webcam; then we can use our trained model to predict its class. First, we need to convert the image into a tensor
Here we created a canvas, retrieved imageData from it, and then converted it to a tensor. Now the tensor will have size [28,28,3], but the model takes 4-dimensional tensors. Hence we need to add an extra dimension to the tensor using
Hence the output tensor will have size [1,28,28,3], since we have added a dimension at index 0. Now for prediction we simply use
predict will return the output of the last layer in our network, usually a softmax activation.
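The whole prediction path (image data to tensor, extra batch dimension, predict) can be sketched as below. It assumes tf is the TensorFlow.js namespace in a recent version where image conversion lives under tf.browser.fromPixels, model is a trained model whose input matches the image size and channels, and imageData comes from a canvas; the function name predictClass is hypothetical.

```javascript
// Sketch: `tf` is assumed to be the TensorFlow.js namespace, `model` a
// trained model and `imageData` the pixels read from a canvas.
function predictClass(tf, model, imageData) {
  return tf.tidy(() => {
    // [height, width, 3] integer tensor, converted to floats for the model.
    const image = tf.browser.fromPixels(imageData).toFloat();
    // Add the batch dimension at index 0: [1, height, width, 3].
    const batched = image.expandDims(0);
    // One probability per class: [1, NUM_CLASSES].
    const probabilities = model.predict(batched);
    // Index of the most probable class for the single image.
    return probabilities.argMax(1).dataSync()[0];
  });
}
```

Wrapping the steps in tidy() disposes the intermediate tensors, so only the plain class index leaves the function.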
In the previous sections we had to train our model from scratch. However, this is an expensive operation since it requires many training iterations. Hence, we use a pretrained model called MobileNet. It is a lightweight CNN that is optimized to run in mobile applications. MobileNet was trained on the ImageNet classes. Basically, we have a network whose weights were already trained on 1,000 different classes.
To load the model we use the following
We can use inputs and outputs to inspect the structure of the model
Hence we need images of size [1,224,224,3], and the output will be a tensor of size [1,1000], which holds the probability of each class in the ImageNet dataset.
For the sake of simplicity we will take an array of zeros and try to predict the class number out of the 1,000 classes
After running the code I get class = 21 which represents a kite o:
Now we need to inspect the contents of the model. To do that we can get the model's layers and their names
We see that we have 88 layers, which would be very expensive to train again on another dataset. Hence, the basic trick is to use this model just to evaluate the activations (we will not retrain it) and to create dense layers that we can train on a different number of classes.
For instance, suppose we need a model to differentiate between carrots and cucumbers. We will use the mobilenet model to calculate the activations up to some layer we choose. Then we use dense layers with output size 2 to predict the correct class. Hence the mobilenet model will be in some sense ‘frozen’ and we just train the dense layers.
First, we need to get rid of the dense layers of the model. We choose to extract a random layer let us say number
81 with name
Now let us update our model to have this layer as an output
Finally, we create the trainable model but we need to know the last layer output shape
We see that the shape is [null,7,7,256]. Now we can feed this to our dense layers
As you can see, we created a dense layer with 100 neurons and an output layer with size 2.
We can then follow the previous sections to train this last model using an optimizer of our choice.
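Putting the transfer-learning steps together, a sketch might look like the following. It assumes tf is the TensorFlow.js namespace and mobilenet is the already loaded pretrained model; the function name buildTransferModel is hypothetical, the layer name is passed in as a parameter because it is not shown in the text, and the ReLU and softmax activations are illustrative assumptions.

```javascript
// Sketch: `tf` is assumed to be the TensorFlow.js namespace and `mobilenet`
// the loaded pretrained model. `layerName` is supplied by the caller.
function buildTransferModel(tf, mobilenet, layerName, numClasses) {
  // Cut the pretrained network at the chosen intermediate layer.
  const layer = mobilenet.getLayer(layerName);
  const featureExtractor = tf.model({
    inputs: mobilenet.inputs,
    outputs: layer.output,
  });
  // Small trainable head; drop the batch dimension (null) from the shape,
  // e.g. [null, 7, 7, 256] becomes [7, 7, 256].
  const head = tf.sequential({
    layers: [
      tf.layers.flatten({ inputShape: layer.outputShape.slice(1) }),
      tf.layers.dense({ units: 100, activation: 'relu' }),
      tf.layers.dense({ units: numClasses, activation: 'softmax' }),
    ],
  });
  return { featureExtractor, head };
}
```

During training, only head is compiled and fitted: images go through featureExtractor.predict(...) first, and the resulting activations are the inputs of head.fit(...), so the pretrained weights stay untouched.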