Brief Intro to Autograd in Javascript

Stephen Oni
Aug 1, 2019


“…By far the best strategy for learning a lot that I discovered was to implement everything. I happened to do a lot in Javascript for lols. Whenever I read something I fall into an illusion of understanding it and forcing myself to reimplement it always uncovers interesting insight. It’s my favorite way to learn”, Andrej Karpathy.

I will assume you know backpropagation and how it works; if not, the cs231n note on backpropagation is worth checking. It gives a concise and clear view of what backpropagation is while avoiding much of the heavier math notation used in other resources.

Briefly, backpropagation is the process of collecting gradients, starting from the output and working back to the inputs. The collected gradients can then be used to update every input/parameter that contributed to the output value, in order to get a better output value.

So what brings about autograd? It is easy enough to calculate the backprop for a simple computational graph by hand, but doing so becomes tedious and error-prone for a complex one. To prevent errors during the calculation of backprop, and to make it easier to generate the backprop for denser functions, autograd was introduced.

The idea behind autograd is this: all complex computational graphs are made up of simple ones, these simple ones are made up of basic math functions, and the same steps are repeated over and over during backprop. It is therefore possible to abstract the backprop process, and this is where the concepts of Object-Oriented Programming (OOP) come into the picture.

NOTE: there is a mathematical introduction to automatic differentiation here; the illustration I make below is just a way to think of it in terms of programming concepts.

Using the idea of OOP, we can abstract away the mathematical view of some math functions, and even of tensors themselves, and start thinking of them as objects having certain properties.

To understand the concept deeply, I will not start with tensors but with a single static value, and from there we can build the concept up to a tensor and then to a more complex function like a neural network.

Computational graph example, from the cs231n note on backprop.
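
For reference, here is the chain-rule arithmetic that this graph encodes, which we will reproduce in code:

q = x + y = -2 + 5 = 3
f = q * z = 3 * -4 = -12
df/dz = q = 3
df/dq = z = -4
df/dx = df/dq * dq/dx = -4 * 1 = -4
df/dy = df/dq * dq/dy = -4 * 1 = -4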

Remember, I said we will not be viewing mathematical functions like addition and multiplication, or tensors, as math variables but as objects. A language like Python makes it easy to view functions like addition and multiplication as properties of a tensor. In JavaScript we don't have such privileges, and this forces us to think of different ways to represent these math functions, and even the tensors themselves, as we will see later.

Our main goal is to achieve the code snippet below for the figure above:

var x = new Tensor(-2, true);
var y = new Tensor(5, true);
var z = new Tensor(-4, true);
var q = new add(x, y);
var f = new multi(q, z);
f.grad(1)
f.backward()

Since we are thinking of everything as objects, let's start by creating a Tensor function. This tensor will only hold a single value, so it is really a single-value function, but I still name it Tensor to follow the pattern of PyTorch.

function Tensor(arr, require_grad) {
  this.item = arr;                    // the (single) value held by the tensor
  this.require_grad = require_grad;   // whether gradient should flow into this tensor
  this.gradv = 0;                     // stores the incoming gradient
}
Tensor.prototype = {
  grad: function (g) {
    this.gradv = g;                   // collect the gradient flowing in
  }
}

The above will be familiar to PyTorch users. The item property holds the value of the tensor, and require_grad is a boolean property that determines whether gradient flows in or not. For those already familiar with neural networks, the idea of freezing layers comes from setting require_grad to false. The gradv property stores the gradient flowing into the tensor, and the grad function sets it, i.e. it collects the gradient to be flown in.
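
A quick check of the Tensor on its own:

var t = new Tensor(5, true);
t.grad(2);             // simulate a gradient of 2 flowing in
console.log(t.item);   // 5
console.log(t.gradv);  // 2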

We can now create the object/class for some basic functions. Since the example we are using makes use of addition and multiplication, we will only create classes for those two.

function add(x, y) {
  this.x = x;
  this.y = y;
  this.require_grad = true;
  this.item = x.item + y.item;   // forward pass: the actual addition
  this.gradv = 0;
}
add.prototype = {
  backward: function () {
    if (this.x.require_grad) {
      this.x.grad(1 * this.gradv);   // local gradient of add is 1
      if ("backward" in this.x) {
        this.x.backward()
      }
    }
    if (this.y.require_grad) {
      this.y.grad(1 * this.gradv);
      if ("backward" in this.y) {
        this.y.backward()
      }
    }
  },
  grad: function (g) {
    this.gradv = g;
  }
}

Note that the add object contains a backward function and the Tensor does not. This is because in calculus we do not differentiate a value, we differentiate a function: you can't differentiate just x, but you can differentiate f(x) = x. That is the main reason the tensor has no backward function; gradients only flow into tensors.

The add function takes in two variables, and the local gradient of add with respect to each input is 1 (if you've gone through the cs231n material). So the gradient flowing into the add object, gradv, is multiplied by 1. We calculate the gradient for each of the two inputs by first checking whether its require_grad is set to true; if so, the grad of the input is set, and we then check whether the input has a backward property so we can keep propagating.
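
To see the add object in isolation, a minimal check:

var a = new Tensor(2, true);
var b = new Tensor(3, true);
var s = new add(a, b);
s.grad(1);             // pretend a gradient of 1 flows into the sum
s.backward();
console.log(s.item);   // 5
console.log(a.gradv);  // 1, since the local gradient of add is 1
console.log(b.gradv);  // 1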

The same structure used to create the addition object will be used to create all the other math operator classes. Hence, let's create an object for multiplication.

function multi(x, y) {
  this.item = x.item * y.item;   // forward pass: the actual multiplication
  this.x = x;
  this.y = y;
  this.gradv = 0;
  this.require_grad = true;
}
multi.prototype = {
  backward: function () {
    if (this.x.require_grad) {
      this.x.grad(this.y.item * this.gradv);   // local gradient w.r.t. x is y
      if ("backward" in this.x) {
        this.x.backward()
      }
    }
    if (this.y.require_grad) {
      this.y.grad(this.x.item * this.gradv);   // local gradient w.r.t. y is x
      if ("backward" in this.y) {
        this.y.backward()
      }
    }
  },
  grad: function (g) {
    this.gradv = g;
  }
}

We can see that the code has the same structure for multiplication, but the backward function is different: the local gradient with respect to one input x is the value of the other input y, multiplied by the incoming gradient.
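
Again, a quick isolated check:

var a = new Tensor(2, true);
var b = new Tensor(3, true);
var p = new multi(a, b);
p.grad(1);
p.backward();
console.log(a.gradv);  // 3, the value of the other input b
console.log(b.gradv);  // 2, the value of the other input a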

Having created these functions, it is now possible for us to implement the first code snippet shown above.

var x = new Tensor(-2, true);
var y = new Tensor(5, true);
var z = new Tensor(-4, true);
var q = new add(x, y);
var f = new multi(q, z);
console.log(f.item); // output: -12

Since f is an object, let's see what it is made up of; this might give us a better look into what the object is all about.

{ item: -12,
x:
{ x: { item: -2, require_grad: true, gradv: 0 },
y: { item: 5, require_grad: true, gradv: 0 },
require_grad: true,
item: 3,
gradv: 0 },
y: { item: -4, require_grad: true, gradv: 0 },
gradv: 0,
require_grad: true }

You can see that the object f contains its own item property, which is -12, and also contains its inputs x and y, where x is itself an operator containing inputs x and y. To see the name of each operator in the output, let's add the property below to the operator and tensor objects. With a good visualization tool, we should then be able to plot a graph based on these nodes.

this.name = "name of operator or tensor"

Adding the above property to each of the objects, we can now see what x and y are:

{ item: -12,
x:
{ x: { item: -2, require_grad: true, gradv: 0, name: '<Tensor>' },
y: { item: 5, require_grad: true, gradv: 0, name: '<Tensor>' },
require_grad: true,
item: 3,
gradv: 0,
name: '<Add>' },
y: { item: -4, require_grad: true, gradv: 0, name: '<Tensor>' },
gradv: 0,
require_grad: true,
name: '<Multi>' }

You can see that the gradients are still set to 0; this is because we have not yet backpropped the function.

Now let's see what happens when we backprop the object f:

f.grad(1)
f.backward()
console.log(f)

The gradient of the f object is first set to one, because the derivative of a function with respect to itself is 1. (When building a neural network later, we won't have to set the output gradient to 1 by hand.)

The output of the previous code block gives us:

{ item: -12,
x:
{ x:
{ item: -2, require_grad: true, gradv: -4, name: '<Tensor>' },
y: { item: 5, require_grad: true, gradv: -4, name: '<Tensor>' },
require_grad: true,
item: 3,
gradv: -4,
name: '<Add>' },
y: { item: -4, require_grad: true, gradv: 3, name: '<Tensor>' },
gradv: 1,
require_grad: true,
name: '<Multi>' }

Now check the image example we are trying to implement again, and you will see the same gradient values. The <Add> object has its own gradient, and its inputs x and y have their own gradients.

It is now possible for us to access the gradient of each individual operator and input:

console.log(x.gradv) // -4

You can check gradv for each of the variables that were created. To better understand the idea of require_grad, let's set the input x's require_grad to false:

var x = new Tensor(-2, false);
var y = new Tensor(5, true);
var z = new Tensor(-4, true);
var q = new add(x, y);
var f = new multi(q, z);
f.grad(1)
f.backward()
console.log(x.gradv) //output is 0
console.log(f)
{ item: -12,
x:
{ x:
{ item: -2, require_grad: false, gradv: 0, name: '<Tensor>' },
y: { item: 5, require_grad: true, gradv: -4, name: '<Tensor>' },
require_grad: true,
item: 3,
gradv: -4,
name: '<Add>' },
y: { item: -4, require_grad: true, gradv: 3, name: '<Tensor>' },
gradv: 1,
require_grad: true,
name: '<Multi>' }

The output gradient of add did not flow into x.

I think by now you've gotten the full gist of autograd by first solving a simpler problem. Now we can move on to tensors themselves, instead of the single-value function we implemented.

The Concept

  1. All tensors in autograd have a grad_fn method (function) that collects the gradient flow. They also have a grad property that stores the incoming gradient, and require_grad determines whether gradient is allowed to flow into the tensor.
  2. All mathematical operators contain a forward pass, which is the actual computation the operator represents; e.g. if the operator is addition, the forward pass adds two numbers together.
  3. The mathematical operators also contain a backward pass, in which the backprop is calculated and gradients are assigned to the input tensors. For example, the gradient of an add object is 1 * the incoming gradient flow.
  4. The mathematical operators also have a grad_fn method, which collects the gradient inflow and assigns it to the grad property of the operator (see the skeleton sketched after this list).
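
Putting these four points together, every operator in this scheme follows the same skeleton. Here is a generic sketch; the comments mark where an operator-specific forward computation and local gradients would go:

function Operator(x, y) {
  this.x = x;
  this.y = y;
  this.require_grad = true;
  this.item = 0;   // forward pass: combine x.item and y.item here
  this.gradv = 0;  // stores the incoming gradient
}
Operator.prototype = {
  backward: function () {
    if (this.x.require_grad) {
      // local gradient w.r.t. x, scaled by the incoming gradient
      this.x.grad(1 * this.gradv);
      if ("backward" in this.x) this.x.backward();
    }
    if (this.y.require_grad) {
      // local gradient w.r.t. y, scaled by the incoming gradient
      this.y.grad(1 * this.gradv);
      if ("backward" in this.y) this.y.backward();
    }
  },
  grad: function (g) { this.gradv = g; }   // grad_fn: collect the inflow
}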

Tensors

It was easy to experiment with a single-value function, but things get more difficult when we start to think about tensors. Python and libraries like NumPy make it easy to experiment with tensors, but JavaScript does not, and as I said earlier, we will be forced to think of matrices in another form to aid the implementation in JavaScript.

The first mistake I made while trying to create tensors in JavaScript was thinking of a tensor as a nested list, the way it is represented in Python:

array([
[2, 3, 4],
[5, 6, 7],
[8, 9,10]
])

This made it much more difficult to implement the dot product in JS; it was possible, but clumsy. After studying the work of Andrej Karpathy, I found that I could think of a matrix in another form.

And the best form to think of it in is a flattened matrix.

From the above illustration, you can see how a nested list (matrix) flattens out. The main question is how we access the matrix rows and columns in the order in which they appear in the original matrix.

Let's create the Tensor object and its properties:

var Tensor = function (n, d, require_grad) {
  this.n = n;                 // number of rows
  this.d = d;                 // number of columns
  this.out = zeros(n * d);    // the values, stored flattened
  this.dout = zeros(n * d);   // the gradients, same shape as out
  // default to true when not specified
  this.require_grad = (require_grad === undefined) ? true : require_grad;
}
Tensor.prototype = {
  get: function (row, col) {
    var ix = (this.d * row) + col;
    assert(ix >= 0 && ix < this.out.length);
    return this.out[ix];
  },
  set: function (row, col, v) {
    var ix = (this.d * row) + col;
    assert(ix >= 0 && ix < this.out.length);
    this.out[ix] = v;
  },
  setFrom: function (arr) {
    for (var i = 0, n = arr.length; i < n; i++) {
      this.out[i] = arr[i];
    }
  },
  randn: function (mu, std) {
    fillRandn(this.out, mu, std);
    return this;
  },
  grad: function (grad) {
    this.dout = grad;
  }
}

In the tensor object, n stands for the number of rows and d stands for the number of columns (it would also represent depth in the 3-dimensional case). We initialize the outgoing matrix out and its gradient dout to zeros, using the zeros utility function.

NOTE: the best way to go through this is just to follow the concept; the code blocks for the utility functions will not be pasted, but you can get them here.
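
Still, the utilities are small. Here is a minimal sketch of what zeros, fillRandn, assert, and Mat might look like, matching how they are used below (the actual versions live in the linked repo and may differ):

function assert(condition, message) {
  if (!condition) throw new Error(message || "Assertion failed");
}
function zeros(n) {
  // a plain array of n zeros
  var arr = new Array(n);
  for (var i = 0; i < n; i++) arr[i] = 0;
  return arr;
}
function gaussRandom() {
  // Box-Muller transform for a standard normal sample
  var u = 1 - Math.random();
  var v = Math.random();
  return Math.sqrt(-2 * Math.log(u)) * Math.cos(2 * Math.PI * v);
}
function fillRandn(arr, mu, std) {
  for (var i = 0; i < arr.length; i++) arr[i] = mu + gaussRandom() * std;
}
function Mat(n, d) {
  // a bare storage matrix: values and gradients, both flattened
  this.n = n;
  this.d = d;
  this.out = zeros(n * d);
  this.dout = zeros(n * d);
}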

Our main focus now should be on the get and set properties, which show how tensor values are accessed. To get a better view of it, let's work with a basic example:

// given a 3 x 3 matrix
[[2, 3, 4],
[5, 6, 7],
[8, 9, 10]]

Our aim is to reduce the above matrix to:

[2,3,4,5,6,7,8,9,10]

To access an element of this flat array, we multiply the number of columns by the row index and then add the column index:

(n_cols * row) + col

This indexing matters when doing dot products. Now let's see how it works:

// for a 3 x 3 matrix we have n_cols = 3, n_rows = 3
// hence the length of the flat array is 9
// say we want to access all the values of row 0:
3 * 0 + 0 = 0
3 * 0 + 1 = 1
3 * 0 + 2 = 2 // these are the indices of the values in row 0
// values for row 1
3 * 1 + 0 = 3
3 * 1 + 1 = 4
3 * 1 + 2 = 5
// values for row 2
3 * 2 + 0 = 6
3 * 2 + 1 = 7
3 * 2 + 2 = 8

For each row, we loop through the columns and add the column index to the product of the number of columns and the row index.
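
We can verify this against the get property of the Tensor object:

var t = new Tensor(3, 3);
t.setFrom([2, 3, 4, 5, 6, 7, 8, 9, 10]);
console.log(t.get(1, 2)); // 7, i.e. index (3 * 1) + 2 = 5
console.log(t.get(2, 0)); // 8, i.e. index (3 * 2) + 0 = 6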

Now that the tensor object has been created, we can create the operator objects for add and multi, as we did for the single-value Tensor before.

function add(x, y) {
  assert(x.out.length === y.out.length);
  this.items = new Mat(x.n, x.d);   // storage for the result, same shape as the inputs
  for (var i = 0; i < x.out.length; i++) {
    this.items.out[i] = x.out[i] + y.out[i];
  }
  this.x = x;
  this.y = y;
  this.require_grad = true;
  this.out = this.items.out;
  this.dout = this.items.dout;
  this.n = this.items.n;
  this.d = this.items.d;
}
add.prototype = {
  backward: function () {
    if (this.x.require_grad) {
      this.x.grad(this.dout);   // local gradient of add is 1, so pass the flow through
      if ("backward" in this.x) {
        this.x.backward()
      }
    }
    if (this.y.require_grad) {
      this.y.grad(this.dout);
      if ("backward" in this.y) {
        this.y.backward()
      }
    }
  },
  grad: function (g) {
    assert(this.items.dout.length === g.length);
    this.dout = g;
  }
}

In these new tensor and operator objects, I stopped using gradv to store the gradient; I use dout instead, just to make it clear that the gradient is the gradient of out, the output of the function. The new Mat object just creates a storage matrix; check here to see the utility functions.

And let's create the dot product object, called Multid:

function Multid(x, y) {
  assert(x.d === y.n, "matmul dimensions misaligned");
  this.n = x.n;
  this.d = y.d;
  this.x = x;
  this.y = y;
  this.require_grad = true;
  this.items = new Mat(this.n, this.d);
  this.out = this.items.out;
  this.dout = this.items.dout;
  // forward pass: standard matrix multiplication on the flattened arrays
  for (var i = 0; i < x.n; i++) {
    for (var j = 0; j < y.d; j++) {
      var dot = 0.0;
      for (var k = 0; k < x.d; k++) {
        dot += this.x.out[x.d * i + k] * this.y.out[y.d * k + j];
      }
      this.out[this.d * i + j] = dot;
    }
  }
}
Multid.prototype = {
  backward: function () {
    if (this.x.require_grad) {
      // dx = dout . transpose(y)
      for (var i = 0; i < this.x.n; i++) {
        for (var j = 0; j < this.y.d; j++) {
          for (var k = 0; k < this.x.d; k++) {
            var b = this.dout[this.y.d * i + j];
            this.x.dout[this.x.d * i + k] += this.y.out[this.y.d * k + j] * b;
          }
        }
      }
      if ("backward" in this.x) {
        this.x.backward()
      }
    }
    if (this.y.require_grad) {
      // dy = transpose(x) . dout
      for (var i = 0; i < this.x.n; i++) {
        for (var j = 0; j < this.y.d; j++) {
          for (var k = 0; k < this.x.d; k++) {
            var b = this.dout[this.y.d * i + j];
            this.y.dout[this.y.d * k + j] += this.x.out[this.x.d * i + k] * b;
          }
        }
      }
      if ("backward" in this.y) {
        this.y.backward()
      }
    }
  },
  grad: function (g) {
    assert(this.items.dout.length === g.length);
    this.dout = g;
  }
}

Now we can create a simple dot product and backprop through it:

var a = new Tensor(2, 3);
a.setFrom([2, 3, 4, 5, 6, 7]);
var b = new Tensor(3, 4);
b.setFrom([4, 5, 6, 7, 8, 9, 1, 2, 4, 7, 1, 4]);
var mult = new Multid(a, b);
console.log(mult.out);
// output: [48, 65, 19, 36, 96, 128, 43, 75]

The output [48, 65, 19, 36, 96, 128, 43, 75] is originally represented as:

[[48, 65, 19, 36], 
[96, 128, 43, 75]
]

We can then run the backward pass:

var gradm = new Tensor(2, 4);
gradm.setFrom([1, 1, 1, 1, 1, 1, 1, 1]);
mult.grad(gradm.out);   // pass the flattened gradient array
mult.backward();

You can now check the gradient of a, which is the dot product of mult.dout and the transpose of b.out. This gives us:

[22, 20, 16, 22, 20, 16]

which is originally represented as:

[[22, 20, 16],
[22, 20, 16]]

Note that the out and dout are of the same shape.
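
Similarly, the gradient of b is the dot product of the transpose of a.out and mult.dout. Since the incoming gradient is all ones, each row of b.dout is just a column sum of a:

console.log(b.dout);
// [7, 7, 7, 7, 9, 9, 9, 9, 11, 11, 11, 11]
// which reshapes to
// [[ 7,  7,  7,  7],
//  [ 9,  9,  9,  9],
//  [11, 11, 11, 11]]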

I think this simple approach and example should get us started with the concept of Autograd.

But we can go further and create a simple neural network by creating some basic layer objects like Linear, ReLU, and Softmax, and we can go further still to create the loss and optimization functions. The code block below shows a simple snippet of that:

var model = new Sequential([
  new Linear(2, 3),
  new ReLU(),
  new Linear(3, 2),
  new Softmax()
]);
var x = new Tensor(1, 2, true);
x.setFrom([2, 3]);
model.forward(x);
var loss = new Loss(1, model.out);
loss.backward();
optim.step();

The code snippet should be familiar to PyTorch users.
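
To give a feel for how such layers can be built from the pieces we already have, here is a hypothetical sketch of a Linear layer on top of the Tensor, Multid, and add objects above (the real version in the linked repo differs in detail):

function Linear(n_in, n_out) {
  // weights drawn from a normal distribution, bias initialized at zero
  this.W = new Tensor(n_in, n_out, true).randn(0, 0.08);
  this.b = new Tensor(1, n_out, true);
}
Linear.prototype = {
  forward: function (x) {
    // x is 1 x n_in; compute x . W + b and keep the op graph for backward
    this.op = new add(new Multid(x, this.W), this.b);
    this.out = this.op.out;
    return this.op;
  },
  backward: function () {
    this.op.backward();   // gradients flow back into W, b, and x
  }
}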

I could go on to paste the code for building the blocks above, but doing so would make this post too long, and the JavaScript formatting here is not fancy.

Building a full deep learning framework with autograd follows the same steps.

So, to get the full gist of how this PyTorch-like framework is built in JavaScript, you can check the reactive document here to read and code alongside, or you can check it out on GitHub.
