Self-Driving Cars Course #8 — MiniFlow: Our Own TensorFlow (Part 1)

Chandan Verma
Autonomous Machines
13 min read · Mar 25, 2019

Welcome to part 8 of the self-driving car course. This blog series introduces the world of self-driving cars: how they work, their pros and cons, and the companies building them.

What we learned

We have covered most of the prerequisite sections already. If you haven’t read them yet, you can visit part 1, part 2, part 3 and part 4.

What’s coming up

This section will give you information about:

  • Miniflow

In this lab, you’ll build a library called MiniFlow, which will be your own version of TensorFlow!

TensorFlow is one of the most popular open source neural network libraries, built by the team at Google Brain over just the last few years.

Following this lab, you’ll spend the remainder of this module actually working with open-source deep learning libraries like TensorFlow and Keras. So why bother building MiniFlow? Good question, glad you asked!

The goal of this lab is to demystify two concepts at the heart of neural networks — backpropagation and differentiable graphs.

Backpropagation is the process by which neural networks update the weights of the network over time.

Differentiable graphs are graphs where the nodes are differentiable functions. They are also useful as visual aids for understanding and calculating complicated derivatives. This is the fundamental abstraction of TensorFlow — it’s a framework for creating differentiable graphs.

With graphs and backpropagation, you will be able to create your own nodes and properly compute the derivatives. Even more importantly, you will be able to think and reason in terms of these graphs.

Now, let’s take the first peek under the hood…

MiniFlow

Consider the simplest possible graph: two input nodes feeding into an output node. Notice that the output layer performs a mathematical function, addition, on its inputs. There is no hidden layer.

Graphs

The nodes and edges create a graph structure. Though the example above is fairly simple, it isn’t hard to imagine that increasingly complex graphs can calculate . . . well . . . almost anything.

There are generally two steps to create neural networks:

  1. Define the graph of nodes and edges.
  2. Propagate values through the graph.

MiniFlow works the same way: you’ll define the nodes and edges of your network with one method and then propagate values through the graph with another.

Let’s consider how to implement this graph structure in Miniflow. We’ll use a Python class to represent a generic node.

We know that each node might receive input from multiple other nodes. We also know that each node creates a single output, which will likely be passed to other nodes. Let’s add two lists: one to store references to the inbound nodes, and the other to store references to the outbound nodes.

class Node(object):
    def __init__(self, inbound_nodes=[]):
        # Node(s) from which this Node receives values
        self.inbound_nodes = inbound_nodes
        # Node(s) to which this Node passes values
        self.outbound_nodes = []
        # For each inbound_node, add the current Node as an outbound_node.
        for n in self.inbound_nodes:
            n.outbound_nodes.append(self)

Each node will eventually calculate a value that represents its output. Let’s initialize the value to None to indicate that it exists but hasn’t been set yet.

class Node(object):
    def __init__(self, inbound_nodes=[]):
        # Node(s) from which this Node receives values
        self.inbound_nodes = inbound_nodes
        # Node(s) to which this Node passes values
        self.outbound_nodes = []
        # For each inbound_node, add the current Node as an outbound_node.
        for n in self.inbound_nodes:
            n.outbound_nodes.append(self)
        # A calculated value
        self.value = None

Each node will need to be able to pass values forward and perform backpropagation (more on that later). For now, let’s add a placeholder method for forward propagation. We’ll deal with backpropagation later on.

class Node(object):
    def __init__(self, inbound_nodes=[]):
        # Node(s) from which this Node receives values
        self.inbound_nodes = inbound_nodes
        # Node(s) to which this Node passes values
        self.outbound_nodes = []
        # For each inbound_node, add the current Node as an outbound_node.
        for n in self.inbound_nodes:
            n.outbound_nodes.append(self)
        # A calculated value
        self.value = None

    def forward(self):
        """
        Forward propagation.

        Compute the output value based on `inbound_nodes` and
        store the result in self.value.
        """
        raise NotImplementedError

Nodes that Calculate

While Node defines the base set of properties that every node holds, only specialized subclasses of Node will end up in the graph. As part of this lab, you’ll build the subclasses of Node that can perform calculations and hold values. For example, consider the Input subclass of Node.

class Input(Node):
    def __init__(self):
        # An Input node has no inbound nodes,
        # so no need to pass anything to the Node instantiator.
        Node.__init__(self)

    # NOTE: Input node is the only node where the value
    # may be passed as an argument to forward().
    #
    # All other node implementations should get the value
    # of the previous node from self.inbound_nodes
    #
    # Example:
    # val0 = self.inbound_nodes[0].value
    def forward(self, value=None):
        # Overwrite the value if one is passed in.
        if value is not None:
            self.value = value

Unlike the other subclasses of Node, the Input subclass does not actually calculate anything. The Input subclass just holds a value, such as a data feature or a model parameter (weight/bias).

You can set value either explicitly or with the forward() method. This value is then fed through the rest of the neural network.
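
For example, here is a quick sketch of both options, assuming the Input class defined above:

x = Input()

# Set the value explicitly...
x.value = 10

# ...or pass it to forward(), which stores it in x.value.
x.forward(value=10)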

The Add Subclass

Add, which is another subclass of Node, actually can perform a calculation (addition).

class Add(Node):
    def __init__(self, x, y):
        Node.__init__(self, [x, y])

    def forward(self):
        x_value = self.inbound_nodes[0].value
        y_value = self.inbound_nodes[1].value
        self.value = x_value + y_value

Notice the difference in the __init__ method: Add.__init__(self, [x, y]). Unlike the Input class, which has no inbound nodes, the Add class takes two inbound nodes, x and y, and adds the values of those nodes.
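
As an aside, because Node.__init__ accepts a list of inbound nodes, Add could just as easily be generalized to any number of inputs. A minimal sketch (not part of the lab code above):

class Add(Node):
    def __init__(self, *inputs):
        # Accept any number of inbound nodes, e.g. Add(x, y, z).
        Node.__init__(self, list(inputs))

    def forward(self):
        # Sum the values of all inbound nodes.
        self.value = sum(n.value for n in self.inbound_nodes)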

MiniFlow has two methods to help you define and then run values through your graphs: topological_sort() and forward_pass().

In order to define your network, you’ll need to define the order of operations for your nodes. Given that the input to some node depends on the outputs of others, you need to flatten the graph in such a way where all the input dependencies for each node are resolved before trying to run its calculation. This is a technique called a topological sort.

The topological_sort() function implements topological sorting using Kahn’s Algorithm. The details of this method are not important; what matters is the result: topological_sort() returns a sorted list of nodes in which all of the calculations can run in series. topological_sort() takes in a feed_dict, which is how we initially set a value for an Input node.

def topological_sort(feed_dict):
    """
    Sort generic nodes in topological order using Kahn's Algorithm.

    `feed_dict`: A dictionary where the key is an `Input` node and the value
    is the respective value fed to that node.

    Returns a list of sorted nodes.
    """
    input_nodes = [n for n in feed_dict.keys()]

    G = {}
    nodes = [n for n in input_nodes]
    while len(nodes) > 0:
        n = nodes.pop(0)
        if n not in G:
            G[n] = {'in': set(), 'out': set()}
        for m in n.outbound_nodes:
            if m not in G:
                G[m] = {'in': set(), 'out': set()}
            G[n]['out'].add(m)
            G[m]['in'].add(n)
            nodes.append(m)

    L = []
    S = set(input_nodes)
    while len(S) > 0:
        n = S.pop()

        if isinstance(n, Input):
            n.value = feed_dict[n]

        L.append(n)
        for m in n.outbound_nodes:
            G[n]['out'].remove(m)
            G[m]['in'].remove(n)
            # if no other incoming edges add to S
            if len(G[m]['in']) == 0:
                S.add(m)
    return L

The other method at your disposal is forward_pass(), which actually runs the network and outputs a value.

def forward_pass(output_node, sorted_nodes):
    """
    Performs a forward pass through a list of sorted nodes.

    Arguments:
        `output_node`: The output node of the graph (no outgoing edges).
        `sorted_nodes`: A topologically sorted list of nodes.

    Returns the output node's value.
    """
    for n in sorted_nodes:
        n.forward()

    return output_node.value

Our miniflow.py is now ready, as shown below.

class Node(object):
    def __init__(self, inbound_nodes=[]):
        # Nodes from which this Node receives values
        self.inbound_nodes = inbound_nodes
        # Nodes to which this Node passes values
        self.outbound_nodes = []
        # A calculated value
        self.value = None
        # Add this node as an outbound node on its inputs.
        for n in self.inbound_nodes:
            n.outbound_nodes.append(self)

    # These will be implemented in a subclass.
    def forward(self):
        """
        Forward propagation.

        Compute the output value based on `inbound_nodes` and
        store the result in self.value.
        """
        raise NotImplementedError


class Input(Node):
    def __init__(self):
        # An Input node has no inbound nodes,
        # so no need to pass anything to the Node instantiator.
        Node.__init__(self)

    # NOTE: Input node is the only node that may
    # receive its value as an argument to forward().
    #
    # All other node implementations should calculate their
    # values from the value of previous nodes, using
    # self.inbound_nodes
    #
    # Example:
    # val0 = self.inbound_nodes[0].value
    def forward(self, value=None):
        if value is not None:
            self.value = value


class Add(Node):
    def __init__(self, x, y):
        # You could access `x` and `y` in forward with
        # self.inbound_nodes[0] (`x`) and self.inbound_nodes[1] (`y`)
        Node.__init__(self, [x, y])

    def forward(self):
        """
        Set the value of this node (`self.value`) to the sum of its inbound_nodes.
        """
        x_value = self.inbound_nodes[0].value
        y_value = self.inbound_nodes[1].value
        self.value = x_value + y_value


def topological_sort(feed_dict):
    """
    Sort generic nodes in topological order using Kahn's Algorithm.

    `feed_dict`: A dictionary where the key is an `Input` node and the value
    is the respective value fed to that node.

    Returns a list of sorted nodes.
    """
    input_nodes = [n for n in feed_dict.keys()]

    G = {}
    nodes = [n for n in input_nodes]
    while len(nodes) > 0:
        n = nodes.pop(0)
        if n not in G:
            G[n] = {'in': set(), 'out': set()}
        for m in n.outbound_nodes:
            if m not in G:
                G[m] = {'in': set(), 'out': set()}
            G[n]['out'].add(m)
            G[m]['in'].add(n)
            nodes.append(m)

    L = []
    S = set(input_nodes)
    while len(S) > 0:
        n = S.pop()

        if isinstance(n, Input):
            n.value = feed_dict[n]

        L.append(n)
        for m in n.outbound_nodes:
            G[n]['out'].remove(m)
            G[m]['in'].remove(n)
            # if no other incoming edges add to S
            if len(G[m]['in']) == 0:
                S.add(m)
    return L


def forward_pass(output_node, sorted_nodes):
    """
    Performs a forward pass through a list of sorted nodes.

    Arguments:
        `output_node`: A node in the graph, should be the output node
                       (have no outgoing edges).
        `sorted_nodes`: A topologically sorted list of nodes.

    Returns the output node's value.
    """
    for n in sorted_nodes:
        n.forward()

    return output_node.value

We will now implement the simple addition graph in Python by importing our newly built miniflow library.

from miniflow import *

x, y = Input(), Input()

f = Add(x, y)

feed_dict = {x: 10, y: 5}

sorted_nodes = topological_sort(feed_dict)
output = forward_pass(f, sorted_nodes)

# NOTE: because topological_sort sets the values for the `Input` nodes we could
# also access the value for x with x.value (same goes for y).
print("{} + {} = {} (according to miniflow)".format(feed_dict[x], feed_dict[y], output))

Learning and Loss

Like MiniFlow in its current state, neural networks take inputs and produce outputs. But unlike MiniFlow in its current state, neural networks can improve the accuracy of their outputs over time (it’s hard to imagine improving the accuracy of Add over time!). To explore why accuracy matters, I want you to first implement a trickier (and more useful!) node than Add: the Linear node.

The Linear Function

A simple artificial neuron depends on three components:

  • inputs, x (vector)
  • weights, w (vector)
  • bias, b (scalar)

The output, o, is just the weighted sum of the inputs plus the bias: o = x₁w₁ + x₂w₂ + … + xₙwₙ + b. We'll refer to this as Equation (1).

Remember, by varying the weights, you can vary the amount of influence any given input has on the output. The learning aspect of neural networks takes place during a process known as backpropagation. In backpropagation, the network modifies the weights to improve the network’s output accuracy. You’ll be applying all of this shortly.

In this next section, you’ll try to build a linear neuron that generates an output by applying a simplified version of Equation (1). Linear should take a list of inbound nodes of length n, a list of weights of length n, and a bias.

class Linear(Node):
    def __init__(self, inputs, weights, bias):
        Node.__init__(self, [inputs, weights, bias])

        # NOTE: The weights and bias properties here are not
        # numbers, but rather references to other nodes.
        # The weight and bias values are stored within the
        # respective nodes.

    def forward(self):
        """
        Set self.value to the value of the linear function output.
        """
        inputs = self.inbound_nodes[0].value
        weights = self.inbound_nodes[1].value
        bias = self.inbound_nodes[2].value

        # Start with the bias, then add each weighted input.
        output = bias
        for x, w in zip(inputs, weights):
            output += x * w

        self.value = output

In the code above, I set self.value to the bias and then loop through the inputs and weights, adding each weighted input to self.value. Notice that calling .value on self.inbound_nodes[0] or self.inbound_nodes[1] gives us a list.
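
Here is a minimal usage sketch, assuming the list-based Linear node above has been added to miniflow.py alongside Input, topological_sort() and forward_pass():

from miniflow import *

inputs, weights, bias = Input(), Input(), Input()

f = Linear(inputs, weights, bias)

feed_dict = {
    inputs: [6, 14, 3],
    weights: [0.5, 0.25, 1.4],
    bias: 2
}

graph = topological_sort(feed_dict)
output = forward_pass(f, graph)

# Expected: 6*0.5 + 14*0.25 + 3*1.4 + 2 = 12.7
print(output)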

Transform

Linear algebra nicely reflects the idea of transforming values between layers in a graph. In fact, the concept of a transform does exactly what a layer should do — it converts inputs to outputs in many dimensions.

Let’s go back to our equation for the output.

For the rest of this section, we’ll denote x as X and w as W since they are now matrices, and b is now a vector instead of a scalar.

Consider a Linear node with 1 input and k outputs (mapping 1 input to k outputs). In this context, an input/output is synonymous with a feature.

In this case, X is a 1 by 1 matrix with a single element; the subscripts of the element denote its row (1) and column (1) in the matrix.

W becomes a 1 by k matrix (looks like a row).

The result of the matrix multiplication of X and W is a 1 by k matrix. Since b is also a 1 by k row matrix (1 bias per output), b is added to the output of the X and W matrix multiplication.

What if we are mapping n inputs to k outputs?

Then X is now a 1 by n matrix and W is a n by k matrix. The result of the matrix multiplication is still a 1 by k matrix so the use of the biases remains the same.

Let’s take a look at an example with n input features. Consider a 28px by 28px greyscale image, as is the case with images in the MNIST dataset. We can reshape (flatten) the image so that it’s a 1 by 784 matrix, where n = 784. Each pixel is an input/feature. Here’s an animated example emphasizing that a pixel is a feature.

Pixels are Features!
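
A quick numpy sketch of this reshaping step (the toy array here just stands in for an MNIST image):

import numpy as np

# A toy 28x28 greyscale "image".
image = np.random.rand(28, 28)

# Flatten it into a 1 by 784 row matrix: each pixel becomes one feature.
flat = image.reshape(1, 784)

print(flat.shape)  # (1, 784)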

In practice, it’s common to feed in multiple data examples in each forward pass rather than just 1. The reasoning for this is the examples can be processed in parallel, resulting in big performance gains. The number of examples is called the batch size. Common numbers for the batch size are 32, 64, 128, 256, 512. Generally, it’s the most we can comfortably fit in memory.

What does this mean for X, W, and b?

X becomes an m by n matrix (where m is the batch size and n is the number of input features) and W and b remain the same. The result of the matrix multiplication is now m by k (batch size by the number of outputs), so the addition of b is broadcast over each row.

In the context of MNIST, each row of X is an image reshaped/flattened from 28 by 28 to 1 by 784.

Equation (1) then turns into Equation (2): Z = XW + B, where B is the bias vector b stacked m times (once per row of X). Due to broadcasting, this is abbreviated to Z = XW + b.
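
A small numpy sketch of these shapes and of the bias broadcasting (the values of m, n and k here are arbitrary):

import numpy as np

m, n, k = 4, 3, 2            # batch size, input features, outputs

X = np.random.rand(m, n)     # (4, 3)
W = np.random.rand(n, k)     # (3, 2)
b = np.random.rand(k)        # (2,) -- one bias per output

# b is broadcast over each of the m rows of XW.
Z = np.dot(X, W) + b

print(Z.shape)               # (4, 2)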

We will now rebuild Linear to handle matrices and vectors, using the venerable Python math package numpy to make life easier. numpy is often abbreviated as np, so we’ll refer to it as np in code.

I used np.array (documentation) to create matrices and vectors. You’ll want to use np.dot, which functions as matrix multiplication for 2D arrays (documentation), to multiply the input and weights matrices from Equation (2). It’s also worth noting that numpy actually overloads the __add__ operator, so you can use it directly with np.array (e.g. np.array() + np.array()).

import numpy as np


class Linear(Node):
    def __init__(self, X, W, b):
        # Notice the ordering of the input nodes passed to the
        # Node constructor.
        Node.__init__(self, [X, W, b])

    def forward(self):
        """
        Set the value of this node to the linear transform output.
        """
        X = self.inbound_nodes[0].value
        W = self.inbound_nodes[1].value
        b = self.inbound_nodes[2].value

        # Z = XW + b (the bias is broadcast over each row of XW).
        self.value = np.dot(X, W) + b

We will now import our miniflow module in Python and test the matrix-based Linear node.

import numpy as np
from miniflow import *

X, W, b = Input(), Input(), Input()

f = Linear(X, W, b)

X_ = np.array([[-1., -2.], [-1, -2]])
W_ = np.array([[2., -3], [2., -3]])
b_ = np.array([-3., -5])

feed_dict = {X: X_, W: W_, b: b_}

graph = topological_sort(feed_dict)
output = forward_pass(f, graph)

"""
Output should be:
[[-9., 4.],
[-9., 4.]]
"""
print(output)

Conclusion

We are halfway through implementing MiniFlow, our own version of TensorFlow. Here we implemented the concepts of graphs, forward propagation, learning and loss, and the linear transform.

What’s Next

Next up, we will continue the coding section and implement the remaining topics: the sigmoid function, the cost function, gradient descent, backpropagation, and stochastic gradient descent, all in Python.

Till then, let’s keep programming self-driving cars with this self-driving cars course.

References:

https://in.udacity.com/course/self-driving-car-engineer-nanodegree–nd013
