# Override TensorFlow Backward-Propagation

TensorFlow is a great tool for deep learning. It ships with a large set of operations, so you can easily assemble a model that solves your problem. But sometimes a model needs its own specific computations.

I ran into one of those cases. When you design a neural network architecture, you may find that some operations do not flow through backward-propagation. On paper you can compute the gradient and derive the formulas involved, but TensorFlow cannot resolve the gradient for the operation, and as a consequence you cannot train the network.

As I said, TensorFlow is a great tool, and it offers the ability to override a gradient and write your own custom one that flows through backward-propagation. But how do you do it and make it actually work?

First, I will explain the process on a simple network, just to make it easy to follow. Then I will show a real case where I had to override the gradient.

Simple network:

```python
import numpy as np
import tensorflow as tf

stddev = 1e-1
labels = [0, 1]
b = [1, 2]
labels_ph = tf.placeholder(shape=(2,), dtype=tf.int32)
b_ph = tf.placeholder(shape=(2,), dtype=tf.float32)
c = tf.Variable(tf.truncated_normal(shape=[2], stddev=stddev))
k = tf.Variable(tf.truncated_normal(shape=[2], stddev=stddev))
d = b_ph  # extra input to the custom operation (fed from b_ph for illustration)
e = custom_op(c, k, d)  # the operation whose gradient we will override (defined below)
a = tf.multiply(d, e)
loss = tf.losses.softmax_cross_entropy(logits=a, onehot_labels=labels_ph)
opt = tf.train.AdamOptimizer(learning_rate=0.001, beta1=0.9, beta2=0.999,
                             epsilon=0.1)
```

Imagine that this part of the graph cannot be handled by backpropagation: TensorFlow returns None for such gradients.
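To make that concrete, here is a minimal sketch of what happens when you wrap a plain NumPy computation in tf.py_func without registering a gradient (written against the TF 1.x API via tf.compat.v1 so it also runs under TensorFlow 2; the squaring function is just a stand-in):

```python
import numpy as np
import tensorflow.compat.v1 as tf

tf.disable_v2_behavior()  # use the TF 1.x graph-mode API

x = tf.Variable([1.0, 2.0])
# TensorFlow cannot see inside the wrapped NumPy function
y = tf.py_func(lambda v: np.square(v), [x], tf.float32)

# no gradient is registered for this op, so backpropagation stops here
print(tf.gradients(y, x))  # -> [None]
```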

It means that it’s not so obvious for Tensorflow how to compute the gradient in backpropagation. But you know that this is a differentiated model and on a piece of paper you can derive the formula. So what you need — just tell Tensorflow how to do it.

1. tf.RegisterGradient allows you to override the gradient: for the operation you create a method that sets up the custom gradient, and you wrap the forward pass like this (graph is the tf.Graph you are building, e.g. tf.get_default_graph(); the wrapper name custom_op is just illustrative):

```python
def custom_op(c, k, d, name=None):
    # forward_func and backprop_func are defined in step 2;
    # grad points at the method that computes the custom gradient
    return py_func(forward_func,
                   [c, k, d],
                   [np.float32],
                   graph,
                   name=name,
                   grad=backprop_func)
```
The py_func helper itself looks like this:

```python
def py_func(func, inp, Tout, graph, stateful=True, name=None, grad=None):
    # Need to generate a unique name to avoid duplicates
    # if you have more than one custom gradient:
    rnd_name = 'PyFuncGrad' + str(np.random.randint(0, 10**8))
    # register the custom gradient under that unique name ...
    tf.RegisterGradient(rnd_name)(grad)
    # ... and map the PyFunc op to it inside this graph
    with graph.gradient_override_map({"PyFunc": rnd_name}):
        return tf.py_func(func, inp, Tout, stateful=stateful, name=name)
```

As you can see, TensorFlow has its own wrapper for Python functions, tf.py_func. But that also means you have to implement the forward propagation without TensorFlow operations: if the function you pass to tf.py_func uses TensorFlow operations, it is going to fail.

I couldn't find a way to implement the forward propagation using TensorFlow operations, but I believe there should be one, since it would give better performance. Please share it with me if you know how.

2. Forward and Backprop methods:

a) What should be in forward pass (forward_func):

In forward_func you have to implement the forward propagation using plain Python/NumPy operations:

```python
def forward_func(c, k, d):
    # whatever your operation actually computes; as a toy stand-in,
    # assume e = c * k (the real formula is model-specific)
    e = c * k
    return e.astype(np.float32)
```

As you can see, the d variable is redundant in this method, but you cannot avoid passing it, because it will be needed in the backward pass. The forward and backward passes come as a bundle: they share their inputs. Notice that in py_func you declare all the variables that are needed in both methods.
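As a quick sanity check outside the graph, the forward pass can be exercised directly with NumPy arrays (repeating the assumed e = c * k stand-in so the snippet is self-contained); note that d is accepted but unused, and the result is cast to float32 as the py_func wrapper expects:

```python
import numpy as np

def forward_func(c, k, d):
    # toy stand-in for the custom operation: e = c * k; d is unused
    # here but will be needed by the backward pass
    e = c * k
    return e.astype(np.float32)

out = forward_func(np.array([1.0, 2.0]), np.array([3.0, 4.0]),
                   np.array([5.0, 6.0]))
print(out)        # elementwise product: [3., 8.]
print(out.dtype)  # float32
```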

b) What you want to get in backpropagation (backprop_func):

Finally, in backprop_func we can implement the custom gradient. The custom gradient is not passed through any wrapper, so here you have to use TensorFlow operations.

```python
def backprop_func(op, grad):
    c = op.inputs[0]
    k = op.inputs[1]
    d = op.inputs[2]
    # derive grad_c, grad_k, grad_d on paper and implement them
    # here with TensorFlow operations
    return grad_c, grad_k, grad_d
```

Notice that we have to return a gradient for each variable that was passed to the function: these are the partial derivatives. I do not show how to calculate the gradient for each variable; please derive them yourself.

op contains all the passed variables; grad is the gradient flowing in from the rest of backpropagation.
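Whatever formulas you derive on paper, it is worth verifying them numerically before wiring them into backprop_func. A minimal sketch for the assumed toy operation e = c * k, where the hand-derived partials are de/dc = k, de/dk = c, and d does not influence the output:

```python
import numpy as np

def forward(c, k, d):
    return c * k  # toy stand-in for the custom operation

c = np.array([0.5, -1.0])
k = np.array([2.0, 3.0])
d = np.array([1.0, 2.0])
eps = 1e-6

# hand-derived partial derivatives for e = c * k
grad_c, grad_k, grad_d = k, c, np.zeros_like(d)

# compare against central finite differences, element by element
for i in range(len(c)):
    step = np.zeros_like(c)
    step[i] = eps
    num_c = (forward(c + step, k, d)[i] - forward(c - step, k, d)[i]) / (2 * eps)
    num_k = (forward(c, k + step, d)[i] - forward(c, k - step, d)[i]) / (2 * eps)
    assert abs(num_c - grad_c[i]) < 1e-4
    assert abs(num_k - grad_k[i]) < 1e-4

print("gradient check passed")
```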

3. Then, when you build the graph, explicitly compute the gradients and apply them: replace the single opt.minimize(loss) call with opt.compute_gradients(loss) followed by opt.apply_gradients(...).
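A sketch of that last step on a tiny stand-in graph (written against the TF 1.x API via tf.compat.v1 so it also runs under TensorFlow 2; in the article the loss is the one built from the network above):

```python
import tensorflow.compat.v1 as tf

tf.disable_v2_behavior()

# tiny stand-in graph; in practice `loss` is the one built above
x = tf.Variable([1.0, -1.5])
loss = tf.reduce_sum(tf.square(x))

opt = tf.train.AdamOptimizer(learning_rate=0.001)

# instead of train_op = opt.minimize(loss), split the call in two;
# a None gradient in grads_and_vars means an override is still missing
grads_and_vars = opt.compute_gradients(loss)
train_op = opt.apply_gradients(grads_and_vars)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    before = sess.run(loss)
    for _ in range(100):
        sess.run(train_op)
    after = sess.run(loss)
    print(after < before)  # -> True
```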