When To Multiply Inside Your Neural Network?

Typical neural networks consist of linear combinations of input features and ReLU units built on top of them, and nothing else. So is there a need to introduce explicit multiplications, either on the inputs or inside the network?

But first, let’s consider why you should not multiply inside your neural network. Suppose you have a bunch of features and want to construct arbitrary multiplicative terms. The straightforward thing is to feed the features into the network after applying log(). Multiplications turn into additions, job done! The log transform is useful for other reasons too. When you multiply a bunch of numbers, the product typically has a wide dynamic range, with very large or very small values. If you feed in the raw values, a large part of the region you care about is likely to get squashed. Mapping the features to log space prevents such catastrophes, so that’s definitely the first thing to consider before injecting multipliers into the network.
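To make this concrete, here is a minimal sketch of the log trick. The feature names and value ranges are made-up assumptions for illustration, and it presumes all features are strictly positive:

import numpy as np

# Hypothetical positive-valued features.
price = np.random.uniform(1.0, 1000.0, size=1000)
clicks = np.random.uniform(1.0, 10000.0, size=1000)

# The raw product spans roughly 1 to 10^7; on that scale, most of
# the interesting low end gets squashed into a tiny sliver.
raw_product = price * clicks

# In log space, multiplication becomes addition: a single linear
# unit computing w1*log(price) + w2*log(clicks) can represent any
# monomial price^w1 * clicks^w2, with no explicit multiplier.
log_price = np.log(price)
log_clicks = np.log(clicks)
assert np.allclose(log_price + log_clicks, np.log(raw_product))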

Secondly, neural networks can approximate arbitrary functions, and of course that includes a multiplier. To see this, we train a single hidden layer neural network to learn multiplication. If you have TensorFlow 1.x installed (the code uses the tf.estimator.inputs API, which was removed in TensorFlow 2), you can copy-paste the code below and run it.

''' Train a single hidden layer neural network to model a * b,
report its root mean squared error, and print its predictions
along the curve y = x^2.
'''
import tensorflow as tf
import numpy as np

NUM_TRAIN_SAMPLES = 1000000
NUM_TEST_SAMPLES = 100000

feature_columns = [
    tf.feature_column.numeric_column(key='a'),
    tf.feature_column.numeric_column(key='b')]

# Training data: uniform random inputs in (0, 1), labeled a * b.
a_train = np.random.rand(NUM_TRAIN_SAMPLES)
b_train = np.random.rand(NUM_TRAIN_SAMPLES)
train = tf.estimator.inputs.numpy_input_fn(
    x={'a': a_train, 'b': b_train},
    y=a_train * b_train,  # model a * b
    batch_size=128,
    num_epochs=None,  # repeat, so train() runs for the full step count
    shuffle=True)

# Held-out test data drawn from the same (0, 1) range.
a_test = np.random.rand(NUM_TEST_SAMPLES)
b_test = np.random.rand(NUM_TEST_SAMPLES)
test = tf.estimator.inputs.numpy_input_fn(
    x={'a': a_test, 'b': b_test},
    y=a_test * b_test,
    shuffle=False)

# Probe inputs: feed the same value x to both features, so the
# model is effectively asked for x^2, including values well
# beyond the (0, 1) range it was trained on.
range_test = np.arange(0.00, 10.0, 0.01)
ranget = tf.estimator.inputs.numpy_input_fn(
    x={'a': range_test, 'b': range_test},
    y=range_test * range_test,
    shuffle=False)

def estimate_error(num_hidden_units):
    model = tf.estimator.DNNRegressor(
        hidden_units=[num_hidden_units],
        feature_columns=feature_columns)
    model.train(input_fn=train, steps=100000)
    eval_result = model.evaluate(input_fn=test)
    rmse = eval_result['average_loss'] ** 0.5
    print('rmse=%f' % rmse)
    predictions = list(model.predict(input_fn=ranget))
    for ip, p in zip(range_test, predictions):
        v = p['predictions'][0]
        print('x=%f, x^2=%f, model=%f' % (ip, ip * ip, v))

if __name__ == '__main__':
    estimate_error(80)

With 80 ReLUs, we get to a root mean squared error near 0.003, which seems like a reasonable approximation of a multiplier. The code above also feeds the same numbers in the (0, 1) range to both inputs of the multiplier, essentially evaluating x², and then compares the result with the actual value of x². The plot below shows the comparison. Unsurprisingly, the model seems quite good at emulating multiplication.
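If you want to reproduce the plot yourself, here is a sketch of one way to do it. It assumes matplotlib is installed and that you collect the (x, prediction) pairs from the loop in estimate_error instead of printing them; plot_predictions is a hypothetical helper, not part of the original code:

import numpy as np
import matplotlib.pyplot as plt

def plot_predictions(xs, preds):
    '''Compare the model's outputs against the true x^2 curve.'''
    xs = np.asarray(xs)
    plt.plot(xs, xs * xs, label='true x^2')
    plt.plot(xs, preds, label='model')
    plt.xlabel('x')
    plt.legend()
    plt.show()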

So the case for multiplication inside the network seems dead? Not quite. Note that the model has merely learnt to approximate the output for the examples in the training data; it understands nothing about multiplication. In other words, what the model has learnt doesn’t generalize to true multiplication. To see the significance of this, we extend the plot above to values beyond the (0, 1) range to which the training data is restricted. Now the plot looks like:

The neural network can masquerade as a multiplier, but the act breaks down quite dramatically as you move beyond the values seen during training. This shouldn’t be surprising: a ReLU network computes a piecewise linear function, so outside the training range its output grows linearly while x² grows quadratically. This matters if your training data doesn’t represent the entirety of the universe in which the model operates. For example, if your model is used in search ranking, your training data may be limited to results from the first page, whereas in reality the model scores a volume of documents 100x larger than what gets logged as training data.

So if you have a strong intuition that multiplication is the right way to model certain relations, it may be better to enforce that explicitly in the network, as sketched below. The effect may not be immediately apparent from logged data split into train and test sets; an online test may be the best way to confirm your intuition.
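For illustration, here is one minimal way to hard-wire a product into a model. This sketch uses the Keras functional API rather than the estimator setup above; the layer sizes and the idea of concatenating the product with the raw features are illustrative assumptions, not part of the original experiment:

import tensorflow as tf

# Two scalar inputs, mirroring the 'a' and 'b' features above.
a = tf.keras.Input(shape=(1,), name='a')
b = tf.keras.Input(shape=(1,), name='b')

# Compute the product exactly, instead of asking ReLUs to
# approximate it.
product = tf.keras.layers.Multiply()([a, b])

# Let the learned layers see the raw features alongside the
# explicit product.
features = tf.keras.layers.Concatenate()([a, b, product])
hidden = tf.keras.layers.Dense(16, activation='relu')(features)
output = tf.keras.layers.Dense(1)(hidden)

model = tf.keras.Model(inputs=[a, b], outputs=output)
model.compile(optimizer='adam', loss='mse')

Because the product is computed rather than learned, it remains exact for input ranges the training data never covered.

Best of luck!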

See also:

How do neural networks work?

How much training data do you need?