… network, it is possible to obtain the derivative of the cost function with respect to every weight. Training then consists of subtracting from each weight a small fraction of its corresponding derivative, an update known as the delta rule. Since every quantity in this update is known, the network is trainable! We would still like a more efficient notation, however, since so far we have only worked with scalars. Vector a…