
… for regularization.
The last recurrent layer is set to return its final state only (one step).
One fully-connected regular layer takes the “hidden” vector of size 512 and brings it back to the size of the vocabulary (98).
One last processing is to apply the softmax function to this vecto…