Really good write up.
Sachin Abeywardana


Thanks for reading. The step function is applied to each time step (technically, this is a for loop).
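To make the "for loop" concrete, here is a minimal NumPy sketch of applying a step function once per time step. It is only an illustration, not the layer's actual code: `run_steps` and `toy_step` are hypothetical names, and the running sum in `toy_step` stands in for the real recurrence.

```python
import numpy as np

def run_steps(step, inputs, initial_state):
    """Apply `step` to each time slice of `inputs` (batch, timesteps, features)."""
    state = initial_state
    outputs = []
    for t in range(inputs.shape[1]):           # the loop over index t
        out, state = step(inputs[:, t, :], state)
        outputs.append(out)
    # Stack the per-step outputs back into (batch, timesteps, features).
    return np.stack(outputs, axis=1), state

def toy_step(x_t, state):
    # Hypothetical recurrence: a running sum of the inputs seen so far.
    new_state = state + x_t
    return new_state, new_state

x = np.ones((2, 3, 4))                          # batch=2, timesteps=3, features=4
ys, last_state = run_steps(toy_step, x, np.zeros((2, 4)))
```

Everything inside `step` can be vectorized over the batch and the input sequence; only the walk over t is sequential.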

Regarding point #1, _stm has shape (batch_size, timesteps, dim) because we use K.repeat to effectively implement equation 1. Equation 1 is calculated for every character j in the input sequence using the previous state S_{t-1}. Repeating that state once per input position lets us vectorize the calculation, so the scores for the whole input sequence are computed in one pass.
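A small NumPy sketch of why the repeat helps. K.repeat turns a (batch, dim) tensor into (batch, timesteps, dim), which `repeat_state` mimics here. The scoring formula is an assumption on my part: I am taking equation 1 to have the usual additive form v_a · tanh(W_a s + U_a h_j), and the weight names and sizes below are made up for the example.

```python
import numpy as np

def repeat_state(s_prev, timesteps):
    """Mimic K.repeat: (batch, dim) -> (batch, timesteps, dim)."""
    return np.repeat(s_prev[:, None, :], timesteps, axis=1)

def scores_vectorized(s_prev, h, W_a, U_a, v_a):
    # One copy of the previous state per input position j.
    stm_rep = repeat_state(s_prev, h.shape[1])        # (batch, timesteps, dim)
    # Assumed additive scoring, computed for all j at once.
    return np.tanh(stm_rep @ W_a + h @ U_a) @ v_a     # (batch, timesteps)

def scores_loop(s_prev, h, W_a, U_a, v_a):
    # Same computation, one input position at a time.
    batch, timesteps, _ = h.shape
    out = np.empty((batch, timesteps))
    for j in range(timesteps):
        out[:, j] = np.tanh(s_prev @ W_a + h[:, j, :] @ U_a) @ v_a
    return out

rng = np.random.default_rng(0)
batch, timesteps, dim, enc_dim = 2, 5, 4, 3           # illustrative sizes
s_prev = rng.normal(size=(batch, dim))                # previous decoder state
h = rng.normal(size=(batch, timesteps, enc_dim))      # encoder outputs
W_a = rng.normal(size=(dim, dim))
U_a = rng.normal(size=(enc_dim, dim))
v_a = rng.normal(size=(dim,))

same = np.allclose(scores_vectorized(s_prev, h, W_a, U_a, v_a),
                   scores_loop(s_prev, h, W_a, U_a, v_a))
```

Both functions give the same scores; the repeated state just removes the inner loop over j.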

Regarding point #2, were you able to reconcile equations 4–6? The only for loop here is the one that applies step to every element of the input sequence (over index t).

Regarding point #3, this is a good idea! If you want to take a stab at it, please do make a PR, or open an issue and I’ll work on it soon :)


Like what you read? Give Zafarali Ahmed a round of applause.