Really good write-up. I do have a few questions though, all about the repeated variables:
- Is `_stm` of size (batch_size, dim, timesteps)? I’m assuming `stm` was of size (batch_size, dim).
- I’m trying to reconcile `st = (1-zt)*stm + zt * s_tp` with eq. 7. **Is there a for loop in here somewhere so that you find `st` for all the timesteps?** I understand that you did a repeat earlier, but that was for `_stm`, not `stm`.
- It would be super awesome if you could add comments in your GitHub repo giving the output sizes of `_stm`, `zt`, etc.
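
To make the second question concrete, this is roughly what I would expect if there were an explicit loop over timesteps. It is only a NumPy sketch of my own understanding, not code from your repo: the helper name `gru_interpolate`, the sizes, and the random stand-ins for `zt` and `s_tp` are all made up for illustration.

```python
import numpy as np

def gru_interpolate(stm, zt, s_tp):
    """Elementwise mix of the previous state and the proposal state."""
    return (1.0 - zt) * stm + zt * s_tp

rng = np.random.default_rng(0)
batch_size, dim, timesteps = 2, 3, 4  # hypothetical sizes

stm = np.zeros((batch_size, dim))  # previous state, (batch_size, dim)
states = []
for t in range(timesteps):
    # Random stand-ins; in a real model these come from learned weights.
    zt = rng.uniform(size=(batch_size, dim))    # update gate in [0, 1)
    s_tp = rng.normal(size=(batch_size, dim))   # proposal state
    st = gru_interpolate(stm, zt, s_tp)         # (batch_size, dim)
    states.append(st)
    stm = st  # the new state becomes the previous state for step t + 1

# Stack the per-step states along a new time axis.
all_states = np.stack(states, axis=1)  # (batch_size, timesteps, dim)
print(all_states.shape)
```

If the loop is instead hidden inside the framework's RNN machinery, then I would guess `st` is only ever (batch_size, dim) inside the step function, which is what I would like to confirm.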