When it comes to implementation, we can treat the matrix multiplication as part of the computation graph. If we regard each time step as an independent observation, each linear transformation becomes a fully connected layer without bias. In that case, the effective batch size is inflated n times. We have to use reshaping techniques for this a…
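The reshaping idea can be sketched in NumPy as follows. This is only an illustration under stated assumptions: the input is assumed to have shape `(batch, n_steps, d_in)`, with `n_steps` playing the role of n, and the helper name `apply_per_timestep` and the weight matrix `w` are hypothetical names introduced here, not taken from the original text.

```python
import numpy as np


def apply_per_timestep(x, w):
    """Apply one bias-free linear map to every time step of x.

    x: array of shape (batch, n_steps, d_in)   (assumed layout)
    w: array of shape (d_in, d_out)            (hypothetical weight matrix)
    """
    batch, n_steps, d_in = x.shape
    # Fold the time axis into the batch axis: every time step becomes
    # its own row, so the effective batch size is batch * n_steps.
    flat = x.reshape(batch * n_steps, d_in)
    # A fully connected layer without bias is a single matrix product.
    out = flat @ w
    # Restore the original (batch, n_steps, ...) layout.
    return out.reshape(batch, n_steps, w.shape[1])


rng = np.random.default_rng(0)
x = rng.normal(size=(4, 5, 3))   # batch=4, n_steps=5, d_in=3
w = rng.normal(size=(3, 2))      # d_in=3, d_out=2

y = apply_per_timestep(x, w)

# Reference: apply the same weights to each time step separately.
reference = np.stack([x[:, t, :] @ w for t in range(x.shape[1])], axis=1)
```

Here `y` and `reference` should agree up to floating-point error, which is what lets a single dense layer on the flattened input stand in for a loop over time steps.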