Rahul Meka
What are self-attention models?
In the early days of NLP, wherever long-term dependencies were involved, models suffered from the vanishing-gradient problem even with the…
Mar 3, 2019
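The article preview is truncated here, but the self-attention mechanism it refers to can be sketched minimally as scaled dot-product attention. This is an illustrative NumPy sketch, not the article's own code; the function and weight names (`self_attention`, `w_q`, `w_k`, `w_v`) are assumptions chosen for clarity.

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention over a sequence x of shape (seq_len, d_model).

    Illustrative sketch: each position attends to every other position,
    so long-range dependencies are a single step away (no recurrence).
    """
    q = x @ w_q  # queries: what each position is looking for
    k = x @ w_k  # keys: what each position offers
    v = x @ w_v  # values: the content that gets mixed
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)  # pairwise similarities, scaled
    # softmax over keys, computed stably
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v  # each output is a weighted mix of all positions

rng = np.random.default_rng(0)
seq_len, d_model = 4, 8
x = rng.normal(size=(seq_len, d_model))
w_q, w_k, w_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)
print(out.shape)  # (4, 8): same sequence length, transformed features
```

Because every pair of positions interacts directly, gradients do not have to flow through many recurrent steps, which is one motivation for attention over RNNs.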