
…arametrization has a major disadvantage that limits its usefulness for very large corpora. Specifically, computing a single forward pass of our model requires summing over the entire corpus vocabulary to evaluate the softmax function. This is prohibitively expensive on large datasets, so we look to alternative approximations of this model for the sake of computational efficiency.
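To make the cost concrete, here is a minimal sketch of a full softmax evaluation. The excerpt does not define the model, so the dot-product scoring and the names `full_softmax_prob`, `context_vec`, and `output_vecs` are assumptions chosen for illustration. The sizes are likewise arbitrary. The point is only that the normalizing sum touches every word in the vocabulary, so one probability costs on the order of V times d operations for vocabulary size V and embedding dimension d.

```python
import math
import random


def full_softmax_prob(context_vec, output_vecs, target_idx):
    """Return P(target | context) under a full softmax.

    Assumes (hypothetically) that each word's score is the dot product of
    the context vector with that word's output vector.
    """
    # One dot product per vocabulary word: this is the expensive part.
    scores = [sum(c * w for c, w in zip(context_vec, row)) for row in output_vecs]
    # Subtract the maximum score before exponentiating, for numerical stability.
    max_score = max(scores)
    exps = [math.exp(s - max_score) for s in scores]
    # The denominator sums over the entire vocabulary.
    return exps[target_idx] / sum(exps)


random.seed(0)
vocab_size, dim = 1000, 16  # illustrative sizes, not taken from the source
output_vecs = [[random.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(vocab_size)]
context_vec = [random.gauss(0.0, 0.1) for _ in range(dim)]

p = full_softmax_prob(context_vec, output_vecs, target_idx=42)
```

Doubling `vocab_size` roughly doubles the work for every single prediction, which is why approximations that avoid the full sum become attractive as the vocabulary grows.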
… we launched our MVP. We stopped talking to our customers and relied solely on metrics to guide us. The problem with metrics is that they can only tell you what’s going wrong, but not why. We kept guessing at what was going wrong, but nothing we did worked. It was only when we started ta…