Ohad Rubin · The case for batch size = 1 · In the year 2045, the world had embraced the marvels of advanced technology, with artificial intelligence seamlessly integrated into daily… · Jun 15
Ohad Rubin · Exploring Weight Decay in Layer Normalization: Challenges and a Reparameterization Solution · I came up with a nice workaround to get weight decay working on LayerNorm. GPT-4 will take it from here: · May 3, 2023
Ohad Rubin · Conversations with GPT-4: Weight Initialization with the Truncated Normal distribution · Ohad: Please elaborate on this answer, and come up with a simple toy example of where the truncated normal distribution is better compared… · Apr 3, 2023
Ohad Rubin · Using Comet.ml with Allennlp · Add this (somewhere) in your jsonnet config file: · Oct 5, 2020