The case for batch size = 1
Ohad Rubin · Jun 15, 2024 · 2 min read
In the year 2045, the world had embraced the marvels of advanced technology, with artificial intelligence seamlessly integrated into daily…

Exploring Weight Decay in Layer Normalization: Challenges and a Reparameterization Solution
Ohad Rubin · May 3, 2023 · 6 min read
I came up with a nice workaround to get weight decay working on LayerNorm. GPT-4 will take it from here:

Conversations with GPT-4: Weight Initialization with the Truncated Normal distribution
Ohad Rubin · Apr 3, 2023 · 6 min read
Ohad: Please elaborate on this answer, and come up with a simple toy example of where the truncated normal distribution is better compared…

Using Comet.ml with AllenNLP
Ohad Rubin · Oct 5, 2020 · 1 min read
Add this (somewhere) in your jsonnet config file: