How Noise Contrastive Estimation Works, Part 5 (Machine Learning Future)

Monodeep Mukherjee
2 min read · Apr 23, 2024

1. Analyzing and Improving the Optimization Landscape of Noise-Contrastive Estimation (arXiv)

Authors: Bingbin Liu, Elan Rosenfeld, Pradeep Ravikumar, Andrej Risteski

Abstract: Noise-contrastive estimation (NCE) is a statistically consistent method for learning unnormalized probabilistic models. It has been empirically observed that the choice of the noise distribution is crucial for NCE’s performance. However, such observations have never been made formal or quantitative. In fact, it is not even clear whether the difficulties arising from a poorly chosen noise distribution are statistical or algorithmic in nature. In this work, we formally pinpoint reasons for NCE’s poor performance when an inappropriate noise distribution is used. Namely, we prove these challenges arise due to an ill-behaved (more precisely, flat) loss landscape. To address this, we introduce a variant of NCE called “eNCE” which uses an exponential loss and for which normalized gradient descent addresses the landscape issues provably when the target and noise distributions are in a given exponential family.
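To make the setup in this abstract concrete, here is a minimal sketch of standard (logistic) NCE for a one-dimensional exponential-family model, optimized with normalized gradient descent. Everything specific here is an assumption chosen for illustration: the unit-variance Gaussian data, the N(0, 1) noise, the parameter names `eta` and `c`, the decaying step size, and the helper names. It uses the usual logistic NCE loss, not the paper's eNCE objective, so treat it as a toy illustration rather than the authors' method.

```python
import numpy as np

def log_model(x, theta):
    """Unnormalized log-density -x^2/2 + eta*x + c; c is a learned log-normalizer."""
    eta, c = theta
    return -0.5 * x ** 2 + eta * x + c

def log_noise(x):
    """Normalized log-density of the noise distribution N(0, 1) (an assumption)."""
    return -0.5 * x ** 2 - 0.5 * np.log(2 * np.pi)

def nce_loss_and_grad(theta, x_data, x_noise):
    """Logistic NCE loss with equal data/noise sample counts, plus its gradient."""
    # Log-ratio G(x) = log p_theta(x) - log q(x); the classifier is sigmoid(G).
    g_data = log_model(x_data, theta) - log_noise(x_data)
    g_noise = log_model(x_noise, theta) - log_noise(x_noise)
    # loss = E_data[softplus(-G)] + E_noise[softplus(G)]
    loss = np.mean(np.logaddexp(0.0, -g_data)) + np.mean(np.logaddexp(0.0, g_noise))
    w_data = -1.0 / (1.0 + np.exp(g_data))    # derivative of softplus(-G) w.r.t. G
    w_noise = 1.0 / (1.0 + np.exp(-g_noise))  # derivative of softplus(G) w.r.t. G
    # dG/d(eta) = x and dG/dc = 1 because G is linear in (eta, c).
    grad_eta = np.mean(w_data * x_data) + np.mean(w_noise * x_noise)
    grad_c = np.mean(w_data) + np.mean(w_noise)
    return loss, np.array([grad_eta, grad_c])

def normalized_gd(theta0, x_data, x_noise, lr=0.1, steps=1500):
    """Gradient descent that uses only the gradient direction, with a decaying step."""
    theta = np.array(theta0, dtype=float)
    for t in range(steps):
        _, grad = nce_loss_and_grad(theta, x_data, x_noise)
        norm = np.linalg.norm(grad)
        if norm < 1e-12:
            break
        theta -= (lr / np.sqrt(t + 1)) * grad / norm
    return theta

rng = np.random.default_rng(0)
x_data = rng.normal(loc=1.0, scale=1.0, size=10_000)   # target: N(1, 1)
x_noise = rng.normal(loc=0.0, scale=1.0, size=10_000)  # noise:  N(0, 1)

theta_init = np.array([0.0, 0.0])
initial_loss, _ = nce_loss_and_grad(theta_init, x_data, x_noise)
theta_hat = normalized_gd(theta_init, x_data, x_noise)
final_loss, _ = nce_loss_and_grad(theta_hat, x_data, x_noise)

# For N(1, 1) the true values are eta = 1 and c = -1/2 - log(2*pi)/2.
true_c = -0.5 - 0.5 * np.log(2 * np.pi)
print("eta:", theta_hat[0], "c:", theta_hat[1], "true c:", true_c)
```

Dividing the gradient by its norm means the update size does not shrink when the gradient itself is tiny, which is the intuition for pairing normalized gradient descent with a flat landscape. In this toy case the noise is close to the target, so plain gradient descent would also work.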

2. Understanding Hard Negatives in Noise Contrastive Estimation (arXiv)

Authors: Wenzheng Zhang, Karl Stratos

Abstract: The choice of negative examples is important in noise contrastive estimation. Recent works find that hard negatives (highest-scoring incorrect examples under the model) are effective in practice, but they are used without a formal justification. We develop analytical tools to understand the role of hard negatives. Specifically, we view the contrastive loss as a biased estimator of the gradient of the cross-entropy loss, and show both theoretically and empirically that setting the negative distribution to be the model distribution results in bias reduction. We also derive a general form of the score function that unifies various architectures used in text retrieval. By combining hard negatives with appropriate score functions, we obtain strong results on the challenging task of zero-shot entity linking.
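A small sketch can illustrate why hard negatives bring the contrastive loss closer to the full cross-entropy loss. The random scores, the candidate count, the value of `k`, and all function names below are hypothetical choices for illustration. The sketch also compares loss values rather than gradients, which is a simplification of the analysis described in the abstract.

```python
import numpy as np

def logsumexp(v):
    """Numerically stable log(sum(exp(v)))."""
    m = np.max(v)
    return m + np.log(np.sum(np.exp(v - m)))

def cross_entropy(scores, gold):
    """Full softmax cross-entropy over every candidate."""
    return logsumexp(scores) - scores[gold]

def contrastive_loss(scores, gold, negatives):
    """Softmax cross-entropy restricted to the gold item plus chosen negatives."""
    idx = np.concatenate(([gold], negatives))
    return logsumexp(scores[idx]) - scores[gold]

def hard_negatives(scores, gold, k):
    """The k highest-scoring incorrect candidates under the current scores."""
    order = np.argsort(-scores)
    return order[order != gold][:k]

def random_negatives(scores, gold, k, rng):
    """k incorrect candidates drawn uniformly without replacement."""
    candidates = np.delete(np.arange(len(scores)), gold)
    return rng.choice(candidates, size=k, replace=False)

rng = np.random.default_rng(0)
scores = rng.normal(size=1000)  # stand-in for model scores over 1000 candidates
gold, k = 0, 8

hard_idx = hard_negatives(scores, gold, k)
full = cross_entropy(scores, gold)
hard = contrastive_loss(scores, gold, hard_idx)
rand = contrastive_loss(scores, gold, random_negatives(scores, gold, k, rng))

print("full cross-entropy:      ", full)
print("contrastive, hard negs:  ", hard)
print("contrastive, random negs:", rand)
```

Any restricted loss drops terms from the partition sum, so it can only underestimate the full cross-entropy. Choosing the top-scoring negatives keeps the largest terms, so among all sets of `k` negatives the hard-negative loss is the closest to the full loss.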

