prachi lal

The Curse of Variance Scaling (Jul 6): Through this post, I want to specifically target why self-attention is also called "scaled dot-product attention".
SOLID Principles vs Scripting (Jun 30): During my internship at Bandhan Life, I was assigned a task to restructure my code according to best practices used in…
Non-Linearity in Activation Functions (Jun 27): We often hear that activation functions need non-linearity. To understand this, let's first discuss the whats and whys of linearity.
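The first post above concerns the scaling in scaled dot-product attention. As a minimal sketch of the idea (not taken from the post itself): if the components of a query q and key k are i.i.d. with zero mean and unit variance, the raw dot product q·k has variance that grows linearly with the head dimension d, and dividing by √d brings it back to roughly 1. The dimension d = 512 and sample count below are illustrative choices, not values from the post.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 512          # illustrative head dimension
n = 10_000       # number of (q, k) pairs to sample

# q and k components are i.i.d. standard normal, so each product
# q_i * k_i has zero mean and unit variance, and the d-term sum
# q . k has variance that grows like d.
q = rng.standard_normal((n, d))
k = rng.standard_normal((n, d))

raw = np.sum(q * k, axis=1)   # dot products; empirical variance should be near d
scaled = raw / np.sqrt(d)     # scaled dot products; empirical variance near 1

print(f"var(q.k)        ~ {raw.var():.1f}  (d = {d})")
print(f"var(q.k / sqrt(d)) ~ {scaled.var():.3f}")
```

This is the statistical motivation for the 1/√d_k factor in attention: without it, softmax over large-magnitude logits saturates and gradients vanish.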