More from Ester Hlavin in Towards Data Science:

- Kaiming He Initialization in Neural Networks — Math Proof (Feb 15, 2023). Deriving the optimal initial variance of weight matrices in neural network layers with the ReLU activation function.
- Xavier Glorot Initialization in Neural Networks — Math Proof (Dec 23, 2022). A detailed derivation for finding optimal initial distributions of weight matrices in deep learning layers with the tanh activation function.
- 5 Derivatives to Excel in Your Machine Learning Interview (Sep 2, 2020). The calculus behind machine learning: a review of derivatives, the gradient, the Jacobian, and the Hessian.
- Activation Functions in Deep Learning: From Softmax to Sparsemax — Math Proof (Aug 26, 2020). A complete mathematical derivation of the Sparsemax activation function, a Softmax alternative for sparse outputs.
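For quick reference, the two initialization posts listed above derive simple variance rules: He initialization sets Var(W) = 2 / fan_in for ReLU layers, and Glorot initialization sets Var(W) = 2 / (fan_in + fan_out) for tanh layers. Below is a minimal NumPy sketch of those rules; the function names and the layer sizes in the example are illustrative and not taken from the posts.

```python
import numpy as np

def he_normal(fan_in, fan_out, rng=None):
    """Kaiming/He initialization for ReLU layers: Var(W) = 2 / fan_in."""
    rng = rng or np.random.default_rng()
    return rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))

def glorot_normal(fan_in, fan_out, rng=None):
    """Xavier/Glorot initialization for tanh layers: Var(W) = 2 / (fan_in + fan_out)."""
    rng = rng or np.random.default_rng()
    return rng.normal(0.0, np.sqrt(2.0 / (fan_in + fan_out)), size=(fan_in, fan_out))

# Example: initialize a hypothetical 784 -> 256 fully connected layer.
W = he_normal(784, 256)
print(W.std())  # close to sqrt(2 / 784), roughly 0.0505
```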