Understand Cross Entropy Loss in Minutes
We have always wanted to write about Cross Entropy Loss. It is a natural follow-up to our popular Softmax activation article: the two are best buddies, and it felt strange to write about Softmax without mentioning cross entropy loss! We also listen to our readers through our article pipeline vote. Our goal is to write as many beginner-friendly ML, DL, reinforcement learning, and data science tutorials as possible. The next step is to write about machine learning papers, and to write about machine learning in Chinese and Japanese! If you like these ideas, please vote in our pipeline article. As usual, you can email us if you want a copy of the article. Want to be an Uniqtech student scholar? Write to us here or message us on Twitter (X), and we will give you some articles for free in exchange for your feedback and/or email signup. We will assume you know what Softmax is. If not, please read our previous article. It's one of the most popular on the internet.
There is binary cross entropy loss and there is multi-class cross entropy loss. Let's talk about the multi-class cross entropy loss first; the binary case will hopefully feel like an afterthought. So now you know your Softmax: your model predicts a vector of probabilities [0.7, 0.2, 0.1].
The entries sum to 100% (70% + 20% + 10%), and the first entry is the most likely. And yes, your true label says [1, 0, 0]
— definitely a cat, not a dog at entry 2, and definitely not a bird at entry 3. So how well did your model's probability prediction do? A little linear algebra will help, the dot product in particular! The dot product is the sum of the element-wise products of two vectors.
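To make the dot product idea concrete, here is a minimal NumPy sketch of multi-class cross entropy for the example above. The variable names y_pred and y_true are our own; the formula is the standard cross entropy, the negative sum of true label times log of predicted probability.

```python
import numpy as np

# Model's Softmax output (predicted probabilities) and the one-hot true label
# from the cat/dog/bird example above. The names y_pred and y_true are ours.
y_pred = np.array([0.7, 0.2, 0.1])
y_true = np.array([1, 0, 0])

# Cross entropy loss: the negative dot product of the true label
# with the element-wise log of the predictions.
loss = -np.dot(y_true, np.log(y_pred))

print(loss)  # -log(0.7), about 0.357
```

Because the true label is one-hot, only the log probability of the correct class (the cat entry, 0.7) contributes to the loss; the other entries are multiplied by zero and drop out.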