More Deep Learning. Less crying -> A guide

Subhaditya Mukherjee
Mar 5 · 6 min read

This is a guide to make deep learning less messy and hopefully give you a way to use fewer tissues next time you code.

Sad robot image


If you can answer yes to most of these, read on. Or cry. Your choice, of course.

  • Do you work with deep learning models?

Oh yay. You made it here. Wipe your eyes one last time because this will be a ride :)

PS. This might be a long-ish checklist but trust me, it will save you many tears. Note that the material was compiled from way too many papers and slides, so I do not have a proper citation for every statement here. The references section lists all the ones I could find.

What you get from this article. No nonsense edition.

In this article, I have tried to cover the major parts that frustrate me on a daily basis and their potential solutions.

  • This is platform independent. So it does not matter if you are using PyTorch/TensorFlow/Caffe/Flux.jl or any of the others.

Some sensible defaults

Contrary to popular belief, most of the time we can get pretty great results by using some default values, or by sticking to simpler architectures before reaching for a complicated one and messing everything up.

For the network

Let us look at some defaults to reach for while building a network. Note that this goes from easy -> complicated.

  • Dataset with only images : Start with a LeNet-like architecture -> ResNets -> Even more complicated ones

For training

What about training? Once you have set up everything, you might be faced with endless options. What do you stick to?

  • Optimizer : Honestly, stick to an Adam optimizer with lr = 3e-4. (Or use AdamW + a learning rate finder)
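To see what that default actually does, here is a minimal sketch of the standard Adam update on a single number, in plain Python so it runs without any framework. The toy loss x^2, the helper names adam_step and minimize_quadratic, and the step count are my own assumptions for illustration. In real code you would use the Adam that ships with your framework.

```python
import math


def adam_step(param, grad, state, lr=3e-4, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update to a scalar parameter and return the new value."""
    state["t"] += 1
    # Running averages of the gradient and of the squared gradient.
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad * grad
    # Bias correction, because both averages start at zero.
    m_hat = state["m"] / (1 - beta1 ** state["t"])
    v_hat = state["v"] / (1 - beta2 ** state["t"])
    return param - lr * m_hat / (math.sqrt(v_hat) + eps)


def minimize_quadratic(steps=1000, start=1.0):
    """Toy problem (an assumption for this sketch): minimize x**2 with Adam."""
    x = start
    state = {"t": 0, "m": 0.0, "v": 0.0}
    for _ in range(steps):
        grad = 2.0 * x  # derivative of x**2
        x = adam_step(x, grad, state)
    return x
```

Notice that the very first step moves the parameter by roughly lr regardless of how large the gradient is. That normalization is a big part of why a single default learning rate travels well across problems.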

Some tricks that will make you scream in joy

The Matrix image
A visual description of your tears

Do give this paper by Tong He et al. a read. It's amazing and covers these points in detail. So instead of repeating content, I have just given a brief summary.

  • Learning rate finder : Why use a constant learning rate when you can vary it and identify the one that does best, in a fraction of the time?
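Here is a rough sketch of the idea behind a learning rate finder, under some loud assumptions: a toy one-parameter loss stands in for your model, the names lr_range_test and suggest_lr are made up for this example, and "lowest loss divided by 10" is just one common rule of thumb, not the only way to read the curve.

```python
def toy_loss(w):
    """Stand-in for a real training loss (assumption): a simple bowl."""
    return (w - 3.0) ** 2


def toy_grad(w):
    """Gradient of toy_loss."""
    return 2.0 * (w - 3.0)


def lr_range_test(loss_fn, grad_fn, w0, lr_min=1e-5, lr_max=10.0, num_steps=50):
    """Train for a few steps while raising the learning rate exponentially.

    Returns a list of (learning_rate, loss_after_step) pairs.
    """
    factor = (lr_max / lr_min) ** (1.0 / (num_steps - 1))
    w, lr = w0, lr_min
    best = loss_fn(w0)
    history = []
    for _ in range(num_steps):
        w = w - lr * grad_fn(w)
        loss = loss_fn(w)
        history.append((lr, loss))
        if loss > 4 * best:  # stop once the loss clearly diverges
            break
        best = min(best, loss)
        lr *= factor
    return history


def suggest_lr(history):
    """Rule of thumb: take the rate with the lowest loss and back off by 10x."""
    best_lr, _ = min(history, key=lambda pair: pair[1])
    return best_lr / 10.0
```

Calling suggest_lr(lr_range_test(toy_loss, toy_grad, w0=0.0)) gives a starting point after a single short sweep, instead of one full training run per candidate learning rate.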

Mom save me. There are too many hyper-parameters.

Here are some you can look at in order of importance. (Thank you Josh Tobin).

  • Spend most of your time on these : Learning rate, Learning rate schedules, Loss function and finally the Layer size.

I see a bug. HELP.

Some of the most common bugs we might face and how to begin solving them.

  • Incorrect tensor shapes : Use a debugger.
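A debugger pairs nicely with explicit shape checks, so the failure shows up where the shape first goes wrong and not ten layers later. The sketch below uses plain nested lists as a stand-in for tensors so it runs anywhere. The helpers shape_of and assert_shape are hypothetical names for this example; with a real framework you would compare against the tensor's own shape attribute.

```python
def shape_of(nested):
    """Return the shape of a rectangular nested list, e.g. a 2x3 list -> (2, 3)."""
    shape = []
    while isinstance(nested, list):
        shape.append(len(nested))
        nested = nested[0] if nested else None
    return tuple(shape)


def assert_shape(nested, expected, name="tensor"):
    """Raise a descriptive error if the shape differs. None matches any size."""
    actual = shape_of(nested)
    matches = len(actual) == len(expected) and all(
        want is None or got == want for got, want in zip(actual, expected)
    )
    if not matches:
        raise ValueError(f"{name}: expected shape {expected}, got {actual}")
    return actual
```

For example, assert_shape(batch, (None, 3), name="inputs") accepts any batch size but insists on three features per sample.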

Sometimes your GPU starts cursing at you. Sometimes it's your fault. Sometimes you just forgot to clear the cache. This is for the other times.

Your tensors are too big

  • Reduce your batch size
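One way to take the guesswork out of this is to halve the batch size until a training step fits. The sketch below is only an illustration: Python's built-in MemoryError stands in for whatever out-of-memory error your framework raises, and fake_training_step with its made-up memory budget is a pretend workload, not a real one.

```python
def largest_batch_that_fits(run_step, start_batch_size, min_batch_size=1):
    """Halve the batch size until run_step(batch_size) stops running out of memory."""
    batch_size = start_batch_size
    while batch_size >= min_batch_size:
        try:
            run_step(batch_size)
            return batch_size
        except MemoryError:
            batch_size //= 2  # too big, try half
    raise MemoryError("even the minimum batch size does not fit")


def fake_training_step(batch_size, units_per_sample=3, budget=100):
    """Pretend step (assumption): fails when the batch exceeds a made-up budget."""
    if batch_size * units_per_sample > budget:
        raise MemoryError("batch does not fit")
```

Keep in mind that a smaller batch changes the training dynamics too, so the learning rate may need another look afterwards.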

You have stuffed it with too much data

  • Use an input queue of sorts (Dataloader)
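The point of an input queue is that only the current batch has to be in memory. A minimal sketch of that idea with a plain Python generator follows. The names batch_loader and load_fn are assumptions for this example; every major framework ships its own, far more capable, data loader.

```python
import random


def batch_loader(keys, load_fn, batch_size, shuffle=True, seed=0):
    """Yield batches lazily: samples are loaded only when their batch is requested.

    keys are cheap handles (file paths, row ids) and load_fn turns one key
    into one sample.
    """
    order = list(range(len(keys)))
    if shuffle:
        random.Random(seed).shuffle(order)  # seeded, so runs are repeatable
    for start in range(0, len(order), batch_size):
        chunk = order[start:start + batch_size]
        yield [load_fn(keys[i]) for i in chunk]
```

Because this is a generator, iterating with "for batch in batch_loader(...)" never holds more than one loaded batch at a time.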

You are doing the same thing too many times (Duplicated operations)

  • Memory leaks

My manager wants it today. Save me quick.

Want a quick way to identify a bunch of errors? Just pass the same data batch again and again, and check for these signs. (Talk about a hack.) Basically, just do the opposite if any of these happen.

Error goes up dramatically

  • Flip signs of loss functions or gradients

Your error pretends to be a pinata and explodes

  • Check all your log, exp functions etc
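A classic culprit here is exponentiating a large number before taking the log. The sketch below compares a naive log-sum-exp with the usual "subtract the max first" version. The function names are mine, and plain Python floats stand in for tensors.

```python
import math


def naive_log_sum_exp(xs):
    """Mathematically fine, numerically fragile: exp() overflows for large inputs."""
    return math.log(sum(math.exp(x) for x in xs))


def stable_log_sum_exp(xs):
    """Same quantity, but the largest value is factored out before exponentiating."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))
```

With inputs like [1000.0, 1000.0] the naive version blows up inside math.exp, while the stable one returns 1000 + log(2). Most frameworks already provide a stable log-sum-exp or log-softmax, so prefer those over rolling your own.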

Oscillating error

  • Corrupted labels


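  • Might be too low a learning rate (though a learning rate that is too low usually shows up as an error that plateaus; an oscillating error more often points to one that is too high)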
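The same-batch trick can be sketched in a few lines. This is a toy under stated assumptions: a two-parameter linear model with mean squared error stands in for your network, and fit_one_batch and its grad_sign argument are hypothetical names. The grad_sign argument exists only to reproduce the flipped-sign bug on purpose.

```python
def fit_one_batch(xs, ys, lr=0.05, steps=300, grad_sign=1.0):
    """Run gradient descent on one fixed batch and return the loss at every step."""
    w, b = 0.0, 0.0
    n = len(xs)
    losses = []
    for _ in range(steps):
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        losses.append(sum(e * e for e in errors) / n)  # mean squared error
        grad_w = 2.0 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2.0 * sum(errors) / n
        # grad_sign=-1.0 simulates a sign bug in the loss or the gradients.
        w -= lr * grad_sign * grad_w
        b -= lr * grad_sign * grad_b
    return losses


# One small, fixed batch generated from y = 2x + 1.
BATCH_X = [0.0, 1.0, 2.0, 3.0]
BATCH_Y = [1.0, 3.0, 5.0, 7.0]
```

A healthy setup drives the loss on this one batch close to zero. Run it with grad_sign=-1.0 and the loss climbs instead, which is the "error goes up dramatically" sign from the list.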

How well do you fit?

No, I am not talking about that snazzy dress you got before the lockdown.

Your model cries over test data.

  • Bulk up. Add more layers etc.

Your model just cries anyway.

  • Add more data to your training
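If you are not sure which of the two situations you are in, comparing training and validation error is the usual first check. The sketch below assumes the common reading that bulking up is the remedy for underfitting and more data is the remedy for overfitting. The function diagnose_fit, the target error, and the tolerance are all made-up values for illustration.

```python
def diagnose_fit(train_error, val_error, target_error, gap_tolerance=0.02):
    """Very rough triage based on training error and the train/validation gap."""
    if train_error > target_error + gap_tolerance:
        # The model cannot even fit the training data.
        return "underfitting"  # bulk up: add more layers etc.
    if val_error - train_error > gap_tolerance:
        # Fits the training data but does not generalize.
        return "overfitting"  # add more data to your training
    return "ok"
```

For instance, diagnose_fit(0.30, 0.32, target_error=0.05) points at underfitting, while diagnose_fit(0.04, 0.20, target_error=0.05) points at overfitting.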

What's next?

Firstly. Thank you. And congratulations. You have taken a huge step towards better models. Cheers!

Well, that about covers what I wanted to say here. It is by no means an exhaustive list. But that's why we have Stack Overflow, right? I sincerely hope this helped you out a bit and made you feel a bit more confident. Do let me know!! You can always reach out in the comments or connect with me from my website.


References

Do look at them if you want to learn more. These are the greats :)

  • Full Stack Deep Learning bootcamp.

Nerd For Tech

From Confusion to Clarification

