Practical Pieces of Advice for Structuring Deep Learning Based NLP Experiments
Here I share some techniques I follow when working on Natural Language Processing projects that involve Deep Learning. I believe they will mostly help beginners, and also fellow research students like me.
1. Visualize whatever you can: Deep neural nets are often called a ‘black box’ because they work with millions of floating point parameters, and it is not easy to make sense of what is going on with these numbers. But in science, one of the most fundamental things we care about is WHY. Why something works, and why it does not, should be our primary concern during experiments. I understand this is not as easy as it sounds, but adopting the habit of analysis helps in the long run. Knowing the strengths and weaknesses of a model helps us improve it. I recommend logging and plotting the following at runtime:
a) Training and validation loss
b) Training and validation accuracy (problem dependent; it can be F1 score or another metric. I sometimes plot precision and recall too. You never know what helps.)
c) Learning rate (If it is not fixed over epochs)
d) Layer weights, biases, and gradients (These help a lot to understand what's going on with the network: how it is optimizing, which layers are crucial to optimization, whether there is a vanishing gradient problem, etc.)
I used to plot such things using Matplotlib, but now I use TensorBoard with PyTorch and try to plot everything I can. It’s really cool and helpful.
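To make the logging habit concrete, here is a minimal, framework-free sketch of what recording scalars per step can look like. It is not how TensorBoard works internally; `ScalarLogger`, its CSV layout, and the `grad_norm` helper are hypothetical names made up for illustration. The idea is only that every quantity from the list above gets a tag, a step, and a value that you can plot later.

```python
import csv
from pathlib import Path


class ScalarLogger:
    """Append (tag, step, value) rows to a CSV file for later plotting."""

    def __init__(self, path):
        self.path = Path(path)
        self.path.parent.mkdir(parents=True, exist_ok=True)
        # Write the header only once, when the file is first created.
        if not self.path.exists():
            with self.path.open("w", newline="") as f:
                csv.writer(f).writerow(["tag", "step", "value"])

    def add_scalar(self, tag, value, step):
        # One row per logged value, e.g. ("loss/train", 3, 0.41).
        with self.path.open("a", newline="") as f:
            csv.writer(f).writerow([tag, step, float(value)])


def grad_norm(grads):
    """L2 norm of a flat list of gradient values (illustrative helper)."""
    return sum(g * g for g in grads) ** 0.5
```

In a training loop you would call something like `logger.add_scalar("loss/train", train_loss, epoch)` and `logger.add_scalar("grad_norm/layer1", grad_norm(layer1_grads), epoch)` once per epoch, where `train_loss` and `layer1_grads` come from your own code. A gradient norm that shrinks towards zero in the early layers is one common hint of a vanishing gradient problem.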
2. Know the basics: An easy approach to any DL project is to copy-paste code snippets from the web and run them. Although this sometimes saves time, if you are a beginner I strongly recommend taking some time to learn how things work and implementing them yourself. Your code may well be buggy, you might not understand some concepts, and it can be difficult. But trust me, if you strive to understand things properly and make your code work, you will thank yourself later. It is not necessary to make everything perfect at first. Go slow and learn the technique; perfection will come.
There are a lot of attractive names to utter in this area (ConvNet, LSTM, GRU, ResNet, VGG, etc.). Spend some time getting to know how they actually work, try to understand the math (not 100% necessary, but 100% beneficial), draw some diagrams, think about them. You will start visualizing them soon. I know it is difficult at the beginning, but it is worth it. Spend your time on the basics, not just on running code with libraries. For example, try to learn how different loss functions work. It will help you pick a better one for a given problem, and maybe create a new one (share with us when you do!).
I highly recommend implementing the backpropagation algorithm with minimal tools (NumPy). Write the training loop yourself, create a progress bar for it, create checkpointing routines, code a loss function, do the batching yourself. Trust me, it helps!
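As a starting point for that exercise, here is a small sketch of backpropagation in plain NumPy. The choices are mine, not a prescription: a toy XOR dataset, one hidden layer of 8 tanh units, a sigmoid output with binary cross-entropy, and full-batch gradient descent with a learning rate of 0.5.

```python
import numpy as np

rng = np.random.default_rng(0)

# XOR: a tiny problem that a purely linear model cannot solve.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([[0.0], [1.0], [1.0], [0.0]])

# One hidden layer (8 tanh units) followed by a sigmoid output.
W1 = rng.normal(0.0, 1.0, (2, 8))
b1 = np.zeros((1, 8))
W2 = rng.normal(0.0, 1.0, (8, 1))
b2 = np.zeros((1, 1))
lr = 0.5
losses = []

for step in range(2000):
    # Forward pass.
    h = np.tanh(X @ W1 + b1)
    p = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))
    loss = -np.mean(y * np.log(p + 1e-9) + (1 - y) * np.log(1 - p + 1e-9))
    losses.append(loss)

    # Backward pass: apply the chain rule layer by layer.
    dz2 = (p - y) / len(X)            # gradient w.r.t. the pre-sigmoid output
    dW2 = h.T @ dz2
    db2 = dz2.sum(axis=0, keepdims=True)
    dh = dz2 @ W2.T
    dz1 = dh * (1.0 - h ** 2)         # tanh'(z) = 1 - tanh(z)^2
    dW1 = X.T @ dz1
    db1 = dz1.sum(axis=0, keepdims=True)

    # Plain gradient descent update.
    W1 -= lr * dW1
    b1 -= lr * db1
    W2 -= lr * dW2
    b2 -= lr * db2
```

Printing `losses[0]` and `losses[-1]`, or plotting the whole list, shows whether the gradients are doing their job. A good next step is to compare `dW1` against a numerical finite-difference gradient, which catches most sign and shape mistakes.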
From experience: I used Keras at the very beginning, with Theano as the backend. It's a very nice framework that allowed me to quickly prototype anything, and I really loved it. I followed some shortcuts to do quick experiments without debugging the model. Later, when I started with PyTorch, I had to struggle for a couple of days to match the dimensions between some layers (!), which Keras had handled itself. I was so frustrated. But I took the time to fill the holes in my understanding. Now I am more confident because I know things.
Using a framework saves a lot of effort, but taking some time to learn how the modules work is really important. What seems easy now might not be very fruitful in the future.
3. Create your own framework: It is worth setting yourself the goal of building a nice experimentation framework. You do not need to build it first and then start experimenting. But from the very beginning, keep in mind that you will reuse your code later as a pipeline for other projects. This pushes you to think about good design, to write small, simple, and modular functions, functionally separate scripts and modules, and dynamic pre-processing code, and most importantly it pushes you to learn a lot. Try hard to make your code dynamic. Take time to look at open source codebases, learn good strategies, and adopt them.
When you know every nut and bolt of your big codebase, you will feel really confident.
4. Use a tiny amount of data to test your framework: DL projects are very good at making you waste time just getting things to run correctly. For example, you create a new model, but it does not work, or training stops after a single epoch because of a small bug. With a big dataset, a single epoch can take a long time. So try a few samples of your data (e.g. 100 samples with batch size 8) and check that your model works on them and overfits (which means the model can memorize the data, so with more data it has a chance to generalize), and that your whole framework works fine for training, validating, logging, etc. You can just set a variable to determine the mode of training: in test mode it works on the small data, otherwise on the full dataset.
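The mode switch can be as small as the sketch below. The function names, the `debug` flag, and the default of 100 samples are assumptions for illustration; the only point is that a single variable decides whether the rest of the pipeline sees the tiny subset or the full dataset.

```python
import random


def select_data(samples, debug=False, debug_size=100, seed=0):
    """Return a small fixed subset in debug mode, otherwise the full dataset."""
    if not debug:
        return samples
    rng = random.Random(seed)          # fixed seed: the same subset on every run
    k = min(debug_size, len(samples))
    return rng.sample(samples, k)


def batches(samples, batch_size=8):
    """Yield consecutive batches; the last one may be smaller."""
    for i in range(0, len(samples), batch_size):
        yield samples[i:i + batch_size]
```

With `debug=True`, 100 samples and a batch size of 8 give 13 batches per epoch (12 full ones and a final batch of 4), so a complete train, validate, and log cycle finishes quickly and bugs in the later stages show up early.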
5. Preventing overfitting: There are proven techniques like dropout, regularization, and normalization to prevent overfitting. Try them with different parameters, but know what you are doing. If you set dropout to 0.3, know what it means. If you use batch normalization and dropout after a layer, know which one to apply first. Apart from reading previous work, just run two separate experiments with the order swapped and observe what happens. Try to learn the meaning of parameters like momentum and beta in the optimizers, and lambda in L2 regularization. Getting an idea about these will help you come up with a good parameter setting for good generalization.
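To illustrate what "0.3 as dropout" can mean, here is a sketch of so-called inverted dropout in NumPy. It assumes the convention where `p` is the probability of dropping an element; some older APIs use a keep probability instead, so check your framework's documentation. The function name and signature are my own.

```python
import numpy as np


def dropout(x, p=0.3, training=True, rng=None):
    """Inverted dropout: zero each element with probability p, rescale the rest."""
    if not training or p == 0.0:
        return x                           # no-op at evaluation time
    if rng is None:
        rng = np.random.default_rng()
    keep_mask = rng.random(x.shape) >= p   # True with probability 1 - p
    # Dividing by (1 - p) keeps the expected activation unchanged.
    return x * keep_mask / (1.0 - p)
```

With `p=0.3`, roughly 30% of the activations are set to zero during training and the surviving ones are scaled by 1/0.7, so nothing needs to be rescaled at evaluation time.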
6. Use data generators: Although I suggest writing the data loading and batching process yourself at first, for bigger projects I recommend using the data generators provided by frameworks. They help a lot with memory problems, batching, and shuffling.
Avoid preprocessing the data on every run. Preprocess it once and dump it as JSON or pkl files. It will save time. Nowadays I mostly use JSON files, as they are human readable.
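A minimal sketch of that caching idea, using only the standard library. `load_or_preprocess` and `simple_tokenize` are hypothetical names, and the whitespace tokenizer is just a stand-in for whatever preprocessing your project really needs.

```python
import json
from pathlib import Path


def load_or_preprocess(raw_texts, cache_path, preprocess):
    """Load preprocessed data from a JSON cache, building it on the first run."""
    cache_path = Path(cache_path)
    if cache_path.exists():
        with cache_path.open(encoding="utf-8") as f:
            return json.load(f)
    processed = [preprocess(text) for text in raw_texts]
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    with cache_path.open("w", encoding="utf-8") as f:
        # indent=2 keeps the file easy to read and diff by hand.
        json.dump(processed, f, ensure_ascii=False, indent=2)
    return processed


def simple_tokenize(text):
    """Stand-in preprocessing step: lowercase and split on whitespace."""
    return text.lower().split()
```

One thing to watch: this sketch never invalidates the cache, so if you change the preprocessing you have to delete the file or write to a new cache path yourself.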
7. Scripts vs Jupyter Notebook: I do not use Jupyter notebooks for training models. I only use them to analyse data and visualize things (and sometimes to pre-process data). I won’t go into the details of why, because if you spend some time on these slides, you will get it.
8. Keep backups: It is very normal to get lost among hundreds of experiments, model configurations, and results. Apart from using Git, what I personally do is, for each run, automatically create a directory with the run name and keep in it a backup of the source files that were changed, the logs, the predictions for training and validation data, and the necessary checkpoints.
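One possible shape for such a per-run directory is sketched below. The layout (`src`, `logs`, `predictions`, `checkpoints`), the timestamp suffix, and the function name are assumptions for illustration, and for simplicity the sketch copies whichever source files you pass in rather than detecting which ones changed.

```python
import shutil
import time
from pathlib import Path


def create_run_dir(run_name, source_files, root="runs"):
    """Create <root>/<run_name>_<timestamp>/ and copy the given source files into it."""
    stamp = time.strftime("%Y%m%d-%H%M%S")
    run_dir = Path(root) / f"{run_name}_{stamp}"
    # exist_ok=False: refuse to silently overwrite an earlier run.
    (run_dir / "src").mkdir(parents=True, exist_ok=False)
    for sub in ("logs", "predictions", "checkpoints"):
        (run_dir / sub).mkdir()
    for src in source_files:
        shutil.copy2(src, run_dir / "src" / Path(src).name)
    return run_dir
```

Calling it once at the start of training, for example `run_dir = create_run_dir("bilstm_baseline", ["train.py", "model.py"])` with your own run name and file names, gives every experiment its own place for logs and checkpoints, alongside a snapshot of the code that produced them.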
9. Keep notes: We forget easily. So write appropriate comments in your code and keep a log of experiments (what you changed and what happened). Write down the details of preprocessing steps and data statistics.
Acknowledgement: I learned most of the things mentioned here from reading books and online blogs, from my co-workers at Intel AI last summer, and from my fellow labmates in the RiTUAL group. Thank you all.
I certainly do not know everything, and I may be wrong in several places in this write-up. Please let me know if I need to double-check something, or contribute by letting us know something that helped you.
Thank you for reading!