Tips and Tricks you should know while coding your own Machine Learning Model

Shreyak
BlockSurvey
6 min read · May 1, 2020


A machine learning model is a mathematical representation of a real-world process. To generate a machine learning model, you need to provide training data from which a machine-learning algorithm can learn.

So, while coding your own model, there are a few things that should be taken into account.

1. Use sufficient hardware for training a model - Training a model usually requires a powerful system with plenty of memory (RAM), a capable graphics card, and a fast processor. Using low-end hardware can make training take much longer, and your system may hang or overheat.

According to the Anaconda documentation, the hardware requirements are as follows:

  • CPU: 2 x 64-bit 2.8 GHz 8.00 GT/s CPUs
  • RAM: 32 GB (or 16 GB of 1600 MHz DDR3 RAM)
  • Storage: 300 GB (600 GB for air-gapped deployments). Additional space is recommended if the repository will be used to store packages built by the customer. With an empty repository, a base install requires 2 GB.

2. Choose Linux as your operating system - Python, the undisputed king among languages used for ML, runs best on Linux, where all dependencies can be installed with ease. The same is true for R and Octave, the other popular languages. TensorFlow, which has become one of the most powerful toolkits for deep learning, also runs best on Linux.

3. Choose Anaconda’s Jupyter Notebook or Google Colab as your IDE - Jupyter Notebook is one of the most widely used IDEs for coding and training an ML model locally. When you want to work in the cloud, go with Google Colab. Jupyter Notebook requires a high-specification system, as discussed in point 1, whereas Google Colab only needs a web browser; it even works in your smartphone’s browser.

4. Never run the same training cell more than once in Colab or Jupyter Notebook - It often happens that you want to re-run a cell, but if that cell trains the model, restart the kernel before running the code again. This prevents your model from over-training on the same code.

To restart the kernel, go to the Runtime option in the toolbar and select Restart runtime. It will reconnect to the kernel, and your model will not over-train on the same lines of code. Note that this also clears all your variable declarations.
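Here is a minimal Keras sketch (the model and the dummy data are purely illustrative) showing why re-running only the training cell keeps training the same weights, and how rebuilding the model starts from scratch:

```
import numpy as np
import tensorflow as tf

# Dummy data, only for illustration.
x_train = np.random.rand(100, 10).astype("float32")
y_train = np.random.randint(0, 2, size=(100, 1))

def build_model():
    # Re-creating the model resets its weights.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(32, activation="relu", input_shape=(10,)),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy")
    return model

model = build_model()
model.fit(x_train, y_train, epochs=2, verbose=0)   # first run of the cell

# Re-running only the fit() line continues from the already-trained weights.
# To start fresh, rebuild the model (or restart the runtime/kernel):
model = build_model()
model.fit(x_train, y_train, epochs=2, verbose=0)
```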

5. Use GPU in place of TPU while using Colab - While training a model with a framework like TensorFlow, use GPU as your hardware accelerator, as it is much faster than the TPU or None options available in Colab.

To select the GPU, go to the Runtime option in the toolbar and choose Change runtime type, then pick GPU as the hardware accelerator. You can use the None option if you are not training a model and only want to run a simple Python program.
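If you want to verify that Colab actually picked up the accelerator, a quick check with TensorFlow (assuming a TF 2.x runtime) looks like this:

```
import tensorflow as tf

# After switching the runtime type to GPU, TensorFlow should list at least one device.
gpus = tf.config.list_physical_devices("GPU")
print("GPUs visible to TensorFlow:", gpus)
```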

6. Fine-tune a few layers or only train the classifier - If you have a small dataset, fine-tune only a few layers or train just the classifier. You can also insert Dropout layers after the convolutional layers you are fine-tuning, because this helps combat overfitting in your network.
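As a rough sketch of what this can look like in Keras (VGG16, the layer counts, and the class count here are illustrative choices, not the author’s specific setup):

```
import tensorflow as tf

# Pretrained base with its original classifier removed.
base = tf.keras.applications.VGG16(include_top=False, weights="imagenet",
                                   input_shape=(224, 224, 3), pooling="avg")
base.trainable = False                      # first, train only the new classifier
# To fine-tune just the last few layers instead, unfreeze them selectively:
# for layer in base.layers[-4:]:
#     layer.trainable = True

model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dropout(0.5),           # helps combat overfitting on small datasets
    tf.keras.layers.Dense(10, activation="softmax"),   # e.g. 10 classes
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
```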

7. If your dataset is not similar to the ImageNet dataset - Consider building and training your network from scratch. For problems related to text, you can still use a pre-trained model.

8. Always use normalization layers in your network - If you train the network with a large batch size (say 10 or more), use a BatchNormalization layer. If you train with a small batch size, use an InstanceNormalization layer instead. Note that several authors have found that BatchNormalization improves performance as the batch size increases and degrades it when the batch size is small, whereas InstanceNormalization gives slight performance improvements at small batch sizes. You may also try GroupNormalization.
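A small sketch of how this choice might look in Keras (assuming TensorFlow 2.11 or later, where tf.keras.layers.GroupNormalization is built in; setting groups equal to the number of channels makes it behave like instance normalization):

```
import tensorflow as tf

def conv_block(filters, large_batch):
    layers = [tf.keras.layers.Conv2D(filters, 3, padding="same", use_bias=False)]
    if large_batch:
        # Large batch size (say 10 or more): BatchNormalization.
        layers.append(tf.keras.layers.BatchNormalization())
    else:
        # Small batch size: group norm with groups == channels acts like instance norm.
        layers.append(tf.keras.layers.GroupNormalization(groups=filters))
    layers.append(tf.keras.layers.ReLU())
    return tf.keras.Sequential(layers)

block = conv_block(64, large_batch=False)
```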

9. Use SpatialDropout after a feature concatenation - Suppose you have two or more convolutional layers (say Li) operating on the same input (say F). Because those convolutional layers share the same input, their output features are likely to be correlated. SpatialDropout removes those correlated features and prevents overfitting in the network. Note: it is mostly used in lower layers rather than higher layers.
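A minimal Keras sketch of the idea (the shapes and dropout rate are illustrative):

```
import tensorflow as tf

inputs = tf.keras.Input(shape=(64, 64, 32))                 # the shared input F
branch_a = tf.keras.layers.Conv2D(16, 3, padding="same")(inputs)
branch_b = tf.keras.layers.Conv2D(16, 5, padding="same")(inputs)

# Both branches see the same input, so their feature maps tend to be correlated.
# SpatialDropout2D drops whole feature maps rather than individual activations.
merged = tf.keras.layers.Concatenate()([branch_a, branch_b])
merged = tf.keras.layers.SpatialDropout2D(0.2)(merged)

model = tf.keras.Model(inputs, merged)
```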

10. To determine your network capacity - Try to overfit your network on a small subset of training examples. If it doesn’t overfit, increase your network capacity. Once it overfits, apply regularization techniques such as L1, L2, or Dropout to combat the overfitting.
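For example, a hedged Keras sketch of the “regularize after it overfits” step (the layer sizes and L2 factor are illustrative, and small_x/small_y stand in for your own subset):

```
import tensorflow as tf

# Step 1: sanity-check capacity by overfitting a small subset, e.g.:
# model.fit(small_x, small_y, epochs=200, verbose=0)   # training loss should go near zero

# Step 2: once it overfits, add regularization such as L2 weight decay and Dropout.
regularized = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu",
                          kernel_regularizer=tf.keras.regularizers.l2(1e-4)),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(10, activation="softmax"),
])
```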

11. Another regularization technique is to constrain or bound your network weights. This can also help prevent the exploding-gradient problem, since the weights are always bounded. In contrast to L2 regularization, where you penalize high weights in your loss function, this constraint regularizes your weights directly. You can easily set a weight constraint in Keras.
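In Keras this is typically done with a constraint such as MaxNorm on a layer’s kernel (the bound of 3.0 below is just an illustrative value):

```
import tensorflow as tf

# Bound the kernel weights directly instead of only penalizing them in the loss.
layer = tf.keras.layers.Dense(
    64,
    activation="relu",
    kernel_constraint=tf.keras.constraints.MaxNorm(max_value=3.0),
)
```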

12. Always shuffle your training data - both before training and during training, unless your problem benefits from the temporal order of the data. This may help improve your network’s performance.
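With tf.data, for instance, reshuffling every epoch is a one-liner (the dataset below is a placeholder):

```
import tensorflow as tf

dataset = tf.data.Dataset.range(1000)    # placeholder for your (x, y) dataset
dataset = dataset.shuffle(buffer_size=1000, reshuffle_each_iteration=True).batch(32)

# With NumPy arrays, model.fit(x, y, shuffle=True) also reshuffles each epoch.
```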

13. If your problem domain involves dense prediction (e.g. semantic segmentation), I recommend using Dilated Residual Networks as a pre-trained model, since they are optimized for dense prediction.

14. Apply class weights during training if you have a highly imbalanced dataset - In other words, give more weight to the rare class and less weight to the majority class. The class weights can easily be computed using sklearn. Alternatively, try resampling your training set using over-sampling and under-sampling techniques. This can also help improve the accuracy of your predictions.
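A short sketch of the sklearn computation (the toy labels are only for illustration):

```
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

y_train = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])      # toy imbalanced labels
classes = np.unique(y_train)
weights = compute_class_weight(class_weight="balanced", classes=classes, y=y_train)
class_weight = {int(c): w for c, w in zip(classes, weights)}
print(class_weight)   # the rare class 1 gets a larger weight than class 0

# In Keras: model.fit(x_train, y_train, class_weight=class_weight, ...)
```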

15. Choose the right optimizer - There are many popular adaptive optimizers such as Adam, Adagrad, Adadelta, and RMSprop, and SGD+momentum is widely used across problem domains. There are two things to consider:

First, if you care about fast convergence, use an adaptive optimizer such as Adam, but it may get stuck in a local minimum and provide poor generalization.

Second, SGD+momentum can find a better, closer-to-global minimum, but it relies on a robust initialization and may take longer to converge than the adaptive optimizers. I recommend using SGD+momentum, since it tends to reach better optima.
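In Keras the two choices look like this (the learning rates are common defaults, not tuned values):

```
import tensorflow as tf

adam = tf.keras.optimizers.Adam(learning_rate=1e-3)               # fast convergence
sgd = tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.9)   # often better optima

# model.compile(optimizer=sgd, loss="categorical_crossentropy")
```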

16. Use max-pooling before ReLU to save some computation - Since ReLU thresholds values at zero, f(x) = max(0, x), and max-pooling keeps only the maximum activation, f(x) = max(x1, x2, ..., xi), use Conv > MaxPool > ReLU rather than Conv > ReLU > MaxPool.
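A small Keras sketch of the recommended ordering (the shapes are illustrative); the output is identical to Conv > ReLU > MaxPool because both max-pooling and ReLU preserve ordering, but ReLU now runs on a smaller tensor:

```
import tensorflow as tf

x = tf.keras.Input(shape=(32, 32, 16))
y = tf.keras.layers.Conv2D(32, 3, padding="same")(x)
y = tf.keras.layers.MaxPooling2D(pool_size=2)(y)   # pool first: fewer values to threshold
y = tf.keras.layers.ReLU()(y)

model = tf.keras.Model(x, y)
```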

Conclusion

I have covered 16 different points that should be taken care of while training your own model from scratch. If you have more ideas and tricks, feel free to comment below.

Thanks for reading this article. Please share your thoughts in the comments.

About BlockSurvey

BlockSurvey is a privacy-focused platform for creating surveys, polls, and forms with complete confidentiality. With BlockSurvey, all your data is encrypted end to end, and only you can see it. You own your data; it’s your digital right. There are zero trackers, and we keep you anonymous to data collectors. Our platform utilizes Blockstack and helps maintain privacy and anonymity, resulting in effective surveys and polls. Try it out by creating your surveys and polls with us.
