10 tips to improve your machine learning models with TensorFlow

Samuel MERCIER
Published in Decathlon Digital · 8 min read · Oct 13, 2020
Source: Decathlon media

It’s been nearly three years since we began our major AI projects at Decathlon. Since then, we’ve created a number of digital products available to the many. Among them is the Sport Vision API, which you can use to automate image tagging and moderation, and to query the Decathlon catalog from an image.

We’ve probably had as many failures as successes along the way, but what matters most is making every step a learning experience to improve the odds for the next project. To give you a head start on your own AI projects, here are the top 10 tips we’ve learnt for improving machine learning models with TensorFlow.

1) Clean up your dataset

Let’s start with the easy one: the dataset. We all know that we need a proper dataset if we want to reach high accuracy. However, what we noticed along the way is that while having a big dataset is important, having a clean one is even more so.

We learnt this the hard way while building an app to generate insights from Instagram images. The app uses the Sport Vision API to identify the sport practiced in images posted on Instagram, data we then use to identify trending sports.

According to our tool, basketball was a trending sport in all regions of the world, which we found surprising. The reason? When manually cleaning our dataset, we had missed a few images wrongly classified as basketball. As a result, any time the sport in an image was ambiguous, the model assumed it was basketball.

Going over your dataset once, twice or even three times to remove any mistakes is time well spent.

2) Master the art of transfer learning

An example of transfer learning for image classification. Source: Google codelab

Transfer learning means reusing parts of a neural network trained for a similar application, instead of training your neural network from scratch.

A typical example is image classification. In this case, you can begin with a general image classification model (EfficientNet, VGG, Inception, Xception, …). You keep its convolutional layers (the feature extraction part of the model) as is, and only train the feedforward neural network stacked on top of them with your own dataset.

By reusing parts of an existing model, you have fewer weights to update, and can reach a high level of accuracy with a much smaller dataset.

TensorFlow Hub is a very nice source of pretrained models; make sure to browse it before beginning a new project. As a very general rule of thumb, we found that EfficientNet (image classification), EfficientDet (object detection) and the Multilingual Universal Sentence Encoder (NLP tasks) generally perform well, although you should always try a few different ones, as the right model strongly depends on your exact use case.
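
As a minimal sketch of this pattern in Keras, here is what it can look like, assuming EfficientNetB0 from tf.keras.applications as the frozen feature extractor and a hypothetical 10-class head (a TensorFlow Hub module can be dropped in the same way):

```python
import tensorflow as tf

# Load a pretrained feature extractor without its classification head.
# EfficientNetB0 is one illustrative choice; swap in any model you like.
base_model = tf.keras.applications.EfficientNetB0(
    include_top=False, weights="imagenet", input_shape=(224, 224, 3)
)
base_model.trainable = False  # freeze the feature extraction layers

# Only this small head on top of the frozen layers gets trained.
model = tf.keras.Sequential([
    base_model,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),  # hypothetical 10 classes
])

model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
```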

3) Tune your hyperparameters early and often

When building your neural network, you’ll have a lot of arbitrary choices to make: the learning rate, the optimizer, the number of layers, the size of each layer, the activation functions, … A bad choice for any of these, or a bad combination of hyperparameters, can significantly degrade your accuracy.

As such, chances are that you’ll be disappointed in the results when training your first neural network. You may be tempted to assume that the results are bad because you used the wrong approach or your dataset is too small. But in many cases, it’s actually just a question of having the wrong combination of hyperparameters.

To help you find the right hyperparameters, make sure to leverage libraries such as scikit-optimize or keras-tuner instead of an inefficient grid search approach.
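
As an illustration, here is a minimal keras-tuner sketch (the keras_tuner package); the search space, a layer size and the learning rate, and the tuning budget are hypothetical, and Bayesian optimization is just one of the tuners the library offers:

```python
import keras_tuner as kt
import tensorflow as tf

def build_model(hp):
    """Builds a model whose hyperparameters are sampled by the tuner."""
    units = hp.Int("units", min_value=32, max_value=512, step=32)
    learning_rate = hp.Float("learning_rate", 1e-4, 1e-2, sampling="log")

    model = tf.keras.Sequential([
        tf.keras.layers.Dense(units, activation="relu"),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate),
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model

# Bayesian optimization explores the search space far more efficiently
# than a plain grid search.
tuner = kt.BayesianOptimization(build_model,
                                objective="val_accuracy",
                                max_trials=20)
# tuner.search(x_train, y_train, validation_split=0.2, epochs=5)
# best_model = tuner.get_best_models(num_models=1)[0]
```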

4) Learn about dropout, L2 regularization and batch normalization

Along with inadequate hyperparameters (tip 3), overfitting is one of the most common causes of model failure. Overfitting means your model has picked up patterns in the noise of the training dataset, which leads to a misleadingly high training accuracy. It is easily detected by comparing your training accuracy with the accuracy on independent validation and test sets.

Overfitting is generally more pronounced when the training dataset is small. But even with small datasets, there are some clever ways to reduce it. The most common include L2 regularization (to penalize large weight values), dropout (to reduce correlation between neurons) and batch normalization (to stabilize the learning process).

All three approaches are very easy to add to your model, so as soon as you observe a large gap between your training and validation accuracy, don’t hesitate to introduce them.
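
For example, here is a minimal sketch combining the three techniques in a single Keras model (the layer sizes and rates are hypothetical starting points to tune):

```python
import tensorflow as tf
from tensorflow.keras import layers, regularizers

model = tf.keras.Sequential([
    layers.Dense(
        128, activation="relu",
        kernel_regularizer=regularizers.l2(1e-4),  # L2: penalize large weights
    ),
    layers.BatchNormalization(),  # stabilize the learning process
    layers.Dropout(0.3),          # randomly drop 30% of neurons during training
    layers.Dense(10, activation="softmax"),  # hypothetical 10-class output
])
```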

5) Print a confusion matrix

Example of a confusion matrix. Source: plotly

Your model may reach a satisfactory accuracy level, but that figure is just an average over your dataset: it does not consider that some errors are more costly than others.

If you work on a classification task, use a confusion matrix to visualize, for a test dataset, the frequency of errors for each pair of true and predicted labels. Remember the basketball issue discussed in tip 1? With a confusion matrix, we would have easily caught our mistake.

To get you started, you can use scikit-learn to compute your confusion matrix, and Plotly heatmaps to visualize the results.
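
A minimal sketch of that combination, with hypothetical labels and predictions (note that text_auto requires plotly 5.5+):

```python
from sklearn.metrics import confusion_matrix
import plotly.express as px

# Hypothetical true and predicted labels for a small test set.
labels = ["basketball", "soccer", "tennis"]
y_true = ["basketball", "soccer", "tennis", "soccer", "basketball"]
y_pred = ["basketball", "basketball", "tennis", "soccer", "basketball"]

cm = confusion_matrix(y_true, y_pred, labels=labels)

# Plot the matrix as a heatmap, with counts displayed in each cell.
fig = px.imshow(cm, x=labels, y=labels, text_auto=True,
                labels=dict(x="Predicted label", y="True label"))
fig.show()
```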

6) Capitalize on easily accessible computational resources on the cloud

You don’t need to equip yourself with powerful hardware to train machine learning models anymore. There are many resources easily available online, including free GPUs through Google Colab, Kaggle or Paperspace.

If these resources are not sufficient, take the time to compare different providers, as training costs can quickly increase with the resources used. On our team, when a resource like Google Colab becomes insufficient, we simply upgrade to Colab Pro. This gives us priority access to more powerful GPUs (for instance, P100s instead of K80s) or TPUs, which, from our observations, can cut training time by more than a factor of five.

In our case, a resource like Colab Pro was enough to properly train image classification models with more than 150 classes on a few hundred thousand images, tune the hyperparameters and generate the models behind the Sport Vision API.

7) Take the time to properly set up your data pipeline

Sometimes you may observe that the training time does not decrease even when you increase the computational power. In this case, the bottleneck is likely your data feeding pipeline.

As such, it can be a good investment to follow a few tutorials on the TensorFlow Dataset API (tf.data) to master efficient input pipelines. If you plan to use TPUs, also take the time to learn how to properly fetch data from a Google Cloud Storage bucket.

At all costs, avoid old data feeding pipelines, like Keras’ deprecated fit_generator. Learning to create a proper data pipeline is not the most fun subject, but it will significantly increase the efficiency of all your machine learning projects.
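
To give an idea, here is a minimal tf.data sketch of the usual pattern (parallel decoding, caching, shuffling, batching and prefetching); the image paths, labels and sizes are all hypothetical:

```python
import tensorflow as tf

AUTOTUNE = tf.data.AUTOTUNE  # tf.data.experimental.AUTOTUNE on older versions

def decode_image(path, label):
    """Read and decode one image; runs in parallel across CPU cores."""
    image = tf.io.read_file(path)
    image = tf.io.decode_jpeg(image, channels=3)
    image = tf.image.resize(image, (224, 224))
    return image, label

# Hypothetical file paths and integer labels describing your dataset.
image_paths = ["images/ball_0.jpg", "images/ball_1.jpg"]
image_labels = [0, 1]

dataset = (
    tf.data.Dataset.from_tensor_slices((image_paths, image_labels))
    .map(decode_image, num_parallel_calls=AUTOTUNE)
    .cache()                  # avoid re-decoding images every epoch
    .shuffle(buffer_size=1000)
    .batch(32)
    .prefetch(AUTOTUNE)       # overlap preprocessing with training
)
```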

8) Break your code into smaller functions

As data scientists, we often take a few liberties with good programming practices. A poorly commented Jupyter notebook with generic variable names and plenty of copy-pasting may be fine for quick early iterations, but it only gets you so far.

From the beginning of any project, always think in terms of preprocessing (reading your data, cleaning and transforming them), computation (your TensorFlow model) and postprocessing (transforming predictions into the desired output format). Hyperparameter tuning can be seen as a wrapper around your computation block.

Split each block into small tasks (one task = one function, with a maximum of 20–30 lines). Choose explicit and consistent function and variable names, add plenty of comments, use type hinting and follow PEP 8 standards as best you can.
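
As a hypothetical skeleton of those three blocks (the function names, signatures and types are only illustrative):

```python
import pandas as pd
import tensorflow as tf

def preprocess(raw_path: str) -> pd.DataFrame:
    """Read the raw data, clean it and apply the required transforms."""
    ...

def train_model(features: pd.DataFrame,
                learning_rate: float = 1e-3) -> tf.keras.Model:
    """Build and fit the TensorFlow model on the preprocessed features."""
    ...

def postprocess(predictions: tf.Tensor) -> dict:
    """Convert raw model outputs into the desired response format."""
    ...
```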

Following good programming practices may take you a bit more time at the start of a project, but it pays off big in terms of code clarity and reusability. If you are part of a team, it can also greatly improve your relationship with your colleagues ;)

9) Learn to use TensorFlow Serving

Once you have trained a powerful machine learning model, chances are that you will want to make your model accessible to other apps and developers. You can do so by exposing your model through an API.

TensorFlow Serving is probably the easiest and most efficient path to building such an API. It involves saving your model in the SavedModel format, instead of the h5 format we are used to. You can then use the saved_model_cli to get all the information about your SavedModel needed to expose it as a REST API.
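
A minimal sketch of the export step with TF 2.x, using a stand-in model and a hypothetical directory layout (TensorFlow Serving expects a versioned SavedModel directory):

```python
import tensorflow as tf

# A stand-in for your trained model; any Keras model works the same way.
model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])

# Saving to a directory (not a .h5 file) produces a SavedModel;
# TensorFlow Serving picks up the version number ("1" here).
model.save("models/sport_vision/1")

# Inspect the exported signatures from the command line with:
#   saved_model_cli show --dir models/sport_vision/1 --all
```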

To get you started nicely with TensorFlow Serving, you can take a look at some tutorials provided on Google Colab. Once you have deployed your first API, you can move on to more advanced tutorials to investigate how to optimize API response time.

10) Always perform end testing

Visualization of the end testing results for the Sport Vision API

Once you have deployed your machine learning model as an API, make sure to perform some end testing: that is, verify that your API returns the same responses you observed during training.

For instance, when we launched a few endpoints of the Sport Vision API, we observed a slightly lower accuracy than we had during training. After many hours of investigation, we found the culprit: a slight difference between the training and production pipelines in the method used to resize input images. Without proper end testing, we would probably never have identified this issue.
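
As an illustration, here is a minimal end-test sketch against TensorFlow Serving’s REST API; the model path, reference image fixture and endpoint URL are all hypothetical:

```python
import numpy as np
import requests
import tensorflow as tf

# Load the training-time model and a fixed reference input locally.
model = tf.keras.models.load_model("models/sport_vision/1")
reference_image = np.load("tests/reference_image.npy")  # hypothetical fixture

local_prediction = model.predict(reference_image[np.newaxis, ...])

# TensorFlow Serving's REST API expects a JSON body with "instances".
response = requests.post(
    "http://localhost:8501/v1/models/sport_vision:predict",  # hypothetical host
    json={"instances": [reference_image.tolist()]},
)
served_prediction = np.array(response.json()["predictions"])

assert np.allclose(local_prediction, served_prediction, atol=1e-5), \
    "Served predictions differ from training-time predictions"
```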

Bonus tip: keep an eye on trending algorithms and training methods!

When beginning a new project, my suggestion is to always start with the most well-known and robust approaches (CNNs for image classification and RNNs for time-series forecasting, for instance). But once you have a working model and you start your continuous improvement loop, that’s when trying more recent approaches becomes beneficial.

In our case, we are currently investigating graph neural networks to improve our recommendation systems. We are also looking at semi-supervised learning to capitalize on unlabeled data when training classification models.

To keep an eye on trending algorithms and training methods, make a habit of reading at least one Medium article daily and following resources like the Google and Facebook AI blogs.


Thank you for reading!

🙏🏼 If you enjoyed the article, please consider giving it a few 👏👏👏.

💌 Learn more about tech products opened to the community and subscribe to our newsletter on http://developers.decathlon.com

👩‍💻? 👨‍💻? Interested in joining Decathlon Tech Team? Check out https://developers.decathlon.com/careers

A very warm thank you to the Decathlon Canada AI team for their comments and feedback.
