Part III — Artificial Intelligence: Successfully Navigating from Experimentation to Business Value

Fabio Cesari
YNAP Tech
Nov 25, 2019

This is the last of a three-part series on structured experimentation in Artificial Intelligence (AI).

In this article I will share an example of how we used the tools adopted by the R&D Data Science team in a recent experiment. Please see the first article in the series for an introduction to the subject and the second article for an overview of the tools we use.

As you can imagine, understanding visual content is a big part of the picture for our business (no pun intended). Bringing state-of-the-art deep learning techniques to visual recommendation systems and digital production tasks is an area where the industry is progressing very quickly.

A good example of how this toolchain helped us produce results quickly is a Semantic Segmentation experiment we ran recently, in which both Amazon SageMaker and Neptune played a role.

Semantic Segmentation refers to the task of associating each pixel in an image with a class: it can be interpreted as classification at the pixel level.
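
As a minimal sketch of this idea (the shapes and class names here are illustrative, not from our pipeline): a segmentation network outputs one score per class for every pixel, and the predicted mask is the per-pixel argmax.

```python
import numpy as np

# Hypothetical output of a segmentation network for one image:
# one score per class for every pixel (H x W x num_classes).
H, W, NUM_CLASSES = 512, 512, 3  # e.g. background / person / garment
scores = np.random.rand(H, W, NUM_CLASSES)

# The predicted segmentation mask assigns each pixel its best-scoring class.
mask = scores.argmax(axis=-1)  # shape (H, W), values in {0, 1, 2}
```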

Below you can see that each pixel was classified as belonging to a garment, the person, or the background.

Sample demo app running our segmentation model converted to TFLite format

SageMaker made it very easy for us to run a hyperparameter optimization task over various deep learning models, resulting in 200+ training jobs in a few days.

The automatic model tuning feature of Amazon SageMaker allowed us to easily prepare the experiment, defining which hyperparameters we wanted to tune and the desired number of parallel training jobs.

We used it to explore the hyperparameter space intelligently: a Bayesian optimization model chose hyperparameter combinations based on the results of previous training jobs, and early stopping quickly terminated the jobs that showed no improvement on the chosen objective metric, in our case the precision of the clothing classification on the validation set. Neptune helped us organize, interpret, and communicate the results.
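
For reference, configuring such a tuning job with the SageMaker Python SDK looks roughly like the sketch below. Everything here (container image, role, metric name, regex, ranges) is a hypothetical placeholder, not our actual configuration.

```python
from sagemaker.estimator import Estimator
from sagemaker.tuner import ContinuousParameter, HyperparameterTuner

# Hypothetical generic estimator wrapping a training container.
estimator = Estimator(
    image_uri='<account>.dkr.ecr.<region>.amazonaws.com/segmentation:latest',
    role='<sagemaker-execution-role-arn>',
    instance_count=1,
    instance_type='ml.p3.2xlarge',
)

tuner = HyperparameterTuner(
    estimator=estimator,
    objective_metric_name='val_clothing_precision',
    objective_type='Maximize',
    # The regex tells SageMaker how to extract the metric from the training logs.
    metric_definitions=[{'Name': 'val_clothing_precision',
                         'Regex': 'val_clothing_precision: ([0-9\\.]+)'}],
    # Ranges vary per stage; see Stage 1 below for an example search space.
    hyperparameter_ranges={
        'learning_rate': ContinuousParameter(1e-5, 1e-2, scaling_type='Logarithmic'),
    },
    strategy='Bayesian',         # choose new combinations from past results
    early_stopping_type='Auto',  # stop jobs that show no improvement
    max_jobs=100,
    max_parallel_jobs=4,
)
tuner.fit({'train': 's3://<bucket>/train', 'validation': 's3://<bucket>/validation'})
```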

We divided our experimentation into four stages:

Stage 1

We started by choosing the best-performing learning rate and optimizer algorithm (Adam, NAdam, RMSProp, ...) for a set of network architecture candidates, such as a custom FCN, U-Net, and the popular DeepLabV3+ in its MobileNet and Xception variants, keeping the other hyperparameters fixed.
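
In SageMaker terms, this search space can be expressed as parameter ranges, one tuning job per architecture candidate. This is a hypothetical sketch, not our exact configuration:

```python
from sagemaker.tuner import CategoricalParameter, ContinuousParameter

# Hypothetical Stage 1 search space: explore the optimizer and learning
# rate while all other hyperparameters stay fixed.
stage1_ranges = {
    'optimizer': CategoricalParameter(['adam', 'nadam', 'rmsprop']),
    'learning_rate': ContinuousParameter(1e-5, 1e-2, scaling_type='Logarithmic'),
}
```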

Due to the large performance gap shown by these early results, we decided to discard the FCN and U-Net models and focus on DeepLabV3+. We expected that DeepLabV3+ would perform better than the other models, and it was reassuring to see this confirmed experimentally.

Best learning rate and optimizer for each of the architecture candidates.

Stage 2

Once we had found optimal learning rates and optimizer settings for each of our architecture candidates, we selected the training loss function from a set of candidates: the classical categorical cross-entropy, the weighted BCE Dice loss (a variant of the cross-entropy loss that deals better with class imbalance), and a custom version of the categorical cross-entropy with class-wise tunable weights, referred to below as “clothing_crossentropy_loss”. After tuning the class-wise weights for our custom loss in a separate hyperparameter tuning job, we compared the three loss candidates on a fixed DeepLabV3+ architecture and found that our custom loss performed slightly better than the other candidates.
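
We haven’t published the exact definition of clothing_crossentropy_loss; as an illustration of the general idea, a categorical cross-entropy with a tunable weight per class can be sketched in Keras like this (the weight values are hypothetical placeholders):

```python
import tensorflow as tf

def weighted_categorical_crossentropy(class_weights):
    """Categorical cross-entropy with a tunable weight per class."""
    w = tf.constant(class_weights, dtype=tf.float32)

    def loss(y_true, y_pred):
        # y_true, y_pred: (batch, height, width, num_classes)
        y_pred = tf.clip_by_value(y_pred, 1e-7, 1.0)
        # Weight each class's contribution before summing over the class axis.
        per_pixel = -tf.reduce_sum(y_true * tf.math.log(y_pred) * w, axis=-1)
        return tf.reduce_mean(per_pixel)

    return loss

# Hypothetical weights for background / person / garment classes.
clothing_crossentropy_loss = weighted_categorical_crossentropy([0.5, 1.0, 2.0])
```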

It was close, but ultimately our custom clothing_crossentropy_loss function produced the best results.

Stage 3

We then searched for optimal values of the remaining hyperparameters, such as the data augmentation settings, using the optimal learning rate, optimizer, and loss obtained in the earlier stages, for the DeepLabV3+ MobileNet architecture variant.
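
As a sketch of what tunable augmentation settings can look like (the parameter values and data here are hypothetical; note that for segmentation the same transform must be applied to an image and its mask, which can be done by sharing a seed, as in the Keras documentation):

```python
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Dummy stand-ins for the training images and masks loaded elsewhere.
train_images = np.random.rand(16, 256, 256, 3)
train_masks = np.random.randint(0, 3, (16, 256, 256, 1)).astype('float32')

# Hypothetical augmentation hyperparameters, received from the tuning job.
aug = dict(rotation_range=15,
           width_shift_range=0.1,
           height_shift_range=0.1,
           zoom_range=0.2,
           horizontal_flip=True)

image_gen = ImageDataGenerator(**aug)
mask_gen = ImageDataGenerator(**aug)

# Using the same seed keeps image and mask transformations in sync.
seed = 42
image_flow = image_gen.flow(train_images, batch_size=8, seed=seed)
mask_flow = mask_gen.flow(train_masks, batch_size=8, seed=seed)
train_flow = zip(image_flow, mask_flow)
```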

Searching for optimal data augmentation settings

Stage 4

Finally, we performed a full training of our best architectures using the optimal values for their hyperparameters.

To further monitor our experiments, during training we logged to Neptune several segmentation masks computed on the validation set.
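
With the neptune-client API current at the time of writing, logging a predicted mask as an image looks roughly like the sketch below; the project and channel names are hypothetical, and model and val_image come from the training script.

```python
import neptune
import numpy as np
from PIL import Image

# Hypothetical project and experiment names.
neptune.init(project_qualified_name='my-org/segmentation')
neptune.create_experiment(name='deeplabv3plus-full-training')

# Predict on a validation image; the class map becomes a grayscale image
# (scaled here for 3 classes) that can be inspected in the Neptune UI.
pred = model.predict(val_image[np.newaxis])[0]      # (H, W, num_classes)
mask = pred.argmax(axis=-1).astype(np.uint8) * 85   # 3 classes -> 0/85/170
neptune.log_image('validation_masks', Image.fromarray(mask))
```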

Here are a few examples:

From left to right: input / ground truth / prediction

Summing up, we managed to approach this problem in an orderly fashion and arrive at the best model, which is now readily available (with all auxiliary metadata) to everyone working on this project and to those who will join us in the future.

Model deployment

After selecting the best model, it’s essential to be able to quickly make it available to applications, either as a new model or as a new version of an existing one. With SageMaker this boils down to creating a model endpoint, which a Lambda function consumes through the SageMaker Python SDK and exposes to the outside world through Amazon API Gateway and our CDN / Web Application Firewall.

The main role of API Gateway is to act as a router that directs HTTP REST requests to the appropriate Lambda function. The Lambda function contains the logic needed to handle the request: it parses the input parameters, prepares the input data that is fed into the model, post-processes the model output, and returns it to the client.
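
A minimal sketch of such a handler, using boto3’s SageMaker runtime client rather than the full SageMaker Python SDK (the endpoint name and payload format here are hypothetical assumptions):

```python
import json

import boto3

runtime = boto3.client('sagemaker-runtime')
ENDPOINT_NAME = 'segmentation-v1'  # hypothetical endpoint name


def handler(event, context):
    # API Gateway delivers the request body; we assume a JSON payload
    # containing a base64-encoded image.
    payload = json.loads(event['body'])

    response = runtime.invoke_endpoint(
        EndpointName=ENDPOINT_NAME,
        ContentType='application/json',
        Body=json.dumps({'image': payload['image']}),
    )
    result = json.loads(response['Body'].read())

    # Post-process the model output as needed before returning it.
    return {'statusCode': 200, 'body': json.dumps(result)}
```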

The way we structured the routing mechanism on API Gateway also allows us to easily implement model versioning and deprecation policies, as well as handle monitoring, logging, API keys, and publish documentation in Swagger format.

There’s much more to say on model deployment, scaling, and KPI monitoring; it could be the topic of a future article.

The resulting model can also be used to segment people from the background

Conclusions

In this article we shared our approach to the experimentation process in our R&D Data Science team, which allows us to work quickly, efficiently and transparently.

Thanks to this approach, the team’s productivity has increased significantly: we can run more experiments in less time. Moreover, the transition from a successful experiment to a production-ready API is no longer an issue.

We hope that by following our approach you will be able to create a system that works for your team and bring value to your organization.

Please share your story and tell us which tools you are using to make your team better.


Fabio Cesari is Head of Research & Development at YOOX NET-A-PORTER GROUP.