Part II — Artificial Intelligence: Successfully Navigating from Experimentation to Business Value

Fabio Cesari
Published in YNAP Tech
Oct 24, 2019 · 4 min read

This is the second of a three-part series on structured experimentation in Artificial Intelligence (AI).

In this article I will explain why and how we use two tools in our R&D Data Science team. Please see the first article for an introduction to the subject.

Choosing the right tools for the task

Having set our vision and goals, we looked for methodologies and tools that could help us achieve them.

We quickly realised that choosing just one solution that does everything wasn’t going to work. The “No free lunch” theorem applies to software products as well, and end-to-end solutions come at the price of flexibility.

This is why we decided to go with a combination of two products: Amazon Sagemaker and Neptune.ml.

The whole is greater than the sum of its parts

Amazon Sagemaker

We use Amazon Sagemaker as our primary way of running experimentation jobs in the cloud.

We use it to provision notebooks and execute training jobs, with the assurance that libraries and frameworks are always aligned and with the ability to scale as needed. In addition to making it simple to switch between different hardware and environment configurations, a big benefit of using Sagemaker is that the transition from experimentation to training to deployment is almost seamless.
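
To make this concrete, here is a minimal sketch of launching a training job with the SageMaker Python SDK (v2-style arguments). The script name, IAM role, S3 path and hyperparameters are placeholders rather than our actual setup; switching hardware is just a matter of changing instance_type.

from sagemaker.pytorch import PyTorch

role = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"  # placeholder IAM role

# Configure a managed training job; hardware, framework version and
# hyperparameters are all declared in one place.
estimator = PyTorch(
    entry_point="train.py",            # placeholder training script
    role=role,
    framework_version="1.13",
    py_version="py39",
    instance_count=1,
    instance_type="ml.p3.2xlarge",     # switch hardware by changing this value
    hyperparameters={"epochs": 20, "batch_size": 64, "learning_rate": 1e-3},
)

# Launch a fully managed training job on data stored in S3 (placeholder bucket).
estimator.fit({"training": "s3://my-bucket/datasets/example"})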

Moving from lab to production is usually a hassle, as you have to set up and manage the infrastructure to serve models for both real-time and batch inference. Sagemaker simplifies everything to just a few API calls, allowing you to create endpoints that expose trained models or use them to perform batch inference at scale.
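
Continuing the sketch above (still with placeholder names and paths), the same fitted estimator can then serve real-time predictions from a managed endpoint or run batch inference over an S3 prefix:

import numpy as np

# Real-time inference: create a managed HTTPS endpoint for the trained model.
predictor = estimator.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.xlarge",
)
payload = np.zeros((1, 3, 224, 224), dtype="float32")  # placeholder input
result = predictor.predict(payload)

# Batch inference: run the model over every object under an S3 prefix, at scale.
transformer = estimator.transformer(
    instance_count=2,
    instance_type="ml.m5.xlarge",
    output_path="s3://my-bucket/batch-output/",
)
transformer.transform("s3://my-bucket/batch-input/")
transformer.wait()

# Tear down the real-time endpoint once it is no longer needed.
predictor.delete_endpoint()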

Sagemaker covers three stages in our data science pipeline

Automating infrastructure provisioning for research, training and inference saves us a lot of time and energy. Our R&D data science team can focus its efforts on building cutting-edge models rather than on setting up environments or making sure that drivers, frameworks and libraries stay aligned across notebooks, training instances and inference instances.

But running a lot of experiments has a flip side too. Navigating through a large number of experiments without getting lost can get difficult very quickly, which brings us to the second piece of the puzzle.

Neptune.ml

Neptune allows us to track and organize our experiment metadata in a central knowledge repository.

We use Neptune to:

  • Quickly evaluate live and past experiments using key metrics and charts
  • Examine secondary metrics and training logs, enabling us to drill down to understand exactly how the model is performing and even see how the hardware is being used
  • Log attention maps and predictions during training in order to spot problems (data augmentation bugs, inappropriate loss function, etc.)
  • Save model diagnostic and evaluation charts like the ROC curve or the confusion matrix (a few of these logging calls are sketched right after this list)
  • Tag, group and filter experiments, which allows us to easily compare different approaches and keep our work organized
  • Automatically save training code, notebooks and Sagemaker configuration parameters along with experiment data, so that we can easily trace which code version and setup produced a given result
  • Compare experiments across a number of metrics, by visualizing combined charts of the same metrics from different experiments
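
To give a flavour of what a few of these logging calls look like, here is a hypothetical sketch using the neptune-client library; the project name, chart data, property values and tags are illustrative, not taken from our actual pipeline.

import neptune
import matplotlib.pyplot as plt

# Placeholder project and experiment names; the API token is read from the
# NEPTUNE_API_TOKEN environment variable.
neptune.init(project_qualified_name="ynap/experiments")
neptune.create_experiment(name="evaluation-demo", tags=["segmentation"])

# Save an evaluation chart (a toy ROC curve here) as an image in the experiment.
fig, ax = plt.subplots()
ax.plot([0.0, 0.2, 1.0], [0.0, 0.8, 1.0])  # placeholder ROC points
ax.set_xlabel("False positive rate")
ax.set_ylabel("True positive rate")
fig.savefig("roc_curve.png")
neptune.log_image("diagnostics", "roc_curve.png")

# Record which Sagemaker setup produced the results, and tag the run for later filtering.
neptune.set_property("sagemaker_instance_type", "ml.p3.2xlarge")
neptune.append_tag("baseline")

neptune.stop()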

Metrics are stored in custom channels: numerical ones are rendered as charts, whereas images are shown in galleries. Hyperparameters and training code are also stored along with experiment data.

The training code run by Sagemaker interacts with Neptune in a very straightforward way. After importing the Neptune SDK and initializing it with a project name, you can then create a new experiment, add any tags to describe it and start the training process.
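
For example, the top of a hypothetical Sagemaker training script might look like the sketch below. Sagemaker passes the hyperparameters configured for the job as command-line arguments, which we can forward to Neptune as experiment parameters; the project name, argument names and tags are placeholders.

import argparse
import neptune

if __name__ == "__main__":
    # Sagemaker passes the job's hyperparameters as command-line arguments.
    parser = argparse.ArgumentParser()
    parser.add_argument("--epochs", type=int, default=20)
    parser.add_argument("--batch_size", type=int, default=64)
    parser.add_argument("--learning_rate", type=float, default=1e-3)
    args = parser.parse_args()

    # Initialize the Neptune SDK with a project name (placeholder) and create
    # a new experiment, tagged so it is easy to find and group later.
    neptune.init(project_qualified_name="ynap/experiments")
    neptune.create_experiment(
        name="unet-baseline",
        params=vars(args),                  # hyperparameters shown in the Neptune UI
        tags=["segmentation", "sagemaker"],
    )

    # ...training starts here; metrics logged during training show up live in Neptune.

    neptune.stop()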

Metrics, images, logs and other data are pushed to Neptune while training is in progress. For instance, you can calculate custom metrics and push them within Keras callbacks. You can then quantitatively evaluate experiments by ranking them across your custom metrics.
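
As a concrete, simplified example of pushing custom metrics from a Keras callback, the sketch below logs everything Keras reports at the end of each epoch plus one made-up custom metric. It assumes an experiment is already open, as in the previous snippet.

import neptune
import tensorflow as tf

class NeptuneLogger(tf.keras.callbacks.Callback):
    """Push per-epoch metrics (and a custom one) to the current Neptune experiment."""

    def on_epoch_end(self, epoch, logs=None):
        logs = logs or {}
        for name, value in logs.items():
            neptune.log_metric(name, value)  # e.g. loss, val_loss, accuracy
        # An illustrative custom metric: the gap between validation and training loss.
        if "loss" in logs and "val_loss" in logs:
            neptune.log_metric("overfit_gap", logs["val_loss"] - logs["loss"])
        # Prediction images or attention maps could also be logged here with neptune.log_image.

# Usage: model.fit(x_train, y_train, validation_data=(x_val, y_val),
#                  epochs=20, callbacks=[NeptuneLogger()])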

Comparing metrics from different experiments has never been easier

In the next and final part of the series, I will show how we have used these tools for building a Semantic Segmentation service.

Fabio Cesari is Head of Research & Development at YOOX NET-A-PORTER GROUP.