TWIMLcon 2021: what did we learn?

Glovo Engineering
The Glovo Tech Blog
May 24, 2021

In January 2021, the Glovo Machine Learning Platform team took part in one of the biggest events in the Machine Learning Operations (MLOps) space: the TWIMLcon AI Platforms conference.

Opinions are mixed about conferences that last more than a week (TWIMLcon ran for two weeks), and especially about those held online (which, for obvious reasons, is the norm this year). However, this particular event turned out to be a very useful source of information for the Glovo ML Platform team, despite some purely commercial talks whose subjectivity made their value questionable. Not surprisingly, the conference focused on the “hottest” MLOps components these days: Feature Stores and ML workflow management components.

In this report, we give an overview of the four most exciting talks and share our learnings and thoughts:

How Spotify does ML at scale?

The Spotify use case for ML is in many ways similar to Glovo’s, although it operates at a larger scale. As at Glovo, Spotify has a mostly decentralized data science (DS) structure and tries to link DS projects with company and product goals.

They handle isolated components on their own and dedicate time to building in-house tools for common problems, such as:

  • Feature Store, a purpose-built repository of ML features. Spotify developed its own proprietary tool, Jukebox. It’s an interesting tool to analyse, as we are slowly but steadily moving toward planning a Feature Store as part of the Glovo ML Platform offering.
  • Kubeflow Pipelines, a tool we tested as an ML workflow orchestrator. In the end we chose an alternative solution (Argo Workflows) for various reasons. Using the Kubeflow Pipelines SDK to generate pipelines in Python remains a viable way to make pipeline generation easier for Glovo data scientists.
  • Proprietary Serving Component. It implements a simple and intuitive paradigm: a user submits input to the component (a model or a set of features) and it returns the intuitively expected output (an endpoint or a model).
  • ML Home, a centralized repository of internal ML projects with a user-friendly UI. Spotify sees it as an enabler for team collaboration and a tool for ML project discoverability. Interestingly, we had a similar idea not long ago, and we might come back to it in the long term to provide a seamless centralized solution.

Like many other companies, Spotify rolls out DS products to a small sample of users. What is exceptional is the emphasis they put on the representativeness of that sample, going beyond a simple random selection.
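The talk summary does not say how Spotify builds its samples, so the following is only a minimal sketch of one common way to go beyond simple random selection: stratified sampling. The `stratified_sample` helper, the `country` attribute and the 5% rollout fraction are all hypothetical and chosen for illustration.

```python
import random
from collections import defaultdict


def stratified_sample(users, key, fraction, seed=0):
    """Pick roughly `fraction` of the users from every stratum defined by `key`."""
    rng = random.Random(seed)  # fixed seed so the rollout group is reproducible
    strata = defaultdict(list)
    for user in users:
        strata[key(user)].append(user)

    sample = []
    for _, members in sorted(strata.items()):
        # Take at least one user per stratum so small segments are not lost.
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical user base: 90% of users in one country, 10% in another.
users = [{"id": i, "country": "ES" if i < 900 else "KE"} for i in range(1000)]
rollout = stratified_sample(users, key=lambda u: u["country"], fraction=0.05)
```

With a simple random draw of 50 users, the smaller segment could end up over- or under-represented by chance. Sampling within each stratum keeps the rollout group's proportions aligned with the full user base.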

Unified MLOps Feature Stores and Model Deployments (Splice Machine)

Splice Machine is an open-source tool built on a single database. Conceptually, the idea is so simple that it is surprising how much they can do with it. Using a single database engine, they can provide close to single-digit millisecond latency, be ACID-compliant and handle not only online but also precomputed models in a unified way.

Their solution offers a range of functionality, from model deployment to a feature store. From a use case point of view, Splice Machine is representative of the tools appearing in 2020/2021 that cover the full MLOps cycle, from feature creation to model monitoring. A possible low-risk path would be to start experimenting with the tool using their free community version.

From a Glovo perspective, a feature store (internally developed or not) could be another milestone in making ML work easier for DS, since right now features are calculated per model on an ad hoc basis.
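To illustrate the difference between ad hoc per-model features and shared ones, here is a minimal sketch of the core idea behind a feature store: each feature is defined once and reused by several models. It is a toy in-memory registry, not Splice Machine's or Glovo's implementation, and every name in it (`FEATURES`, `feature`, `build_vector`, the `order_totals` field) is hypothetical.

```python
from typing import Callable, Dict, List

# Hypothetical in-memory registry: feature name -> function computing it from a raw record.
FEATURES: Dict[str, Callable[[dict], float]] = {}


def feature(name: str):
    """Register a feature definition once so that every model can reuse it."""
    def decorator(fn):
        FEATURES[name] = fn
        return fn
    return decorator


@feature("avg_order_value")
def avg_order_value(record: dict) -> float:
    orders = record["order_totals"]
    return sum(orders) / len(orders) if orders else 0.0


@feature("order_count")
def order_count(record: dict) -> float:
    return float(len(record["order_totals"]))


def build_vector(record: dict, names: List[str]) -> List[float]:
    """Assemble a model's input from the shared feature definitions."""
    return [FEATURES[n](record) for n in names]


record = {"order_totals": [10.0, 20.0, 30.0]}
# Two hypothetical models request overlapping features without redefining them.
model_a_input = build_vector(record, ["avg_order_value", "order_count"])
model_b_input = build_vector(record, ["order_count"])
```

A real feature store adds storage, versioning and low-latency serving on top of this, but the single shared definition per feature is what removes the duplicated per-model calculations.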

TWIMLcon: Build an End-to-End ML Workflow for Your Organization

This talk was given by Mohamed Elgendy, VP of Engineering at Rakuten. In his talk, Mohamed analyzed the end-to-end ML workflow and outlined the main current challenges in operationalizing ML in mid-size companies.

In his view, the main challenges that organizations face when scaling their ML operations are:

  • Limited data access
  • Limited ML infra
  • Disconnected ML workflow
  • Technology mismatch
  • Limited visibility
  • ML engineers spend too long tackling technical debt
  • Giant companies build their own ML platforms

Many of the points resonate with the reality that we see in our day-to-day activities at Glovo. Mohamed’s idea of a good ML workflow includes finding answers to the following questions:

  • Do we have a streamlined workflow to handle all the stages of the ML workflow?
  • Are our experiments reproducible?
  • Do we have enough automated builds, training, and deployment?
  • Do our ML systems fail silently?

While possibly not exhaustive, this list made us think about which parts of our solution should be prioritized for internal development and which modules should be purchased. It also prompted us to think about better modelling of our workflows, which we are currently improving using Argo Workflows. One general approach to the build-or-buy question is to develop internally the functionality that touches the end customer or is strategic for the business, and to consider outsourcing the rest.

The Algorithmia view on the 2021 trends

Diego Oppenheimer, the CEO of Algorithmia, gave a talk on what he sees as the most significant ML trends in 2021. He outlined 10 trends, including these for us to reflect on:

  • Despite all the tooling and effort meant to make ML model deployment faster, accumulated technical debt, complications in governance control and integration issues are still driving deployment time up.
  • DSs still spend too much time on deployment. According to Diego, 38% of organizations spent more than 50% of data scientists’ time on deployment, building and tuning ML models.

He also listed the 5 challenges that lead to inefficiencies in the ML lifecycle:

  • Stuck in the lab
  • Disconnected teams
  • Technology mismatch
  • Stakeholder buy-in
  • Hidden technical debt

All of them are absolutely real, and some are very painful for an organization of our size in the e-commerce delivery business.

There was a lot of exciting content presented at TWIMLcon 2021. The conference turned out to be a nice mix of vendors and buyers, so we did not come away with the feeling of a skewed perspective. Overall, it seems that 2021 will be a year of big revelations in ML efficiency, cross-team collaboration and consolidation of the currently fragmented components of an MLOps solution.

The Glovo ML Platform team: Lorenzo Arribas, Amir Hossein, Miguel Fagundez and Maxim Khalilov
