Can AutoML replace Data Scientists?

Learn the key benefits of automated machine learning

Aleix López Pascual
Analytics Vidhya
4 min read · Aug 19, 2021

Introduction

Automated Machine Learning, also known as AutoML, is the automation of the end-to-end process of applying machine learning to real-world problems. A typical machine learning process consists of several steps, including ingesting and preprocessing data, feature engineering, model training, and deployment. In conventional machine learning, every step in this pipeline is executed and monitored by humans. AutoML tools aim to automate one or more stages of the pipeline. This makes it easier for non-experts to build machine learning models, removes repetitive tasks, and lets seasoned machine learning engineers build better models faster.
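To make the idea of automating pipeline stages concrete, here is a minimal sketch in plain Python. It is not how any particular AutoML tool works. The toy dataset, the two preprocessing options, the k-nearest-neighbors model and the function names (`auto_search`, `loo_accuracy`, and so on) are all assumptions made up for illustration. The point is only that a program, rather than a person, tries each combination of preprocessing step and hyperparameter and ranks the results.

```python
from itertools import product
from statistics import mean, pstdev

# Toy data invented for this sketch: ((feature_1, feature_2), label).
DATA = [
    ((1.0, 200.0), 0), ((1.2, 220.0), 0), ((0.9, 180.0), 0), ((1.1, 210.0), 0),
    ((3.0, 190.0), 1), ((3.2, 230.0), 1), ((2.9, 205.0), 1), ((3.1, 215.0), 1),
]


def identity(rows):
    """Preprocessing option 1: leave the features untouched."""
    return [tuple(r) for r in rows]


def standardize(rows):
    """Preprocessing option 2: rescale each column to zero mean and unit variance.

    Simplification: statistics are computed on all rows, including the one
    later held out for validation. A real pipeline would fit them on the
    training split only.
    """
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [tuple((v - m) / s for v, m, s in zip(r, mus, sds)) for r in rows]


def knn_predict(train_x, train_y, query, k):
    """Majority vote among the k nearest training points (squared Euclidean)."""
    ranked = sorted(
        zip(train_x, train_y),
        key=lambda p: sum((a - b) ** 2 for a, b in zip(p[0], query)),
    )
    votes = [label for _, label in ranked[:k]]
    return max(set(votes), key=votes.count)


def loo_accuracy(xs, ys, k):
    """Leave-one-out accuracy: hold out each point once, predict it from the rest."""
    hits = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        hits += knn_predict(train_x, train_y, xs[i], k) == ys[i]
    return hits / len(xs)


def auto_search(data, preprocessors, ks):
    """Evaluate every (preprocessor, k) pair and return the results, best first."""
    rows = [x for x, _ in data]
    ys = [y for _, y in data]
    results = []
    for prep, k in product(preprocessors, ks):
        xs = prep(rows)
        results.append(
            {"preprocess": prep.__name__, "k": k, "accuracy": loo_accuracy(xs, ys, k)}
        )
    return sorted(results, key=lambda r: r["accuracy"], reverse=True)


leaderboard = auto_search(DATA, [identity, standardize], [1, 3, 5])
for row in leaderboard:
    print(row)
```

Real AutoML tools search far larger spaces and use smarter strategies than this exhaustive loop, but the division of labor is the same: the human defines the problem and the search space, and the tool does the repetitive trial and error.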

source: Randal S. Olson et al. Evaluation of a Tree-based Pipeline Optimization Tool for Automating Data Science (2016)

What are the benefits of AutoML?

As data science becomes a more integrated part of our lives, businesses need more solutions in this field and demand more data scientists to build these solutions. Without data science methods, companies might be unable to understand their processes, monitor performance levels, or take certain actions to prevent huge losses.

The demand for data scientists is increasing year over year, and one report indicates that it takes 43–51 days on average to fill a data scientist position. Given the scarcity of data scientists and the time it takes to build data science solutions, AutoML can help businesses meet their demand for data science work and improve the return on investment of machine learning projects.

AutoML solutions are useful for senior data scientists and non-experts alike:

  • For the experienced data scientist: Without AutoML, hours that could go to deeper analysis are lost on necessary but manual tasks such as selecting features and tuning hyperparameters. AutoML seeks to remove those barriers by letting automated processes run in the background while data scientists focus on more complex issues. Thus, by eliminating or minimizing tedious tasks, data scientists are not being replaced but empowered to do the things that only they can do.
  • For non-experts in machine learning: AutoML allows them to create quick and useful projects easily and put them in production without engineering support. In this way, we can significantly increase the number of projects rolled out and make better use of all our data.

The advantages of AutoML can be summed up in three major points:

1. Cost reductions

  • Increased productivity for data scientists by automating repetitive tasks.
  • Democratization of machine learning reduces the demand for data scientists.

2. Fewer errors. Automating the ML pipeline helps avoid the mistakes that can creep in when steps are carried out manually.

3. Speed to production. AutoML can produce an accurate, deployable model faster than a data scientist working by hand, which makes it possible to roll out more projects.

Can AutoML replace Data Scientists?

In a paper published in 2015, Google engineers looked at “technical debt” in ML systems, i.e. the long-term costs and complexities associated with ML solutions. A key observation, not surprising to most practitioners in the field, is that only a small fraction of an ML solution is the actual learning algorithm.

Part of the technical debt that accumulates over time comes from the large amount of “glue code” that is required. “Glue code” is the code needed to get data into and out of ML algorithms. The authors estimate that a mature system might end up being at least 95% glue code and at most 5% actual ML code.

Take a look at the following graphic from the paper to get an idea about the functional blocks and their relative size in a real-world ML solution.

source: Google, Inc. Hidden Technical Debt in Machine Learning Systems (2015)

While AutoML is covering increasingly large portions of the glue code, what it automates is still only a small fraction of a total solution.

Even if we only take a look at model building, data scientists have many advantages when compared to current AutoML approaches:

1. Most AutoML tools optimize for model performance, but that is just one of the requirements of real-life machine learning projects. For example:

  • If a model needs to be embedded in edge devices, computing and storage requirements force companies to choose simpler models.
  • If explainability is desirable, only certain types of models can be used.
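The two examples above amount to treating model selection as a constrained choice rather than a pure accuracy ranking. A minimal sketch of that idea follows. The candidate names, accuracy figures, sizes and interpretability flags are made-up numbers, and `Candidate` and `pick_model` are hypothetical helpers, not part of any AutoML library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    accuracy: float      # validation accuracy (made-up numbers)
    size_mb: float       # size of the serialized model
    interpretable: bool  # whether the team considers the model explainable


# Hypothetical leaderboard, as a performance-only search might produce it.
CANDIDATES = [
    Candidate("gradient_boosting_ensemble", 0.94, 120.0, False),
    Candidate("random_forest", 0.92, 45.0, False),
    Candidate("shallow_decision_tree", 0.88, 0.2, True),
    Candidate("logistic_regression", 0.87, 0.1, True),
]


def pick_model(candidates, max_size_mb=None, require_interpretable=False):
    """Return the most accurate candidate that meets the constraints, or None."""
    feasible = [
        c for c in candidates
        if (max_size_mb is None or c.size_mb <= max_size_mb)
        and (not require_interpretable or c.interpretable)
    ]
    return max(feasible, key=lambda c: c.accuracy, default=None)


# Performance only: the largest ensemble wins.
print(pick_model(CANDIDATES).name)
# Edge device with a 50 MB budget: a smaller model is chosen.
print(pick_model(CANDIDATES, max_size_mb=50).name)
# Explainability required: only the simple models remain.
print(pick_model(CANDIDATES, require_interpretable=True).name)
```

With these illustrative numbers the three calls pick the gradient boosting ensemble, the random forest and the shallow decision tree respectively. Deciding which constraints apply, and how strict they should be, is still a judgment that comes from the project and not from the search itself.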

2. On Kaggle, the largest community for machine learning competitions, humans are still easily beating models generated by AutoML tools. As of this writing, AutoML solutions have yet to win any data science competition.

3. Today's AutoML tools can only handle a limited range of problem types.

AutoML is not a replacement for data scientists.

Curious about AutoML? I recommend reading my second article, where I describe my experience comparing AutoML solutions.

Aleix López Pascual
Analytics Vidhya

Senior Data Scientist @ Glovo | Competitions Expert @ Kaggle | Writer @ Medium | MSc in High Energy Physics, Astrophysics and Cosmology