Automated Machine Learning: Streamlining ML Pipelines

Iris John
Aug 3, 2023


Conventional machine learning involves tasks such as data preprocessing, feature extraction and selection, model selection, model training, hyperparameter optimisation, model validation, deployment, and performance evaluation. AutoML aims to automate this entire process, from data preprocessing to model optimisation, so that even non-experts can apply machine learning to their own use cases without needing deep technical skills or working through unnecessary theory.

In simple terms, AutoML is like a smart program that takes a specific problem statement with a machine learning solution (such as classification, regression, or clustering), along with the associated training data, and outputs a suitable machine learning pipeline that is close to the best possible solution for that problem according to some predefined performance metric. The core goal of AutoML is machine learning for everyone.

Some popular AutoML tools and libraries to know:

  1. Google Cloud AutoML
  2. Auto-Sklearn
  3. Auto-PyTorch
  4. Aible
  5. EvalML
  6. AutoViML

Once we input the dataset, what steps are involved in AutoML?

  • Data Pre-processing: It identifies the data type of each column (boolean, numeric, text, etc.) and also performs task detection, i.e. it identifies what kind of solution the problem requires (for example classification, regression, or clustering).
  • Feature engineering: It includes further pre-processing of the data, such as identifying missing values and skewed data, as well as feature extraction and selection.
  • Model Selection: One of the main steps; it involves finding the right model, or neural network architecture, suited to the data and the problem.
  • Automation & Validation: Here the selected model is trained and its performance is evaluated.
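The detection part of the first step can be pictured with a small heuristic. This is a minimal sketch, not how any particular AutoML tool does it: the function names (`infer_column_type`, `detect_task`, `profile`), the 20-class cut-off, and the rule that a missing target means clustering are all assumptions made for illustration.

```python
from typing import Any, Dict, List, Optional

def infer_column_type(values: List[Any]) -> str:
    """Guess a column's type from its non-missing values (None counts as missing)."""
    present = [v for v in values if v is not None]
    if not present:
        return "unknown"
    if all(isinstance(v, bool) for v in present):
        return "boolean"
    # bool is a subclass of int in Python, so exclude it explicitly.
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in present):
        return "numeric"
    return "text"

def detect_task(target: Optional[List[Any]], max_classes: int = 20) -> str:
    """Guess the kind of ML task from the target column (assumed heuristic rules)."""
    if target is None or all(v is None for v in target):
        return "clustering"  # no usable labels: treat as an unsupervised task
    present = [v for v in target if v is not None]
    kind = infer_column_type(present)
    if kind in ("boolean", "text"):
        return "classification"
    if (kind == "numeric"
            and all(float(v).is_integer() for v in present)
            and len(set(present)) <= max_classes):
        return "classification"  # a few distinct whole numbers look like class labels
    return "regression"

def profile(columns: Dict[str, List[Any]]) -> Dict[str, str]:
    """Map each column name to its guessed type."""
    return {name: infer_column_type(values) for name, values in columns.items()}

data = {"age": [34, 51, None], "member": [True, False, True], "city": ["Pune", "Oslo", "Lima"]}
print(profile(data))                  # {'age': 'numeric', 'member': 'boolean', 'city': 'text'}
print(detect_task([0, 1, 1, 0]))      # classification
print(detect_task([3.2, 4.8, 5.1]))   # regression
print(detect_task(None))              # clustering
```

Real tools use far richer rules (dates, categorical strings, high-cardinality IDs), but the shape is the same: inspect the data first, then decide which family of models to search.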

The core part of any AutoML system is its machine learning pipeline optimisation engine. This consists of three essential parts: a search space, a search strategy, and a performance estimation strategy.
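To make the three parts concrete, here is a toy engine in plain Python. It is a sketch under stated assumptions, not a real AutoML library: the search space is a hypothetical grid for a hand-written k-nearest-neighbours classifier, the search strategy is plain random search (a simpler stand-in for the Bayesian, reinforcement-learning, or gradient-based strategies listed below), and performance is estimated by accuracy on a held-out validation split of synthetic data.

```python
import random
from typing import Callable, Dict, List, Optional, Tuple

Config = Dict[str, object]
Example = Tuple[float, int]  # (feature x, label y)

def make_data(n: int, seed: int) -> List[Example]:
    """Synthetic 1-D data: label is 1 when x > 0.5, with about 10% of labels flipped."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.random()
        y = int(x > 0.5)
        if rng.random() < 0.1:
            y = 1 - y
        data.append((x, y))
    return data

def knn_predict(train: List[Example], x: float, k: int, weighting: str) -> int:
    """Hand-written k-NN: vote among the k closest training points."""
    neighbours = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    votes = {0: 0.0, 1: 0.0}
    for xi, yi in neighbours:
        # "distance" weighting gives closer neighbours a larger vote.
        votes[yi] += 1.0 if weighting == "uniform" else 1.0 / (abs(xi - x) + 1e-9)
    return max(votes, key=votes.__getitem__)

# 1. Search space: hypothetical candidate hyperparameter values.
SEARCH_SPACE: Dict[str, List[object]] = {
    "k": [1, 3, 5, 7, 9],
    "weighting": ["uniform", "distance"],
}

# 2. Search strategy: random search over the space.
def random_search(space: Dict[str, List[object]],
                  evaluate: Callable[[Config], float],
                  n_trials: int, seed: int = 0) -> Tuple[Optional[Config], float]:
    rng = random.Random(seed)
    best_config: Optional[Config] = None
    best_score = float("-inf")
    for _ in range(n_trials):
        config = {name: rng.choice(values) for name, values in space.items()}
        score = evaluate(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score

# 3. Performance estimation: accuracy on a held-out validation split.
def holdout_accuracy(config: Config, train: List[Example], valid: List[Example]) -> float:
    hits = sum(knn_predict(train, x, config["k"], config["weighting"]) == y for x, y in valid)
    return hits / len(valid)

train, valid = make_data(200, seed=1), make_data(100, seed=2)
best, score = random_search(SEARCH_SPACE, lambda c: holdout_accuracy(c, train, valid), n_trials=8)
print("best config:", best, "validation accuracy:", round(score, 2))
```

Swapping `random_search` for a smarter strategy, or `holdout_accuracy` for cross-validation, changes one part without touching the others, which is the point of separating the engine into these three pieces.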

  1. Search Space: a defined collection of machine learning pipelines among which we search for the most suitable one.
  2. Search Strategy: The search space may contain a large number of ML pipelines and architectures, so we need to explore it and find the one that suits the problem best. Some approaches used here are:
  • Bayesian optimisation
  • Reinforcement learning
  • Gradient-based methods, etc.

  3. Performance Estimation Strategy: This step is about figuring out how well a candidate model performs. Some methods involved here are:

  • Lower-fidelity estimates: Train for fewer epochs, train on a subset of the data, or train on downsampled data. These will not give exact results, but they give a useful estimate.
  • Learning curve extrapolation: Train for only a short while and extrapolate the early part of the learning curve to predict final performance. If a candidate is unlikely to reach the minimum expected result, its training can be stopped early and the effort spent on more promising candidates.
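Both ideas show up in successive halving, a common lower-fidelity scheme: every candidate gets a small training budget, only the best fraction survives, and survivors get a larger budget, so unpromising runs are stopped early. The sketch below is illustrative only: the candidates and their learning curves are simulated by a hypothetical `simulated_score` function rather than produced by real training, and `eta=2` (keep the top half, double the epochs) is an assumed setting.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical candidates: name -> (final_quality, tau), where a larger tau means slower learning.
# In a real system these curves would come from actually training each pipeline.
CANDIDATES: Dict[str, Tuple[float, float]] = {
    "a": (0.90, 2.0),
    "b": (0.85, 2.0),
    "c": (0.70, 1.0),
    "d": (0.60, 1.0),
}

def simulated_score(name: str, epochs: int) -> float:
    """Toy learning curve: the score rises towards final_quality as epochs grow."""
    quality, tau = CANDIDATES[name]
    return quality * (1.0 - math.exp(-epochs / tau))

def successive_halving(names: List[str], min_epochs: int = 2, eta: int = 2) -> Tuple[str, int]:
    """Score survivors on a small budget, keep the top 1/eta, multiply the budget by eta.

    Returns the winner and the total epochs spent, counting each round as training from scratch.
    """
    survivors = list(names)
    epochs, spent = min_epochs, 0
    while len(survivors) > 1:
        scores = {n: simulated_score(n, epochs) for n in survivors}
        spent += epochs * len(survivors)
        survivors = sorted(survivors, key=scores.__getitem__, reverse=True)
        survivors = survivors[: max(1, len(survivors) // eta)]
        epochs *= eta
    return survivors[0], spent

winner, epochs_spent = successive_halving(list(CANDIDATES))
print(winner, epochs_spent)  # a 16
```

At 2 epochs, the fast-learning candidate `c` actually scores highest and `a` only survives because half the field is kept; that mis-ranking is exactly why lower-fidelity results are estimates rather than exact values.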

AutoML makes machine learning more accessible: it streamlines the pipeline for non-experts, accelerates AI development, and automates complex tasks such as model selection and hyperparameter tuning, saving time and enabling rapid experimentation. Additionally, by exploring many algorithms and configurations, AutoML can improve model performance, which supports better decision-making, and its automated pipelines make results easier to reproduce.

However, AutoML has limitations. Its “black box” approach can obscure model interpretability, which poses challenges in sensitive domains. It relies heavily on high-quality, diverse training data; insufficient or biased data can result in suboptimal models. Human expertise remains crucial for understanding the data's context, defining problem statements, and interpreting results.
