Orchestrating machine learning experiments for MLOps using Apache Airflow

Andrea Capuano
Published in Analytics Vidhya
5 min read · Jul 26, 2020


As more and more machine learning models go into production, operationalizing the overall machine learning workflow becomes crucial for companies that adopt artificial intelligence capabilities.

We can use the Apache Airflow platform to orchestrate the different phases of a machine learning experiment.

Machine learning experiments usually follow a predefined set of phases, such as:

  • Data ingestion: Collect and integrate data from different sources
  • Data validation: Ensure the collected data is valid and consistent with expectations
  • Data preparation: Validate, preprocess, extract features and transform the data to get it ready for the machine learning task
  • Model training: Train the machine learning models and tune their hyperparameters
  • Model evaluation: Evaluate model performance, then accept or reject the model based on the results
  • Model deployment: Deploy the model to production if its performance in the previous step is acceptable
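The phases above can be sketched as plain Python functions chained in order, with the evaluation step gating deployment. This is a minimal, framework-free illustration under stated assumptions, not the author's implementation: the function names, the toy in-memory data, the one-parameter linear model and the `max_mae=0.5` acceptance threshold are all hypothetical. In an Airflow DAG, each function would typically become its own task (for example, wrapped in a `PythonOperator`) so that each phase can be run, retried and logged separately.

```python
from statistics import mean


def ingest_data():
    # Hypothetical sources: two small in-memory lists of (feature, label) pairs.
    source_a = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2)]
    source_b = [(4.0, 8.1), (5.0, None)]  # the last record is missing its label
    return source_a + source_b


def validate_data(rows):
    # Drop records with missing values; fail loudly if nothing usable is left.
    valid = [(x, y) for x, y in rows if x is not None and y is not None]
    if not valid:
        raise ValueError("no valid rows after validation")
    return valid


def _split_xy(pairs):
    # Separate (feature, label) pairs into a feature list and a label list.
    return [x for x, _ in pairs], [y for _, y in pairs]


def prepare_data(rows):
    # Hold out the last record as a tiny test set (a real split would be larger and shuffled).
    return _split_xy(rows[:-1]), _split_xy(rows[-1:])


def train_model(xs, ys):
    # Least-squares fit of y = slope * x (no intercept): slope = sum(x*y) / sum(x*x).
    slope = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
    return {"slope": slope}


def evaluate_model(model, xs, ys, max_mae=0.5):
    # Mean absolute error on held-out data; accept only below the hypothetical threshold.
    mae = mean(abs(model["slope"] * x - y) for x, y in zip(xs, ys))
    return mae, mae <= max_mae


def deploy_model(model):
    # Placeholder: a real step would push the model to a serving environment.
    return f"deployed model with slope={model['slope']:.3f}"


def run_pipeline():
    # Chain the phases in order; deployment only happens if evaluation accepts the model.
    (train_x, train_y), (test_x, test_y) = prepare_data(validate_data(ingest_data()))
    model = train_model(train_x, train_y)
    mae, accepted = evaluate_model(model, test_x, test_y)
    status = deploy_model(model) if accepted else "model rejected"
    return mae, accepted, status


if __name__ == "__main__":
    print(run_pipeline())
```

Here data flows between phases as return values. Between separate Airflow tasks, only small values are typically exchanged (through XCom), so real pipelines often hand datasets and trained models over through shared storage instead.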

Orchestrating the different phases through a well-defined, repeatable workflow can boost the productivity of your overall machine learning pipelines, both by promoting a well-structured codebase and by providing a way to reproduce the steps systematically. This, in turn, enables capabilities such as continuous training and…
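A repeatable workflow comes down to running each phase only after everything it depends on has finished, in an order fixed by the workflow definition. The toy runner below illustrates that idea with the standard library's `graphlib.TopologicalSorter` (Python 3.9+). It is not Airflow's scheduler, and the task names and the two parallel training branches (for instance, two hyperparameter settings) are hypothetical. In Airflow itself, the same dependencies would be declared between tasks with the bitshift operator, e.g. `prepare >> [train_a, train_b] >> evaluate`.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Each key lists its upstream tasks. Tuples (not sets) keep the declaration order
# fixed, so the resulting execution order is the same on every run.
# Two hypothetical training tasks fan out from data preparation and fan back in
# at evaluation.
PIPELINE = {
    "validate_data": ("ingest_data",),
    "prepare_data": ("validate_data",),
    "train_model_a": ("prepare_data",),
    "train_model_b": ("prepare_data",),
    "evaluate_models": ("train_model_a", "train_model_b"),
    "deploy_model": ("evaluate_models",),
}


def run(pipeline):
    # Visit tasks so that every task comes after all of its upstream tasks.
    # A real task would do actual work here; this sketch only records the order.
    executed = []
    for task in TopologicalSorter(pipeline).static_order():
        executed.append(task)
    return executed


if __name__ == "__main__":
    first = run(PIPELINE)
    second = run(PIPELINE)
    # Same graph definition, same execution order.
    print(first == second, first)
```

Because the order is derived from the declared dependencies rather than from ad hoc scripts, rerunning the workflow (for example on a schedule, for continuous training) repeats exactly the same steps.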
