Real-World Machine Learning Pipeline (ML Engineering)

A few months back I gave a presentation highlighting the disconnect between academia and popular courses on one side and enterprise practice on the other in the machine learning world.

Long story short, for most institutes Data Science is all about ML algorithms and some Data Analysis. Data Engineering is lightly touched upon in a handful of colleges.

But for an enterprise implementing a machine learning solution, most of the time and money is spent on Data Collection, Data Cleaning, Data Engineering, Model Deployment, Model Monitoring, DevOps, and stakeholder communication. The ML algorithm is a small fraction of the entire lifecycle.

About 60% of machine learning work goes into getting data ready for the ML algorithm (Data Collection, Data Analysis, and Feature Engineering based on domain understanding), and 25% of the time goes into building frameworks for Model Deployment and Model Monitoring, among others. Hardly 15% of the time is spent writing ML code, which includes feature selection, hyperparameter tuning, model selection, etc.

Interestingly, for that last 15% we also have AutoML frameworks that can help with some or most of it.

The top 5 challenges for enterprises also confirm that ML code (the 15% of the work that academia focuses on), though very important, has never been much of a challenge (thanks to academia for covering it). Major ML implementation challenges include:

  • Data Collection
  • Deploying and Reproducing the model in production
  • Model Monitoring
  • Keeping the model relevant by adapting to changing business scenarios
  • Communicating and interpreting model output for various stakeholders
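
To make the Model Monitoring challenge a little more concrete, here is a minimal sketch of one common way to detect that live data has drifted away from the data a model was trained on: the Population Stability Index (PSI). This is only an illustration and not the method used in any particular enterprise pipeline. The function names, the equal-width binning, the `eps` floor for empty bins, and the sample data are all assumptions made for this example.

```python
import math
from typing import List, Sequence


def bin_fractions(values: Sequence[float], edges: List[float]) -> List[float]:
    """Return the fraction of values in each bin defined by interior edges."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        # Bin index = number of edges at or below v, so values outside the
        # baseline range fall into the first or last bin.
        idx = sum(1 for e in edges if v >= e)
        counts[idx] += 1
    total = len(values)
    return [c / total for c in counts]


def population_stability_index(
    baseline: Sequence[float],
    live: Sequence[float],
    n_bins: int = 10,
    eps: float = 1e-6,
) -> float:
    """PSI of one feature between training-time data and production data.

    Bins are equal-width over the baseline range (an assumption; quantile
    bins are another common choice). Empty bins are floored at eps so the
    logarithm stays defined.
    """
    lo, hi = min(baseline), max(baseline)
    if lo == hi:
        raise ValueError("baseline has no spread; cannot build bins")
    width = (hi - lo) / n_bins
    edges = [lo + i * width for i in range(1, n_bins)]
    base_frac = bin_fractions(baseline, edges)
    live_frac = bin_fractions(live, edges)
    psi = 0.0
    for b, l in zip(base_frac, live_frac):
        b, l = max(b, eps), max(l, eps)
        psi += (l - b) * math.log(l / b)
    return psi


# Hypothetical feature values: training data vs. a shifted production sample.
training = [i / 100 for i in range(100)]
production = [x + 0.5 for x in training]

same_score = population_stability_index(training, training)
drift_score = population_stability_index(training, production)
```

A monitoring job could compute such a score per feature on a schedule and raise an alert when it crosses a threshold agreed with the business. The threshold itself is a judgment call that depends on the use case.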

While there is a lot of content out there, this article consolidates my experience working on Machine Learning and Data Engineering projects for large enterprises over the years. It aims to help students, researchers, and newcomers in this space understand what a real-world ML pipeline looks like in a typical enterprise.

I have been publishing videos on this topic covering individual components of real-world machine learning and will continue doing so for another couple of months. Below is what a typical pipeline looks like in an enterprise executing machine learning and artificial intelligence projects, along with a video explaining the components of ML.

You can also follow my YouTube channel, where I have been creating videos on the end-to-end machine learning lifecycle.

Currently, the Business Understanding, Data Understanding, Data Collection, and Data Analysis videos are already out, and I will be uploading the other components by Jan 2020. You can find most of the videos in my YouTube playlist.

Data Driven Investor

from confusion to clarity, not insanity

Written by Srivatsan Srinivasan

Data Scientist | Data Engineer
