How is the machine learning lifecycle different from the software development lifecycle?

Rajesh Hegde
Published in Dblue.ai, Apr 22, 2020
Image credit — https://pixabay.com/photos/staircase-spiral-architecture-600468/

What is the Software Development Life Cycle (SDLC)?

The SDLC is a methodology for developing software projects through clearly defined processes that produce good-quality software. It broadly involves the following phases:

  1. Requirement analysis
  2. Planning
  3. Architecture and sub-system design
  4. Coding
  5. Testing
  6. Deployment
  7. Maintenance

What is the Machine Learning Life Cycle (MLLC)?

Just as software development follows a defined life cycle, machine learning model development needs a similar methodology. The MLLC is a cyclical process that a team goes through to build and manage good-quality models. It broadly involves the following phases:

  1. Business Requirements
  2. Data gathering and preparation
  3. Exploratory Analysis
  4. Training
  5. Model Selection and Verification
  6. Deployment
  7. Monitoring

How is the MLLC different from the SDLC?

Software vs Model

Software is built based on the requirements gathered during the first phase of the SDLC. In machine learning, however, a model is built from a specific dataset. A deployed software system keeps behaving as designed as long as its requirements do not change. That is not the case with machine learning: the underlying characteristics of the data can change, and your model may stop giving the right results.

Well-built software can handle a variety of scenarios, but the same cannot be said of machine learning models. Sooner or later, the data a model was trained on will stop reflecting the data it sees in production; that is inevitable.

Take a model that classifies dogs vs. cats. Over time, your model may be asked to classify a new breed of dog or cat. The model will certainly produce an answer, relating the input to the closest thing it saw during training. But can you be sure the result is correct? Probably not. Over time, the target you are predicting can change, the characteristics of the data can change, or the features used may no longer be sufficient for prediction.
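To make the "closest thing it saw" point concrete, here is a minimal Python sketch of a toy nearest-centroid classifier. The two-number feature vectors, the `TRAIN` data, and the distance-based `is_suspicious` flag are hypothetical illustrations, not how any particular production model works. The point is that the nearest-centroid rule always returns a label, and only an extra check reveals that an input looks unlike the training data.

```python
import math
from statistics import mean

# Hypothetical 2-D feature vectors (e.g. image embeddings reduced to two numbers).
# The values are made up purely for illustration.
TRAIN = {
    "dog": [(1.0, 1.2), (1.1, 0.9), (0.9, 1.0)],
    "cat": [(4.0, 4.1), (4.2, 3.9), (3.8, 4.0)],
}


def fit_centroids(train):
    """Compute one centroid (mean feature vector) per class."""
    return {
        label: tuple(mean(coords) for coords in zip(*points))
        for label, points in train.items()
    }


def fit_radius(train, centroids):
    """Largest distance from any training point to its own class centroid."""
    return max(
        math.dist(p, centroids[label])
        for label, points in train.items()
        for p in points
    )


def predict(x, centroids, radius, tolerance=2.0):
    """Return (label, is_suspicious).

    The label is always the nearest centroid, so the model answers even for
    inputs unlike anything it was trained on. is_suspicious is True when the
    input lies farther than tolerance * radius from every centroid.
    """
    label, dist = min(
        ((lbl, math.dist(x, c)) for lbl, c in centroids.items()),
        key=lambda pair: pair[1],
    )
    return label, dist > tolerance * radius


centroids = fit_centroids(TRAIN)
radius = fit_radius(TRAIN, centroids)
```

With this toy data, `predict((9.0, 9.0), centroids, radius)` still answers `"cat"`, but the flag comes back `True` because the point is far from both training clusters. A plain classifier without such a check would return the same label with no hint that anything is off.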

Maintenance vs Monitoring

In software, monitoring mostly means checking uptime or response time. Whenever a bug is caught, the team fixes it and redeploys. In machine learning, as discussed earlier, the model may return a result that is wrong without raising any error. Monitoring the model's inputs and outputs therefore becomes essential.
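One way to catch wrong results is to compare predictions with ground-truth labels once they become available and track a rolling performance metric. The sketch below assumes such labels eventually arrive and that a baseline accuracy was measured on held-out data before deployment; the class name `RollingAccuracyMonitor`, the window size, and the `max_drop` threshold are hypothetical choices for illustration.

```python
from collections import deque


class RollingAccuracyMonitor:
    """Track accuracy over the most recent `window` labelled predictions.

    Hypothetical helper: assumes ground-truth labels eventually arrive for
    production predictions and a pre-deployment baseline accuracy is known.
    """

    def __init__(self, baseline_accuracy, window=100, max_drop=0.05):
        self.baseline = baseline_accuracy
        self.max_drop = max_drop
        # Oldest outcomes fall off automatically once the deque is full.
        self.outcomes = deque(maxlen=window)  # True means the prediction was correct

    def record(self, predicted, actual):
        """Store whether one prediction matched its ground-truth label."""
        self.outcomes.append(predicted == actual)

    def accuracy(self):
        """Accuracy over the current window, or None if nothing is recorded yet."""
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def degraded(self):
        """True once a full window's accuracy falls more than max_drop below baseline."""
        if len(self.outcomes) < self.outcomes.maxlen:
            return False  # not enough evidence yet
        return self.accuracy() < self.baseline - self.max_drop
```

The monitor waits until the window is full before alerting, so a handful of early mistakes does not trigger a false alarm.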

A few things every operational model needs to monitor:

  • The integrity of the data
  • Distributions of model inputs
  • Package dependencies
  • Model performance metrics
  • Deployment infrastructure performance
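The first two checks in the list above, data integrity and input distributions, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not MLWatch's method: `integrity_issues` and its schema format (field name mapped to an allowed numeric range) are hypothetical, and drift is measured here with the Population Stability Index (PSI) over equal-width bins derived from the training (reference) data. A commonly quoted rule of thumb treats a PSI above roughly 0.25 as a significant shift, but thresholds should be tuned per feature.

```python
import math


def bin_edges(reference, n_bins=10):
    """Interior edges of equal-width bins spanning the reference (training) data."""
    lo, hi = min(reference), max(reference)
    step = (hi - lo) / n_bins
    return [lo + i * step for i in range(1, n_bins)]


def bin_fractions(values, edges, eps=1e-4):
    """Fraction of values per bin; values outside the reference range land in the end bins."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        idx = sum(v > e for e in edges)  # number of edges below v = bin index
        counts[idx] += 1
    total = len(values)
    return [max(c / total, eps) for c in counts]  # eps avoids log(0) for empty bins


def psi(reference, current, n_bins=10):
    """Population Stability Index between a reference sample and a current sample."""
    edges = bin_edges(reference, n_bins)
    expected = bin_fractions(reference, edges)
    actual = bin_fractions(current, edges)
    return sum((a - e) * math.log(a / e) for a, e in zip(actual, expected))


def integrity_issues(row, schema):
    """Hypothetical integrity check: missing fields and out-of-range numeric values."""
    issues = []
    for field, (low, high) in schema.items():
        value = row.get(field)
        if value is None:
            issues.append(f"{field}: missing")
        elif not (low <= value <= high):
            issues.append(f"{field}: {value} outside [{low}, {high}]")
    return issues
```

Comparing a feature against an identical sample gives a PSI of 0, while shifting every value by about half the reference range pushes most observations into bins that were sparse during training and yields a large PSI.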

The ML monitoring tools currently available are still nascent. At Dblue.ai, we think it's high time for a best-in-class tool that automatically detects anomalies in models and data, so we built MLWatch to proactively monitor machine learning models and data.

MLWatch monitors your production machine learning model's predictions in real time and alerts you to performance degradation, data anomalies, and data drift. Build trust in your model by understanding its behavior and gaining more visibility into prediction data. Detect model bias to ensure your models are fair to all segments of the data population.

Originally published at https://dblue.ai on April 22, 2020.


Rajesh Hegde
Dblue.ai

Technology Evangelist | Startup Enthusiast | Co-Founder & CTO at https://dblue.ai | Google Cloud Certified Professional Data Engineer