CRISP-DM Methodology

Published in Analytics Vidhya, Jan 11, 2021

CRISP-DM, the Cross-Industry Standard Process for Data Mining, is a six-step framework for executing any data science / data mining project. Following it keeps the project streamlined. Jumping straight to cleaning & modelling the data is rarely the best way to get a data science project going.

So what are the steps?

Business Understanding
Data Understanding
Data Preparation
Modelling
Evaluation
Deployment

Business Understanding

Business Understanding is the first step in CRISP-DM. Here the analyst / data scientist needs to dig deep into what the customer needs out of the data mining project: the business context, pain points, constraints etc. Next, the success metrics have to be defined. These must be measurable; only then can an improvement be shown with respect to the goals. The tools & techniques likely to be used can be listed based on the client's input. Finally, a flowchart of the process may be created for later reference.
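To make "measurable" concrete, here is a minimal sketch of how a success metric could be written down with a baseline and a target. The `SuccessMetric` class, the churn metric and all the numbers are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SuccessMetric:
    """A measurable project goal: where we are now and where we want to be."""
    name: str
    baseline: float          # value before the project (assumed known)
    target: float            # value agreed with the client
    higher_is_better: bool = True

    def improvement(self, observed: float) -> float:
        """Signed improvement over the baseline (positive means better)."""
        delta = observed - self.baseline
        return delta if self.higher_is_better else -delta

    def is_met(self, observed: float) -> bool:
        """True when the observed value reaches the agreed target."""
        if self.higher_is_better:
            return observed >= self.target
        return observed <= self.target


# Hypothetical goal: bring monthly churn down from 8% to 6%.
churn = SuccessMetric("monthly_churn_rate", baseline=0.08, target=0.06,
                      higher_is_better=False)

print(churn.improvement(0.07), churn.is_met(0.07))
print(churn.improvement(0.055), churn.is_met(0.055))
```

Writing the baseline next to the target means any later result can be reported as an improvement against an agreed number rather than as a bare figure.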

Data Understanding

This is a very important step in CRISP-DM. Understanding the data sources, the data types & the flaws in the source data can be crucial for the performance of any model we apply. To obtain clean data, we need a very good understanding of all the data available to us. It helps to write a clear & concise report that specifies the data sources, formats, features, records, anomalies etc.

Understanding the data can also include univariate, multivariate & distribution analysis. This helps us understand the nature of the data and how the independent variables relate to the dependent variable(s) in the dataset in question.
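As a rough illustration of univariate & bivariate analysis, the sketch below profiles one numeric column, counts a categorical column and computes a Pearson correlation using only the Python standard library. The `records` data and the column names are made up; in practice a library such as pandas would usually do this work.

```python
from collections import Counter
from math import sqrt
from statistics import mean, median, stdev

# Hypothetical toy records; None marks a missing value.
records = [
    {"age": 25, "income": 30000, "segment": "A"},
    {"age": 32, "income": 42000, "segment": "B"},
    {"age": 47, "income": None, "segment": "A"},
    {"age": 51, "income": 88000, "segment": "C"},
    {"age": 38, "income": 56000, "segment": "B"},
]


def profile_numeric(rows, column):
    """Univariate summary of one numeric column, ignoring missing values."""
    values = [r[column] for r in rows if r[column] is not None]
    return {
        "count": len(values),
        "missing": len(rows) - len(values),
        "mean": mean(values),
        "median": median(values),
        "stdev": stdev(values),
        "min": min(values),
        "max": max(values),
    }


def pearson(rows, x, y):
    """Pearson correlation of two numeric columns, using complete pairs only."""
    pairs = [(r[x], r[y]) for r in rows
             if r[x] is not None and r[y] is not None]
    mx = mean(a for a, _ in pairs)
    my = mean(b for _, b in pairs)
    cov = sum((a - mx) * (b - my) for a, b in pairs)
    sx = sqrt(sum((a - mx) ** 2 for a, _ in pairs))
    sy = sqrt(sum((b - my) ** 2 for _, b in pairs))
    return cov / (sx * sy)


print(profile_numeric(records, "income"))         # univariate: numeric
print(Counter(r["segment"] for r in records))     # univariate: categorical
print(round(pearson(records, "age", "income"), 3))  # bivariate
```

The missing-value count and the min / max already point at the anomalies that the data understanding report should mention.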

Data Preparation

This is the stage where we put our business understanding & data understanding to work. Based on business logic, constraints etc., we source the data from data lakes, databases, clusters or wherever it is available. We then use the available tools & technologies to prepare a dataset suitable for modelling, using joins, integration etc. We also do things like missing value imputation, outlier treatment, transformation, standardization, encoding & feature engineering (including creating new variables).
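A few of these preparation steps can be sketched in plain Python as below: median imputation, outlier capping, standardization and one-hot encoding. The `raw` rows and the `INCOME_CAP` rule are assumptions made for the example, not part of CRISP-DM itself, and a real project would normally lean on a data preparation library.

```python
from statistics import mean, median, pstdev

# Hypothetical raw rows; None marks a missing value.
raw = [
    {"income": 30000, "segment": "A"},
    {"income": 42000, "segment": "B"},
    {"income": None, "segment": "A"},
    {"income": 900000, "segment": "C"},  # suspiciously large value
    {"income": 56000, "segment": "B"},
]
INCOME_CAP = 100_000  # assumed business rule used for outlier treatment


def impute_median(values):
    """Replace missing values with the median of the observed ones."""
    fill = median(v for v in values if v is not None)
    return [fill if v is None else v for v in values]


def clip(values, lower, upper):
    """Outlier treatment: cap every value to the range [lower, upper]."""
    return [min(max(v, lower), upper) for v in values]


def standardize(values):
    """Rescale to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def one_hot(labels):
    """Encode categorical labels as 0/1 indicator columns."""
    categories = sorted(set(labels))
    rows = [[1 if label == c else 0 for c in categories] for label in labels]
    return rows, categories


incomes = impute_median([row["income"] for row in raw])
incomes = clip(incomes, 0, INCOME_CAP)
scaled = standardize(incomes)
encoded, categories = one_hot([row["segment"] for row in raw])

# One feature row per record: scaled income followed by the segment flags.
prepared = [[z] + flags for z, flags in zip(scaled, encoded)]
print(categories)
print(prepared)
```

The order matters here: imputing and capping before standardizing keeps the single extreme value from dominating the mean and standard deviation.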

Data Modelling

Now that we have cleaned data with the right features in it, we are ready to apply modelling techniques to solve our specific business problem(s). Usually several ML algorithms, chosen based on the output required, are applied here so that at a later stage we can compare the outcome of each.
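The idea of fitting several candidate models on the same data can be sketched as follows. The two "algorithms" here, a mean baseline and a one-variable least-squares line, are deliberately simple stand-ins, and the `train` data is invented; a real project would plug in the ML algorithms suited to its problem.

```python
from statistics import mean

# Hypothetical training data: (feature, target) pairs.
train = [(1, 2.1), (2, 3.9), (3, 6.2), (4, 7.8), (5, 10.1)]


def fit_mean_baseline(data):
    """Baseline model: always predict the average training target."""
    avg = mean(y for _, y in data)
    return lambda x: avg


def fit_linear(data):
    """Ordinary least squares for a single feature: y = intercept + slope * x."""
    mx = mean(x for x, _ in data)
    my = mean(y for _, y in data)
    slope = (sum((x - mx) * (y - my) for x, y in data)
             / sum((x - mx) ** 2 for x, _ in data))
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


# Every candidate shares one interface: fit(data) returns a predict function.
candidates = {"mean_baseline": fit_mean_baseline, "linear": fit_linear}
models = {name: fit(train) for name, fit in candidates.items()}

for name, predict in models.items():
    print(name, round(predict(6), 2))
```

Keeping a trivial baseline among the candidates gives the evaluation step something to compare the more complex models against.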

Model Evaluation

Now we have model(s), so we can evaluate their results using various metrics. Make a summary report of the outcomes of each model & rank the models according to their output & our business understanding. Discuss the results with the client & other stakeholders for more clarity.
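A summary report of this kind might look like the sketch below, which scores each model's predictions on held-out data with MAE and RMSE and ranks them. The `actual` values, the model names and their predictions are hypothetical, and ranking by RMSE is just one possible choice; the right metric depends on the business goal.

```python
from math import sqrt

# Hypothetical held-out targets and the predictions of three candidate models.
actual = [10.0, 12.0, 15.0, 11.0]
predictions = {
    "baseline": [12.0, 12.0, 12.0, 12.0],
    "model_a": [10.5, 11.5, 14.0, 11.5],
    "model_b": [9.0, 13.5, 15.5, 10.0],
}


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error; penalizes large misses more than MAE."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# One row per model, ranked from best (lowest RMSE) to worst.
report = sorted(
    ({"model": name, "mae": mae(actual, pred), "rmse": rmse(actual, pred)}
     for name, pred in predictions.items()),
    key=lambda row: row["rmse"],
)

for rank, row in enumerate(report, start=1):
    print(rank, row["model"], round(row["mae"], 3), round(row["rmse"], 3))
```

A table like this is easy to walk through with the client, and any model that fails to beat the baseline can be dropped from the discussion.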

Deployment

This is the final but important step. We have the results, we present them to the client & once approved the model is pushed to the production environment. The process does not finish here, though. The performance of the model in production has to be monitored & if there are any issues, maintenance needs to be done, again following the CRISP-DM process.
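One simple way to watch a deployed model is to track its recent prediction error and raise a flag when it drifts past an agreed limit. The sketch below assumes that actual outcomes arrive some time after each prediction; `PerformanceMonitor`, its threshold and the sample stream are all hypothetical.

```python
from collections import deque


class PerformanceMonitor:
    """Tracks the mean absolute error over the most recent predictions."""

    def __init__(self, threshold, window=3):
        self.threshold = threshold          # acceptable mean absolute error
        self.errors = deque(maxlen=window)  # keeps only the last `window` errors

    def record(self, actual, predicted):
        """Store one outcome and report whether the model needs a review."""
        self.errors.append(abs(actual - predicted))
        return self.needs_review()

    def needs_review(self):
        """True once the window is full and its mean error exceeds the threshold."""
        if len(self.errors) < self.errors.maxlen:
            return False
        return sum(self.errors) / len(self.errors) > self.threshold


monitor = PerformanceMonitor(threshold=2.0, window=3)

# Hypothetical stream of (actual, predicted) pairs observed in production.
stream = [(10, 10.5), (12, 11), (15, 14), (20, 14), (22, 15)]
flags = [monitor.record(actual, predicted) for actual, predicted in stream]
print(flags)
```

When the flag turns on, that is the signal to go back to the earlier CRISP-DM steps, for example to check whether the incoming data has changed.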

Please note: though the steps in CRISP-DM are listed sequentially, each step has a feedback loop. This means that when there are issues in the current step, we should go back to the previous steps, as many times as needed, until we have what is required for each step.


Written by Deepak P Nair