Why You Need CRISP-DM for Data Science

Jawakar
4 min read · Mar 1, 2020

Hi! Have you ever heard of CRISP-DM? Do you have any idea what it is? If not, don't worry, you have come to the right place. The CRISP-DM method will make you a better data scientist. It was developed mainly to cut the time consumed by data mining, and it is used in a wide variety of business applications and industries. It follows the six-step process listed below.

The six stages of CRISP-DM

  1. Business understanding
  2. Data understanding
  3. Data preparation
  4. Modeling
  5. Evaluation
  6. Deployment

Business Understanding

This is the most important stage because it is the foundation of the whole process. You need good communication skills and a clear understanding of the problem statement given by the stakeholders. The choices you make should be sound and backed by metrics. Consider ethics as well: check whether the data you need is private or public. Stakeholders do not see things the way you do, so communicate clearly with them to make sure you meet their needs.

Data Understanding

Data understanding builds on business understanding. This is the stage where you start collecting data. By now you should know the source of the data, how much of it is available and how good its quality is, and you should take care to identify which data is relevant to the objective.
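A first look at quantity and quality can be as simple as counting rows and missing values. The snippet below is a minimal sketch using only the Python standard library; the records, the column names and the `summarize` helper are made up for illustration and are not part of CRISP-DM itself.

```python
# Hypothetical records; None marks a missing value.
records = [
    {"age": 34, "income": 52000, "city": "Chennai"},
    {"age": None, "income": 61000, "city": "Mumbai"},
    {"age": 45, "income": None, "city": None},
    {"age": 29, "income": 48000, "city": "Delhi"},
]


def summarize(rows):
    """Return the row count and the number of missing values per column."""
    columns = sorted({key for row in rows for key in row})
    missing = {
        col: sum(1 for row in rows if row.get(col) is None)
        for col in columns
    }
    return {"rows": len(rows), "missing": missing}


summary = summarize(records)
print(summary)
```

A summary like this gives you something concrete to discuss with stakeholders before you commit to a data source.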

Data Preparation

Data preparation takes more time than any other stage, up to nearly 70% of the total, and automating the task can reduce this to about 50%. Here you select the format of the data and check whether the data needs to be annotated. The data is then extracted, transformed and loaded. The loaded data should be normalized and standardized as the problem statement requires.
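To make the last step concrete, here is a small sketch of two common transformations, assuming that "normalized" means min-max scaling to the range 0 to 1 and "standardized" means z-scores (zero mean, unit standard deviation). The income values and function names are hypothetical.

```python
from statistics import mean, pstdev


def min_max_normalize(values):
    """Rescale values linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no spread; map everything to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Convert values to z-scores using the population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical feature column
incomes = [48000, 52000, 61000, 55000]
normalized = min_max_normalize(incomes)
standardized = standardize(incomes)
print(normalized)
print(standardized)
```

Which of the two you apply, if any, depends on the problem statement and on the model you plan to use.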

Modeling

Once the data is prepared, it is expressed through appropriate models that give meaningful insights and, hopefully, new knowledge. This is the purpose of data mining: to create knowledge that has meaning and utility. Models reveal patterns and structures within the data that provide insight into the features of interest. Models are selected and fitted on a portion of the data, and adjustments are made if necessary. Model selection is both an art and a science.
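As an illustration of fitting a model on a portion of the data, the sketch below holds back part of a made-up dataset and fits a straight line to the rest with ordinary least squares. The data, the 25% test fraction and the helper names are assumptions chosen for this example, not a prescribed CRISP-DM procedure.

```python
import random


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows and hold back a fraction for testing."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = max(1, int(len(rows) * test_fraction))
    return rows[n_test:], rows[:n_test]


def fit_line(points):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data that follows y = 2x + 1 exactly
data = [(x, 2 * x + 1) for x in range(20)]
train, test = train_test_split(data)
slope, intercept = fit_line(train)
print(len(train), len(test), slope, intercept)
```

The held-back rows are not touched during fitting, which is what makes them useful in the evaluation stage.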

Evaluation

The selected model must be tested. This is usually done by running the trained model on a pre-selected test set. This lets you see how effective the model is on data it has not seen before. The results are used to judge the effectiveness of the model and to decide its role in the next and final stage. If anything at this stage does not meet the requirements, you go back and correct the stage that caused the problem.
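A rough sketch of such a check: score the model on held-out data and compare it with a trivial baseline. Everything here is hypothetical, including the stand-in `model` function, the test set, the baseline value and the choice of mean absolute error as the metric.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def model(x):
    """Stand-in for a trained model; in practice this comes from the modeling stage."""
    return 2.0 * x + 1.0


# Hypothetical held-out (x, y) pairs that were not used for training
test_set = [(1, 3.5), (2, 4.5), (3, 7.0), (4, 10.0)]
actual = [y for _, y in test_set]

model_predictions = [model(x) for x, _ in test_set]

# Baseline: always predict one constant, assumed here to be the training mean.
training_mean = 6.0
baseline_predictions = [training_mean for _ in test_set]

model_mae = mean_absolute_error(actual, model_predictions)
baseline_mae = mean_absolute_error(actual, baseline_predictions)
print(model_mae, baseline_mae)
```

If the model does not clearly beat the baseline on unseen data, that is a signal to revisit an earlier stage.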

Deployment

Deployment is the process of using your newly developed model to make improvements within your organization. The deployed model is also used by stakeholders. The general deployment process consists of several interrelated activities with possible transitions between them, and these activities can occur on the producer side, on the consumer side, or on both. This stage is tied to feedback: feedback is collected and changes are made to the product accordingly.

Conclusion

The CRISP-DM cycle is iterative: every stage can be revisited, and the cycle never really ends, so that the quality of the developed product is maintained. The next time you develop a product, follow the CRISP-DM model to make a better one. Thank you.

References:

  1. https://www.coursera.org/learn/data-science-methodology/supplement/VJHUS/introduction-to-crisp-dm
  2. https://www.ibm.com/support/knowledgecenter/SS3RA7_15.0.0/com.ibm.spss.crispdm.help/crisp_overview.htm