Image by Inzata

Understanding CRISP-DM and its importance in Data Science projects

Zipporah Luna
Published in Analytics Vidhya
2 min read · Jul 21, 2021

A quick overview of CRISP-DM. This is part 1 of a 7-part series summarizing openSAP’s 6-week course Getting Started with Data Science (Edition 2021) by Stuart Clarke.

What is CRISP-DM?

CRISP-DM, or CRoss Industry Standard Process for Data Mining, is a six-phase process model that describes the data science life cycle. It’s like a set of guardrails that helps you plan, organize, and implement your data science (or machine learning) project.

Image by Mark Muir

Why is it important?

A good data science project needs a reliable, repeatable process that people with little data science background can follow and understand easily. This is where CRISP-DM comes in: it gives the whole team a shared set of phases and tasks to work through, so nothing specific to your project gets overlooked.

CRISP-DM has six phases:

1. Business Understanding

  • Determine Business Objectives
  • Assess Situation
  • Determine Data Science Goals
  • Produce Project Plan

2. Data Understanding

  • Collect Initial Data
  • Describe Data
  • Explore Data
  • Verify Data Quality

3. Data Preparation

  • Select Data
  • Clean Data
  • Construct Data
  • Integrate Data
  • Format Data

4. Modeling

  • Select Modeling Technique
  • Generate Test Design
  • Build Model
  • Assess Model

5. Evaluation

  • Evaluate Results
  • Review Process
  • Determine Next Steps

6. Deployment

  • Plan Deployment
  • Plan Monitoring & Maintenance
  • Produce Final Report
  • Review Project

In the next few weeks, I will be providing a summary explanation of each of the phases. Each phase has its own tasks and its own expected outputs. I will also explain how each phase is applied, so that you can see why it is important to follow a project methodology when working on a data science project.

The CRISP-DM methodology does not have to be followed step by step, since different data science projects have different requirements. You can use it as a template to ensure you have considered all of the different aspects specific to your project.
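One way to picture the "template" idea is as a simple checklist in code. The snippet below is only a minimal sketch and is not part of the openSAP course: the names CRISP_DM and remaining_tasks, and the sample project at the end, are made up for illustration. The phases and tasks are the ones listed above, and tasks can be ticked off in any order.

```python
# Phases and tasks as listed in this article (dicts keep insertion order).
CRISP_DM = {
    "Business Understanding": [
        "Determine Business Objectives",
        "Assess Situation",
        "Determine Data Science Goals",
        "Produce Project Plan",
    ],
    "Data Understanding": [
        "Collect Initial Data",
        "Describe Data",
        "Explore Data",
        "Verify Data Quality",
    ],
    "Data Preparation": [
        "Select Data",
        "Clean Data",
        "Construct Data",
        "Integrate Data",
        "Format Data",
    ],
    "Modeling": [
        "Select Modeling Technique",
        "Generate Test Design",
        "Build Model",
        "Assess Model",
    ],
    "Evaluation": [
        "Evaluate Results",
        "Review Process",
        "Determine Next Steps",
    ],
    "Deployment": [
        "Plan Deployment",
        "Plan Monitoring & Maintenance",
        "Produce Final Report",
        "Review Project",
    ],
}


def remaining_tasks(considered):
    """Return, per phase, the tasks not yet marked as considered.

    `considered` is a set of (phase, task) pairs. It can be filled in
    any order, because the phases need not be followed step by step.
    """
    remaining = {}
    for phase, tasks in CRISP_DM.items():
        missing = [task for task in tasks if (phase, task) not in considered]
        if missing:
            remaining[phase] = missing
    return remaining


# Hypothetical project: two tasks were handled, and not in phase order.
considered = {
    ("Data Understanding", "Explore Data"),
    ("Business Understanding", "Determine Business Objectives"),
}

for phase, tasks in remaining_tasks(considered).items():
    print(f"{phase}: {', '.join(tasks)}")
```

A plain dictionary is enough here because the goal is only to see what has not been considered yet, not to enforce an order. A project that does not need a given task could simply mark it as considered.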

For a detailed explanation, enroll in the full 6-week course at https://open.sap.com/courses/ds3.

Zipporah Luna
Analytics Vidhya

Data Analyst | Markets & Competitor Insights | Market Researcher | Football Enthusiast