Process Models for Data Science Projects: CRISP-DM and KDD

Fatih Eren Özçelik
Published in Kodluyoruz
3 min read, Oct 6, 2022

The most common process models for data science projects are CRISP-DM and KDD. The aim of this article is to briefly describe the stages of both models and to identify the differences between them.

Photo by Stephen Dawson on Unsplash

CRISP-DM Stages

CRISP-DM Process Diagram
  1. Business Understanding
    The Business Understanding stage focuses on identifying the requirements of the project. Understanding the requirements and setting goals accordingly shapes the entire project. This stage matters in every project: if it is skipped, the result at the end of the process can be very different from what was wanted.
  2. Data Understanding
    In the Data Understanding stage, initial data is collected according to the needs identified in the previous stage. The collected data is examined and its properties are described, the data is explored in more depth, and finally data quality is assessed to determine how clean the data is.
  3. Data Preparation
    At this stage, the final datasets are prepared before moving on to the modeling stage. The following steps are carried out:
    📌 The datasets to be used are chosen, and the reasons for choosing them are recorded
    📌 Datasets are cleaned and made ready for modeling
    📌 New useful attributes are derived from the datasets
    📌 New datasets are created by combining data collected from multiple sources
    📌 Data is re-formatted according to business needs
  4. Modeling
    Models are built with different techniques, using the information obtained in the previous stages. These models are then tested and assessed, and the cycle repeats until a model gives the desired result.
  5. Evaluation
    At this stage, it is decided which model best meets the needs, and it is checked whether the requirements set in the Business Understanding stage are met.
  6. Deployment
    If all previous stages have been completed and the model is successful, the decision is made to deploy it. A deployment plan is developed, maintenance plans are made for the post-project phase, a final report documents the whole process, and the project is reviewed to note what went well and what could be better.
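
To make the middle stages more concrete, here is a minimal Python sketch of a data quality check (Data Understanding), the preparation steps (Data Preparation), and the build-and-compare loop (Modeling and Evaluation). Everything in it is an assumption made for illustration: the `customers` and `orders` records, the two toy "models", and the `max_error` threshold standing in for a business requirement. It is not a real project setup, and it scores models on the same rows it was given, which a real project would avoid by holding out test data.

```python
from statistics import mean

# Hypothetical records from two made-up sources (illustrative only).
customers = [
    {"id": 1, "age": 34, "city": "Izmir"},
    {"id": 2, "age": None, "city": "Ankara"},  # missing value
    {"id": 3, "age": 51, "city": "Istanbul"},
]
orders = [
    {"id": 1, "total": 120.0, "items": 3},
    {"id": 2, "total": 80.0, "items": 2},
    {"id": 3, "total": 300.0, "items": 4},
]


def missing_ratio(rows, field):
    """Data Understanding: share of rows where `field` is missing."""
    return sum(r[field] is None for r in rows) / len(rows)


def prepare(customers, orders):
    """Data Preparation: clean, combine two sources, derive an attribute."""
    orders_by_id = {o["id"]: o for o in orders}
    prepared = []
    for c in customers:
        order = orders_by_id.get(c["id"])
        if c["age"] is None or order is None:
            continue  # cleaning: drop incomplete rows
        row = {**c, **order}  # combining the two sources
        row["avg_item_price"] = order["total"] / order["items"]  # derived
        prepared.append(row)
    return prepared


def mean_abs_error(model, rows):
    """Average absolute gap between predicted and actual order total."""
    return mean(abs(model(r) - r["total"]) for r in rows)


# Modeling: two toy candidates that predict an order's total.
candidates = {
    "constant_100": lambda r: 100.0,
    "price_per_item_50": lambda r: 50.0 * r["items"],
}


def select_model(candidates, rows, max_error):
    """Evaluation: pick the best candidate and check it against the goal."""
    scores = {name: mean_abs_error(m, rows) for name, m in candidates.items()}
    best = min(scores, key=scores.get)
    return best, scores[best], scores[best] <= max_error


rows = prepare(customers, orders)
best, error, meets_goal = select_model(candidates, rows, max_error=70.0)
print(best, error, meets_goal)
```

If `meets_goal` came back `False`, the project would return to an earlier stage (more data, different attributes, other techniques) rather than move on to Deployment.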

KDD Stages

http://www2.cs.uregina.ca/~dbd/cs831/notes/kdd/1_kdd.html
  1. Selection
    Creating target datasets from the available data or database
  2. Pre-processing
    Cleaning and improving the target datasets, and dealing with faulty or missing data
  3. Transformation
    Converting the pre-processed data into a form that can be used for data mining
  4. Data Mining
    Searching the transformed data for patterns that are relevant to the project goal
  5. Interpretation/Evaluation
    Interpreting, evaluating, documenting and visualizing the patterns found in the cleaned and transformed data, so that people can understand the output more easily
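
The five KDD stages can be read as a chain of functions, each taking the output of the previous one. The sketch below assumes a tiny made-up `database` of purchase records and uses a simple frequency count as the "pattern". Both are placeholders chosen for illustration, as real data mining would use a proper algorithm.

```python
from collections import Counter

# Hypothetical "database" of purchase records (illustrative only).
database = [
    {"customer": "a", "product": "Tea ", "year": 2022},
    {"customer": "b", "product": "coffee", "year": 2022},
    {"customer": "c", "product": None, "year": 2022},  # missing value
    {"customer": "d", "product": "tea", "year": 2021},
    {"customer": "e", "product": "TEA", "year": 2022},
]


def select(records, year):
    """1. Selection: build the target dataset from the available data."""
    return [r for r in records if r["year"] == year]


def preprocess(records):
    """2. Pre-processing: drop records with missing values."""
    return [r for r in records if r["product"] is not None]


def transform(records):
    """3. Transformation: normalize product names into a usable form."""
    return [r["product"].strip().lower() for r in records]


def mine(products):
    """4. Data Mining: look for a pattern (here, plain frequency counts)."""
    return Counter(products)


def interpret(counts):
    """5. Interpretation/Evaluation: summarize the pattern for a reader."""
    product, n = counts.most_common(1)[0]
    total = sum(counts.values())
    return f"Most purchased product: {product} ({n} of {total} purchases)"


def run_kdd(records, year):
    """Run the five stages in order."""
    return interpret(mine(transform(preprocess(select(records, year)))))


print(run_kdd(database, 2022))
# prints: Most purchased product: tea (2 of 3 purchases)
```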

CRISP-DM vs KDD

By Quantum on Medium — Edited

📌 CRISP-DM covers KDD's Selection and Pre-processing stages within a single stage, Data Understanding.

📌 CRISP-DM stages are reversible. When a mistake is found, it is possible to go back to an earlier stage, correct it and make changes without completing the entire cycle.

📌 CRISP-DM differs from KDD in having a Business Understanding stage. With this stage, CRISP-DM covers all the steps of building a reliable data science project, starting from the project's requirements.
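
The point about reversible stages can be sketched as a small transition table. The arrows below follow the widely reproduced CRISP-DM process diagram; the names `TRANSITIONS` and `backward_steps` are made up for this example, and the table is a simplification (the diagram's outer circle, which stands for repeating the whole cycle, is left out).

```python
# Stage -> stages it can lead to, following the arrows of the commonly
# reproduced CRISP-DM diagram (a simplification, for illustration only).
TRANSITIONS = {
    "Business Understanding": {"Data Understanding"},
    "Data Understanding": {"Business Understanding", "Data Preparation"},
    "Data Preparation": {"Modeling"},
    "Modeling": {"Data Preparation", "Evaluation"},
    "Evaluation": {"Business Understanding", "Deployment"},
    "Deployment": set(),
}

# Dicts keep insertion order, so this is the usual stage order 1 to 6.
ORDER = list(TRANSITIONS)


def backward_steps(stage):
    """Return the earlier stages that `stage` can go back to directly."""
    position = ORDER.index(stage)
    earlier = [t for t in TRANSITIONS[stage] if ORDER.index(t) < position]
    return sorted(earlier, key=ORDER.index)


print(backward_steps("Modeling"))    # ['Data Preparation']
print(backward_steps("Evaluation"))  # ['Business Understanding']
```

For example, a model that fails Evaluation sends the project back to Business Understanding, while a problem found during Modeling only needs a return to Data Preparation.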

Photo by Carlos Muza on Unsplash
