CRISP-DM: A structured approach to data-driven decision making

Data Mastery Series — Episode 1: CRISP-DM

Donato_TH
Donato Story
4 min read · Jan 17, 2023

If you are interested in articles related to my experience, please feel free to contact me: linkedin.com/in/nattapong-thanngam

CRISP-DM framework (Image by Author)

CRISP-DM (Cross-Industry Standard Process for Data Mining) is a structured approach to data mining projects that is widely adopted and recognized as a best practice in the industry. It includes six phases: Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation, and Deployment. The framework gives data scientists, analysts, and stakeholders a shared structure for aligning project goals, objectives, and deliverables.

1. Business Understanding:

1. Business Understanding (Image by Author)

The first phase of CRISP-DM is Business Understanding. This phase aligns the project's goals and objectives with the business context. It involves defining the business objectives; identifying evaluation metrics, key stakeholders, resources, and timelines; and identifying impacts, risks, and constraints. Business Understanding is not a one-off step: it requires ongoing review and collaboration to keep the project aligned with business goals and objectives.

2. Data Understanding:

2. Data Understanding (Image by Author)

The second phase of CRISP-DM is Data Understanding. This phase identifies which data is available, which data is still needed, and what each field means, including the symbols or codes used to record it. It also includes exploratory data analysis to understand data characteristics such as quality, completeness, and distribution. The Data Understanding phase requires collaboration between the data and business teams so that both agree on the meaning and notation of the data.
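
As a rough illustration of checking completeness and distribution, here is a minimal sketch that profiles a single numeric column using only the Python standard library. The `profile_column` helper and the `order_qty` sample values are hypothetical and not part of CRISP-DM itself; on a real project a data-frame library would usually do this job.

```python
from statistics import mean, median


def profile_column(values):
    """Summarise completeness and basic distribution for one numeric column.

    Missing entries are assumed to be represented as None.
    """
    present = [v for v in values if v is not None]
    summary = {
        "count": len(values),
        "missing": len(values) - len(present),
        "completeness": len(present) / len(values) if values else 0.0,
        "distinct": len(set(present)),
    }
    if present:
        summary.update(
            min=min(present),
            max=max(present),
            mean=mean(present),
            median=median(present),
        )
    return summary


# Hypothetical sample: order quantities with two missing entries
order_qty = [10, 12, None, 11, 250, 9, None, 10]
report = profile_column(order_qty)
print(report)
```

In this sample the maximum (250) sits far from the median (10.5), which is the kind of finding worth raising with the business team before moving on: it may be a data entry error or a genuine bulk order.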

3. Data Preparation:

3. Data Preparation (Image by Author)

The third phase of CRISP-DM is Data Preparation. This phase cleans, transforms, and prepares data for modeling. It includes handling missing values and outliers, handling imbalanced data, and feature engineering and feature selection. Good feature engineering and selection, grounded in business and data understanding, improve modeling outcomes.
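
Two of the cleaning steps mentioned here, filling missing values and taming outliers, can be sketched in a few lines. This is only one possible approach, assuming median imputation and clipping at 1.5 times the interquartile range; both choices, the helper names, and the sample values are illustrative assumptions, and the right treatment depends on what the business and data understanding phases revealed.

```python
from statistics import median, quantiles


def impute_median(values):
    """Replace missing entries (None) with the median of the observed values."""
    present = [v for v in values if v is not None]
    fill = median(present)
    return [fill if v is None else v for v in values]


def clip_outliers_iqr(values, k=1.5):
    """Clip values lying more than k * IQR outside the first/third quartile."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, low), high) for v in values]


# Hypothetical raw column with two missing entries and one extreme value
raw = [10, 12, None, 11, 250, 9, None, 10]
filled = impute_median(raw)
cleaned = clip_outliers_iqr(filled)
print(filled)
print(cleaned)
```

Clipping keeps the row but limits its influence. Dropping the row or keeping it unchanged are equally valid options when the extreme value turns out to be an error or a real event.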

4. Modeling:

4. Modeling (Image by Author)

The fourth phase of CRISP-DM is Modeling. This phase selects appropriate modeling techniques and fine-tunes them to find the best model. It also includes comparing different models, selecting the best one, and testing models on unseen data. Model accuracy is important, but understanding how each feature affects the predictions is also necessary for decision-making.
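
To make the idea of comparing models on unseen data concrete, here is a deliberately tiny sketch. It fits two toy models on a training set (a majority-class baseline and a one-feature threshold rule) and scores both on a held-out test set. The models, function names, and data are all hypothetical stand-ins for whatever techniques a real project would use.

```python
from collections import Counter


def fit_majority(train):
    """Baseline: always predict the most common label in the training data."""
    label = Counter(y for _, y in train).most_common(1)[0][0]
    return lambda x: label


def fit_threshold(train):
    """One-feature rule: pick the cut-off with the best training accuracy."""
    best_cut, best_acc = None, -1.0
    for cut in sorted({x for x, _ in train}):
        acc = sum((x >= cut) == bool(y) for x, y in train) / len(train)
        if acc > best_acc:
            best_cut, best_acc = cut, acc
    return lambda x: int(x >= best_cut)


def accuracy(model, rows):
    """Share of rows where the model's prediction matches the label."""
    return sum(model(x) == y for x, y in rows) / len(rows)


# Hypothetical data: (feature value, label)
train = [(1, 0), (2, 0), (3, 0), (4, 0), (5, 1), (6, 1), (7, 0), (8, 1)]
test = [(2, 0), (3, 0), (6, 1), (9, 1)]  # held out, never used for fitting

candidates = {"majority": fit_majority(train), "threshold": fit_threshold(train)}
scores = {name: accuracy(model, test) for name, model in candidates.items()}
best = max(scores, key=scores.get)
print(scores, best)
```

The baseline matters: a candidate model is only worth keeping if it beats the trivial alternative on data it has not seen. The threshold rule is also easy to explain to stakeholders, which ties back to the point about understanding feature impact.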

5. Evaluation:

5. Evaluation (Image by Author)

The fifth phase of CRISP-DM is Evaluation. This phase selects the most suitable metrics and measures model performance against the success criteria. It also includes understanding the key parameters that affect the model and evaluating the model's business impact. Metric selection is key to evaluating model performance, understanding impact, and assessing business value.
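
A small example shows why the choice of metric matters. The sketch below computes accuracy, precision, recall, and F1 for a hypothetical model that always predicts the negative class on an imbalanced data set. The labels and the recall target of 0.7 are made-up values for illustration only.

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 for binary labels (0/1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical imbalanced labels: only 2 positives out of 10
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
always_negative = [0] * 10  # a "model" that never flags a positive case

metrics = classification_metrics(y_true, always_negative)
meets_target = metrics["recall"] >= 0.7  # assumed success criterion
print(metrics, meets_target)
```

Here accuracy is 0.8 even though the model never finds a single positive case, so recall is 0.0. If the business objective is to catch the rare positive cases, accuracy alone would give a misleading picture.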

6. Deployment:

6. Deployment (Image by Author)

The final phase of CRISP-DM is Deployment. This phase puts the model into production and creates documentation and user guides. It also includes developing a monitoring and maintenance plan and evaluating the business impact and return on investment. Regular monitoring, evaluation, and improvement of model performance after deployment are essential to ensure the model's continued effectiveness.
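
One simple way to picture post-deployment monitoring is a rolling accuracy check that raises a flag when recent performance drops below an agreed level. The sketch below is a minimal, hypothetical example: the `AccuracyMonitor` class, the window size, and the 0.8 threshold are assumptions, and a production setup would normally also track data drift and business KPIs.

```python
from collections import deque


class AccuracyMonitor:
    """Track accuracy over the most recent predictions with known outcomes."""

    def __init__(self, window=50, threshold=0.8):
        self.outcomes = deque(maxlen=window)  # oldest entries drop off
        self.threshold = threshold

    def record(self, prediction, actual):
        """Store whether a prediction matched the outcome observed later."""
        self.outcomes.append(prediction == actual)

    def accuracy(self):
        """Rolling accuracy, or None if nothing has been recorded yet."""
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def needs_review(self):
        """True when rolling accuracy has fallen below the threshold."""
        acc = self.accuracy()
        return acc is not None and acc < self.threshold


# Hypothetical stream of (prediction, actual outcome) pairs
monitor = AccuracyMonitor(window=5, threshold=0.8)
for pred, actual in [(1, 1), (0, 0), (1, 1), (1, 0), (0, 1)]:
    monitor.record(pred, actual)

print(monitor.accuracy(), monitor.needs_review())
```

When `needs_review()` returns True, the maintenance plan decides what happens next, for example retraining the model or going back to the Data Understanding phase to see what has changed.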

In summary, the CRISP-DM framework is a structured approach to data mining projects that includes six phases: Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation, and Deployment. It is widely adopted and recognized as a best practice in the data mining industry, and it can be used by small and large organizations in industries such as healthcare, finance, retail, and manufacturing. It helps data scientists, analysts, and stakeholders work together to align project goals, objectives, and deliverables, and to keep the model effective after deployment.

Please feel free to contact me. I am happy to share and exchange ideas on topics related to Data Science and Supply Chain.
Facebook: facebook.com/nattapong.thanngam
LinkedIn: linkedin.com/in/nattapong-thanngam

Donato_TH
Donato Story

Data Science Team Lead at Data Cafe, Project Manager (PMP #3563199), Black Belt-Lean Six Sigma certificate