Project Management in Data Science using SEMMA

Mukesh Kumar
Published in Accredian
4 min read · Apr 26, 2022

Preface

These days, data is at the heart of corporations because it gives them a competitive edge in the industry, improves their growth potential, and lets them deliver enhanced benefits to their customers. It serves as the basis for analysis and for developing prototypes that mimic the real-time behavior of customers.

Photo by Alvaro Reyes on Unsplash

In short, data provides insights, but in its raw form we cannot extract any of them. We need cleansed data that we can analyze and study to realize its actual worth. Moreover, procuring excessive amounts of data and analyzing it without a defined process can lead to a complete mess.

SEMMA Process & its Phases

SAS Institute designed the SEMMA process for extracting insights from raw data. The name stands for Sample, Explore, Modify, Model, and Assess. Popular applications where SEMMA may be employed include consumer retention and procurement, finance factoring, and risk analysis for products such as loans. The process has five stages, described below:

  • Sample: We begin by selecting a representative sample from the available sources (often very large databases) and identify the dependent and independent features that influence the modeling process. After analyzing the sample, we partition the data into training, validation, and testing sets.
  • Explore: Next, we explore the data one feature at a time and several features at a time, using visual plots and statistics. We study the relationships between features to find gaps in the data. We also analyze and record observations for every feature that could affect the outcome.
  • Modify: In this step, we use the observations recorded in the previous step and transform the data with appropriate operations to make it ready for the model development phase. We may also return to the Explore step if required.
  • Model: This step applies several data mining methods to develop models that help meet the business objective.
  • Assess: In the last step, we evaluate the efficacy and reliability of the developed models using different metrics on the testing and validation sets created at the start of the process.
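
The five stages above can be sketched end to end in a few lines of Python. This is a minimal illustration under stated assumptions, not SAS's implementation: the data is synthetic, and the 60/20/20 split, the median imputation, the one-feature least-squares model, the error metric, and all function names are choices made only for this example.

```python
import random
import statistics

def sample(population, n, seed=0):
    """Sample: draw n records at random, then split them 60/20/20."""
    rng = random.Random(seed)
    drawn = rng.sample(population, n)
    a, b = int(0.6 * n), int(0.8 * n)
    return drawn[:a], drawn[a:b], drawn[b:]  # train, validation, test

def explore(rows):
    """Explore: summarize the feature and count gaps (missing values)."""
    present = [x for x, _ in rows if x is not None]
    return {
        "n_rows": len(rows),
        "n_missing": len(rows) - len(present),
        "median_x": statistics.median(present),
    }

def modify(rows, fill_value):
    """Modify: replace missing feature values with a value found in Explore."""
    return [(fill_value if x is None else x, y) for x, y in rows]

def model(rows):
    """Model: fit y = slope * x + intercept by ordinary least squares."""
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def assess(params, rows):
    """Assess: mean absolute error of the fitted line on held-out rows."""
    slope, intercept = params
    return statistics.fmean([abs(y - (slope * x + intercept)) for x, y in rows])

# Hypothetical "large database": y depends linearly on x, about 5% of x is missing.
rng = random.Random(42)
population = []
for _ in range(10_000):
    x = rng.uniform(0, 10)
    y = 3.0 * x + 2.0 + rng.gauss(0, 1)
    population.append((None if rng.random() < 0.05 else x, y))

train, valid, test = sample(population, n=1_000)
report = explore(train)                      # observations recorded on training data
fill = report["median_x"]
train, valid, test = (modify(part, fill) for part in (train, valid, test))
params = model(train)
print("explore report:", report)
print("validation MAE:", assess(params, valid))
print("test MAE:", assess(params, test))
```

Note that the fill value is computed from the training set only and then reused for the validation and testing sets, so nothing learned from the held-out data leaks into the Modify or Model steps.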

SEMMA Process vs KDD Process

The SEMMA process is very similar to the KDD process; the difference lies only in how the work is divided among the stages. The Sample stage in SEMMA is comparable to the Data Selection stage in the KDD process. The Explore stage is similar to the Data Acquisition and Cleaning stage, where the data is cleansed so that it can be analyzed. The Modify stage is equivalent to the Data Transformation stage in the KDD process. The Model stage is similar to the Data Mining stage, where we apply intelligent methods to extract patterns from the data. Finally, the Assess stage corresponds to the Pattern Evaluation stage in the KDD process, after which we decide on the next steps.

SEMMA Process vs CRISP-DM Framework

The SEMMA process is also very similar to the CRISP-DM framework. The Sample and Explore stages in SEMMA are together comparable to the Data Understanding stage in the CRISP-DM framework. The Modify stage is equivalent to the Data Preparation stage. The Model stage is similar to the Modeling stage, where several models are developed to learn patterns from the data. Finally, the Assess stage corresponds to the Evaluation stage in the CRISP-DM framework, after which we decide on the next steps.
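
The two comparisons can be condensed into a small lookup table. The stage names below are taken from the text of this article; the dictionary layout and the `counterparts` helper are hypothetical and serve only as a quick reference.

```python
# SEMMA stage -> comparable stage in each framework, as described in this article.
SEMMA_TO_KDD = {
    "Sample": "Data Selection",
    "Explore": "Data Acquisition and Cleaning",
    "Modify": "Data Transformation",
    "Model": "Data Mining",
    "Assess": "Pattern Evaluation",
}

SEMMA_TO_CRISP_DM = {
    "Sample": "Data Understanding",
    "Explore": "Data Understanding",   # Sample and Explore share one stage
    "Modify": "Data Preparation",
    "Model": "Modeling",
    "Assess": "Evaluation",
}

def counterparts(stage):
    """Return the comparable KDD and CRISP-DM stages for a SEMMA stage."""
    return {"KDD": SEMMA_TO_KDD[stage], "CRISP-DM": SEMMA_TO_CRISP_DM[stage]}

for stage in SEMMA_TO_KDD:
    print(stage, "->", counterparts(stage))
```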

Alternative to SEMMA

You may find several other frameworks to use as alternatives to the SEMMA process, such as the KDD process and the CRISP-DM framework discussed above. They also help gain knowledge from raw data and allow you to iterate over the entire process to refine the results if required.

All these alternatives are similar and share the same objective: solving business-oriented problems and gaining knowledge.

Final Thoughts and Closing Comments

There are some vital points many people fail to understand while pursuing their Data Science or AI journey. If you are one of them and are looking for a way to close these gaps, check out the certification programs provided by INSAID on their website. If you liked this story, I recommend the Global Certificate in Data Science, because it covers the foundations plus machine learning algorithms (basic to advanced).

And that’s it. I hope you liked this traditional data science framework and learned something valuable.

Follow me for more forthcoming articles related to Python, R, Data Science, Machine Learning, and Artificial Intelligence.

If you found this read helpful, hit the Clap👏. Your encouragement will inspire me to keep going and develop more valuable content.

Mukesh Kumar
Accredian

Data Scientist, having a robust math background, skilled in predictive modeling, data processing, and mining strategies to solve challenging business problems.