Explore a Machine Learning/Deep Learning end-to-end pipeline

Ibtissam Makdoun
2 min read · May 14, 2022

The purpose of this article is to explain the process of building a machine learning model at a very high level.


Step 1: Explore and clean the data

We start with the full dataset. The first step is to explore it in order to understand what types of features we have, what they look like, and how they relate to the target variable (the class or value we are trying to predict). We also clean the data. For text data, this means techniques such as removing stopwords, removing punctuation, lemmatization, and stemming.
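As a rough illustration of the cleaning step for text data, the sketch below removes punctuation and stopwords using only the Python standard library. The `clean_text` function and the tiny `STOPWORDS` set are hypothetical names made up for this example; in practice a fuller stopword list, along with lemmatization or stemming, would usually come from an NLP library.

```python
import string

# Toy stopword list, assumed for illustration only.
# Real projects usually take a fuller list from an NLP library.
STOPWORDS = {"a", "an", "the", "is", "are", "and", "or", "of", "to", "in"}


def clean_text(text):
    """Lowercase a string, strip punctuation, and drop stopwords."""
    # Remove every ASCII punctuation character.
    no_punct = text.translate(str.maketrans("", "", string.punctuation))
    # Split on whitespace and keep only tokens that are not stopwords.
    return [tok for tok in no_punct.lower().split() if tok not in STOPWORDS]


print(clean_text("The model is trained, and the results are good!"))
```

Lemmatization and stemming are left out here because they need a language-specific dictionary or rule set that the standard library does not provide.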

Step 2: Split data into training, validation and testing

The next step is to split our data into training, validation, and test sets. The training set is used to fit the model. The validation set is used to evaluate the model while we train it. The test set is used to evaluate the model on unseen data. A common split is 80% of the dataset for training, 10% for validation, and 10% for testing; however, the percentages can change depending on the size of the dataset and many other factors.
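The 80/10/10 split described above can be sketched in a few lines of plain Python. The `split_dataset` function, its default fractions, and the fixed `seed` are assumptions chosen for this example, not part of any particular library; machine learning libraries ship their own splitting helpers.

```python
import random


def split_dataset(rows, train_frac=0.8, val_frac=0.1, seed=42):
    """Shuffle rows and split them into train, validation, and test lists."""
    shuffled = list(rows)
    # A fixed seed makes the shuffle, and therefore the split, reproducible.
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    # Whatever remains after train and validation becomes the test set.
    test = shuffled[n_train + n_val:]
    return train, val, test


data = list(range(100))  # stand-in for 100 real examples
train, val, test = split_dataset(data)
print(len(train), len(val), len(test))
```

Shuffling before slicing matters when the dataset is stored in some order (for example, sorted by class); otherwise one of the sets could end up with only part of the distribution.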

Step 3: Fit the initial model and evaluate

In this step, we use a technique called five-fold cross-validation to fit an initial model…
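In five-fold cross-validation, the data is divided into five folds; the model is fitted five times, each time holding out a different fold for evaluation and training on the other four. The sketch below shows the mechanics in plain Python. Since the text does not say which model is used, a placeholder "model" that simply predicts the mean of the training targets stands in for it; `k_fold_indices`, `cross_validate`, and the sample `targets` are all hypothetical names and values for illustration.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train_idx, val_idx) index lists for k contiguous folds."""
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size


def cross_validate(targets, k=5):
    """Score a mean-predicting placeholder model with k-fold cross-validation."""
    scores = []
    for train_idx, val_idx in k_fold_indices(len(targets), k):
        # "Fit": the placeholder model just memorizes the training mean.
        prediction = sum(targets[i] for i in train_idx) / len(train_idx)
        # "Evaluate": mean absolute error on the held-out fold.
        mae = sum(abs(targets[i] - prediction) for i in val_idx) / len(val_idx)
        scores.append(mae)
    return scores


targets = [3.0, 5.0, 4.0, 6.0, 5.0, 7.0, 6.0, 8.0, 7.0, 9.0]  # made-up values
scores = cross_validate(targets, k=5)
print(scores)
print(sum(scores) / len(scores))  # average error across the five folds
```

The folds here are contiguous slices, so the data should be shuffled beforehand if it is stored in a meaningful order. Averaging the five scores gives a steadier estimate of model quality than a single train/validation split.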

