Standard Machine Learning Pipeline

From Dataset to Optimization

prakshaal jain
Analytics Vidhya
2 min read · Oct 7, 2021

This post gives a high-level overview of a standard machine learning pipeline, from dataset to optimization. It does not discuss the individual steps in detail.

Image by the author
  1. Dataset: Start with the dataset. Understand its variables and establish a business understanding of the problem.
  2. Data Retrieval: Normally, the data is stored either in a CSV file or in a database. Load it into your working environment.
  3. Data Preparation: Broadly, this step consists of three sub-steps: data preprocessing, feature extraction and feature engineering, and feature scaling and selection.
  4. Build a Simple Model: After data preparation, develop a simple model. Simple models are sometimes more reliable and more economical than complex ones.
  5. Model Evaluation: Choose a model evaluation metric. The metric should relate to the business problem you are trying to solve.
  6. Build the Model: Once you have decided on the metric and a simple base model, apply other machine learning algorithms and measure how their performance differs from that of the simple model. While building this model, also keep in mind the space and time complexity of the application.
  7. Tuning: This is the step where you tune hyperparameters. If the model performance is still not satisfactory, repeat the process starting from feature selection and feature engineering.
  8. Deployment and Monitoring: Now is the time when your model goes out into the world and shines! Expose it through an API, package it in a Docker container, and monitor it periodically. Again, watch the space and time complexity at runtime.
  9. Optimization and Retraining: Once your model is up and running and performing well, you may come across a new technology or obtain new data. That is the time to restart the process with a slight change: you compare the performance metrics of the new model against those of the old model instead of the base model.
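
As a rough illustration of steps 3 to 6, here is a minimal Python sketch that uses only the standard library. The synthetic dataset, the mean-predictor baseline, the one-feature linear model, the choice of mean absolute error as the metric, and all function names are assumptions made for this example. They are not something the pipeline itself prescribes.

```python
import random
import statistics


def make_dataset(n=200, seed=0):
    """Synthetic (x, y) rows standing in for data retrieved from a CSV or database."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.uniform(0, 10)
        y = 3.0 * x + 2.0 + rng.gauss(0, 1)  # assumed linear relation plus noise
        rows.append((x, y))
    return rows


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows and hold out a fraction for evaluation."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]


def fit_scaler(xs):
    """Feature scaling: standardize using statistics from the training set only."""
    mean = statistics.fmean(xs)
    std = statistics.pstdev(xs) or 1.0
    return lambda x: (x - mean) / std


def fit_mean_baseline(ys):
    """Simple model: always predict the mean of the training targets."""
    mean_y = statistics.fmean(ys)
    return lambda x: mean_y


def fit_linear(xs, ys):
    """Candidate model: one-feature least-squares line (closed form)."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def mean_absolute_error(model, xs, ys):
    """Evaluation metric shared by every model so results are comparable."""
    return statistics.fmean([abs(model(x) - y) for x, y in zip(xs, ys)])


train, test = train_test_split(make_dataset())

# Data preparation: fit the scaler on training data, then apply it to both splits.
scale = fit_scaler([x for x, _ in train])
x_train = [scale(x) for x, _ in train]
y_train = [y for _, y in train]
x_test = [scale(x) for x, _ in test]
y_test = [y for _, y in test]

# Fit the simple model and the candidate, then score both with the same metric.
baseline = fit_mean_baseline(y_train)
candidate = fit_linear(x_train, y_train)
baseline_mae = mean_absolute_error(baseline, x_test, y_test)
candidate_mae = mean_absolute_error(candidate, x_test, y_test)

print(f"baseline MAE:  {baseline_mae:.3f}")
print(f"candidate MAE: {candidate_mae:.3f}")
```

In a real project the metric would be chosen to match the business problem. The same comparison could be reused in step 9, with the currently deployed model taking the place of the baseline.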

I hope this gives a clear, high-level picture of what a typical machine learning pipeline looks like. Above all of this comes domain knowledge and a business understanding of why we are doing what we are doing.

prakshaal jain
MBA Business Analytics, NMIMS, Mumbai (21–23), Former Data Science Engineer at Utopia Global, Inc.