Analytics Vidhya

Analytics Vidhya is a community of Generative AI and Data Science professionals. We are building the next-gen data science ecosystem https://www.analyticsvidhya.com

Steps to Follow When Building Machine Learning Projects


Photo by Ismail Salad Osman Hajji dirir on Unsplash

In this blog, I put forward some criteria to follow when you are working on a machine learning project. These criteria were suggested by my faculty member, Srikanth Varma, who has 10 years of industry experience in the machine learning field.

I am sharing these criteria in the hope that you will keep them in mind while working, so that you can design your machine learning projects better.

Let's begin. First we list the steps, and then we look at each one with a detailed description.

Steps to be followed:

  1. Understand the business problem and requirements, and define the problem
  2. Data acquisition
  3. Data preparation: cleaning, pre-processing
  4. Exploratory Data Analysis
  5. Modelling, Evaluation, Interpretation
  6. Result communication
  7. Model Deployment
  8. Operation of the model
  9. Optimization of the model

Detailed description of each step:

1. Understand the business problem and requirements, and define the problem:

This is a very important aspect, guys: before starting the project, you must understand the business problem and its requirements end to end, and you should have a good proposal for your project. Once you understand the problem clearly, you can start the machine learning process in the right way.

2. Data acquisition:

Acquiring the data is a very important step. The key concept in data acquisition is Extract, Transform and Load, or ETL for short: the general procedure of copying data from one or more sources into a destination system.

Extract:

Data extraction involves pulling data from homogeneous or heterogeneous sources.

Transform:

The transform step cleans the data and converts it into a proper storage format/structure for querying and analysis.

Load:

The load step inserts the data into the final target database, such as an operational data store, data mart, data lake or data warehouse.

One of the main tools used for this is SQL, which is used to work with data in databases, data warehouses and big data systems (Spark, Hadoop).

ETL plays a major role in data acquisition.
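To make the three steps concrete, here is a minimal sketch using only Python's standard library. The two sources (a CSV string and a JSON string), the `orders` table and its columns are hypothetical, and an in-memory SQLite database stands in for a real data warehouse; in a real project the extract step would read from files, APIs or production databases.

```python
import csv
import io
import json
import sqlite3

# Two hypothetical raw sources in different formats (heterogeneous sources).
CSV_SOURCE = """order_id,customer,amount
1, Alice ,120.50
2,bob,
3,Carol,75
"""
JSON_SOURCE = '[{"order_id": 4, "customer": "dave", "amount": "42.0"}]'


def extract():
    """Extract: read raw records from each source into plain dicts."""
    rows = list(csv.DictReader(io.StringIO(CSV_SOURCE)))
    rows.extend(json.loads(JSON_SOURCE))
    return rows


def transform(rows):
    """Transform: clean the values and cast them to the target schema."""
    clean = []
    for row in rows:
        amount = str(row["amount"]).strip()
        if not amount:  # drop records with a missing amount
            continue
        clean.append((
            int(row["order_id"]),
            str(row["customer"]).strip().title(),  # " Alice " -> "Alice"
            float(amount),
        ))
    return clean


def load(records, conn):
    """Load: insert the cleaned records into the destination table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", records)
    conn.commit()


conn = sqlite3.connect(":memory:")  # stand-in for a real data warehouse
load(transform(extract()), conn)

# Once loaded, the data can be queried with SQL for analysis.
print(conn.execute("SELECT customer, amount FROM orders ORDER BY order_id").fetchall())
```

Order 2 is dropped during the transform step because its amount is missing, so only three cleaned rows reach the table.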

3. Data Preparation:

Guys, data preparation is very important because, in real-world business problems, the data will not be as neatly structured as the datasets available on Kaggle (a data hub for data science). You must apply processes such as feature engineering, data pre-processing and data cleansing.

Feature Engineering:

Feature engineering means transforming existing features, or creating new features on top of them, so that the machine learning model can make full use of the data.
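A minimal pandas sketch of both kinds of feature engineering, assuming a hypothetical customer table: one new feature combines two existing columns, and two more are derived from an existing date column. The column names and the fixed reference date are illustrative assumptions.

```python
import pandas as pd

# Hypothetical customer records; column names are illustrative only.
customers = pd.DataFrame({
    "total_spend": [1200.0, 300.0, 50.0],
    "n_orders": [12, 3, 5],
    "signup_date": pd.to_datetime(["2023-01-01", "2023-12-01", "2022-01-01"]),
})
reference_date = pd.Timestamp("2024-01-01")  # assumed "today" for the example

# New feature combining two existing columns.
customers["avg_order_value"] = customers["total_spend"] / customers["n_orders"]

# New features derived from an existing date column.
customers["tenure_days"] = (reference_date - customers["signup_date"]).dt.days
customers["signup_month"] = customers["signup_date"].dt.month

print(customers[["avg_order_value", "tenure_days", "signup_month"]])
```

A feature such as `avg_order_value` can carry a signal (big spenders versus frequent small buyers) that neither raw column shows on its own.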

Data pre-processing:

Data pre-processing is a technique used to convert raw data into a clean data set. Data gathered from different sources arrives in a raw format that is not yet suitable for analysis, so it has to be pre-processed first.
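The sketch below shows three common pre-processing steps with pandas: imputing a missing number, one-hot encoding a categorical column and min-max scaling. The data and column names are hypothetical, and these are only a few of many possible techniques.

```python
import pandas as pd

# Hypothetical raw data with a missing number and a missing category.
raw = pd.DataFrame({
    "age": [25, None, 40, 35],
    "city": ["Chennai", "Mumbai", "Chennai", None],
    "income": [30000, 52000, 61000, 45000],
})

processed = raw.copy()

# Impute the missing age with the median of the observed ages.
processed["age"] = processed["age"].fillna(processed["age"].median())

# Keep a missing city as its own "Unknown" level, then one-hot encode it.
processed["city"] = processed["city"].fillna("Unknown")
processed = pd.get_dummies(processed, columns=["city"], dtype=int)

# Min-max scale the numeric columns into the range [0, 1].
for col in ["age", "income"]:
    col_min, col_max = processed[col].min(), processed[col].max()
    processed[col] = (processed[col] - col_min) / (col_max - col_min)

print(processed.round(3))
```

In a real project, compute the median, minimum and maximum on the training data only and reuse them on validation and test data; otherwise information leaks from the evaluation data into the model. scikit-learn's `SimpleImputer`, `OneHotEncoder` and `MinMaxScaler` package the same steps in a reusable form.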

Data cleansing:

Data cleansing or data cleaning is the process of detecting and correcting (or removing) corrupt or inaccurate records from a record set, table, or database. It refers to identifying incomplete, incorrect, inaccurate or irrelevant parts of the data and then replacing, modifying, or deleting the dirty or coarse data (as defined by Wikipedia).
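A minimal pandas sketch of typical cleansing operations on a hypothetical table: removing duplicate records, dropping a record with a missing value, removing an impossible (corrupt) value and standardizing inconsistent labels. Which rules apply, and whether to drop or repair a record, depends on the business problem; the age bounds and the label mapping here are assumptions.

```python
import pandas as pd

# Hypothetical raw records containing typical data-quality problems.
records = pd.DataFrame({
    "customer_id": [1, 2, 2, 3, 4, 5],
    "age": [34, 29, 29, -5, 41, None],             # -5 is corrupt, None is missing
    "gender": ["M", "f", "f", "F", "male", "M"],   # inconsistent labels
})
GENDER_MAP = {"m": "M", "male": "M", "f": "F", "female": "F"}  # assumed mapping

cleaned = (
    records
    .drop_duplicates()              # remove exact duplicate rows
    .dropna(subset=["age"])         # remove records with a missing age
    .query("0 < age < 120")         # remove impossible (corrupt) ages
    .assign(gender=lambda d: d["gender"].str.lower().map(GENDER_MAP))
    .reset_index(drop=True)
)
print(cleaned)
```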

Note: In real-world problems, the data is often unstructured rather than neatly structured.

4. Exploratory Data Analysis:

It is important to analyze the data: around 70 percent of the time is spent on analyzing it. Data analysis means asking questions of the data and trying to answer them. Python's pandas is one of the tools used to analyze data, and it can also be used to visualize it. Data visualization is an important concept: viewing the data graphically gives an idea of how it is distributed. Seaborn and matplotlib are some of the Python libraries used for data visualization.

Data analysis and Data visualization play a major role in exploratory data analysis.
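A minimal pandas sketch of the "ask a question, answer it from the data" workflow, on a hypothetical customer table: summary statistics, category counts, missing-value counts and one concrete question (does churn differ by segment?). The column names are illustrative assumptions, and the plotting line is left as a comment because it needs matplotlib installed.

```python
import pandas as pd

# Hypothetical customer data; column names are illustrative only.
df = pd.DataFrame({
    "segment": ["retail", "retail", "corporate", "corporate", "retail"],
    "spend": [120.0, 80.0, 400.0, 360.0, 100.0],
    "churned": [0, 1, 0, 0, 1],
})

print(df.describe())                 # summary statistics of numeric columns
print(df["segment"].value_counts())  # how the categories are distributed
print(df.isna().sum())               # missing values per column

# A question asked of the data: does the churn rate differ by segment?
churn_by_segment = df.groupby("segment")["churned"].mean()
print(churn_by_segment)

# Visual check of the spend distribution (needs matplotlib installed):
# df["spend"].plot.hist(bins=5)
```

Here the answer to the question is visible in one line: no corporate customer churned, while two of the three retail customers did.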

5. Modelling, Evaluation, Interpretation:

Modelling:

Choose the machine learning model based on the problem at hand, not on which model is your favourite.

The model is decided by the problem, not by you!!
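One way to let the problem, rather than personal preference, decide the model is to score several candidate models the same way and compare them against a trivial baseline. The sketch below assumes a binary classification problem and uses scikit-learn (our choice; the post does not name a modelling library) with synthetic data from `make_classification`; the candidate list and the `balanced_accuracy` metric are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a real binary classification problem.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

candidates = {
    "baseline (majority class)": DummyClassifier(strategy="most_frequent"),
    "logistic regression": LogisticRegression(max_iter=1000),
    "decision tree": DecisionTreeClassifier(max_depth=4, random_state=0),
}

# Score every candidate the same way (5-fold cross-validation) and compare.
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="balanced_accuracy").mean()
    for name, model in candidates.items()
}
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: mean balanced accuracy = {score:.3f}")
```

A model that cannot clearly beat the majority-class baseline (which always scores 0.5 on balanced accuracy) is not worth deploying, whichever algorithm it uses.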

Evaluation:

Evaluation checks how well your model performs. Apply several evaluation techniques that suit the business problem, such as accuracy score, F1 score, ROC curve and confusion matrix.
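A minimal sketch computing the metrics named above with scikit-learn's `sklearn.metrics` module. The labels and scores are made-up values for a hypothetical binary classifier, and the 0.5 decision threshold is an assumption; in practice you would evaluate on held-out data.

```python
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             roc_auc_score, roc_curve)

# Hypothetical true labels and model scores (predicted probability of class 1).
y_true = [0, 0, 1, 1, 1, 0, 1, 0]
y_score = [0.1, 0.6, 0.8, 0.7, 0.4, 0.2, 0.9, 0.3]
y_pred = [int(s >= 0.5) for s in y_score]  # threshold the scores at 0.5

accuracy = accuracy_score(y_true, y_pred)
f1 = f1_score(y_true, y_pred)
cm = confusion_matrix(y_true, y_pred)              # rows = actual, columns = predicted
fpr, tpr, thresholds = roc_curve(y_true, y_score)  # points of the ROC curve
auc = roc_auc_score(y_true, y_score)               # area under that curve

print(f"accuracy={accuracy:.3f}  F1={f1:.3f}  ROC AUC={auc:.4f}")
print(cm)
```

For these values, 6 of the 8 predictions are correct (accuracy 0.75), and the confusion matrix shows one false positive and one false negative. ROC AUC uses the raw scores rather than the thresholded predictions: 15 of the 16 positive/negative pairs are ranked correctly, giving 0.9375.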

Interpretation:

Interpretation is the process of explaining why a particular output was predicted. With such an explanation, the business stakeholder or client can easily understand why the model produced that output.

Reasoning is a good way of understanding why an output was predicted!!
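Permutation importance is one model-agnostic way to give such reasons: shuffle one feature at a time on held-out data and measure how much the model's score drops. The sketch below uses scikit-learn's `permutation_importance` on synthetic data where, by construction, only `income` drives the label; the feature names and the random forest model are assumptions for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic data: the label depends only on income; the second column is noise.
rng = np.random.default_rng(0)
income = rng.normal(50, 15, 500)
noise = rng.normal(0, 1, 500)
X = np.column_stack([income, noise])
y = (income > 50).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

# Shuffle one feature at a time on held-out data and measure the accuracy drop.
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
for name, importance in zip(["income", "noise"], result.importances_mean):
    print(f"{name}: mean accuracy drop = {importance:.3f}")
```

Here `income` should show a large accuracy drop and `noise` a drop near zero, which gives a business reader a plain reason for the model's behaviour. Permutation importance explains the model as a whole; explaining individual predictions needs other techniques.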

6. Result communication:

Result communication is about conveying the results from the machine learning engineer to a non-specialist. Even a person who does not work in this field should be able to understand the process. At each step, summarize your observations in plain terms so that a general audience can follow them clearly.

The process needs to be understandable even for someone outside the machine learning field.

7. Model Deployment:

Once the model is built and all the previous steps are complete, you must deploy the model online (on the World Wide Web). Python's Flask and Django are tools you can use to serve your model on localhost, and Heroku (a platform as a service) can be used to deploy it online.

Check this link for the Heroku platform: https://www.heroku.com

The model should be deployed online.
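A minimal Flask sketch of serving a model over HTTP on localhost. The `/predict` route, the JSON format and `predict_churn` are hypothetical; `predict_churn` is a placeholder rule standing in for a real trained model, which you would normally load once at start-up.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def predict_churn(features):
    """Hypothetical stand-in for a trained model's predict call."""
    # A real project would call e.g. model.predict([features])[0] here,
    # on a model loaded once when the app starts.
    return int(sum(features) > 1.0)


@app.route("/predict", methods=["POST"])
def predict():
    payload = request.get_json(silent=True) or {}
    features = payload.get("features")
    if not isinstance(features, list) or not features:
        return jsonify(error='expected a JSON body like {"features": [0.2, 0.5]}'), 400
    return jsonify(prediction=predict_churn(features))


if __name__ == "__main__":
    app.run(debug=True)  # development server on http://127.0.0.1:5000
```

With the server running, a client can `POST` `{"features": [0.8, 0.5]}` to `http://127.0.0.1:5000/predict`. For Heroku, the usual approach is to list the dependencies (for example `flask` and `gunicorn`) in `requirements.txt` and add a `Procfile` with a line such as `web: gunicorn app:app`, assuming the file is named `app.py`.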

8. Operation of the model:

After every step is done, you need to retrain your model, check whether the model pipeline is working correctly, and handle any pipeline failures in this step.

The model pipeline needs to be checked every time during operation.
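A minimal standard-library sketch of the retraining step with failure handling: every pipeline step runs inside a guard, and the new model replaces the current one only if all steps succeed and it passes a quality threshold. The function `run_retraining`, its step parameters and the threshold are hypothetical; `evaluate` is assumed to score the candidate on held-out data.

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("retrain")


def run_retraining(load_data, train, evaluate, current_model, min_score):
    """Retrain, and promote the new model only if every step succeeds
    and it scores at least `min_score`; otherwise keep the current model."""
    try:
        data = load_data()
        if not data:
            raise ValueError("no new training data")
        candidate = train(data)
        score = evaluate(candidate)  # assumed to use held-out data
    except Exception:
        log.exception("pipeline step failed; keeping current model")
        return current_model
    if score < min_score:
        log.warning("candidate scored %.3f < %.3f; keeping current model",
                    score, min_score)
        return current_model
    log.info("promoting candidate with score %.3f", score)
    return candidate


# Hypothetical pipeline steps, for illustration only.
model = run_retraining(
    load_data=lambda: [1, 2, 3],
    train=lambda data: {"version": 2, "n": len(data)},
    evaluate=lambda candidate: 0.91,
    current_model={"version": 1},
    min_score=0.85,
)
print(model)
```

Falling back to the current model keeps the service running when a data feed breaks or a retrained model is worse; in production the same logs would also trigger an alert.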

9. Optimization of the model:

Optimization is an important task. Once you have completed the whole process, you need to improve or update the model by including more data and features, and you also need to optimize the code.

Optimization techniques are what make the model perform effectively!!
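The post describes optimization in terms of more data, more features and better code; another common way to optimize the model itself is hyperparameter tuning. The sketch below uses scikit-learn's `GridSearchCV` (our choice of technique and library) to try every combination in a small, illustrative grid with 5-fold cross-validation on synthetic data.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for a real binary classification problem.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# Illustrative grid of hyperparameter values to try.
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [3, 6, None],
}
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=5,          # 5-fold cross-validation for every combination
    scoring="f1",
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

Grid search grows multiplicatively with the grid size (here 2 x 3 = 6 combinations, each fitted 5 times); `RandomizedSearchCV` samples a fixed number of combinations instead when the grid gets large.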

To conclude, keep the above criteria in mind when you are working on a machine learning project.

Dhilip Vasanth (a young learner in machine learning)


Published in Analytics Vidhya
