Steps to Consider in Developing Machine Learning Models

Any machine learning model that you build must follow these six steps

Chandan Mohan
Analytics Vidhya
3 min read · Oct 30, 2019


For any machine learning model, there are a few essential steps to consider before it is deployed into service. This series of steps is much the same for every machine learning model that you develop. Let us look at the steps in order.

1. Collecting Data: For any machine learning model, data is the main resource, and a model might need a vast amount of it. Collecting data is one of the more difficult jobs. Free and paid services offer datasets, often as structured data in formats such as JSON, XML, or CSV. If you collect data on your own, keep in mind that what you gather may be in an unstructured form.
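As a rough illustration of what structured data looks like once it is loaded, here is a minimal sketch that uses only the Python standard library. The sample CSV and JSON text, the column names, and the helper function names are all hypothetical stand-ins for whatever a real data provider would supply.

```python
import csv
import io
import json

# Hypothetical sample data standing in for files obtained from a data provider.
CSV_TEXT = "age,income,label\n25,40000,0\n32,,1\n47,82000,1\n"
JSON_TEXT = '[{"age": 25, "income": 40000, "label": 0}, {"age": 32, "income": null, "label": 1}]'


def load_csv_records(text):
    """Parse CSV text into a list of dicts, one per row (all values are strings)."""
    return list(csv.DictReader(io.StringIO(text)))


def load_json_records(text):
    """Parse a JSON array of objects into a list of dicts."""
    return json.loads(text)


csv_records = load_csv_records(CSV_TEXT)
json_records = load_json_records(JSON_TEXT)
print(len(csv_records), "CSV rows;", len(json_records), "JSON rows")
```

Note that the missing income shows up differently in each format: as an empty string in the CSV rows and as None in the JSON rows. Differences like this are what the next step has to clean up.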

2. Data Preparation: The data has now been collected, but it might contain corrupted information or missing values. For example, someone might have unknowingly changed a value during collection (human error), or some columns might be empty. These problems should be resolved at this stage. Simple statistics help here: missing values can be filled in with the mean, median, or mode of the column, or handled with other statistical methods.
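Filling in missing values with the mean, median, or mode might look like the sketch below. It assumes missing entries are represented as None; the function name `impute` and the sample columns are hypothetical.

```python
import statistics


def impute(values, strategy="mean"):
    """Replace None entries with the mean, median, or mode of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    if strategy == "mean":
        fill = statistics.mean(observed)
    elif strategy == "median":
        fill = statistics.median(observed)
    elif strategy == "mode":
        fill = statistics.mode(observed)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [fill if v is None else v for v in values]


# Hypothetical columns with gaps.
incomes = [40000, None, 82000, 52000, None]
cities = ["Pune", "Delhi", None, "Delhi"]

print(impute(incomes, "mean"))
print(impute(incomes, "median"))
print(impute(cities, "mode"))
```

The mean and median suit numeric columns, while the mode also works for categorical columns such as the city names here.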

3. Data Analysis: At this stage the data is analysed to find out whether there are any hidden relationships between the features in the dataset. With the right feature engineering, guided by domain knowledge, we can solve roughly 70% of the problem. Once we get past the data analysis stage, the work is almost complete. By one widely quoted survey estimate, about 70% of a data scientist's time is spent on data analysis and feature engineering.
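One simple way to look for a relationship between two numeric features is the Pearson correlation coefficient. The sketch below computes it by hand with the standard library; the feature names and values are hypothetical, and correlation only captures linear relationships, so it is a starting point rather than a complete analysis.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant feature")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical features from a small dataset.
hours_studied = [1, 2, 3, 4, 5]
exam_score = [52, 55, 61, 68, 74]
hours_slept = [8, 7, 7, 6, 5]

print(round(pearson(hours_studied, exam_score), 3))   # close to +1: strong positive relation
print(round(pearson(hours_studied, hours_slept), 3))  # negative: more study, less sleep
```

Values near +1 or -1 suggest a strong linear relationship worth exploring during feature engineering; values near 0 suggest little linear relationship.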

4. Training Model: After data analysis, the data is divided into training, validation, and test datasets:

Training dataset: about 60% to 65% of the data is used to train the algorithm.

Validation dataset: 5% to 10% of the data is used to validate the algorithm. The validation results guide the tuning process.

Test dataset: 30% of the data is used to test the algorithm's performance.

The training dataset is then given to the algorithm as input to learn from. The algorithm could be any suitable one, such as a regression algorithm or a classifier like KNN.
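The split and the training step can be sketched as follows, assuming a 65% / 5% / 30% split and a hand-written k-nearest-neighbours (KNN) classifier. The toy data, the function names, and the candidate values of k are all hypothetical; in practice a library implementation would normally replace this code.

```python
import math
import random
from collections import Counter


def split_dataset(rows, train_frac=0.65, val_frac=0.05, seed=0):
    """Shuffle rows and split them into train, validation, and test lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # whatever remains (about 30% here)
    return train, val, test


def knn_predict(train, point, k):
    """Predict a label by majority vote among the k nearest training rows."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], point))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def accuracy(train, rows, k):
    """Fraction of rows whose label KNN predicts correctly."""
    hits = sum(knn_predict(train, features, k) == label for features, label in rows)
    return hits / len(rows)


# Hypothetical toy data: two well-separated clusters of 2-D points.
rng = random.Random(42)
data = [((rng.uniform(0, 1), rng.uniform(0, 1)), 0) for _ in range(50)]
data += [((rng.uniform(4, 5), rng.uniform(4, 5)), 1) for _ in range(50)]

train, val, test = split_dataset(data)

# KNN "learns" by storing the training rows; the validation set tunes k.
best_k = max([1, 3, 5], key=lambda k: accuracy(train, val, k))
print(len(train), len(val), len(test), "best k:", best_k)
```

The test rows are deliberately left untouched here. They are held back so that the final check is made on data that played no part in training or tuning.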

5. Testing Model: Once the algorithm shows good performance on the training dataset, we test it on the test dataset, where the algorithm should predict the outcomes for data it has not seen. If the performance is also good on the test data, we proceed to the final step.
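Measuring performance on the test dataset comes down to comparing the model's predictions with the known outcomes. Here is a minimal sketch for a binary classifier; the labels and predictions are hypothetical, and accuracy is only one of several metrics that could be used.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty lists of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary classifier."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for t, p in zip(y_true, y_pred):
        if p == positive:
            counts["tp" if t == positive else "fp"] += 1
        else:
            counts["fn" if t == positive else "tn"] += 1
    return counts


# Hypothetical true labels from the test set and the model's predictions for them.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

print(accuracy(y_true, y_pred))          # 0.75
print(confusion_counts(y_true, y_pred))  # {'tp': 3, 'fp': 1, 'tn': 3, 'fn': 1}
```

If the test accuracy is much lower than the training accuracy, the model has probably overfitted and is not ready for the final step.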

6. Deploy Model: In the final stage, we deploy the model to the cloud, where it can serve its purpose. The model can even be exposed as an API.
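To give a feel for what "the model as an API" means, here is a minimal sketch built on Python's standard `http.server` module. The `predict` function is a hypothetical stand-in for a trained model, and the request format, function names, and port are assumptions made for illustration. A real cloud deployment would normally use a production web framework and server instead.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(features):
    """Hypothetical stand-in for a trained model: label 1 if the feature sum exceeds 5."""
    return 1 if sum(features) > 5 else 0


def handle_payload(body):
    """Turn a JSON request body into an (HTTP status, JSON response body) pair."""
    try:
        features = json.loads(body)["features"]
        return 200, json.dumps({"prediction": predict(features)})
    except (ValueError, KeyError, TypeError):
        return 400, json.dumps({"error": 'expected a JSON object with a "features" list'})


class PredictHandler(BaseHTTPRequestHandler):
    """Answers POST requests with the model's prediction."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        status, payload = handle_payload(self.rfile.read(length).decode("utf-8"))
        data = payload.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def serve(port=8000):
    """Start the API on localhost; blocks until interrupted."""
    HTTPServer(("127.0.0.1", port), PredictHandler).serve_forever()
```

Calling `serve()` starts the server. A client can then send a POST request with a body such as `{"features": [2, 4]}` and receive the prediction back as JSON.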

These are the six steps involved in developing a machine learning model.

Email: chandan.naidu97@gmail.com

LinkedIn: chandannaidu

Hello! My name is Chandan. I am a George Washington University grad student passionate about Machine Intelligence & Cognitive Computing.