Machine Learning: Why Is It an Iterative Process?
It has been mentioned several times that a machine learning implementation goes through an iterative cycle: each step of the ML cycle is visited again and again.
The question is: what makes the ML cycle iterative? Why does it become necessary to perform the same steps repeatedly?
The answer lies in the nature of the problems that machine learning is trying to solve. Let’s understand this further.
Machine learning as a field has no fixed boundaries at this point in time. It is an evolving field of technology. As new algorithms are developed, new real-world problems become solvable, and new problems in turn motivate new algorithms.
Machine learning is primarily used in areas where traditional programming ceases to offer a viable solution, for example:
- Problems too complex to code by hand, such as face recognition, or text extraction and understanding across a variety of documents in different languages
- Very large volumes of data, such as stock market prediction, or government agencies trying to extract meaningful insights about the population from census data
- Dynamically changing information, such as product or movie recommendations on Netflix, Amazon, etc., which depend heavily on your last transaction or on the transaction you are currently making
By its nature, machine learning applies to applications and systems that learn to solve a problem rather than having the solution coded explicitly. However, it is not right to assume that no coding is required at all.
Machine learning lets you write code in such a way that the application or system can learn to solve the problem on its own.
Learning is an iterative process. Even when an infant learns to walk, it has to go through the same process of walking, falling, standing, walking, balancing, etc. again and again until it achieves a certain degree of confidence to walk and run independently.
The same fundamental concept applies to machine learning, which goes through the machine learning cycle repeatedly until the desired confidence level is achieved.
Going through this cycle is necessary to ensure that the machine learning model captures the patterns, characteristics and inter-dependencies in the given data.
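As a rough illustration of "repeat the cycle until the desired confidence level is achieved", here is a minimal sketch in plain Python. Everything in it is an assumption made for the example: the toy data, the `fit_line` and `rmse` helpers, the `target_rmse` threshold, and the choice to double the training budget on each pass. A real project would revisit data preparation and algorithm choice as well, not just training time.

```python
import random

def fit_line(xs, ys, epochs, lr=0.01):
    """Fit y = w*x + b with plain gradient descent for a fixed number of epochs."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = sum((w * x + b - y) * x for x, y in zip(xs, ys)) * 2 / n
        grad_b = sum((w * x + b - y) for x, y in zip(xs, ys)) * 2 / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def rmse(xs, ys, w, b):
    """Root mean squared error of the line (w, b) on the given points."""
    return (sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)) ** 0.5

# Hypothetical toy data: y = 3x + 2 plus a little noise.
rng = random.Random(0)
xs = [i / 10 for i in range(100)]
ys = [3 * x + 2 + rng.gauss(0, 0.1) for x in xs]

# Hold out every second point as "unseen" validation data.
train_x, train_y = xs[::2], ys[::2]
val_x, val_y = xs[1::2], ys[1::2]

target_rmse = 0.2   # assumed "desired confidence level"
epochs = 10         # assumed starting training budget
history = []        # (epochs, validation RMSE) for each pass

for iteration in range(1, 11):
    w, b = fit_line(train_x, train_y, epochs)   # train
    score = rmse(val_x, val_y, w, b)            # evaluate on unseen data
    history.append((epochs, score))
    if score <= target_rmse:                    # good enough: stop iterating
        break
    epochs *= 2                                 # otherwise adjust and go again

for used_epochs, score in history:
    print(f"epochs={used_epochs:5d}  validation RMSE={score:.3f}")
```

Each pass through the loop repeats the same train and evaluate steps; only the stopping check decides whether another pass is needed.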
A machine learning solution is only as good as the data that drives it.
You are never guaranteed a well-generalized model (there is never a perfect model) until it has gone through a significant number of iterations to learn the various scenarios from the data.
In fact, with this iterative process you are trying to obtain the best model, one that performs equally well on unseen data. “Best” is measured through various metrics depending on the problem at hand. For example, a prediction problem with a continuous output variable can be measured with RMSE (root mean squared error) or R² (R-squared), whereas for a classification problem the confusion matrix is better suited to measuring model performance.
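To make these metrics concrete, the sketch below computes RMSE, R² and a confusion matrix in plain Python. The helper names and the sample values are made up for illustration; in practice a library such as scikit-learn provides equivalent functions.

```python
from collections import Counter

def rmse(y_true, y_pred):
    """Root mean squared error: square root of the average squared residual."""
    n = len(y_true)
    return (sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n) ** 0.5

def r_squared(y_true, y_pred):
    """R-squared: 1 minus (residual sum of squares / total sum of squares)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

def confusion_matrix(y_true, y_pred, labels):
    """Rows are actual labels, columns are predicted labels, in `labels` order."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(actual, predicted)] for predicted in labels]
            for actual in labels]

# Regression example (made-up values).
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.0, 7.5, 9.0]
regression_rmse = rmse(y_true, y_pred)
regression_r2 = r_squared(y_true, y_pred)

# Classification example (made-up labels).
actual = ["spam", "ham", "spam", "ham", "spam"]
predicted = ["spam", "ham", "ham", "ham", "spam"]
cm = confusion_matrix(actual, predicted, labels=["spam", "ham"])

print(f"RMSE = {regression_rmse:.4f}, R^2 = {regression_r2:.3f}")
print("Confusion matrix (rows = actual, columns = predicted):", cm)
```

A lower RMSE and an R² closer to 1 indicate a better regression fit, while the off-diagonal cells of the confusion matrix show which classes the model mixes up.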
Summary
To achieve the desired ML model performance, high-quality data is a crucial requirement. However, for most real-world problems there are primarily three sources of challenges in implementing any ML model:
- Implementation | Integration | Data Quality
Hence, it becomes necessary to go through the ML pipeline steps below repeatedly.
- Build
  - Collect and prepare training data: data collection or data engineering, followed by exploratory data analysis (EDA)
  - Data pre-processing and feature engineering
  - Choose algorithm
- Train
  - Model training
  - Hyperparameter optimization
  - Manage training requirements
  - Model evaluation, tuning and debugging
- Deployment
  - Deploy model in production
  - Address scalability requirements
  - Monitor quality, detect drift and retrain if required
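The final monitoring step is what sends a deployed model back to the start of the cycle. One very simple way to flag drift, sketched here as an assumption and not a recommended production method, is to compare the mean of a feature in live data against its mean in the training data. The function names, sample values and the `threshold` of 0.5 are all hypothetical.

```python
from statistics import mean, stdev

def drift_score(reference, live):
    """Shift of the live mean, in units of the reference standard deviation."""
    return abs(mean(live) - mean(reference)) / stdev(reference)

def needs_retraining(reference, live, threshold=0.5):
    """Flag drift when the live mean moves more than `threshold` deviations."""
    return drift_score(reference, live) > threshold

# Feature values seen during training (made-up numbers).
reference = [10.0, 11.0, 9.0, 10.5, 9.5, 10.0]

# Two hypothetical batches of production data.
stable_live = [10.2, 9.8, 10.1, 9.9]
shifted_live = [12.0, 12.5, 11.8, 12.2]

print("stable batch needs retraining: ", needs_retraining(reference, stable_live))
print("shifted batch needs retraining:", needs_retraining(reference, shifted_live))
```

When the check fires, the pipeline loops back to the Build and Train steps with fresh data, which is the iteration this article describes.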