Machine Learning: Model Types, Workflow and Complications
Machine learning is a subset of artificial intelligence (AI) that focuses on designing systems that can learn from data and use it to make decisions or predictions. Instead of being explicitly programmed to perform a task, a machine learns how to perform it from large amounts of data.
Machine learning works by analysing patterns and relationships within data and using them to make predictions. It serves as the foundation of many modern AI advancements.
Why do we need predictive models?
Predictive models help us anticipate future outcomes based on historical data. For instance, a botanist may use machine learning to train a model to identify different types of flowers.
How do we create a predictive model?
- Gather data: A botanist collects some samples.
- Each sample has a set of features (for example, petal length and width) and a label (the flower species).
- An algorithm is used to find the relationship between the features and the label.
- The result is a model that encapsulates those relationships.
- The model can now predict the label of a new sample based on its features.
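The steps above can be sketched in a few lines of Python. This is a minimal illustration, not a production approach: it assumes a nearest-centroid classifier (a deliberately simple algorithm chosen here for clarity), a handful of made-up petal measurements, and hypothetical helper names `train` and `predict`.

```python
from math import dist
from statistics import mean

# Hypothetical samples: (petal_length_cm, petal_width_cm) features -> species label
samples = [
    ((1.4, 0.2), "setosa"),
    ((1.3, 0.2), "setosa"),
    ((1.5, 0.3), "setosa"),
    ((4.7, 1.4), "versicolor"),
    ((4.5, 1.5), "versicolor"),
    ((4.9, 1.5), "versicolor"),
]

def train(samples):
    """Learn one centroid (average feature vector) per label."""
    by_label = {}
    for features, label in samples:
        by_label.setdefault(label, []).append(features)
    # The "model" is simply a mapping from each label to its average features.
    return {
        label: tuple(mean(column) for column in zip(*rows))
        for label, rows in by_label.items()
    }

def predict(model, features):
    """Predict the label whose centroid is closest to the new sample."""
    return min(model, key=lambda label: dist(model[label], features))

model = train(samples)
print(predict(model, (1.6, 0.4)))  # setosa
print(predict(model, (4.6, 1.3)))  # versicolor
```

Here the model is nothing more than the average features per label, which keeps the link between training and prediction easy to see. A real project would typically use an established library and far more samples.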
Types of machine learning models
Supervised learning: Training data comes with labels.
Two main types are:
- Regression: Predicting numerical values, e.g., the number of bike rentals in a month.
- Classification: Predicting categories, e.g., assessing health risks in groups or individuals.
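As a rough illustration of regression, the sketch below fits a straight line using ordinary least squares. It assumes a single made-up feature (average monthly temperature) and invented monthly rental counts; `fit_line` is a hypothetical helper, not part of any library.

```python
from statistics import mean

# Hypothetical data: average monthly temperature (°C) and bike rentals that month.
temperatures = [10.0, 15.0, 20.0, 25.0]
rentals = [120.0, 170.0, 220.0, 270.0]

def fit_line(xs, ys):
    """Fit y = slope * x + intercept by minimising the sum of squared errors."""
    x_bar, y_bar = mean(xs), mean(ys)
    cross_deviation = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    squared_deviation = sum((x - x_bar) ** 2 for x in xs)
    slope = cross_deviation / squared_deviation
    return slope, y_bar - slope * x_bar

slope, intercept = fit_line(temperatures, rentals)
print(slope, intercept)          # 10.0 20.0
print(slope * 30.0 + intercept)  # 320.0
```

Because the invented data lies exactly on the line rentals = 10 * temperature + 20, the fit recovers that line and predicts 320 rentals for a 30 °C month. Real data is noisier, so the fitted line only approximates the relationship.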
Unsupervised learning: Works with unlabeled data.
One primary type is:
- Clustering: Groups data based on similarities, e.g., clustering vehicles by fuel emissions.
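To make clustering concrete, here is a minimal k-means sketch on a single feature. k-means is one common clustering algorithm, picked here as an assumption; the emissions figures are invented and `k_means` is a hypothetical helper. To keep it short, it runs a fixed number of iterations instead of checking for convergence.

```python
from statistics import mean

# Hypothetical CO2 emissions (g/km) for six vehicles; note there are no labels.
emissions = [95.0, 102.0, 110.0, 180.0, 195.0, 210.0]

def k_means(values, centroids, iterations=10):
    """Assign each value to its nearest centroid, then move centroids to cluster means."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Keep a centroid where it is if its cluster ends up empty.
        centroids = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return centroids, clusters

# Start with two centroids at the extremes of the data (k = 2).
centroids, clusters = k_means(emissions, centroids=[emissions[0], emissions[-1]])
print(clusters)  # [[95.0, 102.0, 110.0], [180.0, 195.0, 210.0]]
```

With these values the lower and higher emitters separate cleanly. With real data, the result can depend on the starting centroids and on the choice of k.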
Machine learning workflow
- Data Collection: Source high-quality data. Predictions are only as good as the data fed into the model.
- Preprocessing: Handle missing data, manage outliers, and format data for analysis. Proper preprocessing helps the model capture the core relationships in the data.
- Model Selection: Choose a suitable algorithm for your specific problem, whether it’s regression, classification, or clustering, as covered earlier.
- Training: Feed the training data to the model so it can identify patterns and relationships. The algorithm adjusts the model’s parameters as it learns; monitor this process and tune its settings as needed.
- Evaluation: Validate the model. Test its accuracy on data it hasn’t seen before to check that it’s generalising well.
- Deployment: If the model performs at an acceptable level, integrate it into applications so it can make real-world predictions. Plan for monitoring, scaling, and handling varying volumes of requests and traffic. Good news: there’s most definitely a cloud solution for that!
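The workflow above can be sketched end to end on a toy problem. This is a minimal illustration under stated assumptions: the data is invented, missing values are filled with the mean (one simple option among many), and the model is a hypothetical threshold halfway between the two class means. Deployment and monitoring are out of scope here.

```python
import random
from statistics import mean

# Data collection: hypothetical (feature, label) rows; None marks a missing value.
rows = [(1.0, "low"), (2.0, "low"), (3.0, "low"), (4.0, "low"),
        (10.0, "high"), (11.0, "high"), (12.0, "high"), (13.0, "high"),
        (14.0, "high"), (15.0, "high"), (None, "high")]

# Preprocessing: replace missing values with the mean of the known values.
fill = mean(x for x, _ in rows if x is not None)
rows = [(fill if x is None else x, label) for x, label in rows]

# Hold out about a quarter of the rows as unseen test data (fixed seed for repeatability).
random.Random(42).shuffle(rows)
split = len(rows) * 3 // 4
train_rows, test_rows = rows[:split], rows[split:]

# Model selection and training: a threshold halfway between the two class means.
def train(rows):
    low = mean(x for x, label in rows if label == "low")
    high = mean(x for x, label in rows if label == "high")
    return (low + high) / 2

def predict(threshold, x):
    return "high" if x >= threshold else "low"

# Evaluation: accuracy measured only on rows the model never saw during training.
threshold = train(train_rows)
accuracy = sum(predict(threshold, x) == label for x, label in test_rows) / len(test_rows)
print(f"threshold={threshold:.2f}, test accuracy={accuracy:.2f}")
```

The key design choice is that the test rows are set aside before training, so the accuracy figure reflects how the model handles new data rather than how well it remembers old data.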
Outliers can introduce noise and inaccuracies into the machine learning process, so how they are treated should depend on the specific context of the problem being addressed. A great explanation of outlier detection and why it’s important.
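One widely used way to flag outliers is Tukey’s fences: any value more than 1.5 times the interquartile range (IQR) below the first quartile or above the third quartile is flagged. The sketch below assumes a single column of made-up readings and uses `statistics.quantiles` from the Python standard library; `iqr_bounds` is a hypothetical helper, and the 1.5 multiplier is a convention rather than a fixed rule.

```python
from statistics import quantiles

def iqr_bounds(values, k=1.5):
    """Return (lower, upper) fences; values outside them are flagged as outliers."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Hypothetical readings with one suspicious value.
readings = [12, 13, 12, 14, 13, 15, 14, 13, 95]
lower, upper = iqr_bounds(readings)
outliers = [x for x in readings if x < lower or x > upper]
print(outliers)  # [95]

# Treatment depends on context: a sensor glitch might be dropped, while a genuine
# extreme value might be kept. Capping to the fences is one middle-ground option.
capped = [min(max(x, lower), upper) for x in readings]
```

Flagging and treating are kept as separate steps on purpose, since the right treatment is a judgement about the problem rather than a property of the numbers.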
Considerations and complications
- Ensure Data Quality: The principle of “Garbage in, garbage out” holds here. The quality of predictions is directly tied to data quality.
- Avoid Overfitting: Overfitting happens when a model is excessively complex, causing it to memorise the training data rather than learn general patterns. Such models don’t generalise well to new data.
- Promote Fair and Unbiased Data: Using biased training data can lead to skewed predictions. It’s essential to ensure data is representative and doesn’t perpetuate existing biases.
- Scalability: When deploying models, consider how they’ll handle real-world usage, varying workloads, and traffic volumes.
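Overfitting is easiest to see at its extreme. The sketch below compares two toy models, both hypothetical and written only for illustration, on invented (hours studied, passed) data: a "memoriser" that stores the training examples verbatim, and a simple cut-off rule. Both score perfectly on the training data; only evaluation on unseen data reveals which one learned a usable pattern.

```python
# Hypothetical (hours_studied, passed) data.
train_data = [(1, False), (2, False), (3, False), (6, True), (7, True), (8, True)]
test_data = [(2.5, False), (4, False), (5.5, True), (9, True)]

def memoriser(data):
    """Extreme overfitting: a lookup table of the exact training examples."""
    table = dict(data)
    return lambda x: table.get(x, False)  # unseen input: fall back to a fixed guess

def threshold_model(data):
    """Simpler model: a single cut-off halfway between the two classes."""
    highest_fail = max(x for x, passed in data if not passed)
    lowest_pass = min(x for x, passed in data if passed)
    cut = (highest_fail + lowest_pass) / 2
    return lambda x: x >= cut

def accuracy(model, data):
    return sum(model(x) == y for x, y in data) / len(data)

for name, build in [("memoriser", memoriser), ("threshold", threshold_model)]:
    model = build(train_data)
    print(name, accuracy(model, train_data), accuracy(model, test_data))
# memoriser 1.0 0.5
# threshold 1.0 1.0
```

Comparing training accuracy with accuracy on held-out data is a simple way to spot this gap in practice.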
Learning resources
Microsoft Learn
My favourite option for learning, especially if you’re working towards an Azure AI certification. Microsoft Learn provides a series of learning paths covering various Microsoft technologies, including AI and Azure’s machine learning tools:
- Introduction to Azure Machine Learning
- Principles of machine learning
- Create machine learning models
- Build AI solutions with Azure Machine Learning
General AI and Machine Learning
- Coursera: Machine Learning by Andrew Ng: This is one of the most popular introductions to machine learning.
- Stanford University: CS231n: Convolutional Neural Networks for Visual Recognition: This course covers deep learning for visual data in depth.
- Google’s Machine Learning Crash Course: This is an accessible introduction to machine learning using TensorFlow.
- MIT OpenCourseWare: Introduction to Deep Learning: A course provided by the Massachusetts Institute of Technology.