Machine Learning Crash Course: Model selection and evaluation

Code Primer
4 min read · Jan 8, 2023


Welcome back to the machine learning crash course! If you haven’t read the previous parts of the series, it’s highly recommended that you do so, as we’ll be building on the concepts and techniques that we covered earlier.

In this part of the course, we’ll be focusing on model selection and evaluation. Model selection is the process of choosing the best model for a given task, while model evaluation is the process of assessing the performance of a model. Both are essential steps in the machine learning process, and can have a big impact on the quality of the model and its predictions.

Let’s start by looking at some of the different types of models that are available in scikit-learn.

Overview of different types of models

Scikit-learn is a popular library for machine learning in Python, and it provides a wide range of algorithms for classification, regression, clustering, and more. Some of the most common types of models in scikit-learn are:

  • Linear models: These models make predictions using a linear combination of the features. Examples include linear regression, logistic regression, and support vector machines with a linear kernel.
  • Decision trees: These models make predictions by partitioning the feature space into regions, and selecting the most likely class for each region. Decision trees are simple to understand and interpret, but can be prone to overfitting.
  • Random forests: These models are an extension of decision trees, and make predictions by averaging the predictions of many decision trees. Random forests are more robust and less prone to overfitting than decision trees, but are more complex to interpret.
  • Neural networks: These models are inspired by the structure and function of the brain, and are composed of multiple interconnected “neurons” that can learn to recognize patterns and make predictions. Neural networks are powerful, but can be difficult to train and interpret.

There are many more types of models available in scikit-learn, and the best choice for a given task will depend on the characteristics of the data and the requirements of the task.
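To make this a little more concrete, here is a minimal sketch of how a few of these model families can be instantiated and trained in scikit-learn. The synthetic dataset and the hyperparameters are purely illustrative:

# A minimal sketch: fitting a few common scikit-learn model families
# on a synthetic classification dataset (hyperparameters are illustrative only).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=42)

models = {
    "Logistic regression (linear model)": LogisticRegression(max_iter=1000),
    "Decision tree": DecisionTreeClassifier(max_depth=5),
    "Random forest": RandomForestClassifier(n_estimators=100),
    "Neural network (MLP)": MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000),
}

for name, model in models.items():
    model.fit(X, y)
    print(f"{name}: training accuracy = {model.score(X, y):.3f}")

Note that all of these estimators share the same fit/predict interface, which is what makes swapping one model for another so easy in scikit-learn.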

Model selection criteria

When choosing a model for a given task, there are several criteria that you should consider:

  • Accuracy: This is the most obvious criterion, and refers to the ability of the model to make correct predictions. However, accuracy can be misleading if the classes are imbalanced (i.e., if there are many more examples of one class than another).
  • Ease of interpretation: Some models, such as decision trees and linear models, are easy to interpret and understand. This can be important if you need to explain the model to others, or if you want to understand how the model is making predictions.
  • Computational efficiency: Some models, such as neural networks, can be computationally intensive to train and use. If you have a large dataset or limited computational resources, you should choose a model that is efficient to train and use.
  • Scalability: If you anticipate that your dataset will grow in the future, you should choose a model that is scalable and can handle large datasets.
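In practice, these criteria can be compared empirically. The sketch below (again on a synthetic dataset, with illustrative model choices) uses cross-validation to compare two candidate models on both accuracy and rough training time:

# A minimal sketch: comparing candidate models on accuracy and time
# using 5-fold cross-validation (dataset and models are illustrative only).
import time
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

candidates = [
    ("Logistic regression", LogisticRegression(max_iter=1000)),
    ("Random forest", RandomForestClassifier(n_estimators=200)),
]

for name, model in candidates:
    start = time.perf_counter()
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    elapsed = time.perf_counter() - start
    print(f"{name}: mean accuracy = {scores.mean():.3f}, time = {elapsed:.2f}s")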

Model evaluation metrics

Once you have selected a model, you need to evaluate its performance. There are many metrics that you can use to evaluate a model, such as:

  • Accuracy: This is the proportion of correct predictions made by the model. It is a simple and intuitive metric, but can be misleading if the classes are imbalanced (i.e., if there are many more examples of one class than another).
  • Precision: This is the proportion of true positive predictions made by the model, relative to all positive predictions. Precision is useful when you want to minimize false positives, such as in the case of a spam filter.
  • Recall: This is the proportion of true positive predictions made by the model, relative to all actual positive examples. Recall is useful when you want to minimize false negatives, such as in the case of a medical diagnosis.
  • F1 score: This is the harmonic mean of precision and recall. The F1 score is a balanced metric that is useful when you want to balance precision and recall.
  • Mean absolute error: This is the average absolute difference between the predicted values and the true values. The mean absolute error is useful for regression tasks, and is less sensitive to outliers than squared-error metrics.
  • Mean squared error: This is the average squared difference between the predicted values and the true values. The mean squared error is also useful for regression tasks, but is sensitive to outliers.
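All of these metrics are available in sklearn.metrics. Here is a minimal sketch; the labels and predictions below are made-up values used purely for illustration:

# A minimal sketch of computing the metrics above with sklearn.metrics
# (labels and predictions are made-up values for illustration).
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, mean_absolute_error, mean_squared_error)

# Classification metrics on hypothetical binary labels and predictions
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]
print("Accuracy: ", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall:   ", recall_score(y_true, y_pred))
print("F1 score: ", f1_score(y_true, y_pred))

# Regression metrics on hypothetical continuous targets
y_true_reg = [3.0, -0.5, 2.0, 7.0]
y_pred_reg = [2.5, 0.0, 2.1, 7.8]
print("MAE:", mean_absolute_error(y_true_reg, y_pred_reg))
print("MSE:", mean_squared_error(y_true_reg, y_pred_reg))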

Bias-variance tradeoff

The bias-variance tradeoff is a fundamental concept in machine learning, and refers to the tradeoff between a model’s ability to capture the underlying patterns in the data (low bias) and its sensitivity to the particular training data it happens to see (low variance).

[Figure: illustration of underfitting vs. overfitting — https://docs.aws.amazon.com/images/machine-learning/latest/dg/images/mlconcepts_image5.png]

If a model has high bias, it means that it is oversimplified (underfitting) and cannot capture the complexity of the data, leading to poor performance on the training set and poor generalization to unseen data.

If a model has high variance, it means that it is too complex and overfits the training data, leading to good performance on the training set but poor generalization to unseen data.
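One simple way to see the tradeoff is to vary a model’s capacity and compare training and test performance. The sketch below (synthetic data, illustrative depths) does this with decision trees of increasing depth:

# A minimal sketch of the bias-variance tradeoff: a very shallow tree tends
# to underfit (high bias), while an unrestricted tree tends to overfit
# (high variance). Dataset and depths are illustrative only.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, n_informative=5,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for depth in [1, 5, None]:  # None = grow the tree until all leaves are pure
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    tree.fit(X_train, y_train)
    print(f"max_depth={depth}: train={tree.score(X_train, y_train):.3f}, "
          f"test={tree.score(X_test, y_test):.3f}")

A large gap between training and test accuracy is a typical sign of high variance, while low accuracy on both is a sign of high bias.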

Finding the right balance between bias and variance is crucial for building a good model, and is an ongoing challenge in machine learning. We will try to tackle these problems in the coming parts of this series.

That concludes our discussion on model selection and evaluation! In the next part of the course, we’ll dive into supervised learning, and look at some of the most commonly used algorithms for classification and regression tasks. Stay tuned!

