How to compare multiple machine learning models?

Abhilash Singh
Published in Nerd For Tech
3 min read · Sep 22, 2021


Figure 1: Image from the author.

In this article, we discuss the performance metrics to use when comparing multiple machine learning models. Performance metrics are the backbone of every machine learning model: they tell us how well a model has been trained and how well it performs on evaluation data.

In regression-based machine learning problems, it is common to use the correlation coefficient (R), the Root Mean Square Error (RMSE) or Mean Square Error (MSE), and bias as the performance metrics for evaluating a trained machine learning model (Singh et al., 2022). The formulas for calculating R, RMSE, and MSE are given below:
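In their standard form (with $n$ samples indexed by $i$; one common convention, since some authors instead define R directly as Pearson's correlation):

$$ R = \sqrt{1 - \frac{SSE}{SST}} $$

$$ MSE = \frac{SSE}{n} = \frac{1}{n}\sum_{i=1}^{n}\left(y_{obs,i} - y_{est,i}\right)^{2}, \qquad RMSE = \sqrt{MSE} $$

$$ Bias = \frac{1}{n}\sum_{i=1}^{n}\left(y_{est,i} - y_{obs,i}\right) $$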

where SSE is the sum of squared errors, SST is the total sum of squares, yobs denotes the observed values, and yest denotes the predicted values. These metrics are suitable for evaluating a single machine learning model. For comparing multiple machine learning models (or comparing a model against benchmark algorithms), however, we need some additional performance metrics to reach a robust conclusion.

According to a recent research article (Singh et al., 2021), we should add further performance metrics when comparing two or more machine learning models. For multi-model comparison, they recommend Akaike's Information Criterion (AIC), the corrected AIC (AICc), and the Bayesian Information Criterion (BIC). All three criteria penalise a model for having a large number of parameters, and the model with lower AIC, AICc, and BIC values is preferred. We briefly discuss these criteria below (detailed descriptions can be found in the corresponding references).

1. Akaike's Information Criterion (AIC) (Akaike, 1969) [2]

2. Corrected AIC (AICc) (Hurvich and Tsai, 1989) [3]

3. Bayesian Information Criterion (BIC) (Schwarz, 1978) [4]
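For least-squares regression, these criteria are commonly written in the following form (a standard formulation; the cited papers give the general likelihood-based definitions):

$$ AIC = n_{train}\,\ln\!\left(\frac{SSE}{n_{train}}\right) + 2p $$

$$ AICc = AIC + \frac{2p\,(p+1)}{n_{train} - p - 1} $$

$$ BIC = n_{train}\,\ln\!\left(\frac{SSE}{n_{train}}\right) + p\,\ln\!\left(n_{train}\right) $$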

where ntrain is the number of training samples, and p is the number of parameters that the machine learning model estimates internally.

Hence, for a more robust comparison of multiple machine learning models, we can use AIC, AICc, and BIC along with R, RMSE, and bias (Singh et al., 2021).
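As a worked example, the sketch below computes all six metrics for two hypothetical models evaluated on the same held-out data. It assumes NumPy and the least-squares forms of the information criteria; the function name `regression_metrics`, the toy data, and the parameter counts are illustrative, not taken from the cited papers.

```python
import numpy as np

def regression_metrics(y_obs, y_pred, n_params):
    """Compute R, RMSE, MSE, Bias, AIC, AICc, and BIC for one model.

    Uses the least-squares forms of the information criteria; n_params
    is the number of parameters the model estimates internally.
    """
    y_obs = np.asarray(y_obs, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    n = len(y_obs)

    sse = np.sum((y_obs - y_pred) ** 2)        # sum of squared errors
    mse = sse / n
    rmse = np.sqrt(mse)
    r = np.corrcoef(y_obs, y_pred)[0, 1]       # correlation coefficient
    bias = np.mean(y_pred - y_obs)             # mean signed error

    # Information criteria (least-squares forms); lower is better.
    aic = n * np.log(sse / n) + 2 * n_params
    aicc = aic + (2 * n_params * (n_params + 1)) / (n - n_params - 1)
    bic = n * np.log(sse / n) + n_params * np.log(n)

    return {"R": r, "RMSE": rmse, "MSE": mse, "Bias": bias,
            "AIC": aic, "AICc": aicc, "BIC": bic}

# Compare two hypothetical models on the same test set:
y_true  = [3.0, 2.5, 4.1, 3.8, 5.0, 4.4, 3.2, 4.7]
model_a = [2.8, 2.7, 4.0, 3.9, 4.8, 4.5, 3.4, 4.6]  # 3-parameter model
model_b = [3.3, 2.1, 4.5, 3.4, 5.4, 4.0, 2.8, 5.1]  # 5-parameter model

for name, pred, p in [("Model A", model_a, 3), ("Model B", model_b, 5)]:
    metrics = regression_metrics(y_true, pred, p)
    print(name, {k: round(v, 3) for k, v in metrics.items()})
```

Note how a model with more parameters can match another on RMSE yet still lose on AIC, AICc, and BIC, which is exactly the penalisation these criteria are designed to apply.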

Figure 2: Image from the author.

References

[1]. Singh, A., Gaurav, K., Rai, A. K., & Beg, Z. (2021). Machine learning to estimate surface roughness from satellite images. Remote Sensing, 13(19), 3794. DOI: 10.3390/rs13193794.

[2]. Akaike, H. (1969), “Fitting Autoregressive Models for Prediction”. Annals of the Institute of Statistical Mathematics, 21, 243–247.

[3]. Hurvich, C.M., and Tsai, C.L. (1989), “Regression and time-series model selection in small samples”. Biometrika, 76, 297–307.

[4]. Schwarz, G. (1978), “Estimating the Dimension of a Model”. Annals of Statistics, 6, 461–464.

[5]. Singh, A., Amutha, J., Nagar, J., Sharma, S., & Lee, C. C. (2022). LT-FS-ID: Log-transformed feature learning and feature-scaling-based machine learning algorithms to predict the k-barriers for intrusion detection using wireless sensor network. Sensors, 22(3), 1070.

Note: If you have any queries, please write to me (abhilash.singh@ieee.org) or visit my web page.

Don’t forget to subscribe to my YouTube channel.


Abhilash Singh is a researcher at the Indian Institute of Science Education and Research Bhopal. Subscribe to his YouTube channel: https://www.youtube.com/channel/UC3YYrAOSNRXvG8Tud3XepYA