Model Evaluation and Cross-Validation
Model Evaluation involves assessing the performance of a machine learning model using various metrics. The choice of metrics depends on the type of problem (e.g., classification, regression).
Common Metrics:
1. Classification Metrics:
- Accuracy: The proportion of correctly classified instances.
- Precision: The proportion of true positive predictions among all positive predictions.
- Recall (Sensitivity): The proportion of true positive predictions among all actual positive instances.
- F1 Score: The harmonic mean of precision and recall.
- ROC-AUC Score: The area under the receiver operating characteristic curve.
- Confusion Matrix: A table showing the true positives, true negatives, false positives, and false negatives.
2. Regression Metrics:
- Mean Absolute Error (MAE): The average of absolute differences between predicted and actual values.
- Mean Squared Error (MSE): The average of squared differences between predicted and actual values.
- Root Mean Squared Error (RMSE): The square root of MSE.
- R-squared (R²): The proportion of variance in the dependent variable that is predictable from the independent variables.
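To make these metrics concrete, here is a minimal sketch using scikit-learn's metric functions. The tiny label and prediction arrays are made-up illustrations, not output from a real model:

```python
# Minimal sketch of the classification and regression metrics listed above.
from sklearn.metrics import (
    accuracy_score, precision_score, recall_score, f1_score,
    roc_auc_score, confusion_matrix,
    mean_absolute_error, mean_squared_error, r2_score,
)
import numpy as np

# Classification: toy true labels, predicted labels, and predicted probabilities.
y_true = [0, 1, 1, 0, 1, 0, 1, 1]
y_pred = [0, 1, 0, 0, 1, 1, 1, 1]
y_prob = [0.2, 0.8, 0.4, 0.1, 0.9, 0.6, 0.7, 0.95]

print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall   :", recall_score(y_true, y_pred))
print("F1 score :", f1_score(y_true, y_pred))
print("ROC-AUC  :", roc_auc_score(y_true, y_prob))
print("Confusion matrix:\n", confusion_matrix(y_true, y_pred))

# Regression: toy actual vs. predicted values.
y_actual = [3.0, 5.0, 2.5, 7.0]
y_hat    = [2.5, 5.0, 3.0, 8.0]

mse = mean_squared_error(y_actual, y_hat)
print("MAE :", mean_absolute_error(y_actual, y_hat))
print("MSE :", mse)
print("RMSE:", np.sqrt(mse))
print("R²  :", r2_score(y_actual, y_hat))
```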
Cross-Validation:
Cross-validation is a resampling technique in which a portion of the dataset is held out for validating the model instead of using the entire dataset for training. The model is trained on the remaining data and evaluated on the held-out portion. There are various types of cross-validation, with K-Fold Cross-Validation being the most commonly used.
Types of Cross-Validation:
1. LOOCV (Leave-One-Out Cross-Validation)
In each iteration, a single record is used as the validation data and all remaining records are used as training data; the second iteration uses the next record as validation, and so on.
If the dataset has 1,000 records, we need to repeat this 1,000 times, so it is a very time-consuming task.
It also tends to overfit and produce high-variance estimates.
It was used in the past but is rarely used today and is generally considered bad practice, so avoid it.
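For reference, here is a minimal LOOCV sketch using scikit-learn's LeaveOneOut; the iris dataset and logistic regression model are placeholders chosen just for illustration:

```python
# Minimal LOOCV sketch: one fit per record, each record validated once.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

loo = LeaveOneOut()                     # one fold per record: n_samples iterations
scores = cross_val_score(model, X, y, cv=loo)
print("Number of fits:", len(scores))   # 150 for the iris dataset
print("Mean accuracy :", scores.mean())
```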
2. Hold-Out Cross-Validation:
Suppose we have 1,000 records in our dataset. We split them into training data and validation data, typically with a ratio of 70% training to 30% validation.
The training set is always larger than the validation set.
The split is made by random sampling.
This method does not work well with imbalanced datasets, because a random split can leave the minority class under-represented in one of the sets.
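A minimal hold-out sketch using scikit-learn's train_test_split, assuming a 70/30 ratio and a synthetic 1,000-record dataset:

```python
# Minimal hold-out split sketch with a 70/30 train/validation ratio.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=42)

# random_state fixes the random split so the result is reproducible
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.30, random_state=42
)
print(len(X_train), "training records,", len(X_val), "validation records")
# For imbalanced labels, passing stratify=y keeps the class ratio in both splits.
```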
3. K-Fold Cross-Validation:
Let's assume k = 10, which means we will run 10 experiments.
With 1,000 records in the dataset, each fold holds 100 records: in every experiment a different fold is used for validation and the remaining 900 records are used for training.
We record the accuracy of each of the 10 experiments, then average them; this average accuracy is the final performance estimate of the model.
Plain K-Fold does not work well for an imbalanced dataset.
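A minimal 10-fold sketch using scikit-learn's KFold and cross_val_score; the synthetic dataset and logistic regression model are assumptions made only for illustration:

```python
# Minimal 10-fold sketch: 10 fits, one accuracy per fold, averaged at the end.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

X, y = make_classification(n_samples=1000, random_state=42)
model = LogisticRegression(max_iter=1000)

kf = KFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_val_score(model, X, y, cv=kf, scoring="accuracy")
print("Per-fold accuracy:", scores)
print("Average accuracy :", scores.mean())
```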
4. Stratified K-Fold Cross-Validation:
It is almost the same as K-Fold Cross-Validation, but each fold preserves the overall class ratio, so it works with imbalanced datasets to some extent.
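A minimal sketch using scikit-learn's StratifiedKFold on a deliberately imbalanced synthetic dataset (the 90/10 class split is an assumption for illustration):

```python
# Minimal stratified K-Fold sketch: each fold keeps roughly the same class ratio.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# weights=[0.9, 0.1] creates an imbalanced dataset for illustration
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=42)
model = LogisticRegression(max_iter=1000)

skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_val_score(model, X, y, cv=skf, scoring="f1")
print("Average F1 across folds:", scores.mean())
```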
5. Time Series Cross-Validation:
It is used for time series datasets, where we cannot pick records randomly. Instead, we split the data sequentially based on time, putting older data in training and the most recent data in validation.
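A minimal sketch using scikit-learn's TimeSeriesSplit, with a toy array of 20 time-ordered records to show that the validation indices always come after the training indices:

```python
# Minimal time-series CV sketch: older observations train, newer ones validate.
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.arange(20).reshape(-1, 1)   # pretend these are 20 time-ordered records
tscv = TimeSeriesSplit(n_splits=4)

for fold, (train_idx, val_idx) in enumerate(tscv.split(X), start=1):
    print(f"Fold {fold}: train={train_idx[0]}..{train_idx[-1]}, "
          f"val={val_idx[0]}..{val_idx[-1]}")
# Each fold's validation indices come strictly after its training indices.
```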