Multiclass Classification for Corporate Credit Ratings (Using Credit Risk Analytics Book by Harald, Daniel and Bart)

Roi Polanitzer
17 min read · Feb 3, 2022


Credit Rating Models

Predicting financial difficulties and defaults is of great importance in the business world. It is not only those responsible for providing credit in banks and insurance companies who need tools to examine the degree of risk of their clients; shareholders, investors and executives also need to assess the risk and financing policy of their investments or enterprises.

For a long time, researchers have been trying to understand and improve the prediction of credit default, and research on the financial strength of companies began as early as the 1930s. One of the most common and oldest methods of predicting a company’s default is to analyze its financial and accounting ratios as well as its business environment.

Many studies have reached the conclusion that the financial ratios of a defaulting firm are significantly different from those of a going-concern firm. One of the classic studies in the field of financial ratio analysis and default prediction is the study of W. Beaver (1967).

In order to anticipate financial difficulties, Beaver used a single financial ratio at a time (together with a cut-off point which he had set) and classified firms as “defaulting” or “non-defaulting”. His main conclusion was that financial ratios in particular, and accounting data in general, can predict default up to five years in advance.

The “cash flow to total debt” ratio proved to be the best predictor of default: it was able to predict 78% of defaults in the year before the event and correctly predicted 95% of the cases in which stability was maintained. However, it has been found that the ability to predict defaults decreases considerably as the time lag until they occur increases.

In more recent studies, credit rating models have been constructed by relying on a number of financial ratios (rather than a single ratio), including consideration of the relationships between them. These models weight the company’s financial ratios using importance coefficients and thereby create a metric that reflects the borrower’s credit rating or probability of default.
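
As a toy illustration of that idea, the sketch below builds such a weighted score for a single borrower. The ratio names, values and weights are made up for illustration only and are not taken from any real rating model.

# Hypothetical example: a credit score as a weighted sum of financial ratios.
# The ratios and weights below are invented for illustration only.
weights = {"leverage": -0.40, "operating_margin": 0.35, "current_ratio": 0.25}
borrower = {"leverage": 0.62, "operating_margin": 0.18, "current_ratio": 1.40}

score = sum(weights[k] * borrower[k] for k in weights)
print("weighted credit score: {:.3f}".format(score))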

To implement the credit rating models, the data scientist must select objective economic and financial risk metrics for each group of borrowers. For example: for consumer credit, the credit rating model should include attributes such as the household annual income, its cumulative assets and the age of the main wage earners.

When it comes to business credit, the company’s leverage ratios (aka financial risk ratios), operating profitability ratios and internal liquidity ratios are the main explanatory features. Once the metrics have been identified, it is important to choose an appropriate machine learning model to assess the probability of default or to determine the relevant credit rating.

With this in mind, this is what we are going to do today: Learning how to use Machine Learning to help us predict credit ratings. Let’s get started!

The Data

The credit ratings data set originates from the Credit Risk Analytics book by Harald, Daniel and Bart, and it can be downloaded from here.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
df = pd.read_csv('ratings.csv')
print(df.columns)

Index(['spid', 'rating', 'COMMEQTA', 'LLPLOANS', 'COSTTOINCOME', 'ROE', 'LIQASSTA', 'SIZE'], dtype='object')

df.head()

The credit ratings data set consists of 5,000 data points, each with 8 columns: a company identifier (spid), the target rating, and 6 financial features:

print("dimension of credit data: {}".format(df.shape))

dimension of credit data: (5000, 8)

“rating” is the feature we are going to predict, and it has 10 classes. Each class has 500 observations:

print(df.groupby('rating').size())

import seaborn as sns
sns.countplot(df['rating'], label="Count")
df.info()

Data Preprocessing

We do not need the feature “spid” (this is the company id), so we will drop it.

df.drop(['spid'], axis=1, inplace=True)

Let’s create our feature matrix, X, and our target vector, y:

X = df.loc[:, df.columns != "rating"]
y = df.loc[:, df.columns == "rating"]

Let’s split our data set as follows: 67% for the training set and 33% for the test set. Since our data set is balanced before the split, I would like to keep it that way, so I’ll set stratify equal to y.

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42, stratify=y)

Throughout the article I’ll use a random_state of 42. I do not want any model to earn extra points from a lucky choice of random_state; I am good enough at my job not to need that kind of “help”.
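
As a quick sanity check (optional, and not part of the original flow), we can confirm that the stratified split kept the classes balanced: with 500 observations per class and a 33% test share, we expect roughly 335 observations per class in the training set and 165 per class in the test set.

print(y_train["rating"].value_counts())
print(y_test["rating"].value_counts())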

k-Nearest Neighbors

The k-NN algorithm is arguably the simplest machine learning algorithm. Building the model consists only of storing the training data set. To make a prediction for a new data point, the algorithm finds the closest data points in the training data set — its “nearest neighbors.”

First, let’s investigate whether we can confirm the connection between model complexity and accuracy:

from sklearn.neighbors import KNeighborsClassifier

training_accuracy = []
test_accuracy = []
# try n_neighbors from 1 to 10
neighbors_settings = range(1, 11)
for n_neighbors in neighbors_settings:
    # build the model
    knn = KNeighborsClassifier(n_neighbors=n_neighbors)
    knn.fit(X_train, y_train)
    # record training set accuracy
    training_accuracy.append(knn.score(X_train, y_train))
    # record test set accuracy
    test_accuracy.append(knn.score(X_test, y_test))

plt.plot(neighbors_settings, training_accuracy, label="training accuracy")
plt.plot(neighbors_settings, test_accuracy, label="test accuracy")
plt.ylabel("Accuracy")
plt.xlabel("n_neighbors")
plt.legend()
plt.savefig('knn_compare_model')

The above plot shows the training and test set accuracy on the y-axis against the setting of n_neighbors on the x-axis. If we choose a single nearest neighbor, the prediction on the training set is perfect. But when more neighbors are considered, the training accuracy drops, indicating that using the single nearest neighbor leads to a model that is too complex. The best performance is somewhere around 4 neighbors.

The plot suggests that we should choose n_neighbors=4. Here we are:

knn = KNeighborsClassifier(n_neighbors=4)
knn.fit(X_train, y_train)
print('Accuracy of K-NN classifier on training set: {:.2f}'.format(knn.score(X_train, y_train)))
print('Accuracy of K-NN classifier on test set: {:.2f}'.format(knn.score(X_test, y_test)))

Accuracy of K-NN classifier on training set: 0.69

Accuracy of K-NN classifier on test set: 0.52

Logistic regression

Logistic regression is, despite its name, a classification algorithm rather than a regression algorithm.

from sklearn.linear_model import LogisticRegression

logreg = LogisticRegression(random_state=42).fit(X_train, y_train)
print("Training set score: {:.3f}".format(logreg.score(X_train, y_train)))
print("Test set score: {:.3f}".format(logreg.score(X_test, y_test)))

Training set score: 0.485

Test set score: 0.453

The default value of C=1 gives 48% accuracy on the training set and 45% accuracy on the test set.

logreg001 = LogisticRegression(C=0.01, random_state=42).fit(X_train, y_train)
print("Training set accuracy: {:.3f}".format(logreg001.score(X_train, y_train)))
print("Test set accuracy: {:.3f}".format(logreg001.score(X_test, y_test)))

Training set accuracy: 0.337

Test set accuracy: 0.316

Using C=0.01 results in lower accuracy on both the training and the test sets.

logreg100 = LogisticRegression(C=100, random_state=42).fit(X_train, y_train)
print("Training set accuracy: {:.3f}".format(logreg100.score(X_train, y_train)))
print("Test set accuracy: {:.3f}".format(logreg100.score(X_test, y_test)))

Training set accuracy: 0.495

Test set accuracy: 0.464

Using C=100 results in slightly higher accuracy on both the training set and the test set, suggesting that less regularization (i.e., a more complex model) generalizes a bit better here than the default setting.

Therefore, we choose C=100.
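
If you prefer to scan several values of C at once, in the spirit of the k-NN loop above, a minimal sketch could look like this (not part of the original notebook; max_iter is raised only to suppress convergence warnings on the unscaled data).

# sketch: try a few regularization strengths in one loop
for C in [0.01, 0.1, 1, 10, 100]:
    lr = LogisticRegression(C=C, max_iter=1000, random_state=42).fit(X_train, y_train)
    print("C={:>6}: train accuracy {:.3f}, test accuracy {:.3f}".format(
        C, lr.score(X_train, y_train), lr.score(X_test, y_test)))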

Decision Tree

A decision tree is a simple decision-making diagram, which is basically a way of considering features one at a time.

from sklearn.tree import DecisionTreeClassifier

tree1 = DecisionTreeClassifier(random_state=42)
tree1.fit(X_train, y_train)
print("Accuracy on training set: {:.3f}".format(tree1.score(X_train, y_train)))
print("Accuracy on test set: {:.3f}".format(tree1.score(X_test, y_test)))

Accuracy on training set: 1.000

Accuracy on test set: 0.700

The accuracy on the training set is 100%, while the test set accuracy is much worse. This is an indication that the tree is overfitting and not generalizing well to new data. Therefore, we need to apply pre-pruning to the tree.

We set max_depth=14; limiting the depth of the tree decreases overfitting. This leads to a lower accuracy on the training set, but an improvement on the test set.

tree2 = DecisionTreeClassifier(max_depth=14, random_state=42)
tree2.fit(X_train, y_train)
print("Accuracy on training set: {:.3f}".format(tree2.score(X_train, y_train)))
print("Accuracy on test set: {:.3f}".format(tree2.score(X_test, y_test)))

Accuracy on training set: 0.900

Accuracy on test set: 0.731
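
The choice of max_depth=14 can be checked with a small scan over candidate depths. This is a sketch, not part of the original notebook:

# sketch: scan candidate depths to see where the test accuracy peaks
for depth in range(2, 21, 2):
    t = DecisionTreeClassifier(max_depth=depth, random_state=42).fit(X_train, y_train)
    print("max_depth={:>2}: train accuracy {:.3f}, test accuracy {:.3f}".format(
        depth, t.score(X_train, y_train), t.score(X_test, y_test)))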

Feature Importance in Decision Trees

Let’s look at the feature importances learned by the model.

df_features = list(X.columns)

Feature importance rates how important each feature is for the decision a tree makes. It is a number between 0 and 1 for each feature, where 0 means “not used at all” and 1 means “perfectly predicts the target”. The feature importances always sum to 1:
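
The array below was presumably printed from the fitted tree’s feature_importances_ attribute; a one-liner that reproduces it:

print("Feature importances: {}".format(tree2.feature_importances_))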

Feature importances: [ 0.10678475 0.23736728 0.17098942 0.18010008 0.09578607 0.20897239 ]

Then we can visualize the feature importances:

def plot_feature_importances_credit(model):
    plt.figure(figsize=(8, 6))
    n_features = len(X.columns)
    plt.barh(range(n_features), model.feature_importances_, align='center')
    plt.yticks(np.arange(n_features), df_features)
    plt.xlabel("Feature importance")
    plt.ylabel("Feature")
    plt.ylim(-1, n_features)

plot_feature_importances_credit(tree2)
plt.savefig('feature_importance')

The single decision tree gives a lot of importance to the “LLPLOANS” feature, but it also chooses “SIZE” to be the 2nd most informative feature overall.

Random Forest

Random forests are a large number of decision trees, combined (using averages or “majority rules”) at the end of the process. Let’s apply a random forest consisting of 100 trees on the credit ratings data set:

from sklearn.ensemble import RandomForestClassifier

rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)
print("Accuracy on training set: {:.3f}".format(rf.score(X_train, y_train)))
print("Accuracy on test set: {:.3f}".format(rf.score(X_test, y_test)))

Accuracy on training set: 1.000

Accuracy on test set: 0.779

The random forest gives us an accuracy of 77.9%, better than the logistic regression model, the k-Nearest Neighbors and the single decision tree, without tuning any parameters. However, we can adjust the max_depth setting to see whether the result can be improved.

rf1 = RandomForestClassifier(max_depth=12, n_estimators=100, random_state=42)
rf1.fit(X_train, y_train)
print("Accuracy on training set: {:.3f}".format(rf1.score(X_train, y_train)))
print("Accuracy on test set: {:.3f}".format(rf1.score(X_test, y_test)))

Accuracy on training set: 0.910

Accuracy on test set: 0.790

It actually did; this indicates that the default parameters of the random forest are not the best ones for this data set.
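
If you want to push this further, a small grid search can look for better settings systematically. This is a sketch with an arbitrary parameter grid, not something from the original article; the labels are flattened with ravel() only to avoid a shape warning.

from sklearn.model_selection import GridSearchCV

# sketch: search over tree depth and the number of features considered per split
param_grid = {"max_depth": [8, 10, 12, 14], "max_features": ["sqrt", None]}
grid = GridSearchCV(RandomForestClassifier(n_estimators=100, random_state=42),
                    param_grid, cv=5, n_jobs=-1)
grid.fit(X_train, y_train.values.ravel())
print("best parameters:", grid.best_params_)
print("test accuracy: {:.3f}".format(grid.score(X_test, y_test.values.ravel())))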

Feature importance in Random Forest

plot_feature_importances_credit(rf1)

Similarly to the single decision tree, the random forest gives a lot of importance to the “LLPLOANS” feature, and chooses “SIZE” to be the 2nd most informative feature overall. The randomness in building the random forest forces the algorithm to consider many possible explanations, so the random forest captures a much broader picture of the data than a single tree.

Gradient Boosting

Gradient boosting machines also combine decision trees, but instead of averaging independently grown trees at the end, they build the trees sequentially, with each new tree correcting the mistakes of the previous ones. Let’s apply gradient boosting to the credit ratings data set:

from sklearn.ensemble import GradientBoostingClassifier

gb = GradientBoostingClassifier(random_state=42)
gb.fit(X_train, y_train)
print("Accuracy on training set: {:.3f}".format(gb.score(X_train, y_train)))
print("Accuracy on test set: {:.3f}".format(gb.score(X_test, y_test)))

Accuracy on training set: 0.920

Accuracy on test set: 0.805

We are likely to be overfitting. To reduce overfitting, we could either apply stronger pre-pruning by limiting the maximum depth or lower the learning rate:

gb1 = GradientBoostingClassifier(max_depth=1, random_state=42)
gb1.fit(X_train, y_train)
print("Accuracy on training set: {:.3f}".format(gb1.score(X_train, y_train)))
print("Accuracy on test set: {:.3f}".format(gb1.score(X_test, y_test)))

Accuracy on training set: 0.813

Accuracy on test set: 0.784

gb2 = GradientBoostingClassifier(learning_rate=0.01, random_state=42)
gb2.fit(X_train, y_train)
print("Accuracy on training set: {:.3f}".format(gb2.score(X_train, y_train)))
print("Accuracy on test set: {:.3f}".format(gb2.score(X_test, y_test)))

Accuracy on training set: 0.763

Accuracy on test set: 0.727

Both methods of decreasing the model complexity reduced the training set accuracy, as expected. However, in this case, neither of them improved the generalization performance on the test set.

Feature importance in Gradient Boosting

Even though we are not entirely happy with the model, we can still visualize the feature importances to get more insight into it:

plot_feature_importances_credit(gb)

We can see that the feature importances of the gradient boosted trees are very similar to those of the random forest; in this case, it gives some weight to all of the features.

Support Vector Machine Classification

A support vector is an observation that lies at the edge of the margin (the “pathway” separating the classes). A support vector machine classifier constructs the widest such pathway and uses it to classify observations.

from sklearn.svm import SVC

svc1 = SVC(random_state=42)
svc1.fit(X_train, y_train)
print("Accuracy on training set: {:.2f}".format(svc1.score(X_train, y_train)))
print("Accuracy on test set: {:.2f}".format(svc1.score(X_test, y_test)))

Accuracy on training set: 0.33

Accuracy on test set: 0.33

The model underfits quite substantially: the training and test set scores are very similar, but both are far from 100%, with only 33% accuracy on the test set.

SVM requires all the features to vary on a similar scale. We will need to re-scale our data so that all the features are approximately on the same scale:

Feature Scaling for Support Vector Machine

Feature scaling is a procedure that ensures the features are measured on similar scales. In this example, the SVM works better with min-max scaling (aka normalization) than with z-score normalization (aka standardization). Min-max scaling rescales each feature by subtracting its minimum value and dividing by the difference between its maximum and minimum values.
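
Before handing this to scikit-learn’s MinMaxScaler, here is the formula on a tiny made-up vector, just to make the arithmetic concrete:

import numpy as np

x = np.array([2.0, 5.0, 9.0])                  # hypothetical feature values
x_scaled = (x - x.min()) / (x.max() - x.min())  # -> 0.0, ~0.43, 1.0
print(x_scaled)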

from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)
# apply the scaler fitted on the training data to the test data
X_test_scaled = scaler.transform(X_test)
svc2 = SVC(random_state=42)
svc2.fit(X_train_scaled, y_train)
print("Accuracy on training set: {:.2f}".format(svc2.score(X_train_scaled, y_train)))
print("Accuracy on test set: {:.2f}".format(svc2.score(X_test_scaled, y_test)))

Accuracy on training set: 0.73

Accuracy on test set: 0.68

Scaling the data made a huge difference! From here, we can try increasing either C or gamma to fit a more complex model.

svc3 = SVC(C=100, random_state=42)
svc3.fit(X_train_scaled, y_train)
print("Accuracy on training set: {:.3f}".format(svc3.score(X_train_scaled, y_train)))
print("Accuracy on test set: {:.3f}".format(svc3.score(X_test_scaled, y_test)))

Accuracy on training set: 0.822

Accuracy on test set: 0.730

Here, increasing C allows us to improve the model, resulting in 73% test set accuracy.

Multilayer Perceptrons

A multilayer perceptron (MLP) is a feed-forward neural network. It consists of three types of layers: an input layer, one or more hidden layers, and an output layer. Let’s apply a multilayer perceptron with a single hidden layer to the credit ratings data set:

from sklearn.neural_network import MLPClassifier

mlp1 = MLPClassifier(random_state=42)
mlp1.fit(X_train, y_train)
print("Accuracy on training set: {:.2f}".format(mlp1.score(X_train, y_train)))
print("Accuracy on test set: {:.2f}".format(mlp1.score(X_test, y_test)))

Accuracy on training set: 0.57

Accuracy on test set: 0.56

The accuracy of the multilayer perceptron (MLP) is not nearly as good as that of the other models; this is likely due to the scaling of the data. Neural networks also expect all input features to vary in a similar way, and ideally to have a mean of 0 and a standard deviation of 1. We must re-scale our data so that it fulfills these requirements.

Feature Scaling for Multilayer Perceptron

Feature scaling is a procedure that ensures the features are measured on similar scales. The MLP works better here with z-score normalization (aka standardization) than with min-max scaling (aka normalization). Z-score scaling rescales each feature by subtracting its mean and dividing by its standard deviation.
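
Again on a tiny made-up vector, the z-score formula looks like this (StandardScaler applies the same computation column by column):

x = np.array([2.0, 5.0, 9.0])          # hypothetical feature values
x_scaled = (x - x.mean()) / x.std()    # -> roughly -1.16, -0.12, 1.28
print(x_scaled)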

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
# again, apply the scaler fitted on the training data to the test data
X_test_scaled = scaler.transform(X_test)
mlp2 = MLPClassifier(random_state=42)
mlp2.fit(X_train_scaled, y_train)
print("Accuracy on training set: {:.3f}".format(mlp2.score(X_train_scaled, y_train)))
print("Accuracy on test set: {:.3f}".format(mlp2.score(X_test_scaled, y_test)))

Accuracy on training set: 0.784

Accuracy on test set: 0.730

Let’s increase the number of iterations:

mlp3 = MLPClassifier(max_iter=1000, random_state=42)
mlp3.fit(X_train_scaled, y_train)
print("Accuracy on training set: {:.3f}".format(mlp3.score(X_train_scaled, y_train)))
print("Accuracy on test set: {:.3f}".format(mlp3.score(X_test_scaled, y_test)))

Accuracy on training set: 0.839

Accuracy on test set: 0.740

Increasing the number of iterations increased both the training set performance, and the test set performance.

Let’s increase the alpha parameter and add stronger regularization of the weights:

mlp4 = MLPClassifier(max_iter=1000, alpha=1, random_state=42)
mlp4.fit(X_train_scaled, y_train)
print("Accuracy on training set: {:.3f}".format(mlp4.score(X_train_scaled, y_train)))
print("Accuracy on test set: {:.3f}".format(mlp4.score(X_test_scaled, y_test)))

Accuracy on training set: 0.734

Accuracy on test set: 0.687

The result is worse: both the training accuracy and the test accuracy decrease.

Therefore, our best MLP model so far is the MLP model after scaling and increasing the number of iterations.
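
One more knob worth exploring (a sketch, not tuned in the original article) is the hidden layer architecture itself, via hidden_layer_sizes:

# sketch: try a few hidden-layer configurations on the scaled data
for size in [(50,), (100,), (100, 50)]:
    m = MLPClassifier(hidden_layer_sizes=size, max_iter=1000, random_state=42)
    m.fit(X_train_scaled, y_train)
    print("hidden_layer_sizes={}: train {:.3f}, test {:.3f}".format(
        size, m.score(X_train_scaled, y_train), m.score(X_test_scaled, y_test)))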

Feature importance in Multilayer Perceptron

Finally, we plot a heat map of the first layer weights in a neural network learned on the credit ratings data set.

plt.figure(figsize=(20, 5))
plt.imshow(mlp3.coefs_[0], interpolation="none", cmap='viridis')
plt.yticks(range(len(X.columns)), df_features)
plt.xlabel("Columns in weight matrix")
plt.ylabel("Input feature")
plt.colorbar()

From the heat map, it is not easy to quickly point out which feature or features have relatively low weights compared to the others.

Comparing the Models Performance

algorithms = ["k-Nearest Neighbors", "Logistic Regression", "Decision Trees", "Random Forest",
              "Gradient Boosting", "Support Vector Machine", "Deep Learning"]
tests_accuracy = [knn.score(X_test, y_test), logreg100.score(X_test, y_test), tree2.score(X_test, y_test),
                  rf1.score(X_test, y_test), gb.score(X_test, y_test), svc3.score(X_test_scaled, y_test),
                  mlp3.score(X_test_scaled, y_test)]
compare_algorithms = pd.DataFrame({"Algorithms": algorithms, "Tests Accuracy": tests_accuracy})
compare_algorithms.sort_values(by="Tests Accuracy", ascending=False)

Okay. I was really surprised to see roughly 80% accuracy from the Gradient Boosting model; that is an excellent result. Another model we could depend on is the Random Forest, which also performs reasonably well in my opinion.

import matplotlib.pyplot as plt
%matplotlib inline
plt.figure(figsize=(8,8))
sns.barplot(x="Tests Accuracy", y="Algorithms", data=compare_algorithms)
plt.show()

Predictions With Multiclass Gradient Boosting

When we train a model, the hope is that we’ll later be able to take the trained model, apply it to new data, and have the model generalize and accurately predict on data it hasn’t seen before.

For example, suppose we have a model that categorizes images of sign language digits (say we have ten classes, labeled as 0 through 9, and each class is made up of images of hands showing the sign for that particular digit) and that the training data contained thousands of images of sign language digits from a particular data set online.

What Is Inference?

Now suppose that later we want to take this model and use it to predict on other images of sign language digits from a different data set. The hope is that, even though our model wasn’t exposed to these particular sign language digits images during training, it will still be able to accurately make predictions for them based on what it’s learned from the sign language digits data set from which it was trained.

We call this process inference, as the model is using its knowledge gained from training and using it to infer a prediction or result.

model = gb
model

GradientBoostingClassifier(random_state=42)

At this point, the Gradient Boosting model we’ve been working with has now been trained. Given the results we’ve seen from the training data, it appears that this model should do well on predicting on a new test set.

Note that the test set is the set of data used specifically for inference after training has concluded.

Evaluating The Test Set

To get predictions from the model for the test set, we call model.predict().

predictions = model.predict(X_test)

To this function, we pass in the test samples X_test, and it returns the model’s predicted label for each sample.

Note that, unlike with the training set, we do not pass the labels of the test set y_test to the model during the inference stage.

To see what the model’s predictions look like, we can iterate over them and print out the predicted class label for each sample.

for i in predictions:
    print(i)

From the printed prediction results, we can observe the model’s predicted labels; however, we cannot judge how accurate these predictions are just by looking at the predicted output.

If we have corresponding labels for the test set (which, in this case, we do), then we can compare these true labels to the predicted labels to judge the accuracy of the model’s predictions.
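
A quick way to make that comparison, before plotting anything, is to compute the plain accuracy of the predictions against the true labels (the labels are flattened to a 1d array only to match the shape of predictions):

from sklearn.metrics import accuracy_score

print("Test accuracy: {:.3f}".format(accuracy_score(y_test.values.ravel(), predictions)))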

Confusion Matrix for Multiclass Gradient Boosting Predictions

Although we were able to read the predictions from the model easily, we weren’t easily able to compare the predictions to the true labels for the test data.

With a confusion matrix, we’ll be able to visually observe how well the model predicts on test data.

The confusion matrix we’ll be plotting comes from scikit-learn.

We then create the confusion matrix and assign it to the variable cm.

from sklearn.metrics import confusion_matrix

cm = confusion_matrix(y_true=y_test, y_pred=predictions)

To the confusion matrix, we pass in the true labels y_test as well as the model’s predicted labels predictions for the test set.

Below, we have a function called plot_confusion_matrix() that came directly from scikit-learn's website. This is code that they provide in order to plot the confusion matrix.

def plot_confusion_matrix(cm, classes,
                          normalize=False,
                          title='Confusion matrix',
                          cmap=plt.cm.Blues):
    """
    This function prints and plots the confusion matrix.
    Normalization can be applied by setting `normalize=True`.
    """
    plt.imshow(cm, interpolation='nearest', cmap=cmap)
    plt.title(title)
    plt.colorbar()
    tick_marks = np.arange(len(classes))
    plt.xticks(tick_marks, classes, rotation=45)
    plt.yticks(tick_marks, classes)

    if normalize:
        cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
        print("Normalized confusion matrix")
    else:
        print('Confusion matrix, without normalization')
    print(cm)

    thresh = cm.max() / 2.
    for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
        plt.text(j, i, cm[i, j],
                 horizontalalignment="center",
                 color="white" if cm[i, j] > thresh else "black")

    plt.tight_layout()
    plt.ylabel('True label')
    plt.xlabel('Predicted label')

Next, we define the labels for the confusion matrix. In our case, the labels are titled '1', '2', '3', '4', '5', '6', '7', '8', '9', '10'.

import itertools

cm_plot_labels = ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10']

Lastly, we plot the confusion matrix by using the plot_confusion_matrix() function we just discussed. To this function, we pass in the confusion matrix cm and the labels cm_plot_labels, as well as a title for the confusion matrix.

plot_confusion_matrix(cm=cm, classes=cm_plot_labels, title='Confusion Matrix')

Looking pretty good! Just checking out the diagonal in blue, which contains all the correctly predicted samples, we can get an idea that the model did pretty well. Each class had 165 samples, and we see a decent number of diagonal counts above 139 here.

In total, the model gave 321 incorrect predictions out of 1,650, which gives us an accuracy of roughly 81% on the test set. Not bad. As mentioned earlier, there are still improvements that could be made to this model, so if you implement any and get better than 81% accuracy on the test set, share it in the comments!

Reading A Confusion Matrix

Looking at the plot of the confusion matrix, we have the predicted labels on the x-axis and the true labels on the y-axis. The blue cells running from the top left to bottom right contain the number of samples that the model accurately predicted. The white cells contain the number of samples that were incorrectly predicted.

print("\033[1m The result is telling us that we have: ", (cm[0,0] + cm[1,1] + cm[2,2] + cm[3,3] + cm[4,4] +
      cm[5,5] + cm[6,6] + cm[7,7] + cm[8,8] + cm[9,9]), "correct predictions.")
print("\033[1m The result is telling us that we have: ", (cm.sum() - (cm[0,0] + cm[1,1] + cm[2,2] +
      cm[3,3] + cm[4,4] + cm[5,5] + cm[6,6] + cm[7,7] +
      cm[8,8] + cm[9,9])), "incorrect predictions.")
print("\033[1m We have a total predictions of: ", (cm.sum()))

There are 1650 total samples in the test set. Looking at the confusion matrix, we can see that the model accurately predicted 1329 out of 1650 total samples. The model incorrectly predicted 321 out of the 1650.

As you can see, this is a good way we can visually interpret how well the model is doing at its predictions and understand where it may need some work.
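
For what it’s worth, the same three numbers can be read off the matrix more compactly with numpy’s trace, which sums the diagonal:

# trace of the confusion matrix = number of correct predictions
print("correct predictions:  ", np.trace(cm))
print("incorrect predictions:", cm.sum() - np.trace(cm))
print("total predictions:    ", cm.sum())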

Classification Report for Multiclass Gradient Boosting Predictions

from sklearn.metrics import classification_report

print(classification_report(y_test, predictions))

Summary

We practiced a wide array of machine learning models for classification, what their advantages and disadvantages are, and how to control model complexity for each of them. We saw that for many of the algorithms, setting the right parameters is important for good performance.

We should now know how to apply, tune, and analyze the models we practiced above. It’s your turn now! Try applying any of these algorithms to the built-in data sets in scikit-learn or to any data set of your choice. Happy Machine Learning!

Source code that created this post can be found here. I would be pleased to receive feedback or questions on any of the above.

Reference: Introduction to Machine Learning with Python


Roi Polanitzer

Chief Data Scientist at Prediction Consultants — Advanced Analysis and Model Development. https://polanitz8.wixsite.com/prediction/english