Visualizing the Classifier Decision Function With SkLearn and Blender.

Octavio Gonzalez-Lugo
Published in Geek Culture
5 min read · Apr 29, 2021

One of the most common questions about machine learning is how an algorithm makes its decisions. For deep learning this is still an active topic of research, but for some classical methods the way the algorithm reaches a decision can be shown directly. For example, let's define a simple binary classification problem with the wine quality data set.

The quality scores of the data set are turned into binary labels with a hard threshold at a score of five. The features are then scaled, and the first two principal components of the data set are used as the input data for the classifier.
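A minimal sketch of this preparation step with scikit-learn follows. It assumes the quality scores are already separated from the eleven physicochemical features and that scores above five count as good wine (label 1); the article does not say on which side of the threshold a score of exactly five falls. `StandardScaler` stands in for the unspecified scaling step, and `prepare_wine_data` is a hypothetical name. For the UCI red wine file, the inputs could come from `np.genfromtxt("winequality-red.csv", delimiter=";", skip_header=1)`, taking the last column as the score (the choice of file is also an assumption).

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def prepare_wine_data(features, quality, threshold=5):
    """Binarize quality scores and project the scaled features onto two PCs.

    Assumption: scores above `threshold` are good wine (1), the rest bad wine (0).
    """
    labels = (np.asarray(quality) > threshold).astype(int)
    # Standardize each feature to zero mean and unit variance before PCA
    scaled = StandardScaler().fit_transform(features)
    # Keep only the first two principal components as classifier inputs
    components = PCA(n_components=2).fit_transform(scaled)
    return components, labels
```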

By fitting a support vector classifier from sklearn to the data set, we get access to the decision function of the algorithm, a representation of how the algorithm makes a classification decision. What the decision function represents depends on the kind of algorithm being used; for a support vector classifier, its values are proportional to the signed distance between the data point and the separating hyperplane.
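The fitting step might look like the sketch below. The RBF kernel (the `SVC` default), the grid resolution, and the padding around the data are assumptions, since the article does not state them, and `fit_and_evaluate` is a hypothetical name. For a binary problem, `decision_function` returns one value per sample, positive on the side of `classes_[1]` (good wine here) and negative on the side of `classes_[0]`.

```python
import numpy as np
from sklearn.svm import SVC


def fit_and_evaluate(points, labels, resolution=100, padding=0.5):
    """Fit an SVC and evaluate its decision function on a regular 2D grid."""
    classifier = SVC(kernel="rbf")  # kernel choice is an assumption
    classifier.fit(points, labels)

    # Grid covering the data with a small margin on every side
    x_min, y_min = points.min(axis=0) - padding
    x_max, y_max = points.max(axis=0) + padding
    xx, yy = np.meshgrid(np.linspace(x_min, x_max, resolution),
                         np.linspace(y_min, y_max, resolution))

    # One signed score per grid point; reshape back to the grid layout
    scores = classifier.decision_function(np.c_[xx.ravel(), yy.ravel()])
    return classifier, xx, yy, scores.reshape(xx.shape)
```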

By plotting both the data and the decision function, we can observe red areas, where the classifier assigns data points to the bad wine class, and blue areas, where it assigns them to the good wine class. White areas represent the transition between both classes, where the decision function is close to zero. A higher color intensity means a larger absolute value of the decision function, that is, a point farther from the separating hyperplane and a more confident classification.
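One way to draw such a plot with matplotlib, given a coordinate grid `xx`, `yy`, the decision values `zz` evaluated on it, and the points with their labels as NumPy arrays, is a filled contour with the built-in `RdBu` colormap, which runs from red through white to blue. Using symmetric `vmin`/`vmax` limits is an assumption made here so that a decision value of zero lands on white; the original plotting code may differ.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np


def plot_decision_function(xx, yy, zz, points, labels):
    """Contour the decision function in red-white-blue and overlay the data."""
    # Symmetric limits keep zero (the decision boundary) at the white center of RdBu
    limit = np.abs(zz).max()
    fig, ax = plt.subplots(figsize=(6, 5))
    filled = ax.contourf(xx, yy, zz, levels=20, cmap="RdBu", vmin=-limit, vmax=limit)
    # Negative scores (class 0, bad wine) fall on the red side of the map
    ax.scatter(points[labels == 0, 0], points[labels == 0, 1],
               c="red", edgecolors="k", s=15, label="bad wine")
    ax.scatter(points[labels == 1, 0], points[labels == 1, 1],
               c="blue", edgecolors="k", s=15, label="good wine")
    fig.colorbar(filled, ax=ax, label="decision function")
    ax.legend()
    return fig
```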

To create that plot, three things were needed: the data set, the labels of the data set, and the evaluated decision function. All of that data can be saved into CSV files and loaded into Blender to recreate the visualization. A common choice for working with CSV files would be pandas; however, it is not available in Blender, so it is a good idea to know other ways to work with CSV files without pandas. Here, the files are saved with the csv module from the Python standard library, and the data is loaded with the genfromtxt function from numpy.
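Both halves of that exchange can be sketched as follows: the standard library `csv` module writes the rows, and `numpy.genfromtxt` reads them back as a float array, so neither side needs pandas. The function names are hypothetical, and the in-memory buffer at the end only stands in for a real file; when writing to an actual file, open it with `newline=""`, as the `csv` documentation recommends.

```python
import csv
import io

import numpy as np


def write_csv(handle, rows):
    """Write a 2D table of numbers with the standard library csv module."""
    # tolist() turns NumPy scalars into plain Python floats before writing
    csv.writer(handle).writerows(np.asarray(rows, dtype=float).tolist())


def read_csv(handle):
    """Read a numeric CSV back as a 2D float array with numpy.genfromtxt."""
    # atleast_2d keeps a single-row file two-dimensional
    return np.atleast_2d(np.genfromtxt(handle, delimiter=","))


# Round trip through an in-memory buffer; an open file object works the same way,
# e.g. open("decision.csv", "w", newline="") for writing (placeholder file name)
buffer = io.StringIO()
write_csv(buffer, [[0.5, -1.0, 1.0], [1.5, 2.0, 0.0]])
buffer.seek(0)
restored = read_csv(buffer)
```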

With all the data in place, we can start to create a similar visualization in Blender. First, the starting cube is modified so that it encloses all the objects in the visualization, and a small bevel is added to all of its edges. The cube is then smoothed so that there are no sharp edges in the background. Finally, the camera and lamp are relocated so that all the objects in the visualization are visible.
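The background setup could be sketched with Blender's Python API (`bpy`) as below. Instead of editing the default cube, this version adds a new one whose edge length comes from the extent of the data plus a margin; the margin, bevel width, and segment count are illustrative values, the helper names are hypothetical, and the camera and lamp placement is not covered. The `bpy` import is guarded so that the pure NumPy helper also runs outside Blender.

```python
import numpy as np

try:
    import bpy  # only available inside Blender
except ImportError:
    bpy = None


def enclosing_cube(points, margin=1.0):
    """Return the center and edge length of a cube that encloses all 3D points."""
    points = np.asarray(points, dtype=float)
    low, high = points.min(axis=0), points.max(axis=0)
    center = tuple(float(c) for c in (low + high) / 2)
    # The largest extent along any axis, plus a margin on both sides
    size = float((high - low).max()) + 2 * margin
    return center, size


def add_background_cube(center, size, bevel_width=0.3):
    """Add a cube with beveled, smooth-shaded edges (Blender only)."""
    bpy.ops.mesh.primitive_cube_add(size=size, location=center)
    cube = bpy.context.active_object
    bevel = cube.modifiers.new(name="Bevel", type="BEVEL")
    bevel.width = bevel_width  # illustrative value
    bevel.segments = 5  # more segments give a rounder edge
    bpy.ops.object.shade_smooth()  # acts on the selected (newly added) cube
    return cube
```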

Then the data used for the scatter plot is loaded and added to the scene, with each data point becoming a new geometry. The function that adds the points also offers two scaling options: either every data point is scaled by the same value, or each data point is scaled according to a list of values. For the second option to work, the list of scales must have the same length as the data set. That option lets us encode additional data into the visualization.
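A possible shape for that function is sketched below, assuming each point is drawn as a UV sphere whose radius is its scale; the article does not say which primitive or scaling mechanism it uses. `resolve_scales` and `add_scatter` are hypothetical helpers implementing the two options: a single number is repeated for every point, while a list must match the number of points. Points are expected as (x, y, z) rows, so the two principal components need a z column first, for example `np.column_stack([pcs, np.zeros(len(pcs))])`.

```python
import numpy as np

try:
    import bpy  # only available inside Blender
except ImportError:
    bpy = None


def resolve_scales(n_points, scale):
    """Return one scale per point from either a single number or a list of values."""
    if np.isscalar(scale):
        return [float(scale)] * n_points
    scales = [float(s) for s in scale]
    if len(scales) != n_points:
        raise ValueError(f"expected {n_points} scales, got {len(scales)}")
    return scales


def add_scatter(points, scale=0.05):
    """Add one UV sphere per (x, y, z) point, sized by its scale (Blender only)."""
    spheres = []
    for (x, y, z), s in zip(points, resolve_scales(len(points), scale)):
        bpy.ops.mesh.primitive_uv_sphere_add(radius=s, location=(float(x), float(y), float(z)))
        spheres.append(bpy.context.active_object)
    return spheres
```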

Next, the data set is separated into its two classes, each class is added to the scene, and a new material is defined for each geometry.
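Splitting by class and giving each class its own material could look like the sketch below. It uses a node-based material and sets the `Base Color` input of the default `Principled BSDF` node, which is one common way to color objects with the Blender 2.8+ Python API; the helper names and the example colors in the note after the code are assumptions.

```python
import numpy as np

try:
    import bpy  # only available inside Blender
except ImportError:
    bpy = None


def split_by_class(points, labels):
    """Return a dict mapping each class label to the points of that class."""
    points, labels = np.asarray(points), np.asarray(labels)
    return {label: points[labels == label] for label in np.unique(labels)}


def make_material(name, rgba):
    """Create a node-based material with a flat base color (Blender only)."""
    material = bpy.data.materials.new(name=name)
    material.use_nodes = True  # creates a Principled BSDF and a Material Output node
    bsdf = material.node_tree.nodes.get("Principled BSDF")
    bsdf.inputs["Base Color"].default_value = rgba
    return material


def assign_material(objects, material):
    """Append the material to the mesh of every object (Blender only)."""
    for obj in objects:
        obj.data.materials.append(material)
```

For example, `make_material("bad_wine", (0.8, 0.1, 0.1, 1.0))` for one class and a blue RGBA value for the other would match the colors of the matplotlib plot.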

A different color is applied to each class in the data set to make the classes easy to tell apart. Rendering the current progress gives the following.

Now the plane for the decision function is added, placed with a small separation from the plane of the scatter plot, and scaled accordingly. The plane is then deformed with the decision function data to produce a landscape of the decision function, and that deformation is used to determine the color from the color map. To create a diverging color map for the decision function, two color maps are appended together at the point where they meet. Following the same logic as the original visualization, the color map goes from red to white to blue. Finally, the color map is turned into a simple material and added to the mesh geometry.
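The plane and its colors could be built as sketched below. Here the plane is generated directly from the decision function grid with `Mesh.from_pydata` rather than by deforming a primitive; its height is the decision value normalized to [-1, 1] times an illustrative `height`, and `offset` sets the gap from the scatter plot. `diverging_colors` appends two linear maps, red to white for negative values and white to blue for positive ones, and joining them at zero is an assumption. All helper names and endpoint colors are hypothetical, and attaching the colors to the mesh (through a color attribute or a ColorRamp node, for instance) depends on the Blender version and is left out.

```python
import numpy as np

try:
    import bpy  # only available inside Blender
except ImportError:
    bpy = None

# Endpoint colors are illustrative RGB values, not taken from the article
RED = np.array([0.8, 0.1, 0.1])
WHITE = np.ones(3)
BLUE = np.array([0.1, 0.2, 0.8])


def grid_mesh(xx, yy, zz, height=0.5, offset=-0.5):
    """Vertices and quad faces of a plane displaced by the decision values."""
    rows, cols = zz.shape
    limit = np.abs(zz).max() or 1.0
    # Normalized decision value times `height`, shifted by `offset` (both assumed)
    z = offset + height * zz / limit
    verts = np.column_stack([xx.ravel(), yy.ravel(), z.ravel()]).tolist()
    faces = []
    for i in range(rows - 1):
        for j in range(cols - 1):
            a = i * cols + j  # index of the lower-left corner of this cell
            faces.append((a, a + 1, a + cols + 1, a + cols))
    return verts, faces


def diverging_colors(values):
    """Append a red-to-white map and a white-to-blue map, joined at zero (assumed)."""
    values = np.asarray(values, dtype=float)
    limit = np.abs(values).max() or 1.0
    t = np.clip(values / limit, -1.0, 1.0)[..., None]
    negative = WHITE + (RED - WHITE) * (-t)  # t = -1 gives RED, t = 0 gives WHITE
    positive = WHITE + (BLUE - WHITE) * t    # t = 1 gives BLUE
    return np.where(t < 0, negative, positive)


def add_decision_plane(verts, faces, name="DecisionFunction"):
    """Create the displaced plane as a new object (Blender only)."""
    mesh = bpy.data.meshes.new(name)
    mesh.from_pydata(verts, [], faces)
    mesh.update()
    obj = bpy.data.objects.new(name, mesh)
    bpy.context.collection.objects.link(obj)
    return obj
```

The colors from `diverging_colors(zz.ravel())` line up with the vertices from `grid_mesh`, since both follow the same row-major order.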

Rendering the complete scene for this visualization results in the following.

We can modify the visualization by taking advantage of the scaling options added at the beginning. Here, the wine quality scores are scaled between zero and one within each class of the data set and used as the scale of each data point, which gives the following.
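A sketch of that per-class rescaling follows: the quality scores of each class are min-max scaled to [0, 1] independently. The helper name is hypothetical, and mapping a class whose scores are all equal to 0 is a choice made here to avoid dividing by zero.

```python
import numpy as np


def scale_within_classes(values, labels):
    """Min-max scale `values` to [0, 1] separately inside each class."""
    values = np.asarray(values, dtype=float)
    labels = np.asarray(labels)
    scaled = np.zeros_like(values)
    for label in np.unique(labels):
        mask = labels == label
        low, high = values[mask].min(), values[mask].max()
        span = high - low
        # A class whose scores are all equal gets 0 instead of dividing by zero
        scaled[mask] = (values[mask] - low) / span if span > 0 else 0.0
    return scaled
```

Because a scale of zero would make the lowest-scoring points disappear, adding a small minimum, such as `0.02 + 0.08 * scaled` (illustrative values), before using the result as the per-point scales keeps every point visible.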

Now you know how to make a visualization that contains many aspects of the data being analyzed, as well as a visual representation of the algorithm's decision making, and you have seen different methods to export and import data in CSV format without pandas. The complete code for the matplotlib visualization can be found by clicking here, and the Blender makeover by clicking here. The data set for this visualization can be downloaded from the UCI Machine Learning Repository by clicking here. See you in the next one.
