Generalized Linear Models — Testing

Writing the tests for Generalized Linear Machines for Shogun using the GoogleTest framework.

Tej Sukhatme
Aug 25, 2020

Overview

It was naive of me to think that writing the GLM code would be enough. Testing is a huge part of the development process: it ensures that the code that gets released or deployed keeps working. Shogun’s continuous integration is really efficient and helps with this by checking every pull request as it comes in. All you have to do is write a few tests with the GoogleTest framework that check the integrity of your code. All of the unit tests and the meta example integration tests are run every time.

Once I was done writing the basic GLM code, I had to make sure it worked. For this, I wrote a basic unit test that ran the GLM algorithm on a small sample dataset and compared the results with those of the original pyGLMnet code. Not to my surprise, the values were off by thousands. To track down the problem, I made all the functions public and compared them with pyGLMnet function by function. First, I worked through all the math by hand in a notebook and found a few errors in the code that way. After that, I had to fix several other errors that came from my misreading of the Shogun linalg library. Finally, everything was okay and the code was returning more or less the same answers. Still, the answers weren’t exactly the same. This was simply because the dataset I was using was really small and 1000 iterations weren’t enough to converge, so the final weights depended a lot on the randomly selected initial weights.

To fix this issue, I set the initial weights manually before training the algorithm, and finally the test passed every time.
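Here is a minimal sketch of why pinning the starting weights makes such a test repeatable. It is not the Shogun or pyGLMnet code: it assumes a plain gradient-descent Poisson regression with an exp inverse link and no regularization, and `train_poisson`, the dataset, and all the numbers are made up for illustration.

```python
import math
import random

def train_poisson(features, labels, init_weights, init_bias,
                  learning_rate=0.01, iterations=100):
    """Plain gradient descent on the mean Poisson negative log-likelihood (exp link)."""
    weights, bias = list(init_weights), init_bias
    n = len(labels)
    for _ in range(iterations):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(features, labels):
            mu = math.exp(bias + sum(w * xi for w, xi in zip(weights, x)))
            residual = mu - y  # derivative of the per-sample loss w.r.t. the linear predictor
            grad_b += residual / n
            for j, xi in enumerate(x):
                grad_w[j] += residual * xi / n
        weights = [w - learning_rate * g for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b
    return weights, bias

# Tiny made-up dataset, for illustration only.
features = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [0.5, -0.5]]
labels = [1, 2, 3, 1]

# Random starting weights: with so few iterations, two runs generally end up
# at different weights, so there is nothing stable to compare against.
random_run_a = train_poisson(features, labels, [random.uniform(-0.5, 0.5) for _ in range(2)], 0.0)
random_run_b = train_poisson(features, labels, [random.uniform(-0.5, 0.5) for _ in range(2)], 0.0)

# Fixed starting weights: every run produces exactly the same result,
# which can then be checked against a reference implementation.
fixed_run_a = train_poisson(features, labels, [0.1, -0.1], 0.0)
fixed_run_b = train_poisson(features, labels, [0.1, -0.1], 0.0)
```

With the starting point fixed, the only remaining source of disagreement with a reference is the implementation itself, which is exactly what the test is meant to catch.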

After this, I added a simple test for the most important part of the code: the get_gradient() functions. I had to make sure they worked correctly, as most of the mathematics in the code lay there. This was relatively simple, as the code was already working. With this done, my test writing was complete.

Let us look at the tests in more detail:

Generating data

Let’s talk about generating the data. pyGLMnet provides a simple function called simulate_glm() that can do this for us. All I needed to do was write a simple Python script that would generate the training features and labels. The first thing I did was initialize the weights and bias used to generate the data.
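The idea behind generating data this way can be sketched with the standard library alone. This is a stand-in, not pyGLMnet's simulate_glm(): it assumes a Poisson model with an exp inverse link, and `sample_poisson`, `simulate_poisson_data`, and the weight and bias values are hypothetical names and numbers chosen for illustration.

```python
import math
import random

def sample_poisson(rate, rng):
    """Draw one Poisson sample with Knuth's multiplication method (fine for small rates)."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def simulate_poisson_data(weights, bias, n_samples, seed):
    """Generate (features, labels) from fixed weights and a fixed bias."""
    rng = random.Random(seed)  # seeded, so the dataset is reproducible
    features, labels = [], []
    for _ in range(n_samples):
        x = [rng.gauss(0.0, 1.0) for _ in weights]
        rate = math.exp(bias + sum(w * xi for w, xi in zip(weights, x)))
        features.append(x)
        labels.append(sample_poisson(rate, rng))
    return features, labels

# Hypothetical values, chosen only for illustration.
true_weights = [0.5, -0.3, 0.2]
true_bias = 0.1
X_train, y_train = simulate_poisson_data(true_weights, true_bias, n_samples=10, seed=42)
```

Because the generator is seeded, the same features and labels come out on every run, so they can be pasted into a unit test as constants.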

GLM get_gradient() test

This test simply compares the results of the two gradient calculation functions. I first calculated the gradient using the pyGLMnet function.

Then I compared it to the gradients calculated by the Shogun get_gradient() function.
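The shape of such a check, a gradient under test compared against an independent reference, can be sketched as follows. Note the swap: the reference here is a central finite-difference approximation rather than pyGLMnet, and the loss assumes an exp inverse link with no regularization, so the formulas are not necessarily the ones either library uses. All function names and numbers are hypothetical.

```python
import math

def poisson_loss(weights, bias, features, labels):
    """Mean Poisson negative log-likelihood with an exp inverse link (constant term dropped)."""
    total = 0.0
    for x, y in zip(features, labels):
        z = bias + sum(w * xi for w, xi in zip(weights, x))
        total += math.exp(z) - y * z
    return total / len(labels)

def analytic_gradient(weights, bias, features, labels):
    """Gradient under test: returns (grad_weights, grad_bias)."""
    n = len(labels)
    grad_w, grad_b = [0.0] * len(weights), 0.0
    for x, y in zip(features, labels):
        residual = math.exp(bias + sum(w * xi for w, xi in zip(weights, x))) - y
        grad_b += residual / n
        for j, xi in enumerate(x):
            grad_w[j] += residual * xi / n
    return grad_w, grad_b

def reference_gradient(weights, bias, features, labels, step=1e-6):
    """Independent reference: central finite differences of the loss."""
    grad_w = []
    for j in range(len(weights)):
        up, down = list(weights), list(weights)
        up[j] += step
        down[j] -= step
        grad_w.append((poisson_loss(up, bias, features, labels)
                       - poisson_loss(down, bias, features, labels)) / (2 * step))
    grad_b = (poisson_loss(weights, bias + step, features, labels)
              - poisson_loss(weights, bias - step, features, labels)) / (2 * step)
    return grad_w, grad_b

# Made-up inputs, for illustration only.
features = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [0.5, -0.5]]
labels = [1, 2, 3, 1]
weights, bias = [0.1, -0.1], 0.05

got_w, got_b = analytic_gradient(weights, bias, features, labels)
want_w, want_b = reference_gradient(weights, bias, features, labels)
gradients_match = (all(math.isclose(g, w, abs_tol=1e-6) for g, w in zip(got_w, want_w))
                   and math.isclose(got_b, want_b, abs_tol=1e-6))
```

Comparing against a second implementation, as described above, catches disagreements in the formula itself; a finite-difference reference only confirms that the gradient matches the loss it was derived from.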

GLM Complete test

In this test, I simply compared the final predicted values with those produced by pyGLMnet.
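A comparison like this boils down to an element-wise check within a tolerance, much like GoogleTest's EXPECT_NEAR applied to each prediction. The sketch below assumes an exp inverse link; `predict`, `all_near`, the inputs, and the tolerance are hypothetical, and the expected values are written out by hand rather than taken from pyGLMnet.

```python
import math

def predict(weights, bias, features):
    """Predicted Poisson means under an exp inverse link."""
    return [math.exp(bias + sum(w * xi for w, xi in zip(weights, x))) for x in features]

def all_near(actual, expected, abs_error):
    """True when both lists have the same length and every pair differs by at most abs_error."""
    return len(actual) == len(expected) and all(
        abs(a - e) <= abs_error for a, e in zip(actual, expected))

# Made-up model and inputs, for illustration only.
features = [[0.0], [1.0], [2.0]]
predictions = predict([1.0], 0.0, features)

# Reference values written out by hand: exp(0), exp(1), exp(2).
expected = [1.0, 2.718281828, 7.389056099]
predictions_match = all_near(predictions, expected, abs_error=1e-6)
```

An absolute tolerance is used instead of exact equality because two implementations rarely agree to the last floating-point digit.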

This was how I wrote the unit tests for Poisson regression in Shogun.
