Generalized Linear Models — Code

Writing the code for Generalized Linear Models for Shogun in C++.

Tej Sukhatme
Aug 25, 2020

Overview

Both the documentation and the code are heavily inspired by pyGLMnet.

The first thing I did was to separate all the calculations into new functions in a different class called GLMCostFunction. At first I derived this class from the FirstOrderCostFunction class, but I later got rid of that inheritance as it proved to be more of a hindrance than a help.

The original GLM class handled the gradient descent that actually minimizes the loss function. I changed it to derive from the IterativeMachine class, which makes it really convenient to write iterative algorithms: you override the two functions init_model() and iteration(), and your work is done. The train_machine() method defined in IterativeMachine takes care of everything by first calling init_model() and then iteration() repeatedly, until the algorithm converges or the maximum number of iterations is reached.
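To make that pattern concrete, here is a minimal, self-contained sketch of the structure. It is not Shogun’s actual IterativeMachine interface (the real base class has far more machinery); the class and member names below are purely illustrative.

```
#include <cstdio>

// Illustrative stand-in for Shogun's IterativeMachine: it owns the
// training loop and leaves init_model()/iteration() to the subclass.
class ToyIterativeMachine
{
public:
    virtual ~ToyIterativeMachine() = default;

    void train_machine(int max_iterations)
    {
        init_model();
        for (int i = 0; i < max_iterations && !m_converged; ++i)
            iteration();
    }

protected:
    virtual void init_model() = 0; // runs once before the loop
    virtual void iteration() = 0;  // one optimization step
    bool m_converged = false;
};

// A subclass only has to fill in the two hooks.
class ToyGLM : public ToyIterativeMachine
{
protected:
    void init_model() override { std::printf("initialize weights and bias\n"); }
    void iteration() override  { std::printf("one gradient-descent step\n"); }
};
```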

This is the main pull request that I opened:

In the future, this GLM is supposed to support all the different kinds of distributions, like BINOMIAL, GAMMA, SOFTPLUS, PROBIT, and POISSON.

I made an enum for this, which decides which distribution is used:
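Something along these lines; the enum name and exact declaration in the pull request may differ, this is just a sketch listing the distributions mentioned above.

```
// Illustrative sketch of the distribution enum.
enum class DistributionType
{
    BINOMIAL,
    GAMMA,
    SOFTPLUS,
    PROBIT,
    POISSON
};
```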

I learned several things along the way, like using <cmath> instead of <math.h>, as well as ways to write better C++, such as having parameterized constructors delegate to the non-parameterized constructor to help with abstraction.
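As a quick illustration of that last point, this is what constructor delegation looks like in general. This is a generic C++ example, not the actual GLM constructors.

```
#include <string>

class Model
{
public:
    // The default constructor sets up sensible defaults.
    Model() : m_learning_rate(0.01), m_descend_type("vanilla") {}

    // The parameterized constructor delegates to the default one first,
    // then overrides only what the caller specified.
    explicit Model(double learning_rate) : Model()
    {
        m_learning_rate = learning_rate;
    }

private:
    double m_learning_rate;
    std::string m_descend_type;
};
```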

I was initially using the <random> library to generate random numbers, but as Viktor pointed out, the distributions are not guaranteed to behave identically across different standard-library implementations. So I made the GLM class inherit from RandomMixin as well, which sorted out the problems related to random number generation.

Also, the optimization classes of Shogun have been really useful when implementing the optimization. There are entire classes for tasks like gradient descent updates or for managing constant and variable learning rates, and IterativeMachine itself for that matter. All of this takes a lot of the load off the coder’s shoulders, so they can focus on getting the code to work.

Another really neat thing about Shogun is linalg. Since C++ has no built-in linear algebra support, linalg makes it a breeze to write vectorized code without paying for hand-written loops, by delegating to Eigen3 or ViennaCL in the background.
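For example, common operations read roughly like this. This is a small sketch; I am assuming the linalg::dot and linalg::add calls as they appear in recent Shogun versions.

```
#include <shogun/lib/SGVector.h>
#include <shogun/mathematics/linalg/LinalgNamespace.h>

using namespace shogun;

float64_t linalg_demo()
{
    SGVector<float64_t> a(3);
    SGVector<float64_t> b(3);
    a.set_const(1.0);
    b.set_const(2.0);

    // Element-wise sum and dot product, dispatched to the configured
    // backend (Eigen3 or ViennaCL) without any hand-written loops.
    SGVector<float64_t> sum = linalg::add(a, b);
    return linalg::dot(a, sum);
}
```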

One point to note is that Shogun stores feature matrices with each row corresponding to one feature and each column to one example (and SGMatrix is column-major in memory), which is the opposite of the rows-as-samples convention used by most Python libraries.
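Concretely, a dataset with num_features rows and num_examples columns is laid out like this (illustrative only):

```
#include <shogun/lib/SGMatrix.h>

using namespace shogun;

void layout_demo()
{
    const index_t num_features = 2;
    const index_t num_examples = 4;

    // Rows are features, columns are examples, i.e. the transpose of
    // the usual (n_samples, n_features) layout in Python libraries.
    SGMatrix<float64_t> X(num_features, num_examples);
    X(0, 3) = 1.5; // feature 0 of example 3
    X(1, 3) = 2.5; // feature 1 of example 3
}
```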

Let’s see how the code works:

init_model()

There is a simple init_model() function which runs once before the iterations begin. All it does is take care of initializing the weights and the bias.
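A simplified sketch of that logic is below. It is not the code from the pull request: the function and parameter names are illustrative, and the real implementation draws its random numbers through RandomMixin’s PRNG rather than std::normal_distribution.

```
#include <shogun/lib/SGVector.h>
#include <random>

using namespace shogun;

// Illustrative initialization: zero bias, randomly drawn weights.
void init_model_sketch(SGVector<float64_t>& weights, float64_t& bias,
                       index_t num_features, std::mt19937_64& prng)
{
    bias = 0.0;
    weights = SGVector<float64_t>(num_features);

    // std::normal_distribution is used here purely for illustration;
    // as noted above, the real code relies on RandomMixin instead so
    // that results are reproducible across standard libraries.
    std::normal_distribution<float64_t> normal(0.0, 1.0);
    for (index_t i = 0; i < num_features; ++i)
        weights[i] = normal(prng);
}
```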

Here we are setting the weights to randomly drawn values. That is where RandomMixin comes in, which is mixed in when inheriting from the LinearMachine class.

iteration()

Let’s look at the iteration function. This function performs simple gradient descent:
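Roughly, one step looks like this. This is a simplified sketch with illustrative names; the real method also handles learning-rate management through Shogun’s optimization classes.

```
#include <shogun/lib/SGVector.h>

using namespace shogun;

// One plain gradient-descent step over the weights and the bias.
// gradient_w and gradient_bias come from the cost-function helpers
// described below.
void iteration_sketch(SGVector<float64_t>& weights, float64_t& bias,
                      SGVector<float64_t> gradient_w,
                      float64_t gradient_bias, float64_t learning_rate)
{
    for (index_t i = 0; i < weights.vlen; ++i)
        weights[i] -= learning_rate * gradient_w[i];
    bias -= learning_rate * gradient_bias;
}
```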

You may be wondering where we get gradient_w and gradient_bias from; that is where all of the math comes in.

First, I defined a few helper functions to keep the code well abstracted.

compute_z()

The compute_z() function, which computes the linear predictor z = w · x + b for each example:
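A minimal sketch is below; the signature and names are illustrative, and the actual GLMCostFunction method works on Shogun’s feature classes and uses linalg instead of explicit loops.

```
#include <shogun/lib/SGVector.h>
#include <shogun/lib/SGMatrix.h>

using namespace shogun;

// z[i] = w . x_i + bias, where x_i is the i-th column of X
// (remember: columns are examples in Shogun).
SGVector<float64_t> compute_z(SGMatrix<float64_t> X,
                              SGVector<float64_t> weights,
                              float64_t bias)
{
    SGVector<float64_t> z(X.num_cols);
    for (index_t i = 0; i < X.num_cols; ++i)
    {
        float64_t dot = 0.0;
        for (index_t j = 0; j < X.num_rows; ++j)
            dot += weights[j] * X(j, i);
        z[i] = dot + bias;
    }
    return z;
}
```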

non_linearity()

The function to implement the non-linearity:
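For Poisson regression the non-linearity maps the linear predictor z to the conditional intensity. The sketch below uses a plain exponential; the actual code, following pyGLMnet, may use a linearized variant above a threshold for numerical stability, and the names are illustrative.

```
#include <shogun/lib/SGVector.h>
#include <cmath>

using namespace shogun;

// mu[i] = exp(z[i]), the conditional intensity of a Poisson GLM.
SGVector<float64_t> non_linearity(SGVector<float64_t> z)
{
    SGVector<float64_t> mu(z.vlen);
    for (index_t i = 0; i < z.vlen; ++i)
        mu[i] = std::exp(z[i]);
    return mu;
}
```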

gradient_non_linearity()

And finally the gradient of this non-linearity:
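With a plain exponential the derivative is the exponential itself, so the sketch is almost identical (again, names are illustrative):

```
#include <shogun/lib/SGVector.h>
#include <cmath>

using namespace shogun;

// d(mu)/dz = exp(z) when mu = exp(z).
SGVector<float64_t> gradient_non_linearity(SGVector<float64_t> z)
{
    SGVector<float64_t> grad_mu(z.vlen);
    for (index_t i = 0; i < z.vlen; ++i)
        grad_mu[i] = std::exp(z[i]);
    return grad_mu;
}
```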

With these simple functions out of the way, let’s get to the actual gradient calculation. We will be using the expression we derived in the previous blog post.

get_gradient()
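The sketch below writes out the gradient of the (unregularized) Poisson negative log-likelihood with respect to the weights, averaged over the examples. I am using the standard expression grad_w = (1/n) Σ_i (grad_mu_i − y_i · grad_mu_i / mu_i) x_i; the exact expression in the pull request follows the derivation from the previous post and may also include a regularization term.

```
#include <shogun/lib/SGVector.h>
#include <shogun/lib/SGMatrix.h>

using namespace shogun;

// grad_w = (1/n) * sum_i (grad_mu_i - y_i * grad_mu_i / mu_i) * x_i
SGVector<float64_t> get_gradient(SGMatrix<float64_t> X,
                                 SGVector<float64_t> y,
                                 SGVector<float64_t> mu,
                                 SGVector<float64_t> grad_mu)
{
    const index_t n = X.num_cols;
    SGVector<float64_t> grad_w(X.num_rows);
    grad_w.zero();

    for (index_t i = 0; i < n; ++i)
    {
        const float64_t factor = grad_mu[i] - y[i] * grad_mu[i] / mu[i];
        for (index_t j = 0; j < X.num_rows; ++j)
            grad_w[j] += factor * X(j, i);
    }

    for (index_t j = 0; j < X.num_rows; ++j)
        grad_w[j] /= n;

    return grad_w;
}
```

Note that when mu = exp(z), the per-example factor simplifies to mu_i − y_i.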

I wrote similar code for the gradient with respect to the bias; the only difference is that there is no linear algebra to worry about, since everything is a scalar.

apply_regression()

Finally comes the stage where you have to apply the model to a dataset once it is trained. This is done using the apply_regression() method.
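Prediction just reuses the two helpers from before: compute the linear predictor for the new data and push it through the non-linearity. A simplified sketch is below; the actual apply_regression() override takes a Features object and returns Shogun’s RegressionLabels, and the function name here is illustrative.

```
#include <shogun/lib/SGVector.h>
#include <shogun/lib/SGMatrix.h>
#include <cmath>

using namespace shogun;

// Predicted conditional mean for each column (example) of X:
//   prediction_i = non_linearity(w . x_i + bias)
SGVector<float64_t> predict_sketch(SGMatrix<float64_t> X,
                                   SGVector<float64_t> weights,
                                   float64_t bias)
{
    SGVector<float64_t> predictions(X.num_cols);
    for (index_t i = 0; i < X.num_cols; ++i)
    {
        float64_t z = bias;
        for (index_t j = 0; j < X.num_rows; ++j)
            z += weights[j] * X(j, i);
        predictions[i] = std::exp(z); // Poisson non-linearity
    }
    return predictions;
}
```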

This was how I implemented Poisson Regression in Shogun.
