Essential Machine Learning with Linear Models in RAPIDS: part 1 of a series
By: Paul Mahler
This blog is the first in a series about regression analysis in RAPIDS, an open GPU data science platform. There are many varieties of regression techniques, and we’re working to include them all in RAPIDS. In this blog edition, I use Ordinary Least Squares (OLS) and Ridge regression to choose a model to predict Washington, D.C. bikeshare rentals¹.
I want to take a moment to tell the origin story of regression analysis, which explains why it has that name. Of all the common machine learning techniques (K-means, kNN, PCA), I believe "regression analysis" has the most opaque name. OLS regression was first invented to analyze exceptional genetic traits and their heritability, and these early studies seemed to show that the offspring of exceptional individuals "regressed to the mean." The inventor was Sir Francis Galton (a half-cousin of Charles Darwin²), who also pioneered the concept of statistical correlation and first observed the "wisdom of the crowds" in certain estimation tasks.
I am trying to predict daily demand for short-term bike rentals made in 2012, and I have data from 2011 to build the model. Let’s start with OLS.
OLS gets its name because we pick parameter values that minimize the sum of the squares of the prediction errors. OLS works best when we have some theory about the system we are trying to model. Below, I have made a dummy variable for rainy days, because I believe that people don't like riding bikes in the rain. I have fit two models to the 2011 DC Bike Share data (available from the UCI Machine Learning Repository¹). In the first model, I picked variables that I think will have predictive value; in the second, I just use all the variables and hope for the best. Finally, I test both by seeing how well they make predictions on the 2012 data, using Mean Squared Error (MSE) as our evaluation metric.
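The comparison can be sketched as follows. This is a CPU sketch using scikit-learn, whose fit/predict API `cuml.linear_model.LinearRegression` mirrors, so the same code carries over to the GPU by swapping the import. The data below is synthetic stand-in data generated for illustration; only the column names (`temp`, `windspeed`, `weathersit`, `workingday`, `cnt`) follow the UCI bikeshare dataset's coding.

```python
# CPU sketch with scikit-learn; cuml.linear_model.LinearRegression
# exposes the same fit/predict API, so this carries over to RAPIDS.
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Synthetic stand-in for the bikeshare data; column names follow the
# UCI dataset (weathersit == 3 means rain/snow), values are illustrative.
rng = np.random.default_rng(0)
n = 365
df = pd.DataFrame({
    "temp": rng.uniform(0, 1, n),
    "windspeed": rng.uniform(0, 1, n),
    "weathersit": rng.integers(1, 4, n),
    "workingday": rng.integers(0, 2, n),
})
df["rainy"] = (df["weathersit"] == 3).astype(float)  # dummy: rainy days
df["cnt"] = (4000 + 3000 * df["temp"] - 1500 * df["rainy"]
             - 800 * df["windspeed"] + rng.normal(0, 300, n))

train, test = df.iloc[:300], df.iloc[300:]
chosen = ["temp", "windspeed", "rainy", "workingday"]  # theory-driven picks
everything = ["temp", "windspeed", "weathersit", "workingday", "rainy"]

model_chosen = LinearRegression().fit(train[chosen], train["cnt"])
model_all = LinearRegression().fit(train[everything], train["cnt"])

mse_chosen = mean_squared_error(test["cnt"], model_chosen.predict(test[chosen]))
mse_all = mean_squared_error(test["cnt"], model_all.predict(test[everything]))
print(f"chosen-features MSE: {mse_chosen:.1f}, all-features MSE: {mse_all:.1f}")
```

With the real dataset you would fit on the 2011 rows and score on the 2012 rows instead of the synthetic split used here.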
Here we see that my model, built on the ideas that people don't like riding in bad weather or high wind and that people behave differently on weekdays, dramatically outperformed an OLS model with every variable included. Still, we have reason not to be satisfied with this model: some of our variables may exhibit collinearity, particularly the pre-made dummy variables for characteristics of the day (day of week, holiday indicator, workday indicator). This brings us to Ridge regression.
Ridge regression is a technique that can be used for many purposes; here we're going to use it to address possible collinearity. The effect of collinearity in OLS models is that estimated parameters may be far from their true values. We still pick parameters that minimize the sum of squared prediction errors, but we also try to keep the sum of the squares of the parameter estimates small. Ridge regression lets you pick a hyperparameter, alpha, that controls how heavily large parameter estimates are penalized. When alpha = 0, we are just doing an OLS regression.
In practice, the best way to pick alpha is to try different values and see which one results in the best performing model. Because RAPIDS is so fast, we’ll just search through 100 different values of alpha.
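A minimal sketch of that search, again written with scikit-learn since `cuml.linear_model.Ridge` takes the same `alpha` parameter and the loop is identical on GPU. The train/test arrays here are hypothetical stand-ins for the 2011/2012 bikeshare splits.

```python
# Grid search over 100 alpha values; cuml.linear_model.Ridge accepts the
# same alpha argument, so swapping the import runs this on a GPU.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error

# Hypothetical stand-in data for the bikeshare train/test split.
rng = np.random.default_rng(1)
true_coefs = np.array([3.0, -1.5, 0.0, 2.0, 0.5])
X_train = rng.normal(size=(300, 5))
y_train = X_train @ true_coefs + rng.normal(0, 0.5, 300)
X_test = rng.normal(size=(65, 5))
y_test = X_test @ true_coefs + rng.normal(0, 0.5, 65)

best_alpha, best_mse = None, float("inf")
for alpha in np.linspace(0.01, 1.0, 100):  # 100 candidate alphas
    model = Ridge(alpha=alpha).fit(X_train, y_train)
    mse = mean_squared_error(y_test, model.predict(X_test))
    if mse < best_mse:
        best_alpha, best_mse = alpha, mse
print(f"best alpha: {best_alpha:.2f}, test MSE: {best_mse:.4f}")
```

The grid starts at 0.01 rather than 0 because alpha = 0 is just OLS, which we have already fit.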
With our very simple parameter search, we see that the best-performing model is a Ridge regression with alpha = 0.1. Volumes have been written about hyperparameter tuning strategy, and I encourage you to read more about it as you incorporate Ridge regression into your work.
Now that we have the best model we think we can get, we can make predictions about how many bikes will be rented, based on some weather measurements and facts about the day.
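Once the model is chosen, a forecast is a single `predict` call. The sketch below fits a Ridge model with the alpha = 0.1 found above on synthetic stand-in training data, then predicts rentals for a hypothetical warm, calm, dry working day; feature names follow the UCI dataset, values are illustrative.

```python
# Fit the chosen model and predict one new day; synthetic stand-in data.
import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge  # cuml.linear_model.Ridge on GPU

rng = np.random.default_rng(2)
n = 365
X = pd.DataFrame({
    "temp": rng.uniform(0, 1, n),
    "windspeed": rng.uniform(0, 1, n),
    "rainy": rng.integers(0, 2, n).astype(float),
    "workingday": rng.integers(0, 2, n).astype(float),
})
y = (4000 + 3000 * X["temp"] - 1500 * X["rainy"]
     - 800 * X["windspeed"] + rng.normal(0, 300, n))

model = Ridge(alpha=0.1).fit(X, y)  # alpha = 0.1, the search's best value

# Hypothetical weather for the day we want to forecast:
new_day = pd.DataFrame({"temp": [0.7], "windspeed": [0.1],
                        "rainy": [0.0], "workingday": [1.0]})
pred = model.predict(new_day)[0]
print(f"predicted rentals: {pred:.0f}")
```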
We’ve seen how to do OLS and Ridge regressions using RAPIDS. In the next blog, we’ll discuss single variable Lasso regression, as well as multi-GPU OLS. See you next time!
¹ https://archive.ics.uci.edu/ml/datasets/bike+sharing+dataset, Fanaee-T, Hadi, and Gama, Joao, ‘Event labeling combining ensemble detectors and background knowledge’, Progress in Artificial Intelligence (2013): pp. 1–15
² Darwin, Francis (1887). The Life and Letters of Charles Darwin. New York: D. Appleton & Co.