Towards Machine Learning in Pharo: Visualizing Linear Regression

A small tutorial that will teach you how to fit a regression line to Boston Housing data and visualize it with Roassal3

Oleksandr Zaitsev
Nov 14, 2019

This is a small tutorial on how to estimate prices of houses in Pharo using the linear regression model from PolyMath. We will then visualize the data points together with the regression line using the new charting capabilities of Roassal3.

The main purpose of this blog post is to demonstrate the new charting functionality of Roassal3 that was introduced yesterday. The visualization that we will build is not very pretty, but it will give you a taste of the amazing things that we will be able to do in the near future.

Installation

Pharo is a pure object-oriented programming language and a powerful environment, focused on simplicity and immediate feedback (think IDE and OS rolled into one). If you are new to Pharo, you can install it by following the instructions on https://pharo.org/download.

After you have installed and opened your Pharo image, open Playground (Ctrl+OW) and execute the following Metacello script (select it and press Ctrl+D) to install the Datasets library. This library will allow you to download different datasets (including the Boston Housing dataset, which we will be using in this tutorial) and load them into your image as DataFrame objects:

Metacello new
    baseline: 'Datasets';
    repository: 'github://PharoAI/Datasets';
    load.

Now run this script to install PolyMath. It is a library for scientific computing in Pharo which contains a PMLinearRegression class:

Metacello new
    repository: 'github://PolyMathOrg/PolyMath:v1.0.1/src';
    baseline: 'PolyMath';
    load.

Finally, execute this script to install Roassal3. The features that I will be showing you were added several hours ago. There is a strong probability that the API will change in the following days, so to make sure that everything works at the time you read this post, you will have to load Roassal3 at a specific commit, b9fa9e1:

Metacello new
    baseline: 'Roassal3';
    repository: 'github://ObjectProfile/Roassal3:b9fa9e1';
    load.

The changes that I am presenting in this blog post are not integrated into the master branch of Roassal3 yet, so you will have to load them separately. To do that, open Iceberg (Ctrl+OI), click on Roassal3, find the Roassal3-Matplotlib package and press Load.
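
Before moving on, it may be worth checking that the classes used in the rest of this tutorial are present in your image. The snippet below is only a quick sanity check, assuming the class names mentioned in this post (Datasets, DataFrame, PMLinearRegression, RSChart); inspect the result and look for any false values:

```smalltalk
"Pair each class name used in this tutorial with true or false,
 depending on whether a global with that name exists in the image"
#(#Datasets #DataFrame #PMLinearRegression #RSChart)
    collect: [ :name | name -> (Smalltalk globals includesKey: name) ].
```

If one of the names maps to false, the corresponding installation step above probably did not complete.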

Loading Boston Housing Dataset

To load the Boston Housing dataset, simply run:

boston := Datasets loadBoston.

This will give you a DataFrame with 14 columns:

1. CRIM: per capita crime rate by town
2. ZN: proportion of residential land zoned for lots over 25,000 sq.ft.
3. INDUS: proportion of non-retail business acres per town
4. CHAS: Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
5. NOX: nitric oxides concentration (parts per 10 million)
6. RM: average number of rooms per dwelling
7. AGE: proportion of owner-occupied units built prior to 1940
8. DIS: weighted distances to five Boston employment centres
9. RAD: index of accessibility to radial highways
10. TAX: full-value property-tax rate per $10,000
11. PTRATIO: pupil-teacher ratio by town
12. B: 1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town
13. LSTAT: % lower status of the population
14. MEDV: Median value of owner-occupied homes in $1000's

We will be using columns RM and MEDV to study the relation between the average number of rooms and the price of a house.

rooms := boston column: 'RM'.
price := boston column: 'MEDV'.

Fitting a Line to Data with PolyMath

The idea behind a simple univariate linear regression is the following:

  1. We are given a collection of points (x, y) — in our case, x is the number of rooms and y is the price — and
  2. We need to find two values k and b such that the line y = kx + b is the best approximation of all given points. Those values are called “slope” and “intercept”.

More specifically, if we want to use slope k and intercept b to predict yᵢ (price) for every xᵢ (number of rooms) as

yᵢ´ = kxᵢ + b

then we need to select k and b in such a way that the sum of squared differences between the real yᵢ and the predicted yᵢ´ is the smallest:

Σᵢ (yᵢ - yᵢ´)² → min

I will not go into details of how this is done (but I encourage you to read about different ways to compute a linear regression — it’s very interesting). We can find slope and intercept for given points using a PMLinearRegression class from PolyMath. To do that, we simply add points to it and extract the values of slope and intercept:

regression := PMLinearRegression new.
1 to: rooms size do: [ :i |
    point := (rooms at: i) @ (price at: i).
    regression add: point ].
k := regression slope.
b := regression intercept.
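
If you are curious what such a fit computes, here is a minimal sketch of the closed-form least-squares solution written with plain Pharo collections. This is not how PMLinearRegression is implemented (I have not checked its internals); the toy data and the variable names xs, ys, slope and intercept are made up for illustration:

```smalltalk
"Hypothetical toy data, not taken from the Boston Housing dataset"
xs := #(1 2 3 4 5).
ys := #(3 5 7 9 11).

n := xs size.
meanX := (xs inject: 0 into: [ :sum :each | sum + each ]) / n.
meanY := (ys inject: 0 into: [ :sum :each | sum + each ]) / n.

"Accumulate the numerator and the denominator of the slope formula:
 sum of (x - meanX) * (y - meanY) divided by sum of (x - meanX) squared"
numerator := 0.
denominator := 0.
xs with: ys do: [ :x :y |
    numerator := numerator + ((x - meanX) * (y - meanY)).
    denominator := denominator + (x - meanX) squared ].

slope := numerator / denominator.
intercept := meanY - (slope * meanX).
```

For this toy data the points lie exactly on the line y = 2x + 1, so slope evaluates to 2 and intercept to 1.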

Now we can build a prediction for the price of a building based on the number of its rooms:

predictedPrices := k * rooms + b.

These predictions are all on the same line, which is called the “regression line”.
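
To get a feeling for how well a line fits the data, we can compute the sum of squared differences that the regression minimizes. The sketch below uses plain arrays and made-up numbers (xs, ys and the coefficients are assumptions for illustration, not values from the dataset):

```smalltalk
"Hypothetical toy data and coefficients, chosen only for illustration"
xs := #(1 2 3 4).
ys := #(3 5 8 9).
slope := 2.
intercept := 1.

"Predicted value for every x, computed element by element"
predicted := xs collect: [ :x | (slope * x) + intercept ].

"Sum of squared differences between the real and the predicted values"
squaredError := 0.
ys with: predicted do: [ :real :estimate |
    squaredError := squaredError + (real - estimate) squared ].
```

Here the predictions are 3, 5, 7 and 9, only the third point is off by 1, and squaredError evaluates to 1. The smaller this number, the closer the line is to the points.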

Plotting the Regression Line with Roassal3

Finally, we can build a chart using the Roassal3 visualization library that will show us all our points together with the line fitted to them.

We start by creating an empty chart:

chart := RSChart new.

Now we create a scatterplot of points:

points := RSScatterPlot new
    x: rooms
    y: price.

We also create a line:

regressionLine := RSLinePlot new
    x: rooms
    y: predictedPrices.

Then we add our scatterplot and line to the chart:

chart
    addPlot: points;
    addPlot: regressionLine.

We give our chart a custom title and custom labels for both of its axes:

chart
    title: 'Boston Housing';
    xlabel: 'Number of rooms';
    ylabel: 'Price'.

Now we can see the chart by selecting the word chart and inspecting it (Ctrl+I or Ctrl+G).

Analytics Vidhya