# CoreML — Boston Prices exploration

In the previous post of this series we described some of the basics of linear regression, one of the most well-known models in machine learning. We saw that we can relate the values of input parameters to the target variable to be predicted. In this post we are going to start building a linear regression model to predict the price of houses in Boston (based on valuations from the 1970s). The dataset provides information such as the per-capita crime rate (CRIM), the proportion of non-retail business acres in the town (INDUS), the proportion of owner-occupied homes built before 1940 (AGE) and the average number of rooms per dwelling (RM), among other attributes, together with the median value of owner-occupied homes in $1000s (MEDV).

Let us start by exploring the data. We are going to use scikit-learn and, fortunately, the dataset comes with the module. The input variables are held in the `data` attribute and the prices in the `target` attribute. We are going to load the input variables into the dataframe `boston_df` and the prices into the array `y`:

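    import pandas as pd
    from sklearn import datasets

    # Note: load_boston was removed in scikit-learn 1.2,
    # so this call needs an earlier release.
    boston = datasets.load_boston()

    # Input variables as a dataframe with named columns
    boston_df = pd.DataFrame(boston.data)
    boston_df.columns = boston.feature_names

    # Target: median home value in $1000s
    y = boston.target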
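Since `load_boston` was deprecated in scikit-learn 1.0 and removed in 1.2, the call above fails on recent releases. The following is a minimal sketch of one workaround, not the code used in this series. It assumes the raw file is still hosted at the CMU StatLib address below and keeps its layout (a 22-line text header, then each record wrapped over two lines). The helper name `load_boston_raw` is hypothetical and not part of any library.

```python
import numpy as np
import pandas as pd

# Assumption: the raw dataset is still served from this address.
DATA_URL = "http://lib.stat.cmu.edu/datasets/boston"

FEATURE_NAMES = ["CRIM", "ZN", "INDUS", "CHAS", "NOX", "RM", "AGE",
                 "DIS", "RAD", "TAX", "PTRATIO", "B", "LSTAT"]


def load_boston_raw(source=DATA_URL):
    """Hypothetical helper: parse the raw file into (features, prices)."""
    # Skip the 22-line text header; fields are separated by whitespace.
    raw = pd.read_csv(source, sep=r"\s+", skiprows=22, header=None)
    values = raw.values
    # Even rows hold the first 11 features; odd rows hold the
    # remaining 2 features followed by the price (MEDV).
    data = np.hstack([values[::2, :], values[1::2, :2]])
    target = values[1::2, 2]
    return pd.DataFrame(data, columns=FEATURE_NAMES), target
```

With this helper, `boston_df, y = load_boston_raw()` would stand in for the loading steps above; `source` can also point to a local copy of the file.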

We are going to build our model using only a limited number of inputs. In this case let us pay attention to the average number of rooms and the crime rate:

    X = boston_df[['CRIM', 'RM']]
    X.columns = ['Crime', 'Rooms']
    X.describe()

The description of these two attributes is as follows:

                Crime       Rooms
    count  506.000000  506.000000
    mean     3.593761    6.284634
    std      8.596783    0.702617
    min      0.006320    3.561000
    25%      0.082045    5.885500
    50%      0.256510    6.208500
    75%      3.647423    6.623500
    max     88.976200    8.780000

As we can see, the average number of rooms ranges from 3.56 to 8.78, whereas the crime rate ranges from 0.006 to 88.98. The median crime rate, however, is only 0.26, well below the mean of 3.59, so most towns sit at the low end of that range. We will use some of these values to define the ranges offered to our users when they request predictions.
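One way to turn that summary into ranges for user input is sketched below. It assumes a dataframe shaped like `X` above. The helper name `input_ranges` and the use of the median as a default value are illustrative assumptions, not something the original code prescribes.

```python
import pandas as pd


def input_ranges(frame):
    """Hypothetical helper: per-column min, max and median as plain floats."""
    summary = frame.describe()
    return {
        column: {
            "min": float(summary.loc["min", column]),
            "max": float(summary.loc["max", column]),
            # Assumption: the median is a sensible default for an input control.
            "default": float(summary.loc["50%", column]),
        }
        for column in frame.columns
    }
```

Calling `input_ranges(X)` would give one entry for `Crime` and one for `Rooms`, each carrying the bounds from the summary above.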

Finally, let us visualise the data:
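A minimal sketch of one way to do this, assuming matplotlib is installed and that `X` and `y` are defined as above, is to draw each input against the price. The helper name `plot_inputs_vs_price` and the layout are illustrative assumptions rather than the plot originally shown.

```python
import matplotlib.pyplot as plt


def plot_inputs_vs_price(frame, prices):
    """Hypothetical helper: one scatter panel per input column."""
    columns = list(frame.columns)
    fig, axes = plt.subplots(1, len(columns), figsize=(5 * len(columns), 4),
                             sharey=True, squeeze=False)
    for ax, column in zip(axes[0], columns):
        # Each panel shows one input on the x-axis and the price on the y-axis.
        ax.scatter(frame[column], prices, s=10, alpha=0.6)
        ax.set_xlabel(column)
    axes[0][0].set_ylabel("Median value (thousands of USD)")
    fig.tight_layout()
    return fig
```

For example, `fig = plot_inputs_vs_price(X, y)` followed by `plt.show()` would display one panel for `Crime` and one for `Rooms`.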

We shall bear these values in mind when building our regression model in subsequent posts.

You can look at the code (in development) on my GitHub site.

*Originally published at **Quantum Tunnel Website**.*