Boston House Price prediction using ML

Utsav Jivani
Published in The Startup
4 min read, Aug 14, 2020

In this post, we will perform data analysis in Python on the Boston house price dataset. Before getting started, it is essential to understand the data, so let’s look at it first.

- CRIM per capita crime rate by town
- ZN proportion of residential land zoned for lots over 25,000 sq.ft.
- INDUS proportion of non-retail business acres per town
- CHAS Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
- NOX nitric oxides concentration (parts per 10 million)
- RM average number of rooms per dwelling
- AGE proportion of owner-occupied units built prior to 1940
- DIS weighted distances to five Boston employment centres
- RAD index of accessibility to radial highways
- PTRATIO pupil-teacher ratio by town
- B 1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town
- LSTAT percentage lower status of the population
- MEDV Median value of owner-occupied homes in USD 1000's
- TAX full-value property-tax rate per USD 10,000

First, let's understand what we are going to predict from this data. We need to know which variable is dependent and which variables are independent. Here, "MEDV" is the dependent variable because it holds the median value of owner-occupied homes, which is the price we want to estimate. Its value depends on other factors such as RM, LSTAT, TAX, and AGE. These are the main factors among the others that are correlated with MEDV, positively or negatively (for example, MEDV tends to rise with RM and fall with LSTAT), and they help us estimate the price of houses.

Let's get started.

1. Import the libraries we need

Importing the libraries
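
The original screenshot is not available, so here is a minimal sketch of the imports this walkthrough relies on. The post itself names pandas, seaborn, matplotlib.pyplot, and sklearn; numpy and the two specific sklearn imports are assumptions about what the author used.

```python
import numpy as np                 # numerical arrays (assumed)
import pandas as pd                # data frames: pd.DataFrame(), pd.concat()
import matplotlib.pyplot as plt    # subplots() for the figures
import seaborn as sns              # heatmap, pairplot, regplot

# Assumed choices for the model and the error metric used later on.
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
```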

2. Load the dataset and convert it into a pandas data frame

After loading the dataset from sklearn.datasets, we store it in the variable Boston. The loaded object has two main parts, data and target, which are plain arrays. So, we have to convert them into pandas data frames. Here, we convert each array into a data frame using the pd.DataFrame() constructor.

creating a data frame
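
A rough sketch of this conversion step. Note that the load_boston loader was removed from scikit-learn in version 1.2, so the sketch takes the two arrays as arguments instead of calling it. The helper name to_frames is hypothetical, and the arrays below are random placeholders with the dataset's shape, not the real Boston values.

```python
import numpy as np
import pandas as pd

# Feature names from the list at the top of the post, in the dataset's
# column order. MEDV is the target and gets its own frame.
FEATURE_NAMES = ["CRIM", "ZN", "INDUS", "CHAS", "NOX", "RM", "AGE",
                 "DIS", "RAD", "TAX", "PTRATIO", "B", "LSTAT"]


def to_frames(data, target, feature_names=FEATURE_NAMES):
    """Wrap the raw arrays in data frames (hypothetical helper)."""
    data_df = pd.DataFrame(data, columns=feature_names)
    target_df = pd.DataFrame(target, columns=["MEDV"])
    return data_df, target_df


# Placeholder arrays shaped like the dataset (506 rows, 13 features).
# These are random numbers, NOT real Boston housing values.
rng = np.random.default_rng(0)
data_df, target_df = to_frames(rng.random((506, 13)), rng.random(506))
print(data_df.shape, target_df.shape)
```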

Here, you can see that data and target are two separate data frames, so we have to concatenate them into one using the concat() function.

Concatenation of a data frame
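
As an illustration, column-wise concatenation with pd.concat() might look like the following. The two small frames hold made-up numbers, not real Boston rows.

```python
import pandas as pd

# Tiny placeholder frames (made-up numbers, not real Boston rows).
data_df = pd.DataFrame({"RM": [6.0, 5.5, 7.1], "LSTAT": [5.0, 12.0, 3.5]})
target_df = pd.DataFrame({"MEDV": [24.0, 18.5, 33.0]})

# axis=1 places the frames side by side, aligning rows on the index.
df = pd.concat([data_df, target_df], axis=1)
print(df)
```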

Now let’s remove the unwanted columns from the data frame.

Removing columns
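
The post does not say which columns were removed, so the list in this sketch is purely a hypothetical choice. It only shows the mechanics of DataFrame.drop() on a small made-up frame.

```python
import pandas as pd

# Small made-up frame; the values are placeholders.
df = pd.DataFrame({
    "ZN": [18.0, 0.0, 0.0],
    "CHAS": [0, 1, 0],
    "RM": [6.0, 5.5, 7.1],
    "LSTAT": [5.0, 12.0, 3.5],
    "MEDV": [24.0, 18.5, 33.0],
})

# Hypothetical list of unwanted columns (the original choice is unknown).
unwanted = ["ZN", "CHAS"]

# drop() returns a new frame; the original is left untouched.
df = df.drop(columns=unwanted)
print(df.columns.tolist())
```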

3.1 Plot the heatmap to see the correlation of the data

Here, with the help of a heatmap, we can check the correlations between the columns visually. We are using the seaborn library for visualization, and the correlation matrix is drawn with a seaborn heatmap. Have a look at the seaborn library in case you are not familiar with it.

Creating Heatmap
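
A minimal sketch of a correlation heatmap with seaborn. The frame is synthetic: it is generated so that MEDV rises with RM and falls with LSTAT, but the numbers are placeholders, not Boston data. The colour map and figure size are assumptions.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in data (NOT the real dataset).
rng = np.random.default_rng(0)
rm = rng.normal(6.3, 0.7, 200)
lstat = rng.normal(12.0, 5.0, 200)
medv = 9.0 * rm - 0.9 * lstat - 20.0 + rng.normal(0.0, 3.0, 200)
df = pd.DataFrame({"RM": rm, "LSTAT": lstat, "MEDV": medv})

# Pairwise Pearson correlations between all columns.
corr = df.corr()

# annot=True writes each coefficient into its cell.
fig, ax = plt.subplots(figsize=(6, 5))
sns.heatmap(corr, annot=True, cmap="coolwarm", vmin=-1, vmax=1, ax=ax)
# plt.show()  # uncomment in an interactive session
```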

3.2 Pairplot to visualize

Here we see the pairwise relationships between the variables as scatter plots, with a histogram of each variable on the diagonal. We plot the data with the seaborn pairplot function.

4. Regression plot

What we have seen earlier can be applied to many variables at once, but a regression plot always has one independent variable and one dependent variable. So, here we draw the linear regression plots using regplot. The purple plot has RM as the independent variable and MEDV as the dependent variable. The green plot has LSTAT as the independent variable and MEDV as the dependent variable.

Here, we have used the subplots() function of matplotlib.pyplot to draw more than one plot at a time. To show the data on a regression plot, we have used the regplot function of the seaborn library.

Plotting values on regression plot
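
A sketch of the two side-by-side regression plots, using subplots() and regplot with the purple and green colours described in the text. The data is synthetic placeholder data, and the figure size is an assumption.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in data (NOT the real dataset).
rng = np.random.default_rng(0)
rm = rng.normal(6.3, 0.7, 200)
lstat = rng.normal(12.0, 5.0, 200)
medv = 9.0 * rm - 0.9 * lstat - 20.0 + rng.normal(0.0, 3.0, 200)
df = pd.DataFrame({"RM": rm, "LSTAT": lstat, "MEDV": medv})

# One row, two columns of axes: one plot per independent variable.
fig, (ax_rm, ax_lstat) = plt.subplots(1, 2, figsize=(12, 5))

# Each regplot draws the scatter points plus a fitted straight line.
sns.regplot(x="RM", y="MEDV", data=df, color="purple", ax=ax_rm)
sns.regplot(x="LSTAT", y="MEDV", data=df, color="green", ax=ax_lstat)
# plt.show()  # uncomment in an interactive session
```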

5. Create the linear model

We could fit the model directly, but here I created a function called “linear_model” that fits the linear model, predicts the values, and computes the mean squared error.

Linear model function
Getting value using the function
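
The author's function is not shown, so the following is only one way such a linear_model function could be written. It assumes scikit-learn's LinearRegression and mean_squared_error, a single feature per model, and an error computed on the same rows the model was fitted on (a held-out split would be a stricter check). The data is synthetic, so the printed errors say nothing about the real dataset.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error


def linear_model(x, y):
    """Fit y ~ x for one feature; return (model, predictions, mse)."""
    # scikit-learn expects a 2-D feature matrix, hence the reshape.
    X = np.asarray(x, dtype=float).reshape(-1, 1)
    y = np.asarray(y, dtype=float)
    model = LinearRegression().fit(X, y)
    predictions = model.predict(X)
    mse = mean_squared_error(y, predictions)
    return model, predictions, mse


# Synthetic stand-in data (NOT the real dataset).
rng = np.random.default_rng(0)
rm = rng.normal(6.3, 0.7, 200)
lstat = rng.normal(12.0, 5.0, 200)
medv = 9.0 * rm - 0.9 * lstat - 20.0 + rng.normal(0.0, 3.0, 200)
df = pd.DataFrame({"RM": rm, "LSTAT": lstat, "MEDV": medv})

model_rm, pred_rm, mse_rm = linear_model(df["RM"], df["MEDV"])
model_lstat, pred_lstat, mse_lstat = linear_model(df["LSTAT"], df["MEDV"])
print("MSE with RM:   ", mse_rm)
print("MSE with LSTAT:", mse_lstat)
```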

From the result, we can see that the mean squared error for LSTAT is slightly lower than the one for RM. So we can say that predictions made from LSTAT are slightly more accurate than predictions made from RM.

6. Let’s take input from the user

In this section, we will take values from the user and predict the house price from them.

Dynamic input
Output of the user values
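
A hedged sketch of how user input could be turned into a prediction. The helper name predict_from_text is hypothetical, and the model here is fitted on four made-up points only so that the example runs on its own. The input() call is left as a comment so the script does not block when run non-interactively.

```python
import numpy as np
from sklearn.linear_model import LinearRegression


def predict_from_text(model, raw_value):
    """Parse the text typed by the user and return the predicted value."""
    value = float(raw_value.strip())  # raises ValueError on bad input
    # The model expects a 2-D array: one row, one feature.
    return float(model.predict(np.array([[value]]))[0])


# Toy model fitted on made-up points (NOT Boston data).
x = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([3.0, 5.0, 7.0, 9.0])
model = LinearRegression().fit(x, y)

# In an interactive session the value would come from the user:
# raw = input("Enter a value for the feature: ")
raw = " 5 "
price = predict_from_text(model, raw)
print("Predicted value:", price)
```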

Feel free to reach me at jivaniutsav007@gmail.com.

Thank You.

Utsav Jivani
Currently pursuing a master’s in computer science at Lakehead University. Deep learning is my profession.