Finding the best house in a city as per your requirements is a challenging task.

Co-author: Soumya Shrivastava

Our EDA and model help a customer find a house with less effort. The analysis uses a dataset of houses in Ames, Iowa, USA. The city developed around the 1840s, and our dataset covers houses built from the 1870s onward, which gives a good picture of buying behaviour in the city.

We observed that the fireplace, kitchen, garage and basement are the features buyers consider most when purchasing a house. So we generated four variables, Fireplace sector, Kitchen sector, Garage sector and Basement sector, each of which gives a rating on its own scale. Each variable is created by combining all the columns related to a feature into a single sector with ordinal values. For example, the number of kitchens and the kitchen quality are mapped to the Kitchen sector. The mapping for each sector is discussed in detail below.

Firstly, let us look at the construction pattern of houses (various building types) in the city.

Building Construction vs Years

This graph shows the pattern of house construction from 1872 to 2011. There is a large dip, with no clear pattern, from the 1980s to 2000. The reason is that population growth stagnated during these years, so very few houses were built. Peaks appear when the population increased significantly, and the highest peak is around 2006–2007, just before the 2008 recession.

The construction pattern reflects the population growth in this state.

Years vs Average Sale Price

The average SalePrice dropped in 2008, during the recession.
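The yearly averages behind a plot like this can be computed by grouping sales by year. The following is a minimal sketch in plain Python, not the code used for the article; the sample rows are hypothetical and are not taken from the dataset.

```python
from collections import defaultdict

# Hypothetical (year sold, sale price) rows, for illustration only.
sales = [
    (2006, 200000), (2006, 180000),
    (2007, 190000), (2007, 210000),
    (2008, 170000), (2008, 150000),
]

def average_price_by_year(rows):
    """Return {year: mean sale price}, with years in ascending order."""
    totals = defaultdict(lambda: [0, 0])  # year -> [sum of prices, number of sales]
    for year, price in rows:
        totals[year][0] += price
        totals[year][1] += 1
    return {year: s / n for year, (s, n) in sorted(totals.items())}

avg_by_year = average_price_by_year(sales)
print(avg_by_year)
```

Plotting the resulting dictionary against the year gives the kind of curve described above.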

The spread of neighbourhoods on the Iowa map

This is the spread of neighbourhoods across the state. Most of the neighbourhoods are located in and around Iowa State University and Des Moines, the state capital.

It can also be observed that the other populated areas are well connected by train stations, highways and airports.

Feature Engineering for obtaining mapped variables

Understanding the sectors

As mentioned above, the four sectors of the house that most strongly affect its sale price are:

1. Kitchen Sector
2. Basement Sector
3. Fireplace Sector
4. Garage Sector

Each house has columns that give the number of kitchens, basements and fireplaces, and the number of cars the garage can accommodate. Each of these sectors also has a quality rating: Excellent, Good, Typical, Fair, Poor or Not Applicable. The quality rating ‘Not Applicable’ means that the feature is not present in the house.

We merged these two columns, i.e. the count and its quality, under each sector, creating 4 columns out of 8.

Here is a brief explanation of how we merged each sector and mapped it to a rating.

Consider Garage Sector:

The garage quality of each house is one of these: Excellent (Ex), Good (Gd), Typical (TA), Fair (Fa), Poor (Po), Not Available (NA)

In our dataset, the number of cars a garage can accommodate ranges from 0 to 4.

We map each value of the number of cars a garage can accommodate to each value of garage quality.

Each value of number of cars mapped to all quality values

Note: We do not concatenate 4 with NA, because NA implies the absence of that sector in the house. Instead, we concatenate NA with 0, as 0 also implies the absence of that feature in the house.
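The concatenation step can be sketched as follows. This is an illustrative sketch rather than the code used for the article: the function name `garage_key` and the key format (for example `"2TA"`) are assumptions, and so is the choice to treat any row with 0 cars or NA quality as "no garage".

```python
# Quality codes from worst to best, as listed in the text.
QUALITIES = ["Po", "Fa", "TA", "Gd", "Ex"]

def garage_key(cars, quality):
    """Concatenate car capacity and quality code, e.g. (2, "TA") -> "2TA"."""
    if cars == 0 or quality == "NA":
        # Assumption: 0 cars and NA quality both mean "no garage",
        # so they collapse into the single key "0NA".
        return "0NA"
    if quality not in QUALITIES:
        raise ValueError(f"unknown quality code: {quality!r}")
    return f"{cars}{quality}"

# Every key that can occur for capacities 0 to 4: NA is paired only with 0.
all_keys = ["0NA"] + [f"{c}{q}" for c in range(1, 5) for q in QUALITIES]

print(garage_key(2, "TA"))
print(len(all_keys))
```

With capacities 1 to 4 and five quality codes, this gives 20 keys plus the single "0NA" key for houses without a garage.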

After mapping, we created the following variable as the Garage Sector. A rating is given to each value of the mapped variable.

Ratings are allotted according to the number of cars a garage can accommodate and the quality of the garage. The more cars a garage can accommodate and the better its quality, the higher the overall rating of that sector in a house.

Garage sector ratings
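One way to turn a count and a quality into a single ordinal rating is sketched below. The rule is an inference, not the confirmed table from the article: it ranks first by count and then by quality, and gives 0 to an absent feature. The function name `sector_rating` is hypothetical. The rule does reproduce the scales stated in this article, 0 to 12 for kitchens (up to 3 kitchens, four quality levels) and 0 to 15 for fireplaces (up to 3 fireplaces, five quality levels).

```python
def sector_rating(count, quality, qualities):
    """Ordinal rating: 0 if the feature is absent, otherwise ranked by
    count first and by quality second (worst to best in `qualities`)."""
    if count == 0 or quality == "NA":
        return 0
    return (count - 1) * len(qualities) + qualities.index(quality) + 1

# Quality codes from worst to best.
GARAGE_QUALITIES = ["Po", "Fa", "TA", "Gd", "Ex"]
KITCHEN_QUALITIES = ["Fa", "TA", "Gd", "Ex"]  # no Poor kitchens in the data

print(sector_rating(3, "Gd", GARAGE_QUALITIES))   # 3-car garage, Good quality
print(sector_rating(3, "Ex", KITCHEN_QUALITIES))  # top of the kitchen scale
```

Under this rule a 3-car garage of Good quality scores 14, which matches the lower end of the 14–17 range reported for the highest priced neighbourhoods.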

Similarly, other sectors are also mapped with ratings based on the mapped values. The scale varies because the number of mappings varies for each sector.

1. Basement Sector

Basement sector ratings

Note: In the Basement sector, the basement is given as an area in square feet. These values are first binned and labelled from 0–5, then mapped with basement quality.

There was no basement with Poor (Po) quality, so Poor is not mapped to any value. Because of this, the ratings of the mapped data change slightly.
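The binning step for the basement can be sketched like this. The bin edges below are placeholders chosen for illustration, since the actual edges are not given in the text; the function names are hypothetical, and the rating rule (ranked by label first, then by quality, with Poor left out) is an assumption.

```python
from bisect import bisect_left

# Placeholder upper edges (sq. ft) for labels 1 to 4; larger areas get label 5.
# These are NOT the edges used in the article.
BIN_EDGES = [500, 1000, 1500, 2000]

# Basement quality codes from worst to best; Poor (Po) does not occur.
BASEMENT_QUALITIES = ["Fa", "TA", "Gd", "Ex"]

def basement_label(area_sqft):
    """Bin a basement area into a label from 0 (no basement) to 5."""
    if area_sqft <= 0:
        return 0
    return bisect_left(BIN_EDGES, area_sqft) + 1

def basement_rating(area_sqft, quality):
    """Combine the area label and the quality into one ordinal rating."""
    label = basement_label(area_sqft)
    if label == 0 or quality == "NA":
        return 0
    return (label - 1) * len(BASEMENT_QUALITIES) + BASEMENT_QUALITIES.index(quality) + 1

print(basement_label(1750))
print(basement_rating(1750, "TA"))
```

Dropping Poor from the quality list is what shifts the ratings relative to a sector that uses all five quality levels.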

2. Kitchen Sector

Kitchen sector ratings

Note: In the Kitchen sector, the number of kitchens in a house ranges from 0–3, and there was no kitchen with Poor (Po) quality, so the ratings of the mapped values change slightly. The Kitchen sector rating ranges from 0 to 12.

3. Fireplace Sector

Fireplace sector ratings

Note: In the Fireplace sector, the number of fireplaces in a house ranges from 0–3, so the ratings of the merged data change slightly. The Fireplace sector rating ranges from 0 to 15.

Now let’s understand how every neighbourhood has its preferences for each sector.

Basement sector

Basement sector ratings in selected neighbourhoods

This graph shows the ratings preferred for the Basement sector in different neighbourhoods. We considered only 9 neighbourhoods out of 25. Their average sale prices range from the lowest to the highest ($98,576 to $335,295). The 9 selected neighbourhoods represent the highest, medium and lowest price levels.

The Y-axis is ordered from the lowest to the highest average sale price of the neighbourhood.
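A selection like this can be sketched by ranking neighbourhoods on their average sale price and taking a few from the bottom, the middle and the top. The even split into three groups is an assumption, the function name `pick_neighbourhoods` is hypothetical, and the sample rows use made-up names and prices.

```python
from collections import defaultdict
from statistics import mean

def pick_neighbourhoods(rows, per_group=3):
    """Return low, middle and high priced neighbourhoods,
    ordered from the lowest to the highest average sale price."""
    prices = defaultdict(list)
    for name, price in rows:
        prices[name].append(price)
    ranked = sorted(prices, key=lambda n: mean(prices[n]))
    mid_start = (len(ranked) - per_group) // 2
    low = ranked[:per_group]
    mid = ranked[mid_start:mid_start + per_group]
    high = ranked[-per_group:]
    return low + mid + high

# Made-up (neighbourhood, sale price) rows, for illustration only.
rows = [
    ("N1", 100000), ("N1", 120000), ("N2", 300000),
    ("N3", 200000), ("N4", 150000), ("N5", 250000),
]
selected = pick_neighbourhoods(rows, per_group=1)
print(selected)
```

With 25 neighbourhoods and `per_group=3`, the same function returns 9 names without overlap, already in the order used for the Y-axis.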

It can be observed that houses in the lowest average price category prefer ratings of 6–8, those in the medium category 10–14, and those in the highest category 14–19.

This indicates that a house in a highest average price neighbourhood has a basement area labelled 4 or above (i.e., >3900 Sq. ft) and a quality of Typical (TA) or better.

Kitchen sector

Kitchen sector ratings in selected neighbourhoods

It can be observed that houses in the lowest average price category prefer ratings of 3.5–4.9, those in the medium category 5–6.4, and those in the highest category 6.5–9.0.

This indicates that a house in a highest average price neighbourhood has 2 or more kitchens and a quality of Typical (TA) or better.

Fireplace sector

Fireplace sector ratings in selected neighbourhoods

It can be observed that houses in the lowest average price category prefer ratings of 0–3, those in the medium category 3.5–7.0, and those in the highest category 7.0–10.0.

This indicates that a house in a lowest average price neighbourhood has no fireplace.

Meanwhile, a house in a highest average price neighbourhood has 2 or more fireplaces and a quality of Fair (Fa) or better.

Garage sector

Garage sector ratings in selected neighbourhoods

It can be observed that houses in the lowest average price category prefer ratings of 6–9, those in the medium category 12–14, and those in the highest category 14–17.

This indicates that a house in a highest average price neighbourhood has a garage that can accommodate 3 or more cars and a quality of Good (Gd) or better.

So it can be concluded that, by understanding customers' preferences for various features, and above all for the highest-valued sectors, sellers can make wise investments when modifying or constructing a house.

Meanwhile, a buyer can estimate the budget for a house that matches their preferences, or select a neighbourhood within their budget.

Shubhasreepv

Data science student at Praxis Business School. An inquisitive person with ability to analyse problems and provide action-oriented results.