House Recommendations: Intelligent Recommendations for Nearby Houses

Amal · ScrapeHero · Jul 28, 2023

A recommendation system can suggest things to people based on their preferences and past behavior. These systems are used in various platforms, like Netflix or Amazon, to recommend movies, products, or content that users might like, making their experience more personalized and enjoyable.

Types of recommendation engines

  • Content-based: A content-based recommendation system leverages user preferences and item attributes, collectively known as “content,” to generate personalized recommendations.

By analyzing an item’s features and your past likes, the system identifies items that align with your interests and provides tailored suggestions (a toy sketch appears after this list).

  • Collaborative filtering: Collaborative filtering methods form the category of recommender systems that rely primarily on historical user interactions with items.

These methods take all past data about user engagement with the target items as input and use it to find users (or items) with similar interaction patterns.

  • Hybrid: As the name suggests, hybrid systems combine both content-based and collaborative filtering techniques. They leverage the strengths of both methods to provide more accurate and diverse recommendations.

By using various approaches, these systems can cater to a broader range of user preferences.
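To make the content-based idea concrete, here is a minimal sketch that ranks items by cosine similarity between item features and a user-preference profile. The feature names and numbers below are purely hypothetical.

import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical item features: [has_garden, bedrooms (scaled), is_apartment]
item_features = np.array([
    [1, 0.6, 0],   # house A
    [0, 0.3, 1],   # house B
    [1, 0.9, 0],   # house C
])

# Hypothetical user profile, built from items the user liked before
user_profile = np.array([[1, 0.8, 0]])

# Higher cosine similarity = closer match to the user's "content" profile
scores = cosine_similarity(user_profile, item_features).ravel()
print(scores.argsort()[::-1])  # item indices ranked from most to least similar

Collaborative filtering, by contrast, would work from a user-item interaction matrix rather than from item attributes.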

Examples of recommendation systems in real life

Some examples of recommendation systems in everyday use include:

  • Facebook suggests “People You May Know.”
  • Netflix recommends “Other Movies You May Enjoy.”
  • Amazon displays “Customers who bought this item also bought …”
  • Google shows “Visually Similar Images.”
  • YouTube offers “Recommended Videos.”
  • LinkedIn presents “Jobs You May Be Interested In.”
  • Waze guides you with the “Best Route.”
  • Spotify curates “Best Music” for you.

Getting Started

In this blog, we will delve into an intriguing topic: Leveraging a straightforward yet potent machine learning approach to recommend similar houses within your neighborhood.

Our aim is to explore the fascinating world of house recommendation systems, shedding light on the magic behind these intelligent algorithms that help you discover your dream home just around the corner.

Join us on this exhilarating journey, where we bridge the gap between machine learning and real estate, empowering you to make informed decisions when seeking your ideal abode.

Data

Data is scraped from the Realtor website to get property information such as Address, Sqft, Price, Number of Bedrooms, and more from Realtor property listing pages.

If you want to crawl using a User Interface (UI), here’s an option for you: ScrapeHero offers ScrapeHero Cloud, which has already done most of the work for you. Just pass the URL of the property listing, and it will crawl it for you.

Want to know more about the data? Click here.

Sample Data:

Columns: _id, address, city, housetype, imageUrl, latitude, longitude, numBedrooms, price, search_results, sourceUrl, sqft, state, zipcode

Requirements:

pip install scikit-learn pandas

Let’s Discuss the Code

import pandas as pd

# Read data
df = pd.read_csv("realtor_data.csv")
df.info()

We have approximately 2.4M records in our dataset.

Now, let’s convert the price column, which has an object dtype, to float after stripping the commas.

df["price"] = [float(str(i).replace(",", "")) for i in df["price"]]
df.price = df.price.astype(float)

Let’s head into the preprocessing step, but before that, let’s select only the relevant features for the model. After some trial and error, I found that these features gave good results.

from sklearn.preprocessing import LabelEncoder

# Select the relevant features (use .copy() so the encoded columns can be assigned safely)
df_train = df[["latitude", "longitude", "city", "state", "housetype", "numBedrooms", "price", "sqft"]].copy()

label_encoder = LabelEncoder()

df_cat = ['city', 'housetype', 'state']

# Encode the labels in each categorical column
for column in df_cat:
    df_train[column] = label_encoder.fit_transform(df_train[column])

The purpose of LabelEncoder is to transform these categorical labels into numerical values, which can be more easily processed by machine learning algorithms. It assigns a unique integer to each distinct category in the column, effectively converting the categorical data into a numerical representation.

For example, let’s say you have a “city” column with the categorical values “New York”, “Los Angeles”, “Chicago”, and “Houston”. LabelEncoder assigns integers in alphabetical order of the labels, so these cities would be mapped to “Chicago” → 0, “Houston” → 1, “Los Angeles” → 2, and “New York” → 3.
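A quick sketch to verify that mapping (the city names are just for illustration):

from sklearn.preprocessing import LabelEncoder
import pandas as pd

cities = pd.Series(["New York", "Los Angeles", "Chicago", "Houston"])
le = LabelEncoder()
print(le.fit_transform(cities))  # [3 2 0 1]
print(le.classes_)               # ['Chicago' 'Houston' 'Los Angeles' 'New York']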

Normalize Data:

Now let’s normalize the data, because it is generally recommended to scale or normalize feature values when using the Nearest Neighbors algorithm. Why?

  • The Nearest Neighbor algorithm determines the “closeness” or similarity between data points based on a distance metric, such as Euclidean distance.
  • If the features have different scales or units, the feature with a larger scale may dominate the distance calculation.

As a result, the Nearest Neighbor algorithm may end up being biased towards features with higher scales, neglecting the impact of other features with smaller scales.

  • By scaling the features, you bring them to a comparable scale (for example, zero mean and unit variance, or a fixed range such as 0 to 1). This normalization ensures that all features contribute comparably when calculating distances, preventing any single feature from overshadowing the others and making the algorithm more accurate.

# Normalize the numerical features
numerical_features = ['latitude', 'longitude', 'numBedrooms', 'price', 'sqft']

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
df_train[numerical_features] = scaler.fit_transform(df_train[numerical_features])

The StandardScaler is a preprocessing technique used to scale and standardize numerical features in a dataset. It transforms the data such that it has a mean of 0 and a standard deviation of 1. The purpose of standardizing the features is to bring them to a similar scale, making them comparable and ensuring that they contribute equally to the analysis or machine learning model.
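As a small illustration with made-up numbers, here is what StandardScaler does to two columns on very different scales:

import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical prices are on a much larger scale than bedroom counts
X = np.array([[250000.0, 2], [400000.0, 3], [750000.0, 4]])
scaled = StandardScaler().fit_transform(X)
print(scaled.mean(axis=0))  # approximately [0, 0]
print(scaled.std(axis=0))   # approximately [1, 1]

After scaling, both columns contribute on the same footing to a Euclidean distance.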

Let’s Train the Model

from sklearn.neighbors import NearestNeighbors

# Fit the Nearest Neighbors model
nn_model = NearestNeighbors(metric='euclidean')
nn_model.fit(df_train)

We will be using Euclidean distance because it captures the differences in numerical values and helps compare similarity between geographical locations and property characteristics. Find out about other distance metrics here.
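For intuition, Euclidean distance is just the square root of the summed squared differences between two feature vectors. A tiny sketch with made-up, already scaled values:

import numpy as np

# Two hypothetical (already scaled) feature vectors for two houses
house_a = np.array([0.2, -1.1, 0.5, 0.3, -0.4])
house_b = np.array([0.1, -0.9, 0.5, 0.8, -0.2])

# sqrt of the sum of squared differences; smaller = more similar
print(np.sqrt(np.sum((house_a - house_b) ** 2)))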

Defining a function to recommend similar houses.

def recommend_similar_houses(query_house_index, num_recommendations=5):
    # Query the fitted model; ask for one extra neighbor because the closest match is the house itself
    distances, indices = nn_model.kneighbors(
        df_train.iloc[query_house_index].values.reshape(1, -1),
        n_neighbors=num_recommendations + 1,
    )
    recommended_indices = indices.squeeze()[1:]  # drop the query house itself
    interested_house = df.iloc[[query_house_index]].dropna(axis=1)
    recommended_houses = df.iloc[recommended_indices]

    return interested_house, recommended_houses

The function uses the fitted Nearest Neighbors model to suggest similar houses based on a given query house. Here's how the function works:

Parameters:

  • query_house_index: The index of the house in the dataset that serves as the query point. This house is used as the starting point to find similar houses.
  • num_recommendations: The number of similar houses to recommend. By default, it suggests 5 similar houses, but you can change this value as needed.

Now let’s look at the example usage.

query_house_index = 33  # Index of the query house in the dataframe
interested_house, recommended_houses = recommend_similar_houses(query_house_index)
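To sanity-check the output, you can print a few columns from the returned DataFrames (the column names come from the sample schema shown earlier):

cols = ["address", "city", "state", "housetype", "numBedrooms", "price", "sqft"]
# interested_house may have dropped NaN columns, so select only those still present
print(interested_house[[c for c in cols if c in interested_house.columns]])
print(recommended_houses[cols])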

Interested House:

Recommended Houses:

Let’s look at another example:

Interested House:

Recommended Houses:

Great results! 😍

Conclusion

Here, we have discussed a very simple and straightforward approach.
ScrapeHero uses advanced deep-learning techniques to revolutionize the way we discover similar houses in our neighborhood.

Through a cutting-edge recommendation system, they have seamlessly merged the power of artificial intelligence with the intricacies of real estate, offering a remarkable solution that simplifies the house-hunting experience like never before.

Hope you learned something new today. Happy learning!

If you’ve found this article helpful or intriguing, don’t hesitate to give it a clap! As a writer, your feedback helps me understand what resonates with my readers.

Follow ScrapeHero for more insightful content like this. Whether you’re a developer, an entrepreneur, or someone interested in web scraping, machine learning, AI, etc., ScrapeHero has compelling articles that will fascinate you.
