Predict Apartment Interiors Online : AI for Real Estate

Raja Dev
5 min read · Sep 21, 2021

Looking behind the data with a Keras DNN: what is the actual state of an apartment’s interior, as opposed to the details published in the Ad?

Motivation

Remember the last time you rented an apartment? Interiors play a vital role in bringing tenants to an apartment faster.

Image by Author

Tenants want interiors that suit their lifestyle. But these details are often left blank or incorrectly specified in the advertisements, for reasons such as technical issues or misinterpretation.

Creating a comprehensive and correct classification of the Interiors could bring in the following advantages:

· Data standardization across the organization for the Ad publisher. It removes owner bias from the system and replaces missing values with predicted values.

· Tenants can make better decisions, as the classification is standard and more reliable, which makes comparative analysis much more accurate.

· Apartment owners can see their Ads reaching more viewers and getting better responses. Otherwise, Ads with missing values may be automatically filtered out by the viewers.

This article walks you through an example, so that you can get hands-on with Deep Neural Networks in a very simple way.

Image by Author

Data Source

Download the data set apartment-rental-offers-in-germany from Kaggle.

It comprises the details of 268k apartments in Germany, collected over a period of 3 years, with 49 features.

The volume and variety of the data make it a good fit for training a Deep Learning Neural Network.

Feature Reduction

Of the 49 columns, one is our target, which leaves 48 candidate input features. To keep the analysis simple, let’s reduce the input features from 48 to 5.

These 5 features are the ones we think are most closely related to the Interior Design. Our target feature is ‘Interior Quality’.

After loading the csv file and filtering out the other features, this is what our data frame looked like:

Image by Author

Data Inspection

Our aim is to define a Neural Network that learns from these 5 features and predicts the target feature.

Interior Quality has 5 possible values: four domain values (normal, simple, sophisticated and luxury) and one value, ‘NA’, that represents a missing entry.

Let’s further simplify the target feature for the sake of this discussion and reduce the possible values to 2: ‘luxury’ is translated to 1 and all other values, including ‘NA’, are translated to 0.
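The translation described above can be sketched in a few lines of plain Python. This is only an illustration: the key name `interiorQuality` and the sample rows are assumptions made up for this example, not values taken from the dataset.

```python
# Hypothetical rows; "interiorQuality" is an assumed column name.
def to_binary_target(value):
    """Return 1 for 'luxury' and 0 for every other value, including missing ones."""
    return 1 if value == "luxury" else 0

rows = [
    {"interiorQuality": "luxury"},
    {"interiorQuality": "sophisticated"},
    {"interiorQuality": "normal"},
    {"interiorQuality": "simple"},
    {"interiorQuality": None},  # stands in for the missing value 'NA'
]

targets = [to_binary_target(row["interiorQuality"]) for row in rows]
print(targets)  # [1, 0, 0, 0, 0]
```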

Problem Statement

The trained model should take {region, yearRange, noRooms, totalRent, livingSpace} as input values and return the probability of that apartment having a luxury interior.

Data Preparation

Translate the Pandas DataFrame to a Tensor Dataset. A DataFrame is good for exploration and analysis by humans, but to train a Network Model we need to supply the data in the form of Tensor Datasets.

These Datasets have several built-in transformation functions, like map, reduce and filter, which are used when preparing and feeding the data to the Model.

Split the Dataset into Training (80%) and Test (20%) datasets. Then split the training dataset again into training (80%) and validation (20%) datasets. This leaves 64% of the original data for training, 16% for validation and 20% for testing.
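Here is a minimal sketch of the two-stage split in plain Python. The helper `split_rows`, the seeds and the 1,000 stand-in records are assumptions for illustration; the article does not show which splitting utility was actually used.

```python
import random

def split_rows(rows, holdout_fraction, seed):
    """Shuffle a copy of rows and split off the last holdout_fraction of them."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * (1 - holdout_fraction))
    return shuffled[:cut], shuffled[cut:]

rows = list(range(1000))  # stand-in for 1,000 apartment records

# First split: 80% training, 20% test.
train_full, test = split_rows(rows, holdout_fraction=0.2, seed=1)
# Second split: 80% of the training part stays training, 20% becomes validation.
train, val = split_rows(train_full, holdout_fraction=0.2, seed=2)

print(len(train), len(val), len(test))  # 640 160 200
```

Splitting off the test set first keeps it untouched while the validation set is used to monitor training.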

Image by Author

Input Layer:

With the data prepared, it’s now time to define our input layer.

The training dataset has 4 numeric features, [yearRange, noRooms, totalRent and livingSpace], and one categorical feature, region.

Each feature is preprocessed independently before the network sees it. Normalize each numeric feature and add the result as a node to the Input Layer.

A categorical feature has to be handled a little differently. Encode its values as a finite number of tokens. Here, the categorical feature ‘region’ may have many distinct values, but we ask the encoder to map all of them to no more than 5 tokens. Then encode each token as its own node and add it to the Input Layer.
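To make the normalization step concrete, here is a small sketch that rescales one numeric feature to zero mean and unit variance using statistics learned from the training values only. The helper `fit_normalizer` and the rent values are made up for this illustration; a Keras normalization layer does comparable work inside the model, but this is not the article’s own code.

```python
import statistics

def fit_normalizer(values):
    """Learn mean and standard deviation from training values; return a scaling function."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation
    return lambda x: (x - mean) / std

total_rent_train = [500.0, 700.0, 900.0, 1100.0, 1300.0]  # made-up rents
normalize_rent = fit_normalizer(total_rent_train)

scaled = [normalize_rent(x) for x in total_rent_train]
print(normalize_rent(900.0))  # 0.0, because 900 is the mean of the training rents
```

The same function, fitted once on the training data, is then applied to validation, test and new samples so that all inputs share one scale.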

Image by Author

Observe that the Input Layer now contains 9 nodes: four nodes from the 4 numeric features and five nodes from the 1 categorical feature.
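The sketch below shows one way a 5-token cap can work, and how the 9 input values come together. It assumes the cap is implemented by keeping the 4 most frequent regions and sending every other region to a shared "other" token at index 0, which is how a capped vocabulary lookup typically behaves. The helper `fit_region_encoder`, the list of regions and the numeric values are all made up for illustration.

```python
from collections import Counter

def fit_region_encoder(regions, max_tokens=5):
    """Map the (max_tokens - 1) most frequent regions to indices 1.., all others to 0."""
    most_common = [name for name, _ in Counter(regions).most_common(max_tokens - 1)]
    index = {name: i + 1 for i, name in enumerate(most_common)}

    def encode(region):
        one_hot = [0.0] * max_tokens
        one_hot[index.get(region, 0)] = 1.0  # index 0 is the shared "other" token
        return one_hot

    return encode

# Toy training column: six distinct regions, but only five tokens are allowed.
train_regions = (
    ["Nordrhein_Westfalen"] * 5 + ["Sachsen"] * 4 + ["Bayern"] * 3
    + ["Berlin"] * 2 + ["Hessen"] + ["Hamburg"]
)
encode_region = fit_region_encoder(train_regions, max_tokens=5)

print(encode_region("Sachsen"))  # [0.0, 0.0, 1.0, 0.0, 0.0]
print(encode_region("Hamburg"))  # [1.0, 0.0, 0.0, 0.0, 0.0]

# Four already-normalized numeric values (placeholders) plus five one-hot values.
numeric_values = [0.5, -1.2, 0.3, 1.1]
input_vector = numeric_values + encode_region("Sachsen")
print(len(input_vector))  # 9
```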

Hidden Layers:

Once we define the hidden layers, Keras builds them internally, transforms the data and learns the weights for each connection during training.

Layer Architecture:

Image by Author

Hidden Layer 1: Define a Dense layer of 32 units. It multiplies the 9 values from its previous layer (the Input Layer with 9 nodes) by a 9 × 32 weight matrix, so it holds 9 × 32 = 288 weights plus 32 biases. Keras learns these weights during training.
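As a rough picture of what this Dense layer computes, the NumPy sketch below pushes one 9-value input through a 9 × 32 weight matrix. The random weights and the ReLU activation are assumptions for illustration; in the real model Keras initializes and trains the weights.

```python
import numpy as np

rng = np.random.default_rng(0)

x = rng.normal(size=(1, 9))    # one apartment: 9 input values
W = rng.normal(size=(9, 32))   # one weight per input-to-unit connection
b = np.zeros(32)               # one bias per unit

# Matrix product plus bias, followed by ReLU (the activation is an assumption).
hidden = np.maximum(x @ W + b, 0)

n_params = W.size + b.size
print(hidden.shape, n_params)  # (1, 32) 320
```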

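Hidden Layer 2: Define a Dropout layer with a 50% dropout rate. It has no weights of its own and performs no dot product. During training, it randomly sets about half of the 32 values coming from Hidden Layer 1 to zero at each step and scales up the remaining ones. This keeps the network from relying too heavily on any single unit, which reduces the risk of overfitting. At prediction time, all 32 values pass through unchanged.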
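The behaviour of a dropout layer can be sketched in NumPy as follows. This is the common "inverted dropout" formulation, in which the surviving values are scaled by 1 / (1 - rate); the function name `dropout` and the all-ones input are assumptions for this example.

```python
import numpy as np

def dropout(activations, rate, training, rng):
    """Inverted dropout: zero a random fraction `rate` of values and rescale the rest."""
    if not training:
        return activations  # prediction time: every value passes through unchanged
    keep_mask = rng.random(activations.shape) >= rate
    return activations * keep_mask / (1.0 - rate)

rng = np.random.default_rng(0)
hidden = np.ones(32)  # stand-in for the 32 outputs of the Dense layer

train_out = dropout(hidden, rate=0.5, training=True, rng=rng)
infer_out = dropout(hidden, rate=0.5, training=False, rng=rng)

print(sorted(set(train_out.tolist())))   # each value is either dropped (0.0) or doubled (2.0)
print(np.array_equal(infer_out, hidden))  # True
```

The rescaling keeps the expected sum of the activations the same in training and prediction, so the next layer sees inputs on a comparable scale in both modes.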

Output Layer:

The Dropout layer passes 32 values on to the next layer. During training, about half of them are zeroed at each step.

Define a Dense layer with 1 unit as its successor. This layer learns from all 32 nodes of the Dropout layer and generates a value.

The value in this single node is the output that the network generates for a given combination of values in the Input Layer.
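Putting the three layers together, a model with this shape could be defined in Keras roughly as below. This is a sketch under assumptions, not the article’s original code: it takes the 9 preprocessed input values directly, and the ReLU activation, the sigmoid on the output (so the single node directly yields a probability), the Adam optimizer and the binary cross-entropy loss are all choices made for this example.

```python
import tensorflow as tf

# 9 input nodes: 4 normalized numeric features + 5 one-hot region tokens.
inputs = tf.keras.Input(shape=(9,))
x = tf.keras.layers.Dense(32, activation="relu")(inputs)     # Hidden Layer 1
x = tf.keras.layers.Dropout(0.5)(x)                          # Hidden Layer 2
outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)  # Output Layer
model = tf.keras.Model(inputs, outputs)

# Optimizer, loss and metric are assumptions for this sketch.
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# 9*32 + 32 = 320 parameters in the first Dense layer, 32 + 1 = 33 in the last.
print(model.count_params())  # 353
```

The Dropout layer contributes no parameters, which is why the total is just the sum of the two Dense layers.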

Verify Model Definition:

Plotting the above model would generate the following transformation diagram.

The data flow in this model agrees with our plan described in the Layer Architecture Diagram above.

Run the Model

Running the model and validating the results yielded 94.5% accuracy in our lab. These results are encouraging.

This is a good indication that the trained Network classifies the input accurately. However, we still need to check the model with some new input, which is not part of the validation dataset.

New Data Validation:

We validated the model by providing the following samples as inputs.

Sample1 = {'region': 'Nordrhein_Westfalen', 'yearRange': 8, 'noRooms': 3, 'totalRent': 2205, 'livingSpace': 46}

Sample2 = {'region': 'Sachsen', 'yearRange': 9, 'noRooms': 3, 'totalRent': 1300, 'livingSpace': 83}

Result:

Sample 1 has a 34.8 percent probability of having a luxury interior. Sample 2 has a 17.4 percent probability of having a luxury interior.

Cross Check:

This seems to be true. When cross-checked against the raw data, apartments similar to Sample 1 have luxury interiors more often than apartments similar to Sample 2.
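One possible way to run such a cross check is to measure the share of luxury interiors among comparable apartments. The sketch below treats "similar" as same region with total rent within 20% of the sample’s rent. That definition, the helper `luxury_share` and the rows are all assumptions invented for this illustration; the rows are not taken from the Kaggle data.

```python
def luxury_share(rows, region, rent, tolerance=0.2):
    """Share of luxury interiors among rows in the same region with similar rent."""
    similar = [
        r for r in rows
        if r["region"] == region and abs(r["totalRent"] - rent) <= tolerance * rent
    ]
    if not similar:
        return None  # no comparable apartments found
    return sum(r["luxury"] for r in similar) / len(similar)

# Made-up rows for illustration only.
rows = [
    {"region": "Nordrhein_Westfalen", "totalRent": 2100, "luxury": 1},
    {"region": "Nordrhein_Westfalen", "totalRent": 2300, "luxury": 0},
    {"region": "Nordrhein_Westfalen", "totalRent": 900, "luxury": 0},
    {"region": "Sachsen", "totalRent": 1250, "luxury": 0},
    {"region": "Sachsen", "totalRent": 1400, "luxury": 0},
]

print(luxury_share(rows, "Nordrhein_Westfalen", 2205))  # 0.5
print(luxury_share(rows, "Sachsen", 1300))              # 0.0
```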

Conclusion

  • The selected features are well related to the target variable.
  • The Network Model was able to learn well from the datasets.
  • The model reached about 94.5 percent accuracy on the validation data.

Raja Dev

data scientist, engineer, programmer, architect, love to write stories of connecting science to business. like to encourage newcomers and enthusiastic authors.