[WEEK VII] Prediction Of Real Estate Price

Batuhan Ündar · Published in bbm406f18 · Jan 14, 2019

Team Members: Ali Batuhan ÜNDAR, Enes Koçak, Muhammed İkbal Arslan

Oh, we meet again. This is the last one, how do you feel? Sad? It’s okay, every good thing must come to an end at some point.

source: http://www.scoopnest.com

This week I will be talking about the last of our efforts. I don’t wanna spoil too much so, thankfully, it will be brief.

Data & other tales…

As we said last week, we decided to stick to our own collected data. That, of course, brings a whole lot of pain with it.

There is something wrong here, but I can’t really put my finger on it… source: can you guess it?

As you may notice from our sample image above, it is not exactly what one would call “clean”.

Nonetheless, we trained a CNN using this data. The results were, how shall I put it, not really surprising.

Crazy graph rave party…

Here is the first of the two example results gathered from two different training sessions.

Training results are OK-ish but, oh god, the test ones…

The model above was trained on kitchen images. We created labels based on the price: after squashing the prices into the range between 1 and the label count, we rounded them up. There is an obvious class-balance problem with this, since most of the data falls into the first two labels while the rest is scattered towards the last ones.
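Roughly, the labeling boils down to something like this (a minimal sketch assuming a plain min-max rescaling with NumPy; the function name and the exact rescaling are illustrative, not necessarily the code we ran):

```python
import numpy as np

def prices_to_labels(prices, label_count):
    # Squash raw prices into the range [1, label_count] with a simple
    # min-max rescaling, then round up to get integer class labels.
    prices = np.asarray(prices, dtype=float)
    scaled = 1 + (prices - prices.min()) / (prices.max() - prices.min()) * (label_count - 1)
    return np.ceil(scaled).astype(int)

# A few cheap houses and one expensive outlier, with 20 labels:
print(prices_to_labels([150_000, 180_000, 210_000, 1_900_000], label_count=20))
# -> [ 1  2  2 20]  (most prices land in the first labels, the outlier in the last)
```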

The overachiever

The one below is a combination of the method above with grayscale images and a smaller sample size.

Don’t mind the mess, it’s pyplot being silly.

Yes, I did not edit the image; it came out like this. The issue is pretty similar to the previous one, but now it is exaggerated because of the low epoch count and small sample size. There is an obvious lack of learning here. Also notice the high accuracy and high loss values compared to the others. Perhaps we unintentionally removed some noise…
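For reference, the grayscale part is just a preprocessing step; something along these lines (a sketch using Pillow, where the target size is illustrative):

```python
from PIL import Image
import numpy as np

def load_grayscale(path, size=(128, 128)):
    # Open an image, convert it to single-channel grayscale ("L" mode),
    # resize it, and scale pixel values to [0, 1] for the network.
    img = Image.open(path).convert("L").resize(size)
    return np.asarray(img, dtype=np.float32)[..., np.newaxis] / 255.0
```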

The new guy

This one is fresh from the oven (oven, because my GPU is currently hot enough to bake a cake).

Flat-line. It’s dead, captain

This one is different from the others in a single way: we labeled it statically. We chose the highest price statically (2M in this case) and then split the labels between 0 and this number. For example:

For a label count of 20, the increment between two classes is 100K, and the data is labeled using the prices directly.

If the price is above 2M, the house is simply added to the last label, meaning the last label covers houses priced 2M and above.
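In code, this static labeling is just a fixed-width binning (again a sketch with NumPy; the helper name is mine):

```python
import numpy as np

def static_labels(prices, max_price=2_000_000, label_count=20):
    # Fixed-width bins between 0 and max_price: with 20 labels and a 2M cap,
    # each bin covers 100K. Prices above the cap all go into the last label.
    prices = np.asarray(prices, dtype=float)
    bin_width = max_price / label_count            # 100_000 for 20 labels
    labels = np.floor(prices / bin_width).astype(int)
    return np.clip(labels, 0, label_count - 1)     # cap outliers at the last label

print(static_labels([90_000, 450_000, 2_500_000]))  # -> [ 0  4 19]
```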

As you can see, while the losses show some learning, the accuracies tell a different story.

GT920M 2GB vs VGG16

We did a few more tests with similar results, but in the end there is an obvious time limitation. The trainings above take an average of 7–8 hours on the GPU and even longer on the CPU. The first example took 10 hours to train.
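For the curious, a VGG16-based classifier of the kind our poor GT920M had to chew through might look roughly like this in Keras (a sketch only: the ImageNet weights, the input size and the head layers are assumptions, not our exact setup):

```python
from tensorflow.keras.applications import VGG16
from tensorflow.keras import layers, models

LABEL_COUNT = 20  # matches the 20-label example above

# Frozen VGG16 backbone with a small classification head on top.
base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False

model = models.Sequential([
    base,
    layers.Flatten(),
    layers.Dense(256, activation="relu"),
    layers.Dense(LABEL_COUNT, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```

Freezing the backbone at least keeps the trainable parameter count small, which matters on a 2GB card.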

Aftermath… source: 4chan

Then, what?

We still haven’t decided what to do next. We might actually build the neural network we were talking about, even if the results aren’t great. Or perhaps we will stick to our continuous data. You will hear our decision on presentation day.

print(“done”)
