Simple Real Estate Market Analysis with Large Language Models and Retrieval Augmented Generation

M. Baddar
BetaFlow
Jan 16, 2024
Image source: https://surveysparrow.com/blog/funny-customer-service-memes/

Overview

Market analysis is a crucial step for any investor who wants to understand the current status of a market and its projected direction before committing to an investment opportunity. Real estate markets are no exception: individuals and firms targeting a specific market must conduct a thorough analysis before any investment decision.

In real estate markets, as in any other market, supply and demand are the major factors controlling house prices. To get an insight into real estate price movements, we first need a good understanding of the supply and demand dynamics in that market over a given period of time.

Teaser

For readers who are short on time: this article is about asking one question.

How was the supply and demand situation in the real estate market, in Germany in 2023?

This question will be directed to ChatGPT and to our AI-powered API AnswerMe, and here are the two results:

If this image makes you curious, the next few lines are for you.

Asking ChatGPT

Let’s take the German real estate market as a use case. We can ask ChatGPT about the latest supply and demand dynamics in that market in 2023 to get a sense of how prices will move over the next few years:

Given that ChatGPT has been trained on data up to 2022, it cannot generate answers about 2023. In normal market conditions we might live with that, as real estate is one of the most stable and predictable market sectors.

However, given the recession pressures around the world and the rising interest rates in the EU and the USA, we need to bring in recent data to understand the most recent market dynamics.

Asking AnswerMe

To solve this problem, meet AnswerMe: an API for Question Answering over Documents, powered by Large Language Models using Retrieval Augmented Generation.

That sounds like a lot to unpack, right? Let’s dissect it part by part:

  1. Question Answering over Documents: question answering with Machine Learning and NLP has been a major hot topic in both research and industry. The many approaches can be divided into open vs. closed domain and open vs. closed book.
    Open vs. closed domain means whether the model is designed to handle all domains well or is domain-specific (health, legal, economic, financial, scientific, etc.). Open vs. closed book means whether the model has access to a specific document or report when answering, or not.
  2. Large Language Models (LLMs): the technology behind OpenAI’s booming ChatGPT and similar systems. An LLM is basically a generative model that produces an answer based on the input text.
  3. Retrieval Augmented Generation (RAG): an approach for steering LLMs to answer questions over a given set of documents (open book question answering), as sketched right below.
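
To make the idea concrete, here is a minimal, hypothetical sketch of the RAG loop: the document is split into chunks, each chunk is embedded into a vector, the chunks closest to the question are retrieved, and they become the context for the LLM. The chunk placeholders, the embedding model, and all variable names are illustrative assumptions, not the actual AnswerMe internals; the sketch assumes the sentence-transformers package.

import numpy as np
from sentence_transformers import SentenceTransformer

# Hypothetical sub-documents produced by chunking the report (placeholders only).
chunks = [
    "<chunk 1: a few paragraphs from the report>",
    "<chunk 2: a few paragraphs from the report>",
    "<chunk 3: a few paragraphs from the report>",
]
question = "How was the supply and demand situation in the real estate market, in Germany in 2023?"

model = SentenceTransformer("all-MiniLM-L6-v2")
chunk_vecs = model.encode(chunks)        # embed every chunk once (indexing)
q_vec = model.encode([question])[0]      # embed the incoming question

# Retrieval: pick the chunks most similar to the question (cosine similarity).
scores = chunk_vecs @ q_vec / (np.linalg.norm(chunk_vecs, axis=1) * np.linalg.norm(q_vec))
top_chunks = [chunks[i] for i in np.argsort(scores)[::-1][:2]]

# Augmented generation: the retrieved chunks become the LLM's context.
prompt = ("Answer the question using only this context:\n"
          + "\n".join(top_chunks)
          + "\n\nQuestion: " + question)
# `prompt` would now be sent to an LLM of your choice to generate a grounded answer.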

Under the Hood

Based on the quick fundamentals introduced above, our AnswerMe API works as follows:

  1. An API user should first create a RapidAPI account, as illustrated here. If you wonder what RapidAPI is: it is one of the largest API hubs, where developers can access thousands of APIs in domains like AI, FinTech, sports, etc.

2. Then the API client app uploads a document for parsing, chunking, embedding, and indexing, using the following API call:

import requests

# Step 1: request a pre-signed upload URL for the PDF you want to index.
url = "https://answer-me.p.rapidapi.com/psurl"

querystring = {"filename": "<filename>.pdf"}

headers = {
    "X-RapidAPI-Key": "<API Key>",               # your RapidAPI key
    "X-RapidAPI-Host": "answer-me.p.rapidapi.com"
}

response = requests.get(url, headers=headers, params=querystring)

print(response.json())

The upload API call uses AWS pre-signed URLs, which allow large files to be uploaded securely. Currently, AnswerMe supports PDF files only.
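
The /psurl call above only returns the pre-signed URL; the file bytes still have to be sent to it. The exact field name in the JSON response ("url" below) is an assumption, not the documented AnswerMe schema, but a typical upload against a pre-signed URL looks like this:

# Hypothetical sketch: push the PDF bytes to the pre-signed URL returned by /psurl.
# The "url" field name is an assumption about the response schema.
presigned_url = response.json()["url"]

with open("<filename>.pdf", "rb") as f:
    upload_response = requests.put(presigned_url, data=f)

print(upload_response.status_code)   # 200 means the file reached the storage bucket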

For the purpose of answering this question, we use this Deutsche Bank report about the real estate market in Germany.

3. After roughly a minute, the document has been chunked into a set of sub-documents, embedded, and indexed. These sub-documents are smaller, usually fixed-size parts of the document (a couple of paragraphs or so), and they usually overlap so that the generated content reads smoothly.

Embedding simply means converting text into a numerical representation that computers can process. For more information about these operations and others in the context of Retrieval Augmented Generation, see this wonderful tutorials page.
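
As an illustration, here is a small, hypothetical sketch of fixed-size chunking with overlap; the chunk size and overlap values are arbitrary and not the settings AnswerMe actually uses:

# Illustrative chunking with overlap (values are arbitrary, not AnswerMe's settings).
def chunk_text(text, chunk_size=1000, overlap=200):
    """Split text into fixed-size pieces that share `overlap` characters with their neighbor."""
    step = chunk_size - overlap
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]

document_text = "<full text extracted from the PDF>"
sub_documents = chunk_text(document_text)
# Each sub-document would then be embedded (turned into a vector) and indexed.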

4. The next step is to apply the Large Language Model to the processed document, using this API call:

import requests

# Step 2: ask a question against the document that was uploaded and indexed.
url = "https://answer-me.p.rapidapi.com/answerme"

querystring = {"question": "<Question to be asked>", "filename": "<filename>.pdf"}

headers = {
    "X-RapidAPI-Key": "<API Key>",               # your RapidAPI key
    "X-RapidAPI-Host": "answer-me.p.rapidapi.com"
}

response = requests.get(url, headers=headers, params=querystring)

print(response.json())

This step generates an answer, using the Large Language Model and the embedded, indexed form of the uploaded document as a kind of context. It is like telling the model: “Hey, please give me an answer informed by what you have learned from this document.” In this case, the question we send is the one mentioned above.
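
For the question from the teaser, the same /answerme call simply has the question filled in (the file name is whatever you used in the upload step):

import requests

url = "https://answer-me.p.rapidapi.com/answerme"

querystring = {
    "question": "How was the supply and demand situation in the real estate market, in Germany in 2023?",
    "filename": "<filename>.pdf",                # the file name from the upload step
}

headers = {
    "X-RapidAPI-Key": "<API Key>",
    "X-RapidAPI-Host": "answer-me.p.rapidapi.com"
}

response = requests.get(url, headers=headers, params=querystring)
print(response.json())                           # the generated, document-grounded answer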

Summary

In this article we have shown how to use AnswerMe to answer questions over domain-specific documents with Large Language Models and Retrieval Augmented Generation.

For more information about how to use our AnswerMe API, check out our comprehensive tutorial.

If you need support regarding the API, or a customized LLM and generative modeling solution, send us an email.

Also, follow us on Twitter for more LLM and GenAI content.


M. Baddar
BetaFlow

AI/ML Engineer with a focus on Generative Modeling. The mission: enabling individuals and SMEs to apply this technology to solve real-life problems.