Renting a place to live in Porto is hard these days.
This all started a few months ago, when my girlfriend and I first began searching for a nice apartment near Porto to start our lives together. We had some requirements for the house: close to the subway, fairly close to the city center, a garage, … And of course, a good price.
Every day we went through the routine of checking multiple classified ads websites, looking for apartments that matched our criteria. When we found a posting we liked, we rushed to call and schedule a visit. And, unfortunately, the outcome would be that the house was already rented or reserved. Generally, good houses would be rented on the same day the advertisement was posted, or on the first day that allowed visits. We started checking the same websites every day, first thing in the morning. But still, the outcome was the same every time. Finding a nice place to live started to look like an impossible task. We needed to change tactics.
There must be a better way
I am a software engineer; I am used to making the impossible… possible. We just need to identify the problem, think of a possible solution and execute. If we fail, we try again. That is pretty much what I do every single day, so why not apply it to my house search?
Our problem was pretty clear: other people were faster at checking advertisements and contacting the ad poster than I was. So I decided to become the fastest shooter in the West.
What if I had some software to check the websites I usually go through every day, find new postings as soon as they appear and notify me? Then, if anything looked interesting, I could contact the advertisement poster as soon as possible.
I needed to put the plan in motion, but I did not want to waste too much time on something that might not work as I intended. I had the perfect tools for the job: Elixir, the Phoenix framework and Facebook Messenger chat bots. My plan was to create a simple web scraper to go through the classified ads websites, get the metadata for the advertisements matching my search criteria, store them, detect the new ones and notify me through a chat bot on Facebook Messenger. I knew that with the tools available to me, I could put something together in just a couple of hours.
Putting the plan in motion
First step: creating a new project. I decided to create the Phoenix project as an umbrella project. An umbrella in Elixir is just a way of organizing a project into different standalone applications that depend on each other. This way, it is pretty straightforward to use parts of the project in other applications. It is also a very neat way of separating the components of your application into organized, reusable and easy to understand modules.
mix phx.new --umbrella rent_bot
Phoenix is pretty cool and gives us a ready-to-use project with everything we need: the basic functionality of a web application (receiving and responding to requests), database configuration, unit tests and some documentation.
Crunching HTML all day long
The real work started now. I decided to begin with the web scraping of one of the classified ads websites and see how much effort was needed to get the metadata I wanted.
First of all, I needed to know how I could reliably get the page of advertisements with all of the filters for my search, and also how I could programmatically navigate through the multiple pages of results on these sites. It turns out that, generally, when you make a search, all the filters and the pagination page are included in the URL as parameters. That way, I just needed the URL with the parameters for the filters I wanted, plus a page parameter I could increment until I got no more results.
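As a hypothetical example (real sites use their own parameter names, so everything below is a placeholder), such a URL can be built by simple string interpolation:

```elixir
# Hypothetical search URL; the parameter names vary from site to site.
base_url = "https://example-classifieds.pt/rent/porto"
filters = "max_price=900&bedrooms=2&garage=1"
page = 3

url = "#{base_url}?#{filters}&page=#{page}"
# "https://example-classifieds.pt/rent/porto?max_price=900&bedrooms=2&garage=1&page=3"
```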
Next, I needed to process the HTML of that page. I did a quick search for HTML parsing libraries for Elixir and I found Floki. Floki parses the given HTML and allows me to search for the desired DOM elements using regular CSS selectors. Just what I needed!
I had an HTML parsing library, but Floki is only just that: you still need to get the HTML and pass it to the library. For that I used the popular HTTPoison library, which allows you to make HTTP requests in Elixir.
With every tool I needed in hand, I just had to write some functions to make the HTTP request to the website I wanted, pass the HTML from the response to Floki, find the element of every advertisement on the page and, from each of those elements, extract the metadata. It turns out all of that is pretty simple with the power of Elixir and the available libraries.
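A sketch of what that top-level function might look like (the post calls it import; I name it import_page/1 here because import clashes with an Elixir special form, and the base URL is a placeholder):

```elixir
@base_url "https://example-classifieds.pt/rent/porto"

# Fetch one results page and turn it into a list of ad metadata maps.
# get_page_html/1, get_dom_elements/1 and extract_metadata/1 are the
# helper functions discussed next.
def import_page(page) do
  "#{@base_url}?page=#{page}"
  |> get_page_html()
  |> get_dom_elements()
  |> extract_metadata()
end
```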
It almost looks like pseudo-code, but Elixir gives you an incentive to write small, descriptive functions that allow everyone to quickly understand the basic logic of the application. In this case we are creating a function called import that takes a page number as its argument. In the first line of the function we build a string with the base URL and the page number. Then, using the awesome pipe operator, we first give the URL we created to a function called get_page_html. We don't know how that function is implemented, but we can make a pretty good guess that it makes the HTTP request to get the HTML of the page at that URL.
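A minimal sketch of that function, assuming HTTPoison as the HTTP client:

```elixir
# Fetch the page and return just the response body (the HTML).
def get_page_html(url) do
  %HTTPoison.Response{body: body} = HTTPoison.get!(url)
  body
end
```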
It turns out that it does just that, in two lines of code! The function returns the body of the response, which in this case is the HTML of the requested page.
The next function in the pipe is get_dom_elements. Again, just by looking at the function name, we can guess that this is where the HTML is parsed and searched for the target elements matching our CSS selectors. Remember that the input of this function is the output of the previous function in the pipe, the get_page_html function that returned the HTML of the page.
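A sketch using Floki; the CSS selector here is a placeholder for the site's real markup:

```elixir
# Parse the HTML document and select the element of each advertisement.
def get_dom_elements(html) do
  html |> Floki.parse_document!() |> Floki.find("div.advertisement")
end
```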
How cool is this? You can parse and get the DOM elements you need in one line of code!
The last function in the pipe is the extract_metadata function. And you guessed it: it processes the DOM elements found and somehow extracts the metadata.
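Something along these lines; the selectors and the chosen fields are illustrative, not the site's actual markup:

```elixir
# Pull the fields we care about out of each advertisement element.
def extract_metadata(elements) do
  Enum.map(elements, fn element ->
    %{
      title: element |> Floki.find(".title") |> Floki.text() |> String.trim(),
      price: element |> Floki.find(".price") |> Floki.text() |> String.trim(),
      url: element |> Floki.find("a") |> Floki.attribute("href") |> List.first()
    }
  end)
end
```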
This function looks more complex, but after you analyze it, it turns out to be much simpler than it looks. We receive a list of DOM elements from the previous function (the get_dom_elements function) and map over it, extracting the metadata from each element.
And that is it, we have a list of houses to rent!
The beauty of all the code so far is that, with some minor changes to the elements searched for in the DOM, everything else stays the same for the other websites I want to search.
With that done, the next step was to find a way to schedule a task to run regularly and check for new entries on all of the websites. With another quick search for an Elixir task scheduling library, I found Quantum. Quantum allows me to schedule recurring tasks using Cron-like notation.
In my configuration file, I created a new task in the scheduler for each of the provider websites I wanted to search for advertisements. Each task is configured to run, every five minutes, the import_ads function of the given module (RentBotWeb.Tasks.XYZ) with the given arguments (the argument in the list is the page number at which to start the search).
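In config/config.exs, such a job definition might look like this sketch (the scheduler and module names are assumptions based on the project name):

```elixir
# Run RentBotWeb.Tasks.XYZ.import_ads(1) every five minutes.
config :rent_bot, RentBot.Scheduler,
  jobs: [
    {"*/5 * * * *", {RentBotWeb.Tasks.XYZ, :import_ads, [1]}}
  ]
```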
As you can see, even this function is pretty simple once you analyze it. We start by logging to the console to signal the start of the task. We then call the import function we built before, specifying the page number we want to search. Then we have a condition: if the returned list is empty, we stop the process. Otherwise, we take the returned list and give it to a process_entries function. Again, naming is important to make your code readable. Just by looking at the code, you can guess that the process_entries function will do some processing on our list and return a new list with just the entries that are new.
And of course it does just that! It maps over the entries list and passes each entry to an insert_entry function. That function takes the entry and first queries the database to see if it is already there. If it is, it returns nil; otherwise it inserts the entry into the database and returns it. The final step of the process_entries function is a filter to drop the nil values from the list.
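Putting the pieces just described together, the task might look like this sketch (Repo, the Entry schema and the exact fields are assumptions; import_page/1 is the scraping function from earlier, which the post calls import):

```elixir
require Logger

def import_ads(page) do
  # Signal the start of the task in the console.
  Logger.info("Importing ads from page #{page}")

  case import_page(page) do
    # No more results: stop the process.
    [] ->
      :ok

    entries ->
      new_entries = process_entries(entries)
      if new_entries != [], do: notify_subscribers(new_entries)
      # Continue with the next page of results.
      import_ads(page + 1)
  end
end

# Insert every entry and keep only the ones that were actually new.
defp process_entries(entries) do
  entries
  |> Enum.map(&insert_entry/1)
  |> Enum.reject(&is_nil/1)
end

# Return nil for entries already in the database;
# insert and return the new ones.
defp insert_entry(entry) do
  case Repo.get_by(Entry, url: entry.url) do
    nil -> Repo.insert!(struct(Entry, entry))
    _existing -> nil
  end
end
```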
Going back to the import_ads function: if the list of new entries has more than zero elements, we call a notify_subscribers function and continue the importing process on the next page of results. Wait, who are these subscribers anyway?
Finally, a Facebook Messenger chat bot
Creating a Facebook Messenger bot is pretty straightforward. On your application's side, you just need two endpoints: one to receive a GET request that validates the application, and another to receive POST requests with the messages for your bot.
Creating and validating your application in the Facebook Developers platform is also pretty easy and painless.
The validation endpoint is pretty easy to implement. You choose a random string to be your verification token and put it in the Facebook Developer app settings. When asked to validate your application, Facebook will send a GET request with your verification token and some other parameters. One of those parameters is the hub.challenge parameter; your endpoint should return the value of that parameter as the response.
As with everything else in Elixir, this is pretty easy to do.
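A sketch of that controller action (the action name and the config key are assumptions; the hub.* parameter names are the ones Facebook sends):

```elixir
# GET /webhook — Facebook's verification request.
# Pattern match on the parameters to grab just what we need.
def verify(conn, %{"hub.verify_token" => token, "hub.challenge" => challenge}) do
  verify_token = Application.get_env(:rent_bot, :fb_verify_token)

  if token == verify_token do
    # Echo the challenge back to complete the validation.
    send_resp(conn, 200, challenge)
  else
    send_resp(conn, 401, "Unauthorized")
  end
end
```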
As you can see in this code snippet, we use pattern matching right in the function parameters to get just what we need from the request. On the first line of the function, we get the verify token from the project configuration. We compare it with the verify token sent in the request and, if it matches, we respond with the challenge parameter value and a 200 status code. Otherwise, the request is unauthorized. Pretty neat!
All that is left is to handle the POST requests with the incoming messages for the bot. In the context of our application, the bot is just listening for some specific text to register the user as a subscriber. This way, we can store a unique ID created for the user interacting with our bot and use it to send messages back at any time.
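A sketch of that handler; the payload shape follows the Messenger webhook format, while the secret phrase and the save_subscriber/send_generic_reply helpers are assumptions:

```elixir
# POST /webhook — incoming Messenger events.
def receive_message(conn, %{"entry" => entries}) do
  Enum.each(entries, &handle_entry/1)
  send_resp(conn, 200, "ok")
end

defp handle_entry(%{"messaging" => [event | _]}) do
  # The PSID uniquely identifies this user for our page.
  sender_psid = get_in(event, ["sender", "id"])

  case get_in(event, ["message", "text"]) do
    # Only the secret phrase subscribes the user.
    "our super secret subscribe phrase" -> save_subscriber(sender_psid)
    _ -> send_generic_reply(sender_psid)
  end
end
```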
We just iterate over each entry in the entries list that is part of the request parameters. For each message we check whether the text is our super secret string that subscribes a user to our platform and, if it matches, we save the sender_psid that identifies the user. Other messages are ignored and receive a generic message as a response.
So, with the IDs of the users interested in the notifications, we go back to the notify_subscribers function that we saw in the import function.
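A sketch of it; list_subscribers/0 and send_card_message/2 are assumed helpers, the latter posting a card (generic template) to the Messenger Send API:

```elixir
# Send a card with each new ad to every registered subscriber.
def notify_subscribers(entries) do
  for subscriber <- list_subscribers(), entry <- entries do
    send_card_message(subscriber.psid, entry)
  end
end
```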
Again, it is just a simple iteration through the subscribers, sending each one a card message with the given ad's details.
And boom, you have a complete system working!
The next thing I did was to create a release using Distillery, build a Docker image with my application and set it to run on an EC2 machine on AWS. All of this in the two hours after dinner on the day I decided to try something new.
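Those deployment steps, sketched as shell commands (the image name is a placeholder, and a Dockerfile for the release is assumed):

```shell
# Build a production release with Distillery, bake it into a
# Docker image and run it (the same image runs on the EC2 machine).
MIX_ENV=prod mix release
docker build -t rent_bot:latest .
docker run -d --name rent_bot rent_bot:latest
```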
A few days passed and, after several messages from our bot, we found an advertisement for an apartment that looked pretty good and had all of the features we wanted. The ad had been posted just a few minutes earlier, and we decided to call. Thanks to our amazing tool we were the first ones to call, schedule a visit and rent the house.
And that is how you find a new place to live the smart way!
All the code I used is available on GitHub. Feel free to send me questions, issues or pull requests.