Building A Vacation Recommender (Part 1)

Mario Brendel
4 min read · Feb 8, 2019


Since I love vacations, I thought it might be a good idea to build an application around this topic. I would like to share this journey with you, show you every important part, and point out where I got stuck :). Even if no one reads these blog entries, I can hopefully look back on the journey with a smile on my face. So without any further ado: Building A Vacation Recommender.

What is the goal?

Hmmm, so the topic itself is pretty vague, but my first goal would be a vacation in the Maldives with my beautiful wife and son. But what does this have to do with the vacation recommender, you may ask?

Well, what I would like to achieve is a tool that can show me:

  1. Current offers
  2. How good an offer is compared to data collected in the past
  3. What offers may occur in the future
  4. When the cheapest flights will be
  5. Graphical analyses
  6. Recommendations based on the points above

Most of these points can get arbitrarily complex, so let's see where the journey takes us :).

Where to start?

I would say we just start chronologically with the task “Current offers” and see where this takes us. I’ve wrestled a little bit with myself (5 minutes) over which language to use for the actual web scraping, and I’ve decided on Java. The reason is that in my main job I’m using VueJS (frontend) and SAP/ABAP (backend), and SAP is not necessarily sexy… So I thought I’d pick another language that I have mastered and see where this leads us :).

Let's go!

So enough talking. Let's code.

For the actual web scraping I would like to use Selenium in combination with Java. Furthermore, I will also use Maven, and the pom.xml (located at the project root) will be the first file we look at:

...
<dependencies>
    <dependency>
        <groupId>org.seleniumhq.selenium</groupId>
        <artifactId>selenium-java</artifactId>
        <version>3.141.59</version>
    </dependency>
</dependencies>

<properties>
    <maven.compiler.source>1.8</maven.compiler.source>
    <maven.compiler.target>1.8</maven.compiler.target>
</properties>
...

As you can see, nothing too fancy. We add the Selenium dependency and declare Java 8 as our compiler source and target. You can of course use a higher Java version.
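If you're setting this up from scratch, a complete minimal pom.xml could look something like the sketch below. Note that the groupId, artifactId and version here are placeholders I chose for illustration, not the actual values from my project:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <!-- Placeholder coordinates, pick your own -->
    <groupId>de.example</groupId>
    <artifactId>vacation-recommender</artifactId>
    <version>1.0-SNAPSHOT</version>

    <dependencies>
        <dependency>
            <groupId>org.seleniumhq.selenium</groupId>
            <artifactId>selenium-java</artifactId>
            <version>3.141.59</version>
        </dependency>
    </dependencies>

    <properties>
        <maven.compiler.source>1.8</maven.compiler.source>
        <maven.compiler.target>1.8</maven.compiler.target>
    </properties>
</project>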

Project Structure

Our project will be fairly simple for now. I'm pretty sure I'm going to over-engineer this in the future (adding JavaEE on top, deployment pipelines to AWS, Docker, etc.), but that isn't a topic for now. At the moment it is much more important to solve our current problems.
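By the way, the layout follows the standard Maven conventions. The chromedriver binary has to be on the classpath (e.g. in src/main/resources), because Main.java loads it as a resource. Roughly like this (the project folder name is just a placeholder):

vacation-recommender/
├── pom.xml
└── src/main
    ├── java
    │   ├── Main.java
    │   └── crawlers/SecretEscapesCrawler.java
    └── resources
        └── chromedriver_windows.exe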

Our Main.java is currently looking like this:

import crawlers.SecretEscapesCrawler;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;

import java.net.URL;
import java.util.Objects;

public class Main {

    // Path to the chromedriver binary bundled with the project resources
    private static final URL CHROME_DRIVER = Main.class.getClassLoader().getResource("chromedriver_windows.exe");

    public static void main(String[] args) {
        // Tell Selenium where to find the chromedriver before creating the driver
        System.setProperty("webdriver.chrome.driver", Objects.requireNonNull(CHROME_DRIVER).getPath());
        WebDriver driver = new ChromeDriver();
        SecretEscapesCrawler.getInstance().searchMaledives(driver);
    }
}

So let's see what I'm doing here step by step. First, I store the resource path of the chromedriver so that I can set the driver property within the main method. After that I create a new instance of the driver and pass it on to my first crawler, the SecretEscapesCrawler — huge fan of their site btw :). But before we actually crawl SecretEscapes, we first make sure that their robots.txt allows us to do this.
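If you want to do that check programmatically instead of just opening the file in the browser, a tiny sketch like the following prints the robots.txt with plain Java (the class is just a throwaway helper for illustration, it's not part of the project code):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;

public class RobotsTxtPrinter {

    public static void main(String[] args) throws Exception {
        // Fetch robots.txt and print it so we can inspect the Disallow rules
        URL robots = new URL("https://www.secretescapes.de/robots.txt");
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(robots.openStream()))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}

Seems good to me. So let's take a look at SecretEscapesCrawler.java: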

package crawlers;

import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.support.ui.ExpectedConditions;
import org.openqa.selenium.support.ui.WebDriverWait;

import java.util.List;

public class SecretEscapesCrawler {

    private static final String SEARCH_ADDRESS = "https://www.secretescapes.de/search/search?query=malediven";

    private static final SecretEscapesCrawler sc = new SecretEscapesCrawler();

    public static SecretEscapesCrawler getInstance() {
        return sc;
    }

    public void searchMaledives(WebDriver driver) {
        // Open the search page for "malediven"
        driver.get(SEARCH_ADDRESS);
        By discountPriceSelector = By.cssSelector(".discount-highlight__price");

        // Wait up to 10 seconds for the first discount price to show up
        WebDriverWait wait = new WebDriverWait(driver, 10);
        wait.until(ExpectedConditions.presenceOfElementLocated(discountPriceSelector));
        List<WebElement> elements = driver.findElements(discountPriceSelector);

        elements.forEach(e -> System.out.println(e.getText()));

        driver.close();
    }
}

The most important lines are probably:

driver.get(SEARCH_ADDRESS);
By discountPriceSelector = By.cssSelector(".discount-highlight__price");
WebDriverWait wait = new WebDriverWait(driver, 10);
wait.until(ExpectedConditions.presenceOfElementLocated(discountPriceSelector));
List<WebElement> elements = driver.findElements(discountPriceSelector);

This is all the magic you need to get going. The driver.get will start a Chrome window and open the defined address. Afterwards we wait for the “.discount-highlight__price” element to appear. If it doesn't appear within 10 seconds, the WebDriverWait will throw an exception and crash our application. If the element appears, we just print the text within each element for now (e.g. 50%).
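If you'd rather log the failure than crash, you could wrap the wait in a try/catch. Here is a minimal sketch of how the method body could look, assuming the same variables as above (TimeoutException lives in the org.openqa.selenium package):

try {
    wait.until(ExpectedConditions.presenceOfElementLocated(discountPriceSelector));
    // Print every discount price we found
    driver.findElements(discountPriceSelector).forEach(e -> System.out.println(e.getText()));
} catch (TimeoutException e) {
    // The element never appeared within 10 seconds; log it instead of crashing
    System.err.println("No discount prices found: " + e.getMessage());
} finally {
    // Always close the browser window, even on failure
    driver.close();
}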

That's it for now. Next time we will configure a login so that we can see the actual prices, and we will also collect the data in a file or database :).

But why Selenium?

Of course I could track the network requests and work with each site's API, but that is way more time-consuming. And since the time needed for opening the browser and the websites isn't a problem, I don't plan to get rid of Selenium for this particular use case.

If you have any questions or wishes, please feel free to leave a comment :)
