Solving the Problem of One Billion Computations
Posted on February 24, 2016 by Toni Marques
Of Skyscanner’s 770 staff, almost half are engineers, and a good number work on our hotels product. You might wonder why we have so many people working on simply showing hotels and their prices. However, there’s much more science than meets the eye in the way things are displayed on our hotel search pages.
Let’s have a look at a hotel found on www.skyscanner.net:
For this hotel, we’ve displayed the three cheapest offers. You can easily click ‘show more prices’ to display more, but the majority of clicks land on one of the top three offers. As such, the order is important… and so is the engineering behind it. Our challenge lay in the criteria for sorting these offers, and in how to store over a billion weights. Here’s how we did it.
For this sorting we use the following criteria:
1. Cheaper offers appear first.
2. If there is price parity, the order is decided by a weight computed by an algorithm.
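The two-step ordering above can be sketched in a few lines. This is a minimal illustration, assuming hypothetical offer records with a `price` and a precomputed `weight` field; the article does not describe the actual data structures.

```python
def sort_offers(offers):
    # Cheaper offers first; on price parity, the higher weight wins.
    return sorted(offers, key=lambda o: (o["price"], -o["weight"]))

offers = [
    {"partner": "A", "price": 120, "weight": 0.4},
    {"partner": "B", "price": 100, "weight": 0.9},
    {"partner": "C", "price": 100, "weight": 0.2},
]

# B and C tie on price, so B's higher weight places it first.
top_three = sort_offers(offers)[:3]
```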
This means that we have built an algorithm just for ordering partners when price parity occurs. The most challenging part comes from the product requirements: partners in the Skyscanner Hotels product have a weight with the following granularity:
That means that every partner has a different weight for every market, device and hotel. To give an idea of the size of the challenge this presents:
We have to build a platform able to store up to 1.6 billion weights!
This huge amount of data poses two big problems for the backend:
1. How do we store all this data?
2. How do we consume it in real time?
Storing the data
As I said, we need to store 1.6bn items. These values should be recomputed every day to ensure they’re correct and up to date. We needed a process that executes our algorithm once a day and batch-writes the results to a big database.
We needed to store a huge amount of data, and to do it as quickly as possible. As this data has to be consumed in real time, the data model has to be easy for the backend to consume. Each item is around 150 bytes in size, so we needed a database with good scalability and throughput.
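A back-of-the-envelope calculation using the figures above shows why storage and write throughput were the concerns:

```python
# Sizing estimate from the article's figures: 1.6bn items of ~150 bytes,
# rewritten in full once a day.
items = 1_600_000_000
bytes_per_item = 150

total_gb = items * bytes_per_item / 1e9
# ≈ 240 GB of raw data to rewrite daily

seconds_per_day = 86_400
avg_writes_per_second = items / seconds_per_day
# ≈ 18,500 writes per second if the daily batch were spread evenly
```

Roughly 240 GB rewritten every day, at an average of over 18,000 writes per second, is well beyond what a single on-memory store could absorb.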
Consuming the data
Previously, we had only per-partner and per-market granularity, so we were able to keep those values in memory and consume them immediately in real time. This is no longer possible because of the volume of data.
It is also worth highlighting that a single user search on the website displays at least 15 hotels. All the price parities occurring across those 15 hotels have to be resolved by asking for the weights, so any increase in our traffic means massive consumption of this data. For a common 15-hotel search we have to resolve around 50 price parities on average.
As such, the system has to support a huge number of queries.
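To make the read load concrete, here is a quick estimate using the 50-parities-per-search figure from above and a purely hypothetical traffic level:

```python
# Load estimate: parities-per-search comes from the article; the
# searches-per-second figure is an assumption for illustration only.
parities_per_search = 50
searches_per_second = 200  # hypothetical traffic level

weight_lookups_per_second = parities_per_search * searches_per_second
# at this traffic, 10,000 weight lookups per second just to resolve ties
```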
The solution: the Cloud
The main problem here is scalability, and we felt it was a good idea to tackle it with a Cloud solution, more precisely Infrastructure as a Service (IaaS). Luckily, our company is moving to Amazon Web Services (AWS), so it was the perfect move for us.
With this solution we are able to address the technical issues:
- Having all the data in a distributed database with enough throughput.
- Being able to easily replicate the data across different regions through the infrastructure (not the code).
- Being able to consume this data in a scalable way via auto-scaling groups, so that we use exactly as many servers as our traffic needs.
Our AWS architecture
Now our backend consumes a service in the Cloud to get the weights for the needed partners, markets, devices and hotels. The architecture of the whole system has been deployed in AWS, as seen below.
So… Why DynamoDB? There are two main reasons:
- The data we want to save is naturally key-value: every single item is keyed by its partner, device, market and hotel ID.
- The key-value storage solution provided by AWS is highly scalable and resilient, and scalability is absolutely necessary for us.
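Since each item is identified by its partner, device, market and hotel ID, one way to key such a table is a composite key. This is an illustrative sketch only; the article does not describe the actual table schema, and the delimiter and field names here are assumptions.

```python
# Hypothetical composite-key scheme for a weight item; in a real
# DynamoDB table this could instead be a hash key plus a range key.
def weight_key(partner_id, market, device, hotel_id):
    # '#' as a delimiter is an arbitrary choice for this sketch.
    return f"{partner_id}#{market}#{device}#{hotel_id}"

item = {
    "id": weight_key("partnerA", "UK", "mobile", 12345),
    "weight": 0.73,
}
```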
How is the backend fed with this data? Well, we have a light API that accesses DynamoDB to retrieve the weights. This API runs on an auto-scaling group of EC2 instances, which means we can scale the servers up if our traffic increases a lot. A CPU-based auto-scaling policy keeps us elastic to traffic changes.
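The shape of such a weight-retrieval lookup can be sketched as below. Here a plain dict stands in for the DynamoDB table (in production this would be a database read behind the API); the names and the fallback behaviour are assumptions for illustration, not the real service.

```python
# In-memory stand-in for the weight table, keyed by
# (partner, market, device, hotel_id).
WEIGHTS = {
    ("partnerA", "UK", "mobile", 12345): 0.73,
    ("partnerB", "UK", "mobile", 12345): 0.41,
}

DEFAULT_WEIGHT = 0.0  # assumption: fall back when no weight is stored

def get_weight(partner_id, market, device, hotel_id):
    # A real implementation would query the database here.
    return WEIGHTS.get((partner_id, market, device, hotel_id), DEFAULT_WEIGHT)
```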
Which regions do we work in? Our backend gets information from three different AWS regions: eu-west-1, ap-southeast-1 and ap-northeast-1. DynamoDB tables, fronted by the API, exist in all three regions. The algorithm runs only in the eu-west-1 region, and the data is replicated to the others.
This is just one example of how the Cloud has helped us address our product requirements. In the era of Big Data, we believe it’s vital that traditional architectures are rethought in the Cloud.
Learn with us
Take a look at our current job roles available across our 10 global offices.