How Agoda Builds an Efficient Data Warehouse with Vertica

Panatchakorn.Tor
Agoda Engineering & Design
Jun 16, 2022

Companies have access to more data than ever in today’s digital economy, and this data serves as a foundation for making critical business decisions. At Agoda, it is important for us to be able to analyze enormous amounts of data in real time and extract significant value from it. This ability is critical for us to succeed and stay competitive.

But how do we store this data?

All of our analytical data is ultimately stored and processed on Hadoop. However, for certain business-critical dashboards and reports, the response times of querying directly against Hadoop leave much to be desired. For these types of dashboards/reports, we need a faster, more responsive database to store a critical subset of data for business analytical purposes. The database that fulfills this role is Vertica, and within Agoda, we call this database Agoda DW (DW being short for Data Warehouse).

What is Vertica?

Vertica Analytics Platform is a SQL analytics solution that combines the power of a high-performance, massively parallel processing SQL query engine with advanced analytics and machine learning. It can manage enormous datasets too large for conventional databases to handle.

Why we chose Vertica for Agoda DW

Vertica is one of the best-suited tools for advanced data analytics workflows. These are the reasons we chose it:

  • Columnar Storage: queries run faster than on standard row-based storage systems because Vertica reads only the columns referenced by the query rather than scanning entire rows. This reduces query times from hours to minutes, or minutes to seconds.
  • Advanced Compression: Vertica uses powerful compression methods to save up to 90% of disk space while improving speed by reducing costly I/O.
  • Optimized Projections: data is stored in a format and structure optimized for the most frequently run analytics.
  • Integration with popular tools: Vertica offers robust, sophisticated SQL support that works with a wide range of business intelligence, data processing, and analytics tools.
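As a rough sketch of how the columnar-storage and compression points surface in practice, the DDL below uses a hypothetical `bookings` table (illustrative names, not Agoda’s actual schema) with an explicit run-length encoding on low-cardinality columns, plus a query that only ever touches three columns:

```sql
-- Hypothetical table; column names are illustrative, not Agoda's schema.
-- ENCODING RLE compresses well for low-cardinality columns.
CREATE TABLE bookings (
    booking_id   INT NOT NULL,
    booking_date DATE NOT NULL,
    city         VARCHAR(50) ENCODING RLE,
    country      VARCHAR(50) ENCODING RLE,
    amount_usd   NUMERIC(12, 2)
);

-- A columnar engine reads only booking_date, country, and amount_usd here;
-- the remaining columns are never touched on disk.
SELECT country, SUM(amount_usd)
FROM bookings
WHERE booking_date >= DATE '2022-01-01'
GROUP BY country;
```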

Vertica Enterprise Mode

Vertica Enterprise Mode is the default database mode in Vertica. In this mode, data is spread across Read Optimized Store (ROS) containers on all nodes in the cluster, and in most cases queries act on data that is kept locally.

It is excellent for continuous workloads on data sets that mix “hot” (frequently queried) and “cold” (rarely accessed) data, and it requires no additional hardware beyond the database servers themselves.

Vertica in Enterprise Mode

Implementing Vertica on Agoda DW

When we first started using Vertica, we designed a single cluster with 16 nodes. This cluster performed all data ingestion and querying operations. However, performance was not optimal. Therefore, we decided to redesign it with two clusters of eight nodes each.

Agoda Vertica Enterprise Architecture

The Ingestion cluster handles data ingestion and data processing operations.

The Query cluster handles queries only, i.e., reports, dashboards, and user queries.

The purpose of the Query cluster is to alleviate the load on the Ingestion cluster and to isolate writes from impacting reads and vice versa. To prepare data on the Query cluster, a CopyCluster job synchronizes data from the Ingestion cluster to the Query cluster.

This process requires downtime on the Query cluster, so during the data loading phase we direct all requests to the Ingestion cluster. Once the CopyCluster job completes, we direct all connections back to the Query cluster.

This helps us maximize the Query cluster’s performance, albeit with the tradeoff of slower performance during the data loading phase. To mitigate this downside, the data loading process (from the Ingestion cluster to the Query cluster) typically happens during off-peak hours.

With this new setup, we migrated legacy OLAP cube queries (inherited from our original first-generation Data Warehouse implementation backed by MSSQL), various report dashboards, and user queries to Vertica.

However, although we had separated reads and writes into two clusters, we still experienced performance issues because we were not fully utilizing Vertica’s strengths. Vertica is a columnar database designed for high-volume query processing, yet we were still using traditional row-based approaches. So we redesigned our data flow with several changes:

  1. We moved data ingestion from streamed row-based insert, update, and delete operations to batched uploads of processed Parquet files. Hadoop jobs prepare the Parquet files, which are then bulk loaded into Vertica via its COPY command.
  2. We used Vertica’s Database Designer tool to create better-optimized projections, which vastly improved query response times and increased the amount of data we can query.
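The two changes above can be sketched roughly as follows. The table, HDFS path, and projection definition are illustrative placeholders (reusing a hypothetical `bookings` table), not Agoda’s actual objects; in practice, the Database Designer generates projection DDL of this kind rather than it being written by hand:

```sql
-- (1) Bulk load Hadoop-prepared Parquet files with Vertica's COPY command.
--     The path and table name are hypothetical.
COPY bookings
FROM 'hdfs:///warehouse/bookings/*.parquet' PARQUET;

-- (2) A projection tuned for a frequent query pattern: sorted for predicate
--     pruning on booking_date, segmented by hash for even node distribution.
CREATE PROJECTION bookings_by_date
AS SELECT booking_id, booking_date, country, amount_usd
   FROM bookings
   ORDER BY booking_date, country
   SEGMENTED BY HASH(booking_id) ALL NODES;
```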

After initially starting with eight nodes per cluster, we increased the node capacity to 16 nodes per cluster and again to 25 nodes per cluster to improve performance and support a higher volume of data.

Pain Points With Vertica Enterprise Mode

Although our setup worked well for a while, several downsides still existed.

  • Long node downtime recovery: whenever Vertica nodes go offline, performance drops by 50%, and once nodes come back up, data recovery takes a long time.
  • CopyCluster operation: we used CopyCluster daily to copy data from the Ingestion cluster to the Query cluster. The Query cluster shuts down during CopyCluster operations, and all connections must be redirected to the Ingestion cluster. This drops the Ingestion cluster’s performance for around 2 hours 30 minutes each day, with any failure extending the duration further.
  • Expensive scalability: when adding new nodes, the existing data must be re-segmented (rebalanced) to distribute it across the extra nodes. Vertica must physically move data from old nodes to new ones, which is time-consuming and impacts the performance of the existing cluster.

Enterprise Mode Vertica Cluster

Then, in 2018, Vertica officially launched Eon Mode, which largely addressed the pain points listed above.

Introducing Vertica’s Eon Mode

In EON Mode, Vertica computational resources are separated from the shared communal storage layer (in our case, an Object Store). This separation meant that we could now decouple the query computation layer from the data storage component and scale both of them independently, depending on our requirements.

So in our case, we can, for example, increase the number of Vertica nodes without re-loading or re-processing any of the data already loaded into our shared communal object store.

Vertica in Eon Mode

Why we decided to switch from Enterprise Mode to Eon Mode

We started using Eon Mode in early 2021. Before moving, we tested its performance and saw how it solved the pain points and limitations of Enterprise Mode.

We noticed the following improvements:

  • Faster node recovery: Eon Mode takes significantly less time to recover from node failures.
  • No more CopyCluster: in our previous setup, Agoda had to maintain an Ingestion cluster and a Query cluster while periodically copying data from one to the other. With Eon Mode, this is no longer required. Although we still maintain two separate clusters to isolate the ingestion and query use cases, both now read from and write to the same shared communal storage, eliminating the need for the CopyCluster operation between them.
  • Ease of scalability: Eon Mode makes it easier to scale out to meet changing workloads. It introduces configurable shards for the communal shared store, which offer finer granularity than whole nodes did in the old Enterprise architecture and let us size them appropriately for our scalability and rebalancing requirements.

Design Agoda Architecture
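A rough sketch of how this looks operationally, using system tables and meta-functions from Vertica’s Eon Mode tooling (the subcluster layout is hypothetical, and exact column names may vary by Vertica version):

```sql
-- List Eon Mode subclusters and the nodes assigned to each.
SELECT subcluster_name, node_name
FROM subclusters;

-- After adding nodes to a subcluster, redistribute shard subscriptions.
-- Unlike Enterprise Mode rebalancing, this moves shard *assignments* only;
-- the data itself stays in communal storage and is never physically copied.
SELECT REBALANCE_SHARDS();
```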

Agoda DW on Eon Mode

With Vertica Eon Mode, we decided to retain two clusters (in Eon Mode terminology, sub-clusters), one for ingestion and one for queries, since this setup continues to let us scale writes and reads independently. Having two sub-clusters also means they act as redundancy for each other if either goes down.

More importantly, because Agoda is a data-driven company, Eon Mode lets us spin up new sub-clusters on demand to handle new use cases or to alleviate query load during peak hours, when our employees are trying to get answers to their business questions, and to scale them back down outside of working hours.

Eon Mode upgrades still require downtime

We’ve found that upgrading Vertica Eon to newer versions still significantly degrades cluster performance, to the point where it is akin to around 2 hours of downtime. This downtime affects all our processes, including reports, cube processing, and dashboards.

We plan to resolve this by running Vertica Eon clusters in two of our on-premises data centers. The second data center will also act as a disaster recovery cluster for our data. This will let us shift queries between data centers whenever one of them is down for maintenance or upgrades, resolving our downtime problem.

Agoda DW on EON with DR

What’s Next?

Vertica has served us well over the years, and we are glad to have benefitted from Vertica features such as Eon Mode. That said, the current implementation of Agoda DW on Vertica Eon Mode will certainly not be the final iteration of how we aim to deliver fast, reliable, and responsive data to serve our business analytics needs. We are constantly looking for new technologies and architectures to improve the overall experience of Agoda’s employees when interacting with data.

Acknowledgements

Big thanks to kajohnsak thaiwsawang and Itipong Chewinpipat for sharing the special DW stories, to Atcha Srisittichaikul and Chathawee Luangmanotham for writing the content, and finally to Shaun Sit and Janet John for helping review it.
