Archive of stories published by The Hotels.com Technology Blog

Replicating big datasets in the cloud

Circus Train is an open source tool developed by Hotels.com for migrating and replicating very large Apache Hive datasets between clusters and clouds. The tool has become a key component of Expedia Group data platforms, enabling off-premises migrations, hybrid…


Modularising Hive Queries

As anyone who works with Hive and HQL (or even SQL) will know, monolithic queries can often become very long and quite tedious to read. I have heard horror stories of encounters with queries containing hundreds, even thousands of lines of code. So, the Big Data Platform team at…

These were the top 10 stories published by The Hotels.com Technology Blog; you can also dive into yearly archives: 2017, 2018, and 2019.