Building real-time data products at scale using Tinybird

Jordi Miró Bruix
Published in Hotel Tech Stories
May 4, 2021

In this post, we share how the THN team worked with Tinybird to transform the way we manage vast volumes of data, so that we can scale effectively and provide real-time data to our rapidly growing customer base of thousands of hotels worldwide. To give a complete overview, we explain the challenges we were facing, our initial situation, the current solution and, finally, our plans for the future.

The challenge

The Hotels Network (THN) is an innovative data and technology platform for the hotel industry. Using data collected from the websites of hotel clients worldwide, THN provides hotels with an ecosystem of tools that help them grow their direct booking channel, from BenchDirect, a benchmarking analytics platform, to a Predictive Personalization tool that uses machine learning.

Data is the core of everything we do at THN; it allows us to provide best-in-class solutions for the hotel industry. Today, with more than 10,000 hotels across 100 countries using THN, we are already processing several million data points every single day (and this figure is growing exponentially in line with the rapid growth of our client base).

To continue offering our clients the best possible service, we have to provide up-to-date, accurate data to thousands of hotels, based on millions of website visitors. Providing this statistical data as close to real time as possible is key. Given the rate at which the company is expanding, we needed to review how we do this and focus on delivering real-time data in a way that allows our company to scale.

The initial situation

THN loves MySQL. We have a large cluster of MySQL databases with several terabytes of data (and growing), heavy load, and many replica servers to keep response times fast under heavy analytical demand. Unfortunately, MySQL is not the best solution for everything. We display a lot of data to our clients in dashboards, which requires many statistical calculations across many different variables. As the number of clients and website visitors grew, we had to find a better way to manage this.

We found ourselves relying on a series of PHP cron jobs that aggregated data, ran calculations and generated materialized views every night. THN’s API fed our products with this data, but the process took longer and longer as the business grew. Furthermore, we were not completely satisfied with how these processes were monitored, which caused additional business and technical problems.

Within the data engineering team (kudos to Marc and Nicola), we started to look for alternative solutions: Elasticsearch, Hadoop and ClickHouse, amongst others. After a significant amount of time spent reading, researching, testing and debating, we decided to build an MVP on our own self-hosted ClickHouse, within our existing AWS infrastructure.

Part of the inspiration for the solution we chose came from a blog post by Cloudflare. Based on the insights and lessons from their experience, we started to feed our ClickHouse infrastructure from our API. In doing so, we found that:

  • It was easier and faster to build these dashboards than with the previous solution.
  • The transition and learning curve from MySQL to ClickHouse was smoother at the SQL level.
  • We could get rid of some components (the crons), resulting in a much simpler architecture.

I had known the founders of Tinybird for many years and decided to call them to see how we could collaborate. Their team had much more experience with and knowledge of ClickHouse than we did, and the product they were building offered solutions that we could not build ourselves:

  • Modern solutions for versioning and infrastructure deployments
  • Useful (and very cool!) web interface to develop complex queries
  • ClickHouse as a service (managed ClickHouse)
  • Robust API to integrate our solutions, manage tokens (security access), etc…
  • 24/7 support

The current solution

To feed data into Tinybird, we continuously send CSV files. We generate materialized views in real time, without crons, and use Tinybird’s API to feed our products. This way, we can ingest millions of rows of data without any problems and generate the data points our products need.
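The key difference from the nightly PHP crons is that the aggregation happens incrementally as data arrives. In ClickHouse, a materialized view acts like an insert trigger: it processes each newly inserted block and writes the result into a target table, so past data is never recomputed. The following is a minimal Python sketch of that idea only, not Tinybird's or ClickHouse's API; the `Visit` schema (hotel, day, booked) and the `DailyVisitsView` class are hypothetical, chosen purely for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Visit:
    # Hypothetical event schema, for illustration only.
    hotel_id: str
    day: str  # ISO date, e.g. "2021-05-04"
    booked: bool


class DailyVisitsView:
    """Aggregate state updated on every insert, in the spirit of a
    ClickHouse materialized view (no nightly full recomputation)."""

    def __init__(self) -> None:
        # (hotel_id, day) -> [visits, bookings]
        self._state: dict[tuple[str, str], list[int]] = defaultdict(lambda: [0, 0])

    def on_insert(self, batch: list[Visit]) -> None:
        # Only the newly inserted rows are processed; older rows are never rescanned.
        for v in batch:
            counts = self._state[(v.hotel_id, v.day)]
            counts[0] += 1
            counts[1] += int(v.booked)

    def conversion_rate(self, hotel_id: str, day: str) -> float:
        # .get() avoids creating empty entries for unknown keys.
        visits, bookings = self._state.get((hotel_id, day), (0, 0))
        return bookings / visits if visits else 0.0


view = DailyVisitsView()
view.on_insert([Visit("h1", "2021-05-04", False), Visit("h1", "2021-05-04", True)])
view.on_insert([Visit("h1", "2021-05-04", True), Visit("h2", "2021-05-04", False)])
print(view.conversion_rate("h1", "2021-05-04"))  # 0.6666666666666666
```

Because each batch only touches the keys it contains, the cost of an update scales with the size of the new data rather than with the full history, which is what makes it feasible to serve fresh figures instead of waiting for a nightly job. In ClickHouse itself this logic would be written in SQL, typically as a materialized view feeding an aggregating table engine such as `SummingMergeTree`.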

The future solution

Looking ahead, we aim to continue optimizing the systems and processes we use in order to deliver the best possible product experience to our hotel clients:

  • We are integrating our Kafka data stream into Tinybird to ensure that we have real-time data.
  • We are expanding the usage of Tinybird to new products and other departments. For example, our Data Science team will retrieve datasets from Tinybird directly.
  • We are teaching our engineering team to use Tinybird for other use cases, such as creating alerts when data is not received from our customers.
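To make the last use case more concrete, an alert for missing data can reduce to comparing each hotel's most recent event time against a threshold. The sketch below assumes the last-seen timestamps have already been fetched, for example from a query such as `SELECT hotel_id, max(timestamp) ... GROUP BY hotel_id`; the function name `silent_hotels`, its parameters and the one-hour default threshold are hypothetical choices, not part of THN's or Tinybird's actual setup.

```python
from datetime import datetime, timedelta, timezone


def silent_hotels(
    last_seen: dict[str, datetime],
    expected: set[str],
    now: datetime,
    max_gap: timedelta = timedelta(hours=1),  # hypothetical threshold
) -> list[str]:
    """Return (sorted) hotels that have sent no events at all, or whose
    most recent event is older than max_gap."""
    silent = []
    for hotel_id in sorted(expected):
        seen = last_seen.get(hotel_id)
        if seen is None or now - seen > max_gap:
            silent.append(hotel_id)
    return silent


now = datetime(2021, 5, 4, 12, 0, tzinfo=timezone.utc)
last_seen = {"h1": now - timedelta(minutes=5), "h2": now - timedelta(hours=3)}
print(silent_hotels(last_seen, {"h1", "h2", "h3"}, now))  # ['h2', 'h3']
```

Here `h2` is flagged because its last event is three hours old, and `h3` because it never sent anything; a real alerting job would run such a check on a schedule and notify the relevant team.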

Developing great things, together

How we are working with Tinybird is just one example. At THN, technology is in our DNA and we love to build brand new things to work faster and better. We’ve brought together an awesome team with deep expertise in product design, engineering and data science. If you enjoy working in a fast-paced environment with people who are passionate about innovating by leveraging the most impactful technologies available, why not join our team?

Jordi Miró Bruix

Father (x3), husband, entrepreneur and currently CTO at The Hotels Network. Love tech, business, sports, food, sneakers and music