EXPEDIA GROUP TECHNOLOGY — DATA

Drone Fly — Decoupling Event Listeners from the Hive Metastore

Deploy Hive metastore listeners outside Hive’s context to improve metastore reliability

Abhimanyu Gupta
Expedia Group Technology

--

Logo of Drone Fly where a bee with an elephant face wrapped in letter D is sending packets to a drone fly on top of letter Y.
Drone-Fly Logo

We, the Expedia Groupᵀᴹ Data Platform Team, are building the next-gen petabyte-scale data lake. This next stage in the evolution of our data lake is based on our Apiary data lake pattern and utilizes a number of our open-source components like Waggle-Dance, Circus-Train, etc. A Hive metastore (HMS) proxied by the Waggle Dance service is usually the first point of contact for a user query to discover and analyze data. That makes the Hive metastore a critical piece of infrastructure.

We install a number of listeners in the Hive metastore to enable a variety of event-based use cases such as Shunting Yard, Cloverleaf, Beekeeper, and Ranger policies. Some of the open-source listeners that we use are:

The Problem

Over the last couple of years, we have received an increasing number of requests to enable more and more event-driven use cases. These require us to install and maintain a growing list of HMS listeners in the Hive metastore. Some of these listeners are provided to us by third parties to enable HMS integration with their tools. They perform operations like sending events to external message buses, connecting to databases, and calling third-party APIs. More and more processing is added to these listeners to address various business use cases.

Given the critical nature of the Hive metastore in our data lakes, this tight coupling is problematic:

  • Listeners are notified sequentially, so each listener that is installed increases the time the metastore spends on notification. This can put unnecessary load on the metastore or, in the worst case, take down the entire metastore (e.g. by running out of memory or through thread starvation).
  • If there is a bug in one of the installed listeners, it will impact the metastore, and other listeners might not be notified because of the sequential nature of the notification logic.
  • Installing a new listener means changing the metastore Docker image, and deploying the new container requires downtime.
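The second point can be illustrated with a small, simplified model. The `Listener` interface and the listener names below are hypothetical stand-ins, not Hive's real listener API, and the actual failure behaviour in Hive depends on the listener type and version. The sketch only shows why in-process, sequential notification lets one faulty listener starve the ones after it.

```java
import java.util.ArrayList;
import java.util.List;

public class SequentialNotifyDemo {

    /** Hypothetical stand-in for a metastore listener; this is not Hive's real interface. */
    interface Listener {
        void onEvent(String event);
    }

    /**
     * Notifies listeners one after another on the calling thread, as a
     * simplified model of in-process notification. Returns the names of the
     * listeners that handled the event before any failure occurred.
     */
    static List<String> notifySequentially(String event, String[] names, boolean[] fails) {
        List<String> handled = new ArrayList<>();
        List<Listener> listeners = new ArrayList<>();
        for (int i = 0; i < names.length; i++) {
            final String name = names[i];
            final boolean shouldFail = fails[i];
            listeners.add(e -> {
                if (shouldFail) {
                    throw new IllegalStateException(name + " failed on " + e);
                }
                handled.add(name);
            });
        }
        try {
            for (Listener listener : listeners) {
                listener.onEvent(event);
            }
        } catch (RuntimeException ex) {
            // The failure aborts the loop, so later listeners never see the event.
            System.out.println("Notification aborted: " + ex.getMessage());
        }
        return handled;
    }

    public static void main(String[] args) {
        List<String> handled = notifySequentially(
                "CREATE_TABLE",
                new String[] {"atlas", "buggy", "sns"},
                new boolean[] {false, true, false});
        System.out.println("Handled by: " + handled);
    }
}
```

In this model the third listener never receives the event, even though there is nothing wrong with it.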

Introducing Drone Fly

To address the above issues and to make the onboarding process for a new listener robust, we developed a service called Drone Fly.

Drone Fly is a distributed Hive metastore events forwarder service that allows users to deploy metastore listeners outside the Hive metastore service. With Drone Fly, you need to install just one listener in your metastore: the open-source Apiary Kafka Listener. You can then install one or more of your listeners in Drone Fly’s virtual Hive context by providing them on Drone Fly’s classpath.

Trivia: We named it Drone Fly because it mimics the Hive metastore context just like in nature, a Drone Fly mimics a Honey Bee :)

Let us see how your event-driven data lake architecture changes with Drone Fly.

Architecture without Drone Fly

The diagram below shows a typical Hive metastore setup without using Drone Fly. In this example, there are several HiveMetastoreListeners installed which send Hive events to other systems like Apache Atlas, AWS SNS, Apache Kafka and other custom implementations.

A block diagram showing listeners deployed in Hive Metastore and sending events to Apache Atlas, AWS SNS, Kafka etc.
Architecture without Drone Fly

Architecture with Drone Fly

With Drone Fly, the setup is modified as shown in the diagram below. The only listener installed in the Hive metastore context is the Apiary Kafka Listener. It forwards Hive metastore events to Kafka, from which Drone Fly retrieves them. The other listeners are moved out into separate contexts and receive their events from Drone Fly, which forwards them as if they were Hive metastore events, so the listener code doesn’t need to change at all.
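The flow described above can be sketched roughly as follows. This is an illustration of the pattern only, not Drone Fly's actual implementation: an in-memory `BlockingQueue` stands in for the Kafka topic, events are plain strings, and `Listener`, `QueuePublishingListener` and `forward` are hypothetical names invented for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class ForwarderSketch {

    /** Hypothetical stand-in for a metastore listener; this is not Hive's real interface. */
    interface Listener {
        void onEvent(String event);
    }

    /** Metastore side: the single installed listener only publishes the event. */
    static class QueuePublishingListener implements Listener {
        private final BlockingQueue<String> topic;

        QueuePublishingListener(BlockingQueue<String> topic) {
            this.topic = topic;
        }

        @Override
        public void onEvent(String event) {
            topic.offer(event); // cheap and non-blocking for the metastore
        }
    }

    /** Forwarder side: drains the queue and replays each event to the downstream listeners. */
    static int forward(BlockingQueue<String> topic, List<Listener> downstream) {
        int forwarded = 0;
        String event;
        while ((event = topic.poll()) != null) {
            for (Listener listener : downstream) {
                listener.onEvent(event); // listeners see the event as if the metastore had sent it
            }
            forwarded++;
        }
        return forwarded;
    }

    /** Wires both sides together and returns what the downstream listeners received. */
    static List<String> demo() {
        BlockingQueue<String> topic = new LinkedBlockingQueue<>();
        Listener metastoreListener = new QueuePublishingListener(topic);
        metastoreListener.onEvent("CREATE_TABLE");
        metastoreListener.onEvent("DROP_TABLE");

        List<String> received = new ArrayList<>();
        List<Listener> downstream = List.of(
                e -> received.add("atlas:" + e),
                e -> received.add("sns:" + e));
        forward(topic, downstream);
        return received;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The point of the design is that the metastore's only work per event is a publish to the topic. Any slow or failing listener now runs in the forwarder's process rather than in the metastore.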

Block diagram showing listeners deployed within Drone Fly context and forwarding events to Apache Atlas, AWS SNS etc.
Architecture with Drone Fly

Deploying Drone Fly

Deploying Drone Fly is straightforward. You can run it as a standalone Java service or as a Docker container on Kubernetes using the Drone Fly base image. You will also need to install the Apiary Kafka Listener in the Hive metastore so that Drone Fly can consume metastore events emitted from it. Please refer to the documentation for detailed installation steps.

We have also open-sourced an apiary-drone-fly Terraform module to simplify installation. This repository contains all the Terraform scripts needed to spin up Drone Fly on Kubernetes.

Conclusion

With Drone Fly, your Hive metastore is completely decoupled from your listeners. To achieve even further decoupling, Drone Fly can be run in Docker containers, with each instance started with a single listener. This way, even if one of your listeners goes down, the other listeners continue to function without interruption. We follow this pattern for our Expedia Group data lake, and it gives us the flexibility to deploy multiple listeners without having to worry about metastore performance. It also streamlines the onboarding of new listeners to our data lake.

Please take a look at the Drone Fly repo for more info on getting started.

Thanks for reading!