Introducing Wormhole: Dockerized Presto & Alluxio setups for blazing fast analytics

This post introduces Wormhole, an open source Dockerized solution for deploying Presto and Alluxio clusters for blazing fast analytics on file and object storage (we use S3, GCS, and OSS). When it comes to analytics, people are generally hands-on with SQL and love to analyse data that resides in a warehouse (e.g. a MySQL database). But as data grows, these stores start to struggle, and the need arises to get results in the same or less time despite the larger volume. Distributed computing solves this, and Presto is designed for exactly that. Paired with Alluxio, it works even faster. That's what Wormhole is all about.

Here is the high-level architecture diagram of the solution:

Wormhole Architecture

Let us explain each component in the order in which it should be set up in Wormhole:

Apart from the above components, we need a Zookeeper quorum, which is required to make the Alluxio master and the Presto coordinator highly available (HA). For complete documentation on the setup, please refer here.
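To give a feel for what the Zookeeper dependency looks like on the Alluxio side, here is a minimal sketch that renders the Zookeeper-related lines of an `alluxio-site.properties` file. The three property names are standard Alluxio settings; the helper name `alluxio_ha_properties`, the hostnames, and the journal path are assumptions made for illustration, and Wormhole's actual configuration may differ (newer Alluxio versions may also need the journal type set explicitly).

```python
def alluxio_ha_properties(zookeeper_hosts, journal_folder):
    """Render the Zookeeper-related lines of alluxio-site.properties.

    zookeeper_hosts: list of "host:port" strings for the quorum (hypothetical values).
    journal_folder:  shared storage path that every Alluxio master can reach.
    """
    lines = [
        # Let the masters use Zookeeper for leader election.
        "alluxio.zookeeper.enabled=true",
        # Comma-separated list of the Zookeeper quorum members.
        "alluxio.zookeeper.address=" + ",".join(zookeeper_hosts),
        # The journal must live on shared storage so a standby master can take over.
        "alluxio.master.journal.folder=" + journal_folder,
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hostnames and bucket name below are placeholders, not Wormhole defaults.
    print(alluxio_ha_properties(
        ["zk1:2181", "zk2:2181", "zk3:2181"],
        "s3://example-bucket/alluxio/journal",
    ))
```

Generating the fragment from a list of hosts keeps the quorum definition in one place, which is handy when the same list also has to be passed to other containers.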

Time for some action

Now that Presto is set up on top of Alluxio, how do we make it available for everyone to use? The answer is a tool such as Metabase, which provides connectivity to Presto. We only needed to add the appropriate configuration, and it works like a charm for all sorts of analysis.
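Tools like Metabase talk to the Presto coordinator over its HTTP client protocol, and the same endpoint can be used from a script. The sketch below, using only the Python standard library, submits a SQL statement to `/v1/statement` and follows the `nextUri` links until the result is complete. The coordinator URL, user name, catalog, and schema are placeholders rather than Wormhole defaults, and the function names are made up for this example; a maintained client library is the better choice for real workloads.

```python
import json
import urllib.request


def build_presto_request(coordinator_url, sql, user, catalog="hive", schema="default"):
    """Build (but do not send) the HTTP request that submits a query to Presto."""
    return urllib.request.Request(
        url=coordinator_url.rstrip("/") + "/v1/statement",
        data=sql.encode("utf-8"),  # the request body is the raw SQL text
        method="POST",
        headers={
            "X-Presto-User": user,
            "X-Presto-Catalog": catalog,
            "X-Presto-Schema": schema,
        },
    )


def run_query(coordinator_url, sql, user, **kwargs):
    """Submit a query and collect all rows by following nextUri links."""
    request = build_presto_request(coordinator_url, sql, user, **kwargs)
    rows = []
    while request is not None:
        with urllib.request.urlopen(request) as response:
            payload = json.load(response)
        if "error" in payload:
            raise RuntimeError(payload["error"].get("message", "query failed"))
        # Each page of the response may carry a batch of rows.
        rows.extend(payload.get("data", []))
        next_uri = payload.get("nextUri")
        # When nextUri is absent, the query has finished.
        request = (
            urllib.request.Request(next_uri, headers={"X-Presto-User": user})
            if next_uri
            else None
        )
    return rows
```

With a reachable coordinator, a call such as `run_query("http://presto-coordinator:8080", "SELECT 1", user="analyst")` would return the result rows as a list; the hostname here is hypothetical.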

Presto and Alluxio also provide web UIs to track the current state of things, which helps a lot.

Alluxio monitor UI
Presto monitor UI

Next Plans

The next focus will mostly be on making the solution self-serve through a user interface and making it self-scaling (possibly by deploying it on Kubernetes).


DataEngineer | Open Source Enthusiast | Traveller
