EXPEDIA GROUP TECHNOLOGY — SOFTWARE

Rolling out Blobs to Open Source at Haystack

A coding journey in an open source environment

Vaibhav Sawhney
Expedia Group Technology


Haystack logo © 2019 Expedia Group

In this post I introduce new blob storage functionality that Expedia Group™ has added to the Haystack ecosystem. While on a three-month job rotation with Team Haystack, I helped create example repos that demonstrate how to use the new functionality. Ashish Aggarwal helped me take a step back from the implementation details and think about how a system should be designed so that it can be extended later with the least amount of effort.

I started my rotation by creating a sample client-server application to showcase the open-source blobs library. The client application sends example data to the server, and the server sends back an example response. The goal was to capture the request object on the client and the response object on the server, and save both as blobs to local storage.
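To make the idea concrete, here is a minimal sketch of capturing a request and a response and saving each as a blob on local disk. It does not use the blobs library API: `LocalBlobStore`, `BlobCaptureDemo`, the key format, and the `blobs-demo` directory name are all hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.UUID;

// Hypothetical helper (not the blobs library API): one file per blob in a local directory.
class LocalBlobStore {
    private final Path directory;

    LocalBlobStore(Path directory) {
        try {
            this.directory = Files.createDirectories(directory);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes the payload and returns the key (the file name) needed to read it back.
    String save(String blobType, String payload) {
        String key = blobType + "-" + UUID.randomUUID();
        try {
            Files.write(directory.resolve(key), payload.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return key;
    }

    // Reads a previously saved blob back as a string.
    String read(String key) {
        try {
            return new String(Files.readAllBytes(directory.resolve(key)), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

class BlobCaptureDemo {
    // Stand-in for the server: wraps the example request in an example response.
    static String handle(String request) {
        return "{\"echo\":" + request + "}";
    }

    public static void main(String[] args) {
        LocalBlobStore store = new LocalBlobStore(
                Paths.get(System.getProperty("java.io.tmpdir"), "blobs-demo"));
        String request = "{\"message\":\"hello\"}";
        String requestKey = store.save("request", request);     // captured on the client side
        String response = handle(request);
        String responseKey = store.save("response", response);  // captured on the server side
        System.out.println(requestKey + " -> " + store.read(requestKey));
        System.out.println(responseKey + " -> " + store.read(responseKey));
    }
}
```

In the real application the client and server are separate processes; the sketch keeps them in one class only so that it stays self-contained.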

I then benchmarked that application using JMH, both with and without the blobs library in use. JMH stands for Java Microbenchmark Harness, and it helps you implement Java microbenchmarks correctly. I measured three benchmark modes (throughput, average time, and sample time) for the client-server interaction, once with the blobs being stored in a local directory and once without.

This helped me understand the performance impact of using the blobs library to store blobs, whether to local storage or to an S3 bucket. For benchmarking we used local storage only. I mocked the request and response of the client-server interaction so that the benchmarks would not be affected by API call latency.
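For readers unfamiliar with the three modes, the sketch below shows roughly what each one reports, computed by hand from a list of per-call timings. This is not JMH and not the benchmark code used here: `BenchmarkModes` and `mockedCall` are hypothetical names, and a naive timing loop like this lacks the warm-up, forking, and dead-code protection that JMH provides.

```java
import java.util.Arrays;

class BenchmarkModes {
    // Throughput: operations completed per second of measured time.
    static double throughputPerSecond(long[] nanosPerOp) {
        return nanosPerOp.length / (sum(nanosPerOp) / 1_000_000_000.0);
    }

    // Average time: mean duration of one operation, in nanoseconds.
    static double averageTimeNanos(long[] nanosPerOp) {
        return (double) sum(nanosPerOp) / nanosPerOp.length;
    }

    // Sample time looks at the distribution of individual timings, e.g. a percentile.
    static long percentileNanos(long[] nanosPerOp, double percentile) {
        long[] sorted = nanosPerOp.clone();
        Arrays.sort(sorted);
        int index = (int) Math.ceil(percentile / 100.0 * sorted.length) - 1;
        return sorted[Math.max(0, index)];
    }

    private static long sum(long[] values) {
        long total = 0;
        for (long v : values) {
            total += v;
        }
        return total;
    }

    // Stand-in for the mocked client-server interaction.
    static int mockedCall() {
        return "{\"message\":\"hello\"}".hashCode();
    }

    public static void main(String[] args) {
        long[] timings = new long[10_000];
        int sink = 0; // keeps the result of each call in use
        for (int i = 0; i < timings.length; i++) {
            long start = System.nanoTime();
            sink += mockedCall();
            timings[i] = System.nanoTime() - start;
        }
        System.out.println("throughput (ops/s): " + throughputPerSecond(timings));
        System.out.println("average time (ns):  " + averageTimeNanos(timings));
        System.out.println("p99 sample (ns):    " + percentileNanos(timings, 99));
        System.out.println("sink: " + sink);
    }
}
```

Running the same measurement once with blob storage enabled and once without gives the kind of before-and-after comparison described above.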

I dockerized the same application and mounted a local storage volume so we could run the benchmarks with a specific amount of resources and without creating a load on the local system.

Below are the screenshots of the results I got from benchmarking.

Throughput
Average Time
Sample Time

With the benchmark results in hand, I moved on to another task: creating a blobs feature specifically for Haystack.

Below are some diagrams, along with explanations, of what we set out to achieve using the modules in blobs.

Phase 1 was to create multiple pluggable libraries that allow a client to interact with haystack-agent through gRPC and send blobs to it. The haystack-agent then dispatches each blob through the dispatcher plugged into it.

Phase 1

The same dispatchers can also read a blob by its key. Another application that wants to read a blob can use this through a gRPC service that interacts with haystack-agent.
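The pluggable design can be sketched as a small interface with one implementation per storage backend. Everything below is an assumption made for illustration: `BlobDispatcher`, `InMemoryDispatcher`, and `BlobAgent` are hypothetical names rather than the actual haystack-agent classes, and a direct method call stands in for the gRPC hop.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical plug-in contract: one implementation per storage backend.
interface BlobDispatcher {
    String name();

    void dispatch(String key, byte[] content);

    Optional<byte[]> read(String key);
}

// In-memory stand-in for a real backend such as local files or S3.
class InMemoryDispatcher implements BlobDispatcher {
    private final Map<String, byte[]> blobs = new ConcurrentHashMap<>();

    public String name() {
        return "in-memory";
    }

    public void dispatch(String key, byte[] content) {
        blobs.put(key, content.clone());
    }

    public Optional<byte[]> read(String key) {
        return Optional.ofNullable(blobs.get(key)).map(bytes -> bytes.clone());
    }
}

// Stand-in for the agent: accepts blobs from clients and delegates to the plugged-in dispatcher.
class BlobAgent {
    private final BlobDispatcher dispatcher;

    BlobAgent(BlobDispatcher dispatcher) {
        this.dispatcher = dispatcher;
    }

    void write(String key, byte[] content) {
        dispatcher.dispatch(key, content);
    }

    Optional<byte[]> read(String key) {
        return dispatcher.read(key);
    }

    public static void main(String[] args) {
        BlobAgent agent = new BlobAgent(new InMemoryDispatcher());
        agent.write("request-1", "hello".getBytes(StandardCharsets.UTF_8));
        String stored = agent.read("request-1")
                .map(bytes -> new String(bytes, StandardCharsets.UTF_8))
                .orElse("<missing>");
        System.out.println(stored);
    }
}
```

Because the agent depends only on the interface, swapping the storage backend means plugging in a different dispatcher without touching the client or the agent.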

The proto definitions used to create the gRPC client and stub are in haystack-idl.

Phase 2 was to enhance the libraries created in phase 1 so that they store the blob key in a span as metadata when the blob is about to be dispatched to haystack-agent for storage. This metadata can then be used to retrieve the blob for that particular span through haystack-ui.

Phase 2

To demonstrate this use case, I created another application, haystack-blob-example, which is similar to blobs-example.
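The phase 2 idea of linking a span to its blob can be sketched in a few lines. This is only an illustration under stated assumptions: `DemoSpan` and `SpanBlobTagger` are hypothetical classes, the key layout and the `request-blob` tag name are invented for the example, and a plain map stands in for haystack-agent and its dispatcher.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal span model for illustration; a real Haystack span has more fields.
class DemoSpan {
    final String traceId;
    final String spanId;
    final Map<String, String> tags = new LinkedHashMap<>();

    DemoSpan(String traceId, String spanId) {
        this.traceId = traceId;
        this.spanId = spanId;
    }
}

class SpanBlobTagger {
    // Stand-in for the agent and its dispatcher.
    private final Map<String, String> storage = new HashMap<>();

    // Hypothetical key layout: unique per span and blob type.
    static String blobKey(DemoSpan span, String blobType) {
        return span.traceId + "/" + span.spanId + "/" + blobType;
    }

    // Tags the span with the blob key, then hands the blob off for storage.
    void writeBlob(DemoSpan span, String blobType, String content) {
        String key = blobKey(span, blobType);
        span.tags.put(blobType + "-blob", key);
        storage.put(key, content);
    }

    // What a UI could do later: follow the tag on the span back to the stored blob.
    String readBlob(DemoSpan span, String blobType) {
        return storage.get(span.tags.get(blobType + "-blob"));
    }

    public static void main(String[] args) {
        DemoSpan span = new DemoSpan("trace-1", "span-1");
        SpanBlobTagger tagger = new SpanBlobTagger();
        tagger.writeBlob(span, "request", "{\"message\":\"hello\"}");
        System.out.println(span.tags);
        System.out.println(tagger.readBlob(span, "request"));
    }
}
```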

Phase 3 was to create an HTTP reverse proxy for clients who do not want to create their own gRPC client to read blobs. In this case, the client reads the blob as a string by making a REST call to a proxy server, which in turn calls the gRPC service to retrieve the blob. To develop the reverse proxy we leveraged the open-source grpc-gateway library.

Phase 3
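The actual proxy is built with grpc-gateway, which generates the HTTP layer from the proto definitions. The Java sketch below only illustrates the request flow using the JDK's built-in `com.sun.net.httpserver` package: the `/blobs/` path, the `BlobHttpProxy` class, and the `backend` function that stands in for the gRPC read call are all assumptions made for this example.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.Optional;
import java.util.function.Function;

class BlobHttpProxy {
    static final String PREFIX = "/blobs/";

    // Maps a request path such as /blobs/abc to the blob key "abc".
    static Optional<String> keyFromPath(String path) {
        if (!path.startsWith(PREFIX) || path.length() == PREFIX.length()) {
            return Optional.empty();
        }
        return Optional.of(path.substring(PREFIX.length()));
    }

    // Starts an HTTP server that forwards each read to the backend lookup,
    // which stands in for the gRPC call made by the real proxy.
    static HttpServer start(int port, Function<String, Optional<String>> backend) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext(PREFIX, exchange -> {
            Optional<String> blob = keyFromPath(exchange.getRequestURI().getPath()).flatMap(backend);
            byte[] body = blob.orElse("blob not found").getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(blob.isPresent() ? 200 : 404, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }

    public static void main(String[] args) throws IOException {
        // Port 0 asks the operating system for any free port.
        HttpServer server = start(0, key ->
                key.equals("demo") ? Optional.of("hello blob") : Optional.empty());
        try {
            int port = server.getAddress().getPort();
            URI uri = URI.create("http://localhost:" + port + PREFIX + "demo");
            try (InputStream in = uri.toURL().openStream()) {
                System.out.println(new String(in.readAllBytes(), StandardCharsets.UTF_8));
            }
        } finally {
            server.stop(0);
        }
    }
}
```

The point of the design is that the caller needs nothing beyond an HTTP client: the translation to gRPC happens entirely inside the proxy.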

A docker-compose example is available showing how blobs are stored in S3 using haystack-blob-example and how they are retrieved using haystack-ui. You can find the example in the haystack-docker repository.

I hope you enjoyed this introduction to blobs and Haystack. For a deeper dive, please explore the links below, and comment or reach out to me for more info.

https://expediadotcom.github.io/haystack/

https://github.com/ExpediaDotCom/blobs

https://github.com/ExpediaDotCom/blobs-example

https://github.com/ExpediaDotCom/haystack-blob-example

https://github.com/ExpediaDotCom/haystack-docker

Learn more about technology at Expedia Group
