Catch the Floo

Using AWS API Gateway, Kinesis, Elasticsearch and Kibana for product use case analysis

Yoav Nir
Expedia Group Technology
4 min read · Oct 10, 2019

Photo by Artem Maltsev on Unsplash

At Expedia Group™ Valar Team, there are two things we really like: tools and Harry Potter (more on that later). In fact, we like tools so much that we’ve taken ownership of some tools that other teams were too busy to look after. As we continue to maintain and evolve these tools, our stakeholders often want to know how they are being used. This helps them plan for the future, as it shows which features the business is finding useful. We can find such information in our log files, but those files are noisy by nature and don’t work well for historic data due to log retention limits. We wanted a better way to capture this data.

An intern project

I joined Expedia Partner Solutions (part of Expedia Group™) as an intern in mid-June 2019. For my internship project, I was asked to create an application that would capture the data described above. On my first day, I felt a little lost trying to understand the tools, data, acronyms and everything our company does. Then my manager explained: “We want to be able to ask: who did what, when?” Everything clicked.

Before any work could begin on the application, the most important step had to be taken: picking a name. That’s where our Harry Potter passion came into play. We had already named an application Hedwig, so that wasn’t a possibility. As we knew it would enable event messages to travel between our systems, we decided to go with Floo, after the Harry Potter Floo Network.

The solution

I now knew what I had to build, but didn’t yet know how. My only experience with the cloud was the pop-up that shows up on my iPhone each time I take a picture, telling me I’ve used all my iCloud storage. On top of that, Floo would need to be built and deployed through a continuous delivery pipeline, which would be another learning curve.

We wanted it to be simple for applications to send event data, and easy enough to query that both stakeholders and engineering teams could use it. It would have to be compatible with a wide range of message types: user permission changes, API key generation, contract amendments, and so on. And it should be able to answer basic analytics questions about our applications, for example: how many API keys were generated in the past month?

We started by designing an API that would capture events, then the architecture that would capture, deliver, store and query the event data. For easy integration, we decided to use Amazon Web Services (AWS) API Gateway: applications send their data by simply making an HTTP POST request to the gateway whenever an event occurs.

For data storage, we went with AWS Elasticsearch, which turned out to have a number of advantages that answered multiple design requirements. First, each data point is stored as a single JSON document, meaning each document is schema-free by default, giving us flexibility in the range of events we can store. However, to prevent the chaos that comes with arbitrary data points being thrown at our cluster, we came up with a data schema that the gateway verifies for every event entering Floo. Each event has to contain the following (an example payload follows the list):

  • Origin: the name of the application in which the event occurred
  • Topic: the cause that triggered the event
  • Event: a flexible object containing information about the event
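
To make this concrete, here is a rough sketch of what sending an event looks like. The endpoint URL and field values are made up for illustration; the real gateway address is internal.

```python
import requests

# Hypothetical URL; the real API Gateway endpoint is not public.
FLOO_ENDPOINT = "https://example.execute-api.us-east-1.amazonaws.com/prod/events"

event = {
    "origin": "partner-portal",      # the application sending the event
    "topic": "api-key-generated",    # why the event was sent
    "event": {                       # flexible, event-specific payload
        "userId": "u-12345",
        "keyName": "reporting-key",
    },
}

# The gateway verifies that origin, topic and event are all present.
response = requests.post(FLOO_ENDPOINT, json=event, timeout=5)
response.raise_for_status()
```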

The origin acts as our index field in Elasticsearch, so that all data is aggregated by application. The topic then gives us a clear understanding of why an event has been sent and what the data is about. In addition, we use an AWS Lambda function to identify and timestamp each data point, giving us a clear and concise record of every event. The cherry on the cake is that AWS provides a Kibana instance for each Elasticsearch cluster, so we have a powerful tool to filter and analyse our data out of the box. Finally, to support a high volume of events and add design flexibility, we integrated our API Gateway with an AWS Kinesis stream, which ensures we can easily scale Floo as we add more applications.
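
As a minimal sketch of that enrichment step (this is not Floo’s actual code; the cluster endpoint, index naming and request signing are simplified assumptions), a Kinesis-triggered Lambda might look like this:

```python
import base64
import json
import uuid
from datetime import datetime, timezone

import requests  # request signing/auth omitted for brevity

# Hypothetical cluster endpoint; the real one is internal.
ES_ENDPOINT = "https://search-floo-example.us-east-1.es.amazonaws.com"

def handler(event, context):
    """Triggered by the Kinesis stream: enrich each event, then index it."""
    for record in event["Records"]:
        # Kinesis delivers record data base64-encoded.
        doc = json.loads(base64.b64decode(record["kinesis"]["data"]))
        doc_id = str(uuid.uuid4())
        doc["timestamp"] = datetime.now(timezone.utc).isoformat()
        # One index per origin keeps data aggregated by application.
        index = doc["origin"].lower()
        requests.put(f"{ES_ENDPOINT}/{index}/_doc/{doc_id}", json=doc, timeout=5)
```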

Delivering the project

Despite what seemed like huge challenges at the start, I managed to get Floo up and running in production during my internship, and it has been happily shipping events since the beginning of September. That’s thanks to the amazing team we have here at Expedia Partner Solutions. People keep an open ear and are always ready to listen and help, and I wouldn’t have been able to deliver Floo without their support. Be it answering small questions about AWS services, being present at architecture reviews, or showing me how they have tackled similar issues, the “one team, group first” attitude enabled me to learn very quickly during my internship at Expedia Group.

Key take-aways

A few things I learned while working on Floo and during my time with Expedia Partner Solutions:

  • API Gateway and Kinesis go together like soy sauce and ketchup. It’s tastier than it seems. You may not need a proxy lambda.
  • Continuous delivery is awesome and well worth the time investment to set up.
  • Elasticsearch is surprisingly powerful, and it’s easy to get started with some simple queries. As long as your data is timestamped, it does a lot of work for you right out of the box (see the query sketch after this list).
  • Don’t be afraid to ask questions. It really is the best way to accelerate your learning.
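
For instance, the example question from earlier, how many API keys were generated in the past month, reduces to a single count query. A sketch, assuming the field names from the schema above and a keyword mapping on topic:

```python
import requests

ES_ENDPOINT = "https://search-floo-example.us-east-1.es.amazonaws.com"

# Count events with topic "api-key-generated" from the past month.
query = {
    "query": {
        "bool": {
            "filter": [
                {"term": {"topic": "api-key-generated"}},
                {"range": {"timestamp": {"gte": "now-1M/d"}}},
            ]
        }
    }
}

resp = requests.get(f"{ES_ENDPOINT}/partner-portal/_count", json=query, timeout=5)
print(resp.json()["count"])
```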
