Creating Event-Driven Serverless Data Pipelines with API Gateway and Lambda Functions

Andres Tavio
Published in B6 Engineering
3 min read · Feb 18, 2021

Motivation

At B6 Real Estate Advisors, we are always looking for new data sources to bring into our data ecosystem that will allow us to provide better insights to our users: commercial real estate brokers.

One of our recent initiatives has been to gather data from clients (or potential clients) on their investment preferences. This includes data points such as property type (Industrial, Multifamily, etc.), geographical location within NYC, price ranges, and more. We also gather data about the client (phone, email, etc.) and the company they represent, if applicable.

Even though this data was being collected in a survey, it remained siloed from the applications that our brokers use. Our task was to bring it into the CRM so that the brokers could leverage it easily.

Designing the System

The data was being gathered through a SurveySparrow survey, which gave us a few options for how to design this pipeline. SurveySparrow has an extensive API that we could poll for new data, but that implementation seemed unnecessary for a few reasons.

The primary reason was that we wanted to ingest this data in near real time, which would mean polling very frequently against an API that is rate limited. A secondary reason was that creating a new Airflow DAG and an API wrapper/Airflow Hook for SurveySparrow seemed like over-engineering.

While we already have the majority of our pipelines orchestrated by Airflow and we could have come up with a solution using that tool that fulfilled most of our requirements, we thought this was a good opportunity to develop a serverless solution based on the event-driven nature of the data.

Luckily, SurveySparrow has an excellent Webhook feature that fits nicely into a serverless system design. We ended up designing a solution using API Gateway and Lambda Functions with AWS.

Implementation

At a high level, the serverless solution we came up with follows this sequence of events:

  1. A client completes a survey on SurveySparrow
  2. SurveySparrow emits a Webhook event with a payload that we have customized to make parsing the survey responses easier
  3. The Webhook event and payload hit an API hosted on API Gateway
  4. The API forwards the payload to a Lambda function
  5. The Lambda function reads the payload, performs transformations/validations on the data, and loads the respective entities into their tables in our CRM backend
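Steps 3 to 5 can be sketched roughly as follows. This is a minimal illustration, not our production code: it assumes API Gateway uses the Lambda proxy integration (so the webhook body arrives as a JSON string in event["body"]), and the payload field names, the transform output, and the load_into_crm helper are all hypothetical stand-ins for the real customized payload and CRM backend.

```python
import base64
import json
from typing import Any, Callable, Dict, List

# Hypothetical field names; the real customized webhook payload differs.
REQUIRED_FIELDS = ("email", "property_type", "min_price", "max_price")


def parse_body(event: Dict[str, Any]) -> Any:
    """Extract the JSON body from an API Gateway proxy integration event."""
    body = event.get("body") or "{}"
    if event.get("isBase64Encoded"):
        body = base64.b64decode(body).decode("utf-8")
    return json.loads(body)


def validate(payload: Dict[str, Any]) -> List[str]:
    """Return a list of validation errors (empty when the payload is usable)."""
    errors = [
        f"missing field: {name}"
        for name in REQUIRED_FIELDS
        if payload.get(name) in (None, "")
    ]
    if errors:
        return errors
    try:
        if float(payload["min_price"]) > float(payload["max_price"]):
            errors.append("min_price exceeds max_price")
    except (TypeError, ValueError):
        errors.append("prices must be numeric")
    return errors


def transform(payload: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
    """Split the flat survey response into the entities the CRM stores."""
    return {
        "contact": {"email": str(payload["email"]).strip().lower()},
        "preference": {
            "property_type": payload["property_type"],
            "min_price": float(payload["min_price"]),
            "max_price": float(payload["max_price"]),
        },
    }


def load_into_crm(entities: Dict[str, Dict[str, Any]]) -> None:
    """Placeholder: a real version would write to the CRM backend tables."""
    print(json.dumps(entities))


def respond(status: int, body: Dict[str, Any]) -> Dict[str, Any]:
    """Build the response shape the proxy integration expects."""
    return {"statusCode": status, "body": json.dumps(body)}


def lambda_handler(
    event: Dict[str, Any],
    context: Any,
    load: Callable[[Dict[str, Dict[str, Any]]], None] = load_into_crm,
) -> Dict[str, Any]:
    """Entry point: parse, validate, transform, then load one survey response."""
    try:
        payload = parse_body(event)
    except ValueError:
        return respond(400, {"errors": ["body is not valid JSON"]})
    if not isinstance(payload, dict):
        return respond(400, {"errors": ["body must be a JSON object"]})
    errors = validate(payload)
    if errors:
        return respond(400, {"errors": errors})
    load(transform(payload))
    return respond(200, {"status": "ok"})
```

Passing the loader as a parameter is only a convenience for this sketch: it lets the handler be exercised locally with a fake loader, while Lambda itself calls lambda_handler(event, context) and falls back to the default.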

We are happy with this solution because it satisfies our requirements in a way that is simple, cheap, and easy to implement. Survey responses are now processed and pushed into the CRM in near real time. The implementation is encapsulated nicely in a single Lambda function (error handling aside). There is no need for a constantly running process; the system spins itself up and down as needed. Finally, a serverless system scales with little effort, so we do not expect to have to optimize this one in the near future.

Conclusion

It’s important to design a solution based on the nature of the problem you’re trying to solve, not the tools you already have. By going serverless, we were able to cut down on development time and infrastructure costs while satisfying all of the system's requirements. Next time you’re tasked with ingesting event-driven data, consider designing a serverless solution.
