Tackling Fragmentation in Serverless Data Pipelines

How to stay sane when managing tens-to-hundreds of lambda-backed repositories.

Published in Whispering Data · 4 min read · Jun 29, 2020

Photo by Nicolas Ladino Silva on Unsplash

Let’s start with a confession: I love using serverless architectures. Developing in this environment can feel like a Shin Lim performance. A flick of the wrist here, a puff of smoke there, and the processed data magically appears where it needs to be, in real time.

Within the AWS ecosystem, several services stitched together provide this experience. On the analytics team I sit on at Equinox Media, we’ve embraced this architectural pattern to its fullest. We forgo self-maintained, provisioned servers for data processing and instead opt for a parade of SQS queues, SNS topics, Kinesis streams, and, of course, Lambda functions.
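
To make one link in that parade concrete, here is a minimal Python sketch of a Lambda function that consumes an SQS queue subscribed to an SNS topic. The handler name, the payload shape, and the placeholder transformation are hypothetical. The code assumes SNS raw message delivery is disabled, so each SQS message body is an SNS envelope whose `Message` field carries the originally published payload.

```python
import json


def handler(event, context):
    """Hypothetical Lambda handler for an SQS queue subscribed to an SNS topic.

    Assumes SNS raw message delivery is disabled, so every SQS record body
    is an SNS envelope (JSON) whose "Message" field holds the original payload.
    """
    processed = []
    for record in event.get("Records", []):
        envelope = json.loads(record["body"])      # SNS envelope delivered via SQS
        payload = json.loads(envelope["Message"])  # payload originally published to SNS
        # Placeholder step: a real pipeline might enrich this and forward it to Kinesis.
        processed.append({"id": record["messageId"], "data": payload})
    return {"processed": len(processed), "items": processed}
```

Lambda hands the function a batch of records, so the loop handles each message independently. A production version would also need error handling so that one malformed record does not fail the whole batch.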

As a result, diagrams of our data pipelines visually resemble a 6th grader’s Rube Goldberg project. And as the metaphor suggests, this paradigm presents new organizational challenges if we want to keep maintenance costs low.

A Cambrian Explosion

When adopting the serverless platform, one thing you’ll quickly notice is a proliferation in the number of code repositories your team maintains. This is the result of a common development pattern that calls for a 1:1…

Written by Paul Singman