Building a data pipeline to handle >50M events/day (with a startup budget) — Collection

Published in

Super.com

7 min read · Apr 12, 2019

Building the foundations of the data stack for a startup can be overwhelming. It requires a mixture of system architecture and software engineering knowledge to pull it all together, as well as a lot of hard work and a little bit of luck.

At SnapTravel we have scaled over 400% in the last year and, in the process, we have learned a lot about tackling a variety of challenges at scale. As our business requirements grew, so did our data requirements, both in breadth and depth. Unsurprisingly, we needed more than a few read-only databases queried live from SQL scripts.

In this series, we will cover SnapTravel’s event-data pipeline and describe the building blocks of the Data Science Hierarchy of Needs. By the end of the series, you should feel confident and equipped to make trade-off decisions while building critical infrastructure at a high-growth, fast-moving company. Each post features one layer of the pyramid. Naturally, we will first tackle Data Collection, the basis on which you will build your entire data strategy.

Written by Nehil Jain