ON DATA ENGINEERING

Real-time Data Pipelines — Complexities & Considerations

Julien Kervizic
Hacking Analytics
Dec 22, 2020

Photo by Robin Pierre on Unsplash

The shift towards real-time data flow has a major impact on the way applications are designed and on the work of data engineers. Dealing with real-time data flows brings a paradigm shift and an added layer of complexity compared to traditional integration and processing methods (i.e., batch).
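The contrast can be made concrete with a minimal, illustrative Python sketch. The function names and order amounts are hypothetical. A batch job computes its result once, over a complete dataset. A streaming job has to keep running state and emit an updated result as each event arrives.

```python
from typing import Iterable, Iterator


def batch_total(records: Iterable[float]) -> float:
    # Batch: wait until the full dataset has landed, then compute once.
    return sum(records)


def streaming_totals(events: Iterable[float]) -> Iterator[float]:
    # Streaming: keep running state and emit an updated result per event.
    running = 0.0
    for amount in events:
        running += amount
        yield running


orders = [12.5, 7.5, 30.0]  # hypothetical order amounts
print(batch_total(orders))             # 50.0
print(list(streaming_totals(orders)))  # [12.5, 20.0, 50.0]
```

Even in this toy case, the streaming version owns a piece of state (`running`). A real pipeline would also need to persist that state and recover it after failures, which is part of the added complexity.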

There are real benefits to leveraging real-time data, but it requires specific considerations when setting up the ingestion, processing, storage, and serving of that data. It also brings specific operational needs and changes the way data engineers work. These should be taken into account before embarking on a real-time journey.

Use cases for leveraging Real-time Data

Streaming data integration is the foundation for leveraging streaming analytics. Use cases such as fraud detection, contextual marketing triggers, and dynamic pricing all rely on a real-time data feed. If you cannot source the data in real time, there is very little value to be gained in attempting to tackle these use cases.
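Fraud detection shows why the feed has to be real time. A rule can only block a suspicious transaction if it is evaluated as the events come in. The sketch below is a hypothetical, minimal "velocity check" in Python. The thresholds, the event shape and the `flag_bursts` name are assumptions for illustration, not taken from any particular system. It keeps a per-card sliding window of recent transaction timestamps.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Iterable, Iterator, Tuple

# Hypothetical rule: flag a card that makes more than MAX_TXNS
# transactions within WINDOW_SECONDS. Thresholds are illustrative only.
WINDOW_SECONDS = 60
MAX_TXNS = 3

Event = Tuple[str, float]  # (card_id, timestamp in seconds)


def flag_bursts(events: Iterable[Event]) -> Iterator[Tuple[str, float]]:
    """Yield (card_id, timestamp) whenever a card exceeds the rate limit.

    Assumes events arrive in timestamp order.
    """
    recent: Dict[str, Deque[float]] = defaultdict(deque)
    for card_id, ts in events:
        window = recent[card_id]
        window.append(ts)
        # Evict timestamps that have fallen out of the sliding window.
        while window and ts - window[0] >= WINDOW_SECONDS:
            window.popleft()
        if len(window) > MAX_TXNS:
            yield card_id, ts


stream = [("card-A", 0), ("card-A", 10), ("card-B", 15),
          ("card-A", 20), ("card-A", 30), ("card-A", 100)]
print(list(flag_bursts(stream)))  # [('card-A', 30)]
```

The check relies on events arriving in order. Handling late or out-of-order events is one of the complications a production streaming system has to address.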

Besides enabling new use cases, real-time data ingestion brings other benefits, such as a shorter time to land the data, a reduced need to handle dependencies, and some other…

Living at the interstice of business, data and technology | Head of Data at iptiQ by SwissRe | previously at Facebook, Amazon | julienkervizic@gmail.com