TDS Archive

An archive of data science, data analytics, data engineering, machine learning, and artificial intelligence writing from the former Towards Data Science Medium publication.


Mastering Data Streaming in Python

💡Mike Shakhomirov
Published in TDS Archive · 12 min read · Aug 16, 2024


AI-generated image using Kandinsky

In this article, I will address the key challenges data engineers may encounter when designing streaming data pipelines. We’ll explore use case scenarios, provide Python code examples, discuss windowed calculations using streaming frameworks, and share best practices related to these topics.

In many applications, having access to real-time, continuously updated data is crucial. Fraud detection, churn prevention and recommendations are among the strongest candidates for streaming. Streaming data pipelines move data from various sources to multiple target destinations in real time, capturing events as they occur and enabling their transformation, enrichment, and analysis.
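The sketch below uses only the Python standard library to show what this means at the level of individual events. Each event is handled as soon as it is read, transformed, enriched from a lookup table, and then delivered to more than one destination. The event fields (`user_id`, `amount_cents`), the `USER_COUNTRY` lookup, the 1,000 fraud threshold and the in-memory lists standing in for sinks are all hypothetical. A real pipeline would read from and write to a message broker or warehouse instead.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical reference data used to enrich events as they pass through.
USER_COUNTRY: Dict[str, str] = {"u1": "GB", "u2": "US"}


def transform(event: dict) -> dict:
    """Normalise the raw payload (assumes amounts arrive in cents)."""
    return {"user_id": event["user_id"], "amount": event["amount_cents"] / 100}


def enrich(event: dict) -> dict:
    """Attach reference data; unknown users get a placeholder value."""
    return {**event, "country": USER_COUNTRY.get(event["user_id"], "unknown")}


def run(source: Iterable[dict], sinks: List[Callable[[dict], None]]) -> None:
    """Process each event as soon as it is read and deliver it to every sink."""
    for raw in source:
        event = enrich(transform(raw))
        for sink in sinks:
            sink(event)


# In-memory lists stand in for real destinations (e.g. a warehouse and an alerting service).
warehouse: List[dict] = []
alerts: List[dict] = []


def alert_sink(event: dict) -> None:
    # Toy rule (hypothetical): flag large payments for fraud review.
    if event["amount"] > 1000:
        alerts.append(event)


incoming = [
    {"user_id": "u1", "amount_cents": 2500},
    {"user_id": "u2", "amount_cents": 250000},
    {"user_id": "u3", "amount_cents": 999},
]
run(incoming, [warehouse.append, alert_sink])
print(len(warehouse), len(alerts))  # 3 1
```

Every event reaches the warehouse sink, but only the large `u2` payment reaches the alert sink. This is the "multiple target destinations" idea in miniature.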

Streaming data pipeline

In one of my previous articles, I described the most common data pipeline design patterns and when to use them [1].

A data pipeline is a sequence of data processing steps, where each stage’s output becomes the input for the next, creating a logical flow of data.
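In Python, one lightweight way to express this chaining is with generators. Each stage consumes the iterator produced by the previous stage and yields its own output. The stage names (`parse`, `clean`, `tag`), the `user,amount` input format and the threshold are illustrative assumptions, not the author's code.

```python
from typing import Iterable, Iterator


def parse(lines: Iterable[str]) -> Iterator[dict]:
    """Stage 1: turn raw 'user,amount' strings into records."""
    for line in lines:
        user, amount = line.split(",")
        yield {"user": user, "amount": float(amount)}


def clean(records: Iterable[dict]) -> Iterator[dict]:
    """Stage 2: drop records with non-positive amounts."""
    for record in records:
        if record["amount"] > 0:
            yield record


def tag(records: Iterable[dict], threshold: float = 100.0) -> Iterator[dict]:
    """Stage 3: label each record as 'high' or 'low' value (threshold is hypothetical)."""
    for record in records:
        yield {**record, "tier": "high" if record["amount"] >= threshold else "low"}


raw = ["alice,120.5", "bob,-3", "carol,42"]
# Each stage's output is the next stage's input.
results = list(tag(clean(parse(raw))))
print(results)
# [{'user': 'alice', 'amount': 120.5, 'tier': 'high'}, {'user': 'carol', 'amount': 42.0, 'tier': 'low'}]
```

Because generators are lazy, records move through the chain one at a time rather than as a whole batch. This makes the same structure a reasonable starting point for processing an unbounded stream.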

Written by 💡Mike Shakhomirov

Data Engineer, Data Strategy and Decision Advisor, Keynote Speaker | linktr.ee/mshakhomirov | @MShakhomirov
