Re-thinking Data Pipeline Patterns
A data pipeline is a sequence of transformations that converts raw data into actionable insights. In the past, the processing and storage engines were coupled together; for example, a traditional MPP warehouse combines both a processing and a storage engine. With decoupled processing solutions (such as Spark and Redshift Spectrum) becoming mainstream in both the open-source world and the AWS big data ecosystem, what are the popular data pipeline patterns? This post describes the data pipeline patterns we have defined in the context of decoupled processing engines.
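To make the "sequence of transformations" idea concrete, here is a minimal sketch in plain Python. The stage names (`parse`, `clean`, `aggregate`) and the toy CSV data are illustrative assumptions, not tied to any specific engine; a real pipeline would run these stages on a decoupled processing engine against external storage.

```python
from functools import reduce

# A pipeline is just an ordered list of transformations applied to raw data.
# Stage names and data below are illustrative, not from any specific framework.

def parse(records):
    """Raw stage: split CSV-like lines into (name, value) pairs."""
    return [line.split(",") for line in records]

def clean(rows):
    """Transform stage: drop malformed rows and cast values to int."""
    return [(name, int(value)) for name, value in rows if value.strip().isdigit()]

def aggregate(rows):
    """Insight stage: total value per name."""
    totals = {}
    for name, value in rows:
        totals[name] = totals.get(name, 0) + value
    return totals

def run_pipeline(raw, stages):
    """Apply each stage, in order, to the output of the previous one."""
    return reduce(lambda data, stage: stage(data), stages, raw)

raw_events = ["a,1", "b,2", "a,oops", "a,3"]
print(run_pipeline(raw_events, [parse, clean, aggregate]))  # → {'a': 4, 'b': 2}
```

The same shape holds at any scale: only the execution engine and the storage layer change, which is exactly what decoupling makes possible.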