Review: Designing Data-Intensive Applications

AC
Data Folks Indonesia
Aug 12, 2023

A note on reading Designing Data-Intensive Applications by Martin Kleppmann.

Let’s start with why I needed to read this book. The company I work for has started to develop and productionize data products, so I began to wonder how to build systems that consume data intensively. My questions were about how to apply best practices and adopt the right mindset to deploy a service properly, for example how to measure its reliability, maintainability, and scalability. These three qualities are exactly what the book’s subtitle promises: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems.

So I decided to invest my commute time in reading this book. It is divided into three parts: Foundations, Distributed Data, and Derived Data.

Foundations explains the core concepts of designing data applications: what reliability, scalability, and maintainability mean, how to measure them, and how to achieve them. This part also covers data models and query languages (such as SQL and NoSQL), storage systems, and more.
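One way the book suggests measuring performance is to look at percentiles of response times rather than the average, because the mean hides how slow the slowest requests are. The sketch below assumes a hypothetical list of response times in milliseconds (the values are made up for illustration) and uses the simple nearest-rank method to compute percentiles; monitoring systems often use approximations instead.

```python
import math


def percentile(samples: list[float], p: float) -> float:
    """Return the p-th percentile (0 < p <= 100) using the nearest-rank method."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    # Nearest rank: the smallest value such that at least p% of samples are <= it.
    rank = math.ceil(p / 100 * len(ordered))
    return ordered[rank - 1]


# Hypothetical response times (ms) for 10 requests, including one slow outlier.
response_times_ms = [12, 15, 11, 14, 13, 250, 16, 12, 14, 13]

mean = sum(response_times_ms) / len(response_times_ms)
print(f"mean = {mean:.1f} ms")                            # mean = 37.0 ms
print(f"p50  = {percentile(response_times_ms, 50)} ms")   # p50  = 13 ms
print(f"p99  = {percentile(response_times_ms, 99)} ms")   # p99  = 250 ms
```

The mean is pulled up by a single outlier, while the median (p50) describes the typical request and the p99 exposes the tail latency that the unluckiest users experience.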

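Distributed Data describes how to store and process data across multiple machines, covering topics such as replication, partitioning, transactions, and consistency and consensus. The book’s explanation of how MapReduce works comes in the batch processing chapter of Derived Data.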
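The MapReduce programming model can be illustrated in a single process: a mapper emits key-value pairs for each input record, the framework groups (shuffles) those pairs by key, and a reducer combines all values for each key. The word-count sketch below is a toy version under that assumption, with hypothetical function names; a real framework such as Hadoop runs many map and reduce tasks in parallel across machines and handles failures, which this sketch does not attempt.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in one input record."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group emitted values by key, as the framework's shuffle/sort step would."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reducer: combine all values for one key into a single output record."""
    return key, sum(values)


def map_reduce(records: Iterable[str]) -> dict[str, int]:
    mapped = (pair for record in records for pair in map_phase(record))
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, vs) for k, vs in grouped.items())


lines = ["the quick brown fox", "the lazy dog", "The fox"]
print(map_reduce(lines))
# {'brown': 1, 'dog': 1, 'fox': 2, 'lazy': 1, 'quick': 1, 'the': 3}
```

Because the mapper and reducer only see one record or one key at a time, the framework is free to split the work across many machines.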

Derived Data focuses on two types of data processing: stream processing and batch processing. Stream processing, sometimes called near-real-time processing, handles each event shortly after it is received. Batch processing, on the other hand, runs a job at a predefined interval (e.g. hourly, daily, or weekly), taking a large amount of input data and producing output data.
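The difference between the two styles can be sketched by doing the same computation, counting events per user, both ways. This is a minimal single-process illustration assuming a hypothetical `Event` record and in-memory input; real batch jobs typically read large files from distributed storage, and real stream processors typically consume from a message broker or log.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Event:
    """Hypothetical event record used only for this illustration."""
    user_id: str
    action: str


def batch_count(events: Iterable[Event]) -> Counter:
    """Batch: process the whole bounded input at once (e.g. once a day)."""
    return Counter(event.user_id for event in events)


class StreamCounter:
    """Stream: keep running state and update it as soon as each event arrives."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def on_event(self, event: Event) -> int:
        self.counts[event.user_id] += 1
        # A real stream processor might emit this updated count downstream here.
        return self.counts[event.user_id]


events = [Event("alice", "click"), Event("bob", "view"), Event("alice", "view")]

# Batch job: runs after all events for the interval have been collected.
print(batch_count(events))  # Counter({'alice': 2, 'bob': 1})

# Stream processing: handles each event right after it is received.
stream = StreamCounter()
for e in events:
    print(e.user_id, "->", stream.on_event(e))  # alice -> 1, bob -> 1, alice -> 2
```

Both approaches end with the same counts; what differs is when a result is available: the batch job only after the interval closes, the stream processor after every single event.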

I recommend this book to anyone who wants to grasp the concepts behind data-intensive applications. However, it is not what you are looking for if you want a hands-on guide to building something: the book contains code snippets, but only for demonstration purposes.
