DS520: Real-time Data Analytics Platform Based on Lambda Architectures

Leslie C Milton
From GHC With Love!
2 min read · Oct 3, 2019
Speaker: Yang Zhou

Summary: Traditionally, our ability to deliver insights from data has been hindered by the latency of periodically copying data to data warehouses. Yang Zhou explained how to build a real-time analytics platform based on the lambda architecture that delivers data-driven insights and lets teams act on them in near real time.

Yang works for Intuit as a software engineer.

The speaker started with a data science scenario. A data scientist, Julie, noticed a gap in end-to-end technical monitoring. The gap had two causes: there was no end-to-end transaction password, and analytics were not available in real time.

We need analytics data in near real time, which calls for a scalable, distributed real-time system that can handle large volumes of data. The lambda architecture addresses this problem and is a good way to move from batch processing to real-time streaming. It has three layers: the batch layer, the serving layer, and the speed layer.

The speaker described each layer in detail.
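
To make the three layers concrete, here is a minimal sketch in plain Python. It is not code from the talk: the class name `LambdaCounter` and its methods are hypothetical, and it assumes the usual description of the lambda architecture, in which the batch layer recomputes views from an append-only log, the speed layer covers events the last batch run has not seen yet, and the serving layer answers queries by combining both.

```python
from collections import Counter


class LambdaCounter:
    """Toy lambda architecture that counts events per key (illustrative only)."""

    def __init__(self):
        self.master_log = []         # batch layer input: append-only raw events
        self.batch_view = Counter()  # serving layer: view precomputed by the batch job
        self.speed_view = Counter()  # speed layer: counts for events not yet batched

    def ingest(self, key):
        """Record a new event in the log and update the real-time view."""
        self.master_log.append(key)
        self.speed_view[key] += 1

    def run_batch(self):
        """Recompute the batch view from the full log, then reset the speed view."""
        self.batch_view = Counter(self.master_log)
        self.speed_view.clear()

    def query(self, key):
        """Answer a query by merging the batch view with the real-time view."""
        return self.batch_view[key] + self.speed_view[key]


platform = LambdaCounter()
for name in ["login", "invoice", "login"]:
    platform.ingest(name)
before_batch = platform.query("login")  # served entirely by the speed layer

platform.run_batch()                    # batch layer catches up with the log
platform.ingest("login")                # arrives after the batch run
after_batch = platform.query("login")   # batch view plus speed view
```

The point of the split is that queries stay correct between batch runs: the slow, complete recomputation and the fast, incremental update each cover the events the other one misses.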

The speaker also discussed Intuit’s Real-Time Analytics Platform. Source data is sent to Apache Kafka, which handles real-time message passing. Spark Streaming consumes the data from Kafka; Apache Spark is a unified analytics engine for large-scale data processing. Spark Streaming divides the data into mini-batches and sends them to the Spark engine. The data store is Cassandra, a distributed NoSQL database management system.
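
The mini-batch idea can be illustrated without a cluster. The sketch below is plain Python, not the Spark API, and the function names and sample events are made up for illustration: it groups timestamped events into fixed-length intervals and aggregates each interval on its own, which is roughly what a micro-batch streaming engine does with an incoming stream.

```python
from collections import Counter, defaultdict


def micro_batches(events, interval_seconds):
    """Group (timestamp, name) events into fixed-interval batches, ordered by time."""
    buckets = defaultdict(list)
    for timestamp, name in events:
        # Events whose timestamps fall in the same interval share a bucket.
        buckets[int(timestamp // interval_seconds)].append(name)
    return [buckets[index] for index in sorted(buckets)]


def count_per_batch(events, interval_seconds):
    """Aggregate each mini-batch independently, one Counter per batch."""
    return [Counter(batch) for batch in micro_batches(events, interval_seconds)]


# Hypothetical event stream: (seconds since start, event name).
sample_events = [(0.5, "login"), (1.2, "login"), (2.1, "invoice"), (3.9, "login")]
batch_counts = count_per_batch(sample_events, interval_seconds=2)
```

With a two-second interval, the first two events land in one batch and the last two in the next, so each aggregate is available shortly after its interval closes instead of after a nightly load.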

Customer actions in the web UI fire events that send JSON payloads to Kafka.
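
As a rough illustration of such a payload, the sketch below serializes a UI event to JSON bytes and parses it back. The field names (`user_id`, `action`, `timestamp`) and the sample values are assumptions made for this example, not Intuit's actual schema, and the step that would hand the bytes to a Kafka producer is left out.

```python
import json


def build_event(user_id, action, timestamp):
    """Serialize a UI event as UTF-8 JSON bytes (hypothetical field names)."""
    payload = {"user_id": user_id, "action": action, "timestamp": timestamp}
    return json.dumps(payload, sort_keys=True).encode("utf-8")


def parse_event(raw):
    """Decode the bytes a consumer would receive back into a dictionary."""
    return json.loads(raw.decode("utf-8"))


raw_event = build_event("u-123", "create_invoice", 1570060800)
decoded_event = parse_event(raw_event)
```

Kafka treats message values as opaque bytes, so producer and consumer only need to agree on the encoding and the field names.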

The dashboards identify and report metrics associated with key customer activities. Aggregated events are also sent to Wavefront to automate alerting and monitoring. Tableau enables slicing and dicing of events. For example, QuickBooks uses Intuit’s technology to analyze events in near real time.

Anomaly detection time has been reduced from days to seconds. With this platform, data scientists can rest easy and feel happier about their work and life.
