Santander’s Big Data Architecture

Notes from Cloudera’s blog post on Santander’s architecture, built with HBase, Flume, Kafka, RocksDB, and Spark.

Flafka (Flume + Kafka) is used to enrich the incoming stream data and write it to multiple channels.
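The enrich-once, fan-out-to-many pattern can be sketched as below. This is a minimal illustration, not Flume/Kafka code: plain queues stand in for Flume channels / Kafka topics, and the enrichment step is a placeholder (all names are mine, not from the article).

```python
from queue import Queue

def fan_out(event, channels):
    """Enrich one incoming event, then deliver a copy to every channel.

    Sketch of the Flafka pattern: enrichment happens once per event,
    and each downstream channel receives its own copy.
    """
    enriched = dict(event, enriched=True)   # placeholder enrichment step
    for ch in channels:
        ch.put(dict(enriched))              # each channel gets an independent copy
    return enriched

# usage: two downstream channels both receive the enriched event
hbase_ch, kafka_ch = Queue(), Queue()
fan_out({"txn_id": "t1", "amount": 42}, [hbase_ch, kafka_ch])
```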

Enrichment & Ingestion Using Flume & Kafka
  • Each Flume agent runs its own local RocksDB instance to host the metadata used for enrichment. RocksDB is refreshed hourly with output computed by Spark. Keeping RocksDB local gives higher performance by avoiding per-event RPC calls.
  • Custom Flume enrichment interceptors process events from upstream, transform them using the metadata in RocksDB, and write the results to a Kafka topic.
  • HBase: an Observer coprocessor was used to compute derived data on each put and store it under a different row key. This has several issues: 1. since the derived data is stored in a different row, atomicity cannot be guaranteed; 2. every transaction has to read the previously computed data and update it with the latest results, which cut write throughput to roughly 1/20th.
  • To avoid these problems, instead of precomputing and storing derived values, they are computed on the fly by an Endpoint coprocessor running on the HBase RegionServer.
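The interceptor-plus-local-store idea above can be sketched as follows. This is an illustrative sketch, not the Flume Interceptor API: a plain dict stands in for the node-local RocksDB instance, and the field names (`customer_id`, `segment`, and so on) are hypothetical.

```python
class EnrichmentInterceptor:
    """Sketch of a custom enrichment interceptor: look up metadata in a
    node-local store and merge it into the event before it goes to Kafka.
    """

    def __init__(self, local_store):
        # local_store: node-local metadata, refreshed hourly from Spark output;
        # a dict stands in for RocksDB here
        self.local_store = local_store

    def intercept(self, event):
        # a local lookup avoids an RPC per event, which is the performance win
        meta = self.local_store.get(event.get("customer_id"), {})
        return {**event, **meta}   # enriched event, ready for the Kafka topic

# usage
store = {"c1": {"segment": "retail", "branch": "madrid"}}
icpt = EnrichmentInterceptor(store)
out = icpt.intercept({"customer_id": "c1", "amount": 100})
```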
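The observer-vs-endpoint trade-off can be illustrated with a small sketch. Instead of writing a precomputed row on every put (the Observer approach, which breaks atomicity across rows and costs a read per write), an Endpoint-style call computes the aggregate on demand, server-side, over the raw rows. A dict keyed by row key stands in for an HBase region; the row-key layout and field names are assumptions.

```python
def endpoint_aggregate(region_rows, row_prefix):
    """Compute a derived aggregate on the fly over raw rows, the way an
    Endpoint coprocessor would on a RegionServer: raw puts stay untouched,
    and the derived value is computed only when requested.
    """
    # scan only the rows under the requested prefix, as a region-side scan would
    values = [v["amount"] for k, v in region_rows.items() if k.startswith(row_prefix)]
    return {"count": len(values), "total": sum(values)}

# usage: raw puts go in as-is; the derived total is computed at read time
region = {
    "c1#t1": {"amount": 10},
    "c1#t2": {"amount": 5},
    "c2#t1": {"amount": 7},
}
agg = endpoint_aggregate(region, "c1#")
```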

Note to self to read further:
