Shanker Sneh
Jul 22, 2017 · 1 min read

The second Lambda splits the click data into individual events, groups them into Hive-friendly partitions (by date), and pushes them to S3.
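As a rough illustration of the split-and-partition step, here is a minimal sketch. It assumes the click data arrives as newline-delimited JSON and that each event carries a `timestamp` field in epoch seconds; the payload format, the field name, the `clicks` prefix, and all function names are hypothetical, not taken from the actual Lambda. The upload to S3 is left out so the sketch runs on its own.

```python
import json
from collections import defaultdict
from datetime import datetime, timezone
from typing import Dict, List


def split_events(payload: str) -> List[dict]:
    """Split a newline-delimited JSON payload into individual events (assumed format)."""
    return [json.loads(line) for line in payload.splitlines() if line.strip()]


def partition_prefix(event: dict, base: str = "clicks") -> str:
    """Build a Hive-style key prefix such as 'clicks/date=2017-07-22/'.

    Assumes a hypothetical 'timestamp' field holding epoch seconds (UTC).
    """
    day = datetime.fromtimestamp(event["timestamp"], tz=timezone.utc).strftime("%Y-%m-%d")
    return f"{base}/date={day}/"


def group_by_partition(events: List[dict]) -> Dict[str, List[dict]]:
    """Group events by their date partition so each group can be written under one prefix."""
    groups: Dict[str, List[dict]] = defaultdict(list)
    for event in events:
        groups[partition_prefix(event)].append(event)
    return dict(groups)
```

Each group could then be written to S3 under its prefix, so that Hive sees one directory per date.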

The MySQL RDS instance is used to log the S3 keys written by the Lambda; this decouples the producer from the consumers.

The Lambda can keep writing to S3 in real time, while different consumers read the S3 keys from MySQL at their own frequency, work on them, and maintain their own checkpoints for processing.
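The key log and per-consumer checkpoints could look roughly like the sketch below. It uses Python's built-in `sqlite3` as a stand-in for MySQL so that it runs without a server; the table names (`s3_keys`, `checkpoints`), columns, and function names are assumptions made for illustration, not the actual schema.

```python
import sqlite3


def init_schema(conn: sqlite3.Connection) -> None:
    """Create the hypothetical key-log and checkpoint tables."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS s3_keys ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, s3_key TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS checkpoints ("
        "consumer TEXT PRIMARY KEY, last_id INTEGER NOT NULL)"
    )
    conn.commit()


def log_key(conn: sqlite3.Connection, s3_key: str) -> None:
    """Producer side: record an S3 key after the object has been written."""
    conn.execute("INSERT INTO s3_keys (s3_key) VALUES (?)", (s3_key,))
    conn.commit()


def fetch_new_keys(conn: sqlite3.Connection, consumer: str, limit: int = 100):
    """Consumer side: return (id, s3_key) rows logged after this consumer's checkpoint."""
    row = conn.execute(
        "SELECT last_id FROM checkpoints WHERE consumer = ?", (consumer,)
    ).fetchone()
    last_id = row[0] if row else 0  # a consumer with no checkpoint starts from the beginning
    return conn.execute(
        "SELECT id, s3_key FROM s3_keys WHERE id > ? ORDER BY id LIMIT ?",
        (last_id, limit),
    ).fetchall()


def commit_checkpoint(conn: sqlite3.Connection, consumer: str, last_id: int) -> None:
    """Consumer side: persist the id of the last key that was fully processed."""
    conn.execute(
        "INSERT OR REPLACE INTO checkpoints (consumer, last_id) VALUES (?, ?)",
        (consumer, last_id),
    )
    conn.commit()
```

Because each consumer only advances its own row in `checkpoints`, a slow consumer never blocks the producer or the other consumers.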
