Nearly Real-Time Event Tracking and Analysis with Clojure, AWS Lambda, and AWS Athena

S Agung Wijaya
Published in HappyTech
Nov 22, 2017

A couple of months ago, our team at Happyfresh received a story request to track user behavior in one of the features of the Happyfresh app. We estimated the tracked data at up to 90 rows per order, whether or not the order was completed. The data would later be used for analytics by the Business Intelligence team and the data scientists, most likely to help them decide on the next strategy and feature development.

At first, the team suggested using Segment, since most of our event tracking features are built on it and it is really easy to use. But there were a couple of reasons not to build this one on it. The main reason is that this tracking mostly uses a dynamic payload and we need a schema-on-read strategy, so building a custom processor seemed like the better solution. So we decided to develop our own service.

Infrastructure

This service has only one job: to store the tracking data somewhere. That is why we thought it could be developed as a serverless service hosted on AWS Lambda, triggered by Amazon SNS, and written in Clojure. For storage, we use Amazon S3 because it scales and, for our use case, handles large volumes of data and concurrent writes better than an RDBMS. Lastly, we use AWS Athena as the analytics tool, since it lets us query the data on S3 using SQL.

Figure: Service Infrastructure
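To make the flow a little more concrete, here is a minimal sketch of the part that decides where each SNS message lands on S3. It is not our production code. It assumes the Lambda event has already been parsed into a Clojure map with string keys, and the `tracking` prefix, the `dt=` date partition layout, and the function names are all hypothetical choices for illustration. The message body is kept as the raw string, which is what schema-on-read needs; the actual upload to S3 is left out.

```clojure
(ns tracking.storage-sketch
  (:import (java.time Instant ZoneOffset)
           (java.time.format DateTimeFormatter)))

;; Formats an Instant as a UTC calendar date, e.g. 2017-11-22.
(def ^DateTimeFormatter day-format
  (.withZone DateTimeFormatter/ISO_LOCAL_DATE ZoneOffset/UTC))

(defn object-key
  "Builds an S3 object key partitioned by day (hypothetical layout)."
  [prefix ^Instant received-at message-id]
  (str prefix "/dt=" (.format day-format received-at) "/" message-id ".json"))

(defn storage-plan
  "Turns an SNS-triggered Lambda event into a seq of {:key :body} maps.
   The body is the untouched SNS message string, so no schema is imposed
   at write time."
  [event]
  (for [record (get event "Records")
        :let [sns (get record "Sns")]]
    {:key  (object-key "tracking"
                       (Instant/parse (str (get sns "Timestamp")))
                       (get sns "MessageId"))
     :body (get sns "Message")}))
```

Each map in the result could then be handed to whatever S3 client the service uses. Writing one object per message under a date prefix is only one possible layout, but it keeps concurrent Lambda invocations from ever touching the same object.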

Why Clojure?

The first reason is that, at the time we developed the service, AWS Lambda already supported JVM-based applications, and Clojure runs on the JVM. Another reason is that it has good interop with Java classes. Because of this, we can easily use the huge collection of Maven libraries without having to look for equivalent libraries written in Clojure. We can also look things up in the Java documentation and then write the code the Clojure way.
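As a small illustration of that interop, the sketch below compresses a payload with the JDK's own `java.util.zip` classes, called directly from Clojure. The namespace and function names are hypothetical, and compressing payloads is just an example task here, not something the service is described as doing.

```clojure
(ns tracking.interop-sketch
  (:import (java.io ByteArrayInputStream ByteArrayOutputStream)
           (java.util.zip GZIPInputStream GZIPOutputStream)))

(defn gzip-bytes
  "Compresses a string to gzip bytes using only JDK classes."
  [^String s]
  (let [out (ByteArrayOutputStream.)]
    ;; with-open closes the gzip stream, which flushes the trailer into out.
    (with-open [gz (GZIPOutputStream. out)]
      (.write gz (.getBytes s "UTF-8")))
    (.toByteArray out)))

(defn gunzip-string
  "Reads gzip bytes back into a UTF-8 string."
  [^bytes bs]
  (with-open [gz (GZIPInputStream. (ByteArrayInputStream. bs))]
    (slurp gz :encoding "UTF-8")))
```

The Java documentation for `GZIPOutputStream` is all we needed to write this; the Clojure side is just constructor calls (`ClassName.`) and method calls (`.method`) wrapped in `with-open`.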

Conclusion

The service is doing well so far, but one of its biggest limitations is that it handles only one specific event. For other event tracking, we are therefore still using Segment. If we have time, we might make it more generic and build some client libraries to be used in our backend.
