The OLX People Contingency To Trim Cost & Own Data

Raghav N
Building Aasaanjobs (An Olx Group Company)
May 1, 2020
Image source: Snowplow Analytics

“Change is the only constant.”

This adage holds true in any field, and especially in the software industry. In this modern technological era, building flawless software products or applications is a necessity. That said, knowing your customers or end users and how to improve their experience is equally important. For this, you need metrics that measure how customers use your products.

Many products on the market help you understand how customers interact with your products by capturing additional data.

The OLX People “Way” of Gathering Metrics

At OLX People, we use a proprietary analytics product to analyze user behavior across OLX People products and applications (website and app) at the same time.

It presents a range of product usage metrics in a well-crafted UI. But all this comes at a cost. Most products on the market, including the proprietary product we currently use, are expensive, and the cost grows as the customer base grows.

On top of this, the data is stored in external systems. We can bring this data into our own systems, but that requires another application to transfer it, which increases the cost even further.

Being a data-driven company, we explored open-source options to lower the cost and own the data. We tried Snowplow, an open-source analytics platform, which checked all our boxes. Understanding and implementing Snowplow took some learning.

The Snowplow Pipeline

Trackers are client-side or server-side libraries that track customer behavior by sending Snowplow events to a collector.
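To make the tracker step concrete, here is a minimal sketch of what a tracker does when it reports a page view: it packs the event into query parameters and requests a collector URL. The short parameter names (e, url, page, aid, p, tv, eid, dtm) follow the Snowplow tracker protocol as we understand it. The collector host, the app ID, and the function name are hypothetical, and the sketch is in Python purely for illustration, since a real tracker library does much more (batching, retries, cookies).

```python
import time
import uuid
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical collector endpoint; "/i" is assumed to be the GET path.
COLLECTOR_URL = "https://collector.example.com/i"


def build_page_view_url(page_url, page_title, app_id, collector_url=COLLECTOR_URL):
    """Return the GET URL a tracker could request to report one page view."""
    params = {
        "e": "pv",                            # event type: page view
        "url": page_url,                      # address of the page being viewed
        "page": page_title,                   # title of the page
        "aid": app_id,                        # application ID
        "p": "web",                           # platform
        "tv": "py-sketch-0.1",                # tracker version (made up for this sketch)
        "eid": str(uuid.uuid4()),             # unique event ID
        "dtm": str(int(time.time() * 1000)),  # device timestamp in milliseconds
    }
    return f"{collector_url}?{urlencode(params)}"


# Usage: build the request URL and read one parameter back out of it.
request_url = build_page_view_url("https://example.com/jobs", "Jobs", "candidate-web")
print(parse_qs(urlsplit(request_url).query)["e"])
```

The sketch only builds the URL. Sending it is a plain HTTP GET, which keeps the example free of network calls.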

Collectors receive data in the form of GET or POST requests from the trackers and write the data to logs.
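The collector's job can be sketched in a few lines: accept a GET or POST request, stamp it with the time it arrived, and append it to a log. This is a simplified illustration, not the real Snowplow collector, which uses its own payload format and writes to a stream or log files. The function name, the JSON-lines log format, and the assumption that a POST body carries its events in a "data" list are ours.

```python
import io
import json
from datetime import datetime, timezone
from urllib.parse import parse_qs


def collect(method, query_string, body, log, now=None):
    """Append one log line per received event and return how many were written."""
    if method == "GET":
        # A GET request carries a single event in its query string.
        events = [{key: values[0] for key, values in parse_qs(query_string).items()}]
    elif method == "POST":
        # A POST request is assumed to carry a batch of events under "data".
        events = json.loads(body)["data"]
    else:
        raise ValueError(f"unsupported method: {method}")

    received_at = (now or datetime.now(timezone.utc)).isoformat()
    for event in events:
        log.write(json.dumps({"collector_tstamp": received_at, "payload": event}) + "\n")
    return len(events)


# Usage: an in-memory buffer stands in for the log file.
log = io.StringIO()
collect("GET", "e=pv&aid=candidate-web", None, log)
collect("POST", None, json.dumps({"data": [{"e": "pv"}, {"e": "se"}]}), log)
print(log.getvalue())
```

Note that the collector does no validation at this stage. It records what it receives and leaves the clean-up to the next step.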

The Snowplow Enrichment step takes the raw log files generated by the Snowplow collectors, tidies up the data, and enriches it so that it is ready to be analyzed and uploaded into some storage.
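As a rough illustration of "tidy up and enrich", the sketch below renames the short tracker parameters to readable field names and adds a country code by looking up the user's IP address in a table. The lookup table is a toy stand-in for an external IP database and uses documentation-only IP ranges. The function name and the mappings are assumptions for this sketch, not Snowplow's actual enrichment code.

```python
import ipaddress

# Short tracker parameter -> readable field name (a small, assumed subset).
FIELD_NAMES = {"e": "event", "aid": "app_id", "url": "page_url", "p": "platform"}

# Short event code -> readable event name.
EVENT_NAMES = {"pv": "page_view", "se": "struct", "ue": "unstruct"}

# Toy stand-in for an external IP database (documentation-only IP ranges).
GEO_TABLE = [
    (ipaddress.ip_network("203.0.113.0/24"), "IN"),
    (ipaddress.ip_network("198.51.100.0/24"), "NL"),
]


def enrich(raw_event, user_ip):
    """Return a tidied copy of raw_event with a geo_country field added."""
    event = {FIELD_NAMES.get(key, key): value for key, value in raw_event.items()}
    event["event"] = EVENT_NAMES.get(event.get("event"), event.get("event"))

    address = ipaddress.ip_address(user_ip)
    event["geo_country"] = next(
        (country for network, country in GEO_TABLE if address in network),
        None,  # unknown IP ranges get no country
    )
    return event


# Usage: one raw page view from an address inside the first toy range.
print(enrich({"e": "pv", "aid": "candidate-web"}, "203.0.113.7"))
```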

From this point, you take the stored data into data modeling and analytics tools to crunch and analyze it.
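The kind of question you ask at this stage can be as simple as "which pages are viewed most?". The sketch below answers it over a handful of made-up enriched events. In practice this would be a query in the warehouse rather than Python, and the field names (event, page_url) and sample rows are assumptions for illustration.

```python
from collections import Counter


def page_views_per_url(events):
    """Count page_view events per page_url, ignoring other event types."""
    return Counter(
        event["page_url"]
        for event in events
        if event.get("event") == "page_view" and "page_url" in event
    )


# Usage: hypothetical enriched events, not real data.
sample_events = [
    {"event": "page_view", "page_url": "/jobs"},
    {"event": "page_view", "page_url": "/profile"},
    {"event": "struct", "page_url": "/jobs"},
    {"event": "page_view", "page_url": "/jobs"},
]
print(page_views_per_url(sample_events).most_common())
```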

We have a hybrid setup: AWS hosts all our infrastructure, and BigQuery handles data analysis. We started this experiment with Terraform to ease the process of setting up the infrastructure on AWS. We used website and mobile data from the candidate platform so that we could compare the results from our current proprietary product with those from Snowplow.

Below is the technology stack we used for Snowplow.

Tracker: JavaScript tracker

Collector: AWS EC2

Enrich: Kinesis Data Firehose, Lambda, and some external databases for IP addresses

Storage: S3, BigQuery

By default, Snowplow captures a lot of data. Its enrichment process then goes a step further and adds information, for example by looking up IP addresses in external databases. Snowplow also lets us capture user-defined data, aka unstructured events. For these, we have to define custom schemas in Iglu (the Snowplow schema registry).
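To give a feel for what a custom schema looks like, here is a sketch of a self-describing JSON Schema in the style Iglu expects, together with an event that refers to it by its Iglu URI. The vendor (com.example), the schema name (job_apply), and its fields are hypothetical and are not the schemas used at OLX People. The validation here is a deliberately tiny stand-in for full JSON Schema validation.

```python
# Hypothetical custom schema: vendor, name, and fields are made up for illustration.
JOB_APPLY_SCHEMA = {
    "$schema": "http://iglucentral.com/schemas/com.snowplowanalytics.self-desc/schema/jsonschema/1-0-0#",
    "self": {
        "vendor": "com.example",
        "name": "job_apply",
        "format": "jsonschema",
        "version": "1-0-0",
    },
    "type": "object",
    "properties": {
        "job_id": {"type": "string"},
        "source": {"type": "string"},
    },
    "required": ["job_id"],
    "additionalProperties": False,
}


def iglu_uri(schema):
    """Build the iglu: URI that events use to point at this schema."""
    meta = schema["self"]
    return f"iglu:{meta['vendor']}/{meta['name']}/{meta['format']}/{meta['version']}"


def self_describing_event(schema, data):
    """Wrap data with its schema URI after a minimal required/unknown field check."""
    missing = [key for key in schema.get("required", []) if key not in data]
    unknown = [key for key in data if key not in schema["properties"]]
    if missing or unknown:
        raise ValueError(f"missing fields: {missing}, unknown fields: {unknown}")
    return {"schema": iglu_uri(schema), "data": data}


# Usage: a user-defined event that carries its own schema reference.
print(self_describing_event(JOB_APPLY_SCHEMA, {"job_id": "J-123", "source": "web"}))
```

Because every event names the exact schema version it was built against, the pipeline can validate it on arrival, and the schema can evolve by publishing a new version rather than breaking old events.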

By doing all this, we managed to reduce the cost of the process to one-tenth and to own the data. We can now plug this data into an external UI or build a custom one. This is just the beginning of a new journey in data analysis.

Why work with OLX People?

We don’t give you a job; we build your career and offer an opportunity to work with the brightest minds.

Join our team. Forward your CV to ta@olxpeople.com
