Reworking User Journey Analytics Platform for Scale and Savings

Gusti Ngurah Yama Adi Putra
OY! Indonesia
Aug 29, 2023 · 6 min read

Authors: Gusti Ngurah Yama Adi Putra, Ashal Farhan Wadjo

In the fast-paced world of B2B fintech, understanding user behavior through comprehensive user journey analytics is paramount. At OY! Indonesia, we embarked on a transformative journey to optimize our analytics security and infrastructure.

Part 1: Gaining complete control of our users’ data in a centralized analytical data lake

We implemented numerous initiatives and improvements to cultivate a conducive fintech environment, and among them, the tooling for data control and analytics stands out as a game-changer. Driven by the need for greater scalability, data control, and tooling efficiency, we transitioned our user journey analytics storage from Amplitude to Elasticsearch.

You can’t protect data if you don’t know where it is
https://nsrd.info/blog/2019/12/09/meme-monday/

Join us as we delve into the motivations behind our decision, the remarkable results we achieved, and the invaluable benefits that unfolded along the way. Psst, we also saved some cash thanks to this transition.

Enhancing Efficiency and Control with Elasticsearch

Efficiency and data control became key drivers for our transition from Amplitude to Elasticsearch. By centralizing our data storage and analytics processes in Elasticsearch, we embraced the principle of efficiency through centralization. Consolidating our analytics infrastructure with a single tool streamlined our workflow, eliminating the complexities of managing and synchronizing multiple systems. This centralization improved data accessibility, simplified data retrieval, and enabled optimized resource allocation, ultimately enhancing the overall efficiency of our analytics operations (a.k.a. operational cost efficiency).

Moreover, the transition to Elasticsearch empowered us with enhanced data control through ownership. By relying on a single tool, we gained full ownership over our data lifecycle. We were able to implement customized data models, apply personalized analytics techniques, and tailor our data processes to align precisely with our business requirements. We could easily collect additional data, group it by new identifiers, and store it in our private Elasticsearch storage. This comprehensive end-to-end data management ensures that our data is consumed and stored within a controlled system.

Big tech abusing my data? Nope!
https://twitter.com/TARTLEofficial/status/1461137296330264580

The Technical Process: Fluentd and Seamless Integration

To ensure a seamless transition, our backend infrastructure provided an API that utilized Fluentd to forward requests from our frontend to Elasticsearch. This streamlined the data flow, simplified event handling, and tightened the integration between the frontend and backend. Instead of creating a separate library in the frontend, we leveraged the same library used to send events to Amplitude (@amplitude/analytics-browser). This integration facilitated smooth data forwarding to Elasticsearch, maintaining data consistency and ensuring efficient data analysis.

Before — After OY! User Journey System

First, on the frontend, initialize Amplitude as usual by defining the API key and user ID. Then set the serverUrl option to the API endpoint that will process the sent events:

import { init } from '@amplitude/analytics-browser';

// Point the Amplitude SDK at our own backend instead of Amplitude's servers.
init('<api_key>', '<user_id>', {
  serverUrl: '<url_to_the_backend>',
});

With this configured, the events will be sent to our backend instead of the Amplitude API endpoint. By adjusting the library’s endpoint through the serverUrl option during initialization, we seamlessly redirected events to our Fluentd backend.
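Nothing else changes on the frontend: events are tracked with the same API as before, and only their destination differs. As a rough illustration (the event name and properties below are made up for this example, not our actual schema):

import { track } from '@amplitude/analytics-browser';

// Tracked exactly as before; the payload is now POSTed to our backend,
// which forwards it through Fluentd into Elasticsearch.
track('disbursement_created', {
  page: 'dashboard',
  status: 'success',
});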

The request now arrives at our backend service. Before the backend sends the data to Fluentd, it maps the data to the corresponding tag, which is useful for grouping our data later. Here’s how we utilize Fluentd in our backend.

import org.fluentd.logger.FluentLogger;

/** Instantiate the Fluentd logger with the FluentLogger library. */
String prefix = "oyanalytics";
String host = getFluentdHost();
int port = getFluentdPort();
FluentLogger fluentLogger = FluentLogger.getLogger(prefix, host, port);

/** Firing data to Fluentd. */
String tag = mapAmplitudeTag(amplitudeRequest);
Map<String, Object> event = constructFluentdEvent(amplitudeRequest);
fluentLogger.log(tag, event);

We use the FluentLogger library to forward the data to Fluentd. First, we define our prefix (“oyanalytics”) and assign the correct host and port (after discussing it with the DevOps team) based on the current environment (production/staging/development). Next, we instantiate the FluentLogger object with the prefix, host, and port. Now the FluentLogger is ready to fire data to Fluentd. The data is mapped into a tag that Fluentd uses to group it. Finally, we construct the data into an event and send it, together with the tag, via the FluentLogger log function.

Now, how is Fluentd able to forward the events sent from the backend to Elasticsearch? We utilize two Fluentd directives: source and match.

Fluentd Configuration

The source directive tells Fluentd where the data comes from. Once Fluentd knows where the data came from, the match directive tells it what to do with that data. Here, the tag we defined in the backend plays an important role: as its name suggests, the match directive matches the tag attached to incoming events and processes the data carrying it. For example, one tag we use is businessweb, which, combined with the oyanalytics prefix, becomes oyanalytics.businessweb. We then tell Fluentd where the matched data should be written (in this case, Elasticsearch), which host and port to use, how indices should be named, and how to buffer messages. That’s it: our data now flows reliably from the backend to Elasticsearch through Fluentd.
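For readers unfamiliar with Fluentd’s configuration syntax, here is a minimal sketch of the two directives described above. The port, host, index naming, and buffer settings are illustrative assumptions, not our production configuration.

<source>
  # Receive events fired by FluentLogger from the backend (forward input plugin).
  @type forward
  port 24224
</source>

<match oyanalytics.businessweb>
  # Write matched events to Elasticsearch (requires the fluent-plugin-elasticsearch output plugin).
  @type elasticsearch
  host <elasticsearch_host>
  port 9200
  logstash_format true
  logstash_prefix oyanalytics-businessweb
  <buffer>
    @type memory
    flush_interval 10s
  </buffer>
</match>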

This technical process resulted in significant benefits for our analytics infrastructure. We achieved streamlined event handling, eliminating the need for separate libraries and ensuring efficient data flow. Reconfiguring the Amplitude library and integrating Fluentd as the data forwarding mechanism allowed us to seamlessly bridge the frontend and backend, enabling data to be received, processed, and stored in Elasticsearch. This process enhanced data accuracy, reduced latency, and ensured a cohesive analytics ecosystem.

The Transition Journey and its Benefits

Our journey from Amplitude to Elasticsearch was driven by the pursuit of efficiency, enhanced data control, and streamlined analytics. By embracing the principles of centralization, ownership, and streamlined analytics, we harnessed the power of Elasticsearch to optimize our analytics infrastructure. The transition empowered us to unlock the potential of our user journey analytics, achieve cost savings, gain data control, and extract valuable insights.

The transition to Elasticsearch unleashed the full potential of advanced data analytics for our B2B fintech business. We witnessed a paradigm shift as we harnessed Elasticsearch’s capabilities to extract actionable insights, optimize our business processes, and stay ahead in a competitive landscape. With Elasticsearch as our foundation, we embarked on a data-driven journey that fuels innovation and empowers us to deliver great value to our clients. Last but not least, we managed to reduce our engineering operational costs by approximately Rp 455 million per year with this migration.

In conclusion, our transition from Amplitude to Elasticsearch stands as a resounding achievement in enhancing scalability and cost-effectiveness. This transformation streamlined our analytics infrastructure, fostering efficiency and data control. Our next journey is to turn the data in the Elasticsearch data lake into insightful information: the data lake will be integrated with state-of-the-art tools to extract strategic insights from it. There is still a long way ahead on our data-to-insight roadmap, so stay tuned.

Acknowledgements

Many thanks to (in alphabetical order) Dandi Diputra, Harditya Rahmat Ramadhan, Hilfi Madari Alkaff, Mohammad Nuruddin Effendi, and Rido Widi for reviewing this article.
