How Tokopedia Built Its First Graph Database

Lessons learned from developing graph technology for better fraud detection

Fahrell Giovanny
Tokopedia Product
5 min read · Feb 21, 2020


Managing a large number of data points as a technology company can get complicated. You might use that data to make faster and better-informed business decisions, to react more flexibly to uncertainty, or to raise awareness of risk so that the company can put better preventive measures in place. This is where a graph database can help.

Graph Visualization Example of Buyer and Seller Transaction in Tokopedia

What is a graph database?

A graph is a structure made of entities and the connections between them, so using a graph database means storing that structure directly in your database.

A graph database treats the relationships between data as equally essential to the data itself. As with most technologies, there are a few different ways to model the core elements of a graph database.

One of these is the property graph model, in which the data is made up of nodes, relationships, and properties (the properties hold information about the nodes and about the connections between them).
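To make the property graph model more concrete, here is a minimal in-memory sketch in Python. It is only an illustration, not how any particular graph database is implemented. The class names, the `Buyer` and `Seller` labels, the `BOUGHT_FROM` relationship, and the sample values are all assumptions made up for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str
    label: str  # hypothetical labels such as "Buyer" or "Seller"
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    start: str  # node_id of the source node
    rel_type: str  # name of the relationship
    end: str  # node_id of the target node
    properties: dict = field(default_factory=dict)


class PropertyGraph:
    """Toy property graph: nodes, relationships, and properties on both."""

    def __init__(self):
        self.nodes = {}
        self.relationships = []

    def add_node(self, node):
        self.nodes[node.node_id] = node

    def add_relationship(self, rel):
        # A relationship only makes sense between nodes that already exist.
        if rel.start not in self.nodes or rel.end not in self.nodes:
            raise KeyError("both endpoints must exist before linking them")
        self.relationships.append(rel)

    def neighbors(self, node_id, rel_type=None):
        # Follow outgoing relationships, optionally filtered by name.
        return [
            self.nodes[r.end]
            for r in self.relationships
            if r.start == node_id and (rel_type is None or r.rel_type == rel_type)
        ]


graph = PropertyGraph()
graph.add_node(Node("b1", "Buyer", {"name": "Ani"}))
graph.add_node(Node("s1", "Seller", {"shop": "Toko Maju"}))
graph.add_relationship(Relationship("b1", "BOUGHT_FROM", "s1", {"amount": 150000}))
```

Note that the relationship carries its own property (`amount`), which is what lets the connection be treated as data in its own right.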

In December 2018, Tokopedia Risk Management started working on this graph database project. We did our research and decided to use one of the exciting graph projects available (let’s call it ‘The Graph’). Long story short, things didn’t go our way. ‘The Graph’ was still far from its ideal version: we found a lot of bugs, it was not stable, and most of the time we ended up acting as its test engineers.

Under those circumstances, in May 2019, we decided to abandon ‘The Graph’.

Luckily, our graph database story did not end there. In September 2019, our Principal Engineer started researching graph database alternatives. After the failed experiment with ‘The Graph,’ we all agreed to use the most popular one.

In our experience, popularity turned out to correlate with reliability.

As time went by, we relied more and more on graph database features. Then, in 2020, Tokopedia Risk Management took a big leap of faith: our tribe decided to build a dedicated squad to explore and expand our graph database applications.

Here are some key takeaways from developing a graph database at Tokopedia.

We Have to Create a Specific Relation Name

In our early days of inserting data points into the graph database, we used only ‘HAS’ for all relations between nodes. As a result, we were unable to analyze particular data in more depth or to create an API for each kind of relation. If you are in the same position, don’t give up! Although it is hard, you can migrate all the old data and change each relation to a specific name.

Old Relation in Tokopedia Graph Database (Using Has)
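As a rough sketch of what such a migration involves, the Python snippet below renames generic ‘HAS’ relations based on the labels of the two nodes they connect. The labels, the new relation names, and the sample data are hypothetical and chosen only for illustration. A real migration would run inside the graph database itself and would need to be batched carefully.

```python
# Hypothetical mapping from (source label, target label) to a specific name.
SPECIFIC_NAMES = {
    ("Buyer", "Order"): "PLACED",
    ("Order", "Seller"): "SOLD_BY",
    ("Buyer", "Device"): "LOGGED_IN_FROM",
}


def migrate_relations(node_labels, edges):
    """Return a copy of edges with generic HAS relations renamed where possible."""
    migrated = []
    for start, rel_type, end in edges:
        if rel_type == "HAS":
            key = (node_labels[start], node_labels[end])
            # Pairs without a mapping stay as HAS so nothing is lost.
            rel_type = SPECIFIC_NAMES.get(key, "HAS")
        migrated.append((start, rel_type, end))
    return migrated


node_labels = {
    "b1": "Buyer",
    "o1": "Order",
    "s1": "Seller",
    "d1": "Device",
    "a1": "Address",
}
old_edges = [
    ("b1", "HAS", "o1"),
    ("o1", "HAS", "s1"),
    ("b1", "HAS", "d1"),
    ("b1", "HAS", "a1"),  # no mapping defined, so this one is left unchanged
]
new_edges = migrate_relations(node_labels, old_edges)
```

Once relations carry specific names, a query or API can ask for only the relations of one kind instead of filtering every ‘HAS’ edge by hand.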

Deadlock Might Happen

A deadlock is a situation in which two or more transactions are waiting for one another to give up locks. For example, one transaction might hold a lock on some rows in table ‘A’ and need to update some rows in table ‘B’ to finish, while a second transaction holds a lock on those rows in ‘B’ and is waiting for the rows in ‘A’. Neither can proceed until the other releases its lock.

TL;DR: Beware of deadlocks if you insert a lot of data into the graph database at the same time.
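One common way to reduce the risk is to make every writer take its locks in the same order. The Python sketch below simulates this with plain thread locks. It is an illustration of the idea only, not how a graph database manages locks internally, and the node names and helper functions are assumptions made up for the example.

```python
import threading
from contextlib import ExitStack

# One lock per node, standing in for the locks a database would take.
node_locks = {"A": threading.Lock(), "B": threading.Lock()}
written = []
written_guard = threading.Lock()


def write_relationship(start, end):
    # Always lock nodes in sorted order, so two writers touching the
    # same pair of nodes can never each hold one lock and wait for the other.
    with ExitStack() as stack:
        for node_id in sorted({start, end}):
            stack.enter_context(node_locks[node_id])
        with written_guard:
            written.append((start, end))


def worker(start, end, repeats):
    for _ in range(repeats):
        write_relationship(start, end)


# Two writers insert relationships in opposite directions at the same time.
threads = [
    threading.Thread(target=worker, args=("A", "B", 1000)),
    threading.Thread(target=worker, args=("B", "A", 1000)),
]
for t in threads:
    t.start()
for t in threads:
    t.join(timeout=10)
```

If each writer locked its start node first instead, the two threads could end up holding one lock each and waiting forever. When the locking order is out of your control, retrying a failed transaction after a short delay is another option.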

Unstructured Data, Different Way to Query

If you are only accustomed to SQL-style queries, you need to learn some NoSQL concepts before writing complex queries. In a nutshell, a graph database is a non-relational data management system that does not require a fixed schema and avoids joins. As a result, it will feel unfamiliar to most of us.
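To show how the way of thinking differs, the Python sketch below answers the question "which buyers are connected to this buyer through shared devices or cards?" by walking the graph hop by hop. In SQL, the same question would typically need one self-join per hop. The data, the relation names, and the `connected_within` helper are hypothetical, and a real graph database would express this in its own query language.

```python
from collections import deque

# Hypothetical sample data: buyers linked through devices and cards.
edges = [
    ("buyer_1", "USES_DEVICE", "device_9"),
    ("buyer_2", "USES_DEVICE", "device_9"),
    ("buyer_2", "USES_CARD", "card_4"),
    ("buyer_3", "USES_CARD", "card_4"),
    ("buyer_4", "USES_DEVICE", "device_7"),
]

# Build an undirected adjacency map so we can walk links in both directions.
adjacency = {}
for start, _rel_type, end in edges:
    adjacency.setdefault(start, set()).add(end)
    adjacency.setdefault(end, set()).add(start)


def connected_within(start, max_hops):
    """Breadth-first traversal: every node reachable in at most max_hops."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in adjacency.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, hops + 1))
    seen.discard(start)
    return seen


linked_buyers = sorted(
    n for n in connected_within("buyer_1", 4) if n.startswith("buyer_")
)
```

Here `buyer_3` is found even though it shares nothing directly with `buyer_1`. The link runs through `buyer_2`, which is the kind of indirect connection that matters in fraud detection.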

Resource Intensive

One of the real challenges of graph databases is that they are not efficient at handling queries that span the entire database. A graph database is also not optimized for storing and retrieving massive amounts of transactional data. This makes heavy workloads tricky to handle, and we need to add more resources (servers, RAM, etc.) to cope with them.

Community vs. Enterprise

Some graph databases come in both a community and an enterprise edition. For example, one of them offers a community edition, a fully featured graph database under the GPL v3 license, and an enterprise edition designed for commercial deployments where scale and availability are necessary. Put simply, you can build your graph database on the open-source community edition, but it will require more effort than using the enterprise edition.

Data Can Be Dangerous

If you store user data and information, you have to be appropriately transparent about how it is accessed. Log all activity involving the graph database (e.g., every query and who ran it) so that each person with access can be held accountable for their actions. Integrity can sometimes be hard to maintain, but enforced transparency makes it much easier.
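A minimal sketch of this kind of logging in Python could look like the following. The `run_audited` wrapper, the `fake_execute` stand-in for a database call, and the fields in each log entry are assumptions for illustration. In practice the log should go to durable storage that the people being audited cannot modify.

```python
from datetime import datetime, timezone

# In-memory stand-in for a durable, append-only audit store.
audit_log = []


def run_audited(user, query, execute):
    """Record who ran which query and when, then run it."""
    entry = {
        "user": user,
        "query": query,
        "at": datetime.now(timezone.utc).isoformat(),
        "status": "started",
    }
    # Log before executing so that failed attempts are recorded too.
    audit_log.append(entry)
    try:
        result = execute(query)
    except Exception:
        entry["status"] = "failed"
        raise
    entry["status"] = "ok"
    return result


def fake_execute(query):
    # Hypothetical stand-in for the real call into the graph database.
    return ["row-1", "row-2"]


rows = run_audited("analyst_01", "find accounts linked to device_9", fake_execute)
```

Routing every query through a single wrapper like this means nobody has to remember to log, which is what makes the transparency enforced and not optional.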

Limitless Use Case

When you are not familiar with graph databases, your impression of them might be blurry. But once you have analyzed, processed, and built graph data yourself, you will find along the way that many solutions are already sitting there in the graph, waiting to be discovered.

Graph databases are commonly used as fraud detection engines (as in our case). But developers all around the world now also use them for real-time recommendation engines, identity management, social networks, artificial intelligence, and knowledge graphs. A well-known example of a graph database use case is the Panama Papers investigation, in which journalists used graph technology to trace offshore companies and accounts linked to politicians, celebrities, and other public figures.

Panama Papers Graph Visualization

No matter what stage your business is at, you don’t have to be afraid of experimenting with a graph database. The unique insights that a graph database provides can help your company learn and grow significantly faster than before.

I would love to hear about the experiences and ideas you have come across while developing graph databases.
Feel free to contact me at fahrell.giovanny@tokopedia.com

Special shoutout to:
Alluka Team — Yuga, Rian, and Josua
‘Technical Architect’ — Zaki and ‘Principal Engineer’ — Tjandrayana
‘Risk Management Leader’ — Fandy Soejanto and Faisal Arif

