ArangoDB for Massive Scalability and High Availability at redBus

Personalization for Customer Engagement on redBus

Arun Parmar
redbus India Blog
7 min read · Aug 18, 2023

- Personalization, known as Perz, is crucial for engaging and retaining customers on the redBus platform.
- The Perz system performs computations to create customized experiences and evaluate offers for different customer cohorts.
- It shows relevant information based on customers’ journey history, including recent searches, purchases, and recommendations.
- Choosing the right technology and datastore is challenging but essential for scalability and efficiency.
- After trying various datastores, we selected ArangoDB as the best fit for the redBus use case.

WHY?

Native Multi Model Approach

- When the business use case requires a database that can store documents, hold normalized data, and build graphs, it is best to choose a single solution that covers the majority of these cases.
- ArangoDB efficiently handles data consistency, offers low cost, and ensures high availability, making it a suitable choice for such use cases.
- ArangoDB's native multi-model support allows data to be stored as documents, key-value pairs, and graphs.
- The unique advantage of ArangoDB is its ability to use one core and one query language while supporting multiple data models.
- With ArangoDB, you can seamlessly access and query data across different models, enabling the development of high-performance applications.
- Additionally, ArangoDB allows for horizontal scaling, leveraging the full extent of all data models.

For installation and deployment, please refer to the official ArangoDB website.

What you get once the setup is ready

A) A nice dashboard with useful stats such as:

  1. Storage engine in use (rocksdb).
  2. DB uptime since the last service restart.
  3. Edition and version.
  4. Read/write requests per second.
  5. Connections and data sync.
  6. Data transfer in bytes/sec.
  7. Memory.
  8. CPU.

B) ArangoDB Query Language (AQL)

AQL (ArangoDB Query Language) offers an efficient approach to writing complex queries for data retrieval. One notable advantage is that the language and syntax remain consistent across different clients, regardless of the programming language being used.

However, it’s important to note that AQL does not allow for the deletion or dropping of collections, databases, or indexes. Nevertheless, you can still manipulate the data within the collections using AQL.

AQL DashBoard

C) Databases

ArangoDB is a database that supports multiple data models such as key-value, document, and graph. However, a single query cannot span multiple databases on the same server, so aggregations across databases are not possible. Each database can hold data in different forms, such as key-value pairs, document collections, or graphs.

To enhance functionality, ArangoDB utilizes a JavaScript framework called Foxx. This framework enables the development of data-centric HTTP microservices that can run directly inside ArangoDB.

When naming databases in ArangoDB, note that non-ASCII characters are not allowed. Only letters (a-z), digits (0–9), hyphens (-), and underscores (_) can be used in database names. ArangoDB also ships with a default database named “_system”.
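The naming rule above can be checked before a database is created. The following is a minimal sketch that encodes only the character rule stated in this section; the function name and the sample names are hypothetical, and accepting upper-case letters is an assumption on top of the text.

```python
import re

# Pattern built from the rule stated above: ASCII letters, digits, hyphen, underscore.
# Accepting upper-case letters is an assumption; the text only lists a-z.
_DB_NAME_PATTERN = re.compile(r"[A-Za-z0-9_-]+")


def is_valid_db_name(name: str) -> bool:
    """Return True if `name` uses only the characters allowed for database names."""
    return _DB_NAME_PATTERN.fullmatch(name) is not None


print(is_valid_db_name("perz_profiles"))  # ASCII letters and an underscore only
print(is_valid_db_name("perz-données"))   # contains a non-ASCII character
```

A check like this is cheap to run on the client side, although the server remains the final authority on which names it accepts.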

D) ArangoDB can be accessed via a REST service

When accessing ArangoDB through the REST (HTTP) API, the URL path should contain the database name; otherwise the request goes to the _system database by default. For example: http://127.0.0.1:8529/_db/dbname/…...
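As a small illustration of the database-scoped path, the sketch below only builds the URL and does not send any request. The helper name and the database name "perz" are hypothetical; `/_api/version` is used as an example endpoint.

```python
from urllib.parse import quote

DEFAULT_DB = "_system"  # database that receives requests when none is named


def db_scoped_url(base_url: str, api_path: str, db_name: str = DEFAULT_DB) -> str:
    """Build a URL of the form <base>/_db/<db_name>/<api_path>."""
    return f"{base_url.rstrip('/')}/_db/{quote(db_name, safe='')}/{api_path.lstrip('/')}"


# "perz" is a hypothetical database name used only for illustration.
print(db_scoped_url("http://127.0.0.1:8529", "/_api/version", "perz"))
```

Making the database name an explicit argument with `_system` as the default mirrors the server-side fallback described above, so a forgotten name is at least visible in the code.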

A lot happens internally, such as how data is stored on disk, but those details are readily available on the official ArangoDB website.

E) Collection

Data is stored as documents in document collections. In a graph, the nodes are stored in vertex collections, and the connections between vertices are stored in edge collections. The documents in an edge collection have a _from and a _to attribute that reference other documents by their ID.
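To make the shape of these documents concrete, here is a minimal sketch in plain Python dictionaries. The collection names ("users", "routes"), the attribute values, and both helper functions are hypothetical and only illustrate how _from and _to carry document IDs.

```python
def doc_id(collection: str, key: str) -> str:
    """A document ID is the collection name and the document key joined by '/'."""
    return f"{collection}/{key}"


def make_edge(from_id: str, to_id: str, **attributes) -> dict:
    """Build an edge document; _from and _to hold the IDs of the linked vertices."""
    return {"_from": from_id, "_to": to_id, **attributes}


# Hypothetical vertex documents from two vertex collections, "users" and "routes".
user = {"_key": "u1", "name": "Asha"}
route = {"_key": "blr-chn", "source": "Bengaluru", "destination": "Chennai"}

# An edge document connecting the user to the route.
searched = make_edge(doc_id("users", user["_key"]), doc_id("routes", route["_key"]), isValid=True)
print(searched)
```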

A simple query to read documents from a collection. The @ symbol introduces a bind parameter; a collection name is bound with a double @@.

FOR profile IN @@profilecollection
FILTER profile.`name` == @value
RETURN profile
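When this query is sent over HTTP, the bind parameters travel alongside the query text. The sketch below builds such a JSON body (a query string plus a bindVars object) without sending it; the collection name "profiles" and the value "Asha" are made up for illustration.

```python
import json

AQL = """
FOR profile IN @@profilecollection
  FILTER profile.`name` == @value
  RETURN profile
"""


def cursor_body(collection: str, value: str) -> str:
    """Serialize the query and its bind parameters as a JSON request body."""
    payload = {
        "query": AQL,
        "bindVars": {
            # Collection bind parameter: @@profilecollection in AQL, "@profilecollection" as key.
            "@profilecollection": collection,
            # Value bind parameter: @value in AQL, "value" as key.
            "value": value,
        },
    }
    return json.dumps(payload)


# "profiles" and "Asha" are hypothetical values.
print(cursor_body("profiles", "Asha"))
```

Keeping values out of the query string this way avoids string concatenation and lets the same query text be reused with different parameters.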

Queries and operators

The ternary operator in a query. The LET statement assigns an arbitrary value to a variable.

let a = true ? (return true) : (return false)
return a

The query below shows how the native multi-model approach works across different data models and collections in a single query.

The main keywords and functions used in it are LET, DOCUMENT, and FILTER.

let today = date_now() // milliseconds since the Unix epoch
let lastDay = today - 7 * 24 * 60 * 60 * 1000 // seven days earlier, also in milliseconds
let firstDoc = document("collname", "uniqueKey")
let firstData = (for object in collectionname
filter object.primarykey == "value for that key"
return object)[0]
let withcountbasedondatacount = (filter count(firstData.somekey) > 0
for c in firstData.somekey filter c.time > today
sort c.time
return c)
// the subquery above checks count > 0, then filters on a condition and sorts the result


let somedata = ( for x in collection filter x.indexcolumn == somevalue
sort x.indexcolumn desc return
{
"a":x.somecolumnwhichisavailable ,
"b": x.somecolumnwhichisavailable,
"c": x.somecolumnwhichisavailable,
"d": x.somecolumnwhichisavailable,
"e": x.somecolumnwhichisavailable
})

// the code above returns data in a custom format, filtered by a condition

let somedata1 = ( for x in collection filter x.indexcolumn == somevalue
collect somevariablex = x.somecolumnwhichisavailable,
somevariabley=x.somecolumnwhichisavailable
into groups
return {
"somevariablex":somevariablex,
"somevariabley":somevariabley,
"groupvariabley":groups[0].x.somecolumnwhichisavailable
})

// The COLLECT operation can be used to group data by one or multiple group criteria.
// It can also be used to retrieve all distinct values, count how often values occur,
// and calculate statistical properties efficiently. The COLLECT statement will eliminate
// all local variables in the current scope. After COLLECT only the variables introduced by COLLECT itself are available.


let somedata2 = count(somedata[0].items) >0 ? document(somedata[0].items) : document(document("default").items[*])
// This is an example of the ternary operator

let somedata3 = (for g in collection filter g.key== "someuniquekey"
for v, e, p in 1..1 INBOUND g._id edgecollectionname
filter e.isValid == true
return v._key
)

// This is an example of a graph traversal in AQL, where v is the vertex, e is the edge, and p is the path
// the direction can be INBOUND or OUTBOUND

let somedata4 = length(mappedUserData) > 0 ? UNIQUE(APPEND(userPrev, FLATTEN(mappedUserData[*].prev))) : userPrev
// This is an example of checking the length, applying UNIQUE, and flattening an array in AQL


return {
"data1":firstDoc,
"data2":firstData,
"data3":withcountbasedondatacount,
"data4":somedata,
"data5":somedata1,
"data6":somedata2,
"data7":somedata3,
"data8":somedata4
}

In the query above, you can see several things happening in a single query.
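For readers more at home in Python, the grouping step (COLLECT ... INTO) behaves roughly like grouping rows by a tuple of keys while keeping the members of each group. This is only an analogy run on made-up in-memory rows, not how ArangoDB executes the statement; the field names and the helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical rows standing in for the documents of a collection.
rows = [
    {"city": "Pune", "operator": "A", "fare": 450},
    {"city": "Pune", "operator": "A", "fare": 500},
    {"city": "Goa", "operator": "B", "fare": 900},
]


def collect_into(docs, keys):
    """Group docs by the given keys, keeping the members of each group."""
    groups = defaultdict(list)
    for doc in docs:
        groups[tuple(doc[k] for k in keys)].append(doc)
    # One result per distinct key combination (sorted here only to make the output deterministic).
    return [
        {**dict(zip(keys, key)), "first_fare": members[0]["fare"], "count": len(members)}
        for key, members in sorted(groups.items())
    ]


for group in collect_into(rows, ["city", "operator"]):
    print(group)
```

As in the AQL version, only the grouping keys and the group members are available after the grouping step, which is why the first member's value is read from the group rather than from the original row variable.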

In a graph traversal you can use a named graph. The variables v, e, p stand for vertex, edge, and path (the path holds both vertices and edges).

IN min..max: the minimal and maximal depth for the traversal. max cannot be specified without min.

OUTBOUND | INBOUND | ANY: follow outgoing edges, incoming edges, or edges pointing in either direction in the traversal.

for g in collection filter g.key == "someuniquekey"
for v, e, p in 1..1 INBOUND g._id edgecollectionname
filter e.isValid == true
return v._key
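To see what a depth-1 INBOUND step with an edge filter selects, the following sketch mimics it over a handful of in-memory edge documents. It is a plain-Python illustration of the semantics, not ArangoDB code; the document IDs and the helper name are hypothetical.

```python
# Hypothetical edge documents; each links two vertices through _from and _to.
edges = [
    {"_from": "users/u1", "_to": "routes/r1", "isValid": True},
    {"_from": "users/u2", "_to": "routes/r1", "isValid": False},
    {"_from": "users/u3", "_to": "routes/r1", "isValid": True},
    {"_from": "users/u1", "_to": "routes/r2", "isValid": True},
]


def inbound_neighbors(start_id: str, edge_docs: list) -> list:
    """Return keys of vertices one hop away along valid edges that point at start_id."""
    keys = []
    for edge in edge_docs:
        # INBOUND: follow edges whose _to is the start vertex, back to their _from.
        if edge["_to"] == start_id and edge["isValid"]:
            keys.append(edge["_from"].split("/", 1)[1])  # "users/u1" -> "u1"
    return keys


print(inbound_neighbors("routes/r1", edges))
```

Swapping the roles of _from and _to in the condition would give the OUTBOUND direction instead.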

For more insight into graphs, please refer to the official ArangoDB documentation listed under References.

F) Replication

ArangoDB supports replication in two ways: synchronous and asynchronous.

  1. In synchronous replication, a write is applied to each node at once in an ArangoDB Cluster.
  2. In asynchronous replication, writes flow from the Leader/Master to the Follower/Slave of an ArangoDB deployment. This setup is called Active Failover.

In Active Failover, one ArangoDB Single-Server instance, the Leader/Master, is readable and writable by clients. One or more ArangoDB Single-Server instances, the Followers/Slaves, are passive and not writable; they asynchronously replicate data from the Leader/Master.

A basic diagram of an Active Failover setup:

In our use case, we implemented asynchronous replication by utilizing two ArangoDB instances. One instance serves as the master node, while the other acts as the slave node.

Here’s how the replication process works:
1. All data is written to the master node.
2. Replication is enabled on the slave node, which receives a copy of the data from the master node.
3. The slave node asynchronously replicates the data from the master, ensuring that it stays up to date.

This setup allows for data redundancy and fault tolerance, as well as the ability to distribute read operations across multiple nodes.
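The replication steps above can be pictured with a toy model: the master records every write in a log and returns immediately, and the slave catches up later by replaying the entries it has not yet seen. This is a deliberately simplified sketch for intuition, not ArangoDB's actual replication mechanism; the class names and sample data are hypothetical.

```python
class Leader:
    """Accepts writes and records them in an append-only log."""

    def __init__(self):
        self.data = {}
        self.log = []  # ordered list of (key, value) operations

    def write(self, key, value):
        self.data[key] = value
        self.log.append((key, value))  # returns without waiting for any follower


class Follower:
    """Read-only copy that catches up by replaying the leader's log."""

    def __init__(self):
        self.data = {}
        self.applied = 0  # number of log entries already replayed

    def sync(self, leader):
        for key, value in leader.log[self.applied:]:
            self.data[key] = value
        self.applied = len(leader.log)


leader, follower = Leader(), Follower()
leader.write("user:1", {"last_search": "BLR-CHN"})
print(follower.data)  # still empty: the follower has not synced yet
follower.sync(leader)
print(follower.data == leader.data)  # True once the log has been replayed
```

The window between a write and the next sync is the replication lag, which is why reads served from the slave can briefly return older data than the master holds.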

slave dashboard for replication

An advantage of asynchronous replication is that node management is simpler when the master goes down. I/O operations can also be controlled well in this setup.

For more information, refer to the official ArangoDB documentation listed under References.

Health Monitoring

- Write operations to the master per 2 min
- Read operations from the slave per 2 min
- CPU usage of both nodes
- Volume read/write IOPS
- Number of I/O operations waiting to complete
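As an illustration of the "per 2 min" counters, the sketch below buckets operation timestamps into two-minute windows. It assumes timestamps in seconds and uses made-up sample data; in practice a metrics tool would normally do this aggregation, and the function name is hypothetical.

```python
from collections import Counter

WINDOW_SECONDS = 120  # two-minute window, as in the metrics above


def ops_per_window(timestamps, window=WINDOW_SECONDS):
    """Count operations per window; keys are the window start times in seconds."""
    return Counter((int(ts) // window) * window for ts in timestamps)


# Hypothetical write timestamps, in seconds.
write_times = [0, 30, 119, 120, 250, 359]
print(sorted(ops_per_window(write_times).items()))
```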

By analyzing the metrics mentioned above, we can determine the appropriate approach to take. These metrics provide valuable insights into customer behavior and preferences, allowing us to make informed decisions. With this information, we can tailor our strategies and initiatives to effectively engage and retain customers on the redBus platform.

Final Notes

In conclusion, it is important to note that there are multiple approaches to solving problems, and the choice of approach depends on the specific needs and use case of the organization or business. While there are various engineering aspects involved, I have shared the experience that is currently at the forefront of my mind. This experience contributed to successfully deploying the system and fulfilling most of the use case requirements. Thank you for your attention.

References

ArangoDB official website https://www.arangodb.com/docs/stable/

Grafana for metrics https://grafana.com/
