Which will be your new noSql, Couchbase or Mongo?

Erdem Erbas
Published in Trendyol Tech
8 min read · Oct 11, 2019

Intro

Recently we decided to replatform one of our (almost legacy) microservices, and while doing so it seemed reasonable to switch to a noSql database. Until now PostgreSQL had been used, but a noSql store was a better fit, since the data model is document based rather than relational.

When noSql is mentioned, three databases come to mind: Mongo, Couchbase and Cassandra. If you're here for Cassandra, I'm sorry to inform you that it won't be considered in this article; the reason is that no one on the team had production experience with it. Couchbase is the go-to noSql in our company and most teams use it, which is why it made the shortlist. Furthermore, Mongo had previously been used in production by me and a few others on the team, so we felt comfortable with it.

Test Environment

Since we had no background in setting up either of those databases, we asked our system and infrastructure teams for help. At the end of the day we had two clusters on our hands, structured as in the image. You'll notice that sharding is not considered for Mongo, since the application in question will not hold that much data and the volume can be kept in check by archiving older records. Both clusters have similar specs. Special thanks to our DB and System teams for their support in setting these clusters up.

MongoDB with 1 primary and 2 secondaries
Multidimensional Couchbase Cluster

Our sample document will be a nested claim document. It contains claimItems, and each claimItem in turn holds the status history for that item.

Claim: {
  "createdDate": 1624959521846,
  "fulfillmentTypeId": 18,
  "supplierId": 58,
  "orderId": 100000,
  "claimItems": [
    {
      "claimItemStatus": 0,
      "customerClaimItemReasonId": 49,
      "orderLineItemId": 118,
      "claimItemNote": "ClaimItemNote2d94aa40-9fd0-4802-bc9e-84adc3953c94",
      "createdDate": 1554297893222,
      "customerNote": "CustomerNoted4aed7ca-09af-426d-a7b9-13725065b364",
      "id": 92,
      "claimItemStatusHistories": [
        {
          "claimItemStatus": 3,
          "executorId": 51
        },
        {
          "claimItemStatus": 4,
          "executorId": 100
        }
      ],
      "platformReasonName": "PlatformReasonName32552601-17e0-4c92-8762-ff884d333d86",
      "claimIssue": {
        "claimIssueReason": "ClaimIssueReason60c0d736-98ca-4f96-9a0c-69215efabdc6",
        "claimIssueFiles": [
          {
            "fileName": "FileNameb1eb152d-7c31-4615-a46f-87d6770025f2",
            "id": 71
          }
        ],
        "description": "Description78b3702e-8000-4b7b-a897-8648e358712d",
        "id": 113,
        "issueDate": 1617734585058,
        "claimIssueReasonId": 59
      },
      "platformReasonExternalCode": 34
    }
  ],
  "lastModifiedDate": 1564501155685,
  "shipmentPackageIds": [
    26
  ]
}

DATA INITIALIZATION

What use is a database if there is no data? Going forward, the databases should be filled with data for our test to mean something, which means data has to be generated for us. On that subject, JFixture comes to our help: it generates random objects within given restrictions, and those are then inserted into the db. And bam! You've got yourself a database full of random (well, not all fields) documents. (PS: I could not get the JFixture range feature to work, so I did that myself in a primitive way. :))
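The primitive range workaround can be as simple as drawing a bounded random value for the fields that need one, such as the epoch-millis dates and the 0-8 status codes. A minimal sketch with plain stdlib Java (the class and method names here are illustrative, not from the actual test code):

```java
import java.util.concurrent.ThreadLocalRandom;

// Primitive stand-in for JFixture's range feature: draw bounded random
// values for the fields that need them (epoch-millis dates, status codes).
public class RandomRanges {

    // Random epoch-millis between two instants, upper bound exclusive.
    static long randomEpochMillis(long startInclusive, long endExclusive) {
        return ThreadLocalRandom.current().nextLong(startInclusive, endExclusive);
    }

    // claimItemStatus can only hold a value between 0 and 8.
    static int randomStatus() {
        return ThreadLocalRandom.current().nextInt(0, 9);
    }

    public static void main(String[] args) {
        long created = randomEpochMillis(1554297893222L, 1624959521846L);
        System.out.println("createdDate=" + created + " status=" + randomStatus());
    }
}
```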

A simple Java program is used to achieve what's mentioned above. The same program will also be used to query the databases; the source will be shared at the end of the article.

QUERIES AND INDEXES

Now that there is some data, we can execute some queries on it. Since we have a nested document (3 levels deep), let's go ahead and write a query that touches multiple fields, one at the outermost level and the other in the middle. But before doing that we should create an index on those fields so they can be queried (for couchbase; on mongo you can query any field, but without an index, well... good luck).

Basic examples of both indexes and queries can be found in the documentation of both dbs, but we'll go over an example for our specific use case. Now that you've had a chance to catch a glimpse of the sample document we're going to store, let's go ahead and define a compound index (an index consisting of multiple fields) on lastModifiedDate and claimItems.claimItemStatus for both cb and mongo:

Couchbase:

CREATE INDEX `claim_lastModifiedDate_itemStatus` ON `Claims`(`lastModifiedDate`,(distinct (array (`ci`.`claimItemStatus`) for `ci` in `claimItems` end))) 

MongoDB:

db.claims.createIndex( { "lastModifiedDate": 1, "claimItems.claimItemStatus": 1 } )

BEWARE!! By default, creating an index blocks all other operations on a mongo database; to avoid that, create your mongo index with the background option (e.g. pass { background: true } as the second argument to createIndex).

Now that we have our indexes let’s go on with our queries

Before going on with the queries on Java let’s talk about where you can execute and test them. Couchbase comes with a built in client that let’s you manage the cluster that you have. However for mongo you have to use an external one to manage your cluster(Most popular ones are Robo3T and Studio3T). Below are example queries for the compound index that we have created.

Couchbase:

SELECT META(claim).id, claim.*
FROM Claims AS claim
WHERE claim.lastModifiedDate >= 1561396685000
  AND claim.lastModifiedDate <= 1563988685000
  AND ANY item IN claim.claimItems SATISFIES item.claimItemStatus = 3 END
LIMIT 15

Mongo:

db.getCollection('claims')
    .find({"lastModifiedDate": {$gt: 1557589728350, $lt: 1657589728350},
           "claimItems.claimItemStatus": 6})
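One thing worth noting about that filter: a query document (like the Java map backing it) cannot hold the same key twice, so if the $gt and $lt conditions are written as two separate "lastModifiedDate" entries, the later one silently overwrites the earlier and only one bound is applied. Both operators belong under a single key. A quick stdlib illustration of the overwrite behaviour:

```java
import java.util.HashMap;
import java.util.Map;

// Why $gt and $lt must share one "lastModifiedDate" key: map/JSON-style
// structures keep only the last value written for a duplicate key.
public class DuplicateKeyDemo {
    public static void main(String[] args) {
        Map<String, Object> filter = new HashMap<>();
        filter.put("lastModifiedDate", Map.of("$gt", 1557589728350L));
        filter.put("lastModifiedDate", Map.of("$lt", 1657589728350L)); // overwrites the $gt entry
        System.out.println(filter); // only the $lt condition survives
    }
}
```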

Moreover Java queries are provided below for each:

Couchbase:

select("META(claim).id, claim.*")
.from("Claims")
.as("claim")
.where(x("claim.lastModifiedDate").gte(startDate)
.and(x("claim.lastModifiedDate").lte(endDate))
.and(anyIn("item", x("claim.claimItems")).satisfies(x("item.claimItemStatus").in(itemStatuses.toString()))));

This query must be executed by couchbaseTemplate, and then a mapper should be used to map it to your object (example can be found in the code that will be given in the end).
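To give an idea of what that mapping step amounts to, here's a hand-rolled sketch that assumes the query row has already been parsed into a Map. The trimmed-down Claim record and the choice of fields are illustrative only; the real code uses a proper mapper:

```java
import java.util.List;
import java.util.Map;

// Hand-rolled mapping of a N1QL result row (already parsed into a Map)
// onto a trimmed-down Claim object.
public class ClaimRowMapper {

    record Claim(long orderId, long lastModifiedDate, List<?> claimItems) {}

    static Claim fromRow(Map<String, Object> row) {
        return new Claim(
                ((Number) row.get("orderId")).longValue(),
                ((Number) row.get("lastModifiedDate")).longValue(),
                (List<?>) row.getOrDefault("claimItems", List.of()));
    }

    public static void main(String[] args) {
        Map<String, Object> row = Map.of(
                "orderId", 100000,
                "lastModifiedDate", 1564501155685L,
                "claimItems", List.of());
        System.out.println(fromRow(row));
    }
}
```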

Mongo:

List<Bson> filters = new ArrayList<>();
filters.add(Filters.gte("lastModifiedDate", startDate));
filters.add(Filters.lte("lastModifiedDate", endDate));
filters.add(Filters.in("claimItems.claimItemStatus", itemStatuses));
claimDocumentMongoCollection.find(Filters.and(filters))
    .into(claims);

As for mongo, the driver maps the result to the object itself.

TEST IT

As mentioned before, a simple app will provide us with an API so that we can query the dbs we created. Then a simple load-testing tool will help us bombard that API, and therefore the dbs.

API

We'll be using a simple Java application with spring boot to interact with our databases. Sample apps can be found here https://blog.couchbase.com/couchbase-spring-boot-spring-data/ and here https://spring.io/guides/gs/accessing-data-mongodb/ . But beware that some of the things we'll be doing differ from those.

We'll define a claimController consisting of 9 endpoints, 4 for couchbase and 5 for mongo (the extra one is the write operation with a transaction).

Let's go over the 4 groups of endpoints and their functions:

  1. Read by id. A pretty simple one actually: we just get a random long value and query the db with it. The couchbase side uses no indexes for this, since it is a key-value db and we have the key. :)
  2. Query the deepest nested value available. The one I've chosen is claim.claimItems.claimItemStatusHistories.claimItemStatus, an integer field that can only have a value between 0 and 8.
  3. Query via the compound index consisting of claim.lastModifiedDate and claimItems.claimItemStatus. The lastModifiedDate part will span a month.
  4. Use the query above to get a single document, update an insignificant field of it, and save it back to the db.
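For the month-long lastModifiedDate window in item 3, the epoch-millis bounds can be built with java.time. A small sketch (the class name is illustrative, and 30 days stands in for "a month" since ChronoUnit.MONTHS is not supported on Instant):

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;

// Builds the [start, end] epoch-millis window for the lastModifiedDate
// part of the compound-index query: end = the anchor, start = 30 days back.
public class MonthWindow {

    static long[] lastMonthWindow(Instant end) {
        Instant start = end.minus(30, ChronoUnit.DAYS);
        return new long[] { start.toEpochMilli(), end.toEpochMilli() };
    }

    public static void main(String[] args) {
        long[] w = lastMonthWindow(Instant.now());
        System.out.println("startDate=" + w[0] + " endDate=" + w[1]);
    }
}
```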

An example of each has been provided above, and all 3 read queries limit the result to 15 documents.

The API used will be provided at the end of the article, so you can skip right to it and cross-check if you'd like. It also contains a sample compose file which you can use to get your cb and mongo running (the couchbase part contains an extra sh file to set up the cluster, buckets and users).

Test It Already!

The database is full, indexes are created and queries are prepared. What else are we waiting for? Let's do this! To achieve our goal, a test tool had to be picked that was easy to set up and that could visualize the results.

k6 is a developer-centric, open source load and performance regression testing tool, and it is extremely easy to set up and run. Below is a simple js file that you hand to k6 (yes, it is this simple; I was surprised too).

import http from 'k6/http';

export default function() {
const r0 = http.get("http://apiUrl/claims/couchbase/idRead");
const r1 = http.get("http://apiUrl/claims/couchbase/read");
const r2 = http.get("http://apiUrl/claims/couchbase/getLastMonthRandomStatuses/1");
const r3 = http.get("http://apiUrl/claims/couchbase/read-update");
const r4 = http.get("http://apiUrl/claims/mongo/idRead");
const r5 = http.get("http://apiUrl/claims/mongo/read");
const r6 = http.get("http://apiUrl/claims/mongo/getLastMonthRandomStatuses/1");
const r7 = http.get("http://apiUrl/claims/mongo/read-update");
const r8 = http.get("http://apiUrl/claims/mongo/read-update-tx");
}

Moreover, it is incredibly easy to install and run on a Mac. The only thing you need to do to install it is run the small command `brew install k6`, and BAM! You're ready to go.

For other OS’s, here is the documentation

Like most of the testing tools out there, k6 does not provide any visualization. Don't feel bad though, because like everything up to this point, visualizing your test results is also effortless.

The Gitlab repo that will be shared at the end also includes a docker compose file that can be used to get a grafana and an influxDb instance running. k6 will send the output of its test to influxdb, and grafana will use the data on influx to visualize your results.

How to achieve that is also described here. There are even dashboards prepared just for this.

Below are the results with various test users

Test results with 50 virtual users.
Test results with 75 virtual users.
Test results with 100 virtual users.

Couchbase performs better on the get-by-id cases, since it is essentially a key-value database.

On the other hand, mongo has better query and write performance. The downside of using a replica-set mongo is that write operations are likely to be your bottleneck, since only the primary can perform them. However, it is much easier to set up compared to a sharded mongo or a couchbase cluster.

One of the most important differences between the two is (was*) that mongo has multi-document transactions! (As of version 6.5, couchbase will also have multi-document transactions.)

Conclusion

Just a few more sentences and then I'm out of your hair. Keep in mind that the mongo architecture here is a replica set, which means it can only be scaled vertically. You should decide on your mongo architecture depending on your data size.

In the end we decided to go with couchbase, since our team's experience with it had a greater impact than the performance difference between the two.

Thanks so much for sticking with me until the end :)
