The Road to Couchbase
Our Couchbase journey started last year, and it is still going :) In this article, I want to introduce our Couchbase infrastructure, discuss the architectural decisions we made along the way, and explain why we preferred Couchbase as our document-store database. I will also discuss the technical differences between Couchbase and the other platforms that we’re using…
Couchbase Infrastructure
Before diving into details, it is better to explain the Couchbase infrastructure and our reasons for using it. Couchbase is an open-source, document-oriented NoSQL database platform designed for interactive web and mobile applications. Because it is a document store, it holds semi-structured data: Couchbase Server documents are stored as JSON. With integrated caching, Couchbase offers low-latency read and write operations and linearly scalable throughput. The architecture has no single point of failure. The cluster is easy to scale horizontally, and live cluster topology changes are supported. This means that rolling upgrades of the database, the software, or the hardware require no application downtime. Couchbase Inc. develops and provides commercial support for the Couchbase open source project, which is Apache 2.0 licensed.
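To make the "documents are stored as JSON" point concrete, here is a minimal sketch of two documents with different shapes living side by side. The keys, field names, and the `to_document` helper are hypothetical and only illustrate the idea; no Couchbase SDK is involved.

```python
import json

# Two hypothetical product documents; neither the keys nor the fields
# come from a real Trendyol bucket.
product_v1 = {"type": "product", "name": "T-shirt", "price": 49.9}
product_v2 = {
    "type": "product",
    "name": "Sneaker",
    "price": 399.0,
    "sizes": [38, 39, 40],                # extra field, no schema change needed
    "seller": {"id": 17, "rating": 4.6},  # nested object
}

def to_document(key, body):
    """Serialize a (key, body) pair as a key plus a JSON string."""
    return key, json.dumps(body, sort_keys=True)

# A plain dict stands in for a bucket: key -> JSON text.
documents = dict(
    to_document(key, body)
    for key, body in [("product::1", product_v1), ("product::2", product_v2)]
)
```

Both documents share a `type` field, but only the second one carries `sizes` and `seller`. Nothing about the first document has to change to allow that.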
Our main concerns were as follows:
- Scalability : Because we run e-commerce services on our platform, we needed a database platform that is capable of scaling. Couchbase offers a scalable infrastructure, so we were able to scale the Couchbase services that run in our clusters with no downtime.
- Data Model : In some projects, we were facing problems with database layout changes. In an RDBMS, it is not easy to change a table’s column definitions or the data model. Because Couchbase stores data as independent JSON documents, it is easy to write documents in any structure.
- Cluster Structure : In some projects, the amount of data was huge and it was stored in a single table. As a result, a DBA had to tune performance for a table of more than 100 million rows. Because of these performance issues, we needed a database cluster that shards the data equally across the servers.
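The idea of sharding data equally across servers can be sketched as follows. This is a simplified illustration of hash-based partitioning, not Couchbase's exact server-side formula: it assumes 1024 partitions (vBuckets), a plain CRC32 of the document key, and a round-robin assignment of partitions to hypothetical data nodes.

```python
import zlib
from collections import Counter

NUM_VBUCKETS = 1024  # assumed partition count for this sketch

def vbucket_for(key: str) -> int:
    """Map a document key to a partition using CRC32 (simplified)."""
    return zlib.crc32(key.encode("utf-8")) % NUM_VBUCKETS

def build_vbucket_map(nodes):
    """Assign partitions to nodes round-robin so each node owns an equal share."""
    return {vb: nodes[vb % len(nodes)] for vb in range(NUM_VBUCKETS)}

def node_for(key, vbucket_map):
    """Find the node that owns the partition a key hashes to."""
    return vbucket_map[vbucket_for(key)]

nodes = ["data-1", "data-2", "data-3", "data-4"]  # hypothetical node names
vbucket_map = build_vbucket_map(nodes)

# Count how many of 100,000 hypothetical order keys land on each node.
distribution = Counter(node_for(f"order::{i}", vbucket_map) for i in range(100_000))
```

Because the key alone decides the partition, any client can compute where a document lives without asking a central coordinator, and inspecting `distribution` shows how evenly a given key pattern spreads over the nodes.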
For these reasons, we chose Couchbase and built a platform on it. The infrastructure for this platform follows the MDS model. MDS stands for “multi-dimensional scaling”. Our Couchbase infrastructure consists of dedicated service servers, so each server in a cluster runs only one service. Couchbase Server offers several services, for example data, query, index, and eventing. We assigned only one role to each server.
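The "one service per server" rule can be written down as a small check. The sketch below is only an illustration: the `Node` type, the hostnames, and the `validate_mds` helper are hypothetical and do not talk to a real cluster.

```python
from collections import defaultdict
from dataclasses import dataclass

# Services named in this article; a real cluster may offer more.
KNOWN_SERVICES = {"data", "query", "index", "eventing"}

@dataclass(frozen=True)
class Node:
    hostname: str
    services: tuple

def validate_mds(nodes):
    """Return a list of problems; an empty list means every node runs one known service."""
    problems = []
    for node in nodes:
        if len(node.services) != 1:
            problems.append(
                f"{node.hostname}: expected exactly 1 service, got {len(node.services)}"
            )
        for service in node.services:
            if service not in KNOWN_SERVICES:
                problems.append(f"{node.hostname}: unknown service {service!r}")
    return problems

def nodes_by_service(nodes):
    """Group hostnames by the service they run."""
    grouped = defaultdict(list)
    for node in nodes:
        for service in node.services:
            grouped[service].append(node.hostname)
    return dict(grouped)

# Hypothetical cluster layout with one role per server.
cluster = [
    Node("cb-data-1", ("data",)),
    Node("cb-data-2", ("data",)),
    Node("cb-index-1", ("index",)),
    Node("cb-query-1", ("query",)),
    Node("cb-eventing-1", ("eventing",)),
]
```

Grouping nodes by service also shows why this layout scales per dimension: adding capacity for one service means adding nodes to one group only.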
While building our Couchbase infrastructure, we also followed the official Couchbase deployment guidelines. According to these guidelines, we applied the recommended operating system configurations. You can find them on the official Couchbase web pages.
In addition, we defined our cluster sizes and server requirements, which helped us solve sizing problems. We defined a set of production Couchbase cluster configurations in terms of CPU, memory, and node count.
Trendyol Couchbase Architecture
After we defined the standard production configurations, we were ready to work on the architecture of Couchbase in the production environment. At Trendyol, we decided to use the MDS model in our production Couchbase clusters. In addition, if you work in database engineering, you have to consider provisioning, load balancing, disaster side management, replication, backup policies, and monitoring…
Monitoring Architecture
First, we use Grafana to monitor our Couchbase clusters: Prometheus is the data source, and Grafana visualizes the metrics. To make this work, we install node exporter and couchbase exporter on every Couchbase server.
Once these components are installed on our Couchbase servers, they start to expose metrics about the server and about Couchbase. These metric endpoints are registered in Consul as services with appropriate service tags.
Below are the tools that we’re using. However, tools should not be the main purpose of your architectural design. They can be changed, replaced, or even deleted :)
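As a rough sketch of what registering an exporter with service tags can look like, the code below builds service definitions in the shape used by Consul's agent service registration API (`ID`, `Name`, `Address`, `Port`, `Tags`, `Check`). The tag names, hostnames, check interval, and the couchbase exporter port are assumptions for illustration; only the payloads are built, nothing is sent to Consul.

```python
import json

NODE_EXPORTER_PORT = 9100       # node_exporter's default port
COUCHBASE_EXPORTER_PORT = 9091  # assumed port; check your exporter's configuration

def service_definition(name, host, port, tags):
    """Build one Consul service definition for an exporter on a host."""
    return {
        "ID": f"{name}-{host}",
        "Name": name,
        "Address": host,
        "Port": port,
        "Tags": sorted(tags),
        # HTTP health check against the metrics endpoint (interval is an assumption).
        "Check": {"HTTP": f"http://{host}:{port}/metrics", "Interval": "30s"},
    }

def definitions_for_host(host, cluster, role):
    """Return definitions for both exporters on one Couchbase server."""
    tags = ["couchbase", f"cluster:{cluster}", f"role:{role}"]  # hypothetical tag scheme
    return [
        service_definition("node-exporter", host, NODE_EXPORTER_PORT, tags),
        service_definition("couchbase-exporter", host, COUCHBASE_EXPORTER_PORT, tags),
    ]

definitions = definitions_for_host("cb-data-1", "orders", "data")
payloads = [json.dumps(definition, indent=2) for definition in definitions]
```

With tags like these, Prometheus can discover scrape targets through Consul and keep only the services that carry a given tag, so new Couchbase servers show up in Grafana without editing a static target list.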
- couchbase_exporter : https://github.com/couchbase/couchbase-exporter
- node_exporter : https://github.com/prometheus/node_exporter
- Consul : https://www.consul.io
- Prometheus : https://prometheus.io
- Grafana : https://grafana.com
Disaster Side Management
In our production environments, almost every database cluster must have a DR side, and our Couchbase clusters follow this rule as well. For this platform, the DR side covers the following:
- Data
- Index
- Users
- Views
- Functions
In Couchbase, we treat a database as a project that consists of different components such as data, indexes, and functions. Thus, we decided to manage the DR side for each object separately. For the data part, we use the XDCR feature provided by Couchbase, which allows us to replicate data to a different server or a different cluster. With the help of this feature, we build an additional Couchbase cluster, called the DR side, for every production cluster. For the other parts, such as indexes and users, we developed PowerShell automation. You can find the details as follows:
Backup and Restore Architecture
In addition to the DR side, we always keep cold backups of our databases on a different server, and clusters are backed up periodically. To build the backup infrastructure, we use a separate server outside of the clusters, called the backup server. All backup jobs are managed on this server.
Our backup procedure consists of these steps:
- The Couchbase cluster is backed up to the backup server.
- The backup taken in the first step is compacted.
- On every third backup, a merge operation is applied to the backup directory.
By doing this, we are able to recover from cluster-wide disasters and data problems, because sometimes you have to restore from a backup to fix a big disaster.
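The backup procedure above can be sketched as a small scheduling rule. This is a minimal illustration that only decides which steps belong to each run; the function names are hypothetical, and the commands that actually perform the backup, compaction, and merge are deliberately left out.

```python
MERGE_EVERY = 3  # a merge is applied on every third backup

def steps_for_run(run_number: int):
    """Return the ordered step names for the nth backup run (1-based)."""
    if run_number < 1:
        raise ValueError("run_number must be 1 or greater")
    steps = ["backup", "compact"]  # every run backs up, then compacts
    if run_number % MERGE_EVERY == 0:
        steps.append("merge")      # every third run also merges
    return steps

def plan(runs: int):
    """Build the step plan for the first `runs` backup runs."""
    return {n: steps_for_run(n) for n in range(1, runs + 1)}

schedule = plan(6)
```

Keeping the rule in one place makes it easy to change the merge frequency later without touching the jobs that run each step.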
Setup and Provisioning Architecture
Because delivering a production Couchbase cluster takes many steps, it does not make sense to do these tasks by hand, so we developed an automation solution for provisioning. We decided to use Ansible because we consider every step of building a production cluster to be a configuration change on the operating system. Since the whole process is a configuration change process, we prefer Ansible.
You can find the details of our provisioning process as follows:
This process is still being improved day by day, because our main motivation is to provide cloud-native solutions to our development teams. Provisioning and scaling should be offered as self-service components. Thus, we’re currently building a new solution for our provisioning and scaling services. In the end, every development team will be able to provision a new Couchbase cluster and scale its existing clusters. Of course, both the database team and the development teams will keep track of resource usage. Resource management will also be shared between the system and development teams.
Conclusion
We discussed our Couchbase infrastructure and architectural decisions. Using a document-store database can be harmful if requirements engineering has not been completed, because every platform solves a specific set of problems. The key point is knowing which problems you want to solve. As database engineers and software engineers, we have to define requirements in a clear and observable way.
We will study and discuss the automation solution in another article.
Regards,
Demir.