Create a MongoDB sharded cluster with SSL enabled

Joynal Abedin
HackerNoon.com
8 min readMar 24, 2018

--

Near two months ago I started learning MongoDB seriously. At Growthfunnel.io we use MongoDB, and we need to scale our system for a large volume of data(approximately 6TB+) and high throughput. Sharding, database clustering it was all new to me, so I started learning. The purpose of this article is sharing and validating my knowledge with the community. I’m not an expert on any of this. I’m just sharing what have I have learned.

This tutorial explains step by step how to create a MongoDB sharded cluster. We will deploy this demo on a single machine.

Prerequisites:

  • MongoDB — 3.6.2
  • OpenSSL
  • NodeJS
  • Bash
  • Basic knowledge of mongodb sharding

What is database sharding anyway?

Sharding is a process of splitting data across multiple machines that separate large database into smaller, faster, easily managed parts called data shards. The word shard means a small part of a whole cluster.

Our sharding architecture

Our sharded cluster will run on a single machine, each component will start on separate process & port. This cluster partitioned into three shards, each shard contains two data members and one arbitrary member. Each shard replica has:

  • 1 Primary member
  • 1 Secondary member
  • 1 Arbitrary member

Prepare the environment

1. Install MongoDB from official documentation.

2. Configure hostname

Append this line 127.0.0.1 database.fluddi.com into /etc/hosts file; database.fluddi.com will be our database hostname.

3. Make sure data directory & Log directory have read and write permissions

Create data directory and log directory and own these directories to your user for reading & writing. For my case my user and usergroup name is vagrant.

4. Clone mongodb-sample-cluster repo

This repo contains configuration files for the cluster.

confs directory contains cluster components configurations; You can customize for your needs. Make sure that your data directory & log directory have read & write permission. By default data directory pointed on /data/mongodb/ & log directory pointed on /var/log/mongodb/test-cluster/.

Generate self signed SSL certificate

1. Generate certificate authority

Let’s generate a self-signed certificate for the sharded cluster, this is only for this demonstration. For production use, your MongoDB deployment should use valid certificates generated and signed by a single certificate authority. You or your organization can generate and maintain an independent certificate authority, or use certificates generated by a third-party SSL vendor.

Now own this directory, use your user and usergroup name.

OK, let’s create a certificate authority. Generate a private key for CA certificate and keep it very safe.

Now self-sign to this certificate.

This will prompt for certificate information.

2. Generate certificate for cluster members

Generate private key & CSR.

This will prompt for information, make sure domain name support wildcard domain.

Now self sign it.

Output will be something like this:

Create .pem file.

3. Generate client certificates

Each client certificate must have a unique & different SAN from cluster member certificate. Otherwise, MongoDB will consider it as a cluster member. Each certificate belongs to a MongoDB x.509 user, more details.

OK, let’s generate two certificates by following the previous step, just make sure OU is different.

  • For Web app

Everything will be same as member certificate only OU will be different.

  • For Database admin

Everything will be same as member certificate only OU will be different.

Now change all files permission to read-only.

Create the config server replica set

Create data directories, replace $DB_PATH with your actual DB path.

Change directory to mongodb-sample-cluster code repo, you just have cloned.

Start all member of config server replica set.

Connect to one of the config servers.

Initiate the replica set.

Create the shard replica sets

Deploy shard 0

Create data directories for replica instances.

Start each member of the shard replica set.

Connect to one member of the shard replica set.

Initiate the replica set.

It will return something like:

Deploy shard 1

Create data directories, replace $DB_PATH with your actual db path.

Start each member of the shard replica set.

Connect to one member of the shard replica set.

Initiate the replica set.

Deploy shard 2

Create data directories, replace $DB_PATH with your actual db path.

Start each member of the shard replica set.

Connect to one member of the shard replica set.

Initiate the replica set.

Connect a mongos to the cluster

View mongod & mongos processes.

Now we are ready to add databases and shard collections.

Add shards to the cluster

Connect a mongo shell to the mongos.

1. Create a admin user

Lets authenticate,

2. Add shard members

3. Add x509 user for webapp and administration

Enable sharding for a database

Before shard a collection, you must enable sharding for the collection’s database. Enabling sharding for a database does not redistribute data but make it possible to shard the collections in that database.

Once you enable sharding for a database, MongoDB assigns a primary shard for that database where MongoDB stores all data before sharding begins.

Enable sharding on a database, in this demo I’m using the namefluddi

Shard collection

1. Shard the collection

You need to enable sharding on a per-collection basis. Determine what you will use for the shard key. Your selection of the shard key affects the efficiency of sharding.

Now connect to mongos with a client certificate & authenticate.

Authenticate user,

Create a collection in fluddi database.

Create an index of visitors collection.

Let’s shard the collection. I’m choosing a compound key.

Check the cluster status.

Sample output:

2. Modify chunk size

Make chunk size smaller for demonstration purpose. Otherwise, you will need to generate a large volume of data. That is only for demonstration purpose, don’t do this in production.

Now chunk size will be 8MB.

Generate some dummy data (Optional)

Go to mongodb-sample-cluster code directory, you cloned.

  • Configure .env file, follow .env.example
  • Install packages, use npm i or yarn
  • Run node index.js, this will generate 50000 visitor records

Now again connect to mongos with a client certificate & authenticate.

Authenticate user,

Check cluster status.

Sample output:

Bind IP address

If you deployed whole things on a remote machine and wanted to access the database from your computer or you want to connect a web app from another server, you need to bind IP address. Change bindIp to 0.0.0.0 . Now mongos will listen on all the interfaces configured on your system.

Before you bind to other IP addresses, consider enabling access control and other security measures listed in Security Checklist to prevent unauthorized access.

Terminology

Replica set: A group of mongod processes that maintain the same data set.
Primary: Replica member accept writes.
Secondary: Pull and Replicates changes from Primary (via oplog).

Thanks for reading. Hope you guys enjoyed this article and got the idea. On my code repo, I included a init script to automate whole sharding setup process, take a look at it.

Credits

I borrowed some text from following links because it suits better than my version.

  1. http://searchcloudcomputing.techtarget.com/definition/sharding
  2. https://cloudmesh.github.io/introduction_to_cloud_computing/class/vc_sp15/mongodb_cluster.html

--

--