How we scaled MongoDB (part 1)
At Touch4IT, we frequently use MongoDB as a database solution for client projects. MongoDB has some interesting properties, but it tends to use a lot of RAM, so we are exploring ways to scale MongoDB beyond a replica set running on a single Docker host.
This is the first post from our upcoming series on scaling MongoDB.
Intro
A slightly advanced understanding of Docker, networking and Linux is assumed.
In this series we show how we scaled our MongoDB across different VMs, without any downtime. The starting state is a MongoDB cluster running on a single VM (nodeA). The desired state is to have a 4th MongoDB replica on a separate VM (nodeB).
Assume the following docker-compose.yml:
version: '3.6'
services:
  mongo1:
    image: mongo:3.4.20
    command: bash -c "mongod --replSet rs1 && mongo"
    container_name: mongo1
    expose:
      - 27017
    volumes:
      - ./mongorc.js:/etc/mongorc.js
    networks:
      mongo:
        ipv4_address: 172.30.0.11
  mongo2:
    image: mongo:3.4.20
    command: bash -c "mongod --replSet rs1 && mongo"
    container_name: mongo2
    expose:
      - 27017
    volumes:
      - ./mongorc.js:/etc/mongorc.js
    networks:
      mongo:
        ipv4_address: 172.30.0.12
  mongo3:
    image: mongo:3.4.20
    command: bash -c "mongod --replSet rs1 && mongo"
    container_name: mongo3
    expose:
      - 27017
    volumes:
      - ./mongorc.js:/etc/mongorc.js
    networks:
      mongo:
        ipv4_address: 172.30.0.13
networks:
  mongo:
    driver: bridge
    name: mongo
    ipam:
      config:
        - subnet: 172.30.0.0/24
And mongorc.js:
rs.initiate(
  { _id: "rs1", members: [
    { _id: 0, host: "mongo1:27017" },
    { _id: 1, host: "mongo2:27017" },
    { _id: 2, host: "mongo3:27017" }
  ]}
);
rs.slaveOk();
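With these two files in place, the cluster can be started and the replica set state verified from nodeA. A minimal sketch (assumes docker-compose and docker are installed on nodeA; not runnable without a Docker daemon):

```shell
# start the three-member cluster on nodeA
nodeA$ docker-compose up -d

# check the replica set state from inside the mongo network;
# one member should report "PRIMARY" and two "SECONDARY"
nodeA$ docker exec mongo1 mongo --quiet --eval \
  'rs.status().members.forEach(function (m) { print(m.name, m.stateStr); })'
```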
The challenge
Since MongoDB is set up so that only local access is allowed, the challenge is to make all MongoDB containers accessible from outside (e.g. from nodeB), without any downtime (no docker restart commands or other tricks). The reason for this is to add a new MongoDB replica (call it mongo4, on nodeB), which is located outside of the docker mongo network. This will allow the cluster to be scaled or moved to other nodes, which is the first step towards high availability.
By default, only containers in the docker network called mongo have access to the MongoDB cluster.
When interacting with the MongoDB cluster from the hosting machine, ping mongo1 works as expected, but what happens when you want to access mongo1 (172.30.0.11) from another machine?
Suppose nodeA is running the docker-compose.yml from the snippet above, and we want to scale the replica set outside nodeA, e.g. to nodeB. nodeA and nodeB are directly connected on the subnet 192.168.123.0/24: nodeA has IP 192.168.123.10 and nodeB has 192.168.123.1.
The first step is to set up a static route on nodeB, so it knows about the docker mongo network.
Static route
# route to mongo network via nodeA IP
nodeB$ ip route add 172.30.0.0/24 via 192.168.123.10
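We can confirm that the routing table entry took effect before testing actual connectivity. A quick sketch (only verifies the route lookup on nodeB, not reachability):

```shell
# ask the kernel which route it would pick for the mongo1 address;
# the output should contain "via 192.168.123.10"
nodeB$ ip route get 172.30.0.11
```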
Great, now we should be able to access mongo1's IP from nodeB.
Well, no.
The router
nodeA must act as a router to forward traffic to MongoDB, therefore we need to enable ip_forward, e.g. like this:
nodeA$ echo 1 > /proc/sys/net/ipv4/ip_forward
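Note that writing to /proc only lasts until the next reboot. A sketch of how the setting can be persisted, assuming a distribution that loads /etc/sysctl.d at boot:

```shell
# persist IP forwarding across reboots via a sysctl drop-in file
nodeA$ echo 'net.ipv4.ip_forward = 1' > /etc/sysctl.d/99-ip-forward.conf

# apply the file immediately without rebooting
nodeA$ sysctl -p /etc/sysctl.d/99-ip-forward.conf
```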
OK, now it should work, right?
Well, no.
The firewall
To understand what else we need to do, we will take a look at nftables (the new iptables).
nftables is considered the new and improved alternative to iptables. This goal can be achieved with iptables as well: just translate the language of nft to the language of iptables and you are good to go. To learn more about nft, see the wiki.
To list the current state of firewall run:
nodeA$ nft list ruleset
To further understand what is going on when we want to reach MongoDB, we can use nft monitor trace to our advantage. First, we enable tracing of packets by adding a simple rule:
# enable tracing for all packets that reach the prerouting chain
nodeA$ nft add rule ip nat PREROUTING meta nftrace set 1
# run a live trace
nodeA$ nft monitor trace
# in another terminal, we try to ping the mongo container
# one packet is enough, so we don't spam
nodeB$ ping -c 1 172.30.0.11
The monitor command will then print the packet's path through the ruleset.
We can see that the rule evaluation ends in the FORWARD chain, which has the default policy drop. If we examine the FORWARD chain, we see that its first rule jumps to the DOCKER-USER chain. Hence we can use this conveniently created chain for our custom rule, which will allow traffic to the MongoDB containers. The last piece of the puzzle is:
# add rule so the communication towards mongo will be accepted
nodeA$ nft add rule ip filter DOCKER-USER ip saddr 192.168.123.0/24 accept
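As mentioned earlier, the same thing can be expressed in classic iptables syntax. A sketch of the equivalent rule, assuming iptables manages the same ruleset on nodeA:

```shell
# iptables translation of the nft rule above:
# insert an ACCEPT rule for the 192.168.123.0/24 subnet
# at the top of the DOCKER-USER chain
nodeA$ iptables -I DOCKER-USER -s 192.168.123.0/24 -j ACCEPT
```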
As of now, we should be able to connect to the MongoDB container from nodeB.
nodeB$ mongo 172.30.0.11
MongoDB shell version v4.2.1
connecting to: mongodb://172.30.0.11:27017/test?compressors=disabled&gssapiServiceName=mongodb
WARNING: No implicit session: Logical Sessions are only supported on server versions 3.6 and greater.
Implicit session: dummy session
MongoDB server version: 3.4.20
WARNING: shell and server versions do not match
Server has startup warnings:
... blah blah ...
>
# and here we are!
Conclusion
We managed to connect to a running MongoDB docker cluster without touching the containers and, most importantly, without downtime. We are now prepared to create a new MongoDB replica and connect it to the existing replica set. To sum it up, these were the actions we took:
- Add a static route to the MongoDB network
- Enable ip_forward
- Allow connections using nftables
- TADAA!
In the next part, we will talk about creating a new MongoDB replica.