Building a Mongo Replica Set with Chef and Vagrant.

Simple once you’ve sorted your hostnames.

Chef and MongoDB are two of our favourite things over at PolkaSpots Supafly Wi-Fi. We use Chef to keep about 2 dozen servers in check and Mongo as a massive data store before processing with Hadoop.

Here’s how you can create a 3-node replica set (with Vagrant).

The Opscode Chef cookbook for MongoDB is comprehensive, although it didn’t work out of the box for us. Download their cookbook rather than rolling your own.

We’re assuming you’re OK with Vagrant. The only thing you’ll want to do is add a private network to each of the servers in your Vagrantfile:

config.vm.define :mongo2 do |conf|
  conf.vm.provider "virtualbox" do |v|
    conf.vm.network :private_network, ip: "10.0.0.12"
    ...
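
If you like, you can define all three nodes in one loop. Here’s a minimal sketch; the machine names (mongo1–mongo3), hostnames (rep1–rep3), box name and IPs are our assumptions, so swap in your own scheme:

# Vagrantfile sketch: three Mongo nodes on a private network.
# Names, box and IPs below are assumptions -- adjust to your setup.
Vagrant.configure("2") do |config|
  config.vm.box = "your-debian-box" # whichever base box you normally use

  (1..3).each do |i|
    config.vm.define :"mongo#{i}" do |conf|
      conf.vm.hostname = "rep#{i}"
      conf.vm.network :private_network, ip: "10.0.0.1#{i}"
    end
  end
end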

Get Your Hostnames Sorted

MongoDB does not like to replicate unless you’ve set up each node with the correct hostnames. Once you’ve mastered this, you should be ready to roll.

We use our own Chef Cookbook to manage our hosts (which we’ve not open sourced yet). You can either download the Opscode one, or manually update each node’s hosts file like so:

vi /etc/hosts
127.0.0.1 localhost
10.0.0.11 rep1
127.0.0.1 rep2
10.0.0.13 rep3

Notice that the current machine (rep2 in this example) is mapped to 127.0.0.1 rather than its private IP. More on this shortly.
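
If you’d rather have Chef manage the hosts file instead of editing it by hand, something like the community hostsfile cookbook does the job. A rough sketch for rep2, reusing the example entries above (the names and IPs are just the example values, not gospel):

# Sketch of a hosts recipe using the community "hostsfile" cookbook.
# Entries mirror the example above; rep2 is the current machine here.
hostsfile_entry '10.0.0.11' do
  hostname 'rep1'
  action :create
end

hostsfile_entry '10.0.0.13' do
  hostname 'rep3'
  action :create
end

# The current node maps to loopback, as discussed above.
hostsfile_entry '127.0.0.1' do
  hostname 'localhost'
  aliases ['rep2']
  action :create
end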

Get your Override Attributes in Order

Don’t edit the MongoDB cookbook directly. It’s best to override any attributes you need to change in your role file.

{
  "name": "flume",
  "override_attributes": {
    "mongodb": {
      "cluster_name": "cluster1",
      "config": {
        "replSet": "rep20",
        "bind_ip": "0.0.0.0"
      },
      "mms_agent": {
        "api_key": "249a674cead4407d94daFKmsmd30"
      }
    }
  },
  "env_run_lists": {
    "playground": [
      "role[base]",
      "recipe[mongodb::replicaset]"
    ]
  }
}

Mongo complained about the bind_ip initially, but in the end we left it at 0.0.0.0, as per the cookbook (otherwise you can’t connect from the other replica nodes). Again, get your hosts file sorted and you won’t need to worry.

You only need to set two variables:

replSet: rep20-x
cluster_name: your-name-here
  • replSet is the name of your replica set and shouldn’t really change.
  • cluster_name is (basically) what the cookbook uses to discover which nodes should be in the replica set.

And add the recipe:

"recipe[mongodb::replicaset]"

Provision Your First MongoDB Node

Run vagrant up mongo1 (or whatever you called your first machine) to fire it up and provision it.

Vagrant can be an old dog unless you ramp up the RAM. Use 4 GB minimum for a decent runtime (we’re using Debian).
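
For the VirtualBox provider that’s a couple of lines in the Vagrantfile; a sketch (4096 MB simply matches the 4 GB suggestion above, and the CPU count is our own habit):

conf.vm.provider "virtualbox" do |v|
  v.memory = 4096 # 4 GB -- Mongo plus a Chef run crawls on the default RAM
  v.cpus = 2      # optional, but speeds up the provisioning run
end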

SSH into the machine once it’s run (successfully we hope) and make sure you can access your first Mongo instance.

vagrant@rep1:~$ mongo
MongoDB shell version: 2.0.6
connecting to: test
>

Check the status of your cluster using:

> rs.status()

We actually found we needed to initialise the cluster the first time around using:

> rs.initiate()

I suspect the cookbook does this for you; we’ll test again later. Please also note that we’re using a blank database. We’ll discuss getting data in later.

Check the status of the cluster again and make sure your first node is OK. Once you’ve done this…

Fire Up Your Remaining Nodes

Remember you need three nodes for a highly available replica set. Fire up the last two using the same instructions.

We had to run Chef a couple of times on both the new nodes and the primary one to get them all replicating.

A Few Troublesome Areas

Overall, it took us a day to get this going properly in a Production environment. The major things which held us up were as follows.

Couldn’t parse cfg object…

Failed to configure replicaset, reason: {"errmsg"=>"couldn't parse cfg object can't use localhost in repl set member names except when using it for all members", "ok"=>0.0}

This seemed to be related to the hostname of the server. To fix this (there’s a Chef sketch of the first two steps after the list), we:

  • Altered the hostname of the machine in /etc/hostname
  • Edited /etc/hosts and mapped the node name to 127.0.0.1
  • Rebooted. Old school.
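
If you’d rather have Chef take care of those first two steps, here’s a rough sketch. rep2 is a stand-in for whichever node is complaining, and hostsfile_entry again comes from the community hostsfile cookbook:

# Sketch: pin this node's hostname and map it to loopback.
node_name = 'rep2' # placeholder -- use the node's own name

file '/etc/hostname' do
  content "#{node_name}\n"
end

execute "hostname #{node_name}" do
  not_if { `hostname`.strip == node_name }
end

hostsfile_entry '127.0.0.1' do
  hostname 'localhost'
  aliases [node_name]
  action :create
end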

Couldn’t initiate, can’t find self…

Failed to configure replicaset, reason: {"errmsg"=>"couldn't initiate : can't find self in the replset config", "ok"=>0.0}

This wasn’t really an issue with MongoDB or the cookbook, but more of a Chef-run nuisance. We could see the correct message in the provisioning logs: Configuring replicaset with members rep1, rep2, rep3.

But running the following on a working node showed only two members.

> rs.status()

To fix this, we ran Chef again on a node with the replica set already in operation, which adds the unknown node into the set. This seems a little annoying and we’re looking for a work-around.

Then, we ran the provisioner on the failing node — everything passed.

Again, check your nodes are online using the rs.status() command. And read your logs.

Reboot, reboot, reboot

Vagrant seems to require you to reboot your machines on a regular basis, especially when working with MongoDB replication. Set your hosts up and then reboot them all (vagrant reload is your friend). Just do it; it will save you hours.

Finally

Run rs.status() on any node and check the output. It should look like this:

SECONDARY> rs.status()
{
  "set" : "rep20",
  "date" : ISODate("2014-07-02T00:04:44Z"),
  "myState" : 2,
  "syncingTo" : "rep1:27017",
  "members" : [
    {
      "_id" : 0,
      "name" : "rep2:27017",
      "health" : 1,
      "state" : 2,
      "stateStr" : "SECONDARY",
      "optime" : {
        "t" : 1404259425000,
        "i" : 1
      },
      "optimeDate" : ISODate("2014-07-02T00:03:45Z"),
      "self" : true
    },
    {
      "_id" : 1,
      "name" : "rep1:27017",
      "health" : 1,
      "state" : 1,
      "stateStr" : "PRIMARY",
      "uptime" : 299,
      "optime" : {
        "t" : 1404259425000,
        "i" : 1
      },
      "optimeDate" : ISODate("2014-07-02T00:03:45Z"),
      "lastHeartbeat" : ISODate("2014-07-02T00:04:43Z"),
      "pingMs" : 1
    },
    {
      "_id" : 3,
      "name" : "rep3:27017",
      "health" : 1,
      "state" : 2,
      "stateStr" : "SECONDARY",
      "uptime" : 299,
      "optime" : {
        "t" : 1404259425000,
        "i" : 1
      },
      "optimeDate" : ISODate("2014-07-02T00:03:45Z"),
      "lastHeartbeat" : ISODate("2014-07-02T00:04:43Z"),
      "pingMs" : 0
    }
  ],
  "ok" : 1
}

We’ll cover how we get our data in next time.