Fast, Reliable MongoDB with Diamanti

Scope of tutorial

In this tutorial we present a simple solution for setting up a MongoDB replica set with easy scalability, high availability, and fast recovery using Diamanti mirrored volumes. You will see how Diamanti simplifies application deployment on Kubernetes so that it can be done in seconds. Diamanti abstracts all the failover and recovery complexity from the user, yet ensures that MongoDB is highly available and scalable.

Overview

First, let's get an overview of the solution proposed in this tutorial. We will deploy a 3-node MongoDB replica set with Kubernetes on a Diamanti appliance. A replica set in MongoDB is a group of mongod processes that provide redundancy and high availability. We will use Diamanti mirrored volumes to ensure fast in-cluster recovery of the MongoDB pods, building a self-recovering setup that fails over and recovers on its own whenever a pod or node goes down. A side-car container running in each MongoDB pod facilitates the initial setup and failover.

Understanding the MongoDB replica set

A MongoDB replica set is a group or cluster of replicated, stateful mongod instances that communicate with each other. Multiple instances provide high availability as well as high capacity and scalability for a given MongoDB deployment. It is recommended to have 3 or more MongoDB nodes in a cluster. One node acts as the primary and the remaining nodes are secondaries; all data replicates from the primary to the secondary nodes. MongoDB facilitates an election among the instances to select a primary (master) at bring-up or in case of failure. On the application side, the MongoDB driver takes care of identifying the primary instance: it sends all writes to the current primary and load-balances reads across the secondary replicas.
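For example, a node.js application using the official mongodb driver only needs the full member list; the driver discovers the primary on its own and follows it after a failover. The snippet below is purely illustrative (the service names match the ones we create later in this tutorial, while the database name and driver usage are assumptions):

// Illustrative only: service names come from this tutorial; the database
// name ("demo") is an arbitrary assumption.
const { MongoClient } = require('mongodb');

// Listing every replica-set member lets the driver discover the current
// primary on its own and follow it transparently after a failover.
const uri = 'mongodb://mongo-svc-0:27017,mongo-svc-1:27017,mongo-svc-2:27017/demo?replicaSet=rs0';

MongoClient.connect(uri)
  .then(client => {
    const cars = client.db('demo').collection('cars');
    // Writes always go to the primary; reads can be spread across secondaries
    // by adding readPreference=secondaryPreferred to the URI.
    return cars.insertOne({ make: 'Tesla', model: 'modelS' }).then(() => client.close());
  })
  .catch(console.error);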

In order to configure each MongoDB instance for initial setup, health checks, election, and failover, we will run a side-car container inside each MongoDB pod. This side-car container is a node.js based application, adapted for the Diamanti solution from the following GitHub repo:
https://github.com/thesandlord/mongo-k8s-sidecar
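
To give an idea of what the side-car automates, here is a very simplified node.js sketch. It is not the side-car's actual code (see the repository above for the real implementation), but it shows the kind of check-and-initiate logic involved:

// Simplified sketch only -- NOT the side-car's real code; see the
// mongo-k8s-sidecar repository for the actual implementation.
const { MongoClient } = require('mongodb');

async function ensureReplicaSet(localAddr, peerAddrs) {
  // Connect to the mongod running in the same pod.
  const client = await MongoClient.connect(`mongodb://${localAddr}/admin`);
  const admin = client.db('admin');
  try {
    // Fails if no replica-set configuration has been applied yet.
    await admin.command({ replSetGetStatus: 1 });
  } catch (err) {
    // The first pod to come up initiates the set with itself plus any peers it can see.
    await admin.command({
      replSetInitiate: {
        _id: 'rs0',
        members: [localAddr, ...peerAddrs].map((host, i) => ({ _id: i, host })),
      },
    });
  } finally {
    await client.close();
  }
}

// Example: ensureReplicaSet('172.16.137.4:27017', ['172.16.137.5:27017']);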

Understanding the node.js application

To demonstrate the MongoDB deployment in action, we will use a simple node.js application. It exposes a RESTful API that any front-end application can use to easily manage the backend database.
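
The application source lives in the repository listed in "What you need" below. As a rough sketch (not the application's actual code), its Express/Mongoose routes look something like the following; the response fields match the sample output shown later, while the database name and connection details here are illustrative assumptions:

// Rough sketch of the REST interface -- the real application lives in the
// mongodb-cluster repository (images/nodeJsMongo/) listed under "What you need".
const express = require('express');
const mongoose = require('mongoose');

// Illustrative connection string; the real app builds it from the MHOST
// environment variable passed in the pod spec (see "Set up the application pod").
mongoose.connect('mongodb://mongo-svc-0:27017,mongo-svc-1:27017,mongo-svc-2:27017/cardb?replicaSet=rs0');

const Car = mongoose.model('Car', new mongoose.Schema({
  make: String, model: String, year: Number, color: String,
}));

const app = express();
app.use(express.urlencoded({ extended: true })); // curl --data sends form-encoded bodies

// Insert one car
app.post('/api/cars', (req, res) => {
  Car.create(req.body)
    .then(() => res.json({ status: 'success', message: 'Inserted one car' }))
    .catch(err => res.status(500).json({ status: 'error', message: err.message }));
});

// Retrieve all cars
app.get('/api/cars', (req, res) => {
  Car.find({})
    .then(data => res.json({ status: 'success', data, message: 'Retrieved ALL cars' }))
    .catch(err => res.status(500).json({ status: 'error', message: err.message }));
});

app.listen(3000);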

What you need

  1. The official MongoDB Docker image.
  2. The Docker image for the side-car container:
    Github: https://github.com/diamantiSolutions/mongodb-cluster (images/mongo-k8s-sidecar/)
    Dockerhub: diamantisolutions/mongo-k8s-sidecar:v1
    Originally referred from: https://github.com/thesandlord/mongo-k8s-sidecar
  3. The Docker image for the node.js based RESTful API application, which provides APIs to read, insert, and edit data objects in the database.
    Github: https://github.com/diamantiSolutions/mongodb-cluster (images/nodeJsMongo/)
    Dockerhub: diamantisolutions/nodejs-mongo-app:v2
  4. Pod specs for launching pods based on all of the above images can be downloaded from:
    Github: https://github.com/diamantiSolutions/mongodb-cluster (rc/)

Deploying a MongoDB replica set with Diamanti mirrored volumes

Setting up the Diamanti cluster

In this tutorial we assume that you already have a working Diamanti cluster ready for deployment.

Create a Diamanti network object

The first step is to create a Diamanti network object to facilitate easy connectivity among the mongo nodes. Let's create a network named "default". You will need to know the IP address range, gateway, and VLAN to be used for your network; consult your network administrator if you are not sure. Be aware that incorrect values in this step can cause unexpected behavior.

[root@demo-1 diamanti]# dctl network create default -s 172.16.137.0/24 --start 172.16.137.4 --end 172.16.137.20 -g 172.16.137.1 -v 137
[root@demo-1 diamanti]# dctl network list
NAME TYPE START ADDRESS TOTAL USED GATEWAY VLAN
default public 172.16.137.4 17 0 172.16.137.1 137

Create a 2-way mirrored Diamanti volume

We will add one node at a time, each with its own Diamanti volume. It is recommended to use more than 1-way mirroring for faster in-cluster recovery (explained later). For this example, let's use a two-way mirrored volume:

[root@demo-1 ~]# dctl volume create vol-mongo-0 -s 20G -m 2
NAME SIZE NODE LABELS PHASE ATTACH-STATUS ATTACHED-TO DEVICE-PATH AGE
vol-mongo-0 0 [] <none> Pending 0s
[root@demo-1 ~]# dctl volume list
NAME SIZE NODE LABELS PHASE ATTACH-STATUS ATTACHED-TO DEVICE-PATH AGE
vol-mongo-0 20.03G [demo-2 demo-1] <none> Available Available 6s

Add the first MongoDB node

We will use the volume created above to launch the new MongoDB pod. You can use the mongo-node-template.yaml file as a starting point to create a replication controller (not to be confused with a MongoDB replica set) and a service for each mongo node. Let's substitute the node number in the template file using sed and create the pod and service for our first MongoDB node.

[root@demo-1 ~]# sed -e 's~<num>~0~g' mongo-node-template.yaml | kubectl create -f -
replicationcontroller "mongo-0" created
service "mongo-svc-0" created

Pay attention to the pod spec: each replication controller runs a single replica of the pod, which ensures that if the pod dies for any reason it is rescheduled to an available node. You will also see that we run two containers inside the pod. The first is a regular MongoDB container; the second is the side-car container, which sets up the main MongoDB container in the same pod and automates all the tasks related to its initial setup, election, failover, and recovery.

Add more MongoDB nodes

Now let's add more mongo nodes following the same procedure: first create the volume, then use that volume to create the pod for the MongoDB node. Please note that the volume name must match the name in the spec; we just change the node number when adding more volumes and pods:

[root@demo-1 ~]# dctl volume create vol-mongo-1 -s 20G -m 2
NAME SIZE NODE LABELS PHASE ATTACH-STATUS ATTACHED-TO DEVICE-PATH AGE
vol-mongo-1 0 [] <none> Pending 0s
[root@demo-1 ~]# sed -e 's~<num>~1~g' mongo-node-template.yaml | kubectl create -f -
replicationcontroller "mongo-1" created
service "mongo-svc-1" created

[root@demo-1 ~]# dctl volume create vol-mongo-2 -s 20G -m 2
NAME SIZE NODE LABELS PHASE ATTACH-STATUS ATTACHED-TO DEVICE-PATH AGE
vol-mongo-2 0 [] <none> Pending 0s
[root@demo-1 ~]# sed -e 's~<num>~2~g' mongo-node-template.yaml | kubectl create -f -
replicationcontroller "mongo-2" created
service "mongo-svc-2" created

As we keep adding nodes, the side-car containers take care of facilitating the election of the primary among these three nodes. Thanks to the mongo architecture, the user doesn't need to worry about which node is the primary, and no external load balancing is required among the replicas. The mongo driver automatically sends writes to the primary and reads to a node picked according to the read preference. We just need to make sure we give the list of all the nodes in the replica set to the mongo driver when initiating the connection from the application.

After adding the above 3 nodes, our mongo cluster is ready for work. Now let's create a simple node.js application pod to access the mongo cluster.

Set up the application pod

Now let's deploy a node.js based application to access our database cluster. This application is a simple RESTful API application that talks to MongoDB in the background. If you don't need to modify the source code of this application, you can simply use the pod spec you already downloaded from the repository for this demo. Please note that in the pod spec we pass the service names of all three mongo pods we have created to the MHOST environment variable. If you add more than these three mongo nodes, modify the spec accordingly.
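
As an illustration, here is one hypothetical way such a comma-separated MHOST value could be turned into a replica-set connection string inside the app (the real code in the repository may differ; the port and set name below are assumptions):

// Hypothetical helper showing one way to turn MHOST
// ("mongo-svc-0,mongo-svc-1,mongo-svc-2") into a replica-set connection string;
// the real app may do this differently. Port 27017 and set name rs0 are assumed.
function mongoUriFromEnv(dbName) {
  const hosts = (process.env.MHOST || 'localhost')
    .split(',')
    .map(h => h.trim())
    .filter(h => h.length > 0)        // tolerate stray commas
    .map(h => `${h}:27017`)
    .join(',');
  return `mongodb://${hosts}/${dbName}?replicaSet=rs0`;
}

console.log(mongoUriFromEnv('cardb'));
// -> mongodb://mongo-svc-0:27017,mongo-svc-1:27017,mongo-svc-2:27017/cardb?replicaSet=rs0

With the spec in place, create the application pod and service: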

[root@demo-1 ~]# kubectl create -f nodeJsMongoApp.yaml
replicationcontroller "nodejsmongo-rc" created
service "nodejs-mongo-app" created

Testing the solution

Now we are all set to test the solution we just created. Let's connect to a shell in the node.js application pod and try executing some API calls:

[root@demo-1 rc]# kubectl exec -it nodejsmongo-rc-m1byb /bin/sh
# insert one entry
/usr/src/node-app # curl --data "make=Tesla&model=modelS&year=2015&color=black" http://localhost:3000/api/cars | jq .
{"status":"success","message":"Inserted one car"}
# insert second entry
/usr/src/node-app # curl --data "make=Tesla&model=modelS&year=2015&color=black" http://localhost:3000/api/cars | jq .
{"status":"success","message":"Inserted one car"}
# get all entries
/usr/src/node-app # curl http://localhost:3000/api/cars | jq .
{"status":"success","data":[{"_id":"592e0bf9496ade00211a7538","make":"Tesla","model":"modelS","year":2015,"color":"black","__v":0},{"_id":"592e0eef496ade00211a753b","make":"Tesla","model":"modelS","year":2015,"color":"black","__v":0}],"message":"Retrieved ALL cars"}
# insert third entry
/usr/src/node-app # curl --data "make=Tesla&model=modelX&year=2016&color=red" http://localhost:3000/api/cars | jq .
{"status":"success","message":"Inserted one car"}
# get all entries
/usr/src/node-app # curl http://localhost:3000/api/cars | jq .
{"status":"success","data":[{"_id":"592e0bf9496ade00211a7538","make":"Tesla","model":"modelS","year":2015,"color":"black","__v":0},{"_id":"592e0eef496ade00211a753b","make":"Tesla","model":"modelS","year":2015,"color":"black","__v":0},{"_id":"592e0c3a496ade00211a753a","make":"Tesla","model":"modelX","year":2016,"color":"red","__v":0}],"message":"Retrieved ALL cars"}
# modify second entry
/usr/src/node-app # curl -X PUT --data "make=Tesla&model=roadster&year=2014&color=red" http://localhost:3000/api/cars/592e0eef496ade00211a753b
# get all entries
/usr/src/node-app # curl http://localhost:3000/api/cars | jq .
{"status":"success","data":[{"_id":"592e0bf9496ade00211a7538","make":"Tesla","model":"modelS","year":2015,"color":"black","__v":0},{"_id":"592e0c3a496ade00211a753a","make":"Tesla","model":"modelX","year":2016,"color":"red","__v":0},{"_id":"592e0eef496ade00211a753b","make":"Tesla","model":"roadster","year":2014,"color":"red","__v":0}],"message":"Retrieved ALL cars"}

Testing high availability

Now let's test high availability by restarting the node that is running the primary MongoDB instance. To find out which instance is the primary, log in to the shell of any replica and run the following command:

[root@demo-1 ~]# kubectl exec -it mongo-0-14d6o /bin/sh
# mongo
rs0:SECONDARY> db.runCommand("ismaster")
{
"hosts" : [
"172.16.137.4:27017",
"172.16.137.5:27017",
"172.16.137.6:27017"
],
"setName" : "rs0",
"ismaster" : false,
"secondary" : true,
"primary" : "172.16.137.6:27017",
. . .
. . .
}

As you can see, 172.16.137.6 is the primary. Let's see which node it is running on:

[root@demo-1 ~]# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE
mongo-0-14d6o 2/2 Running 0 7m 172.16.137.4 demo-2
mongo-1-ec33z 2/2 Running 0 16m 172.16.137.5 demo-1
mongo-2-qgtve 2/2 Running 0 9m 172.16.137.6 demo-3
nodejsmongo-rc-q8ceg 1/1 Running 0 4m 172.16.137.8 demo-1

Now let's unplug the node demo-3 to bring it down. Alternatively, you can shut it down or reboot it with an IPMI tool. While the node is down, let's quickly move on to the next step.

Please note that Kubernetes takes some time to realize that the node is down, so if you look at the pod list the pod will still show up as Running. But the rest of the mongo cluster detects that the primary is not reachable, immediately calls for an election, and chooses a new primary from the remaining nodes. You can test this by accessing the API on the node.js app and doing writes while the primary node is down. You might see 20-30 seconds of unavailability while the node failure is detected and the election happens; beyond that, your application works perfectly fine even though the previous MongoDB primary node is still down. Let's run the query multiple times while the previous primary is down:

[root@demo-1 rc]# kubectl exec -it nodejsmongo-rc-q8ceg /bin/sh
/usr/src/node-app # curl http://localhost:3000/api/cars | jq .
{"status":"success","data":[{"_id":"592e0bf9496ade00211a7538","make":"Tesla","model":"modelS","year":2015,"color":"black","__v":0},{"_id":"592e0c3a496ade00211a753a","make":"Tesla","model":"modelX","year":2016,"color":"red","__v":0},{"_id":"592e0eef496ade00211a753b","make":"Tesla","model":"roadster","year":2014,"color":"red","__v":0}],"message":"Retrieved ALL cars"}
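
The reads above keep working because the driver simply follows the new primary. If you also want the application layer to mask the brief election window for writes, a simple retry wrapper like the sketch below (not part of the demo app; the attempt count and delay are arbitrary) is one option:

// Not part of the demo app: a simple retry wrapper to ride out the short
// window while a new primary is elected. Attempt count and delay are arbitrary.
async function withRetry(operation, attempts = 6, delayMs = 5000) {
  for (let i = 0; i < attempts; i++) {
    try {
      return await operation();
    } catch (err) {
      if (i === attempts - 1) throw err;   // give up after the last attempt
      await new Promise(resolve => setTimeout(resolve, delayMs));
    }
  }
}

// Example: withRetry(() => Car.create({ make: 'Tesla', model: 'model3' }));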

After a while you can see that Kubernetes does detect that the node is down:

[root@demo-1 ~]# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE
mongo-0-14d6o 2/2 Running 0 7m 172.16.137.4 demo-2
mongo-1-ec33z 2/2 Running 0 16m 172.16.137.5 demo-1
mongo-2-so6mu 0/2 Pending 0 11s <none>
nodejsmongo-rc-q8ceg 1/1 Running 0 4m 172.16.137.8 demo-1

And once the node is back up and running, you will see that the previous primary has been recreated. Thanks to the Diamanti mirrored volume, the recovery time for the old primary (now a new secondary) is very low and does not require any recreation or resyncing of the database, so we are soon back to running at full capacity:

[root@demo-1 ~]# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE
mongo-0-14d6o 2/2 Running 0 20h 172.16.137.4 demo-2
mongo-1-ec33z 2/2 Running 0 20h 172.16.137.5 demo-1
mongo-2-so6mu 2/2 Running 0 20h 172.16.137.6 demo-3
nodejsmongo-rc-q8ceg 1/1 Running 0 20h 172.16.137.8 demo-1

Why mirroring

As you may have noticed, we used 2-way mirroring for each of the volumes. The benefit of the Diamanti mirroring option is that when one node goes down, the mirrored volume is already available on another node, so the pod can simply be migrated there. This helps recover the lost mongo node almost instantaneously, without any need to copy terabytes of data after the failure to recreate a node. You can even use 3-way or higher mirroring to increase redundancy for in-cluster disaster recovery.

Scaling the MongoDB deployment

Scaling stateless pods is much easier; with databases it gets trickier. Each database instance needs its own persistent disk attached. If you try to scale the database replication controller, all the database pods will try to mount the same disk, which is not possible. In future tutorials we will talk about an implementation based on StatefulSets, where this problem can be solved. For this example we follow a simple two-step process to add a node to the replica set. It's as easy as the previous steps of creating a volume and launching a pod with that volume, and it can be scripted into single-command scaling.

[root@demo-1 ~]# dctl volume create vol-mongo-3 -s 20G -m 2
[root@demo-1 ~]# sed -e 's~<num>~3~g' mongo-node-template.yaml | kubectl create -f -
replicationcontroller "mongo-3" created
service "mongo-svc-3" created

Now let's restart the application pod with the newly created service added to the host list in the pod spec (delete the existing replication controller, update the spec as below, and create it again):

env:
- name: MHOST
  value: "mongo-svc-0,mongo-svc-1,mongo-svc-2,mongo-svc-3"

[root@demo-1 ~]# kubectl create -f nodeJsMongoApp.yaml
replicationcontroller "nodejsmongo-rc" created
service "nodejs-mongo-app" created

Scaling without restarting the app

If you think it's not a good idea to restart the application in order to add a new replica to the application's address list, you can opt for an alternative: your application can query the list of mongo nodes from a micro-service running in a separate container. This service pings the Kubernetes API to get the IPs of all the MongoDB pods. The application needs to poll this service periodically to keep its list of addresses up to date, and use this list for each new mongo connection. You can find an example of such a micro-service at: https://github.com/thesandlord/kubernetes-pod-ip-finder
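
Below is a rough sketch of such polling logic in node.js; the endpoint URL and response format are assumptions here, so check the kubernetes-pod-ip-finder repository for its actual interface:

// Sketch only: the URL and response format below are assumptions; consult the
// kubernetes-pod-ip-finder project for its actual endpoint and output.
const http = require('http');

let mongoHosts = []; // refreshed periodically; used when opening new mongo connections

function refreshMongoHosts() {
  // Assumed to return a JSON array of pod IPs, e.g. ["172.16.137.4", ...]
  http.get('http://pod-ip-finder/ips', res => {
    let body = '';
    res.on('data', chunk => { body += chunk; });
    res.on('end', () => {
      try {
        mongoHosts = JSON.parse(body).map(ip => `${ip}:27017`);
      } catch (err) {
        console.error('Could not parse pod IP list:', err.message);
      }
    });
  }).on('error', err => console.error('Pod IP lookup failed:', err.message));
}

refreshMongoHosts();
setInterval(refreshMongoHosts, 30 * 1000); // keep the address list reasonably fresh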

What's next

This MongoDB solution can be further enhanced for multi-zone disaster recovery and availability. We are not going to cover it in this tutorial, but you can achieve it by deploying the MongoDB nodes of the same replica set on different clusters/zones. You need to make sure that the clusters in different zones can connect to each other via VPN, so that each mongod instance in the replica set can talk to the others.

Conclusion

At Diamanti we are trying to bring innovative ideas so that developers can focus on development, not on infrastructure. In this tutorial we saw how easy it is to deploy a fast, reliable, and scalable MongoDB cluster on a Diamanti appliance. In future tutorials we will continue to explore how other modern technologies can be deployed on the Diamanti appliance and how they benefit from Diamanti solutions.