Deploy Cassandra Datastax 2-tier application using OpsWorks

Vinothini Raju 🇻
bluemeric
Dec 22, 2015 · 7 min read

On hearing more and more discussions around PaaS, and with the release of OpsWorks to simplify deployments, we were curious to try out a Cassandra Datastax application deployment using OpsWorks. The ultimate goal of the experiment is a 2-tier application with a Cassandra database as the data layer and a Java application as the web layer; the application layer should be able to connect to the data layer dynamically.

The deployment involves the following steps:

  1. Create a cookbook package and upload it to a repo
  2. Create a stack with two layers (cassandra and application)
  3. Configure the “cassandra” layer to bring up a Cassandra cluster
  4. Configure the “application” layer to deploy a web application that would create a keyspace and tables in the Cassandra cluster
  5. Dynamically pick up Cassandra instance IP from the “cassandra” layer to configure the web application in the “application” layer.

We picked up existing open source cookbooks to illustrate the Cassandra deployment and wrote custom recipes to perform steps 4 and 5.

1. Create a cookbook package and upload it to the repo

Since we are planning to deploy two layers within the same stack, we need to package both the Cassandra cookbook and the application cookbook in the same zip file and place it in a repo. We chose S3 as the repo.

We picked an existing Cassandra Datastax cookbook from the open source links below:

Apt cookbook available here.

Cassandra Datastax cookbook available here.

The cookbooks are then structured based on the AWS suggested directory structure:
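A minimal sketch of how the zip could be laid out is shown below. The cookbook directory names are illustrative (use the actual names of the cookbooks you download); demoApp is the custom application cookbook described in step 4, and each cookbook follows the standard Chef layout of metadata.rb plus attributes/recipes/templates directories.

cookbooks.zip
|-- apt/
|   |-- metadata.rb
|   `-- recipes/
|-- cassandra/
|   |-- metadata.rb
|   |-- attributes/
|   |-- recipes/
|   `-- templates/
`-- demoApp/
    |-- metadata.rb
    `-- recipes/
        `-- default.rb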

When the Cassandra cookbook was deployed, we faced two common issues:

  1. Cassandra crashes due to a Java segmentation fault: to rectify this issue, refer here.
  2. Improper seed value: ensure that the seed value in the /etc/cassandra/cassandra.yaml file is either left blank or set to the private IP of the instance (see the excerpt below).
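For reference, the relevant section of /etc/cassandra/cassandra.yaml looks like the excerpt below; the seeds value shown is only a placeholder for the instance's private IP (or an empty string).

seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "<private IP of the instance>"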

2. Create a stack with two layers (cassandra and application)

One key factor during this step is to create a custom Chef cookbook and pass some of the stack-level JSON attributes. These attributes will be visible to all the instances across all the layers.

Edit the “Custom Chef JSON” and pass the below attributes:

{
  "cassandra": {
    "port": "9160",
    "keyspace": "TROV",
    "table": "peoplecheck"
  },
  "ssh": {
    "id": "ssh",
    "ssh_wrapper_file": "/tmp/ssh-wrapper.sh"
  },
  "git": {
    "id": "git",
    "checkout_dir": "/tmp/deployment",
    "git_repo": "<your git repository URL>"
  }
}

As you can see, some of the Cassandra attributes can be passed through the custom JSON. To illustrate how the layer instance IPs can be used dynamically across layers, we have delayed the use of the Cassandra instance IP in the custom JSON until step 5. Once the stack is created, edit the stack and link the cookbook repo location to the stack.

3. Configure the “cassandra” layer to bring up a Cassandra cluster

Each layer executes the default built-in OpsWorks recipes first and then the custom recipes. If the custom recipes use other cookbooks like apt, we need to package those cookbooks as well in step 1. Open up the Cassandra service ports (for example 9160 for Thrift, the port passed in the custom JSON above, and 9042 for the CQL native protocol) so that the “application” layer can communicate with the services.

4. Configure the “application” layer to deploy an application that would create a keyspace and tables in the Cassandra cluster

We wrote a custom cookbook for the “application” layer that installs the required application server, deploys the application war file, and passes the Cassandra ring details and credentials to the web application.

Here we have multiple options to create the “Application layer” and deploy the application.

Option 1: Run a recipe during a lifecycle phase of the layer and apply to all the instances in the layer

We can choose to execute the recipe during the various lifecycle events of the layer: setup, configure, deploy, undeploy, and shutdown. When an instance is added to the layer, the recipes get executed based on the lifecycle events of the instance.
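For example, custom recipes can be assigned to the lifecycle events in each layer's settings roughly as follows (the recipe names are illustrative and depend on the cookbooks packaged in step 1; demoApp::configure is a hypothetical recipe name):

  cassandra layer: Setup -> apt::default, cassandra::default
  application layer: Setup -> demoApp::default; Configure -> demoApp::configure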

Option 2: Deploy application during deploy stage of the layer on a selected layer or an instance

Define an application under the “Apps” section in OpsWorks. When the application is deployed, it downloads the application source from a git/subversion/S3/HTTP repo. By deploying an app, the recipes defined for the deploy/undeploy stage of the layer get executed along with any custom JSON. We have the flexibility to choose the instance/layer to which this application has to be deployed.

Option 3: Run a specific recipe on a selected layer or instance

We can define a new deployment under “Deployment” or under “Apps”, which can be run on a single instance in a layer or applied to all instances in that layer. Similar to option 2, it is possible to pass a custom JSON for a selected layer or an instance (instead of for the entire stack).

We chose option 1, where the demoApp recipe downloads the application source from git, compiles it, and deploys the application to the application server. The git location and the SSH keys are passed as custom JSON at the stack level. A sketch of such a recipe is shown below.
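The actual recipe is not reproduced here; the following is a minimal sketch of what it could look like, assuming a Maven build, a Tomcat 7 server installed from the distribution packages, and a single war file produced by the build. The node['git'] and node['ssh'] attributes come from the custom JSON in step 2, and the ssh wrapper file is assumed to be created elsewhere with the deploy key.

# demoApp::default (sketch): check out, build and deploy the application
package 'git'
package 'maven'
package 'tomcat7'

checkout_dir = node['git']['checkout_dir']

# Clone/refresh the application source using the ssh wrapper from the custom JSON
git checkout_dir do
  repository node['git']['git_repo']
  ssh_wrapper node['ssh']['ssh_wrapper_file']
  action :sync
end

# Build the war file (assumes a Maven project)
execute 'build-war' do
  command 'mvn clean package -DskipTests'
  cwd checkout_dir
end

# Copy the war into the Tomcat webapps directory and restart Tomcat
execute 'deploy-war' do
  command "cp #{checkout_dir}/target/*.war /var/lib/tomcat7/webapps/"
  notifies :restart, 'service[tomcat7]'
end

service 'tomcat7' do
  action :nothing
end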

5. Dynamically pick up the Cassandra instance IP to configure the “application” layer

From here it is a choice of how we want the application to behave. We chose to write a web application that connects to the Cassandra database and then creates keyspaces and tables. Passing the Cassandra ring details to the web application is achieved by setting the application server environment variables (the setenv.sh file in a Tomcat container).

Below is a sample of the recipe that picks up the layer-specific instance details and passes them to the application server’s environment.
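The original embedded sample is not reproduced here; the following is a minimal sketch of how such a recipe could look. It assumes the opsworks-agent-cli tool is available on the instance, a layer short name of “cassandra” containing an instance named “cassandra1”, a Tomcat 7 install under /usr/share/tomcat7, and illustrative environment variable names; adjust the JSON path and file paths to your stack.

# demoApp::configure (sketch): pass the Cassandra instance IP to Tomcat
require 'json'

ruby_block 'set-cassandra-env' do
  block do
    # The OpsWorks stack/layer attributes are fetched through opsworks-agent-cli
    stack = JSON.parse(`opsworks-agent-cli get_json`)
    cassandra_ip = stack['opsworks']['layers']['cassandra']['instances']['cassandra1']['private_ip']

    # Write the ring details into Tomcat's setenv.sh
    ::File.open('/usr/share/tomcat7/bin/setenv.sh', 'w') do |f|
      f.puts '#!/bin/sh'
      f.puts "export CASSANDRA_HOST=#{cassandra_ip}"
      f.puts "export CASSANDRA_PORT=#{node['cassandra']['port']}"
      f.puts "export CASSANDRA_KEYSPACE=#{node['cassandra']['keyspace']}"
    end
  end
  notifies :restart, 'service[tomcat7]'
end

service 'tomcat7' do
  action :nothing
end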

One of the limitations we faced here was the usage of data bags. We couldn’t pass values across recipes using data bags. Although the custom JSON associated with the stack is similar to a data bag, we can only retrieve values that are passed through the custom JSON; setting values on these custom attributes from within a recipe did not work for us.

The Ohai tool did not list the custom JSON attributes. In order to get layer-level attributes, we need to use the “opsworks-agent-cli” tool. More info on this tool is here. In the above recipe, you can see that we are passing the “cassandra1” instance IP address to the Tomcat environment. We couldn’t get the layer’s ELB IP, and thus we couldn’t use an ELB for this purpose. Also, we need to wait until the “cassandra1” instance is online before starting an instance in the “application” layer, so that the CLI tool can get the “cassandra1” details.

Limitations in deployment (as of now):

  1. The Chef repo is common to the entire stack. It is not possible to add a cookbook specific to a layer without changing the repo of the entire stack.
  2. It is not possible to link the layers dynamically and sequence them unless we handle it in the layer cookbooks.
  3. No ability to version control application deployments: since the Apps are downloaded from, say, a git repo, it is always the HEAD revision that gets deployed.
  4. It is not possible to autoscale based on instance/layer metrics.
  5. Deployment is not automatic; it has to be triggered manually.

Considering the pace at which new features are getting added to OpsWorks, some of the features that are seen as limitations today might get addressed before we publish this blog :-) While load balancing and autoscaling features are available in both Elastic Beanstalk and OpsWorks, there are many advantages in using OpsWorks compared to Elastic Beanstalk.

Elastic Beanstalk to OpsWorks — Mindshift:

For those who are familiar with Elastic Beanstalk, using OpsWorks requires a mindshift. From our observation, these are some of the differences between the two solutions:

2 Layers vs Multiple Layers:

It is possible to create layers/tiers in both. What appears as a configuration in Elastic Beanstalk is somewhat synonymous with a layer. While Elastic Beanstalk creates an application and environments, OpsWorks creates stacks and layers. However, in Elastic Beanstalk only two layers can be created, i.e. an application/web layer and a data layer, whereas in OpsWorks it is possible to have multiple layers.

Version Control:

Apps in OpsWorks are downloaded from, say, a git repo, and it is always the HEAD revision that is deployed, whereas in Elastic Beanstalk the applications are uploaded to the AWS repo, which is version controlled by Elastic Beanstalk. Thus, it is possible to version control apps in Elastic Beanstalk, but in OpsWorks it is not possible as of now.

Domain Names vs Elastic IPs:

Elastic Beanstalk gives us a domain name for the application ending with elasticbeanstalk.com. Later we can change this DNS/hosted zone name using Route53. In the case of OpsWorks, there are no default domain names; using the ELB IP or the instance IPs, we can set the DNS/hosted zone name using Route53.

VPC support:

In Elastic Beanstalk, an environment can be created within a VPC, which is currently not possible in OpsWorks.

Manual addition of instances to LB:

In OpsWorks, while it is possible to create an Elastic Load Balancer and associate the layer instances with the ELB, autoscaling does not automatically add instances to the load balancer.

Autoscale does not really autoscale:

In OpsWorks, it is possible to add instances on a time-based or load-based schedule, but automatic scaling based on metrics like application performance or system resources is still not available.

Auto heal for high availability:

One of the good features is auto healing, which brings instances back up if they terminate or shut down for some reason.

Application Params through recipe:

Elastic Beanstalk gives an option to set application parameters like memory, port, etc., which is not possible in OpsWorks.

Thus, go for Elastic Beanstalk if you choose simplicity; otherwise, OpsWorks gives a lot of flexibility in deploying a multi-tier application.
