Elasticsearch and Kibana in AWS with Docker

As an architect I am starting to build proofs of concept for a lot of potential solutions to the problems we have at Compare The Market in order to move the business forward.

We are now moving at a fast pace towards a micro-service architecture, and some of the issues we now have are forcing us to make hard and fast decisions, particularly about data.

In my humble opinion (and hopefully others too!), a micro-service should be simple and have its own storage. This means that if I have a large monolithic service I now have to think about how I use the data and break it up.

For example, let's say I have two responsibilities within a service that read and write to the same database, and we break this service up. Now I have two services reading and writing to the same data: a total micro-service no-no.

This is exactly the issue I have currently, and as we introduce more requirements to query this data in ever richer scenarios, my data store is potentially not going to serve all of these situations.

This is where I would like to talk about Elasticsearch. It has its own database, which allows us to keep a separate storage area from the existing high-frequency reads and writes. It can scale, and it can index documents and serve queries at a far faster rate than our current solution. It also allows me to separate our data into the normalised use-case models that we are so familiar with nowadays.

Elasticsearch

Initially I found that there was some good documentation from the guys at elastic.co, but I really wanted to get an understanding of Elasticsearch: how to run it, how to deploy it and how to maintain it. At the time of writing I am starting to understand it, and I hope the process I have been through will help someone get something up and running in AWS very easily.

At that point the really tough stuff starts: the maintaining, the monitoring and understanding what to do if things go wrong (that will be for another time, when I have gone through the pain), but for now this is roughly what I have come up with.

Requirement

I started looking at Elasticsearch, getting something up and running using AWS. The brief was as follows:

  1. Hosted in AWS
  2. Cluster solution
  3. Automated if possible
  4. Data should be persistent
  5. Use best practices where possible

Up Front

You will need, as a minimum, a local machine with the AWS CLI installed (click here for instructions) and the AWS CLI configured to talk to your Amazon account (click here for instructions).

You should now be able to use the aws cli to create and query your AWS infrastructure.
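A quick sanity check from a terminal looks something like this:

aws --version
aws configure                  # prompts for your access key, secret key, default region and output format
aws sts get-caller-identity    # prints your account id if the credentials are working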

You should also have a ready-made VPC in AWS spanning multiple availability zones; again this can be achieved fairly easily by following this link.

Deliverables

A semi-automated script to get an Elasticsearch cluster up and running with one master node and a number of data nodes. Each node will have its own persistent volume.

I have attempted to put into diagrammatic form the process of starting a new cluster above. This assumes a few things, but for now here are the steps I have taken to create a cluster.

You can find all the files you need in my GitHub repository here, along with instructions on how to run all of this, but I will summarise the process below.

Step 1: Setting up your configuration file

There are 4 scripts to run, and each script will load a file called configuration.json, so we need to set some values in this file that correspond to your AWS setup.

Most of the values have been set for you (for example role names and security group names), but there are some that you need to fetch from your Amazon console, as follows (a sketch of the finished file follows this list):

  1. vpc-id is the id of your VPC (Virtual Private Cloud) in AWS
  2. region has to be a valid AWS region, the one your VPC lives in
  3. availability-zone has to be a valid zone in your VPC and must sit within that region
  4. subnet-id is the id of the subnet you wish the cluster to be deployed into (at the time of writing I don't support multi-region deployments)
  5. s3-bucket-name is the name of the bucket your config files will be uploaded to. These will be used by the instances at start-up time.
  6. base-ami is the latest Ubuntu image (I currently use ami-0ac019f4fcb7cb7e6)
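To make that concrete, here is a sketch of roughly what a filled-in configuration.json could look like. The ids are placeholders and the real file in the repository may carry additional keys (such as the role and security group names mentioned above):

# Illustrative only; substitute the values from your own AWS console
cat > configuration.json <<'EOF'
{
  "vpc-id": "vpc-0123456789abcdef0",
  "region": "eu-west-1",
  "availability-zone": "eu-west-1a",
  "subnet-id": "subnet-0123456789abcdef0",
  "s3-bucket-name": "my-elasticsearch-config-bucket",
  "base-ami": "ami-0ac019f4fcb7cb7e6"
}
EOF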

Step 2: Creating the security groups and roles

The script here can be run. It will fetch the data from the configuration file and create a security group and the IAM role that will be used for building the base AMI for all our clusters.

If you go into the AWS console you will now see that we have a security group allowing the ports Elasticsearch and Kibana need (as well as an SSH port for instance management).

You will see that the security group ingress rules grant access to your current external IP as well as to your VPC CIDR block.
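For reference, the shape of what the script does is roughly the following. The group name, ids and CIDRs here are illustrative; the real values come from configuration.json:

# Create the group inside the VPC and capture its id
SG_ID=$(aws ec2 create-security-group \
  --group-name elasticsearch-cluster-sg \
  --description "Elasticsearch cluster nodes" \
  --vpc-id vpc-0123456789abcdef0 \
  --query GroupId --output text)

# Open the Elasticsearch REST (9200) and transport (9300) ports, Kibana (5601) and ssh (22).
# 203.0.113.10/32 stands in for your current external IP, 10.0.0.0/16 for your VPC CIDR block.
for port in 9200 9300 5601 22; do
  aws ec2 authorize-security-group-ingress \
    --group-id "$SG_ID" --protocol tcp --port "$port" --cidr 203.0.113.10/32
  aws ec2 authorize-security-group-ingress \
    --group-id "$SG_ID" --protocol tcp --port "$port" --cidr 10.0.0.0/16
done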

Step 3: Creating a base image

The script here can be run. It will create an AMI for you by taking the Amazon Ubuntu image defined in the configuration file and running a user-data script that installs Docker and docker-compose and pulls the Kibana and Elasticsearch images, so that when we create our actual instances at runtime they don't need to do all that work again!
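I won't reproduce the repository's user-data verbatim, but on an Ubuntu base it boils down to something along these lines (the image version tag is illustrative):

#!/bin/bash
# Install Docker and docker-compose on the Ubuntu base image
apt-get update -y
apt-get install -y docker.io docker-compose
systemctl enable --now docker

# Pre-pull the Elasticsearch and Kibana images so the cluster
# instances don't have to download them when they boot
docker pull docker.elastic.co/elasticsearch/elasticsearch:6.5.4
docker pull docker.elastic.co/kibana/kibana:6.5.4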

Check your console again under EC2 for a base image that's been created. You will also notice that there now exists an instance that the base image has been created from.

You can now use the PEM key that has been downloaded for you to ssh into the box (you can find the external IP address of this instance by clicking on it in the console and viewing its details):

ssh -i [pem-key-file.pem] ubuntu@[external-ip]

You should now have access to the instance that the image was created from. We can check that our user-data ran OK with the following commands:

docker-compose --version
docker --version

If both of these commands come back with a version number then we are good to move on to the next step.

Step 4: Uploading the config files to S3

The script can be found here. Each node type requires the same list of configuration files, with slightly different contents. The files we upload are listed below, followed by a rough sketch of the upload itself:

  1. Dockerfile, which does a few things such as installing the Elasticsearch EC2 plugin and running Elasticsearch
  2. docker-compose.yml, which differs between master and data nodes. For example, our master node runs Kibana as a container (we could create a dedicated node for Kibana, but there are many ways to do this)
  3. jvm.options, which configures the Java virtual machine that Elasticsearch runs on
  4. elasticsearch.yml, which configures the way Elasticsearch runs on each instance
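The upload itself is little more than a handful of aws s3 cp calls, one set per node type. The bucket name and local paths below are illustrative; the real ones come from configuration.json and the repository layout:

# Push each node type's configuration up to the bucket named in configuration.json
for node_type in master data; do
  for file in Dockerfile docker-compose.yml jvm.options elasticsearch.yml; do
    aws s3 cp "config/${node_type}/${file}" \
      "s3://my-elasticsearch-config-bucket/${node_type}/${file}"
  done
done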

Step 5: Running the nodes

The script can be found here. This is the part that actually starts the new instances; running it again will simply create more nodes.

In essence this script will create the configured number of master and data nodes, attached to the security group from step 2 and based on the AMI we created in step 3.

Each instance will have its own volume so that if the nodes are brought down the data will persist (that's the theory).
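Stripped of the surrounding plumbing, the launch amounts to a call like the one below for each node type. Every value shown is a placeholder; the script reads the real ids, counts and names from configuration.json and the earlier steps:

# Launch the data nodes from the base AMI, in the configured subnet and security group,
# with an extra EBS volume for the Elasticsearch data and a user-data script that pulls
# the node's configuration from S3 and runs docker-compose
aws ec2 run-instances \
  --image-id ami-0abcdef1234567890 \
  --instance-type t2.medium \
  --count 2 \
  --subnet-id subnet-0123456789abcdef0 \
  --security-group-ids sg-0123456789abcdef0 \
  --iam-instance-profile Name=elasticsearch-node-profile \
  --block-device-mappings 'DeviceName=/dev/sdf,Ebs={VolumeSize=50,VolumeType=gp2}' \
  --user-data file://data-node-user-data.sh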

You can find the external IP address of the master node and browse to the following URI:

http://[external-ip]:5601

If all is well you should find that you have access to Kibana and can see via the monitoring page that there are 3 nodes: 1 master and 2 data nodes.
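If port 9200 is reachable from your machine you can also ask the cluster directly:

curl "http://[external-ip]:9200/_cluster/health?pretty"
# a healthy three-node cluster reports "number_of_nodes" : 3 and a green (or at least yellow) status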

I won't go into how Elasticsearch can be queried and posted to in this blog, but here are a few troubleshooting tips that helped me out… although the scripts should sort this all out :)

Can’t ssh into an instance

If you can't ssh into any of your instances with the PEM key that is downloaded, I would suggest that the security group is blocking your IP, or that your outgoing traffic is going through a proxy.

You can go into the security group created and open all ports to check this (make sure you don’t leave all ports open after the investigation)
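You can also inspect the rules from the CLI and compare them with what the outside world currently sees as your IP; the group id below is a placeholder:

# Which CIDR ranges do the ingress rules currently allow?
aws ec2 describe-security-groups \
  --group-ids sg-0123456789abcdef0 \
  --query 'SecurityGroups[0].IpPermissions'

# What is your external IP right now?
curl https://checkip.amazonaws.com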

The instance that I created the base image from doesn't have Docker installed

This is generally due to an issue with the IAM role that was created. The role MUST have S3 Get access to the bucket you created AND the DescribeInstances permission (which the Elasticsearch EC2 plugin uses for node discovery).
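As a rough illustration, those permissions could be attached as an inline policy along these lines; the role, policy and bucket names here are placeholders:

aws iam put-role-policy \
  --role-name elasticsearch-node-role \
  --policy-name elasticsearch-node-access \
  --policy-document '{
    "Version": "2012-10-17",
    "Statement": [
      { "Effect": "Allow", "Action": "s3:GetObject",
        "Resource": "arn:aws:s3:::my-elasticsearch-config-bucket/*" },
      { "Effect": "Allow", "Action": "ec2:DescribeInstances", "Resource": "*" }
    ]
  }'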

A script is failing to run

The scripts are meant to be idempotent, in that running them many times should have no further effect on the system. However, there are paths through the scripts where, if something has already been created and a failure then occurs, it's possible that certain things will not be created, most notably the IAM roles and policies.

I will be updating the scripts to allow for this in the future

Also check your configuration.json file to make sure that you have the correct subnet and VPC information; if in doubt, you can change the names of everything (via the configuration file) and start from scratch!

Further reading

There are a number of articles I have read and based my current work on, but mostly I have taken some of the concepts from AppyChip.