A Short Technology Retrospective

For the past seven months I’ve been using some new technologies while working on a major rewrite of a legacy search application. Some of these technology choices have had a big impact, and I’m excited to share brief details with you (catch me on Twitter if you want more).



I’ll cover everything from the web framework to the search engine to infrastructure tools. But let’s start with the most fundamental pieces: we built our application in Scala, which runs on the JVM.

We Love Scala

Opinion is split on Scala, but everyone in our small team loves it, not least because we were able to do away with lots of the syntax noise of languages like Java and C#. We used TDD and harnessed Scala’s functional style in a lot of places, so the codebase is as maintainable as any I’ve worked on, but a lot easier on the eye.
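
To give a flavour of what I mean, here’s an illustrative snippet (not from our codebase) showing how case classes and collection pipelines cut out the ceremony you’d write in Java or C#:

```scala
// Illustrative only: keep the strongest matches, best first.
case class SearchResult(title: String, score: Double)

def topTitles(results: Seq[SearchResult], minScore: Double): Seq[String] =
  results
    .filter(_.score >= minScore) // drop weak matches
    .sortBy(-_.score)            // highest score first
    .map(_.title)
```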

A few people like to moan about Scala, mostly those who don’t give it enough time, but I find working with it every day a real pleasure. Give it a bit of time and at least make your own mind up before deciding you don’t like it.



We have a simple system that stops people abusing some of Scala’s features: we simply tell each other when we’ve gone a bit too far. And it works.

We Love Akka and Play

All we built is an API that takes search requests and returns search results, but things are pretty interesting on the inside. Within a single web request we run lots of concurrent, asynchronous tasks as part of a data collection, ingestion, enhancement and querying pipeline.



Akka and Play let us get responses out to users, despite all of these tasks, in about 200ms on average on very modest EC2 boxes. Even at peak times our CPUs sit comfortably at around 30%.



We’re not pushing the Play framework too hard; we only have about five endpoints and no web pages. But it mostly keeps out of our way and acts as a very nice container for Akka. Akka itself is just wow for the concurrency aspects of our system.
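
Here’s a minimal sketch of the shape of this, with hypothetical stage names standing in for our real pipeline (which also involves Akka actors and is rather more involved):

```scala
import play.api.libs.concurrent.Execution.Implicits.defaultContext
import play.api.libs.json.{JsValue, Json}
import play.api.mvc._
import scala.concurrent.Future

object SearchController extends Controller {

  // Stub stages; the real ones call external sources and Elasticsearch.
  def collect(q: String): Future[Seq[JsValue]]   = Future.successful(Seq.empty)
  def enhance(docs: Seq[JsValue]): Seq[JsValue]  = docs
  def query(docs: Seq[JsValue]): Future[JsValue] = Future.successful(Json.obj())

  // Action.async returns a Future[Result], so no request thread is
  // blocked while the pipeline runs.
  def search(q: String) = Action.async {
    collect(q)
      .map(enhance)
      .flatMap(query)
      .map(results => Ok(results))
  }
}
```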

Elasticsearch… as a Cache

As part of our data pipeline, we pull in data from external sources, index it into Elasticsearch and then calculate facets based on the entire data set. On a cache miss, all of these steps happen within a single web request, yet we still get a response out to the user while adding only 80–160ms on top of the external network I/O that we can’t control.



Each document has a 15-minute TTL, in line with business policy, which is why Elasticsearch is just a short-term cache for the data we pull in from external sources. It’s a bit smarter than a key/value cache, though, because we get cache hits from overlapping but non-identical queries.
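
Here’s a minimal sketch of the idea using Play’s WS client against Elasticsearch’s HTTP API. The index, type and field names are hypothetical, and it assumes the _ttl field is enabled in the type’s mapping (Elasticsearch 1.x):

```scala
import play.api.Play.current
import play.api.libs.json.{JsValue, Json}
import play.api.libs.ws.WS

object SearchCache {
  private val base = "http://localhost:9200/search-cache/result"

  // Index a document with a 15-minute TTL so Elasticsearch expires it for us.
  def put(id: String, doc: JsValue) =
    WS.url(s"$base/$id?ttl=15m").put(doc)

  // Because cached documents are queryable, an overlapping but
  // non-identical query can still produce a cache hit.
  def search(term: String) =
    WS.url(s"$base/_search").post(Json.obj(
      "size"  -> 500,
      "query" -> Json.obj("match" -> Json.obj("title" -> term))
    ))
}
```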



On cache hits we’re getting up to 500 documents back, yet query times are very fast and very consistent, averaging around 20ms (Elasticsearch round-trip time). At peak times we have over 60 thousand documents in the index (each with about 30 fields).



During our load testing we noticed that SSDs made a big difference, so pick an EC2 instance with SSDs if you’re using it for Elasticsearch.



We also use Elasticsearch with Kibana…

Kibana is Incredible

We have a lot of graphs covering the technical metrics and some of the business metrics of our system. We use Graphite a little, but we’ve found a way to put most of our metrics into Kibana, which is much prettier.

If you don’t know about Kibana, make learning it a priority. It lets you calculate real-time metrics based on the logs your application creates. You can send your logs as JSON and then query them as structured documents using Lucene’s query syntax.
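
For instance, here’s a minimal sketch of the kind of structured log event this enables (the event name and fields are hypothetical; a log shipper such as Logstash indexes lines like this into Elasticsearch for Kibana to query):

```scala
import play.api.Logger
import play.api.libs.json.Json

object Metrics {
  // One JSON line per served search, so Kibana can slice on any field.
  def searchServed(tookMs: Long, cacheHit: Boolean, documents: Int): Unit =
    Logger.info(Json.obj(
      "event"     -> "search_served",
      "tookMs"    -> tookMs,
      "cacheHit"  -> cacheHit,
      "documents" -> documents
    ).toString())
}
```

In Kibana you can then filter these events with Lucene syntax, e.g. event:search_served AND cacheHit:false.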

Our use of Kibana is absolutely mind-blowing. In fact, Kibana is so good it’s brought the business and the dev team closer together as we share metrics and smash down silos.

Vagrant and Docker Dev Environment

Setting up dependencies for a project can be so annoying. On this project we have Redis, Elasticsearch, Scala, sbt, the Play framework and a few other things. It’s been so great that I can just automate all of this inside a Vagrant environment and quickly spin it up on my home and work machines with no wasted time.

And it’s not just me: anyone new to the project can have the application and the tests running on their local machine in just a few minutes, without even polluting their host OS.

In our Vagrantfile, we’ve set things up so *everything* runs inside the Vagrant VM, including Play. This means all I need in my host OS is an IDE or text editor. The VM forwards all of its ports to the host, so I can still run a browser in my host OS to test the application.

I can log into the VM over SSH, so I still use the terminal just as I would if I were developing on the host. Vagrant is almost invisible. I’m loving it.

To provision the Vagrant environment we’re using Puppet, which pulls in all the dependencies and uses Docker to run services like Redis and Elasticsearch as containers. We’ve actually started to look at Fig for dev environments so we can lose the VirtualBox overhead.

Docker for Testing on CI Environments

As with a dev environment, setting up dependencies on CI build agents can be a nightmare, or at least a lot of unnecessary hassle, especially with multiple projects running on shared CI servers. So we don’t do that; we just have Docker installed on the CI build agents.

Each project has a Dockerfile that spins up its dependencies as Docker containers on the build agent and then runs all of the automated tests. Simple, genius, love it.



I predict we’ll find even more uses for Docker when it hits the magical 1.0.

Puppet for Configuration Management

Not only do we use Puppet to automate the dev environment with Vagrant, it also provisions our int, uat and prod environments. On the small number of machines we have, Puppet is drama-free and a very easy tool to work with. I also think the syntax is great, and I personally prefer it to the pure Ruby of Chef.

Overall, Puppet has been great for us and we look forward to using it more in the future.

--

Nick Tune
Strategy, Architecture, Continuous Delivery, and DDD

Principal Consultant @ Empathy Software and author of Architecture Modernization (Manning)