A tale of three kings

How we tested Python, Elixir and Go’s abilities to rebuild some of Unbabel’ core services.

At Unbabel we’re super interested in finding better ways to scale our tech, make it more efficient and deliver the best experience ever.

As such, over the last few weeks we found ourselves discussing how other stacks would fare if we had to rebuild one of our core services.

For context, our current stack is strong on Python, PostgreSQL, Redis and MongoDB.

I wanted to build something familiar with different tools. So I set out to implement a simple scenario, event logging with a rest api wrapping a nosql database.


Functionally speaking

Our API should basically respond to only one endpoint from which it would receive data from an event, store it and then return a count of all existing events.

Our specs:

Sequence Diagram
  1. An app sends info about an event to the API.
  2. API writes the new event on the database.
  3. API performs a count query — counting all existing events.
  4. API returns a count to the app.

Enter the three kings

Python, our go-to language. Elixir, a close friend, known for it's incredible low-latency, distributed and fault-tolerant nature. And Go, preceded only by its C-like performance reputation.

For our API implementations I opted for thin web frameworks only. I stayed away from more complete frameworks like Django, Phoenix, etc., to reduce the middleware layers which differ a lot from language to language.

Here's our base implementation:

Python 3.6.0 — flask 0.12
Python 3.6.0 — japronto 0.1.1 (testing the hype)
Elixir 1.4.1 — plug 1.3.0 and cowboy 1.1.2
Go 1.8 — gin 1.1.4

The same algorithm was mimicked as much as possible across the 3 languages ( and 4 frameworks):

1. Parse request params
2. Create Measurement Point Structure for InfluxDB
3. Write Measurement Point
4. Count all existing events
5. Return count

Why not Java, Clojure, C/C++, Ruby or …? Time. If you would like me to include any of these in the future, just drop me a line and I’ll gladly do it!

And now for persistence …

For the database I picked InfluxDB (1.2), a time series database, as I've been curious about it for a long time.

InfluxDB is a time series database built from the ground up to handle high write and query loads. InfluxDB is meant to be used as a backing store for any use case involving large amounts of timestamped data.

Emulating an app

The app is simulated using Apache JMeter, an application designed to load test functional behaviour and measure performance.

I used JMeter 3.1 to simulate high usage and try to mimic production-type of load.

Infrastructure

Machine #1

Macbook Pro 2016, 2,9 GHz Intel Core i5, 16 GB 2133 MHz LPDDR3

Machine #2 and #3

GC Compute Engine — n1-standard-1 (1 vCPU, 3.75 GB memory)

More about Google Cloud machines here.

No reverse proxy servers, and no request gateways.

Test Case

Test Pre-conditions

Before executing each test we need to make sure that

  1. our database is in a clean empty state (so that all tests have the same pre-conditions),
  2. we only test one server implementation per execution
  3. we execute a warm up run before taking any measures.

Test execution

JMeter will launch 100 concurrent and consecutive active threads during 5 min. and ramp up period of 1 second.

Test Results

To make sure the results are consistent I executed each test case at least 3 times (also validating all pre-conditions).


Key Findings

Disclosure

After trying out a couple of executions it was pretty clear that the stack was getting easily under stress as the average response times started to increase to non-friendly values on account of the database write/read response times.

As such I accepted it for what it was and considered this a load test.

Overall Results

After all test executions, clean ups and warm ups, here are the results.

Overall results table
Total requests

There’s a real big difference between Go and all other. At the end of the 5 minutes test, Go was able to process 1.74 times more events than Elixir and roughly 2 times more than Python.

Average Response Time

Average response times also translate the exact same findings as the total number of requests, no surprise here. Go's average response of 559ms, Elixir still below 1s with 974ms and Python around 1.144~1.193 ms.

Throughput

The average throughput can be translated into:

Go: ~10k RPM
Elixir: ~6k RPM
Python: ~5k RPM

Given my previous experience with Elixir I expected it to show significantly better single-process performance. Comparing Elixir to Python in this context reveals a performance difference of 18%. IMHO, not enough to justify a stack swap.

Japronto vs Flask didn't bring anything relevant to the table in this scenario. With only a 3.8% of performance improvement Japronto was not a game changer for Python.

Go actually delivered as expected. The numbers say it all, 559m under stress of 10k RPM are a definitely a good result not only by comparison but in absolute.

Individual Performance Signatures

Python — flask

Flask's response time signature is quite linear. There's a response time bump around minute 1 but after repeating the tests a couple of times it shows no impact in the final outcome.

Python — japronto

Japronto's signature was a bit erratic. It varied quite a lot but all tests showed the same pattern.

Elixir — plug

Elixir's signature constantly showed a wave-like pattern in all tests. This looks like an underlying behaviour which I didn't dig into but that I assume we can attribute to Beam (Erlang's VM).

Go — gin

Go's signature shows a linear increase but, from all frameworks, Go revealed the highest Standard Deviation in response times.

InfluxDB

From the ground up InfluxDB was a very easy to install and configure database. I tried in MacOS and Ubuntu 16.04 and it played nice and easy in both.

It has a SQL-like query syntax, which allows most people to get around it faster. It also introduces the notion of Measurements (roughly similar concept to a Collection or Table), Tags which are automatically indexed by InfluxDB and allow you to query by it and also Fields (which should be our data value entries).

It also provides a nice Admin webpage which lets you query and visualize data. You may even check DB stats and Diagnostics as it provides a couple of pre-defined query examples.

However I was really sad to find this in their docs:

Open-source InfluxDB does not support clustering. For high availability or horizontal scaling of InfluxDB, please investigate our commercial clustered offering, InfluxEnterprise.

To be able to use a DB cluster, is something quite important for us at this stage. Though it can be quite a while until you need an InfluxDB Cluster, it's not nice to know you can hit a hard wall, a paying wall.

Not sure I'd use InfluxDB for a highly scalable scenario as I would not be ok with having to pay for a clustering solution where I can deploy solutions like Cassandra, Hadoop or Kafka (not quite the same but still relevant).

Out with the DB

Our main purpose was to find out how a full stack would behave in the same conditions and we achieved that. However, it’s noticeable that the response times increase continuously and in a linear manner (kind of).

Before I concluded this experiment I wanted to be able to compare how the language stacks would fare against each other if we took out the biggest time consuming variable from the equation, the database.

For this purpose I ran the exact same tests but now there's no write or read, we parse the request params and return a fixed number. A dummy test.

The results were a bit different than expected:

In this scenario, Elixir was the fastest. Japronto responded faster than Go and Flask is the slowest of all.

Let's look at the RPM values this time around:

Go: ~80k RPM
Elixir: ~113k RPM
Python-Japronto: ~95k RPM
Python-Flask: ~25k RPM

This definitely gives us some perspective on http server response times vs full stack (server + database wrapper) response times. Interestingly enough we see a big delta between Flask and Japronto — the same was not true when we added InfluxDB to the mix.

Conclusion

In most scenarios, you can achieve the exact same end result with Python as you do with Go. With Python, you’ll probably be able to ship far faster, and have more developers available.

It’s also far more economical when it comes to the actual amount of code needing to be written; it took me around 100 lines of code in Go to do the same as with 20 lines with Elixir and 35 with Python.

That said, you shouldn’t really compare lines of code as a metric without strong context, because for scaling-sensitive scenarios, more lines of code is not relevant. In our testing, Go set the bar for performance very high, and while you may need to take the time to write better tests and keep an eye on code styling and documentation, it feels like it is worth the effort.

So, given these results, are we actually going to replace any of Unbabel’s core services with one of these stacks? Our current thinking is that Go might be best for processing, Elixir for I/O intensive services and Python for more mainstream scenarios. But yes. The answer is yes.

How we go about it is a topic for another post. ;)