It’s always I/O

Ivan Ramirez
5 min read · Aug 5, 2015


While working on a simple API service that returned a JSON response with the state of a given activity at a certain point in time, we set the goal of building a service that would return these results in less than 20 ms and handle at least 600 requests per second (rps). We ended up learning, once again, that not everything works out as you plan.

💡 This is a promising idea…

Our first idea was to precompute the view (the JSON response) in an ‘offline’ process that listened to external events, instead of querying the database at request time to fetch the data for the response. The precomputed view would then be pushed to the database in a binary format, since it is compressed and faster to read.

The task was easy in our minds:

2 ms: Query to Cassandra

10 ms: Serialize from Binary to Java Object (Pojo)

3 ms: Trim any fat from the Pojo

2 ms: Serialize to JSON and send back

The serialization from the database blob to the Pojo turned out to be really expensive. Database queries were also expensive at times (although that wasn’t the rule), and since we wanted to avoid any caching, every request would go to the database.
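
For illustration, the per-request path we were trying to avoid looked roughly like this. It is only a sketch: the post doesn’t name the driver, the binary codec or the schema, so the DataStax driver, Kryo, and the table/column names below are all assumptions.

    import com.datastax.driver.core.Row;
    import com.datastax.driver.core.Session;
    import com.datastax.driver.core.utils.Bytes;
    import com.esotericsoftware.kryo.Kryo;
    import com.esotericsoftware.kryo.io.Input;

    // session is an already-connected Cassandra Session; names below are made up
    static ActivityView loadView(Session session, String activityId) {
        // ~2 ms: fetch the precomputed view as a blob
        Row row = session.execute(
                "SELECT view FROM activity_views WHERE activity_id = ?", activityId).one();

        // ~10 ms: turn the blob back into the Pojo (the expensive step)
        Kryo kryo = new Kryo();
        return kryo.readObject(new Input(Bytes.getArray(row.getBytes("view"))), ActivityView.class);
    }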

So we decided to do something else: since we were already using Kafka, we decided to have the service receive (although technically it polls) the precomputed views via Kafka and update an in-memory database. If the data was too old, or the server got restarted and we didn’t want to reprocess the Kafka topic, we would go to the database (Cassandra in this case), as we would still have the data there.
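
A rough sketch of that listener side, assuming the precomputed views arrive as JSON strings keyed by activity id; the topic, group and broker names are made up, and the Cassandra fallback is omitted:

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Map;
    import java.util.Properties;
    import java.util.concurrent.ConcurrentHashMap;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class ViewCacheUpdater {

        // the in-memory "database": activity id -> precomputed view
        static final Map<String, String> VIEWS = new ConcurrentHashMap<>();

        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "kafka:9092");
            props.put("group.id", "view-cache");
            props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singletonList("activity-views"));
                while (true) {
                    // the service "receives" the views by polling the topic
                    for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofMillis(500))) {
                        VIEWS.put(record.key(), record.value());
                    }
                }
            }
        }
    }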

That gave us the following times:

~0 ms: Read from memory O(1)

4 ms: Serialize from Binary to Java Object (Pojo)

3 ms: Trim any fat from the Pojo

2 ms: Serialize to JSON and send back

Much better! We tested three different JSON parsers: GSON, Jackson and Boon. GSON gave us the best, or at least consistently good, results, although in the end they all had pretty similar numbers.
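
For context, a comparison of that sort looks roughly like the snippet below (Gson and Jackson shown; Boon’s API is very similar). ActivityView stands in for our actual Pojo, and a real measurement needs warmup and many iterations (e.g. with JMH); this only shows the shape of the test.

    import com.fasterxml.jackson.databind.ObjectMapper;
    import com.google.gson.Gson;

    static void compareParsers(String json) throws Exception {
        Gson gson = new Gson();                      // reuse these; building them per request is not free
        ObjectMapper jackson = new ObjectMapper();

        long t0 = System.nanoTime();
        ActivityView fromGson = gson.fromJson(json, ActivityView.class);
        System.out.printf("Gson:    %.2f ms%n", (System.nanoTime() - t0) / 1e6);

        t0 = System.nanoTime();
        ActivityView fromJackson = jackson.readValue(json, ActivityView.class);
        System.out.printf("Jackson: %.2f ms%n", (System.nanoTime() - t0) / 1e6);
    }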

Still, we figured we could probably do better: what if we didn’t have to serialize from binary to a Pojo, and instead serialized from a JSON string to a Pojo? We tested it, and the serialization took a surprising 3 ms.

~0 ms: Read from memory O(1)

3 ms: Serialize from JSON to Java Object (Pojo)

3 ms: Trim any fat from the Pojo

2 ms: Serialize to JSON and send back

Awesome, we had just trimmed 1 ms! Then we decided that perhaps we didn’t need to trim anything from the Pojo at all, which meant we could get rid of the serialization step altogether: we just needed to be smarter when we generated the precomputed view, or simply send the extra payload in exchange for less processing time. Expected time: 2 ms ❤️
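
At that point the service is, in essence, a map lookup behind an HTTP endpoint. A minimal sketch of that idea with Spring Boot (class, path and map names are illustrative, and the Cassandra fallback is left out):

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import org.springframework.http.MediaType;
    import org.springframework.http.ResponseEntity;
    import org.springframework.web.bind.annotation.PathVariable;
    import org.springframework.web.bind.annotation.RequestMapping;
    import org.springframework.web.bind.annotation.RestController;

    @RestController
    public class ActivityViewController {

        // kept up to date by the Kafka poller; the post describes a plain HashMap,
        // a concurrent map is used here only because another thread writes to it
        private final Map<String, String> views = new ConcurrentHashMap<>();

        @RequestMapping(value = "/activities/{id}", produces = MediaType.APPLICATION_JSON_VALUE)
        public ResponseEntity<String> view(@PathVariable("id") String id) {
            String json = views.get(id);            // ~0 ms: O(1) lookup, no (de)serialization at all
            return json != null
                    ? ResponseEntity.ok(json)       // the precomputed view is written back as-is
                    : ResponseEntity.notFound().build();
        }
    }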

💔 And then we tested

When we started testing this using JMeter and AB, our hearts broke. We were sending 1000 requests in groups of 50 concurrent requests and were getting 100 ms, 200 ms, 400 ms response times. So we blamed Spring Boot, then Jersey, and then the embedded Tomcat server (which was using NIO, which is supposed to give better throughput).
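
(For reference, a run like the one above maps to something along the lines of “ab -n 1000 -c 50 http://host:8080/some-endpoint”: 1000 total requests, 50 at a time. The URL here is just a placeholder.)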

Our code was literally looking for a key in a HashMap (not a Hashtable nor a ConcurrentHashMap) and returning that to our users. We ran profilers, created a vanilla servlet, added metrics everywhere, took thread dumps, monitored the CPU and memory of the server, tweaked the JVM memory settings, but we couldn’t find anything.

We knew our precomputed views were big, but when we started to pay more attention to their sizes (they were generated in another process) we discovered that some of them were up to 1.7 MB. Most were 600 KB or 400 KB, simply because the API is very verbose (which is expected if you are using JSON). So we decided to focus on other areas: rather than the performance of the code and the Java server, we started analyzing the behavior of the system as a whole, including the server it was hosted on.

This was a c3.xlarge instance from AWS with a very decent setup, and we noticed that during the load tests our CPUs didn’t go higher than 30%; after all, we were just looking up a key in a hash map and writing the response back, which is probably where that usage came from. Then we looked at the I/O usage, mostly the network, with tools like nload (monitors network usage), iostat (mostly to monitor any disk I/O activity), iotop (lets you monitor the I/O of each process) and dstat (similar to nload, easier to read). For example, we used: “dstat -I 5,10 -N eth0”.
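
(For completeness, the quick looks with the other tools would be along the lines of “nload eth0”, “iostat -x 5” and “iotop -o”; exact flags vary a bit between versions.)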

We also found out that we had gzip enabled in our web server, but AB didn’t use it, so we were sending up to 128 MB/s when the maximum bandwidth allocated for an instance of this type is 62.6 MB/s according to the AWS documentation. So we started testing with gzip enabled in AB and noticed that the CPU usage went through the roof (completely expected) but the network usage went back to normal (again expected, because the data is compressed). Gzip made the network part better, but response times weren’t considerably better, simply because the server spent a lot of time compressing the output; we were, however, able to respond to more requests.

So no matter which tradeoff we chose (gzip: less network usage, a few more responses per second, and so-so response times; or raw, uncompressed data: higher network usage but less CPU, and probably a better chance of processing more requests at a slower pace), our output, or rather its size, put us in a situation where we would hit one of our limits very quickly, either CPU or network.
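
A quick back-of-the-envelope check with the numbers above shows how tight that box is: at a typical ~600 KB per response, our 600 rps goal needs roughly 600 × 0.6 MB ≈ 360 MB/s of outbound bandwidth, while the instance tops out at 62.6 MB/s. Uncompressed, that cap buys you only about 62.6 / 0.6 ≈ 100 responses per second, no matter how fast the code is.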

When we tested with smaller data (46 KB, for example), the service was extremely fast: 60th percentile <4 ms and 99th percentile ~18 ms while serving more than 5000 requests per second, much closer to our initial goal.

TL;DR

Our API responses were very large, so much so that under some load they would exhaust the network bandwidth of the server and make everything slow. No matter how simple your system is, you’ll always be bound by something. I know this sounds dumb (I feel dumb saying it) and obvious, but sometimes you forget about the boundaries of the system as a whole, not just of your code.
