Going for the Node: The 5ms API challenge

Taking on the challenge of rewriting an active API in Node proved to be an educational experience. Here are my insights and tips for a high-performing Node application.

Sagi Isha

--

I recently concluded rewriting a pretty active API for a client. As a Node advocate, I obviously chose it for the challenge, hoping it would live up to its reputation and that I'd be able to tell y'all about it. This API, formerly written in PHP, serves more than 3M requests a day (roughly 125K an hour, or over 2K a minute), all of them touching the database in some way and rendering different responses (sometimes JSON, sometimes HTML). Another objective of the rewrite, besides a proper implementation of a tested MVC codebase (I chose CompoundJS), was improved resource utilization and a reduced, more effective infrastructure cost. Later on came optimization, and what I like to call “The 5ms application”.

Here are the insights and tips I picked up on my way there.

Is Node up for the challenge? (or: know your infrastructure)

Let me skip to the end: most definitely yes! After a short inner debate I chose Heroku as the platform for the application. I obviously considered AWS, but decided on Heroku mainly because I didn’t want to spend too much effort on ops, load balancing and other 3rd-party integrations (logs, monitors, deploys etc). Heroku does a great job providing cost-effective add-ons for each task, seamless git deploys and easy-as-a-slider scaling (obviously not magic, but good enough for distributing stress across your fronts). I was skeptical at first, and it took some benchmarking to convince me it was the right choice, but now I’m absolutely convinced. I kept the database instance on Amazon because I wanted more control and configuration over it, and since Heroku runs on top of AWS, the network between them is highly efficient. Ideal.

Pre-optimization performance

At first it took me 3 regular (512MB) dynos to handle the traffic. That didn’t make sense, because Node should have been able to hold that concurrency with just one dyno. Three days later it hit me: it all has to do with RAM! Since each instance consumed about 90MB (according to New Relic), in cluster mode with 4 cores I pretty much clogged the machines. After switching to a 1024MB dyno, the traffic was handled easily by a single dyno, with improved performance and lower costs.

Conclusion:

  • Use Heroku for your app — it delivers
  • Use cluster mode — to maximize your resources
  • Have at least 1024MB RAM in cluster mode
  • I kept my DB on AWS because I wanted more control but that’s absolutely negotiable
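The RAM lesson above can be sketched in code. This is a minimal illustration, not the app's actual setup: `workerCount` is a hypothetical helper that caps the number of cluster forks by available memory as well as cores, using the ~90MB-per-fork footprint New Relic reported.

```javascript
// Sketch: size your cluster by RAM budget, not just by core count.
// perForkMB is the measured footprint of one Node process (~90MB for
// this app, per New Relic); on a 512MB dyno that budget runs out fast.
function workerCount(totalRamMB, perForkMB, cores) {
  const byRam = Math.floor(totalRamMB / perForkMB); // forks the RAM can hold
  return Math.max(1, Math.min(cores, byRam));       // never exceed core count
}

// In the master process you would then fork accordingly, e.g.:
//   const cluster = require('cluster');
//   const os = require('os');
//   const n = workerCount(1024, 90, os.cpus().length);
//   for (let i = 0; i < n; i++) cluster.fork();
```

On a 1024MB dyno with 4 cores this yields 4 workers; on a 512MB dyno the memory cap kicks in well before the core count does.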

Eliminating stress

Node is really fast and resource-efficient; the memory footprint of my highly active, highly concurrent application was just 80MB per fork, so in many cases you’ll find your database is your bottleneck. Obviously I’m not saying anything new, but I do want to share some Node practices that will help remove stress and latency from your database and application. After launching the app initially, without any optimizations, stress on the database rose to 1.2 of 2 CPUs. That’s in no way critical, but I wanted to do something about it.

After investigating and finding my most time-consuming queries, I used in-memory JavaScript objects as mapped digests to transact update queries in bulk. Stress immediately dropped to 0.5 CPU and freed memory on the database instance. Although your database caches a lot of its queries, you should still avoid the networking to it whenever you can. So later on I used the same technique to cache highly repetitive read queries as well; that eliminated a massive amount of networking and improved the response time even more, “costing” me just a few MBs of memory and reducing the stress on the DB instance to an average of 0.1 CPU. In my case an in-memory implementation was enough. You might justly argue that’s not too scalable, but it fit my specific application’s needs. In a more distributed environment you should probably use Redis or Memcached to get the same effect.
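A minimal sketch of the digest idea: accumulate per-key updates in a plain object and flush them as one bulk transaction on an interval, instead of hitting the database on every request. `recordHit` and `flushToDb` are hypothetical names, and `flushToDb` stands in for whatever bulk UPDATE your app runs.

```javascript
// In-memory digest: a plain object mapping row id -> pending delta.
const digest = {};

// Called on every request; touches memory only, never the database.
function recordHit(id, count) {
  digest[id] = (digest[id] || 0) + count;
}

// Called periodically; drains the digest into one bulk write.
function flush(flushToDb) {
  const batch = Object.keys(digest).map((id) => ({ id, count: digest[id] }));
  for (const id of Object.keys(digest)) delete digest[id];
  if (batch.length) flushToDb(batch); // your bulk UPDATE goes here
  return batch;
}

// e.g. setInterval(() => flush(bulkUpdate), 30 * 1000);
```

The same object-as-map trick works for the read side: key the cache by query parameters and serve repeats from memory.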

Another thing I noticed when transacting bulk updates to the database was a spike in dyno memory and latency. After some tweaking I realized I shouldn’t transact too many queries simultaneously (async parallel); a cascaded (async series) approach proved to be just as fast and more considerate of my resources and performance. The last rule of thumb would be “unregistered users (or invalid requests, in the case of an API) should not read from the database at all”. Take that as your guideline when designing your cache, and you’ll be headed in the right direction IMO.
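The series-over-parallel point can be sketched as follows. I'm using promises here for brevity (the original app used the async library's series/parallel helpers); `runQuery` is a placeholder for your database call and is assumed to return a promise.

```javascript
// Cascaded (series) execution: each query waits for the previous one,
// so only one transaction is in flight at any time. Compare with
// firing them all at once, which spikes memory and connections.
async function runInSeries(queries, runQuery) {
  const results = [];
  for (const q of queries) {
    results.push(await runQuery(q));
  }
  return results;
}
```

With the bulk flushes from the digest above, running them in series kept memory flat at essentially no cost in total time.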

Lastly, I took into account that some queries to the database should not expect callbacks: analytics, emails or any other background jobs, for example. Moving on with your algorithm without waiting for their return value (callbacks, in Node’s case) eliminates a lot of I/O latency and lets you respond to the request promptly.

Callbacks are a feature: you don’t always have to wait for them to fire before moving on with your logic
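A minimal sketch of this fire-and-forget pattern, where `logAnalytics` is a hypothetical background job whose result the response does not depend on:

```javascript
// Fire-and-forget: kick off the background write, then respond
// immediately. We deliberately do not nest the response inside
// logAnalytics' callback, because nothing here depends on it.
function handleRequest(req, res, logAnalytics) {
  logAnalytics({ path: req.url, at: Date.now() }); // no callback awaited
  res.end(JSON.stringify({ ok: true }));
}
```

The caveat is that unreported errors vanish silently, so jobs like these should still log their own failures somewhere.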

Conclusion:

  • Digest your updates (in-memory or in a 3rd-party cache) and transact them in bulk every so often
  • Avoid firing too many parallel queries, to ease the load on your app’s CPU and memory
  • Use in-memory caching when applicable to reduce networking and its latency
  • Invalid or unregistered requests should not read from the database but from cache, or better yet, be denied early
  • Don’t automatically nest your logic in callbacks; in some cases you can carry on without depending on their response (analytics, emails, digests etc)
  • Place static assets like images, CSS and JavaScript on S3 or other 3rd-party storage to redirect requests away from your server

Monitor and feel the pulse

After concluding my optimization, and going through several stages of performance (from 25ms to 100ms to 10ms), I finally settled on a 3ms average response time and a 1/1 Apdex score, while the application keeps growing and handling more and more traffic gracefully. Pretty neat, isn’t it?

Optimized performance

A lot of the optimizations (although I didn’t invent anything new here) were only possible thanks to strictly monitoring my application and understanding its bottlenecks. This differs from app to app, so my tips and insights might not exactly suit your needs, but understanding your own application certainly will. I enjoyed New Relic a lot (it now supports Node monitoring) for understanding my in-the-wild performance. Tailing and logging also made it possible to see where I could ease the stress, spare resources and make the application more efficient. Putting all of this into practice, my client’s application now peaks, scales, and runs at 70% less than its original cost.

I <3 Node.
