Performance implications and the Intel Meltdown/Spectre patch

Tom Miller · Published in ebizuk · Jan 11, 2018

If you haven’t heard of Meltdown or Spectre (not the slightly disappointing Bond film!), the chances are you don’t move in IT circles.

Meltdown is a now-infamous security vulnerability that came to light in late December and primarily affects Intel processors. On a scale of 1–10 of “badness”, this is right up there: a 9 if not a 10. It allows sensitive data (root passwords and cryptographic keys, for example) to be stolen from memory. Hint: this is a very bad thing.

“This is, essentially, a mega-gaffe by the semiconductor industry. As they souped up their CPUs to race them against each other, they left behind one thing in the dust. Security.”

Because all our servers are provided by Amazon Web Services (AWS), we are lucky that Amazon deals with issues like these for us, in the form of security patches that guard us against such attacks.

As soon as a security patch was made available, all of our servers were instantly updated, protecting us and our customers’ data from potential hacks. (If your own servers and desktop computers have not been patched yet, install Windows Updates as soon as humanly possible.)

However, since the New Year we had been seeing a considerable uptick in CPU usage across all our servers. At first, we put this down to the time of year: New Year is always busy for our services, as people load new agreements, add end-of-year figures, and calculate rebates. But we’ve had over fifteen of these “New Year rushes” before, and we have never seen similar spikes in CPU use.

After a few days of investigation, it’s clear that the Meltdown patch has a side effect: it is having a big impact on server performance all across the internet. See The Register’s report for more details: https://www.theregister.co.uk/2018/01/04/amazon_ec2_intel_meltdown_performance_hit/

These performance issues are not isolated to users of Amazon’s cloud computing infrastructure; Microsoft Azure and Google Cloud infrastructures are equally affected. Desktop PCs are affected too (so if you install the Windows Updates, your PC will run a little slower), but to a lesser degree than powerful enterprise servers such as the ones we run.

“The solution is to either optimize application code running on the VMs, or move to more powerful and expensive virtual machines to take the extra load.”

This week we have been working to optimize some of our code to offset CPU-intensive processes: queuing them so they don’t run synchronously, running them out of hours, or moving them to a more suitable point in our request life cycle. We have made considerable strides, and today (Thursday) we have encouraging signs that these optimizations have worked: we’ve seen speed gains across our systems, and CPU usage has reduced dramatically.
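To illustrate the queuing idea in general terms, here is a minimal Python sketch. It is not our production code: the names `calculate_rebate`, `handle_request`, and `run_demo` are hypothetical, and the "heavy" task is a stand-in loop. It only shows the shape of the technique, where the request handler puts the CPU-intensive job on a queue and returns straight away, and a background worker picks the job up later.

```python
import queue
import threading

# Hypothetical in-process job queue; all names here are illustrative only.
jobs = queue.Queue()
results = []


def calculate_rebate(agreement_id: int) -> int:
    """Stand-in for a CPU-intensive task (hypothetical workload)."""
    return sum(i * i for i in range(10_000)) % 997 + agreement_id


def worker() -> None:
    """Drain the queue in the background, one job at a time."""
    while True:
        agreement_id = jobs.get()
        if agreement_id is None:  # sentinel value: stop the worker
            jobs.task_done()
            break
        results.append((agreement_id, calculate_rebate(agreement_id)))
        jobs.task_done()


def handle_request(agreement_id: int) -> str:
    """Enqueue the heavy work and respond without waiting for it."""
    jobs.put(agreement_id)
    return f"agreement {agreement_id} accepted; rebate calculation queued"


def run_demo() -> list:
    """Start one worker, handle three requests, then wait for the queue to empty."""
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    responses = [handle_request(i) for i in (1, 2, 3)]
    jobs.put(None)  # ask the worker to stop once earlier jobs are done
    jobs.join()
    thread.join()
    return responses


if __name__ == "__main__":
    for line in run_demo():
        print(line)
```

In a real deployment the worker would more likely be a separate process or a dedicated job-queue service, since a thread inside the same Python process still competes with request handling for CPU time. The point of the sketch is only that the user’s request no longer waits on the expensive calculation.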

Security is always our top priority, so it is reassuring that our system remained secure throughout. Speed and reliability are a close second, however, so we will not be satisfied until we are sure we’re at least back to where we were before Christmas (and hopefully we can do even better).

Increasing our server capacity and the power of our servers across the board would lead to significant extra costs that we’d inevitably have to pass on to our customers, and this is something we of course want to avoid.

We will continue to assess the situation over the coming days, optimizing in software wherever possible to improve performance, reduce server overhead, and provide a fast service to our customers.

Next week we will assess whether the updates we’ve made are enough to meet our standards, and will communicate our findings then.

Thank you.

Tom Miller
Managing Director
eBiz

Managing Director of eBiz — providing solutions to Buying groups for over 15 years.