We have been a loyal Linode cloud customer since 2015. As an e-commerce startup with multinational customers, we run infrastructure across multiple cloud providers, including Amazon AWS, Google Cloud, and Linode. Linode is our favorite for many reasons: it is reasonably priced, and it is easier to configure, provision, and maintain instances on Linode than on AWS because Linode has a clean and straightforward administrative dashboard. AWS feels cluttered and is overkill for typical web infrastructure scenarios.
However, there are a few things we do not like about Linode that we have never faced with AWS or Google Cloud. In our four years as a Linode customer, we have had two mandatory, nonnegotiable scheduled downtimes. In both cases, the downtime was necessary to upgrade Intel CPU firmware or apply Linux kernel updates on the host machines.
Now, this is understandable: they are proactive about keeping their host hardware up to date, and these upgrades must happen. However, it is hard to convince our enterprise customers to accept such downtime, especially in the middle of their biggest retail seasons.
To make matters worse, we cannot choose a physical host when creating a virtual node, via either the administrative panel or the API. So it is very likely that multiple of our virtual instances end up on the same host hardware.
So we devised a simple DevOps procedure to remove the nonnegotiable nature of these downtimes.
Surviving Nodes Behind Load Balancer
For resources that sit behind load balancers, such as web nodes, we add new nodes and take down the old, affected ones. Typically, when the Linode team identifies a fix for a host machine, they make sure all new node instances are created on hardware that has already been patched. So new nodes are never affected by the existing known issues.
By turning off affected nodes that reside on physical hosts requiring an update, and turning on newly created nodes that reside on physical hosts without the known issues, we mitigate the "nonnegotiable" element with ease.
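This rotation can be scripted against the Linode API v4. The sketch below only builds the requests so the shapes are easy to inspect; the NodeBalancer endpoint paths and payload fields reflect our reading of the v4 docs and should be verified before use, and all IDs, IPs, and labels are placeholders:

```python
# Sketch: rotate a web node behind a Linode NodeBalancer (API v4).
# Endpoints and payload shapes are assumptions based on the v4 docs;
# verify against the current API reference before running for real.
API = "https://api.linode.com/v4"

def add_node_request(nodebalancer_id, config_id, new_node_ip, label):
    """Build the request that registers a freshly created node
    (on already-patched hardware) with the load-balancer config."""
    return (
        "POST",
        f"{API}/nodebalancers/{nodebalancer_id}/configs/{config_id}/nodes",
        {"address": f"{new_node_ip}:80", "label": label, "mode": "accept"},
    )

def drain_node_request(nodebalancer_id, config_id, node_id):
    """Build the request that drains an old node on an affected host,
    letting in-flight connections finish before we power it off."""
    return (
        "PUT",
        f"{API}/nodebalancers/{nodebalancer_id}/configs/{config_id}/nodes/{node_id}",
        {"mode": "drain"},
    )

# Example: add the replacement node, then drain the old one.
method, url, payload = add_node_request(12345, 678, "192.0.2.10", "web-new-1")
```

In practice we would send these with an authenticated HTTP client, wait for the drained node to empty, then shut it down.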
Surviving Standard Nodes
For resources that are not behind a load balancer, we first create new instances in the same data center with the same specifications. Since a new instance is created on a host without the known issues, we clone the old node onto the new node and swap the IP addresses. This does cause downtime, which depends on the disk size of the node being cloned. But such downtime can be planned, and therefore it is on our schedule: we can find a window of time when the cloning can take place with minimal business impact.
By cloning the old node to a new node and swapping the IP addresses between the nodes, we can survive the potential downtime per our managed schedule and with full client cooperation.
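The clone-and-swap procedure can be sketched the same way. The `/clone` and `/networking/ipv4/assign` endpoints and payload shapes below are our best reading of the Linode API v4 docs (verify before use), and the IDs, region, and addresses are placeholders:

```python
# Sketch: clone an old node onto a new one, then swap their IPv4 addresses.
# Endpoints and payloads are assumptions from the v4 docs; verify first.
API = "https://api.linode.com/v4"

def clone_request(old_id, new_id):
    """Build the request that clones the old Linode's disks and configs
    into the already-created new Linode (on patched hardware)."""
    return ("POST", f"{API}/linode/instances/{old_id}/clone",
            {"linode_id": new_id})

def swap_ips_request(region, old_ip, new_ip, old_id, new_id):
    """Build the request that reassigns each node's public IPv4 to the
    other, so clients keep reaching the same address after cutover."""
    return ("POST", f"{API}/networking/ipv4/assign", {
        "region": region,
        "assignments": [
            {"address": old_ip, "linode_id": new_id},
            {"address": new_ip, "linode_id": old_id},
        ],
    })
```

During the planned window we would boot the old node down, run the clone, issue the swap, and bring the new node up on the original address.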
We are happy that Linode has been straightforward to deal with so far. Their customer support team is very approachable as well. We like it, and we are excited to see them grow into a very cool suite of services.
We are looking forward to their Kubernetes container-orchestration service becoming available soon.
The only recommendation we have for the Linode team is to offer customers a simple option to pick a schedule and have the node cloning and IP swapping done in a single click. That would make a lot of sense to us!