Boosting your infrastructure while saving money — Part I

Kushagra Mangal
Published in OpsDev · 10 min read · Apr 12, 2020

As the saying goes, “a house is only as strong as its foundation”; similarly, a tech company is only as strong as its infrastructure and how that infrastructure is managed.

This article focuses on strengthening a company’s infrastructure while capturing the cost benefits that the cloud and modern technology have to offer.

The optimization strategies we are going to discuss:

  1. Utilizing unused cloud capacity for better cost efficiency (getting the best resources for the money we spend)
  2. Containerization (one of the most popular and practical trends companies follow to scale easily)
  3. Cutting development environment costs in half by scheduling them
  4. Chopping off unused resources (identifying and getting rid of systems that are no longer used)
  5. Reducing data transfer charges
  6. Moving to newer technologies (many new technologies provide the same functionality as legacy ones at a fraction of the cost)

Unused Cloud — Best way to save money 💰

Most cloud providers offer three kinds of instances:

On-Demand Instances

These instances are the pricing baseline. There is no commitment or upfront payment required. They are usually priced hourly, and you pay only for the compute capacity you use. Their benefit is that you can change the instance type or compute capacity based on application requirements without any interruption.

Most companies and individuals in their initial phase use these instances.

Reserved Instances

These are instances a client commits to for a longer period. The client makes an upfront payment in exchange for larger discounts. Reserved Instances minimize risk, make budgets more predictable, and satisfy policies that require longer-term commitments.

Your Reserved Instance is always available for the operating system and Availability Zone in which you purchased it. Unlike On-Demand, however, you are charged according to the commitment you made even when you are not using the instance.

Spot/Preemptible Instances

Most cloud vendors have unused capacity in their data centers and sell it under different names:

  1. Spot Instances (AWS)
  2. Preemptible VMs (GCP)
  3. Spot VMs (Azure)

These instances come from the portion of the provider’s fleet that is not reserved, i.e. spare capacity. Because of this, they are usually 60%–85% cheaper than On-Demand instances; the exact discount varies with region, availability, demand, etc. They are exactly the same as On-Demand instances in terms of hardware and software, except for their interruptible nature.

You need to bid for these instances in order to get them; if someone else bids a higher price, the instance can be taken away from you.

They work a little differently across cloud vendors:

Spot Instances AWS

AWS has offered this option for a long time. The options are:

  1. Specify the maximum price you are willing to pay per hour. As long as an instance is available below that bid you keep using it, but if someone else bids higher it is lost (see the CLI sketch after this list).
  2. Spot Fleet is another option that starts multiple instances; you specify a maximum price, target capacity, instance types, and AZs (Availability Zones). Even as Spot prices change, the fleet tries to maintain the desired capacity.
  3. Spot instances with a defined duration of 1 or 6 hours are available uninterrupted for that window, but at reduced savings.
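As a minimal sketch, this is roughly how a single Spot instance can be requested with the AWS CLI; the AMI ID, instance type, and max price below are hypothetical placeholders, not recommendations:

```sh
# Request a one-time Spot instance with a price ceiling (all values hypothetical).
aws ec2 run-instances \
  --image-id ami-0123456789abcdef0 \
  --instance-type m5.large \
  --count 1 \
  --instance-market-options \
    'MarketType=spot,SpotOptions={MaxPrice=0.05,SpotInstanceType=one-time}'
```

If the Spot price rises above MaxPrice, AWS reclaims the instance after a two-minute interruption warning.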

Preemptible VM GCP

These are similar to AWS Spot Instances, with some differences (a gcloud sketch follows the list):

  1. They are about 80% cheaper than regular VMs, and the price is fixed, so there is no bidding.
  2. They have a maximum runtime of 24 hours, and Compute Engine can terminate an instance at any time.
  3. They are not covered by an SLA.
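For illustration, a preemptible VM can be requested with a single flag; the instance name, zone, and machine type here are hypothetical:

```sh
# Create a preemptible VM (name, zone, and machine type are placeholders).
gcloud compute instances create demo-vm \
  --zone=us-central1-a \
  --machine-type=n1-standard-1 \
  --preemptible
```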

Spot VM Azure

This is a relatively new offering from Azure and is still in public preview. In terms of functioning, it is similar to AWS Spot Instances.

Spot instances can be used for any interruptible workload, such as batch jobs, video processing, and rendering, and they can save a huge amount of money.

Always remember that the instance can go down at any time, so the infrastructure needs to be designed to handle that.

So, is there a way to run application servers on these spot instances?

Yes. There is an amazing service called Spotinst (now renamed Spot) that orchestrates Spot, Reserved, and On-Demand instances to handle a workload, providing the best possible SLA on top of Spot instances.

They provide different solutions:

Elastigroup

Elastigroup lets you migrate existing autoscaling groups from AWS or Azure. It imports the scaling policies and handles instance scaling with its own automation engine. The benefit it provides: the automation engine finds the best available Spot pricing for the instance types you define and autoscales them according to the metrics in your scaling policies.

Since instances in autoscaling groups are already managed for interruptions (i.e. what to do when an instance goes down), you can also add the necessary shutdown and startup scripts here to take any required actions.

It balances the workload between Spot, On-Demand, and Reserved instances as defined in the configuration to ensure the workload is always running. You can also specify multiple instance types, with preferences, that are allowed to run according to your requirements.

Figure: sample cost savings on an autoscaling group.

With this, a company already using autoscaling groups can immediately cut its costs by a minimum of 60% just by migrating. The migration takes at most five clicks.

Managed Instance

This lets you migrate a standalone instance running any workload. It can persist resources such as the root volume, data volumes, public IP, and private IP, so a Spot instance behaves the same as an On-Demand instance.

If the Spot instance is interrupted, it uses the volume snapshots it continuously takes to start a new instance with the same data, migrates the Elastic IP accordingly, and re-registers with the load balancer if needed.

Like the other solutions, it also offers On-Demand fallback, so that if Spot capacity is unavailable, an On-Demand instance is used instead.

Figure: sample cost savings on a managed instance.

Ocean

This is the container infrastructure solution. It supports multiple managed Kubernetes providers, Amazon ECS, Azure, and kOps as well.

The way to use this is to create a Kubernetes cluster and run the worker nodes through Spotinst, which puts all the compute capacity for your containers on Spot. When an instance is about to be interrupted, Ocean creates a new instance, the cluster scheduler reschedules all the pods onto it, and only then is the old node shut down.

To handle this, we need to be careful about certain things:

  1. Handle graceful shutdown in the application. Before shutdown, the scheduler sends a termination signal to all pods on the node; after receiving it, the pods have around 5 minutes before a forced shutdown, and traffic is redirected only to the new pods (see the sketch after this list).
  2. Since these are Spot instances, sometimes you may not even have 5 minutes before shutdown. If a service must never be interrupted, you can define the label ‘od’, which schedules the pod on an On-Demand instance: https://docs.spot.io/container-management/kubernetes/kubernetes-concepts/spotinst-labels/
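As a minimal sketch of graceful shutdown, assuming a hypothetical app binary called my-server, a container entrypoint can trap the termination signal and drain before exiting:

```sh
#!/bin/sh
# Entrypoint: forward SIGTERM to the app so it can drain in-flight requests.
graceful_stop() {
  echo "SIGTERM received, draining connections..."
  kill -TERM "$APP_PID"     # ask the app to stop accepting work and finish up
  wait "$APP_PID"           # block until it has exited cleanly
}
trap graceful_stop TERM

/usr/local/bin/my-server &  # hypothetical application binary
APP_PID=$!
wait "$APP_PID"
```

Kubernetes also honors terminationGracePeriodSeconds in the pod spec, so the grace window can be tuned per service.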

Spotinst also handles autoscaling of your worker nodes according to the requirements of your services, so horizontal scaling is taken care of as well.

Figure: sample cost savings using Ocean.

This also improved our CPU utilization to around 80% and memory utilization to around 55%, which ties into the next strategy: containerization.

Containerization — Docker 🐳

Containerizing an application means isolating it along with all of its dependencies, in such a way that it can be deployed on any server or machine without hassle.

It follows the motto “Build, Ship, and Run Any App, Anywhere.” If you work in software development, you have probably already heard about virtual machines.

VMs are an abstraction of the hardware layer: each VM simulates a physical machine that can run software, so one physical server can run the equivalent of many servers (each of which is called a VM).

Containers are instead an abstraction of the application layer: each container runs as an isolated process on the same operating system.

Figure: containerization.

So, what are the main benefits of containers from an infrastructure and cost standpoint?

Infrastructure Benefit

Containers make any application extremely easy to manage because the environment is reliably the same no matter where it runs. This is possible because of Docker’s layer-based architecture.

This architecture adds read-only or read/write layers on top of a pre-built image. For example, we can build a Docker image from the ubuntu:18.04 image; each command we run on top of it becomes a read-only layer that records the file changes made by that command. When we finally create a container from this image, a thin read/write layer is added over the prebuilt layers.
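A minimal sketch of this layering, using a throwaway image name; docker history then lists one layer per instruction:

```sh
# Build a small image; each instruction below becomes its own read-only layer.
docker build -t layer-demo - <<'EOF'
FROM ubuntu:18.04
RUN apt-get update && apt-get install -y --no-install-recommends curl
RUN echo "hello" > /etc/motd
EOF

docker history layer-demo                          # one row per layer
docker run -d --name demo1 layer-demo sleep 1000   # adds a thin R/W layer
```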

Figure: a sample container created from an image.

The benefit of this system: if the image is 500 MB and we create 50 containers from it, only one 500 MB image exists; on top of it sit 50 thin read/write layers whose size is only that of the files actually changed or created.

Because of this, we can create a Docker image that always behaves the same on any machine. And when a service needs to scale up, we can simply increase the number of containers running from our image, which is effectively the same as running multiple servers, as long as the resource limit is not reached.

Cost Benefit

Most organizations spin up their applications on cloud servers, but much of the time the machine’s resources are not fully utilized. For example, a company running 5 web applications may have started 5 servers, one for each; in most scenarios, none of these applications actually uses all the CPU and memory the machine can offer.

Figure: sample CPU utilization when running a single service.

Instead, each service can be containerized and run on a single machine. This utilizes the server better and in turn reduces the cost of running servers. This strategy even allows finer control over how server resources are allotted.
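For instance, hypothetical services can be packed onto one host with per-container CPU and memory caps so no single service starves the others (image names and limits below are illustrative):

```sh
# Run three services on one host with explicit resource limits.
docker run -d --name web  --cpus 1.0 --memory 512m myorg/web:latest
docker run -d --name api  --cpus 0.5 --memory 256m myorg/api:latest
docker run -d --name jobs --cpus 0.5 --memory 256m myorg/jobs:latest

docker stats --no-stream   # verify actual CPU/memory usage per container
```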

What can be done with this?

Instead of hosting different services on different machines, host all the services on one big machine using containers.

Use lightweight containers based on BusyBox or Alpine.

These work like Ubuntu or any other Linux, just with the unnecessary applications and OS components removed, bringing the OS image size down to around 5–10 MB.
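You can see the difference yourself by pulling both images and comparing their sizes:

```sh
docker pull alpine:3.11 && docker pull ubuntu:18.04
docker images | grep -E 'alpine|ubuntu'   # alpine is ~5 MB, ubuntu is ~65 MB
```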

Use Kubernetes for better orchestration

Kubernetes is an open-source container orchestration tool that manages containers with high efficiency and reliability. It provides features such as the following (a few kubectl sketches follow the list):

  1. Auto Healing
    If an application stops working, Kubernetes restarts or recreates it to keep it running, while ensuring traffic is routed only to working containers.
  2. Auto Scaling
    Based on different parameters (requests, CPU, memory, etc.), a container can be scaled to maintain performance at peak traffic load.
  3. Rolling Updates
    Changes can be deployed in a rolling manner: instead of sending all traffic to the newly deployed code, only part of the traffic goes to the new version, and on any failure or issue we can immediately roll back to the stable version.
  4. Run Anywhere
    Being open source, it runs anywhere, whether on-premises, hybrid, or public cloud, which prevents any kind of vendor lock-in. Migrating a Kubernetes cluster from one location or vendor to another is nearly as simple as copying data from one machine to another.
  5. Secret Management
    Built-in secret management keeps credentials from being widely known and protects the secrets from being accidentally changed.
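A few one-line kubectl sketches illustrate these features; the deployment name "web", the image tags, and the secret value are hypothetical placeholders:

```sh
kubectl autoscale deployment web --min=2 --max=10 --cpu-percent=70       # auto scaling
kubectl set image deployment/web web=myorg/web:v2                        # rolling update
kubectl rollout undo deployment/web                                      # roll back on failure
kubectl create secret generic db-creds --from-literal=password=s3cr3t    # secret management
```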

You can also try services like AWS ECS (Elastic Container Service), Red Hat OpenShift Container Platform, AWS EKS (Elastic Kubernetes Service), GKE (Google Kubernetes Engine), and AKS (Azure Kubernetes Service), which provide a managed system for orchestrating containers.

Thanks to containerization, we were able to run 130+ services on 7 servers at 70% CPU and memory utilization. Google runs billions of containers, and many other large companies use containers to significantly bring down the infrastructure cost of their products.

With the first two strategies described above, we brought our compute costs down by around 65% (as of February) compared to what they used to be.

Figure: the decline in compute costs after applying the first two strategies.

We will discuss the remaining strategies in the next part of this article; a link will be added here as soon as it is published.

If you found this article interesting, read more on OpsDev. Follow OpsDev on Instagram, Facebook, and Twitter to get the latest updates on new articles about DevOps, DevSecOps, DataOps, and more.

Subscribe to our newsletter.
