Maximise your cloud, minimise your costs

Ioan Bernevig
Pictet Technologies Blog
6 min read · Dec 6, 2021

Cloud-native solutions are so common today that we sometimes forget there are actual servers and resources running somewhere. In fact, at Pictet Technologies we don’t even have on-premises servers to take care of; everything is hosted in the cloud.
But of course, cloud computing has a cost, and this article describes how we went from a “startup-like” software company to a more structured IT solution provider, with a more efficient use of our cloud infrastructure. In our case, the provider is Amazon Web Services, so we will focus on that particular vendor, but most of the principles described here can probably be applied to other cloud solutions.

Self-managed teams

One of the main characteristics of Pictet Technologies is that teams are “self-organised”: each team is more or less autonomous, with a certain freedom to innovate, experiment and assess new technologies. Of course, introducing any new component must be agreed upon by both the development and operations teams (typically, we can only use a database vendor for which we have production support and paid licences), keeping in mind that we deliver products for the Pictet Group only.
That being said, development teams are responsible for building their own CI/CD pipelines and for deciding how they want to develop and test their solutions in the various integration environments.

Integration environment, in the cloud

As depicted hereafter, we strive for Infrastructure as Code. Any team can create their own configuration and deploy any resources they want (EC2, RDS, S3 buckets, etc.).

Usual flow of managing AWS resources

Growing fast (very fast!)

Pictet Technologies was created in 2016. In the following years, the number of projects we worked on grew rapidly. From dozens of instances at the beginning, we quickly reached more than 350 EC2 and almost 100 RDS instances in 2021.
At some point we realised that many of them had been created months, or even years, ago, and that some were not used at all.
From that standpoint, we decided to take action.

Sociocracy and Circles

In 2021, at the company level, we decided to adopt the concept of semi-autonomous “Circles” and embrace a more sociocratic organisation (more to come on that in a future article). In short, each circle is responsible for executing, measuring and controlling its own processes to better achieve its goals. So we felt the need to create a circle that would address this cost, which was growing even faster than the company itself: the “AWS Circle” (later renamed the “DevOps Circle”).

Time for action

That group of experts decided to take various actions, some with an immediate impact, others over the mid to long term.

Resource inventory

The very first step was to identify where the costs were and create some sort of inventory. All instances running at a given time can be listed from the various consoles that AWS offers: EC2, RDS, S3, etc. You can see when an instance was created, and its activity in terms of CPU or memory usage, read/write IOPS or DB connections over the past hours or weeks.
With that inventory in hand, we also managed to identify which resources were no longer used. Of course, we double-checked everything with the team that had initially created each resource, whenever that team could be clearly identified. When ownership was not obvious, we fell back on the good old “pull the plug” approach: we stopped those instances for a while, giving anyone time to raise an issue. After a couple of weeks, we destroyed them for good.
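The idle-detection step above boils down to a simple filter over inventory data. Here is a minimal sketch in Python, with made-up instance records and a hypothetical `idle_candidates` helper; in practice the CPU and connection figures would come from the AWS consoles or the CloudWatch API.

```python
from datetime import datetime, timezone

# Made-up inventory entries; in practice these figures come from the
# EC2/RDS consoles or the CloudWatch API.
inventory = [
    {"id": "i-0aaa", "created": datetime(2018, 3, 1, tzinfo=timezone.utc),
     "avg_cpu_pct": 0.4, "db_connections": 0},
    {"id": "i-0bbb", "created": datetime(2021, 9, 1, tzinfo=timezone.utc),
     "avg_cpu_pct": 37.0, "db_connections": 12},
]

def idle_candidates(items, cpu_threshold=1.0):
    """Flag instances whose recent activity suggests they are no longer used.
    Candidates still need to be double-checked with the owning team."""
    return [item["id"] for item in items
            if item["avg_cpu_pct"] < cpu_threshold and item["db_connections"] == 0]

print(idle_candidates(inventory))  # → ['i-0aaa']
```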

Cost explorer

On top of the various consoles, you have the “Cost Explorer”. This is the central place to see how much things cost in the cloud, and you can create many fine-grained reports on a daily or monthly basis, grouped by service type, region, instance type, or even by custom tags that you define yourself (more about that later).
It gives you figures for the past months, but can also generate estimates of future costs based on current trends. Data is available almost in real time (D+1).
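As a sketch, a fine-grained report grouped by a custom tag corresponds to a query like the one below. The `monthly_cost_report_params` helper and its values are illustrative; the dictionary shape follows the Cost Explorer `GetCostAndUsage` API.

```python
def monthly_cost_report_params(start, end, tag_key):
    """Build parameters for a monthly cost report grouped by a custom tag.
    The dictionary shape follows the Cost Explorer GetCostAndUsage API;
    this helper and its values are illustrative."""
    return {
        "TimePeriod": {"Start": start, "End": end},
        "Granularity": "MONTHLY",
        "Metrics": ["UnblendedCost"],
        "GroupBy": [{"Type": "TAG", "Key": tag_key}],
    }

params = monthly_cost_report_params("2021-01-01", "2021-12-01", "Team")
# With credentials in place, this could be passed to
# boto3.client("ce").get_cost_and_usage(**params).
```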

Automatic stop/start of instances during the night and/or weekend

99% of the time, our dev environments are not used during nights and weekends. It was a no-brainer to introduce a rule that dev environments are only up during working hours. We accomplished that in two different ways, one after the other.

As we use Bamboo jobs to deploy and undeploy applications (after a successful build, for example), a quick-win solution was to define a schedule for the deploy and undeploy tasks of each project/environment. This required the involvement of the different teams to implement. The major drawback of this solution was that the availability of Bamboo agents every morning and night became a bottleneck: everybody used the same schedule, and some of these jobs took a few minutes to complete.

Then we added tags to all our EC2 and RDS instances indicating, among other things, whether an instance should be shut down every night/weekend and started every morning/beginning of the business week. Based on these tags and the Terraform AWS Lambda Scheduler Stop/Start module, we obtained an autonomous system stopping and restarting our entire development infrastructure.

The latter solution had one more benefit: when defining the tags on the EC2 and RDS instances, we made instances automatically stoppable by default. That way, the development teams had to explicitly mark which development environments should be restarted, or even stay active, over the night/weekend.
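The tag-based scheduling rule can be sketched as a simple decision function. The `always_on` tag name, the business hours and the function itself are illustrative assumptions, not the actual module’s logic:

```python
def should_be_running(tags, weekday, hour):
    """Decide whether a dev instance should be up, based on scheduling tags.
    Tag name and business hours are illustrative assumptions.
    weekday: 0=Monday .. 6=Sunday; hour: 0-23 (local time)."""
    if tags.get("always_on") == "true":
        # Explicitly opted out of the stop/start schedule.
        return True
    # Default (stoppable) instances only run during business hours.
    return weekday < 5 and 7 <= hour < 19

# A default (stoppable) instance on Saturday at noon:
print(should_be_running({}, weekday=5, hour=12))  # → False
# The same instance on Tuesday morning:
print(should_be_running({}, weekday=1, hour=9))   # → True
```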

Instance rightsizing

This is a two-step process: define the correct size, then resize. Defining the correct size may be more complex than it seems; it is not the goal of this article to go into those details, and AWS provides some well-described tips in its rightsizing documentation.
The resizing process itself is straightforward, either with Terraform or directly from the AWS Console. The migration of both EC2 instances and RDS databases is completely managed by AWS. Expect an unavailability window of about 10–15 minutes for the migration to complete.
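A naive version of the sizing decision can be sketched as follows; the size ladder, the thresholds and the `recommend_size` helper are illustrative assumptions, and no substitute for proper sizing analysis:

```python
# Instance sizes within a family, smallest to largest (illustrative subset).
SIZES = ["t3.small", "t3.medium", "t3.large", "t3.xlarge"]

def recommend_size(current, peak_cpu_pct, target_pct=60):
    """Naive rightsizing sketch: step down one size when peak CPU stays well
    below the target, step up when it exceeds it. Real sizing must also
    consider memory, IOPS, burst credits, etc."""
    i = SIZES.index(current)
    if peak_cpu_pct < target_pct / 2 and i > 0:
        return SIZES[i - 1]
    if peak_cpu_pct > target_pct and i < len(SIZES) - 1:
        return SIZES[i + 1]
    return current

print(recommend_size("t3.large", peak_cpu_pct=18))  # → 't3.medium'
```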

Instance types

Using the latest generation of EC2 instances instead of the previous ones has multiple advantages:

  • Better hardware performance: faster CPUs, memory and network throughput;
  • Better virtualisation technology;
  • Lower costs: e.g. a 9–10% saving when migrating from t2.small to t3.small;
  • Safety: no changes to the data managed by the instance, a negligible implementation time, etc.

The same applies to instance types other than EC2, such as RDS databases.
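The savings figure quoted above is simple arithmetic over the hourly rates; the rates below are illustrative (actual prices depend on the region and change over time):

```python
def migration_saving_pct(old_hourly, new_hourly):
    """Percentage saved by moving to a newer generation at a lower hourly rate."""
    return round(100 * (old_hourly - new_hourly) / old_hourly, 1)

# Illustrative on-demand rates in $/hour (actual prices vary by region and over time):
t2_small, t3_small = 0.023, 0.0208
print(migration_saving_pct(t2_small, t3_small))  # → 9.6
```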

Savings Plans

Savings Plans are a flexible pricing model offering lower prices on EC2 usage in exchange for a commitment to a consistent amount of usage (in $ per hour) over a given period (in years).

Technically speaking, this solution is not yet in place but is planned for 1 January 2022. On paper it is highly interesting, but it is a bit too early to give feedback on it.

We had to be careful: along with the cost reduction comes a commitment on our company’s future usage of AWS. A savings plan can be adjusted along several dimensions:

  • Compute vs EC2 Instance: Compute plans give more flexibility but a lower discount, while EC2 Instance plans are more specific but discount more;
  • Instance type: which EC2 instance type will be used the most, to define a minimum usage we can commit to;
  • Time period: how long we can commit for (1 or 3 years);
  • Payment: paying all upfront comes with a higher discount than a partial or no upfront payment.

All of the above influences the total cost of the savings plan and the potential savings. It must be noted that an unsuitable savings plan may end up costing more than not having one at all!
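That warning can be illustrated with some back-of-the-envelope arithmetic. The model and the 28% discount below are illustrative assumptions, not actual AWS pricing:

```python
HOURS_PER_YEAR = 24 * 365

def yearly_cost(commit_per_hour, usage_per_hour, discount_pct=28):
    """Yearly bill under a simplified Savings Plan model (illustrative discount).
    commit_per_hour: committed spend in $/hour, paid for every hour of the term.
    usage_per_hour: steady usage valued at on-demand rates, in $/hour."""
    d = discount_pct / 100
    # On-demand value that the commitment covers at the discounted rate.
    covered_value = commit_per_hour / (1 - d)
    # Usage above the covered value is billed at on-demand rates.
    overflow = max(0.0, usage_per_hour - covered_value)
    return (commit_per_hour + overflow) * HOURS_PER_YEAR

no_plan = yearly_cost(0, 10)    # no commitment: pure on-demand
right   = yearly_cost(7.2, 10)  # commitment sized to actual usage
too_big = yearly_cost(12, 10)   # over-commitment: paid whether used or not
print(round(no_plan), round(right), round(too_big))  # → 87600 63072 105120
```

The last case is the trap: the oversized commitment is billed for every hour of the term, so the yearly cost exceeds what pure on-demand would have been.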

Conclusion/Going further

We are already in the last quarter of the year. The original objective of about 25% savings on the total AWS bill will be reached by the end of 2021, without even activating the savings plans. The latter will obviously bring additional savings next year.

We haven’t explored every possible way to reduce our AWS costs, but focused on the ones with the most interesting cost/benefit ratio within the limited timeframe we allocated. Our next steps will be to activate and follow up on the savings plans and their usage, before assessing EKS for our Kubernetes infrastructure instead of the current K8s solution installed on EC2 instances.
