It’s Deployment Day! But Do You Know Your AWS Resource Limits?

Published in Peak · Nov 4, 2019 · 7 min read

In the engineering world, deploying on a Friday afternoon is a major rookie move: the last thing anyone needs is a weekend spent rolling back updates and trying to debug someone else’s code. However, whatever the time of day or day of the week you roll out new changes, a part of you can’t help but worry that something hasn’t been accounted for. In this article I want to discuss one of the new features at Peak that helps our deployments run just that little bit more smoothly.

At Peak the majority of our cloud architecture is hosted on AWS, so we have to stay continuously aware of our resource usage (and not just for billing reasons!). As Peak’s customer base grows, more services and resources are being used, and when AWS resources are being spun up left, right and centre, keeping track of your account’s status can be cumbersome. This article begins with the services AWS provide to alleviate some of these issues.

AWS Services

Every AWS account is created with a set of default service limits for the resources AWS provide. The default values vary depending on the region in which the resources are going to be created. These limits are in place so that AWS can fulfil their guarantee of on-demand resources for all account holders. After all, it would be unfair if one account holder in the eu-west-1 region ruled them all. Some limits (also referred to as quotas) can be increased by making a request via the AWS Support Center. This may sound daunting considering the long list of services AWS provide, but don’t panic: they have kindly provided some tools to help us. The next couple of sections briefly discuss these tools.

AWS Service Quotas Console

The Service Quotas tool allows you to monitor and manage AWS service limits for your account from one centralised location. It also supplies a full list of AWS services and the default quota values for your account. If you wish to investigate your own limits, or see which ones you can adjust, I recommend visiting the Service Quotas Console and selecting the AWS Services option from the list. To illustrate this, the image below shows the console view for some of the default quotas on our account for the CloudFormation service. I didn’t choose CloudFormation for any other reason than it being one of my favourite services, but I digress. You may notice that the three quotas identified below cannot be adjusted, so tough luck if you had planned on declaring more than 100 mappings in your YAML/JSON template. If a value is populated in the Applied quota value column, it means that you have previously requested an increase for that quota, and the applied value automatically overrides the default value.
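The same information can also be read programmatically. The following is a minimal sketch, assuming boto3 is installed and AWS credentials are configured; the function names summarise_quotas and fetch_quotas are made up for this illustration, and the region default is only an example. It lists the quotas for one service and puts the non-adjustable ones first.

```python
from typing import Dict, Iterable, List


def summarise_quotas(quotas: Iterable[Dict]) -> List[Dict]:
    """Reduce Service Quotas API records to name, value and adjustability."""
    rows = [
        {
            "name": quota["QuotaName"],
            "value": quota["Value"],
            "adjustable": quota["Adjustable"],
        }
        for quota in quotas
    ]
    # False sorts before True, so quotas that cannot be raised come first.
    return sorted(rows, key=lambda row: (row["adjustable"], row["name"]))


def fetch_quotas(service_code: str = "cloudformation",
                 region: str = "eu-west-1") -> List[Dict]:
    """Fetch the quotas for one service (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so summarise_quotas works without boto3

    client = boto3.client("service-quotas", region_name=region)
    paginator = client.get_paginator("list_service_quotas")
    quotas: List[Dict] = []
    for page in paginator.paginate(ServiceCode=service_code):
        quotas.extend(page["Quotas"])
    return quotas
```

Calling summarise_quotas(fetch_quotas()) and printing each row should give roughly the same view as the console table, although which quotas appear depends on the account and region.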

This is great for examining your account’s service quotas and requesting increases, but how do we know what the account is currently using and when do we need to make these increases?

AWS Trusted Advisor

Along comes the Trusted Advisor (TA) tool, another AWS service, which helps by monitoring actual resource usage against the account limits. Trusted Advisor acts as a customised cloud expert whose purpose is to help your company optimise its AWS resources. For the remainder of this section we will focus on the Service Limits sub-service, but for illustrative purposes the image below shows the landing page dashboard in the AWS console along with the other services provided by TA.

Part of the Service Limits feature is the ability to run ad-hoc checks across your services on a per-region basis. Once the Service Limits category is selected, the checks commence and scan the majority of popular services. Upon completion, each service is assigned a status: green for no problems detected, yellow for resources at 80% or more of the account limit, and red where the resource is at 100% of the account limit. If you wish to increase the limit for a particular service you can go directly to AWS Support Center and create a case, or use the links provided within TA.
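The traffic-light rule described above is simple enough to express in a few lines. This is only an illustrative sketch of the thresholds as stated in this article, not Trusted Advisor's actual implementation; the function name ta_status is made up for the example.

```python
def ta_status(usage: float, limit: float) -> str:
    """Classify usage against a limit using the 80% / 100% rule."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    if usage >= limit:
        return "red"      # at (or beyond) 100% of the account limit
    # Compare scaled values rather than a float ratio to avoid rounding surprises.
    if usage * 100 >= limit * 80:
        return "yellow"   # at 80% or more of the account limit
    return "green"        # no problems detected
```

For example, an account using 82 of a hypothetical 100-bucket limit would be classified as yellow.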

Disadvantages

Although AWS provide these tools and offer support for managing your AWS account resources, the tools do come with some minor caveats, which are listed below:

The Service Quotas console covers over 90 AWS services, but for services not present in this console an alternative method is required.
Not all AWS services are checked by TA; the subset it covers tends to be the popular services used by the majority of AWS customers.
Upon entering the Service Limits section in TA, the checks are kicked off for that account and region, so depending on the number of resources your account holds it can take a little time for the results to appear.
The AWS console for the Service Limits is not the most user friendly interface.
The current process for requesting increases seems a little tedious and requires a manual approach.

How Does Peak Monitor This?

To tackle some of the aforementioned caveats with TA and Service Quotas, Peak have created an automated report, and our DevOps engineers are alerted on Slack once it is ready to view. This promotes a more autonomous approach and doesn’t rely on any particular colleague manually monitoring our resource usage.

The process involves multiple resources, but here is a high-level overview of the architecture and what it entails. The process is initiated by a CloudWatch rule which acts as a scheduler, so each week the rule triggers the Task Definition to run. The Task Definition uses a Docker image from the ECR repository, allowing us to update the image whenever required. The Docker image contains a Python script which uses the awslimitchecker library and constructs the report disclosing the findings. The report is stored in an S3 bucket, which triggers a Lambda function that uses SNS and sends an alert to Slack via a webhook. Enclosed below is a pretty diagram to illustrate how all the services work together.
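To make the last step of the pipeline more concrete, here is a rough sketch of what a Lambda handler for the S3 trigger could look like. It is not Peak's actual function: the SNS hop is left out for brevity, the SLACK_WEBHOOK_URL environment variable and the message wording are assumptions, and build_slack_message is a hypothetical helper name.

```python
import json
import os
import urllib.request
from typing import Dict, List
from urllib.parse import unquote_plus


def build_slack_message(event: Dict) -> Dict:
    """Turn an S3 'object created' event into a Slack webhook payload."""
    lines: List[str] = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        lines.append(f"New limits report ready: s3://{bucket}/{key}")
    return {"text": "\n".join(lines)}


def lambda_handler(event: Dict, context: object) -> Dict:
    """Lambda entry point: post the message to a Slack incoming webhook."""
    payload = build_slack_message(event)
    request = urllib.request.Request(
        os.environ["SLACK_WEBHOOK_URL"],  # assumed to be set on the function
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return {"status": response.status}
```

Keeping the message construction separate from the HTTP call means the formatting can be tested locally without a webhook or an AWS account.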

The contents of the report have been constructed to highlight the number of checks conducted within each service and whether they breach the 25%, 50%, 80% or 90% thresholds of the account limit. If any services cross these thresholds, the report discloses exactly which service requires attention. The snapshot below is from an old report, but it illustrates an example of when one of our services went beyond the 80% threshold for the number of S3 buckets allowed. This allowed our DevOps team to take action by making a service limit increase request, or by analysing the existing buckets and removing redundant ones.
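As an illustration of how such a report could be assembled, the sketch below groups limits by the highest threshold they have reached. It is a simplified stand-in rather than our production script: the function names are made up, the report format is invented for the example, and collect_rows assumes the awslimitchecker package is installed and AWS credentials are available.

```python
from typing import Dict, Iterable, List, Tuple

THRESHOLDS = (90, 80, 50, 25)  # percent of the limit, highest first

Row = Tuple[str, str, float, float]  # (service, limit name, usage, limit)


def bucket_for(usage: float, limit: float) -> int:
    """Return the highest threshold (in percent) reached, or 0 if none."""
    if limit <= 0:
        return 0
    for threshold in THRESHOLDS:
        if usage * 100 >= limit * threshold:
            return threshold
    return 0


def build_report(rows: Iterable[Row]) -> Dict[int, List[str]]:
    """Group limits by the threshold they have reached."""
    report: Dict[int, List[str]] = {threshold: [] for threshold in THRESHOLDS}
    for service, limit_name, usage, limit in rows:
        bucket = bucket_for(usage, limit)
        if bucket:
            report[bucket].append(f"{service}/{limit_name}: {usage:g} of {limit:g}")
    return report


def collect_rows(region: str = "eu-west-1") -> List[Row]:
    """Gather usage and limits via awslimitchecker (needs AWS credentials)."""
    from awslimitchecker.checker import AwsLimitChecker  # lazy import

    checker = AwsLimitChecker(region=region)
    checker.find_usage()
    rows: List[Row] = []
    for service, limits in checker.get_limits().items():
        for limit_name, limit in limits.items():
            maximum = limit.get_limit()
            if maximum is None:  # no effective limit to compare against
                continue
            for usage in limit.get_current_usage():
                rows.append((service, limit_name, usage.get_value(), maximum))
    return rows
```

With rows collected, build_report(collect_rows()) returns a dictionary keyed by threshold, which can then be rendered however the report needs to look.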

Next Steps

At Peak, being autonomous in our work is really important to us, so although the current pipeline for generating reports is a step beyond checking our limits manually, there is always room for improvement. One feature we would like to add to the report concerns services which cross the 90% threshold: an event would be triggered to create a service limit increase case, ready to be submitted to AWS. Now obviously we don’t want to just increase all of the limits whenever we approach the quotas, so this is where we can introduce some user prompting. An email or message can be sent to the AWS account admin with the option to approve the support case, or to decline it in order to review existing resources.
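A possible shape for this approval flow is sketched below. It is purely illustrative and makes several assumptions: it uses the Service Quotas API call request_service_quota_increase from boto3 rather than opening a Support case, the 1.5x growth factor is arbitrary, and the approve callback stands in for whatever email or Slack prompt ends up being used.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass
class QuotaUsage:
    service_code: str  # for example "s3"
    quota_code: str    # the quota's code; placeholders are used in examples
    usage: float
    limit: float


def propose_increases(items: Iterable[QuotaUsage],
                      threshold_pct: int = 90,
                      growth_factor: float = 1.5) -> List[Tuple[QuotaUsage, float]]:
    """Pair each quota at or above the threshold with a suggested new value."""
    proposals = []
    for item in items:
        if item.limit > 0 and item.usage * 100 >= item.limit * threshold_pct:
            proposals.append((item, item.limit * growth_factor))
    return proposals


def submit_if_approved(proposals: Iterable[Tuple[QuotaUsage, float]],
                       approve: Callable[[QuotaUsage, float], bool]) -> List[str]:
    """Request an increase only for proposals that a human has approved."""
    import boto3  # lazy import: only needed when actually submitting

    client = boto3.client("service-quotas")
    submitted = []
    for item, desired in proposals:
        if approve(item, desired):
            client.request_service_quota_increase(
                ServiceCode=item.service_code,
                QuotaCode=item.quota_code,
                DesiredValue=desired,
            )
            submitted.append(item.quota_code)
    return submitted
```

Declined proposals are simply skipped, which leaves room for the admin to review and clean up existing resources instead.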

Furthermore, when we deploy new releases in the future, we want to be able to calculate what extra resources will be required, and whether this would breach any of our existing limits. In my experience, figuring out why something has failed due to reaching a limit is by no means easy to debug, especially since the errors are not particularly helpful.
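A pre-deployment check of this kind could start out as simply as the sketch below, which adds the planned resources to current usage and flags anything that would exceed its limit. The function name, the resource keys and the numbers in the usage example are all hypothetical.

```python
from typing import Dict, List


def find_breaches(current: Dict[str, int],
                  planned: Dict[str, int],
                  limits: Dict[str, int]) -> List[str]:
    """List resources whose projected usage would exceed the account limit."""
    breaches = []
    for resource, extra in planned.items():
        limit = limits.get(resource)
        if limit is None:
            continue  # no known limit for this resource, nothing to compare
        projected = current.get(resource, 0) + extra
        if projected > limit:
            breaches.append(
                f"{resource}: projected {projected} exceeds limit {limit}"
            )
    return breaches
```

For instance, with 95 buckets in use, a limit of 100 and a release that creates 10 more, find_breaches would report the S3 bucket limit before the deployment starts rather than midway through it.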

This brings my article to an end and I just want to thank you for taking the time out to read my very first article. Have a great day and happy coding!

View more articles like this in the Peak Content Hub.