Best practices for AWS Cloud solutions

Mathew Kenny Thomas · Tensult Blogs · Jun 11, 2018


The main purpose of this blog is to discuss the best practices and strategies to follow while architecting solutions on Amazon Web Services (AWS): to think cloud-natively and to design and operate reliable, efficient, secure and cost-effective solutions. In a traditional on-premises infrastructure, customers had to guess their infrastructure needs, often before a single line of code was written. Because testing at scale was too expensive, people usually found a lot of problems only when they went into production. The cost of testing also created a fear of changing the existing architecture, so architectures tended to freeze over time even as everything around them changed. The cloud removes these constraints: customers don't have to guess capacity, they can test at production scale, and they can evolve their architectures over time.

These best practices have been identified by reviewing thousands of architectures built on AWS over the years. Incorporating them allows you to build a system that is both stable and efficient. There are some general design principles to follow for good design:

  1. Stop guessing your capacity needs — unlike in a traditional architecture, in the cloud you can start with as much or as little capacity as you need and then scale with your business needs.
  2. Test systems at production scale — since AWS only charges for what you use, you can test your environment at full scale for a small cost and then decommission the resources.
  3. Automate to make architectural experimentation easier — through automation you can create and replicate your systems without the added expense of manual labour, and you can always track changes and revert to a previous state if needed (a sketch follows this list).
  4. Allow for evolutionary architectures — since technology keeps changing, it is best if your system can also evolve to take advantage of the new technology available to you. By making use of automation and testing at production scale, you will be able to evolve your systems in the cloud.
  5. Improve through game days — test how good your architecture is by scheduling game days to simulate events in production. This better prepares you to handle real incidents and yields insights into how your system can be improved.
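
As an illustration of principle 3, here is a minimal sketch, using boto3 and AWS CloudFormation, of how an entire environment can be created for an experiment and then torn down. The stack name and template URL are placeholder assumptions:

```python
# A minimal sketch of automated environment replication with boto3 and
# CloudFormation. Stack name and template URL are placeholders.
import boto3

cfn = boto3.client("cloudformation", region_name="us-east-1")

# Spin up a disposable copy of the environment for an experiment.
cfn.create_stack(
    StackName="experiment-1",  # hypothetical stack name
    TemplateURL="https://s3.amazonaws.com/my-bucket/app-stack.yaml",  # placeholder
    Capabilities=["CAPABILITY_IAM"],
)
cfn.get_waiter("stack_create_complete").wait(StackName="experiment-1")

# ...run the experiment, then tear everything down so you only pay
# for the time the stack actually existed.
cfn.delete_stack(StackName="experiment-1")
```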

While architecting solutions on AWS, you have to keep in mind the five key pillars/foundations, which are:

  1. Operational Excellence — the ability to run and monitor systems to deliver business value and to continually improve supporting processes and procedures. You need to understand the business and customer needs, decide what your operational priorities are, know how to operate your workload, and have a continuous process of improvement for both the workload and your operations.
  2. Security — focuses on the ability to protect information, systems, and assets while delivering business value through risk management and mitigation strategies.
  3. Reliability — the ability of a system to recover from infrastructure or service failures, dynamically acquire computing resources to meet demand, and minimise disruptions.
  4. Performance Efficiency — ability to use computing resources efficiently to meet system requirements, and to maintain that efficiency as demand and technologies evolve.
  5. Cost Optimisation — ability to avoid or eliminate unneeded cost.

After an architecture is designed, it has to be reviewed against these best practices. Often it is only when a review is done that teams come to truly understand what they have implemented. The review process should be conducted in a consistent, blame-free manner so that it encourages deep diving, and it should be kept to a few hours. The net outcome of the review should be to ensure that the customer's requirements are met. Each person building the architecture should take responsibility and ownership of the solution and review it themselves to improve upon it. Reviews should be done multiple times before the product is deployed, to ensure that there are no irreversible choices in the architecture.

Over time the initial architecture you developed may have evolved into something new, so it is important to follow a practice that ensures its architectural characteristics have not degraded. Each review should produce a list of the issues discovered, prioritised according to business needs.

Security

AWS customers are responsible for protecting the confidentiality, integrity, and availability of their data in the cloud, and for meeting specific business requirements for information protection. AWS provides secure infrastructure and services, while you, as the customer, are responsible for securing your operating systems, platforms, and data. The security concept broadly covers the following aspects:

  1. Who can do what, with Identity and Access Management (IAM). This ensures that only authenticated and authorised users are given access to resources, and defines the scope of what they can do with them.
  2. Detecting security events with detective controls, used to detect or identify a security event — for example, Amazon GuardDuty analysing CloudTrail logs (a sketch of enabling it follows this list).
  3. Protecting systems with infrastructure protection, which consists of control methodologies needed to meet best practices and industry or legal obligations. AWS Config and AWS Service Catalog help in this area.
  4. Confidentiality and integrity of data with data protection, using controls and patterns to keep your data confidential while preserving its integrity and availability.
  5. Responding to security events with incident response, which defines the security processes that need to be in place to respond to and mitigate the potential impact of a security incident. Amazon GuardDuty findings can help drive this.
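
To make the detective-controls point concrete, here is a minimal sketch of enabling Amazon GuardDuty with boto3; once enabled, it continuously analyses CloudTrail events, VPC Flow Logs and DNS logs for you. It assumes default credentials and region:

```python
# A minimal sketch of turning on a detective control: enabling an
# Amazon GuardDuty detector for the current account and region.
import boto3

guardduty = boto3.client("guardduty")

# One detector per account per region; Enable=True starts monitoring.
response = guardduty.create_detector(Enable=True)
print(f"GuardDuty detector enabled: {response['DetectorId']}")
```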

In the cloud you can trigger code to respond to an event, or a combination of events, instead of relying on a person to respond. You can use access controls to define the access given to each person, and automate your handling of security events so that it is fast, error-free and scalable. The AWS service most essential to the security pillar is AWS Identity and Access Management (IAM), which allows you to securely define the access for each user.
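
As a sketch of this kind of event-driven security automation, the following hypothetical Lambda handler receives a GuardDuty finding (delivered through an EventBridge rule) and quarantines the affected EC2 instance by moving it into a deny-all security group. The security group ID is a placeholder:

```python
# A minimal sketch of automated incident response: quarantine an EC2
# instance flagged by a GuardDuty finding. The group ID is hypothetical.
import boto3

QUARANTINE_SG = "sg-0123456789abcdef0"  # hypothetical "deny all" security group

def handler(event, context):
    # GuardDuty findings delivered by EventBridge carry the finding in "detail".
    finding = event["detail"]
    instance_id = finding["resource"]["instanceDetails"]["instanceId"]

    # Swap the instance's security groups so it can no longer talk to anything.
    ec2 = boto3.client("ec2")
    ec2.modify_instance_attribute(InstanceId=instance_id, Groups=[QUARANTINE_SG])
    print(f"Quarantined {instance_id} after finding {finding['type']}")
```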

The IAM service is a component of AWS's secure global infrastructure with which you can centrally manage users, their security credentials, and the permission policies that determine which AWS services and resources they can access. IAM allows you to create individual users in your AWS account and give them their own usernames and access keys. Access keys allow users to access resources through the CLI, SDKs or API calls. All actions performed by IAM users are billed to the AWS account they belong to. IAM groups are collections of IAM users in one AWS account, grouped on a functional, organisational, geographic, project or any other basis, so that users who need access to similar resources can be managed together.
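
A minimal sketch of these IAM building blocks with boto3 — the user and group names are hypothetical, and the attached policy is an AWS managed policy chosen just for illustration:

```python
# A minimal sketch of IAM users, access keys and groups with boto3.
import boto3

iam = boto3.client("iam")

iam.create_user(UserName="alice")                 # an individual identity
key = iam.create_access_key(UserName="alice")     # credentials for CLI/SDK/API calls
print(key["AccessKey"]["AccessKeyId"])            # the secret key is shown only once

iam.create_group(GroupName="developers")          # group users by function/project
iam.add_user_to_group(GroupName="developers", UserName="alice")

# Permissions attached to the group apply to every member.
iam.attach_group_policy(
    GroupName="developers",
    PolicyArn="arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess",
)
```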

Some AWS Premium Support plans include access to the Trusted Advisor tool, which offers a single-view snapshot of your services and helps you identify common security misconfigurations.
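
If your support plan includes the AWS Support API (Business and Enterprise plans do), the security checks can also be queried programmatically. A minimal sketch:

```python
# A minimal sketch of listing Trusted Advisor security checks via the
# AWS Support API, which is only available on Business and Enterprise
# support plans and must be called in us-east-1.
import boto3

support = boto3.client("support", region_name="us-east-1")

checks = support.describe_trusted_advisor_checks(language="en")["checks"]
for check in checks:
    if check["category"] == "security":
        result = support.describe_trusted_advisor_check_result(checkId=check["id"])
        print(check["name"], "->", result["result"]["status"])
```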

Best practices for network security in the cloud include:

  1. Using security groups whenever possible.
  2. Deploying resources into private subnets of your VPC where possible.
  3. Adding network ACLs (access control lists) alongside security groups, as they provide fast and efficient stateless controls at the subnet level.
  4. Using IPsec or AWS Direct Connect for trusted connections to other sites, and a virtual private gateway (VGW) when a VPC-based resource requires remote connectivity.
  5. Protecting data in transit to ensure its integrity and confidentiality, as well as the identity of the communicating parties.
  6. Enabling VPC Flow Logs (a sketch follows this list).
  7. For large-scale deployments, designing network security for each of the different layers.
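
As a sketch of best practice 6, the following enables VPC Flow Logs for a hypothetical VPC, delivering the records to CloudWatch Logs. The VPC ID, log group name and IAM role ARN are placeholder assumptions:

```python
# A minimal sketch of enabling VPC Flow Logs with boto3. All identifiers
# below are placeholders.
import boto3

ec2 = boto3.client("ec2")

ec2.create_flow_logs(
    ResourceIds=["vpc-0123456789abcdef0"],   # hypothetical VPC
    ResourceType="VPC",
    TrafficType="ALL",                       # capture accepted and rejected traffic
    LogGroupName="vpc-flow-logs",            # placeholder CloudWatch Logs group
    DeliverLogsPermissionArn="arn:aws:iam::123456789012:role/flow-logs-role",  # placeholder
)
```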

When a user executes a program on a Linux or Windows system, the executable assumes the privileges of the user who launched it, so the code can carry out any action that user has permission to perform. If you execute untrusted code on a system, it might compromise that system: it could change parts of the operating system, install a rootkit, or establish back doors for accessing the system, after which the system can no longer be trusted. In that case it is recommended to reinstall all systems, platforms and application executables from a trusted source and restore data only from backup.

These best practices have been developed to help cloud architects build secure, high-performing and efficient infrastructure for their applications. They give customers and partners a consistent approach to evaluate architectures, make informed decisions, and understand and address the risks in an architecture before it is put into production. The whole purpose of cloud best practices is to make customers and partners think cloud-natively and use the benefits of automation in the cloud to make their work much easier than in any other architecture.
