AWS assume role and why you should care
- technical (large scale stateless systems with HTTP signing, grouping and keeping permissions restricted)
- business (offload error-sensitive security functionality to battle-tested infrastructure, depending on pay-on-demand instead of pay-upfront resources)
- learning (differences when systems grow, the analogy to offline work)
This article is a written version of my presentation, for those who did not attend VilniusPHP (slides, video) or ŠiauliaiPHP (slides, video), or for whom Lithuanian is not the main language. Originally posted as “How AWS handles security”.
Why should I care about AWS Security?
AWS has big clients that rely on its security, so AWS security is a good production-proven example. However, security in AWS is a very wide topic and is considered one of the hardest in the AWS landscape. Therefore, I chose to present AWS security in 3 different ways:
- from a generic introduction (what is AWS and AWS security)
- through theoretical comparison (when the system grows)
- to small demonstrations with reproducible examples
Let’s start from the first part…
Introduction to AWS and the Cloud
For those who are not familiar with AWS, let’s start with an introduction. AWS stands for Amazon Web Services and is one of the cloud providers, like Google Cloud, Azure, Alibaba Cloud and so on. Among the many definitions of AWS, I found this one the most practical: “AWS is one of the oldest cloud providers that has been able to practically implement the pay-on-demand and infrastructure-as-code ideas”.
- From the technical side, almost the whole infrastructure in AWS can be managed via programmable (API) calls. This opens new opportunities to manage growing infrastructure complexity in the same way as we manage software: as code.
- From the business side, AWS has many services that can be bought not with an upfront commitment, but on a per-request (on-demand) basis. This opens new opportunities for businesses to experiment with a wider range of technologies or approaches. The per-request payment model is ideal for minimal testable products or A/B tests. If the hypothesis about business demand turns out to be correct, engineers can start optimizing price and infrastructure (or even switch to an upfront payment model) while the happy customer base remains. This investment model looks reasonable for highly competitive markets.
Many hosting providers try to call themselves Cloud providers. Personally, I think the name “Cloud” is not about selling cheaper or more. It is more about the thinking model: innovating at the same speed as the infrastructure provider.
Now that we know what AWS is, let’s dig deeper into the security part…
Introduction to the security part of AWS
AWS is a large platform, so it is no surprise that its security part is also very complex. AWS covers many areas of security:
- network (e.g. firewalls, encryption in transit)
- storage (e.g. encryption at rest, backups)
- and many more, as described in the AWS Well-Architected Framework
But from a software developer’s perspective, the most useful part is application-level security. It is based on Identity and Access Management (IAM). Explaining IAM in detail could cause cognitive overload, so let’s start with a simplified 3-part system: Who → What → Where
By combining variations of who can do what and where, a software developer can create many valuable security components, offloading the error-sensitive and complex parts of the system. Now try to replace the simple words with AWS terminology:
- who = principal
- what = action
- where = resource
With that, you can already understand most AWS security examples (the “policy” part) and are ready to dig deeper into the architecture behind AWS security.
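To make the who/what/where mapping concrete, here is a minimal sketch of a policy document built in Python and printed as JSON. The account ID, user name and bucket name are hypothetical placeholders. Note that an explicit Principal appears in resource-based policies (such as an S3 bucket policy); in policies attached directly to a user or role, the "who" is implied by the attachment.

```python
import json

# Hypothetical identifiers, used only for illustration.
ACCOUNT_ID = "123456789012"
BUCKET = "example-uploads"

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            # who = principal
            "Principal": {"AWS": f"arn:aws:iam::{ACCOUNT_ID}:user/alice"},
            # what = action
            "Action": ["s3:GetObject"],
            # where = resource
            "Resource": f"arn:aws:s3:::{BUCKET}/*",
        }
    ],
}

# Serialize to the JSON form in which policies are usually shown.
policy_json = json.dumps(policy, indent=2)
print(policy_json)
```

Reading the statement aloud gives the same three-part sentence: user alice (who) may get objects (what) from the example-uploads bucket (where).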
Now that we have a basic idea of AWS security, let’s understand the concepts by comparison…
Different views of security (when the system grows)
Even if you do not intend to use AWS, a well-proven infrastructure can be a good example of different implementation approaches. A mouse is not just a smaller version of an elephant, so it is useful to compare examples of systems at different scales.
AWS Security infrastructure is a good example of a very big and complex system. It differs a lot from a simple Symfony (PHP) based website.
Let’s compare how security is applied from 3 different aspects:
- Location (Monolithic vs distributed)
- Human factor (Traditional vs Cloud native)
- Grouping (Hierarchical vs graph-based)
Let’s start with the first one…
1. Monolithic vs distributed
Let’s analyze a website written with the Symfony framework. The whole security logic is placed in one deployable unit: to make the final decision (allow access or not) we only need a single server. Therefore, I am using the term “monolithic” to talk about the security aspect (not about how many copies of the PHP code are deployed). Internally, the decision can be split into multiple levels:
- parse who the user is
- what roles the user has
- (via voters) which actions on which resources the user has permissions for
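The three levels above can be sketched in a few lines. This is not Symfony's API (Symfony is PHP); it is a hypothetical Python illustration of the same idea, where all names (`User`, `Post`, `edit_voter`, `is_granted`) are assumptions made for this example. The point is that the whole decision happens inside one process.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    name: str
    roles: set = field(default_factory=set)


@dataclass
class Post:
    owner: str


def parse_user(session: dict) -> User:
    """Levels 1 and 2: find out who the user is and which roles they have."""
    return User(name=session["name"], roles=set(session.get("roles", [])))


def edit_voter(user: User, action: str, resource: Post) -> bool:
    """Level 3: a voter decides on one action for one resource."""
    if action != "edit":
        return False
    return "ROLE_ADMIN" in user.roles or resource.owner == user.name


def is_granted(session: dict, action: str, resource: Post) -> bool:
    """The final allow/deny decision, made entirely in one place."""
    return edit_voter(parse_user(session), action, resource)


post = Post(owner="ana")
print(is_granted({"name": "ana", "roles": ["ROLE_USER"]}, "edit", post))
print(is_granted({"name": "bob", "roles": ["ROLE_USER"]}, "edit", post))
```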
If we take another example, AWS CloudFormation (a tool for infrastructure as code), the security decision is distributed.
For example, if we have a CloudFormation template with 2 S3 buckets (think of FTP servers) and 1 Lambda function (think of a micro server), the calls for each resource are asynchronous. Therefore, during a CloudFormation update, you could get security errors not from the CloudFormation tool itself, but from the dependent services (S3 or Lambda in this example). This is because, under the hood, AWS services use HTTP header signing to call each other, and the decision is made in the called service.
Signed requests are ideal for large systems because they are stateless.
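Why signing is stateless can be shown with a deliberately simplified sketch. This is not the real AWS Signature Version 4 algorithm (which canonicalizes the whole request and derives scoped keys); it only demonstrates the principle that the receiver needs nothing but the shared secret and the request itself to verify the caller. The function names, the secret and the request values are hypothetical.

```python
import hashlib
import hmac


def sign(secret: bytes, method: str, path: str, date: str) -> str:
    """Compute an HMAC-SHA256 signature over the parts of the request."""
    message = "\n".join([method, path, date]).encode()
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify(secret: bytes, method: str, path: str, date: str, signature: str) -> bool:
    """Recompute the signature on the receiving side; no session is stored."""
    expected = sign(secret, method, path, date)
    return hmac.compare_digest(expected, signature)


secret = b"demo-secret"  # hypothetical shared secret
signature = sign(secret, "PUT", "/bucket/report.csv", "20200101T000000Z")

# The receiving service checks the request on its own.
print(verify(secret, "PUT", "/bucket/report.csv", "20200101T000000Z", signature))
# A tampered path no longer matches the signature.
print(verify(secret, "PUT", "/bucket/other.csv", "20200101T000000Z", signature))
```

Because every request carries its own proof, any server of the called service can make the decision without asking a central session store.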
The distributed approach is harder to implement, but in a complex and constantly changing environment like AWS, having all security decisions in a single application would be a blocker for growth and innovation.
Now we are ready to go to another perspective…
2. Traditional vs Cloud native
When developing a traditional web shop, multiple environments (e.g. testing, staging, and production) are created once and updated rarely. Therefore, it is tempting to just create a working state and (hopefully) document it somewhere. The documentation is primarily used by humans, because with new requirements it is faster to just update some rules by hand until it works. To keep track of all the requirements, peer code review is a common practice. It works well for small to medium teams, or when teams are split by system components.
For example, if we take the GDPR requirement to protect sensitive data (at least by encrypting storage), it is usually enforced through a ticket system and code reviews.
From the Cloud-native perspective, the main benefit is building the application in such a way that it can be managed by other applications (not humans). Therefore, security requirements and legal policies need to be written in a computer-readable format.
AWS uses such formats, for example:
- JSON to represent an IAM policy (defining security rules)
- APIs or the CLI (reporting the current state of the infrastructure)
- CloudFormation templates (describing the infrastructure to be created)
Manually checking 100+ services would be physically impossible for humans alone, so investing in a common language and tools makes security topics easier to compose, check and react to.
If we take the same GDPR example (protect sensitive data via encryption) from the Cloud-native perspective, it could be done via an organization-level policy in AWS Organizations, for example by preventing the creation of S3 buckets (think of an FTP server) without encryption enabled. Moreover, using AWS CloudTrail, access could be analyzed and acted upon using only computer scripts (no human intervention).
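As a rough illustration of a rule written for machines rather than humans, the sketch below builds a Deny statement of the commonly seen kind that rejects S3 uploads sent without a server-side encryption header. It is a related rule, not the exact bucket-creation rule mentioned above, and the `is_denied` helper is a hypothetical, heavily simplified stand-in for the real policy evaluation that AWS performs.

```python
import json

# Deny any upload that does not declare server-side encryption.
deny_unencrypted_uploads = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyUnencryptedUploads",
            "Effect": "Deny",
            "Action": "s3:PutObject",
            "Resource": "*",
            # "Null ... true" matches when the key is absent from the request.
            "Condition": {"Null": {"s3:x-amz-server-side-encryption": "true"}},
        }
    ],
}


def is_denied(action: str, headers: dict) -> bool:
    """Simplified check of the single statement above (not AWS's real engine)."""
    statement = deny_unencrypted_uploads["Statement"][0]
    if action != statement["Action"]:
        return False
    return "x-amz-server-side-encryption" not in headers


print(json.dumps(deny_unencrypted_uploads, indent=2))
print(is_denied("s3:PutObject", {}))
print(is_denied("s3:PutObject", {"x-amz-server-side-encryption": "AES256"}))
```

Because the rule is plain data, a script can generate it, review it and apply it across many accounts without a human reading each one.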
Both the code review and the automation approaches can enforce security requirements, but each is most powerful at a different level.
And only the final perspective remains…
3. Hierarchical vs graph-based
In a traditional web application, the most common functionality is logged-in users. Despite complex user hierarchies, the permission model remains the same: once logged in as one user, you keep the same permissions until logout.
And this is the same in the real world: the higher-level manager you are, the bigger the impact you can have on the company. But what happens when someone gets sick or goes on holiday? You have to temporarily assume someone else’s role.
For complex Cloud environments, a single hierarchy was not enough, so AWS introduced the concept of assume-role. A role is a group of policies (permissions to do some action on a resource). In AWS you can configure a policy to allow one role to temporarily become another role (sometimes with greater power than the previous one).
This helps to keep permissions as restricted as possible and to add extension points (allowing the sts:AssumeRole action) under specific conditions. The result is an easier-to-read audit log and many small self-explanatory roles.
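The graph idea can be sketched as follows. The first part shows the shape of a trust policy that lets one role assume another; the second part models "who may assume whom" as a directed graph and walks it. The role names, the account ID and the `CAN_ASSUME` mapping are hypothetical and exist only for this illustration.

```python
from collections import deque

# Trust policy attached to the "deployer" role: it names who may assume it.
deployer_trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::123456789012:role/developer"},
            "Action": "sts:AssumeRole",
        }
    ],
}

# Hypothetical graph: each role lists the roles it is allowed to assume.
CAN_ASSUME = {
    "developer": ["deployer"],
    "deployer": ["db-migrator"],
    "db-migrator": [],
    "auditor": [],
}


def reachable_roles(start: str) -> set:
    """Breadth-first walk over the assume-role edges from a starting role."""
    seen = set()
    queue = deque([start])
    while queue:
        role = queue.popleft()
        for target in CAN_ASSUME.get(role, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


print(sorted(reachable_roles("developer")))
print(sorted(reachable_roles("auditor")))
```

Unlike a fixed hierarchy, edges can be added or removed one at a time, so each role stays small and every hop can show up as a separate entry in the audit log.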
The most important part is finished. Are you still missing something? Here is some code…
Understanding security by an example
During the presentation, I showed 2 demos (setup instructions included):
- Upload files directly from frontend: https://gist.github.com/aurelijusb/527c07e0f47b6dcbd1bdca27d265ac72
- Automation without root: https://gist.github.com/aurelijusbanelis/c29dc37e50fc95f5ecec47ea7ac6b69a
Those are good examples of what AWS IAM could do if you want to try it yourself instead of depending only on theory.
And that is it…
References and further reading
- AWS Best practices: https://aws.amazon.com/architecture/well-architected/
- Under the hood by AWS itself: https://aws.amazon.com/about-aws/whats-new/2019/12/introducing-amazon-builders-library/
- Summaries as illustrations: https://www.awsgeek.com/
- Community managed resources: https://github.com/open-guides/og-aws#security-and-iam
- Thinking about the Cloud: from application perspective: http://shop.oreilly.com/product/0636920072768.do
- Thinking about the Cloud: from infrastructure tools perspective: http://shop.oreilly.com/product/0636920075837.do
Conclusion: for complex problems, we need a wider perspective
We walked through how Amazon Web Services security works. Trying to understand different and complex concepts may be frustrating. However, from the view of self-growth, every conflict becomes uplifting. It gives a new way to think about the problem, especially when traditional approaches do not work anymore.