Introducing Aardvark and Repokid
AWS Least Privilege for Distributed, High-Velocity Development
by Jason Chan, Patrick Kelley, and Travis McPeak
Today we are pleased to announce two new open-source cloud security tools from Netflix: Aardvark and Repokid. These tools are the next logical step in our goal to run a secure, large-scale Amazon Web Services (AWS) deployment that accommodates rapid innovation and distributed, high-velocity development. Used together, Aardvark and Repokid help us get closer to the principle of least privilege without sacrificing speed or introducing heavy process. In this blog post we’ll describe the basic problem and why we need tools to solve it, introduce the new tools that we’ve developed to tackle it, and discuss future improvements to blend the tools seamlessly into our continual operations.
IAM Permissions — Inside the Cockpit
AWS Identity and Access Management (IAM) is a powerful service that allows you to securely configure access to AWS cloud resources. With over 2,500 permissions and counting, IAM gives users fine-grained control over which actions can be performed on a given resource in AWS. However, this level of control introduces complexity that can slow developers down. Rather than focusing on getting their application to run correctly, they have to switch context to work out exactly which AWS permissions the system needs. If they don’t grant the necessary permissions, the application will fail. Overly permissive deployments reduce the chance of an application mysteriously breaking, but they create unnecessary risk and give attackers a large foothold from which to penetrate further into a cloud environment.
Rightsizing Permissions — Autopilot for IAM
In an ideal world every application would be deployed with exactly the permissions it requires. In practice, however, determining the precise permissions for each application in a complicated production environment is prohibitively expensive and doesn’t scale. At Netflix we’ve adopted an approach that we believe balances developer freedom and velocity with security best practices: access profiling combined with automated, ongoing right-sizing. We allow developers to deploy their applications with a basic set of permissions and then use profiling data to remove permissions that are demonstrably not used. By continually re-examining our environment and removing unused permissions, our environment converges to least privilege over time.
AWS provides a service named Access Advisor that shows all of the various AWS services that the policies of an IAM Role permit access to and when (if at all) they were last accessed. Today Access Advisor data is only available in the console, so we created Aardvark to make it easy to retrieve at scale. Aardvark uses PhantomJS to log into the AWS console and retrieve Access Advisor data for all of the IAM Roles in an account. Aardvark stores the latest Access Advisor data in a database and exposes a RESTful API. Aardvark supports threading to retrieve data for multiple accounts simultaneously, and in practice refreshes data for our environment daily in less than 20 minutes.
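To make the profiling step concrete, here is a minimal sketch of how Access Advisor-style records might be turned into a list of unused services. The record shape (`service_namespace`, plus `last_authenticated` as a datetime or `None`) and the 90-day threshold are assumptions chosen for illustration; they are not Aardvark's documented schema or defaults.

```python
from datetime import datetime, timedelta, timezone


def unused_services(records, now, max_age_days=90):
    """Return the sorted service namespaces not accessed within max_age_days.

    Each record is a dict with hypothetical keys:
      - "service_namespace": e.g. "s3"
      - "last_authenticated": a datetime, or None if never accessed
    """
    cutoff = now - timedelta(days=max_age_days)
    unused = []
    for record in records:
        last = record.get("last_authenticated")
        # Never accessed, or last accessed before the cutoff: treat as unused.
        if last is None or last < cutoff:
            unused.append(record["service_namespace"])
    return sorted(unused)


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    sample = [
        {"service_namespace": "s3", "last_authenticated": now - timedelta(days=2)},
        {"service_namespace": "sqs", "last_authenticated": None},
    ]
    print(unused_services(sample, now))
```

Passing `now` in explicitly keeps the function deterministic, which makes it easier to test against a fixed snapshot of profiling data.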
Repokid uses the data about services used (or not) by a role to remove permissions that the role doesn’t need. It does so by keeping a DynamoDB table with data about each role that it has seen, including: policies, count of permissions (total and unused), whether the role is eligible for repo or is filtered, and when it was last repoed (“repo” is short for repossess, our verb for the act of taking back unused permissions). Filters can be used to exclude a role from repoing if, for example, it is too young to have been accurately profiled or it is on a user-defined blacklist.
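The two example filters mentioned above (role age and a user-defined exclusion list) could be sketched roughly as follows. The `Role` class, the function names, and the 90-day minimum age are hypothetical and exist only for this illustration; they are not Repokid's actual filter interface.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Role:
    """Hypothetical, minimal view of a role record."""
    name: str
    created: datetime


def too_young(role, now, min_age_days=90):
    """True if the role has not existed long enough to be profiled accurately."""
    return now - role.created < timedelta(days=min_age_days)


def excluded_by_name(role, excluded_names):
    """True if the role appears on the user-defined exclusion list."""
    return role.name in excluded_names


def is_eligible(role, now, excluded_names, min_age_days=90):
    """A role is eligible for repo only if no filter matches it."""
    return not (
        too_young(role, now, min_age_days)
        or excluded_by_name(role, excluded_names)
    )


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    role = Role("example_app_role", now - timedelta(days=10))
    print(is_eligible(role, now, excluded_names=set()))
```

Keeping each filter as a small predicate makes it straightforward to add further exclusion rules without touching the eligibility check.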
Once a role has been sufficiently profiled, Repokid’s repo feature revises inline policies attached to a role to exclude unused permissions. Repokid also maintains a cache of previous policy versions in case a role needs to be restored to a previous state. The repo feature can be applied to a single role, but is more commonly used to target every eligible role in an account.
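As a rough illustration of the repo step, the sketch below removes actions belonging to unused services from an IAM policy document and returns a revised copy, leaving the original untouched so it can be kept for rollback. The function name is hypothetical, and the logic is deliberately simplified: it only handles `Allow` statements with explicit `Action` entries and does not expand wildcards, so it should not be read as Repokid's actual implementation.

```python
import copy


def repo_policy(policy, unused_services):
    """Return a copy of `policy` without actions for services in `unused_services`.

    `unused_services` is a set of lowercase service prefixes such as {"sqs"}.
    Statements that are not simple Allow/Action statements are kept unchanged.
    """
    revised = copy.deepcopy(policy)
    statements = revised.get("Statement", [])
    if isinstance(statements, dict):  # IAM allows a single statement object
        statements = [statements]

    kept_statements = []
    for stmt in statements:
        if stmt.get("Effect") != "Allow" or "Action" not in stmt:
            kept_statements.append(stmt)  # e.g. Deny or NotAction: leave as-is
            continue
        actions = stmt["Action"]
        if isinstance(actions, str):
            actions = [actions]
        kept = [
            a for a in actions
            if a.split(":", 1)[0].lower() not in unused_services
        ]
        if kept:  # drop the statement entirely if no actions remain
            stmt["Action"] = kept
            kept_statements.append(stmt)

    revised["Statement"] = kept_statements
    return revised


if __name__ == "__main__":
    original = {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow",
             "Action": ["s3:GetObject", "sqs:SendMessage"],
             "Resource": "*"},
        ],
    }
    previous_versions = [original]  # simple stand-in for a rollback cache
    print(repo_policy(original, {"sqs"}))
```

Because the function works on a deep copy, the caller still holds the earlier version and can restore it if the revised policy turns out to be too restrictive.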
Currently Repokid uses Access Advisor data (via Aardvark) to make decisions about which services can be removed. Access Advisor data only applies to a service as a whole, so we can’t see which specific service permissions are used. We are planning to extend Repokid profiling by augmenting Access Advisor with CloudTrail. By using CloudTrail data, we can remove individual unused permissions within services that are otherwise required.
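One way to picture the CloudTrail-based extension is the sketch below, which compares the actions a role has been granted with the actions observed in CloudTrail-style events. It relies on the `eventSource` and `eventName` fields that CloudTrail records carry, but the mapping from those fields to IAM action names is a naive assumption: it does not hold for every service, and not every API call is logged. The function names are hypothetical, and this is not a description of the planned Repokid design.

```python
def observed_actions(events):
    """Derive lowercase 'service:action' strings from CloudTrail-style events.

    Assumes (naively) that the IAM prefix is the first label of eventSource,
    e.g. "s3.amazonaws.com" -> "s3", and that eventName equals the IAM action.
    """
    used = set()
    for event in events:
        service = event["eventSource"].split(".", 1)[0]
        used.add(f"{service}:{event['eventName']}".lower())
    return used


def unused_actions(granted, events):
    """Return the sorted granted actions that never appear in the events."""
    used = observed_actions(events)
    return sorted(a for a in granted if a.lower() not in used)


if __name__ == "__main__":
    granted = ["s3:GetObject", "s3:DeleteObject"]
    events = [{"eventSource": "s3.amazonaws.com", "eventName": "GetObject"}]
    print(unused_actions(granted, events))
```

The comparison is case-insensitive because IAM action names are not case-sensitive, while the original casing of the granted actions is preserved in the result.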
We’re also working on using Repokid data to discover permissions which are frequently removed so that we can deploy more restrictive default roles.
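A simple way to surface such candidates is to count how many roles had each permission removed, as in the sketch below. The input shape (a mapping from role name to the permissions removed from it) and the 50% threshold are assumptions for illustration only.

```python
from collections import Counter


def frequently_removed(removals_by_role, min_fraction=0.5):
    """Return the sorted permissions removed from at least min_fraction of roles.

    `removals_by_role` maps a role name to an iterable of removed permissions
    (hypothetical shape, used only for this example).
    """
    total_roles = len(removals_by_role)
    if total_roles == 0:
        return []
    counts = Counter()
    for removed in removals_by_role.values():
        counts.update(set(removed))  # count each permission once per role
    return sorted(
        perm for perm, n in counts.items() if n / total_roles >= min_fraction
    )


if __name__ == "__main__":
    history = {
        "role_a": ["sqs", "ec2"],
        "role_b": ["sqs"],
    }
    print(frequently_removed(history))
```

Permissions that show up near the top of such a count are the ones a default role could plausibly omit from the start.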
Finally, in its current state Repokid keeps basic stats about the total permissions each role has over time, but we will continue to refine its metrics and record-keeping capabilities.
Extending our Security Automation Toolkit
At Netflix, a core philosophy of the Cloud Security team is the belief that our tools should enable developers to build and operate secure systems as easily as possible. In the past we’ve released tools such as Lemur to make it easy to request and manage SSL certificates, Security Monkey to raise awareness and visibility of common AWS security misconfigurations, Scumblr to discover and manage software security issues, and Stethoscope to assess security across all of a user’s devices. By using these tools, developers are more productive because they can worry less about security details, and our environment becomes more secure because the tools prevent common misconfigurations. With Repokid and Aardvark we are now extending this philosophy and approach to cover IAM roles and permissions.
Stay in touch!
At Netflix we are currently using both of these tools internally to keep role permissions tightened in our environment. We’d love to see how other organizations use these tools and look forward to collaborating on further development.