Cloudy with a Chance of Zero Trust
How to Secure Identity and Credential Access Management (ICAM) in a Multi-Cloud Environment with a Zero Trust Network Architecture (ZTNA)
The Biden administration issued Executive Order 14028, Improving the Nation’s Cybersecurity, prompting companies and developers to rethink how they secure their applications and infrastructure.
The Federal Government’s cybersecurity recommendations involve applying the fundamental principles of Zero Trust and prioritizing adoption and use of cloud technology, where applicable. The Zero Trust security model eliminates implicit trust in any one element, node, or service and instead requires continuous verification of the operational picture via real-time information from multiple sources to determine access and other system responses.
This article examines how the world’s largest and most secure organizations solve for “ICAM” as described by the National Institute of Standards and Technology (NIST). I have the privilege of working with and leading a team in NIST’s Zero-Trust Multi-Cloud Working Group. I am writing this short blog to provoke thoughtful discussion around ICAM in a multi-cloud Zero Trust Network Architecture (ZTNA) context.
Universal Principles of Zero Trust
Before we go any further, let’s talk about Zero Trust. For this article, we will call it Zero Trust-ish. I believe you should trust nothing and authenticate and authorize everything. Because Zero Trust is a trust-nothing, authenticate-everything posture, adopting it touches every aspect of an operational environment and requires a fundamental shift in architecture, not an off-the-shelf product. It is inaccurate to say you have Zero Trust or a complete Zero Trust solution, so we will not. There are Zero Trust approaches and strategies, and it is up to each organization to embark on that journey in the way that suits it best. If you want to learn more and dive deep, keep reading. Below are the universal principles of Zero Trust accepted as emerging best practices.
- Identity Driven
- Mutually Authenticated
- Time Bound
- Encrypted at Rest/Encrypted in Transit
- Audited & Logged
NIST defines identity as “the set of physical and behavioral characteristics by which an individual is uniquely recognizable.” Today we live in a hybrid/multi-platform world. The benefits and advantages of this dynamic environment are tremendous and conversely the complexity and security challenges can be staggering. The principle of identity is critical for successful adoption of Zero Trust. Identity-driven security means that global entities (a device/machine, VM, container, application, user, etc.) regardless of platform (agnostic) are uniquely established (given names, not just IP addresses) and recognized apart from traditional operational dependencies (like machine location, network segments, etc.).
After decades of traditional enterprise IT approaches to networks and access methods, the challenge is shifting from simply running hybrid/multi-cloud to maintaining ZTNA across it. In light of the executive order, these Zero Trust efforts can be very complex and must be architected and planned using emerging best practices, which continue to develop. This makes it critical to establish a platform-agnostic approach. For example, a service mesh centered on a single control plane tied to a specific runtime, vendor, or platform is a risky bet and offers less future-proofing. The reality (especially in the Federal Government) is that most systems are not on a single runtime/platform. This requirements gap has been identified, and closing it is critical to solving for ICAM.
Identity is the critical principle of Zero Trust and key for enabling the authentication and authorization of non-person entities (NPE). This is called machine access or machine-to-machine access. When accessing a machine system, we must tie the identity to the credentials or secrets globally. We can define a secret/credential as anything that gives us access to a system or allows us to authenticate or authorize ourselves.
With machine authentication (authN) and authorization (authZ) comes the ICAM discussion of how to obtain secrets/credentials using methods that follow the universal principles of Zero Trust. Secrets/credentials should be encrypted, time-bound, and able to work with a global identity that can be coherently audited and logged. Once you’ve adopted a shared global identity capable of authN and authZ to NPEs, and you are capable of managing your secrets/credentials using those best practices I mentioned, you have begun establishing a baseline ZTNA.
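To make the idea concrete, here is a minimal sketch of a time-bound credential tied to a global identity, with every issuance written to an audit trail. It is an illustration under stated assumptions, not how any particular secrets manager works: `Lease`, `issue_credential`, `audit_log`, and the identity name `payments-api` are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
import secrets


@dataclass(frozen=True)
class Lease:
    """A credential bound to one identity and one expiry time (hypothetical type)."""
    identity: str          # global identity name, not an IP address
    secret: str            # the credential itself
    expires_at: datetime   # the credential is useless at or after this instant

    def is_valid(self, now: datetime) -> bool:
        return now < self.expires_at


# In-memory stand-in for an audit device; a real system would ship this elsewhere.
audit_log = []


def issue_credential(identity: str, ttl: timedelta, now: datetime) -> Lease:
    """Generate a random credential for `identity` that expires after `ttl`."""
    lease = Lease(identity, secrets.token_urlsafe(32), now + ttl)
    # Record who received a credential and when it expires, never the secret itself.
    audit_log.append({
        "event": "issue",
        "identity": identity,
        "expires_at": lease.expires_at.isoformat(),
    })
    return lease


now = datetime(2024, 1, 1, tzinfo=timezone.utc)
lease = issue_credential("payments-api", timedelta(minutes=15), now)
print(lease.is_valid(now + timedelta(minutes=14)))  # still inside the TTL
print(lease.is_valid(now + timedelta(minutes=15)))  # TTL reached
```

The point of the design is that the expiry travels with the credential and the identity travels with the audit record, so nothing depends on where the workload happens to run.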
If you’re not convinced, let’s take a typical example. Using Kubernetes as the control plane, either for identity or for managing credentials, leaves a larger attack surface, is hard to maintain, and does not follow the principles we agreed on. For example, you have secrets/credentials on the Kubernetes API server managed by Kubernetes Secrets… so you’re good. It is so easy, and the API is fantastic! However, by default those values are only base64-encoded, not encrypted, unless encryption at rest has been configured. There are also many types of Secrets in Kubernetes.
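Kubernetes stores Secret values base64-encoded, and unless encryption at rest is configured for the API server they are written to etcd without encryption. The short sketch below shows why encoding is not protection: the value is recovered with no key at all. The password is a made-up example value.

```python
import base64

# A value as it would appear under `data:` in a Secret manifest
# (hypothetical example value, encoded here for illustration).
encoded = base64.b64encode(b"s3cr3t-password").decode("ascii")

# Anyone who can read the manifest or the stored object can reverse it.
# No key or passphrase is involved: base64 is an encoding, not encryption.
decoded = base64.b64decode(encoded).decode("utf-8")

print(encoded)
print(decoded)
```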
For Zero Trust in particular, you must be able to map any identity platform to a global identity capable of dynamically managing secrets/credentials. Kubernetes, JWT, and cloud service provider IAM are not the only ways to authenticate in a multi-cloud world. And just as there is a multitude of authentication methods, there are even more kinds of secrets. With ICAM, the credentials you access must be global and follow the universal principles required for Zero Trust.
NIST defines mutually authenticated as “the process of both entities involved in a transaction verifying each other.” This bidirectional authentication is common and essential in secure service networking and is typically achieved through mutual Transport Layer Security, or mTLS. For example, SP 800–204C states, “service mesh software consists of two main components: the control plane and the data plane.” The data plane is where mTLS, or mutual authentication, occurs using proxies, also known as sidecars. Currently, there is no mention of a service mesh for multi-platform support. I would propose that the above-mentioned Special Publication (SP) focus on an approach that is agnostic to both clouds and runtimes. With the executive order in place, identity-driven workflows are critical for successful adoption of Zero Trust. HashiCorp Vault and Consul Enterprise are practical and effective solutions that enable platform-agnostic, identity-driven workflows and have been implemented successfully.
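As a rough illustration of what a sidecar proxy enforces on the server side of mTLS, the sketch below builds a TLS context with Python’s standard `ssl` module that refuses any client that does not present a certificate signed by a trusted CA. The function name and the file path parameters are hypothetical, and the paths are optional only so the sketch can be constructed without real certificate files.

```python
import ssl


def build_mtls_server_context(cert_file=None, key_file=None, ca_file=None):
    """Return a server-side TLS context that requires a client certificate."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2

    # This is the "mutual" half: the handshake fails unless the client
    # presents a certificate that chains to a CA this server trusts.
    ctx.verify_mode = ssl.CERT_REQUIRED

    if cert_file and key_file:
        # The server's own identity, presented to clients.
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    if ca_file:
        # The CA(s) allowed to vouch for client identities.
        ctx.load_verify_locations(cafile=ca_file)
    return ctx


ctx = build_mtls_server_context()
print(ctx.verify_mode == ssl.CERT_REQUIRED)
```

The client side mirrors this: a context from `ssl.create_default_context()` already verifies the server, and calling `load_cert_chain` on it supplies the client’s own certificate.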
NIST defines authorization as “the right or a permission granted to a system entity to access a system resource.” This is great yet problematic because access to system resources is rarely singular. For example, here are a few common ways to obtain authorization through access control in a multi-cloud world: AWS Identity and Access Management (IAM), Identity and Access Management (IAM), Azure Active Directory (Azure AD), Azure Active Directory External Identities, Azure Active Directory Domain Services, Identity and Access Management (IAM). We have not even discussed the hybrid/multi-platform added complexity. With a list this size, the need for a specially skilled workforce increases across all these disparate platforms, as does operational complexity. Since access control is tied to authorization and authentication, it is critical to emphasize the importance of adopting a global identity to enable adoption of Zero-Trust and to significantly reduce the burden on operations and the workforce.
NIST Special Publication 800–63B goes into depth on digital identity guidelines and authentication and lifecycle management. It also makes it abundantly clear a global identity is needed for the machines and humans to perform lifecycle management of digital identity adequately. One of the major factors to address for ZTNA in a hybrid/multi-cloud/multi-platform environment is the factor of time to live (TTL) for secrets/credentials across the span of authorization boundaries. Without centralized control, this is impossible. For example, the mTLS connection from service A to service B in the above example, within the service mesh, should be automated and time bound to allow for automatic key rotation. The service mesh following our global principles should be agnostic. This way, the time-bound lifecycle of digital identities is automated and possible for humans to audit and log.
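One way to picture automated, time-bound rotation is a rule that renews a certificate once a fixed fraction of its lifetime has elapsed, well before it expires. The sketch below assumes a two-thirds threshold purely for illustration; the function names and the threshold are hypothetical, not taken from any service mesh.

```python
from datetime import datetime, timedelta, timezone


def rotate_at(not_before: datetime, not_after: datetime, fraction: float = 2 / 3) -> datetime:
    """Instant at which a certificate should be renewed (assumed policy)."""
    lifetime = not_after - not_before
    return not_before + lifetime * fraction


def needs_rotation(now: datetime, not_before: datetime, not_after: datetime,
                   fraction: float = 2 / 3) -> bool:
    """True once `fraction` of the validity window has passed."""
    return now >= rotate_at(not_before, not_after, fraction)


issued = datetime(2024, 1, 1, tzinfo=timezone.utc)
expires = issued + timedelta(hours=24)   # a short-lived, 24-hour certificate

print(needs_rotation(issued + timedelta(hours=15), issued, expires))  # before threshold
print(needs_rotation(issued + timedelta(hours=17), issued, expires))  # past threshold
```

Renewing early leaves a margin for retries if the issuing authority is briefly unreachable, which matters more as lifetimes get shorter.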
Encrypted in Transit
One of the essential points to consider when using encryption is key rotation. This necessary process can be challenging with various authorization boundaries in a hybrid multi-cloud landscape. The guidance here is based explicitly on NIST publication 800–38D. For example:
“Periodic rotation of the encryption keys is recommended, even in the absence of compromise. For AES-GCM keys, the rotation should occur before approximately 2^32 encryptions have been performed by a key version, following the guidelines of NIST publication 800–38D. It is recommended that operators estimate the encryption rate and use that to determine a frequency of rotation that prevents the guidance limits from being reached. For example, if one determines that the estimated rate is 40 million operations per day, then rotating a key every three months is sufficient.” (NIST Guidance)
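The arithmetic behind the quoted example can be checked directly. Taking the limit as 2^32 encryptions per key version and the quoted rate of 40 million operations per day, the sketch below computes how many days a key version can be used before reaching the limit. The function and constant names are hypothetical.

```python
# Limit on encryptions per AES-GCM key version, per the quoted guidance.
GCM_ENCRYPTION_LIMIT = 2 ** 32   # 4,294,967,296


def days_until_limit(ops_per_day: int) -> float:
    """Days of use before a key version reaches the encryption limit."""
    return GCM_ENCRYPTION_LIMIT / ops_per_day


days = days_until_limit(40_000_000)
print(round(days, 1))   # roughly 107 days
print(days > 90)        # a three-month (about 90-day) rotation stays under the limit
```

At 40 million operations per day the limit is reached after about 107 days, so rotating every three months leaves a margin of a little over two weeks. A higher encryption rate shrinks that window proportionally.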
There is a gap in many systems in providing a standard approach/method (or encryption as a service) for encryption in transit. Otherwise, who performs these mathematical calculations and cryptographic tasks in your organization? Vault tokenization? Vault PKI engine? Let’s hope not Stack Overflow!
Audited & Logged
A significant challenge we face in Zero Trust Architectures is logging, auditing, and monitoring across many environments. The added complexity of multiple disparate authorization boundaries, identities, runtimes, and platforms only compounds the problem. It is vital to leverage a centralized logging and analytics platform that can ingest data in any format, at any volume, from any source, at scale. A centralized logging strategy correlates data from many different sources and provides a single-pane-of-glass view across multiple environments. Examples of such platforms are Elastic and Splunk, which are included in the Continuous Diagnostics and Mitigation (CDM) program. Splunk can index data in any human-readable format (no binary), no matter the source or volume. In addition, events can be correlated via its search language, enabling us to fingerprint activity across Zero Trust Architectures.
For example, a DevOps practitioner needs to deploy a multi-cloud application in AWS, Google, Oracle and Azure. They’ll need to request credentials for each cloud service provider from HashiCorp Vault but before that can happen, Vault will verify their identity. Once verified, Vault — as the identity broker — can generate just-in-time (dynamic), time bound, credentials for each CSP with principles of least privilege to allow the practitioner to perform work within scope, and nothing else. Once the credentials expire, they can no longer be used. In this example, event logs from the identity provider, Vault audit logs, AWS CloudWatch/CloudTrail events, and Azure Monitor logs may individually tell different stories. But by aggregating these disparate logs into a centralized platform, simple search commands, machine learning, and pattern matching help us tie the practitioner’s identity with the CSP credentials generated in Vault and paint a full picture of the activities across environments.
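The correlation step in that example can be sketched in a few lines. The idea is that the broker’s audit log says which identity received which credential, and each cloud’s log says what that credential did, so joining the two attributes cloud activity to a person. All field names (`credential_id`, `identity`, `ts`, `source`, `action`) and the sample events are hypothetical; real Vault, CloudTrail, and Azure Monitor records have their own schemas.

```python
from collections import defaultdict


def attribute_activity(broker_events, cloud_events):
    """Group cloud actions by the identity that was issued each credential."""
    # Which identity requested each credential, according to the broker's audit log.
    issued_to = {e["credential_id"]: e["identity"] for e in broker_events}

    activity = defaultdict(list)
    for e in cloud_events:
        who = issued_to.get(e["credential_id"], "unknown")
        activity[who].append((e["ts"], e["source"], e["action"]))

    # Order each identity's actions by timestamp to form a single timeline.
    for actions in activity.values():
        actions.sort()
    return dict(activity)


broker_events = [
    {"identity": "alice", "credential_id": "aws-123"},
    {"identity": "alice", "credential_id": "az-456"},
]
cloud_events = [
    {"ts": 20, "source": "azure", "credential_id": "az-456", "action": "create-vm"},
    {"ts": 10, "source": "aws", "credential_id": "aws-123", "action": "create-bucket"},
    {"ts": 30, "source": "aws", "credential_id": "aws-999", "action": "delete-bucket"},
]

timeline = attribute_activity(broker_events, cloud_events)
print(timeline["alice"])
print(timeline["unknown"])
```

Activity that maps to no issued credential lands under `unknown`, which is exactly the kind of event a centralized view should surface.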
The NSA and CISA just updated the Kubernetes Hardening Guide, and it provides expert insight on key zero-trust principles in Kubernetes. I would imagine in the future guides like this will continue to be updated faster and faster as complexity across workloads increases. The constant will be the universal principles of zero-trust across hybrid/multi-cloud and different runtime environments. I am open to opinions on the best practices here based on the scope defined for this principle. Let me know in the comments if you have any suggestions for this.
I am looking forward to collaborating with NIST on this topic. Thanks to Chris Hughes for the inspiration on the topic of ICAM. Thanks to those who contributed: Tim Olson, Jay Aware, and several members of the NIST working group. Please subscribe and follow on Medium. Join our public working group (link in the comments). Find me and follow me on LinkedIn if you have any feedback.