Securing and Hardening an Amazon EKS Cluster
Kubernetes has revolutionized application deployment and management, but running production clusters securely requires diligent effort. In this post, I’ll share best practices for hardening Amazon EKS based on my experience as a cloud architect. A hardened cluster goes a long way in preventing breaches and outages.
Tightly Control Access
To start, focus on access controls like IAM and security groups. For IAM, create a custom role for your worker nodes with precisely the permissions required and nothing more.
Here is an example of a minimal IAM policy to start from for an EKS worker node role:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "eks:DescribeCluster"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": "ec2:DescribeInstances",
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": "ec2:DescribeSecurityGroups",
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "ec2:DescribeSubnets",
        "ec2:DescribeNetworkInterfaces",
        "ec2:DescribeRouteTables"
      ],
      "Resource": "*"
    }
  ]
}
This includes:
- eks:DescribeCluster — to connect to the cluster control plane
- ec2:Describe* — minimum permissions to describe VPC resources
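For eks:DescribeCluster, replace the "*" in “Resource” with the ARN of your specific EKS cluster. The ec2:Describe* actions do not support resource-level permissions, so they keep "Resource": "*".
This is a good starting point, not a complete node role. In practice, worker nodes typically also need permission to pull images from Amazon ECR and to run the VPC CNI plugin, which AWS provides through the managed policies AmazonEC2ContainerRegistryReadOnly and AmazonEKS_CNI_Policy. Review the EKS documentation and lock down further based on your architecture. The key is to avoid overly permissive policies that grant more access than required.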
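As a rough sketch of the scoping described above, the following Python builds the same policy with eks:DescribeCluster limited to a single cluster ARN. The function name is hypothetical, and the region, account ID, and cluster name are placeholder values rather than anything taken from a real environment.

```python
import json


def node_policy(region: str, account_id: str, cluster_name: str) -> dict:
    """Build the example node policy with eks:DescribeCluster scoped to one cluster."""
    cluster_arn = f"arn:aws:eks:{region}:{account_id}:cluster/{cluster_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["eks:DescribeCluster"],
                "Resource": cluster_arn,
            },
            {
                # EC2 Describe* actions cannot be scoped to individual resources.
                "Effect": "Allow",
                "Action": [
                    "ec2:DescribeInstances",
                    "ec2:DescribeSecurityGroups",
                    "ec2:DescribeSubnets",
                    "ec2:DescribeNetworkInterfaces",
                    "ec2:DescribeRouteTables",
                ],
                "Resource": "*",
            },
        ],
    }


# Placeholder values for illustration only.
policy = node_policy("us-east-1", "111122223333", "demo-cluster")
print(json.dumps(policy, indent=2))
```

Generating the document from code like this makes it easy to produce one tightly scoped policy per cluster instead of sharing a wildcard policy across all of them.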
For security groups, segment traffic so only necessary communication is allowed between VPC subnets and security groups for the control plane endpoint, worker node groups, and external resources accessed by your applications. For example, worker nodes do not need direct public internet access in most cases. Lock down ingress and egress to just what is essential.
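One way to keep ingress rules honest is to audit them for world-open CIDR ranges. The sketch below assumes rules shaped like the IpPermissions entries that the EC2 DescribeSecurityGroups API returns; the function name and the sample rules are made up for illustration.

```python
WORLD_CIDRS = {"0.0.0.0/0", "::/0"}


def world_open_rules(ip_permissions: list) -> list:
    """Return the rules that allow traffic from any IPv4 or IPv6 address."""
    open_rules = []
    for rule in ip_permissions:
        cidrs = [r.get("CidrIp") for r in rule.get("IpRanges", [])]
        cidrs += [r.get("CidrIpv6") for r in rule.get("Ipv6Ranges", [])]
        if any(cidr in WORLD_CIDRS for cidr in cidrs):
            open_rules.append(rule)
    return open_rules


# Hypothetical ingress rules for a worker node security group.
sample_rules = [
    {"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443,
     "IpRanges": [{"CidrIp": "10.0.0.0/16"}]},
    {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
     "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
]

for rule in world_open_rules(sample_rules):
    print(f"World-open rule on ports {rule['FromPort']}-{rule['ToPort']}")
```

A check like this could run on a schedule against the output of describe-security-groups and flag any worker node group that has drifted from the intended lockdown.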
Implement Robust Logging
Make observability a priority as well. Kubernetes audit logging provides a forensic record of all control plane requests by users, accounts, pods, and more. This creates accountability and supports incident investigation. Enable audit logging and stream events to CloudWatch Logs for secure long-term retention and analysis.
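EKS control plane logging is configured per log type. As a minimal sketch, the helper below builds the JSON document that can be passed to `aws eks update-cluster-config --name <cluster> --logging '<json>'`. The helper name is hypothetical; the five log type names are the ones EKS defines.

```python
import json

ALL_LOG_TYPES = ["api", "audit", "authenticator", "controllerManager", "scheduler"]


def cluster_logging_config(enabled_types: list) -> dict:
    """Build a clusterLogging document that enables only the requested log types."""
    unknown = set(enabled_types) - set(ALL_LOG_TYPES)
    if unknown:
        raise ValueError(f"Unknown EKS log types: {sorted(unknown)}")
    enabled = [t for t in ALL_LOG_TYPES if t in enabled_types]
    disabled = [t for t in ALL_LOG_TYPES if t not in enabled_types]
    config = []
    if enabled:
        config.append({"types": enabled, "enabled": True})
    if disabled:
        config.append({"types": disabled, "enabled": False})
    return {"clusterLogging": config}


# Enable the audit and authenticator logs, leave the rest off.
print(json.dumps(cluster_logging_config(["audit", "authenticator"])))
```

Enabling only the audit and authenticator types is a common compromise between forensic coverage and CloudWatch ingestion cost; which types you need is a judgment call for your environment.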
Don’t forget container stdout/stderr logs either; they are essential for application troubleshooting and monitoring. Use a DaemonSet to deploy a log collector like Fluentd on each node to aggregate logs centrally. Ship them to CloudWatch or a tool like Coralogix for processing and querying.
Encrypt Everything Sensitive
Encryption is critical for protecting sensitive artifacts like secret manifests containing API keys, passwords, and other credentials. These secrets can easily be exposed if accidentally committed to source control. Use AWS Key Management Service (KMS) to encrypt secret data, rotating keys periodically.
EKS also offers an integrated secrets encryption provider that transparently encrypts Secret objects before storing them in the etcd key-value database. This envelope encryption provides an additional layer of defense in depth.
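Envelope encryption of Secrets is enabled by associating a KMS key with the cluster. The sketch below builds the encryption configuration that can be passed to `aws eks associate-encryption-config --cluster-name <cluster> --encryption-config '<json>'`. The helper name and the key ARN are placeholders, and the ARN pattern is a loose sanity check rather than an official grammar.

```python
import json
import re

# Loose sanity check for a KMS key ARN; not an exhaustive validation.
KMS_KEY_ARN = re.compile(r"^arn:aws[a-z-]*:kms:[a-z0-9-]+:\d{12}:key/[A-Za-z0-9-]+$")


def secrets_encryption_config(kms_key_arn: str) -> list:
    """Build the EKS encryption config that encrypts Secret objects with a KMS key."""
    if not KMS_KEY_ARN.match(kms_key_arn):
        raise ValueError(f"Not a KMS key ARN: {kms_key_arn}")
    return [{"resources": ["secrets"], "provider": {"keyArn": kms_key_arn}}]


# Placeholder key ARN for illustration only.
example_arn = "arn:aws:kms:us-east-1:111122223333:key/1234abcd-12ab-34cd-56ef-1234567890ab"
print(json.dumps(secrets_encryption_config(example_arn)))
```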
Scan Images and Fix Vulnerabilities
Automating vulnerability management via CI/CD integration is imperative. Use tools like Trivy, Anchore, or Amazon Inspector to scan Docker images for CVEs during build. Fail pipelines if any critical or high severity vulnerabilities are found until they can be patched. Rebuilding images frequently with the latest security patches reduces your exposure window.
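As an illustration of the pipeline gate, here is a minimal sketch that reads a Trivy JSON report (produced with `trivy image --format json`) and collects the findings that should block a build. The function name and the sample report are hypothetical; the Results, Vulnerabilities, VulnerabilityID, and Severity fields follow Trivy's JSON output.

```python
BLOCKING_SEVERITIES = {"CRITICAL", "HIGH"}


def blocking_findings(report: dict) -> list:
    """Return (target, vulnerability id, severity) for every blocking finding."""
    findings = []
    for result in report.get("Results", []):
        # Trivy omits the key or sets it to null when a target has no findings.
        for vuln in result.get("Vulnerabilities") or []:
            if vuln.get("Severity") in BLOCKING_SEVERITIES:
                findings.append(
                    (result.get("Target"), vuln.get("VulnerabilityID"), vuln.get("Severity"))
                )
    return findings


# Hypothetical report fragment for illustration only.
sample_report = {
    "Results": [
        {"Target": "app (amazon 2022)", "Vulnerabilities": [
            {"VulnerabilityID": "CVE-0000-0001", "Severity": "HIGH"},
            {"VulnerabilityID": "CVE-0000-0002", "Severity": "LOW"},
        ]},
        {"Target": "requirements.txt", "Vulnerabilities": None},
    ]
}

for target, vuln_id, severity in blocking_findings(sample_report):
    print(f"{severity}: {vuln_id} in {target}")
```

In a real pipeline the script would exit with a non-zero status when the list is non-empty. Trivy can also do this by itself with `--exit-code 1 --severity HIGH,CRITICAL`; a custom gate is mainly useful when you want your own exception list or reporting.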
For base images, consider building your own hardened Debian, Alpine, or Amazon Linux base images.
Here is an example Python base image:
FROM amazonlinux:2022 AS base
LABEL Maintainer="Robert Kozak <robert.kozak@emburse.com>"
ENV PYENV_ROOT="/opt/pyenv"
ENV PATH="${PYENV_ROOT}/shims:${PYENV_ROOT}/bin:$PATH"
# http://bugs.python.org/issue19846
# > At the moment, setting "LANG=C" on a Linux system *fundamentally breaks Python 3*, and that's not OK.
ENV LANG=C.UTF-8
# Copy only the helper script the final stage needs, not the whole build context
COPY use /use
# runtime dependencies
RUN set -eux && \
yum install -y git && \
yum clean all && \
rm -rf /var/cache/yum
RUN set -eux; \
curl -L https://github.com/pyenv/pyenv-installer/raw/master/bin/pyenv-installer | bash; \
git clone https://github.com/momo-lab/xxenv-latest \
${PYENV_ROOT}/plugins/xxenv-latest; \
pyenv update
# > =============================================================== <
FROM base AS builder
# build dependencies for compiling Python from source
RUN set -eux; \
yum groupinstall "Development Tools" -y && \
yum install -y \
gcc \
make \
perl-core \
zlib-devel \
bzip2 \
bzip2-devel \
readline-devel \
sqlite \
sqlite-devel \
wget \
tk-devel \
libffi-devel \
xz-devel \
openssl-devel \
; \
yum clean all; \
rm -rf /var/cache/yum
# > =============================================================== <
FROM builder AS build-all
ARG PYENV_VERSIONS="3.11.2 3.10.10 3.9.16 3.8.16 3.7.16 3.6.15"
SHELL ["/bin/bash", "-c"]
RUN set -eux; \
for version in ${PYENV_VERSIONS}; do \
pyenv install ${version}; \
done; \
pyenv global $(pyenv versions --bare | tac); \
pyenv versions; \
find ${PYENV_ROOT}/versions -depth \
\( \
\( -type d -a \( -name test -o -name tests -o -name idle_test \) \) \
-o \( -type f -a \( -name '*.pyc' -o -name '*.pyo' -o -name '*.a' \) \) \
\) -exec rm -rf '{}' +
# > =============================================================== <
FROM base
ARG USER_UID="1000"
ARG USER_GID="1000"
ARG USER_NAME="python"
COPY --from=base use /usr/local/bin/use
COPY --from=build-all ${PYENV_ROOT}/versions/ ${PYENV_ROOT}/versions/
RUN groupadd -g $USER_GID $USER_NAME && \
useradd -m -s /bin/bash -g $USER_GID -u $USER_UID $USER_NAME
RUN chown -R $USER_UID:$USER_GID /opt/pyenv
USER $USER_UID:$USER_GID
RUN set -eux; \
pyenv rehash; \
pyenv global 3.9.16
WORKDIR /home/python
Maintaining your own optimized base images lets you enforce security standards, such as running as a non-root user, in every image built on top of them.
Monitor and Enforce at Runtime
Kubernetes leaves many capabilities open to pods by default for flexibility. Tighten runtime security with controls like:
- Falco (the CNCF project originally created by Sysdig) to monitor kernel system calls for anomalous activity indicating potential threats
- Open Policy Agent (OPA) to enforce custom admission policies when creating resources
- Pod Security Admission with the Pod Security Standards to restrict privileged access, host mounting, port binding, etc. (this replaces Pod Security Policies, which were removed in Kubernetes 1.25)
- Non-root users and appropriate group memberships for pods, for least privilege
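To make the non-root control concrete, here is a sketch that generates a Pod manifest with a restrictive securityContext. The function name, pod name, and image reference are hypothetical; the securityContext fields are standard Kubernetes Pod fields, and the default UID and GID of 1000 simply mirror the user created in the base image above.

```python
import json


def hardened_pod(name: str, image: str, uid: int = 1000, gid: int = 1000) -> dict:
    """Build a Pod manifest that runs as a non-root user with minimal privileges."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "securityContext": {
                "runAsNonRoot": True,
                "runAsUser": uid,
                "runAsGroup": gid,
                "seccompProfile": {"type": "RuntimeDefault"},
            },
            "containers": [
                {
                    "name": name,
                    "image": image,
                    "securityContext": {
                        "allowPrivilegeEscalation": False,
                        "privileged": False,
                        "capabilities": {"drop": ["ALL"]},
                    },
                }
            ],
        },
    }


# Placeholder name and image; kubectl accepts JSON manifests as well as YAML.
print(json.dumps(hardened_pod("demo", "registry.example.com/python-base:latest"), indent=2))
```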
Lock Down Network Traffic
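Finally, use Kubernetes NetworkPolicies to restrict communications between pods based on namespaces, labels, IP addresses, and ports. This limits potential lateral movement if a pod is compromised. Adopt a “default deny” approach, then explicitly allow only the traffic each workload needs. Keep in mind that NetworkPolicy objects are only enforced when the cluster's CNI plugin or a network policy add-on such as Calico supports them; without one, the policies are accepted but have no effect.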
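A “default deny” policy is small enough to sketch in full. The helper below generates a NetworkPolicy that selects every pod in a namespace and allows no ingress or egress; the helper name, policy name, and namespace are placeholders.

```python
import json


def default_deny_policy(namespace: str) -> dict:
    """Build a NetworkPolicy that blocks all ingress and egress in a namespace."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-deny-all", "namespace": namespace},
        "spec": {
            # An empty podSelector matches every pod in the namespace.
            "podSelector": {},
            # Listing both types with no allow rules denies traffic in both directions.
            "policyTypes": ["Ingress", "Egress"],
        },
    }


# Placeholder namespace; the JSON output can be applied with kubectl apply -f.
print(json.dumps(default_deny_policy("payments"), indent=2))
```

With this in place, each workload needs its own allow policy, including one for DNS egress, before it can communicate, so roll it out namespace by namespace rather than cluster-wide in one step.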
The CNCF landscape provides excellent open source security tooling like OPA that complements the hardening techniques discussed above.
Enforce Security Policies with Open Policy Agent
Open Policy Agent (OPA) is an open source, general-purpose policy engine that unifies policy enforcement across your stack (see the OPA “Getting Started” guide). With OPA, you can define admission control policies to validate Kubernetes resources on create, update, and delete. For example:
- Require pods to run as non-root
- Prevent use of privileged containers
- Ensure required labels are present
- Limit use of certain volumes or volume mounts
- Validate annotation values
- Check container resource requests and limits
OPA gives you a lot of flexibility to enforce custom policies beyond what built-in mechanisms like Pod Security Admission offer. Policies are written in Rego, OPA's declarative policy language.
To integrate OPA with EKS:
- Deploy OPA as a Deployment in its own namespace and expose it inside the cluster through a Service
- Configure OPA to serve HTTPS, because the API server only calls admission webhooks over TLS
- Register the OPA Service with the Kubernetes API server through a ValidatingWebhookConfiguration
- Define Rego policies and load them into OPA
Now when the API server receives a request to create or modify a resource, it calls the OPA webhook for admission approval based on your policies. Requests that violate a policy are denied.
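The webhook registration step can be sketched as a generated ValidatingWebhookConfiguration. This assumes OPA is reachable through a Service named opa in a namespace named opa and serves TLS with a certificate signed by the CA you pass in; those names, the webhook name, and the namespace label used for exclusions are all illustrative choices, not requirements.

```python
import base64
import json


def opa_webhook_config(ca_pem: bytes, namespace: str = "opa", service: str = "opa") -> dict:
    """Build a ValidatingWebhookConfiguration that sends Pod writes to OPA."""
    return {
        "apiVersion": "admissionregistration.k8s.io/v1",
        "kind": "ValidatingWebhookConfiguration",
        "metadata": {"name": "opa-validating-webhook"},
        "webhooks": [
            {
                "name": "validating-webhook.openpolicyagent.org",
                "admissionReviewVersions": ["v1"],
                "sideEffects": "None",
                # "Fail" rejects requests when OPA is unreachable; "Ignore" lets them through.
                "failurePolicy": "Fail",
                # Skip namespaces carrying this (illustrative) label, e.g. kube-system.
                "namespaceSelector": {
                    "matchExpressions": [
                        {"key": "openpolicyagent.org/webhook",
                         "operator": "NotIn", "values": ["ignore"]}
                    ]
                },
                "rules": [
                    {"operations": ["CREATE", "UPDATE"], "apiGroups": [""],
                     "apiVersions": ["v1"], "resources": ["pods"]}
                ],
                "clientConfig": {
                    # The API server expects the CA bundle as base64-encoded PEM.
                    "caBundle": base64.b64encode(ca_pem).decode("ascii"),
                    "service": {"namespace": namespace, "name": service},
                },
            }
        ],
    }


# Placeholder CA certificate bytes for illustration only.
print(json.dumps(opa_webhook_config(b"-----BEGIN CERTIFICATE-----\n..."), indent=2))
```

With failurePolicy set to Fail, an OPA outage blocks Pod creation in every namespace the webhook covers, so excluding system namespaces and running more than one OPA replica is worth considering.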
Benefits of OPA:
- Unified policy engine for flexible policy definition
- Decouples policy from Kubernetes for portability
- Enables default deny and whitelist approaches
- Allows integration with external data sources
- Can watch config maps and secrets for auto policy reloads
Here is an example OPA policy that rejects Pods which explicitly allow running as root:
# Deny Pods that set securityContext.runAsNonRoot to false
package kubernetes.admission

import rego.v1

deny contains msg if {
    # The AdmissionReview sent by the API server is available as input
    input.request.kind.kind == "Pod"
    input.request.object.spec.securityContext.runAsNonRoot == false
    msg := "Pod cannot disable runAsNonRoot"
}
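This policy checks whether the Pod spec explicitly sets securityContext.runAsNonRoot to false. If so, it denies the request and returns the denial message. A Pod that omits the field entirely is not caught by this check; to reject those Pods as well, test the field with Rego's not keyword instead of comparing it with false.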
To integrate this with OPA:
- Save the policy as a .rego file (e.g. require-nonroot.rego)
- Load the policy into OPA with a PUT request to the REST API endpoint /v1/policies/<id>
- Configure the admission webhook to call the OPA service for policy decisions
Now when the API server receives a Pod create request, OPA evaluates the policy against the Pod spec. If runAsNonRoot is false, the request is denied with the provided message.
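The policy upload step can be sketched with the Python standard library. The code below only builds the PUT request for OPA's /v1/policies endpoint; it assumes OPA is listening on its default port 8181 on localhost (for example through a port-forward), and the helper name and policy ID are hypothetical.

```python
import urllib.request

# Policy source as it would appear in require-nonroot.rego.
REGO_POLICY = """\
package kubernetes.admission

import rego.v1

deny contains msg if {
    input.request.kind.kind == "Pod"
    input.request.object.spec.securityContext.runAsNonRoot == false
    msg := "Pod cannot disable runAsNonRoot"
}
"""


def policy_upload_request(opa_url: str, policy_id: str, rego_source: str) -> urllib.request.Request:
    """Build the PUT request that creates or replaces a policy in OPA."""
    return urllib.request.Request(
        url=f"{opa_url.rstrip('/')}/v1/policies/{policy_id}",
        data=rego_source.encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "text/plain"},
    )


# Assumed local OPA address; nothing is sent until urlopen() is called.
request = policy_upload_request("http://localhost:8181", "require-nonroot", REGO_POLICY)
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen(request)` performs the upload and requires a reachable OPA instance. In a cluster you would more likely load policies from version control through your deployment tooling than by hand.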
In conclusion
Securing Kubernetes is an essential responsibility when running production workloads. While EKS provides a scalable foundation, hardening your cluster requires proactive effort. Use strong access controls, encryption, vulnerability management, and runtime monitoring to implement defense in depth. Leverage Kubernetes native security features like Pod Security Admission and network policies to adhere to least privilege principles. And consider powerful tools like OPA to enforce custom policies across your stack. With vigilance and a layered approach, you can benefit from the automation of EKS while protecting your applications and data. Kubernetes empowers developers to ship code quickly, but that velocity must be tempered with diligent security measures. As cloud architects and stewards of our organizations’ infrastructure, we must guide our users towards securely embracing the promise of containers.