Securing and Hardening an Amazon EKS Cluster

Published in The Emburse Tech Blog, Aug 3, 2023

Kubernetes has revolutionized application deployment and management, but running production clusters securely requires diligent effort. In this post, I’ll share best practices for hardening Amazon EKS based on my experience as a cloud architect. A hardened cluster goes a long way in preventing breaches and outages.

Tightly Control Access

To start, focus on access controls like IAM and security groups. For IAM, create a custom role for your worker nodes with precisely the permissions required and nothing more.

Here is an example of a minimal, read-only IAM policy for an EKS worker node role:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "eks:DescribeCluster"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": "ec2:DescribeInstances",
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": "ec2:DescribeSecurityGroups",
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "ec2:DescribeSubnets",
        "ec2:DescribeNetworkInterfaces",
        "ec2:DescribeRouteTables" 
      ],
      "Resource": "*"
    }
  ]
}

This includes:

eks:DescribeCluster — to connect to the cluster control plane
ec2:Describe* — minimum permissions to describe VPC resources

In production, replace the wildcard in “Resource” for eks:DescribeCluster with the ARN of your specific EKS cluster.

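This is a good starting point rather than a complete policy. Depending on your setup, nodes typically need additional permissions, for example to pull images from Amazon ECR and to run the VPC CNI plugin, so review the EKS documentation and adjust for your architecture. The key is to avoid overly permissive policies that grant more access than required.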
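One way to scope eks:DescribeCluster to a single cluster, as suggested above, is to generate the policy from the cluster's ARN. The following is a minimal Python sketch. The function name, region, account ID, and cluster name are illustrative assumptions, not values from a real environment.

```python
import json


def worker_node_policy(region: str, account_id: str, cluster_name: str) -> dict:
    """Build the policy shown above with eks:DescribeCluster limited to one cluster."""
    cluster_arn = f"arn:aws:eks:{region}:{account_id}:cluster/{cluster_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["eks:DescribeCluster"],
                # Scoped to a single cluster instead of "*".
                "Resource": cluster_arn,
            },
            {
                "Effect": "Allow",
                # The ec2:Describe* actions keep "*" exactly as in the policy above.
                "Action": [
                    "ec2:DescribeInstances",
                    "ec2:DescribeSecurityGroups",
                    "ec2:DescribeSubnets",
                    "ec2:DescribeNetworkInterfaces",
                    "ec2:DescribeRouteTables",
                ],
                "Resource": "*",
            },
        ],
    }


if __name__ == "__main__":
    # Placeholder values for illustration only.
    policy = worker_node_policy("us-east-1", "111122223333", "demo-cluster")
    print(json.dumps(policy, indent=2))
```

Generating the document from parameters keeps the cluster ARN in one place, which helps when the same role template is reused across environments.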

For security groups, segment traffic so that only necessary communication is allowed among the control plane endpoint, the worker node groups, and the external resources your applications access. For example, worker nodes do not need direct public internet access in most cases. Lock down ingress and egress to just what is essential.
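To make "lock down ingress" checkable, a small audit script can flag rules that are open to the whole internet. The sketch below assumes security group data in the dictionary shape returned by the EC2 DescribeSecurityGroups API (IpPermissions, IpRanges, Ipv6Ranges). The sample group and its ID are made up for illustration.

```python
from typing import Iterator

OPEN_CIDRS = {"0.0.0.0/0", "::/0"}


def open_ingress_rules(security_group: dict) -> Iterator[str]:
    """Yield a description of every ingress rule that allows traffic from anywhere."""
    for rule in security_group.get("IpPermissions", []):
        cidrs = [r["CidrIp"] for r in rule.get("IpRanges", [])]
        cidrs += [r["CidrIpv6"] for r in rule.get("Ipv6Ranges", [])]
        for cidr in cidrs:
            if cidr in OPEN_CIDRS:
                ports = f'{rule.get("FromPort", "all")}-{rule.get("ToPort", "all")}'
                yield f'{security_group["GroupId"]}: {rule["IpProtocol"]} {ports} from {cidr}'


# Hypothetical worker node security group used only for this example.
sample_group = {
    "GroupId": "sg-0123456789abcdef0",
    "IpPermissions": [
        {"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443,
         "IpRanges": [{"CidrIp": "10.0.0.0/16"}]},
        {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
         "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
    ],
}

if __name__ == "__main__":
    for finding in open_ingress_rules(sample_group):
        print(finding)
```

In this sample only the SSH rule is reported, because the HTTPS rule is limited to a private VPC range.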

Implement Robust Logging

Make observability a priority as well. Kubernetes audit logging provides a forensic record of all control plane requests by users, accounts, pods, and more. This creates accountability and supports incident investigation. Enable audit logging and stream events to CloudWatch Logs for secure long-term retention and analysis.
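As a rough illustration, EKS control plane logging is switched on per log type through the cluster's logging configuration. The sketch below only builds that configuration as a Python dictionary. Passing it as the logging argument of boto3's update_cluster_config is an assumption about how you would apply it, and the helper name is hypothetical.

```python
from typing import Iterable

# Control plane log types that EKS can deliver to CloudWatch Logs.
CONTROL_PLANE_LOG_TYPES = ("api", "audit", "authenticator", "controllerManager", "scheduler")


def cluster_logging_config(enabled_types: Iterable[str]) -> dict:
    """Return a logging configuration that enables the given control plane log types."""
    requested = set(enabled_types)
    unknown = requested - set(CONTROL_PLANE_LOG_TYPES)
    if unknown:
        raise ValueError(f"unknown log types: {sorted(unknown)}")
    return {"clusterLogging": [{"types": sorted(requested), "enabled": True}]}


if __name__ == "__main__":
    config = cluster_logging_config(["audit", "authenticator"])
    print(config)
    # Assumed usage with boto3 (not executed here):
    #   boto3.client("eks").update_cluster_config(name="demo-cluster", logging=config)
```

Validating the log type names up front avoids a failed API call caused by a typo such as "audits".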

Don’t forget container stdout/stderr logs, which you need for application troubleshooting and monitoring. Use a DaemonSet to deploy a log collector such as Fluentd on each node and aggregate logs centrally. Ship them to CloudWatch or a tool like Coralogix for processing and querying.

Encrypt Everything Sensitive

Encryption is critical for protecting sensitive artifacts like secret manifests containing API keys, passwords, and other credentials. These secrets can easily be exposed if accidentally committed to source control. Use AWS Key Management Service (KMS) to encrypt secret data, rotating keys periodically.

EKS also offers an integrated secrets encryption provider that transparently encrypts Secret objects before storing them in the etcd key-value database. This envelope encryption provides an additional layer of defense in depth.
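For illustration, the envelope encryption setting is expressed as a short encryption configuration that names the resource type and the KMS key. The Python sketch below builds that structure and performs a basic sanity check on the key ARN. The helper name and the example ARN are hypothetical, and applying the result through boto3's associate_encryption_config is an assumption about your tooling.

```python
def secrets_encryption_config(kms_key_arn: str) -> list:
    """Return an EKS encryption configuration that encrypts Secret objects with a KMS key."""
    parts = kms_key_arn.split(":")
    # A KMS key ARN looks like arn:<partition>:kms:<region>:<account>:key/<id>.
    if len(parts) < 6 or parts[0] != "arn" or parts[2] != "kms":
        raise ValueError(f"not a KMS key ARN: {kms_key_arn!r}")
    return [{"resources": ["secrets"], "provider": {"keyArn": kms_key_arn}}]


if __name__ == "__main__":
    # Placeholder ARN for illustration only.
    arn = "arn:aws:kms:us-east-1:111122223333:key/00000000-0000-0000-0000-000000000000"
    config = secrets_encryption_config(arn)
    print(config)
    # Assumed usage with boto3 (not executed here):
    #   boto3.client("eks").associate_encryption_config(
    #       clusterName="demo-cluster", encryptionConfig=config)
```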

Scan Images and Fix Vulnerabilities

Automating vulnerability management via CI/CD integration is imperative. Use tools like Trivy, Anchore, or Amazon Inspector to scan Docker images for CVEs during build. Fail pipelines if any critical or high severity vulnerabilities are found until they can be patched. Rebuilding images frequently with the latest security patches reduces your exposure window.
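A pipeline gate of this kind can be as small as a script that reads the scanner's JSON report and exits non-zero on blocking findings. The sketch below assumes a Trivy JSON report (a top-level Results list whose entries may carry a Vulnerabilities list with a Severity field). The sample report and its vulnerability IDs are invented placeholders.

```python
import json
import sys
from collections import Counter

BLOCKING_SEVERITIES = {"CRITICAL", "HIGH"}


def blocking_findings(report: dict) -> Counter:
    """Count vulnerabilities whose severity should fail the build."""
    counts = Counter()
    for result in report.get("Results", []):
        # Targets without findings may omit "Vulnerabilities" or set it to null.
        for vuln in result.get("Vulnerabilities") or []:
            severity = vuln.get("Severity", "UNKNOWN")
            if severity in BLOCKING_SEVERITIES:
                counts[severity] += 1
    return counts


# Invented report used only to demonstrate the expected structure.
SAMPLE_REPORT = {
    "Results": [
        {"Target": "app (amazon 2022)", "Vulnerabilities": [
            {"VulnerabilityID": "EXAMPLE-0001", "Severity": "CRITICAL"},
            {"VulnerabilityID": "EXAMPLE-0002", "Severity": "LOW"},
        ]},
        {"Target": "requirements.txt", "Vulnerabilities": [
            {"VulnerabilityID": "EXAMPLE-0003", "Severity": "HIGH"},
        ]},
        {"Target": "clean-layer"},
    ]
}

if __name__ == "__main__":
    # Usage: python gate.py report.json (falls back to the sample report).
    if len(sys.argv) > 1:
        with open(sys.argv[1], encoding="utf-8") as handle:
            report = json.load(handle)
    else:
        report = SAMPLE_REPORT
    counts = blocking_findings(report)
    print(dict(counts))
    sys.exit(1 if counts else 0)
```

Trivy can also fail the build on its own through its severity and exit code options. A separate script like this is mainly useful when you want custom rules, such as an exception list.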

For base images, consider building your own hardened Debian, Alpine, or Amazon Linux base images.

Here is an example Python base image:

FROM amazonlinux:2022 AS base
LABEL Maintainer="Robert Kozak <robert.kozak@emburse.com>"

ENV PYENV_ROOT="/opt/pyenv"
ENV PATH="${PYENV_ROOT}/shims:${PYENV_ROOT}/bin:$PATH"

# http://bugs.python.org/issue19846
# > At the moment, setting "LANG=C" on a Linux system *fundamentally breaks Python 3*, and that's not OK.
ENV LANG=C.UTF-8

COPY . .

# runtime dependencies
RUN set -eux && \
    yum install -y git && \
    # clean the yum/dnf caches (the apt paths do not exist on Amazon Linux)
    yum clean all && \
    rm -rf /var/cache/yum /var/cache/dnf

RUN set -eux; \
    curl -L https://github.com/pyenv/pyenv-installer/raw/master/bin/pyenv-installer | bash; \
    git clone https://github.com/momo-lab/xxenv-latest \
        ${PYENV_ROOT}/plugins/xxenv-latest; \
    pyenv update

# > =============================================================== <

FROM base AS builder

# build dependencies (compilers and headers pyenv needs to compile CPython)
RUN set -eux; \
    yum groupinstall "Development Tools" -y && \
    yum install -y \
        gcc \
        make \
        perl-core \
        zlib-devel \
        bzip2 \
        bzip2-devel \
        readline-devel \
        sqlite \
        sqlite-devel \
        wget \
        tk-devel \
        libffi-devel \
        xz-devel \
        openssl-devel \
    ; \
    yum clean all; \
    rm -rf /var/cache/yum /var/cache/dnf

# > =============================================================== <

FROM builder AS build-all

ARG PYENV_VERSIONS="3.11.2 3.10.10 3.9.16 3.8.16 3.7.16 3.6.15"

SHELL ["/bin/bash", "-c"]

RUN set -eux; \
    for version in ${PYENV_VERSIONS}; do \
            pyenv install ${version}; \
    done;  \
    pyenv global $(pyenv versions --bare | tac); \
    pyenv versions; \
    find ${PYENV_ROOT}/versions -depth \
        \( \
           \( -type d -a \( -name test -o -name tests -o -name idle_test \) \) \
            -o \( -type f -a \( -name '*.pyc' -o -name '*.pyo' -o -name '*.a' \) \) \
        \) -exec rm -rf '{}' +

# > =============================================================== <

FROM base

ARG USER_UID="1000"
ARG USER_GID="1000"
ARG USER_NAME="python"

COPY --from=base use /usr/local/bin/use
COPY --from=build-all ${PYENV_ROOT}/versions/ ${PYENV_ROOT}/versions/

RUN groupadd -g $USER_GID $USER_NAME && \
        useradd -m -s /bin/bash -g $USER_GID -u $USER_UID $USER_NAME

RUN chown -R $USER_UID:$USER_GID /opt/pyenv

USER $USER_UID:$USER_GID

RUN set -eux; \
    pyenv rehash; \
    pyenv global 3.9.16

WORKDIR /home/python

Maintaining your own optimized base images allows enforcing security standards.

Monitor and Enforce at Runtime

Kubernetes grants pods broad capabilities by default for flexibility. Tighten runtime security with controls like:

Falco to monitor kernel system calls for anomalous activity that may indicate a threat
Open Policy Agent (OPA) to enforce custom admission policies when creating resources
Pod Security Admission (the built-in replacement for Pod Security Policies, which were removed in Kubernetes 1.25) to restrict privileged access, host mounting, port binding, etc.
Non-root users and appropriate group memberships for pods, for least privilege
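The non-root and least-privilege settings in the list above map to a handful of securityContext fields. The following Python sketch builds a Pod manifest with those fields set and prints it as JSON, which kubectl accepts alongside YAML. The pod name, image, and UID/GID values are placeholders chosen for illustration.

```python
import json


def hardened_pod(name: str, image: str, uid: int = 1000, gid: int = 1000) -> dict:
    """Build a Pod manifest that runs as a non-root user with minimal privileges."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            # Pod-level settings apply to every container in the pod.
            "securityContext": {
                "runAsNonRoot": True,
                "runAsUser": uid,
                "runAsGroup": gid,
                "seccompProfile": {"type": "RuntimeDefault"},
            },
            "containers": [
                {
                    "name": name,
                    "image": image,
                    "securityContext": {
                        "privileged": False,
                        "allowPrivilegeEscalation": False,
                        "readOnlyRootFilesystem": True,
                        "capabilities": {"drop": ["ALL"]},
                    },
                }
            ],
        },
    }


if __name__ == "__main__":
    # Placeholder name and image for illustration only.
    print(json.dumps(hardened_pod("demo-app", "example.com/demo-app:1.0"), indent=2))
```

The default UID and GID of 1000 match the non-root user created in the base image earlier in this post. Adjust them if your image uses a different user.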

Lock Down Network Traffic

Finally, use Kubernetes NetworkPolicies to restrict communications between pods based on namespaces, labels, IP addresses, and ports. This limits potential lateral movement if a pod is compromised. Adopt a “default deny” approach to limit risk from overly permissive policies.
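A "default deny" policy is short enough to show in full. The Python sketch below builds a NetworkPolicy that selects every pod in a namespace and allows no ingress or egress, then prints it as JSON for use with kubectl. The policy name and namespace are illustrative, and enforcement assumes a network policy engine is active in the cluster.

```python
import json


def default_deny_policy(namespace: str) -> dict:
    """Build a NetworkPolicy that blocks all ingress and egress for a namespace."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-deny-all", "namespace": namespace},
        "spec": {
            # An empty podSelector matches every pod in the namespace.
            "podSelector": {},
            # Listing both types with no allow rules denies traffic in both directions.
            "policyTypes": ["Ingress", "Egress"],
        },
    }


if __name__ == "__main__":
    # Placeholder namespace for illustration only.
    print(json.dumps(default_deny_policy("payments"), indent=2))
```

With this in place, each workload needs an additional, narrower policy that allows only the traffic it requires. Denying egress also blocks DNS lookups until they are explicitly allowed.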

The CNCF landscape provides excellent open source security tooling, such as OPA, that complements the hardening techniques discussed above.

Enforce Security Policies with Open Policy Agent

Open Policy Agent (OPA) is an open source, general-purpose policy engine that unifies policy enforcement across your stack (see Getting Started with OPA). With OPA, you can define admission control policies to validate Kubernetes resources on create, update, and delete. For example:

Require pods to run as non-root
Prevent use of privileged containers
Ensure required labels are present
Limit use of certain volumes or volume mounts
Validate annotation values
Check container resource requests and limits

OPA gives you a lot of flexibility to enforce custom policies beyond what built-in mechanisms like Pod Security Policies (or their successor, Pod Security Admission) offer. Policies are written in the Rego language.

To integrate OPA with EKS:

Deploy OPA in the cluster as a Deployment exposed through a Service (a sidecar such as kube-mgmt can load policies into it)
Serve OPA over TLS so the API server can call it
Register OPA with your Kubernetes API server as a validating admission webhook
Define Rego policies and load them into OPA

Now when the API server receives a request to create or modify a resource, it will call OPA for admission approval based on your policies. Requests that violate a policy are denied.
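The webhook registration step can be sketched as a ValidatingWebhookConfiguration. The Python example below builds one for Pod create and update requests and prints it as JSON. It assumes OPA is reachable over TLS through a Kubernetes Service. The Service name, namespace, webhook names, and CA bundle value are all hypothetical placeholders.

```python
import json


def opa_webhook_config(service_name: str, namespace: str, ca_bundle_b64: str) -> dict:
    """Build a ValidatingWebhookConfiguration that sends Pod requests to OPA."""
    return {
        "apiVersion": "admissionregistration.k8s.io/v1",
        "kind": "ValidatingWebhookConfiguration",
        "metadata": {"name": "opa-validating-webhook"},
        "webhooks": [
            {
                "name": "validating-webhook.openpolicyagent.org",
                "clientConfig": {
                    # The API server calls this Service over HTTPS.
                    "service": {"name": service_name, "namespace": namespace},
                    # Base64-encoded CA certificate that signed OPA's TLS certificate.
                    "caBundle": ca_bundle_b64,
                },
                "rules": [
                    {
                        "operations": ["CREATE", "UPDATE"],
                        "apiGroups": [""],
                        "apiVersions": ["v1"],
                        "resources": ["pods"],
                    }
                ],
                "admissionReviewVersions": ["v1"],
                "sideEffects": "None",
                # "Fail" rejects requests when OPA is unreachable.
                "failurePolicy": "Fail",
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(json.dumps(opa_webhook_config("opa", "opa", "<base64-ca-cert>"), indent=2))
```

Choosing "Fail" favors security over availability. Some teams start with "Ignore" while rolling policies out and switch later.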

Benefits of OPA:

Unified policy engine for flexible policy definition
Decouples policy from Kubernetes for portability
Enables default deny and whitelist approaches
Allows integration with external data sources
Can watch config maps and secrets for auto policy reloads

Here is an example OPA policy that rejects pods that explicitly disable runAsNonRoot:

# Deny pods with securityContext.runAsNonRoot: false

package kubernetes.admission

# The AdmissionReview sent by the API server is available as `input`.
deny[msg] {
  input.request.kind.kind == "Pod"
  input.request.object.spec.securityContext.runAsNonRoot == false
  msg := "Pod cannot disable runAsNonRoot"
}

This policy checks if the Pod spec defines securityContext.runAsNonRoot as false. If so, it will deny the create request and return a denial message.

To integrate this with OPA:

Save the policy as a .rego file (e.g. require-nonroot.rego)
Load the policy into OPA using the REST API /v1/policies endpoint
Configure the admission webhook to call OPA for policy decisions

Now when the API server receives a Pod create request, OPA will evaluate the policy against the Pod spec. If runAsNonRoot is false, the create will be denied with the provided message.
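Loading a policy through the REST API amounts to an HTTP PUT of the Rego source to /v1/policies/<id>. The Python sketch below builds that request with the standard library and only sends it when run as a script. The OPA address (localhost on port 8181), the policy ID, and the helper name are assumptions for illustration. The file name comes from the steps above.

```python
import urllib.request
from pathlib import Path


def build_policy_upload(opa_url: str, policy_id: str, rego_source: str) -> urllib.request.Request:
    """Build a PUT request that creates or replaces a policy in OPA."""
    return urllib.request.Request(
        url=f"{opa_url.rstrip('/')}/v1/policies/{policy_id}",
        data=rego_source.encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "text/plain"},
    )


if __name__ == "__main__":
    # Assumes OPA is listening locally and the policy file exists on disk.
    source = Path("require-nonroot.rego").read_text(encoding="utf-8")
    request = build_policy_upload("http://localhost:8181", "require-nonroot", source)
    with urllib.request.urlopen(request) as response:
        print(response.status)
```

Because the policy ID is part of the URL, repeating the PUT with the same ID replaces the earlier version, which suits automated deployments.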

In conclusion

Securing Kubernetes is an essential responsibility when running production workloads. While EKS provides a scalable foundation, hardening your cluster requires proactive effort. Use strong access controls, encryption, vulnerability management, and runtime monitoring to implement defense in depth. Leverage Kubernetes native security features like Pod Security Admission and network policies to adhere to least privilege principles. And consider powerful tools like OPA to enforce custom policies across your stack. With vigilance and a layered approach, you can benefit from the automation of EKS while protecting your applications and data. Kubernetes empowers developers to ship code quickly, but that velocity must be tempered with diligent security measures. As cloud architects and stewards of our organizations’ infrastructure, we must guide our users towards securely embracing the promise of containers.